OK, I think I know what the problem with NFS is - to a certain extent anyway;
for my next trick, I need to work out how we fix it!

Basically it looks like the stateless nature of NFS is the killer here:
at the end of each RPC call we drop the reference count on the inode,
and that causes us to truncate away the extra (speculatively preallocated)
space beyond the end of file. On the next write we go and do it all over
again. There is code in the IRIX version to deal with this, but I am not
sure whether it needs NFS changes or not.

The nfsd threads end up in places like this most of the time:
[1]kdb> btp 1270
EBP EIP Function(args)
0xc74f3d84 0xc0127c15 truncate_list_pages+0x21
kernel .text 0xc0100000 0xc0127bf4 0xc0127dec
0xc0127e48 truncate_inode_pages+0x5c (0xc33b6c64, 0x1006e000, 0x0)
kernel .text 0xc0100000 0xc0127dec 0xc0127e80
0xc018f6ce pagebuf_inval+0x1a (0xc33b6bc0, 0x1006e000, 0x0, 0x0)
kernel .text 0xc0100000 0xc018f6b4 0xc018f6d4
0xc01ece41 fs_tosspages+0x29 (0xc17a3d30, 0x1006e000, 0x0,
0xffffffff, 0xffffffff)
kernel .text 0xc0100000 0xc01ece18 0xc01ece48
0xc01cd21f xfs_itruncate_start+0x8f (0xc17a3d18, 0x1, 0x1006e000,
0x0, 0xc17a3d18)
kernel .text 0xc0100000 0xc01cd190 0xc01cd228
0xc01e55f1 xfs_inactive_free_eofblocks+0x1e5 (0xc75bac00, 0xc17a3d18)
kernel .text 0xc0100000 0xc01e540c 0xc01e56dc
0xc01e5d54 xfs_release+0x74 (0xc17a3d30)
kernel .text 0xc0100000 0xc01e5ce0 0xc01e5db8
0xc01ecbb8 linvfs_release+0x24 (0xc33b6bc0, 0xc74f3ec4)
kernel .text 0xc0100000 0xc01ecb94 0xc01ecbc0
0xc015f35e nfsd_close+0x1e (0xc74f3ec4)
kernel .text 0xc0100000 0xc015f340 0xc015f390
0xc015f97d nfsd_write+0x295 (0xc5ff1600, 0xc64416e0, 0x1006c000,
0x0, 0xc64500ec)
kernel .text 0xc0100000 0xc015f6e8 0xc015f990
0xc015cba4 nfsd_proc_write+0xb4 (0xc5ff1600, 0xc64415e0, 0xc64416e0)
kernel .text 0xc0100000 0xc015caf0 0xc015cbac
0xc015c213 nfsd_dispatch+0xcb (0xc5ff1600, 0xc1d10014)
kernel .text 0xc0100000 0xc015c148 0xc015c2b0
0xc0298948 svc_process+0x2ac (0xc3a11b60, 0xc5ff1600)
kernel .text 0xc0100000 0xc029869c 0xc0298be0
0xc015bfba nfsd+0x1ca
kernel .text 0xc0100000 0xc015bdf0 0xc015c148
0xc010750f kernel_thread+0x23
kernel .text 0xc0100000 0xc01074ec 0xc010751c
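
To put the trace in words: nfsd_write() closes the file at the end of the
RPC, nfsd_close()/linvfs_release() drop the last reference, and xfs_release()
goes through xfs_inactive_free_eofblocks() to throw away the space we
preallocated past the end of file. Here is a rough C sketch of the shape of
that path - illustrative only, the struct, field and helper names below are
invented for the example, this is not the real XFS code:

/*
 * Illustrative sketch only -- not the real xfs_release()/linvfs_release().
 * demo_inode, i_prealloc_end and free_eof_blocks are invented names.
 */
struct demo_inode {
	long long	di_size;		/* logical end of file */
	long long	i_prealloc_end;		/* end of speculative preallocation */
	int		i_count;		/* reference count on the inode */
};

/* Trim the space we speculatively allocated beyond end of file. */
static void free_eof_blocks(struct demo_inode *ip)
{
	if (ip->i_prealloc_end > ip->di_size) {
		/* toss pages and extents back to di_size ... */
		ip->i_prealloc_end = ip->di_size;
	}
}

/* Called when a reference to the file goes away. */
void demo_release(struct demo_inode *ip)
{
	if (--ip->i_count == 0)
		free_eof_blocks(ip);
}

/*
 * Because NFS is stateless, each write RPC is effectively
 * open -> write -> close, so demo_release() sees the count hit zero on
 * every single call: the preallocation is torn down here and has to be
 * rebuilt by the next write, which is where the system time in the trace
 * above is going.
 */

If that is the right picture, the fix presumably involves getting the
release path to leave the preallocation alone when the file is likely to
be written again shortly - which I assume is roughly what the IRIX code
mentioned above is for.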
The slower-with-more-memory thing I am not sure about yet, though.
Steve
>
> More about slow NFS writes... sorry it's a bit long.
> It gets more interesting towards the end :)
>
> I'm running CVS from today on a dual Celeron 500 with 512M of memory,
> running Red Hat 7.1 beta. I'm using a uniprocessor kernel and no RAID of
> any sort. The disk is a 7200RPM Maxtor ATA100 UDMA mode 5 and is mounted
> with logbufs=8,kiocluster. The fs was made with -l size=32768b. I test it
> with
> time dd if=/dev/zero of=bigFile bs=1024k count=2500
> as this is close to the real application it'll be used for (multi-
> gigabyte writes of simulation data).
>
> The summary is:
> Writes to local disk run at ~30MByte/s. Writes over NFSv3 or v2 run
> at ~1MByte/s after the memory cache on the server fills up, and the
> CPU load is 90%+ with all of it being system time. The light on the
> drive is on maybe 5% of the time. The expected write speed is ~10MByte/s,
> as this is a 100Mbit network (100Mbit/s is ~12.5MByte/s raw, so ~10MByte/s
> after protocol overhead). Throughputs for various count= sizes are here:
>
> write size (MByte)   throughput (MBytes/s)   time
> (same as count=)
>   50                    7                     7.2s
>  100                    6                    17.7s
>  200                    4                      53s
>  400                    2.3                2mins 48s
>  800                    1.6                8mins 14s
>
> using ext2 instead of XFS (same disk, just umount, mkfs, remount, re-export):
> 800 9.6 1min 23s
>
> When using XFS the machine appears to be basically using all its CPU
> running nfsd's. A machine with a faster CPU could well write faster???
> 'top' says that there is consistent 90%+ system time usage. When
> using ext2 there's about 50% CPU system time being used by the nfsd's.
>
> Like I've reported before, the behaviour is empirically the same on my
> machine at home with completely different hardware. It's also the
> same asymptotic ~1MB/s bandwidth whether kio, kiocluster, or neither
> is used. The logbufs option doesn't matter, nor does NFS v2 vs v3 on
> the server, or whether I'm writing from a Linux box, an SGI, or a Tru64
> machine.
>
> When using kiocluster I did once get messages like these logged about
> one every few seconds:
> ll_rw_kio: request size [10752] not a multiple of device [21:01] block-size [4096]
> ll_rw_kio: request size [17920] not a multiple of device [21:01] block-size [4096]
> ll_rw_kio: request size [25088] not a multiple of device [21:01] block-size [4096]
> is this a clue???
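
For reference, that message is just the kiobuf path complaining that a
request handed to it is not a whole number of device blocks. A tiny
illustrative sketch of that kind of check - made-up helper name, and a
userspace printf standing in for the kernel log message, not the real
ll_rw_kio() code:

#include <stdio.h>

/* Illustrative alignment check -- the real ll_rw_kio() may handle it differently. */
static int check_kio_request(unsigned int size, unsigned int blocksize,
			     unsigned int major, unsigned int minor)
{
	if (size % blocksize) {
		printf("ll_rw_kio: request size [%u] not a multiple of "
		       "device [%02x:%02x] block-size [%u]\n",
		       size, major, minor, blocksize);
		return -1;	/* caller has to fall back or fail the request */
	}
	return 0;
}

int main(void)
{
	/* 10752 is not a multiple of 4096, so this prints the message above */
	check_kio_request(10752, 4096, 0x21, 0x01);
	return 0;
}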
>
>
>
> Ok, I did the smart thing to enable bonnie++ runs in non-infinite time
> and pulled all except 128M of memory out of the machine :-) For all
> of the above tests it was 512MB. Now I get:
>
> write size (MByte)   throughput (MBytes/s)   time
> (same as count=)
>   50                    7                     7.2s
>  100                    5                    18.6s
>  200                    4                    45.5s
>  400                    4                   1min 39s
>  800                    3.8                 3mins 31s
>
> using ext2 instead of XFS (same blurb as before):
> 800 9.6 1min 23s
>
> Ok, so this is MASSIVELY weird. Why should LESS memory in the server
> boost speed of NFS writes to XFS??!?!?!?!?! The nfsd load on the
> server is still 90%+ in system time. Are the buffers in memory
> thrashing themselves and thrashing nfsd?
>
> Anyway, this is lots better but still a factor of 2.5 away from ext2 :-(
>
> I had another thought - I turned off the nfs locking daemons with:
> /etc/rc.d/init.d/nfslock stop
> but this makes no difference.
>
> Ok, bonnie++ results on NFS mounted dirs. This is from an IRIX64 O2k with
> 4G of memory to the celeron500 with 128M memory:
>
> Version 1.00h ------Sequential Output------ --Sequential Input- --Random-
> -Per Chr- --Block-- -Rewrite- -Per Chr- --Block-- --Seeks--
> Machine Size K/sec %CP K/sec %CP K/sec %CP K/sec %CP K/sec %CP /sec %CP
> to XFS 256M 1217 30 1300 7 2380 18 4107 98 107168 98 1222.4 31
> to ext2 256M 2790 73 4807 27 8107 35 4110 99 104429 99 1228.3 31
>
> This shows some of the same behaviour as all my dd tests I guess - XFS
> writes being down by a factor of 2.5 over ext2.
> Load on the server is the usual 80%+ for XFS and maybe 30% for ext2.
> The O2k has 4G of memory so that's probably why some read results are
> way too high - but I don't care about them really - only interested in
> writes for now. I'll try out reads later!! :)
>
> Anyway - enough - 3am and time to go home after 10 hours on this stuff.
>
> Please let me know if anyone else sees these same effects and if there
> are any explanations, suggestions, solutions, or if there's anything
> else you'd like me to try out :-)
>
> cheers,
> robin