
Re: more on NFS performance

To: Robin Humble <rjh@xxxxxxxxxxxxxxxxxxxxxxxxxxx>
Subject: Re: more on NFS performance
From: Steve Lord <lord@xxxxxxx>
Date: Tue, 06 Mar 2001 10:17:54 -0600
Cc: linux-xfs@xxxxxxxxxxx
In-reply-to: Message from Robin Humble <rjh@xxxxxxxxxxxxxxxxxxxxxxxxxxx> of "Tue, 06 Mar 2001 19:18:36 +1100." <200103060818.IAA02777@xxxxxxxxxxxxxxxxxxxxxxxxxxx>
Sender: owner-linux-xfs@xxxxxxxxxxx
OK, I think I know what the problem with NFS is - to a certain extent anyway.
For my next trick, I need to work out how we fix it!

Basically it looks like the stateless nature of NFS is the killer here:
at the end of each RPC call we drop the reference count on the inode,
which causes us to truncate away the extra space we allocated beyond the
end of the file. On the next write we go and do it all again... There is
code in the IRIX version to deal with this, but I am not sure yet whether
it needs NFS changes or not.

The nfsd threads end up in places like this most of the time:

[1]kdb> btp 1270
    EBP       EIP         Function(args)
0xc74f3d84 0xc0127c15 truncate_list_pages+0x21
                               kernel .text 0xc0100000 0xc0127bf4 0xc0127dec
           0xc0127e48 truncate_inode_pages+0x5c (0xc33b6c64, 0x1006e000, 0x0)
                               kernel .text 0xc0100000 0xc0127dec 0xc0127e80
           0xc018f6ce pagebuf_inval+0x1a (0xc33b6bc0, 0x1006e000, 0x0, 0x0)
                               kernel .text 0xc0100000 0xc018f6b4 0xc018f6d4
           0xc01ece41 fs_tosspages+0x29 (0xc17a3d30, 0x1006e000, 0x0, 
0xffffffff, 0xffffffff)
                               kernel .text 0xc0100000 0xc01ece18 0xc01ece48
           0xc01cd21f xfs_itruncate_start+0x8f (0xc17a3d18, 0x1, 0x1006e000, 
0x0, 0xc17a3d18)
                               kernel .text 0xc0100000 0xc01cd190 0xc01cd228
           0xc01e55f1 xfs_inactive_free_eofblocks+0x1e5 (0xc75bac00, 0xc17a3d18)
                               kernel .text 0xc0100000 0xc01e540c 0xc01e56dc
           0xc01e5d54 xfs_release+0x74 (0xc17a3d30)
                               kernel .text 0xc0100000 0xc01e5ce0 0xc01e5db8
           0xc01ecbb8 linvfs_release+0x24 (0xc33b6bc0, 0xc74f3ec4)
                               kernel .text 0xc0100000 0xc01ecb94 0xc01ecbc0
           0xc015f35e nfsd_close+0x1e (0xc74f3ec4)
                               kernel .text 0xc0100000 0xc015f340 0xc015f390
           0xc015f97d nfsd_write+0x295 (0xc5ff1600, 0xc64416e0, 0x1006c000, 
0x0, 0xc64500ec)
                               kernel .text 0xc0100000 0xc015f6e8 0xc015f990
           0xc015cba4 nfsd_proc_write+0xb4 (0xc5ff1600, 0xc64415e0, 0xc64416e0)
                               kernel .text 0xc0100000 0xc015caf0 0xc015cbac
           0xc015c213 nfsd_dispatch+0xcb (0xc5ff1600, 0xc1d10014)
                               kernel .text 0xc0100000 0xc015c148 0xc015c2b0
           0xc0298948 svc_process+0x2ac (0xc3a11b60, 0xc5ff1600)
                               kernel .text 0xc0100000 0xc029869c 0xc0298be0
           0xc015bfba nfsd+0x1ca
                               kernel .text 0xc0100000 0xc015bdf0 0xc015c148
           0xc010750f kernel_thread+0x23
                               kernel .text 0xc0100000 0xc01074ec 0xc010751c
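
If anyone wants to poke at this locally without NFS in the way, the pattern
is easy to mimic from user space: open the file, write one chunk, close it,
repeat. Each close drops the last reference, so xfs_release and the EOF-block
trim run once per chunk. This is just a sketch of the access pattern nfsd
generates, not the nfsd code itself, and the file name and sizes are arbitrary:

/*
 * Rough user-space approximation of the per-RPC pattern in the
 * backtrace above: nfsd opens the file, writes one chunk, closes it,
 * so the last reference is dropped and XFS trims the space beyond EOF
 * on every single chunk. Sketch only - name and sizes are arbitrary.
 */
#include <sys/types.h>
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

#define CHUNK  (64 * 1024)         /* roughly one NFS write RPC */
#define CHUNKS (16 * 1024)         /* ~1GB total */

int main(void)
{
        static char buf[CHUNK];
        off_t off = 0;
        int i;

        memset(buf, 0, sizeof(buf));
        for (i = 0; i < CHUNKS; i++) {
                /* open/close around every chunk, as nfsd_write/nfsd_close do */
                int fd = open("bigFile", O_WRONLY | O_CREAT, 0644);
                if (fd < 0) {
                        perror("open");
                        exit(1);
                }
                if (pwrite(fd, buf, CHUNK, off) != CHUNK) {
                        perror("pwrite");
                        exit(1);
                }
                off += CHUNK;
                close(fd);      /* last reference dropped -> xfs_release */
        }
        return 0;
}

Comparing the run time of that against a single open/write/close of the same
total size on the XFS partition should show whether the repeated trim beyond
EOF is really where the time goes.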

The slower-with-more-memory behaviour I am not sure about yet, though.

Steve


> 
> More about slow NFS writes... sorry it's a bit long.
> It gets more interesting towards the end :)
> 
> I'm running today's CVS tree on a dual Celeron 500 with 512M memory, RedHat
> 7.1beta.  I'm using a uni-processor kernel and no RAID of any sort.
> The disk is a 7200RPM Maxtor ATA100 UDMA mode 5 and is mounted with
> logbufs=8,kiocluster. The fs was made with -l size=32768b. I test it
> with
>   time dd if=/dev/zero of=bigFile bs=1024k count=2500
> as this is close to the real application it'll be used for (multi-
> gigabyte writes of simulation data).
> 
> the summary is:
> writes to local disk run at ~30MByte/s. Writes over NFSv3 or v2 run
> at ~1 MByte/s after the memory cache on the server fills up, and the
> cpu load is 90%+ with all of it being system time. The light on the
> drive is on maybe 5% of the time. The expected write speed is ~10MByte/s
> as this is a 100Mbit network. Throughputs for various count= sizes
> are here:
> 
>  write size (MByte)   throughput    time
>   same as count=      (MBytes/s)
>         50               7         7.2s
>        100               6         17.7s
>        200               4         53s
>        400               2.3       2mins 48s
>        800               1.6       8mins 14s
> 
> using ext2 instead of XFS (same disk, just umount, mkfs, remount, re-export):
>        800               9.6       1min 23s
> 
> When using XFS the machine appears to be basically using all its CPU
> running nfsd's. A machine with a faster CPU could well write faster???
> 'top' says that there is consistent 90%+ system time usage.  When
> using ext2 there's about 50% cpu system time being used by the nfsd's.
> 
> Like I've reported before, the behaviour is empirically the same on my
> machine at home with completely different hardware. It's also the
> same asymptotic ~1MB/s bandwidth whether kio or kiocluster or neither
> are used. The logbufs option doesn't matter, nor does NFS v2 or v3 on
> the server, or whether I'm writing from a Linux box, SGI or Tru64 machine.
> 
> When using kiocluster I did once get messages like these logged about
> one every few seconds:
>   ll_rw_kio: request size [10752] not a multiple of device [21:01] block-size [4096]
>   ll_rw_kio: request size [17920] not a multiple of device [21:01] block-size [4096]
>   ll_rw_kio: request size [25088] not a multiple of device [21:01] block-size [4096]
> is this a clue???
> 
> 
> 
> Ok, I did the smart thing to enable bonnie++ runs in non-infinite time
> and pulled all except 128M of memory out of the machine :-) (all of the
> above tests were with 512MB). Now I get:
> 
>  write size (MByte)   throughput   time
>   same as count=      (MBytes/s)
>         50               7         7.2s
>        100               5         18.6s
>        200               4         45.5s
>        400               4         1min 39s
>        800              3.8        3mins 31s
> 
> using ext2 instead of XFS (same blurb as before):
>        800               9.6       1min 23s
> 
> Ok, so this is MASSIVELY weird. Why should LESS memory in the server
> boost speed of NFS writes to XFS??!?!?!?!?! The nfsd load on the
> server is still 90%+ in system time. Are the buffers in memory
> thrashing themselves and thrashing nfsd?
> 
> Anyway, this is lots better but still a factor of 2.5 away from ext2 :-(
> 
> I had another thought - I turned off the nfs locking daemons with:
>  /etc/rc.d/init.d/nfslock stop
> but this makes no difference.
> 
> Ok, bonnie++ results on NFS mounted dirs. This is from an IRIX64 O2k with
> 4G of memory to the celeron500 with 128M memory:
> 
> Version 1.00h     ------Sequential Output------ --Sequential Input- --Random-
>                   -Per Chr- --Block-- -Rewrite- -Per Chr- --Block-- --Seeks--
> Machine      Size K/sec %CP K/sec %CP K/sec %CP K/sec %CP K/sec %CP  /sec %CP
> to XFS       256M  1217  30  1300   7  2380  18  4107  98 107168 98 1222.4 31
> to ext2      256M  2790  73  4807  27  8107  35  4110  99 104429 99 1228.3 31
> 
> This shows some of the same behaviour as all my dd tests I guess - XFS
> writes being down by a factor of 2.5 over ext2.
> Load on the server is the usual 80%+ for XFS and maybe 30% for ext2.
> The O2k has 4G of memory so that's probably why some read results are
> way too high - but I don't care about them really - only interested in
> writes for now. I'll try out reads later!! :)
> 
> Anyway - enough - 3am and time to go home after 10 hours on this stuff.
> 
> Please let me know if anyone else sees these same effects and if there
> are any explanations, suggestions, solutions, or if there's anything
> else you'd like me to try out :-)
> 
> cheers,
> robin


