To: Mark Seger <Mark.Seger@xxxxxx>
Subject: [xfs-masters] Re: Problems reading >1M files from same directory with nfs
From: David Chinner <dgc@xxxxxxx>
Date: Tue, 19 Jun 2007 10:52:51 +1000
Cc: David Chinner <dgc@xxxxxxx>, xfs-masters@xxxxxxxxxxx, linux-xfs@xxxxxxxxxxx, Hank Jakiela <Hank.Jakiela@xxxxxx>
In-reply-to: <467724E1.6050309@hp.com>
References: <4676CFF9.8090805@hp.com> <20070618232559.GF85884050@sgi.com> <467724E1.6050309@hp.com>
Reply-to: xfs-masters@xxxxxxxxxxx
Sender: xfs-masters-bounce@xxxxxxxxxxx
User-agent: Mutt/1.4.2.1i

On Mon, Jun 18, 2007 at 08:35:45PM -0400, Mark Seger wrote:
> Thanks for the reply.  I guess my general question is that if this is 
> indeed a memory issue, wouldn't you agree that it's a bug if the server 
> essentially becomes incapable of servicing data?

Yup.

> Maybe I wasn't clear 
> that as long as any of the clients are trying to do reads, the cpu 
> essentially locks up at 25% utilization across 4 cpus.  It's not until I 
> kill all the readers that the server returns to normal.

All the nfsds are reading a single directory, so they are all
contending on that directory's single sleeping lock (i_mutex).
Hence at most one CPU's worth of work gets done, no matter how
many nfsds are running.
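
If you want to confirm that, something like this should show most
of the nfsds sitting in D state on the same wait channel
(illustrative only - the wchan width is just a ps formatting
option):

# ps -eo pid,stat,wchan:30,comm | grep nfsd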

> >Sounds like you are running out of memory to cache the workload in.
> >The readdir load indicates that you are probably running out of 
> >dentry/inode
> >cache space, and so every lookup is having to re-read the inodes
> >from disk, i.e. disk reads for both readdir and stat are necessary.
> >  
> I hear what you're saying, but why then isn't the original stat slower?  

Because memory reclaim can effectively put random holes in the cache.
Hence the second read becomes a random I/O workload instead of a
more sequential workload where readahead can hide most latencies.
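
You should be able to see that in the I/O pattern with iostat -
small average request sizes and long waits on the data disks
during the slow phase point at random reads (sketch; 5 is just a
sample interval in seconds):

# iostat -x 5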

> After creating the 1M+ files I can umount/mount the file system or 
> simply reboot the server, assuring nothing is cached, and can either stat 
> or read all the files in about 15 minutes, so why would rereading inodes 
> from disk happen at such a slow rate?

Because reading into empty memory in a sequential manner is much
faster than filling random holes in an already full cache that
may be thrashing....

> >I'd suggest looking at /proc/slabinfo (slabtop helps here) and
> >/proc/meminfo to determine how much of your working set of inodes
> >are being held in cache and how quickly they are being recycled.
> >  
> one of the things I do monitor is memory and slab info and can even send 
> you a detailed trace on a per slab basis.  are there any specific slabs 
> I should be looking at?

# egrep '[xrdb][fanu][sdnf][i_f]' /proc/slabinfo
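
That pattern is just a quick hack to pull out the interesting
slabs - it matches things like xfs_inode, xfs_ili, radix_tree_node
and buffer_head. If you want to watch them (plus the dentry cache)
over time, something like this does the same job (sketch - slab
names vary a little between kernel versions, e.g. dentry vs
dentry_cache):

# watch -n 5 "egrep 'xfs_inode|xfs_ili|dentry|radix_tree_node|buffer_head' /proc/slabinfo"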

> >perhaps fiddling with /proc/sys/vm/vfs_cache_pressure will help
> >keep inodes/dentries in memory over page cache pages...
> >  
> any suggestions for settings?

Whatever is suggested in Documentation/filesystems/proc.txt for keeping
1.5-2x more dentries/inodes around under memory pressure.
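
The default is 100, and values below that bias reclaim towards
keeping dentries and inodes over page cache pages. As a starting
point (just a guess - tune from there):

# echo 50 > /proc/sys/vm/vfs_cache_pressure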

Also, when it is thrashing, can you try these combinations:

# echo 1 > /proc/sys/vm/drop_caches
# echo 2 > /proc/sys/vm/drop_caches
# echo 3 > /proc/sys/vm/drop_caches

And see if any of them improve the throughput....
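
(For reference: 1 drops the clean page cache, 2 drops reclaimable
slab objects like dentries and inodes, and 3 drops both.)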

Cheers,

Dave.
-- 
Dave Chinner
Principal Engineer
SGI Australian Software Group

