On Mon, Dec 12, 2005 at 09:30:32PM +0900, CHIKAMA masaki wrote:
> > > > > At that time, slabtop showed that the number of xfs_ili, xfs_inode,
> > > > > and linvfs_icache objects are becoming very large.
> > Looks to me like you haven't got enough memory to hold all the
> > active log items when chmod -R runs and so you run out of memory
> > before tail pushing occurs and the inode log items are released.
> > Because there is no memory available (all in slab and
> > unreclaimable(?) page cache), XFS may not be able to flush and free
> > the dirty inodes because it can require page cache allocation if the
> > backing pages for the inode were reclaimed before the tail was
> > pushed....
> I think this is not an acceptable reason.
> If I have a fast CPU, reasonable filesystem size to equipped memory
> and slow disk, then system can easily eat up all memory.
> This leads to local DoS.
Well, no. We'd have lots of reports of this problem if that
was the case.
You need a fast disk to enable the page cache to eat itself - a slow
disk can't bring in enough data to turn the page cache over fast
enough to cause this situation.
That's the reason we have never seen this before - not very many
people decide to put 10TB of fast disk behind a machine with very
> > There are two immediate solutions that I can see to your problem:
> > 1. Buy more RAM. If you can afford 10TB of disk, then you can
> > afford to buy at least a couple of GB of RAM to go with it.
> > 2. Remake your filesystem with a smaller log so that
> > it can't hold as many active items.
> I think the 2nd is questionable.
> The xfs_info said that the 10TB xfs filesystem's log size is = 4096 * 32768.
> But another 200GB xfs filesystem's logs size is = 4096 * 25600.
Yes, that is correct.
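For reference, those xfs_info figures are block size * number of log
blocks, so the two logs work out to 128MB and 100MB. A quick shell
sketch of the arithmetic (log_mib is only an illustrative helper name,
not an XFS tool):

```shell
# Log size in MiB = block size (bytes) * log blocks / 2^20.
# log_mib is a hypothetical helper, named here for illustration only.
log_mib() {
    echo $(( $1 * $2 / 1024 / 1024 ))
}

log_mib 4096 32768   # 10TB filesystem, values from xfs_info above
log_mib 4096 25600   # 200GB filesystem, values from xfs_info above
```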
If you read the mkfs.xfs man page, you'll see that it says that the size of
the log is scaled with fs size and reaches its maximum size at 1TB. So at
200GB, the log is still pretty large. Using:
mkfs.xfs -l size=64m <other options> <dev>
will give you a 64MB log in your 10TB filesystem rather than the default
of 128MB. That is what I meant when I said remake your filesystem with
a smaller log - I should have pointed out how to do that with the above
R&D Software Engineer
SGI Australian Software Group