Hello.
On Mon, 12 Dec 2005 12:46:33 +1100
David Chinner <dgc@xxxxxxx> wrote:
> > Machine spec.
> >
> > CPU : Pentium4 3.0G (512KB cache) HT enabled
> > MEM : 512MB (+ 1GB swap)
> > SCSI HA: Adaptec AHA-3960D
> > DISK: External RAID unit (10TB)
> > filesystem: xfs on lvm2
>
> Large filesystem, comparatively little RAM to speak of.
Yes, I know that. ;-)
> > > > At that time, slabtop showed that the number of xfs_ili, xfs_inode,
> > > > and linvfs_icache objects are becoming very large.
>
> Looks to me like you haven't got enough memory to hold all the
> active log items when chmod -R runs and so you run out of memory
> before tail pushing occurs and the inode log items are released.
>
> Because there is no memory available (all in slab and
> unreclaimable(?) page cache), XFS may not be able to flush and free
> the dirty inodes because it can require page cache allocation if the
> backing pages for the inode were reclaimed before the tail was
> pushed....
I do not think this is an acceptable reason.
Even with a filesystem size that is reasonable for the installed memory,
a fast CPU combined with a slow disk lets the system easily eat up all memory.
This amounts to a local DoS.
> There are two immediate solutions that I can see to your problem:
>
> 1. Buy more RAM. If you can afford 10TB of disk, then you can
> afford to buy at least a couple of GB of RAM to go with it.
>
> 2. Remake your filesystem with a smaller log so that
> it can't hold as many active items.
I think the second option is questionable.
According to xfs_info, the log size of the 10TB XFS filesystem is 4096 * 32768 bytes.
meta-data=/raid/disk1            isize=256    agcount=32, agsize=85391104 blks
         =                       sectsz=512
data     =                       bsize=4096   blocks=2732515328, imaxpct=25
         =                       sunit=0      swidth=0 blks, unwritten=1
naming   =version 2              bsize=4096
log      =internal               bsize=4096   blocks=32768, version=1
         =                       sectsz=512   sunit=0 blks
realtime =none                   extsz=65536  blocks=0, rtextents=0
But the log size of another, 200GB XFS filesystem is 4096 * 25600 bytes.
meta-data=/raid/disk0            isize=256    agcount=16, agsize=3276800 blks
         =                       sectsz=512
data     =                       bsize=4096   blocks=52428800, imaxpct=25
         =                       sunit=0      swidth=0 blks, unwritten=1
naming   =version 2              bsize=4096
log      =internal               bsize=4096   blocks=25600, version=1
         =                       sectsz=512   sunit=0 blks
realtime =none                   extsz=65536  blocks=0, rtextents=0
A log of 32768 blocks is only slightly larger than one of 25600 blocks, so the
log on the 10TB filesystem does not look unusually large to me.
Is my understanding correct?
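For reference, here is the arithmetic behind that comparison as a small Python sketch. It uses only the bsize and blocks values from the two xfs_info outputs above; the helper name size_bytes is made up for illustration and is not part of any XFS tool.

```python
# Values copied from the xfs_info output for /raid/disk1 and /raid/disk0.
BSIZE = 4096                      # block size in bytes (same on both filesystems)
MIB = 1024 * 1024

def size_bytes(blocks, bsize=BSIZE):
    """Hypothetical helper: size in bytes of a region of `blocks` blocks."""
    return blocks * bsize

log_10tb = size_bytes(32768)      # log of the 10TB filesystem
log_200gb = size_bytes(25600)     # log of the 200GB filesystem

data_10tb = size_bytes(2732515328)
data_200gb = size_bytes(52428800)

print("log  10TB fs :", log_10tb // MIB, "MiB")
print("log 200GB fs :", log_200gb // MIB, "MiB")
print("log ratio    :", log_10tb / log_200gb)
print("data ratio   :", round(data_10tb / data_200gb, 2))
```

So the two logs are 128 MiB and 100 MiB, a ratio of 1.28, while the data area of the 10TB filesystem is roughly 52 times larger than that of the 200GB one.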
Thanks.
--
CHIKAMA Masaki @ NICT