
To: David Chinner <dgc@xxxxxxx>
Subject: Re: deep chmod|chown -R begin to start OOMkiller
From: CHIKAMA masaki <masaki-c@xxxxxxxxxx>
Date: Mon, 12 Dec 2005 21:30:32 +0900
Cc: linux-xfs@xxxxxxxxxxx
In-reply-to: <20051212014633.GC19154461@melbourne.sgi.com>
References: <20051207183531.5c13e8c5.masaki-c@nict.go.jp> <20051208070841.GJ501696@melbourne.sgi.com> <20051209104148.346f2ff5.masaki-c@nict.go.jp> <20051212014633.GC19154461@melbourne.sgi.com>
Sender: linux-xfs-bounce@xxxxxxxxxxx
Hello.

On Mon, 12 Dec 2005 12:46:33 +1100
David Chinner <dgc@xxxxxxx> wrote:

> > Machine spec.
> > 
> > CPU : Pentium4 3.0G (512KB cache) HT enabled
> > MEM : 512MB (+ 1GB swap)
> > SCSI HA: Adaptec AHA-3960D
> > DISK: External RAID unit (10TB)
> > filesystem: xfs on lvm2
> 
> Large filesystem, comparatively little RAM to speak of.

Yes, I know that. ;-)

> > > > At that time, slabtop showed that the number of xfs_ili, xfs_inode, 
> > > > and linvfs_icache objects are becoming very large.
> 
> Looks to me like you haven't got enough memory to hold all the
> active log items when chmod -R runs and so you run out of memory
> before tail pushing occurs and the inode log items are released.
> 
> Because there is no memory available (all in slab and
> unreclaimable(?) page cache), XFS may not be able to flush and free
> the dirty inodes because it can require page cache allocation if the
> backing pages for the inode were reclaimed before the tail was
> pushed....

I don't think that is an acceptable explanation.
If I have a fast CPU, a filesystem size that is reasonable for the
amount of installed memory, and a slow disk, the system can still
easily eat up all of its memory.
That amounts to a local DoS.
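
For reference, this is roughly how I watch those slab caches grow while
the recursive chmod runs (just a sketch; the target path is only an
example, not my real directory tree):

  # watch the XFS-related slab caches mentioned above
  watch -n 5 'grep -E "xfs_ili|xfs_inode|linvfs_icache" /proc/slabinfo'

  # in another shell, start the deep recursive chmod (example path)
  chmod -R a+rX /raid/disk1/some/deep/tree

The object counts just keep climbing until the OOM killer starts.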


> There are two immediate solutions that I can see to your problem:
> 
>       1. Buy more RAM. If you can afford 10TB of disk, then you can
>          afford to buy at least a couple of GB of RAM to go with it.
> 
>       2. Remake your filesystem with a smaller log so that
>          it can't hold as many active items.

I think the second suggestion is questionable.
xfs_info says that the 10TB XFS filesystem's log size is
4096 bytes * 32768 blocks = 128MB:

meta-data=/raid/disk1            isize=256    agcount=32, agsize=85391104 blks
         =                       sectsz=512
data     =                       bsize=4096   blocks=2732515328, imaxpct=25
         =                       sunit=0      swidth=0 blks, unwritten=1
naming   =version 2              bsize=4096
log      =internal               bsize=4096   blocks=32768, version=1
         =                       sectsz=512   sunit=0 blks
realtime =none                   extsz=65536  blocks=0, rtextents=0

But another 200GB XFS filesystem's log size is 4096 bytes * 25600 blocks = 100MB:

meta-data=/raid/disk0            isize=256    agcount=16, agsize=3276800 blks
         =                       sectsz=512
data     =                       bsize=4096   blocks=52428800, imaxpct=25
         =                       sunit=0      swidth=0 blks, unwritten=1
naming   =version 2              bsize=4096
log      =internal               bsize=4096   blocks=25600, version=1
         =                       sectsz=512   sunit=0 blks
realtime =none                   extsz=65536  blocks=0, rtextents=0

32768 blocks is not that much bigger than 25600 blocks, so shrinking
the log does not look like it would make a significant difference.
Is my understanding correct?
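
If I did follow suggestion 2, I understand it would mean remaking the
filesystem with an explicitly smaller internal log, roughly like the
sketch below (the device path and the 64MB figure are only examples,
not a command I actually intend to run):

  # recreate the filesystem with a smaller internal log,
  # e.g. 64MB instead of the current 128MB
  mkfs.xfs -l size=64m /dev/vg_raid/disk1

As far as I know the -l size option is what controls the internal log
size at mkfs time.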


Thanks.
-- 
CHIKAMA Masaki @ NICT

