
Re: Excessive xfs_inode allocations trigger OOM killer

To: Dave Chinner <david@xxxxxxxxxxxxx>
Subject: Re: Excessive xfs_inode allocations trigger OOM killer
From: Florian Weimer <fw@xxxxxxxxxxxxx>
Date: Tue, 20 Sep 2016 22:56:31 +0200
Cc: xfs@xxxxxxxxxxx
Delivered-to: xfs@xxxxxxxxxxx
In-reply-to: <20160920203039.GI340@dastard> (Dave Chinner's message of "Wed, 21 Sep 2016 06:30:39 +1000")
References: <87a8f2pd2d.fsf@xxxxxxxxxxxxxxxxx> <20160920203039.GI340@dastard>
* Dave Chinner:

>>   OBJS ACTIVE  USE OBJ SIZE  SLABS OBJ/SLAB CACHE SIZE NAME 
>> 4121208 4121177  99%    0.88K 1030302        4   4121208K xfs_inode
>> 986286 985229  99%    0.19K  46966       21    187864K dentry
>> 723255 723076  99%    0.10K  18545       39     74180K buffer_head
>> 270263 269251  99%    0.56K  38609        7    154436K radix_tree_node
>> 140310  67409  48%    0.38K  14031       10     56124K mnt_cache
>
> That's not odd at all. It means your workload is visiting millions
> of inodes in your filesystem between serious memory pressure events.

Okay.
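
(For anyone wanting to reproduce the numbers quoted above: a one-shot
slabtop sorted by cache size gives the same view; exact options
assumed from memory.)

  slabtop -o -s c | head -n 12    # -o: print once and exit, -s c: sort by cache size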

>> (I have attached the /proc/meminfo contents in case it offers further
>> clues.)
>> 
>> Confronted with large memory allocations (from "make -j12" and
>> compiling GCC, so perhaps ~8 GiB of memory), the OOM killer kicks in
>> and kills some random process.  I would have expected that some
>> xfs_inodes would be freed instead.
>
> The oom killer is unreliable and often behaves very badly, and
> that's typically not an XFS problem.
>
> What is the full output of the oom killer invocations from dmesg?

I've attached the dmesg output (two events).

>> The last time I saw
>> something like the slabtop output above, I could do "sysctl
>> vm.drop_caches = 3", and the amount of allocated memory reported by
>> slabtop was reduced considerably.  (I have not checked whether the
>> memory was actually returned to the system.)  I have not done this
>> now, so that I can gather further data for debugging.
>
> How long did the sysctl take to run to free those inodes? A few
> seconds, or minutes?

Seconds.  Very few of them.  Definitely not minutes.
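
(In case the exact procedure matters, what I did last time was roughly
the following; the grep is only there to watch the xfs_inode line
shrink, and the sync is my own habit rather than part of the problem:)

  sync                             # flush dirty data first
  sysctl vm.drop_caches=3          # drop page cache plus dentries and inodes (needs root)
  grep xfs_inode /proc/slabinfo    # re-check the slab count afterwards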

>> I am not sure what
>> triggers this huge allocation.  It could be related to my Gnus mail
>> spool (which contains lots and lots of small files).
>
> OK - does that regularly dirty lots of those small files?

I use relatime for the file system, and the atime of old mail does not
show any recent access (certainly not since the last boot).
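
(As a spot check, something like this on an old message shows an
Access time well before the last boot; the path is just an example
from my spool, not a real one:)

  stat ~/Mail/archive/12345        # compare the Access and Modify timestamps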

During normal operation, Gnus doesn't even read individual files or
list mail spool directories (which contain one message per file,
similar to Maildir).  I don't do much else on the box, and the GCC
compilation I started only involves 300,000 files or so.

> What sort of storage are you using, and what fs config?

It's a Samsung SATA SSD (MZ7WD960HAGP), supposedly enterprise-grade.

Mount flags:

/dev/sda1 on / type xfs (rw,relatime,attr2,inode64,noquota)

xfs_info:

meta-data=/dev/sda1              isize=256    agcount=4, agsize=57690368 blks
         =                       sectsz=512   attr=2, projid32bit=0
         =                       crc=0        finobt=0
data     =                       bsize=4096   blocks=230761472, imaxpct=25
         =                       sunit=0      swidth=0 blks
naming   =version 2              bsize=4096   ascii-ci=0 ftype=0
log      =internal               bsize=4096   blocks=112676, version=2
         =                       sectsz=512   sunit=0 blks, lazy-count=1
realtime =none                   extsz=4096   blocks=0, rtextents=0


> http://xfs.org/index.php/XFS_FAQ#Q:_What_information_should_I_include_when_reporting_a_problem.3F

Core i7 X980 CPU, 6 cores, 12 threads, hyperthreading enabled.

What is BBWC?

>> Dirty:                28 kB
>> Writeback:             0 kB
>
> There's no dirty data, and dropping caches makes progress, so this
> doesn't /sound/ like reclaim is getting stalled by dirty object
> writeback. More info needed.

Sorry, this is all new to me; I don't really know where to look.
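
(If it helps, I can keep sampling the dirty/writeback numbers
alongside slabtop; the values I quoted came straight out of
/proc/meminfo, along these lines:)

  grep -E '^(Dirty|Writeback):' /proc/meminfo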

Attachment: oom.txt
Description: OOM output
