xfs_iext_realloc_indirect and "XFS: possible memory allocation deadlock"

Alex Lyakas alex at zadarastorage.com
Sat Jun 27 16:01:30 CDT 2015


Hi Dave,

First, to answer your questions:
- this filesystem is accessed only through NFS, so I guess we can say this 
is a "dedicated" NFS server
- yes, both NFSv3 and NFSv4 are used (by different servers), and definitely 
they may attempt to access the same file

I did some further debugging by patching XFS to get some more info on how 
much memory is requested, and who requests it. I added a kmem_realloc 
variant that is called only by xfs_iext_realloc_indirect. This new function 
(kmem_realloc_xfs_iext_realloc_indirect) also prints the inode number, which 
needs the memory. This is after some fiddling to check that "ifp" is the one 
embedded in xfs_inode and not the one allocated through kmem_cache (attr 
fork). After memory allocation fails for the first time, I also call 
handle_sysrq('m'). Finally, I print after how many retries the allocation 
succeeded. I also added some info to the normal kmem_alloc function.

Results are following:
- memory allocation failures happened only on the 
kmem_realloc_xfs_iext_realloc_indirect path for now
- XFS hits memory re-allocation failures when it needs to allocate about 
35KB. Sometimes allocation succeeds after few retries, but sometimes it 
takes several thousands of retries.
- All allocation failures happened on NFSv3 paths
- Three inode numbers were reported as failing memory allocations. After 
several hours, "find -inum" is still searching for these inodes...this is a 
huge filesystem... Is there any other quicker (XFS-specific?) way to find 
the file based on inode number?

Please see a typical allocation failure in [1]

Any recommendation how to move forward with this issue?

Additional observation that I saw in my local system: writing files to XFS 
locally vs writing the same files via NFS (both 3 and 4), the amount of 
extents reported by "xfs_bmap" is much higher for the NFS case. For example, 
creating a new file and writing into it as follows:
- write 4KB
- skip 4KB (i.e., lseek to 4KB + 4KB)
- write 4KB
- skip 4KB
...
Create a file of say 50MB this way.

Locally it ends up with very few (1-5) extents. But same exact workload 
through NFS results in several thousands of extents. The filesystem is 
mounted as "sync" in both cases. This is currently just an observation, 
which I will try to debug further. Are you perhaps familiar with such 
behavior? Can you perhaps try and check whether it also happens in your 
environment?

Thanks for your help,
Alex.



[1]
Jun 26 14:19:08 vsa-00000003-vc-1 kernel: [91809.497033] XFS: pid=12642 
kmem_alloc failure inum=35646862875 size=35584 flags=0x4 lflags=0x250
Jun 26 14:19:08 vsa-00000003-vc-1 kernel: [91809.498407] Pid: 12642, comm: 
nfsd Tainted: GF          O 3.8.13-030813-generic #201305111843
Jun 26 14:19:08 vsa-00000003-vc-1 kernel: [91809.498410] Call Trace:
Jun 26 14:19:08 vsa-00000003-vc-1 kernel: [91809.498450] 
[<ffffffffa07a832e>] kmem_alloc_xfs_iext_realloc_indirect+0x16e/0x1f0 [xfs]
Jun 26 14:19:08 vsa-00000003-vc-1 kernel: [91809.498465] 
[<ffffffffa07a86b3>] kmem_realloc_xfs_iext_realloc_indirect+0x33/0x80 [xfs]
Jun 26 14:19:08 vsa-00000003-vc-1 kernel: [91809.498482] 
[<ffffffffa07db2a0>] xfs_iext_realloc_indirect+0x40/0x60 [xfs]
Jun 26 14:19:08 vsa-00000003-vc-1 kernel: [91809.498498] 
[<ffffffffa07db2ff>] xfs_iext_irec_new+0x3f/0x180 [xfs]
Jun 26 14:19:08 vsa-00000003-vc-1 kernel: [91809.498516] 
[<ffffffffa07db5b9>] xfs_iext_add_indirect_multi+0x179/0x2b0 [xfs]
Jun 26 14:19:08 vsa-00000003-vc-1 kernel: [91809.498532] 
[<ffffffffa07db9be>] xfs_iext_add+0xce/0x290 [xfs]
Jun 26 14:19:08 vsa-00000003-vc-1 kernel: [91809.498539] 
[<ffffffff81076634>] ? wake_up_worker+0x24/0x30
Jun 26 14:19:08 vsa-00000003-vc-1 kernel: [91809.498553] 
[<ffffffffa07dbbd2>] xfs_iext_insert+0x52/0x100 [xfs]
Jun 26 14:19:08 vsa-00000003-vc-1 kernel: [91809.498569] 
[<ffffffffa07b6ed3>] ? xfs_bmap_add_extent_hole_delay+0xd3/0x6a0 [xfs]
Jun 26 14:19:08 vsa-00000003-vc-1 kernel: [91809.498583] 
[<ffffffffa07b6ed3>] xfs_bmap_add_extent_hole_delay+0xd3/0x6a0 [xfs]
Jun 26 14:19:08 vsa-00000003-vc-1 kernel: [91809.498598] 
[<ffffffffa07da3fc>] ? xfs_iext_bno_to_ext+0x8c/0x160 [xfs]
Jun 26 14:19:08 vsa-00000003-vc-1 kernel: [91809.498614] 
[<ffffffffa07b7719>] xfs_bmapi_reserve_delalloc+0x279/0x2a0 [xfs]
Jun 26 14:19:08 vsa-00000003-vc-1 kernel: [91809.498628] 
[<ffffffffa07be742>] xfs_bmapi_delay+0x122/0x270 [xfs]
Jun 26 14:19:08 vsa-00000003-vc-1 kernel: [91809.498642] 
[<ffffffffa079d703>] xfs_iomap_write_delay+0x173/0x320 [xfs]
Jun 26 14:19:08 vsa-00000003-vc-1 kernel: [91809.498656] 
[<ffffffffa07be42c>] ? xfs_bmapi_read+0xfc/0x2f0 [xfs]
Jun 26 14:19:08 vsa-00000003-vc-1 kernel: [91809.498660] 
[<ffffffff8135d8f3>] ? call_rwsem_down_write_failed+0x13/0x20
Jun 26 14:19:08 vsa-00000003-vc-1 kernel: [91809.498665] 
[<ffffffff81195bbc>] ? lookup_page_cgroup+0x4c/0x50
Jun 26 14:19:08 vsa-00000003-vc-1 kernel: [91809.498677] 
[<ffffffffa078ab40>] __xfs_get_blocks+0x280/0x550 [xfs]
Jun 26 14:19:08 vsa-00000003-vc-1 kernel: [91809.498688] 
[<ffffffffa078ae41>] xfs_get_blocks+0x11/0x20 [xfs]
Jun 26 14:19:08 vsa-00000003-vc-1 kernel: [91809.498695] 
[<ffffffff811cf77e>] __block_write_begin+0x1ae/0x4e0
Jun 26 14:19:08 vsa-00000003-vc-1 kernel: [91809.498705] 
[<ffffffffa078ae30>] ? xfs_get_blocks_direct+0x20/0x20 [xfs]
Jun 26 14:19:08 vsa-00000003-vc-1 kernel: [91809.498708] 
[<ffffffff81135fff>] ? grab_cache_page_write_begin+0x8f/0xf0
Jun 26 14:19:08 vsa-00000003-vc-1 kernel: [91809.498720] 
[<ffffffffa078a09f>] xfs_vm_write_begin+0x5f/0xe0 [xfs]
Jun 26 14:19:08 vsa-00000003-vc-1 kernel: [91809.498725] 
[<ffffffff8113552a>] generic_perform_write+0xca/0x210
Jun 26 14:19:08 vsa-00000003-vc-1 kernel: [91809.498729] 
[<ffffffff811356cd>] generic_file_buffered_write+0x5d/0x90
Jun 26 14:19:08 vsa-00000003-vc-1 kernel: [91809.498740] 
[<ffffffffa07952d5>] xfs_file_buffered_aio_write+0x115/0x1c0 [xfs]
Jun 26 14:19:08 vsa-00000003-vc-1 kernel: [91809.498744] 
[<ffffffff816159f4>] ? ip_finish_output+0x224/0x3b0
Jun 26 14:19:08 vsa-00000003-vc-1 kernel: [91809.498758] 
[<ffffffffa079547c>] xfs_file_aio_write+0xfc/0x1b0 [xfs]
Jun 26 14:19:08 vsa-00000003-vc-1 kernel: [91809.498769] 
[<ffffffffa0795380>] ? xfs_file_buffered_aio_write+0x1c0/0x1c0 [xfs]
Jun 26 14:19:08 vsa-00000003-vc-1 kernel: [91809.498773] 
[<ffffffff8119b8c3>] do_sync_readv_writev+0xa3/0xe0
Jun 26 14:19:08 vsa-00000003-vc-1 kernel: [91809.498776] 
[<ffffffff8119bb8d>] do_readv_writev+0xcd/0x1d0
Jun 26 14:19:08 vsa-00000003-vc-1 kernel: [91809.498780] 
[<ffffffff810877e0>] ? set_groups+0x40/0x60
Jun 26 14:19:08 vsa-00000003-vc-1 kernel: [91809.498805] 
[<ffffffffa033a6b0>] ? nfsd_setuser+0x120/0x2b0 [nfsd]
Jun 26 14:19:08 vsa-00000003-vc-1 kernel: [91809.498808] 
[<ffffffff8119bccc>] vfs_writev+0x3c/0x50
Jun 26 14:19:08 vsa-00000003-vc-1 kernel: [91809.498815] 
[<ffffffffa0333dd2>] nfsd_vfs_write.isra.12+0x92/0x350 [nfsd]
Jun 26 14:19:08 vsa-00000003-vc-1 kernel: [91809.498818] 
[<ffffffff8119a6cb>] ? dentry_open+0x6b/0xd0
Jun 26 14:19:08 vsa-00000003-vc-1 kernel: [91809.498825] 
[<ffffffffa0336679>] nfsd_write+0xf9/0x110 [nfsd]
Jun 26 14:19:08 vsa-00000003-vc-1 kernel: [91809.498834] 
[<ffffffffa0340dd1>] nfsd3_proc_write+0xb1/0x140 [nfsd]
Jun 26 14:19:08 vsa-00000003-vc-1 kernel: [91809.498841] 
[<ffffffffa032fd62>] nfsd_dispatch+0x102/0x270 [nfsd]
Jun 26 14:19:08 vsa-00000003-vc-1 kernel: [91809.498865] 
[<ffffffffa0103b48>] svc_process_common+0x328/0x5e0 [sunrpc]
Jun 26 14:19:08 vsa-00000003-vc-1 kernel: [91809.498876] 
[<ffffffffa0104153>] svc_process+0x103/0x160 [sunrpc]
Jun 26 14:19:08 vsa-00000003-vc-1 kernel: [91809.498882] 
[<ffffffffa032f72f>] nfsd+0xbf/0x130 [nfsd]
Jun 26 14:19:08 vsa-00000003-vc-1 kernel: [91809.498888] 
[<ffffffffa032f670>] ? nfsd_destroy+0x80/0x80 [nfsd]
Jun 26 14:19:08 vsa-00000003-vc-1 kernel: [91809.498892] 
[<ffffffff8107f050>] kthread+0xc0/0xd0
Jun 26 14:19:08 vsa-00000003-vc-1 kernel: [91809.498895] 
[<ffffffff8107ef90>] ? flush_kthread_worker+0xb0/0xb0
Jun 26 14:19:08 vsa-00000003-vc-1 kernel: [91809.498900] 
[<ffffffff816f61ec>] ret_from_fork+0x7c/0xb0
Jun 26 14:19:08 vsa-00000003-vc-1 kernel: [91809.498903] 
[<ffffffff8107ef90>] ? flush_kthread_worker+0xb0/0xb0
Jun 26 14:19:08 vsa-00000003-vc-1 kernel: [91809.498905] SysRq : Show Memory
Jun 26 14:19:08 vsa-00000003-vc-1 kernel: [91809.499385] Mem-Info:
Jun 26 14:19:08 vsa-00000003-vc-1 kernel: [91809.499388] Node 0 DMA per-cpu:
Jun 26 14:19:08 vsa-00000003-vc-1 kernel: [91809.499391] CPU    0: hi:    0, 
btch:   1 usd:   0
Jun 26 14:19:08 vsa-00000003-vc-1 kernel: [91809.499392] CPU    1: hi:    0, 
btch:   1 usd:   0
Jun 26 14:19:08 vsa-00000003-vc-1 kernel: [91809.499394] CPU    2: hi:    0, 
btch:   1 usd:   0
Jun 26 14:19:08 vsa-00000003-vc-1 kernel: [91809.499395] CPU    3: hi:    0, 
btch:   1 usd:   0
Jun 26 14:19:08 vsa-00000003-vc-1 kernel: [91809.499396] Node 0 DMA32 
per-cpu:
Jun 26 14:19:08 vsa-00000003-vc-1 kernel: [91809.499399] CPU    0: hi:  186, 
btch:  31 usd:   0
Jun 26 14:19:08 vsa-00000003-vc-1 kernel: [91809.499400] CPU    1: hi:  186, 
btch:  31 usd:   0
Jun 26 14:19:08 vsa-00000003-vc-1 kernel: [91809.499402] CPU    2: hi:  186, 
btch:  31 usd:   0
Jun 26 14:19:08 vsa-00000003-vc-1 kernel: [91809.499403] CPU    3: hi:  186, 
btch:  31 usd:   0
Jun 26 14:19:08 vsa-00000003-vc-1 kernel: [91809.499404] Node 0 Normal 
per-cpu:
Jun 26 14:19:08 vsa-00000003-vc-1 kernel: [91809.499406] CPU    0: hi:  186, 
btch:  31 usd:  23
Jun 26 14:19:08 vsa-00000003-vc-1 kernel: [91809.499407] CPU    1: hi:  186, 
btch:  31 usd:  25
Jun 26 14:19:08 vsa-00000003-vc-1 kernel: [91809.499409] CPU    2: hi:  186, 
btch:  31 usd:   0
Jun 26 14:19:08 vsa-00000003-vc-1 kernel: [91809.499410] CPU    3: hi:  186, 
btch:  31 usd:   7
Jun 26 14:19:08 vsa-00000003-vc-1 kernel: [91809.499415] active_anon:2143 
inactive_anon:44181 isolated_anon:0
Jun 26 14:19:08 vsa-00000003-vc-1 kernel: [91809.499415]  active_file:913373 
inactive_file:1464930 isolated_file:0
Jun 26 14:19:08 vsa-00000003-vc-1 kernel: [91809.499415]  unevictable:8449 
dirty:6742 writeback:115 unstable:0
Jun 26 14:19:08 vsa-00000003-vc-1 kernel: [91809.499415]  free:159418 
slab_reclaimable:146857 slab_unreclaimable:66681
Jun 26 14:19:08 vsa-00000003-vc-1 kernel: [91809.499415]  mapped:5561 
shmem:383 pagetables:2195 bounce:0
Jun 26 14:19:08 vsa-00000003-vc-1 kernel: [91809.499415]  free_cma:0
Jun 26 14:19:08 vsa-00000003-vc-1 kernel: [91809.499418] Node 0 DMA 
free:15884kB min:84kB low:104kB high:124kB active_anon:0kB inactive_anon:0kB 
active_file:0kB inactive_file:0kB unevictable:0kB isolated(anon):0kB 
isolated(file):0kB present:15628kB managed:15884kB mlocked:0kB dirty:0kB 
writeback:0kB mapped:0kB shmem:0kB slab_reclaimable:0kB 
slab_unreclaimable:0kB kernel_stack:0kB pagetables:0kB unstable:0kB 
bounce:0kB free_cma:0kB writeback_tmp:0kB pages_scanned:0 all_unreclaimable? 
yes
Jun 26 14:19:08 vsa-00000003-vc-1 kernel: [91809.499423] lowmem_reserve[]: 0 
3512 12080 12080
Jun 26 14:19:08 vsa-00000003-vc-1 kernel: [91809.499427] Node 0 DMA32 
free:240040kB min:19624kB low:24528kB high:29436kB active_anon:8kB
inactive_anon:15196kB active_file:1503256kB inactive_file:1624328kB 
unevictable:0kB isolated(anon):0kB isolated(file):0kB present:3596532kB
managed:3546428kB mlocked:0kB dirty:14652kB writeback:0kB mapped:988kB 
shmem:0kB slab_reclaimable:173044kB slab_unreclaimable:15988kB
kernel_stack:248kB pagetables:28kB unstable:0kB bounce:0kB free_cma:0kB 
writeback_tmp:0kB pages_scanned:0 all_unreclaimable? no
Jun 26 14:19:08 vsa-00000003-vc-1 kernel: [91809.499432] lowmem_reserve[]: 0 
0 8568 8568
Jun 26 14:19:08 vsa-00000003-vc-1 kernel: [91809.499435] Node 0 Normal 
free:381748kB min:47872kB low:59840kB high:71808kB active_anon:8564kB
inactive_anon:161528kB active_file:2150236kB inactive_file:4235392kB 
unevictable:33796kB isolated(anon):0kB isolated(file):0kB present:8773632kB
managed:8715068kB mlocked:33796kB dirty:12316kB writeback:460kB 
mapped:21256kB shmem:1532kB slab_reclaimable:414384kB
slab_unreclaimable:250736kB kernel_stack:8424kB pagetables:8752kB 
unstable:0kB bounce:0kB free_cma:0kB writeback_tmp:0kB pages_scanned:0
all_unreclaimable? no
Jun 26 14:19:08 vsa-00000003-vc-1 kernel: [91809.499443] lowmem_reserve[]: 0 
0 0 0
Jun 26 14:19:08 vsa-00000003-vc-1 kernel: [91809.499446] Node 0 DMA: 1*4kB 
(U) 1*8kB (U) 0*16kB 0*32kB 2*64kB (U) 1*128kB (U) 1*256kB (U) 0*512kB 
1*1024kB (U) 1*2048kB (R) 3*4096kB (M) = 15884kB
Jun 26 14:19:08 vsa-00000003-vc-1 kernel: [91809.499458] Node 0 DMA32: 
27954*4kB (UEM) 12312*8kB (UEM) 1713*16kB (UEM) 17*32kB (UMR) 2*64kB (R) 
2*128kB (R) 1*256kB (R) 1*512kB (R) 1*1024kB (R) 0*2048kB 0*4096kB = 
240440kB
Jun 26 14:19:08 vsa-00000003-vc-1 kernel: [91809.499469] Node 0 Normal: 
31971*4kB (UEM) 31254*8kB (UEM) 2*16kB (EM) 5*32kB (R) 4*64kB (R) 2*128kB 
(R) 0*256kB 0*512kB 1*1024kB (R) 1*2048kB (R) 0*4096kB = 381692kB
Jun 26 14:19:08 vsa-00000003-vc-1 kernel: [91809.499480] 2380174 total 
pagecache pages
Jun 26 14:19:08 vsa-00000003-vc-1 kernel: [91809.499482] 0 pages in swap 
cache
Jun 26 14:19:08 vsa-00000003-vc-1 kernel: [91809.499483] Swap cache stats: 
add 0, delete 0, find 0/0
Jun 26 14:19:08 vsa-00000003-vc-1 kernel: [91809.499485] Free swap  = 
522236kB
Jun 26 14:19:08 vsa-00000003-vc-1 kernel: [91809.499486] Total swap 
=522236kB
Jun 26 14:19:08 vsa-00000003-vc-1 kernel: [91809.528135] 3145712 pages RAM
Jun 26 14:19:08 vsa-00000003-vc-1 kernel: [91809.528135] 69093 pages 
reserved
Jun 26 14:19:08 vsa-00000003-vc-1 kernel: [91809.528135] 3401898 pages 
shared
Jun 26 14:19:08 vsa-00000003-vc-1 kernel: [91809.528135] 1067069 pages 
non-shared
...
34501    Jun 26 14:20:03 vsa-00000003-vc-1 kernel: [91864.156051] XFS: 
pid=12642 kmem_alloc success after 13443 retries inum=35646862875 size=35584 
flags=0x4 lflags=0x250

-----Original Message----- 
From: Dave Chinner
Sent: 23 June, 2015 10:18 PM
To: Alex Lyakas
Cc: xfs at oss.sgi.com ; hch at lst.de
Subject: Re: xfs_iext_realloc_indirect and "XFS: possible memory allocation
deadlock"

On Tue, Jun 23, 2015 at 10:18:21AM +0200, Alex Lyakas wrote:
> Greetings,
>
> We are hitting an issue with XFS printing messages like “XFS:
> possible memory allocation deadlock in kmem_alloc
> (mode:0x250)” and stack trace like in [1]. Eventually,
> hung-task panic kicks in with stack traces like [2].
>
> We are running kernel 3.8.13. I see that in
> http://oss.sgi.com/archives/xfs/2012-01/msg00341.html a similar
> issue has been discussed, but no code changes followed comparing
> to what we have in 3.8.13.
>
> Any suggestion on how to move forward with this problem? For
> example, does this memory has to be really allocated with kmalloc
> (i.e., physically continuous) or vmalloc can be used?

We left it alone because it is relatively rare for people to hit it,
and generally it indicates a severe fragmentation problem when they
do hit it (i.e. a file with millions of extents in it). Can you
track down the file that this is occurring against and see how many
extents it has?

i.e. you may be much better off by taking measures to avoid excessive
fragmentation than removing the canary from the mine...

> [109626.075483] nfsd            D 0000000000000002     0 20042      2
> 0x00000000

Hmmm - it's also a file written by the NFS server - this is is on an
a dedicated NFS server?

> [109626.075483]  [<ffffffffa01be6b0>] ? nfsd_setuser+0x120/0x2b0 [nfsd]
> [109626.075483]  [<ffffffff8119bccc>] ? vfs_writev+0x3c/0x50
> [109626.075483]  [<ffffffffa01b7dd2>] ? nfsd_vfs_write.isra.12+0x92/0x350
> [nfsd]
> [109626.075483]  [<ffffffff8119a6cb>] ? dentry_open+0x6b/0xd0
> [109626.075483]  [<ffffffffa01ba679>] ? nfsd_write+0xf9/0x110 [nfsd]
> [109626.075483]  [<ffffffffa01c4dd1>] ? nfsd3_proc_write+0xb1/0x140 [nfsd]

Interesting that this is an NFSv3 write...

> [87303.976119] INFO: task nfsd:5684 blocked for more than 180 seconds.
> [87303.976976] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables
> this message.
> [87303.978012] nfsd            D 0000000000000003     0  5684      2
> 0x00000000
....
> [87303.978174]  [<ffffffffa0269623>] nfsd_write+0xa3/0x110 [nfsd]
> [87303.978182]  [<ffffffffa027794c>] nfsd4_write+0x1cc/0x250 [nfsd]
> [87303.978189]  [<ffffffffa027746c>] nfsd4_proc_compound+0x5ac/0x7a0
> [nfsd]

And that is a NFsv4 write. You have multiple clients writing to the
same file using different versions of the NFS protocol?

Cheers,

Dave.
-- 
Dave Chinner
david at fromorbit.com 



More information about the xfs mailing list