xfs_iread_extents high order allocation failure (with fix idea)

To: linux-xfs@xxxxxxxxxxx
Subject: xfs_iread_extents high order allocation failure (with fix idea)
From: Neil Bortnak <nbortnak@xxxxxxxxx>
Date: Mon, 25 Oct 2004 23:31:26 +0900
Sender: linux-xfs-bounce@xxxxxxxxxxx
User-agent: Mozilla Thunderbird 0.8 (X11/20040913)
Hi everyone,

First off, thanks for a great filesystem. I've been using it for years without a
hiccup.

Currently, I have a program that does a lot of simultaneous random accesses to
a lot of big files on a fairly large LVM volume (1.1T). I think I have traced
the problem to a number of high-order (order 5) allocations by
xfs_iread_extents. Eventually (due to memory fragmentation, I guess) the system
is unable to satisfy the request. After several failed attempts the OOM killer
steps in and kills the process.
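
To put numbers on the fragmentation guess, here is a small userspace sketch (not
kernel code). It assumes 4kB pages, as on this i686 box, and copies the free
block counts from the "Normal:" line of the OOM report at the end of this mail.
The function names and the array are made up for illustration.

```c
#include <stddef.h>

#define MODEL_PAGE_BYTES 4096UL  /* assumption: 4kB pages, matching the log units */
#define MODEL_NR_ORDERS  11      /* the log lists orders 0..10 (4kB..4096kB) */

/* Size in bytes of one order-n block: 2^n physically contiguous pages. */
unsigned long order_to_bytes(unsigned int order)
{
    return MODEL_PAGE_BYTES << order;
}

/* Total free memory in kB represented by a per-order free block count. */
unsigned long zone_free_kb(const unsigned long blocks[MODEL_NR_ORDERS])
{
    unsigned long kb = 0;
    for (unsigned int o = 0; o < MODEL_NR_ORDERS; o++)
        kb += blocks[o] * (order_to_bytes(o) / 1024UL);
    return kb;
}

/* Returns 1 if a free block of at least the requested order exists, else 0.
 * This ignores watermarks and zone fallback; it only looks at block sizes. */
int can_satisfy(const unsigned long blocks[MODEL_NR_ORDERS], unsigned int order)
{
    for (unsigned int o = order; o < MODEL_NR_ORDERS; o++)
        if (blocks[o] > 0)
            return 1;
    return 0;
}

/* Free block counts per order, copied from the "Normal:" line of the report. */
const unsigned long normal_zone_free[MODEL_NR_ORDERS] = {
    39944, 18555, 4213, 336, 13, 0, 0, 0, 0, 0, 0
};
```

An order 5 request is 32 pages, or 128kB, in one contiguous piece. With these
counts the Normal zone adds up to 387208kB free, yet its largest free block is
64kB (order 4), so can_satisfy(normal_zone_free, 5) returns 0 even though there
is plenty of free memory overall.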

This appears to be the same bug as the one in June at:
http://oss.sgi.com/archives/linux-xfs/2004-06/msg00079.html
There is a patch included, but Nathan Scott said that Christoph wasn't entirely
happy with it. I had a possible idea for a fix...

Since physically contiguous pages are only strictly needed by the hardware
(otherwise, as I understand it, contiguity is just a performance issue), and
XFS sits a little higher up than that, there shouldn't technically be a reason
I couldn't use vmalloc instead of kmalloc in XFS's kmem_alloc routine, is there?

More appropriately I could add a flag to the kmem_alloc routine so that it will
use either one based on the preferences of the caller. Then I could use vmalloc
from xfs_iread_extents, or other places where I expect a large alloc. Also,
whatever is being allocated here seems to get allocated and then sticks around,
so if vmalloc doesn't use the slab, the alloc/dealloc overhead still might not
be too large.
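
Roughly what I have in mind, as a userspace model only. Nothing here is the real
XFS or kernel API: the flag name, the struct, and the 64kB limit are all
hypothetical, and both "allocators" are just malloc underneath, with the
kmalloc stand-in refusing large requests to mimic fragmented memory.

```c
#include <stddef.h>
#include <stdlib.h>

#define MODEL_KM_MAYVMALLOC 0x1u            /* hypothetical caller flag */
#define MODEL_CONTIG_LIMIT  (64u * 1024u)   /* assumed largest contiguous block */

/* Result of an allocation, tagged with which allocator produced it, because
 * in the kernel the two kinds need different free routines (kfree vs vfree). */
struct model_buf {
    void *ptr;
    int   from_vmalloc;
};

/* Stand-in for kmalloc: fails when the request needs more contiguous memory
 * than the model assumes is available. */
void *model_kmalloc(size_t size)
{
    if (size > MODEL_CONTIG_LIMIT)
        return NULL;
    return malloc(size);
}

/* Stand-in for vmalloc: only needs virtually contiguous memory, so in this
 * model it succeeds whenever malloc does. */
void *model_vmalloc(size_t size)
{
    return malloc(size);
}

/* Try the contiguous allocator first; fall back to the virtual one only if
 * the caller opted in with MODEL_KM_MAYVMALLOC. */
struct model_buf model_kmem_alloc(size_t size, unsigned int flags)
{
    struct model_buf b = { NULL, 0 };

    b.ptr = model_kmalloc(size);
    if (b.ptr == NULL && (flags & MODEL_KM_MAYVMALLOC)) {
        b.ptr = model_vmalloc(size);
        b.from_vmalloc = (b.ptr != NULL);
    }
    return b;
}

/* Release a buffer. The model frees both kinds the same way; real code would
 * have to pick the free routine that matches the allocator. */
void model_kmem_free(struct model_buf *b)
{
    free(b->ptr);
    b->ptr = NULL;
    b->from_vmalloc = 0;
}
```

With this shape, existing callers that pass no flag behave exactly as before,
and only a caller like xfs_iread_extents would opt in to the fallback. A 128kB
request without the flag fails in the model, while the same request with
MODEL_KM_MAYVMALLOC succeeds and is tagged as coming from the vmalloc stand-in.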

I am happy to write up a patch to do this and test it, but before I do anything
really stupid, I thought I'd check in with you guys.

Good idea? Bad idea?

Neil

P.S. Didn't someone recently throw in an mm patch for new/better memory
defragmentation?

P.P.S. Here is the stack trace that got me here and some info about my box.
Also, why is there networking stuff in the trace? Pre-emption?

Oct  3 20:10:11 kryten kernel: java: page allocation failure. order:5, mode:0xd0
Oct  3 20:10:12 kryten kernel:  [<c0131e15>] __alloc_pages+0x335/0x400
Oct  3 20:10:12 kryten kernel:  [<c0131f05>] __get_free_pages+0x25/0x40
Oct  3 20:10:12 kryten kernel:  [<c0135040>] kmem_getpages+0x20/0xb0
Oct  3 20:10:12 kryten kernel:  [<c0135b56>] cache_grow+0x96/0x130
Oct  3 20:10:12 kryten kernel:  [<c0135d2e>] cache_alloc_refill+0x13e/0x200
Oct  3 20:10:12 kryten kernel:  [<c01361a0>] __kmalloc+0x70/0x80
Oct  3 20:10:12 kryten kernel:  [<c01f8b19>] kmem_alloc+0x59/0xc0
Oct  3 20:10:12 kryten kernel:  [<c01d508f>] xfs_iread_extents+0x4f/0x110
Oct  3 20:10:12 kryten kernel:  [<c0341da0>] ip_local_deliver_finish+0x0/0x1b0
Oct  3 20:10:12 kryten kernel:  [<c01aca76>] xfs_bmapi+0x246/0x15e0
Oct  3 20:10:12 kryten kernel:  [<c0341f50>] ip_rcv_finish+0x0/0x270
Oct  3 20:10:12 kryten kernel:  [<c0341f50>] ip_rcv_finish+0x0/0x270
Oct  3 20:10:12 kryten kernel:  [<c0334f29>] nf_hook_slow+0xc9/0x100
Oct  3 20:10:12 kryten kernel:  [<c0341f50>] ip_rcv_finish+0x0/0x270
Oct  3 20:10:12 kryten kernel:  [<c032c043>] netif_receive_skb+0x193/0x1f0
Oct  3 20:10:12 kryten kernel:  [<c025f238>] rtl8139_rx+0x188/0x2d0
Oct  3 20:10:12 kryten kernel:  [<c025f562>] rtl8139_poll+0x42/0xc0
Oct  3 20:10:12 kryten kernel:  [<c032c20a>] net_rx_action+0x6a/0xf0
Oct  3 20:10:12 kryten kernel:  [<c01d9146>] xfs_iomap+0x1a6/0x550
Oct  3 20:10:12 kryten kernel:  [<c0131e27>] __alloc_pages+0x347/0x400
Oct  3 20:10:12 kryten kernel:  [<c01fa0e9>] linvfs_get_block_core+0xa9/0x2c0
Oct  3 20:10:12 kryten kernel:  [<c0131f13>] __get_free_pages+0x33/0x40
Oct  3 20:10:12 kryten kernel:  [<c0135b9e>] cache_grow+0xde/0x130
Oct  3 20:10:12 kryten kernel:  [<c01fa347>] linvfs_get_block+0x47/0x50
Oct  3 20:10:12 kryten kernel:  [<c0168312>] do_mpage_readpage+0x132/0x480
Oct  3 20:10:12 kryten kernel:  [<c012db6f>] unlock_page+0x1f/0x30
Oct  3 20:10:12 kryten kernel:  [<c020ce9f>] radix_tree_node_alloc+0x1f/0x60
Oct  3 20:10:12 kryten kernel:  [<c020d112>] radix_tree_insert+0xe2/0x100
Oct  3 20:10:12 kryten kernel:  [<c012d942>] add_to_page_cache+0x52/0x70
Oct  3 20:10:12 kryten kernel:  [<c01687ab>] mpage_readpages+0x14b/0x180
Oct  3 20:10:12 kryten kernel:  [<c01fa300>] linvfs_get_block+0x0/0x50
Oct  3 20:10:12 kryten kernel:  [<c01d3d25>] xfs_iformat+0x495/0x5d0
Oct  3 20:10:12 kryten kernel:  [<c0134804>] read_pages+0x134/0x140
Oct  3 20:10:12 kryten kernel:  [<c01fa300>] linvfs_get_block+0x0/0x50
Oct  3 20:10:12 kryten kernel:  [<c0131e27>] __alloc_pages+0x347/0x400
Oct  3 20:10:12 kryten kernel:  [<c02da689>] dm_table_any_congested+0x19/0x60
Oct  3 20:10:12 kryten kernel:  [<c02d8250>] dm_any_congested+0x30/0x60
Oct  3 20:10:12 kryten kernel:  [<c0134a5f>] do_page_cache_readahead+0xcf/0x130
Oct  3 20:10:12 kryten kernel:  [<c0134bc3>] page_cache_readahead+0x103/0x1f0
Oct  3 20:10:12 kryten kernel:  [<c012e0fc>] do_generic_mapping_read+0xdc/0x470
Oct  3 20:10:12 kryten kernel:  [<c015fd45>] d_splice_alias+0x45/0xc0
Oct  3 20:10:12 kryten kernel:  [<c012e73f>] __generic_file_aio_read+0x1bf/0x1f0
Oct  3 20:10:12 kryten kernel:  [<c012e490>] file_read_actor+0x0/0xf0
Oct  3 20:10:12 kryten kernel:  [<c0200835>] xfs_read+0x155/0x270
Oct  3 20:10:12 kryten kernel:  [<c01fca8a>] linvfs_read+0x8a/0xa0
Oct  3 20:10:12 kryten kernel:  [<c01493a4>] do_sync_read+0x84/0xb0
Oct  3 20:10:12 kryten kernel:  [<c0152e87>] sys_fstat64+0x37/0x40
Oct  3 20:10:12 kryten kernel:  [<c0149488>] vfs_read+0xb8/0x130
Oct  3 20:10:12 kryten kernel:  [<c0149731>] sys_read+0x51/0x80
Oct  3 20:10:12 kryten kernel:  [<c0105a7b>] syscall_call+0x7/0xb
...
more of the same
...
Oct  3 20:32:00 kryten kernel: oom-killer: gfp_mask=0xd0
Oct  3 20:32:00 kryten kernel: DMA per-cpu:
Oct  3 20:32:00 kryten kernel: cpu 0 hot: low 2, high 6, batch 1
Oct  3 20:32:00 kryten kernel: cpu 0 cold: low 0, high 2, batch 1
Oct  3 20:32:00 kryten kernel: Normal per-cpu:
Oct  3 20:32:00 kryten kernel: cpu 0 hot: low 32, high 96, batch 16
Oct  3 20:32:00 kryten kernel: cpu 0 cold: low 0, high 32, batch 16
Oct  3 20:32:00 kryten kernel: HighMem per-cpu:
Oct  3 20:32:00 kryten kernel: cpu 0 hot: low 14, high 42, batch 7
Oct  3 20:32:00 kryten kernel: cpu 0 cold: low 0, high 14, batch 7
Oct  3 20:32:00 kryten kernel:
Oct  3 20:32:00 kryten kernel: Free pages:      389392kB (252kB HighMem)
Oct  3 20:32:00 kryten kernel: Active:125049 inactive:4968 dirty:0 writeback:5 unstable:0 free:97348 slab:23724 mapped:124183 pagetables:813
Oct  3 20:32:01 kryten kernel: DMA free:1932kB min:16kB low:32kB high:48kB active:1512kB inactive:0kB present:16384kB
Oct  3 20:32:01 kryten kernel: protections[]: 8 476 540
Oct  3 20:32:01 kryten kernel: Normal free:387208kB min:936kB low:1872kB high:2808kB active:393700kB inactive:1408kB present:901120kB
Oct  3 20:32:01 kryten kernel: protections[]: 0 468 532
Oct  3 20:32:01 kryten kernel: HighMem free:252kB min:128kB low:256kB high:384kB active:104984kB inactive:18464kB present:131008kB
Oct  3 20:32:01 kryten kernel: protections[]: 0 0 64
Oct  3 20:32:01 kryten kernel: DMA: 55*4kB 18*8kB 4*16kB 15*32kB 6*64kB 1*128kB 0*256kB 1*512kB 0*1024kB 0*2048kB 0*4096kB = 1932kB
Oct  3 20:32:01 kryten kernel: Normal: 39944*4kB 18555*8kB 4213*16kB 336*32kB 13*64kB 0*128kB 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 387208kB
Oct  3 20:32:01 kryten kernel: HighMem: 1*4kB 1*8kB 1*16kB 5*32kB 1*64kB 0*128kB 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 252kB
Oct  3 20:32:01 kryten kernel: Swap cache: add 7820, delete 7788, find 1504/1705, race 0+0

Using Con Kolivas' patchset
Linux kryten 2.6.8.1-ck7 #2 Tue Sep 14 23:12:49 JST 2004 i686 GNU/Linux

             total       used       free     shared    buffers     cached
Mem:       1035300     998228      37072          0       6884     329440
-/+ buffers/cache:     661904     373396
Swap:      2000080     143652    1856428


processor       : 0
vendor_id       : AuthenticAMD
cpu family      : 6
model           : 10
model name      : AMD Athlon(tm)
stepping        : 0
cpu MHz         : 1111.224
cache size      : 512 KB
fdiv_bug        : no
hlt_bug         : no
f00f_bug        : no
coma_bug        : no
fpu             : yes
fpu_exception   : yes
cpuid level     : 1
wp              : yes
flags           : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 mmx fxsr sse syscall mmxext 3dnowext 3dnow
bogomips        : 2187.26