Marcelo Tosatti wrote:
>
> On Tue, 26 Dec 2000, Marcelo Tosatti wrote:
>
> > The correct solution to your problem is to not pass __GFP_IO in the
> > allocation flag passed to __alloc_pages.
> >
> > This way the allocation routines will not try to do any kind of IO and
> > will not wait for kswapd.
> >
The problem still happens with the patch, now under a different scenario.
The issue is that _any_ memory allocation made under a FS lock can wait for
kswapd, and kswapd can, in turn, wait for a FS lock while pruning the dcache.

Following are typical backtraces of (1) kswapd and (2) a process waiting for
memory while trying to copy in a part of user memory.

Obviously, it will be impossible to fix all these allocations to not have
GFP_IO, so an alternate strategy in __alloc_pages that does not wait for
kswapd is one likely solution. Another possibility is to have a separate
daemon for doing the pruning work. A third possibility is to let the FS's
iput return failure when it can't get locks, so that kswapd never waits
indefinitely.

What do you think?
-----------------
[1]kdb> btp 3
EBP EIP Function(args)
0xc1147e34 0xc0112832 schedule+0x416
kernel .text 0xc0100000 0xc011241c 0xc0112a40
0xc486db0e [xfs_support]lock_wait+0xa6 (0xc1d535f0, 0xc1d53608, 0x0)
xfs_support .text 0xc486d060 0xc486da68
0xc486db3c
0xc486dbac [xfs_support]mraccessf_Rsmp_c4d68361+0x48 (0xc1d535e4,
0x288)
xfs_support .text 0xc486d060 0xc486db64
0xc486dbc4
0xc48ac176 [xfs]xfs_ilock_ra+0x8a (0xc1d53558, 0x8)
xfs .text 0xc4873060 0xc48ac0ec 0xc48ac180
0xc48ac193 [xfs]xfs_ilock+0x13 (0xc48c60d4, 0xc1d53558, 0x8)
xfs .text 0xc4873060 0xc48ac180 0xc48ac198
0xc48c60d4 [xfs]xfs_inactive_free_eofblocks+0xb8 (0xc2f41000,
0xc1d53558)
xfs .text 0xc4873060 0xc48c601c 0xc48c62ec
0xc48c6ad8 [xfs]xfs_inactive+0x110 (0xc1d53570, 0x0)
xfs .text 0xc4873060 0xc48c69c8 0xc48c6e48
0xc48d5558 [xfs]vn_put+0x44 (0xc3547bb4)
xfs .text 0xc4873060 0xc48d5514 0xc48d557c
0xc48d4693 [xfs]linvfs_put_inode+0x17 (0xc3547ac0)
xfs .text 0xc4873060 0xc48d467c 0xc48d4698
0xc014a10b iput+0x2b (0xc3547ac0)
kernel .text 0xc0100000 0xc014a0e0 0xc014a250
0xc014829d prune_dcache+0xb9 (0xd)
kernel .text 0xc0100000 0xc01481e4 0xc0148324
0xc01485e9 shrink_dcache_memory+0x21 (0x5, 0x4)
kernel .text 0xc0100000 0xc01485c8 0xc01485f8
0xc012d8ca refill_inactive+0xe2 (0x4, 0x0, 0x6, 0x4, 0x6)
kernel .text 0xc0100000 0xc012d7e8 0xc012d940
0xc012d9a2 do_try_to_free_pages+0x62 (0x4, 0x0, 0xc1161fb4)
kernel .text 0xc0100000 0xc012d940 0xc012d9c8
0xc012da56 kswapd+0x8e
kernel .text 0xc0100000 0xc012d9c8 0xc012db00
0xc01074cb kernel_thread+0x23
kernel .text 0xc0100000 0xc01074a8 0xc01074d8
------------------
A random process waiting for memory:
------------------
[1]kdb> btp 9759
EBP EIP Function(args)
0xc18c9c74 0xc0112832 schedule+0x416
kernel .text 0xc0100000 0xc011241c 0xc0112a40
0xc012dbbb wakeup_kswapd+0xbb (0x1)
kernel .text 0xc0100000 0xc012db00 0xc012dbd8
0xc012e8ee __alloc_pages+0x246
kernel .text 0xc0100000 0xc012e6a8 0xc012e9a0
0xc012e9b4 __get_free_pages+0x14
kernel .text 0xc0100000 0xc012e9a0 0xc012e9c4
0xc012f0b5 read_swap_cache_async+0x31 (0x1f3f00, 0x1, 0x1f3f00)
kernel .text 0xc0100000 0xc012f084 0xc012f120
0xc0122966 do_swap_page+0x4a (0xc21a4360, 0xc3141960, 0x804b3a0,
0xc23a012c, 0x1f3f00)
kernel .text 0xc0100000 0xc012291c 0xc0122a88
0xc0122d9b handle_mm_fault+0x143 (0xc21a4360, 0xc3141960, 0x804b3a0,
0x0, 0xc18c8000)
kernel .text 0xc0100000 0xc0122c58 0xc0122e00
0xc0111937 do_page_fault+0x14f (0xc18c9df4, 0x0, 0x0, 0x380,
0x804c1a0)
kernel .text 0xc0100000 0xc01117e8 0xc0111c00
0xc01090d8 error_code+0x34
kernel .text 0xc0100000 0xc01090a4 0xc01090e0
Interrupt registers:
eax = 0x00000000 ebx = 0x00000000 ecx = 0x00000380 edx = 0x0804c1a0
esi = 0x0804b3a0 edi = 0xc04dc200 esp = 0xc18c9e28 eip = 0xc01e97dc
ebp = 0x00000000 xss = 0x00000018 xcs = 0x00000010 eflags = 0x00010202
xds = 0x08040018 xes = 0x00000018 origeax = 0xffffffff &regs = 0xc18c9df4
0xc01e97dc __generic_copy_from_user+0x30 (0xc04dc200, 0x804b3a0,
0xe00)
kernel .text 0xc0100000 0xc01e97ac 0xc01e97e8
0xc4865464 [pagebuf]pagebuf_generic_file_write_Rsmp_1a301b89+0x344
(0xc1f03d80, 0x804b3a0, 0x1000, 0xc18c9f88,
0xc18c9f3c)
pagebuf .text 0xc4860060 0xc4865120 0xc486560c
0xc48d026e [xfs]xfs_rdwr+0x6e (0xc3bd9bd4, 0xc1f03d80, 0x804b3a0,
0x1000, 0xc18c9f88)
xfs .text 0xc4873060 0xc48d0200 0xc48d027c
0xc48d111f [xfs]xfs_write+0x19b (0xc3bd9bd4, 0xc18c9f7c, 0x0, 0x0,
0x0)
xfs .text 0xc4873060 0xc48d0f84 0xc48d11f8
0xc48cd727 [xfs]linvfs_write+0xf3 (0xc1f03d80, 0x804b3a0, 0x1000,
0xc1f03da0)
xfs .text 0xc4873060 0xc48cd634 0xc48cd750
0xc013401a sys_write+0x8e (0xb, 0x804b3a0, 0x1000, 0x38, 0x10c3)
kernel .text 0xc0100000 0xc0133f8c 0xc0134050
0xc0108fa3 system_call+0x33
kernel .text 0xc0100000 0xc0108f70 0xc0108fa8
---------------------
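To make the third possibility concrete, here is a rough user-space sketch of
a pruning loop that skips a busy inode instead of sleeping on its lock. This
is only an illustration under stated assumptions, not kernel or XFS code:
fake_inode, fake_ilock_try, fake_iunlock, fake_iput_nowait and fake_prune are
all hypothetical names, and a C11 atomic_flag stands in for the real inode
lock.

```c
#include <assert.h>
#include <errno.h>
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical stand-in for a filesystem inode and its lock. */
struct fake_inode {
    atomic_flag lock;   /* set while some other path holds the inode lock */
    int cached;         /* 1 while the inode is still held in the cache */
};

/* Try to take the lock without sleeping; returns 1 on success, 0 if busy. */
static int fake_ilock_try(struct fake_inode *ip)
{
    return !atomic_flag_test_and_set(&ip->lock);
}

static void fake_iunlock(struct fake_inode *ip)
{
    atomic_flag_clear(&ip->lock);
}

/*
 * Non-blocking release: report -EBUSY when the lock is held instead of
 * waiting for it, so the caller (the pruner) can move on.
 */
static int fake_iput_nowait(struct fake_inode *ip)
{
    assert(ip != NULL);
    if (!fake_ilock_try(ip))
        return -EBUSY;
    ip->cached = 0;     /* the "inactivate and free" work would go here */
    fake_iunlock(ip);
    return 0;
}

/* Walk the cache, skip busy inodes, and return how many were released. */
static int fake_prune(struct fake_inode *v, size_t n)
{
    int freed = 0;

    for (size_t i = 0; i < n; i++)
        if (v[i].cached && fake_iput_nowait(&v[i]) == 0)
            freed++;
    return freed;
}
```

With this shape, a writer that holds an inode lock while waiting for memory
only causes the pruner to skip that one inode; the pruner picks it up on a
later pass once the lock has been dropped. The cost is that pruning may make
less progress per pass when many inodes are busy.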
ananth.