One of our production XFS servers went down today. Disk access stopped,
which caused the load average to climb. The server came back after a
reboot and seems to be running fine now.
Any advice gratefully received.
The Oops and stack-trace follow:
Oct 25 14:44:24 clark kernel: Call Trace: [xfs_alloc_lookup_ge+32/40] [xfs_alloc_ag_vextent_near+93/2628] [xfs_alloc_ag_vextent+54/232] [xfs_alloc_vextent+910/1084] [xfs_bmap_alloc+8066/8960]
Oct 25 14:44:24 clark kernel: Call Trace: [<c0178b68>] [<c017364d>] [<c01733c6>] [<c017573e>] [<c018533e>]
Oct 25 14:44:24 clark kernel: [<c01ce929>] [<c01912e3>] [<c01912e3>] [<c0187703>] [<c01893d9>] [<c01d6878>]
Oct 25 14:44:24 clark kernel: [<c01b55bf>] [<c01d86e4>] [<c01d6878>] [<c013dbff>] [<c0132271>] [<c0132b80>]
Oct 25 14:44:24 clark kernel: [<c01d68fc>] [<c01d6878>] [<c01d2c21>] [<c01d1d27>] [<c01d6878>] [<c01d6adf>]
Oct 25 14:44:24 clark kernel: [<c01d6878>] [<c013a849>] [<c013aa3c>] [<c01167e6>] [<c013e014>] [<c0105000>]
Oct 25 14:44:24 clark kernel: [<c01057db>]
Oct 25 14:44:24 clark kernel: Code: 8b 52 30 89 54 24 58 51 55 8b 44 24 60 50 8b 54 24 78 52 e8
>>EIP; c0176fe9 <xfs_alloc_lookup+149/394> <=====
Trace; c0178b68 <xfs_alloc_lookup_ge+20/28>
Trace; c017364d <xfs_alloc_ag_vextent_near+5d/a44>
Trace; c01733c6 <xfs_alloc_ag_vextent+36/e8>
Trace; c017573e <xfs_alloc_vextent+38e/43c>
Trace; c018533e <xfs_bmap_alloc+1f82/2300>
Trace; c01ce929 <avl_remove+cd/dc>
Trace; c01912e3 <xfs_bmbt_get_state+33/3c>
Trace; c01912e3 <xfs_bmbt_get_state+33/3c>
Trace; c0187703 <xfs_bmap_do_search_extents+39b/3c4>
Trace; c01893d9 <xfs_bmapi+8c1/131c>
Trace; c01d6878 <linvfs_pb_bmap+0/e8>
Trace; c01b55bf <xlog_grant_push_ail+47/174>
Trace; c01d86e4 <xfs_strategy+648/8dc>
Trace; c01d6878 <linvfs_pb_bmap+0/e8>
Trace; c013dbff <try_to_free_buffers+c3/14c>
Trace; c0132271 <try_to_free_pages+35/54>
Trace; c0132b80 <balance_classzone+68/184>
Trace; c01d68fc <linvfs_pb_bmap+84/e8>
Trace; c01d6878 <linvfs_pb_bmap+0/e8>
Trace; c01d2c21 <pagebuf_delalloc_convert+65/c4>
Trace; c01d1d27 <pagebuf_write_full_page+8b/cc>
Trace; c01d6878 <linvfs_pb_bmap+0/e8>
Trace; c01d6adf <linvfs_write_full_page+43/6c>
Trace; c01d6878 <linvfs_pb_bmap+0/e8>
Trace; c013a849 <write_buffer_delay+69/70>
Trace; c013aa3c <write_some_buffers+6c/10c>
Trace; c01167e6 <schedule+4ca/584>
Trace; c013e014 <bdflush+8c/c4>
Trace; c0105000 <_stext+0/0>
Trace; c01057db <kernel_thread+23/30>
Code; c0176fe9 <xfs_alloc_lookup+149/394>
00000000 <_EIP>:
Code; c0176fe9 <xfs_alloc_lookup+149/394> <=====
0: 8b 52 30 mov 0x30(%edx),%edx <=====
Code; c0176fec <xfs_alloc_lookup+14c/394>
3: 89 54 24 58 mov %edx,0x58(%esp,1)
Code; c0176ff0 <xfs_alloc_lookup+150/394>
7: 51 push %ecx
Code; c0176ff1 <xfs_alloc_lookup+151/394>
8: 55 push %ebp
Code; c0176ff2 <xfs_alloc_lookup+152/394>
9: 8b 44 24 60 mov 0x60(%esp,1),%eax
Code; c0176ff6 <xfs_alloc_lookup+156/394>
d: 50 push %eax
Code; c0176ff7 <xfs_alloc_lookup+157/394>
e: 8b 54 24 78 mov 0x78(%esp,1),%edx
Code; c0176ffb <xfs_alloc_lookup+15b/394>
12: 52 push %edx
Code; c0176ffc <xfs_alloc_lookup+15c/394>
13: e8 00 00 00 00 call 18 <_EIP+0x18> c0177001 <xfs_alloc_lookup+161/394>
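
For anyone cross-checking the two traces: the symbolic Call Trace in the
syslog output gives offset/size in decimal, while the ksymoops "Trace;"
lines give the same values in hex. The sketch below is only a sanity check
on the first five frames as pasted above; the parse() helper and the
variable names are made up for illustration and are not part of ksymoops.

```python
import re

# First five frames, copied from the syslog "Call Trace" line (decimal offsets).
syslog_trace = (
    "[xfs_alloc_lookup_ge+32/40] [xfs_alloc_ag_vextent_near+93/2628] "
    "[xfs_alloc_ag_vextent+54/232] [xfs_alloc_vextent+910/1084] "
    "[xfs_bmap_alloc+8066/8960]"
)

# The same five frames, copied from the ksymoops output (hex offsets).
ksymoops_trace = """\
Trace; c0178b68 <xfs_alloc_lookup_ge+20/28>
Trace; c017364d <xfs_alloc_ag_vextent_near+5d/a44>
Trace; c01733c6 <xfs_alloc_ag_vextent+36/e8>
Trace; c017573e <xfs_alloc_vextent+38e/43c>
Trace; c018533e <xfs_bmap_alloc+1f82/2300>
"""

# Matches "symbol+offset/size"; the digits are interpreted in the given base.
FRAME = re.compile(r"(\w+)\+([0-9a-f]+)/([0-9a-f]+)")


def parse(text, base):
    """Return (symbol, offset, size) tuples, reading numbers in `base`."""
    return [(name, int(off, base), int(size, base))
            for name, off, size in FRAME.findall(text)]


decimal_frames = parse(syslog_trace, 10)
hex_frames = parse(ksymoops_trace, 16)

for dec, hx in zip(decimal_frames, hex_frames):
    status = "match" if dec == hx else "MISMATCH"
    print(f"{dec[0]}: offset {dec[1]} of {dec[2]} bytes ({status})")
```

If the frames agree, the ksymoops run was done against symbols that match
the kernel that produced the Oops, at least for these frames.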
--
| Huw Lynes | The Moving Picture Company |
| System Administrator | 127 Wardour Street |
|.........................| London, W1F 0NL |