Hello, I have a heavy-traffic server that crashes every few days. When
it crashes, I cannot log in through SSH and no services work. Once it
'crashed' while I was logged in (I was lucky), and I saw 'Input/Output
Error' for every command I tried to run (ps, ls or anything else).
It is a RAID 0 array made from two SATA drives.
It happened yet again today, after a hard reset. The machine was only
responding to pings and nothing else was possible, so it was likely
in this Input/Output error mode.
I saw this in the logs:
Apr 7 10:24:52 alpha324 kernel: [<c0134b56>] find_get_pages_tag+0x46/0x90
Apr 7 10:24:52 alpha324 kernel: [<c025eb02>] linvfs_writepage+0x72/0x130
Apr 7 10:24:52 alpha324 kernel: [<c025ea90>] linvfs_writepage+0x0/0x130
Apr 7 10:24:52 alpha324 kernel: [<c0178c1c>] mpage_writepages+0x25c/0x440
Apr 7 10:24:52 alpha324 kernel: [<c0239761>] xfs_iflush+0x371/0x4e0
Apr 7 10:24:52 alpha324 kernel: [<c025ea90>] linvfs_writepage+0x0/0x130
Apr 7 10:24:52 alpha324 kernel: [<c013b119>] do_writepages+0x39/0x40
Apr 7 10:24:52 alpha324 kernel: [<c0176e15>] __sync_single_inode+0x65/0x240
Apr 7 10:24:52 alpha324 kernel: [<c0177036>] __writeback_single_inode+0x46/0x180
Apr 7 10:24:52 alpha324 kernel: [<c017733e>] sync_sb_inodes+0x1ce/0x2b0
Apr 7 10:24:52 alpha324 kernel: [<c017746d>] writeback_inodes+0x4d/0xa0
Apr 7 10:24:52 alpha324 kernel: [<c013aeb5>] wb_kupdate+0xb5/0x130
Apr 7 10:24:52 alpha324 kernel: [<c013b8b0>] pdflush+0x0/0x30
Apr 7 10:24:52 alpha324 kernel: [<c013b80d>] __pdflush+0x9d/0x140
Apr 7 10:24:52 alpha324 kernel: [<c013b8d8>] pdflush+0x28/0x30
Apr 7 10:24:52 alpha324 kernel: [<c013ae00>] wb_kupdate+0x0/0x130
Apr 7 10:24:52 alpha324 kernel: [<c01281b6>] kthread+0xb6/0xc0
Apr 7 10:24:52 alpha324 kernel: [<c0128100>] kthread+0x0/0xc0
Apr 7 10:24:52 alpha324 kernel: [<c0101009>] kernel_thread_helper+0x5/0xc
Apr 7 10:24:52 alpha324 kernel: XFS internal error XFS_WANT_CORRUPTED_RETURN at line 298 of file fs/xfs/xfs_alloc.c.  Caller 0xc01f5091
Apr 7 10:24:52 alpha324 kernel: [<c01f42aa>] xfs_alloc_fixup_trees+0x2ba/0x420
Apr 7 10:24:52 alpha324 kernel: [<c01f5091>] xfs_alloc_ag_vextent_near+0x871/0xc80
Apr 7 10:24:52 alpha324 kernel: [<c0216658>] xfs_btree_init_cursor+0x38/0x1d0
Apr 7 10:24:52 alpha324 kernel: [<c01f5091>] xfs_alloc_ag_vextent_near+0x871/0xc80
Apr 7 10:24:52 alpha324 kernel: [<c01f454d>] xfs_alloc_ag_vextent+0x7d/0x110
Apr 7 10:24:52 alpha324 kernel: [<c01f71aa>] xfs_alloc_vextent+0x25a/0x590
Apr 7 10:24:52 alpha324 kernel: [<c0208f40>] xfs_bmap_alloc+0x13f0/0x1a00
Apr 7 10:24:52 alpha324 kernel: [<c0280bb0>] kobject_release+0x0/0x10
Apr 7 10:24:52 alpha324 kernel: [<c02ee5f4>] scsi_finish_command+0x24/0xb0
Apr 7 10:24:52 alpha324 kernel: [<c020b0d5>] xfs_bmap_do_search_extents+0xe5/0x470
The log is longer than this, but it mostly repeats itself (at least it
looks that way to me; if you want the full output, please let me know).
After the previous crashes I ran xfs_repair and thought it would help,
but apparently it didn't. Of course I'm going to run it again tonight
anyway, but I doubt it will help this time.
I'm using kernel 2.6.15.7, but I have also used 2.6.14 kernels, and I
tested 2.6.16.1 for a few days recently; none of them helped.
My XFS filesystem is mounted like this:
/dev/md0 on / type xfs (rw,noatime)
Traffic on this server is heavy, but not in terms of MB/s; it is a
fairly constant 2-3 MB/s. It is heavy rather in the number of I/O
requests: I have about 200 Apache processes running constantly, many
pure-ftpd processes, Postfix, MySQL and so on. Even so, the load
average is usually around 2-3 at most.
Is it possible that this is some kind of XFS bug?
(I'm not subscribed to this list, so I'd appreciate it if you could CC
me on replies.)
Thanks in advance, and please let me know if you need any more info.