server crashing

To: linux-xfs@xxxxxxxxxxx
Subject: server crashing
From: Artur Makówka <juice@xxxxxxxxxxxxx>
Date: Fri, 07 Apr 2006 10:49:53 +0200
Sender: linux-xfs-bounce@xxxxxxxxxxx
User-agent: Thunderbird 1.5 (Windows/20051201)
Hello, I have a heavy-traffic server that crashes every few days. When it crashes I cannot log in through ssh and no services are working. Once it 'crashed' while I was already logged in (I was lucky), and every command I tried to run (ps, ls, anything) failed with 'Input/Output Error'.

It is a RAID 0 array made from two SATA drives.

It happened yet another time today, after a hard reset. The machine was only responding to pings and nothing else was possible, so it was most likely in this Input/Output error state.
I saw this in the logs:

Apr  7 10:24:52 alpha324 kernel:  [<c0134b56>] find_get_pages_tag+0x46/0x90
Apr  7 10:24:52 alpha324 kernel:  [<c025eb02>] linvfs_writepage+0x72/0x130
Apr  7 10:24:52 alpha324 kernel:  [<c025ea90>] linvfs_writepage+0x0/0x130
Apr  7 10:24:52 alpha324 kernel:  [<c0178c1c>] mpage_writepages+0x25c/0x440
Apr  7 10:24:52 alpha324 kernel:  [<c0239761>] xfs_iflush+0x371/0x4e0
Apr  7 10:24:52 alpha324 kernel:  [<c025ea90>] linvfs_writepage+0x0/0x130
Apr  7 10:24:52 alpha324 kernel:  [<c013b119>] do_writepages+0x39/0x40
Apr  7 10:24:52 alpha324 kernel:  [<c0176e15>] __sync_single_inode+0x65/0x240
Apr  7 10:24:52 alpha324 kernel:  [<c0177036>] __writeback_single_inode+0x46/0x180
Apr  7 10:24:52 alpha324 kernel:  [<c017733e>] sync_sb_inodes+0x1ce/0x2b0
Apr  7 10:24:52 alpha324 kernel:  [<c017746d>] writeback_inodes+0x4d/0xa0
Apr  7 10:24:52 alpha324 kernel:  [<c013aeb5>] wb_kupdate+0xb5/0x130
Apr  7 10:24:52 alpha324 kernel:  [<c013b8b0>] pdflush+0x0/0x30
Apr  7 10:24:52 alpha324 kernel:  [<c013b80d>] __pdflush+0x9d/0x140
Apr  7 10:24:52 alpha324 kernel:  [<c013b8d8>] pdflush+0x28/0x30
Apr  7 10:24:52 alpha324 kernel:  [<c013ae00>] wb_kupdate+0x0/0x130
Apr  7 10:24:52 alpha324 kernel:  [<c01281b6>] kthread+0xb6/0xc0
Apr  7 10:24:52 alpha324 kernel:  [<c0128100>] kthread+0x0/0xc0
Apr  7 10:24:52 alpha324 kernel:  [<c0101009>] kernel_thread_helper+0x5/0xc
Apr  7 10:24:52 alpha324 kernel: XFS internal error XFS_WANT_CORRUPTED_RETURN at line 298 of file fs/xfs/xfs_alloc.c. Caller 0xc01f5091
Apr  7 10:24:52 alpha324 kernel:  [<c01f42aa>] xfs_alloc_fixup_trees+0x2ba/0x420
Apr  7 10:24:52 alpha324 kernel:  [<c01f5091>] xfs_alloc_ag_vextent_near+0x871/0xc80
Apr  7 10:24:52 alpha324 kernel:  [<c0216658>] xfs_btree_init_cursor+0x38/0x1d0
Apr  7 10:24:52 alpha324 kernel:  [<c01f5091>] xfs_alloc_ag_vextent_near+0x871/0xc80
Apr  7 10:24:52 alpha324 kernel:  [<c01f454d>] xfs_alloc_ag_vextent+0x7d/0x110
Apr  7 10:24:52 alpha324 kernel:  [<c01f71aa>] xfs_alloc_vextent+0x25a/0x590
Apr  7 10:24:52 alpha324 kernel:  [<c0208f40>] xfs_bmap_alloc+0x13f0/0x1a00
Apr  7 10:24:52 alpha324 kernel:  [<c0280bb0>] kobject_release+0x0/0x10
Apr  7 10:24:52 alpha324 kernel:  [<c02ee5f4>] scsi_finish_command+0x24/0xb0
Apr  7 10:24:52 alpha324 kernel:  [<c020b0d5>] xfs_bmap_do_search_extents+0xe5/0x470

The output is longer, but it mainly just repeats itself (at least it looks that way to me; if you want the full output, please let me know).
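
Since the trace mostly repeats, one way to condense a log like this is to keep only the 'XFS internal error' summary lines and count how often each distinct one occurs. Below is a minimal sketch in Python, assuming syslog-style lines like the ones above. The function name and the sample list are illustrative only (the repeated line just stands in for the repetition described) and are not part of any XFS tooling.

```python
import re
from collections import Counter

# Matches the summary line the kernel prints ahead of each XFS stack trace.
XFS_ERROR = re.compile(
    r"XFS internal error (?P<tag>\S+) at line (?P<line>\d+) "
    r"of file (?P<file>\S+)\.\s+Caller (?P<caller>0x[0-9a-fA-F]+)"
)

def summarize_xfs_errors(lines):
    """Count distinct (tag, file, line, caller) tuples found in log lines."""
    counts = Counter()
    for text in lines:
        m = XFS_ERROR.search(text)
        if m:
            key = (m["tag"], m["file"], int(m["line"]), m["caller"])
            counts[key] += 1
    return counts

# Illustrative input: the error line from the log above, repeated once.
error_line = (
    "Apr  7 10:24:52 alpha324 kernel: XFS internal error "
    "XFS_WANT_CORRUPTED_RETURN at line 298 of file fs/xfs/xfs_alloc.c. "
    "Caller 0xc01f5091"
)
sample = [
    error_line,
    "Apr  7 10:24:52 alpha324 kernel:  [<c01f42aa>] xfs_alloc_fixup_trees+0x2ba/0x420",
    error_line,
]

for key, n in summarize_xfs_errors(sample).items():
    print(n, *key)
```

To run this against a real log, the same function can be given an open file object instead of the sample list; the location of the kernel log depends on the syslog configuration.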

After the previous crashes I ran xfs_repair and thought it would help, but apparently it didn't. Of course I'm going to run it again tonight anyway, but I doubt it will help this time.

I'm using kernel 2.6.15.7, but I have also used 2.6.14 kernels, and I tried 2.6.16.1 as a test a few days ago; that didn't help.

My XFS filesystem is mounted like this:

/dev/md0 on / type xfs (rw,noatime)

Traffic on this server is heavy, but not in terms of MB/s: it is a fairly constant 2-3 MB/s.

It is heavy in the number of I/O requests instead: I have about 200 Apache processes running constantly, many pure-ftpd processes, postfix, MySQL and such.

The load average is usually around 2-3 at most, though.

Is it possible that this is some kind of XFS bug?

(I'm not subscribed to this list, so if you don't mind, please reply to my e-mail address...)

Thanks in advance, and please let me know if you need any more info.

