xfs

Re: server crashing

To: David Chinner <dgc@xxxxxxx>
Subject: Re: server crashing
From: Artur Makówka <juice@xxxxxxxxxxxxx>
Date: Tue, 11 Apr 2006 09:55:20 +0200
Cc: linux-xfs@xxxxxxxxxxx
In-reply-to: <20060410015916.GK2732@melbourne.sgi.com>
References: <443627B1.5090100@ursynow.2a.pl> <20060410015916.GK2732@melbourne.sgi.com>
Sender: linux-xfs-bounce@xxxxxxxxxxx
User-agent: Thunderbird 1.5 (Windows/20051201)
David Chinner wrote:
On Fri, Apr 07, 2006 at 10:49:53AM +0200, Artur Makówka wrote:
Hello, I have a heavy-traffic server that crashes every few days. When it crashes I cannot log in through ssh and no services are working. One time it 'crashed' while I was logged in (I was lucky), and I saw 'Input/Output Error' whenever I tried to run any command (ps, ls, or anything else).

It's not crashing, a filesystem has shut down....

It is a RAID 0 array made from two SATA drives.

Any I/O errors in the logs? i.e. is it a SATA issue and XFS is shutting down to protect itself?

No, no I/O errors; the HDs seem to be fine.


My XFS filesystem is mounted like this:

/dev/md0 on / type xfs (rw,noatime)

Well, that explains why you can't log in - your root filesystem has shut down. You need to separate your root filesystem from the data filesystem so that when the data filesystem has a problem it doesn't take the entire machine down (as you are currently experiencing).

Oh, I didn't know that. I still have to find the cause of this, though.


Thanks in advance, and please let me know if you need any more info.

If there are no I/O errors being reported before the filesystem shuts down, can you provide more information on the type of I/O the system is executing when the shutdown occurs?

I see a lot of output similar to what I already posted, but that appeared just AFTER the first successful mount. The output I'm pasting now is (I think) from just BEFORE the crash. Also, the server is not doing anything in particular at that time. During the last 2 crashes it was refreshing awstats for every account in the system, i.e. running awstats.pl over the list of accounts. But it has also 'crashed' many times during the day, when awstats was not running. From the 'after' logs I don't see why this shows up: "Apr 11 09:47:53 alpha324 kernel: XFS internal error XFS_WANT_CORRUPTED_RETURN at line 298 of file fs/xfs/xfs_alloc.c. Caller 0xc01f5091"


What does it mean, and why didn't xfs_repair repair it?
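
For anyone comparing several of these reports across a large kern.log, here is a minimal sketch (not part of the original message) that pulls the error tag, source line, file, and caller address out of such a message. It assumes Python 3 and the exact message layout quoted above; the pattern is modeled on that single line, and the function name `parse_xfs_internal_error` is hypothetical.

```python
import re

# Pattern modeled on the one log line quoted in this thread; other kernel
# versions may word the message differently (assumption).
XFS_ERROR_RE = re.compile(
    r"XFS internal error (?P<tag>\S+) at line (?P<line>\d+) "
    r"of file (?P<file>\S+?)\.\s+Caller (?P<caller>0x[0-9a-fA-F]+)"
)


def parse_xfs_internal_error(log_line):
    """Return a dict with tag, line, file and caller, or None if no match."""
    match = XFS_ERROR_RE.search(log_line)
    if match is None:
        return None
    fields = match.groupdict()
    fields["line"] = int(fields["line"])  # source line number as an integer
    return fields


if __name__ == "__main__":
    sample = (
        "Apr 11 09:47:53 alpha324 kernel: XFS internal error "
        "XFS_WANT_CORRUPTED_RETURN at line 298 of file "
        "fs/xfs/xfs_alloc.c. Caller 0xc01f5091"
    )
    print(parse_xfs_internal_error(sample))
```

Grouping the results by file and line would show whether every shutdown is reported from the same place in the allocator code or from several different places.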

OK, this is the output I got just before the crash (at least I think it's before); the output in the attached file is from after the crash.

Apr 11 02:11:16 alpha324 kernel: c0134b03
Apr 11 02:11:16 alpha324 kernel: Modules linked in:
Apr 11 02:11:16 alpha324 kernel: CPU: 0
Apr 11 02:11:16 alpha324 kernel: EIP: 0060:[<c0134b03>] Not tainted VLI
Apr 11 02:11:16 alpha324 kernel: EFLAGS: 00010002 (2.6.15.7)
Apr 11 02:11:16 alpha324 kernel: EIP is at find_get_pages+0x53/0x60
Apr 11 02:11:16 alpha324 kernel: eax: 80010028 ebx: 00000001 ecx: c2affe88 edx: 20090000
Apr 11 02:11:16 alpha324 kernel: esi: 00000002 edi: 0000004f ebp: c2affe7c esp: c2affe34
Apr 11 02:11:16 alpha324 kernel: ds: 007b es: 007b ss: 0068
Apr 11 02:11:16 alpha324 kernel: Process kswapd0 (pid: 71, threadinfo=c2afe000 task=c2ac50b0)
Apr 11 02:11:16 alpha324 kernel: Stack: e7fea7c0 c2affe84 00000000 0000000e c2affe7c 00000000 c013f1fb e7fea7bc
Apr 11 02:11:16 alpha324 kernel: 00000000 0000000e c2affe84 e7fea724 c013f687 c2affe7c e7fea7bc 00000000
Apr 11 02:11:16 alpha324 kernel: 0000000e 00000000 00000000 00000000 c13ce900 20090000 c2440e20 c17437c0
Apr 11 02:11:16 alpha324 kernel: Call Trace:
Apr 11 02:11:16 alpha324 kernel: [<c013f1fb>] pagevec_lookup+0x2b/0x40
Apr 11 02:11:16 alpha324 kernel: [<c013f687>] invalidate_mapping_pages+0xa7/0xf0
Apr 11 02:11:16 alpha324 kernel: [<c013f6ef>] invalidate_inode_pages+0x1f/0x30
Apr 11 02:11:16 alpha324 kernel: [<c016d763>] prune_icache+0x1a3/0x1b0
Apr 11 02:11:16 alpha324 kernel: [<c016d7b5>] shrink_icache_memory+0x45/0x50
Apr 11 02:11:16 alpha324 kernel: [<c013fc36>] shrink_slab+0x136/0x1c0
Apr 11 02:11:16 alpha324 kernel: [<c0140f12>] balance_pgdat+0x222/0x400
Apr 11 02:11:16 alpha324 kernel: [<c01411a4>] kswapd+0xb4/0xf0
Apr 11 02:11:16 alpha324 kernel: [<c01285e0>] autoremove_wake_function+0x0/0x60
Apr 11 02:11:16 alpha324 kernel: [<c01410f0>] kswapd+0x0/0xf0
Apr 11 02:11:16 alpha324 kernel: [<c0101009>] kernel_thread_helper+0x5/0xc
Apr 11 02:11:16 alpha324 kernel: Code: e8 e3 d9 14 00 85 c0 89 c6 75 0d fb 83 c4 10 89 f0 5b 5e c3 8d 74 26 00 89 d9 31 db eb 0b ff 42 04 43 83 c1 04 39 de 74 e2 8b 11 <8b> 02 f6 c4 40 74 ec 8b 52 0c eb e7 90 83 ec 24 89 7c 24 1c 89
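
A trace like the one above is easier to compare against other oopses once it is reduced to its function names. The following is a rough sketch (not part of the original message), assuming Python 3 and the `[<address>] symbol+0xoffset/0xsize` entry format seen in this log; the function name `parse_call_trace` is hypothetical.

```python
import re

# Matches entries such as "[<c013f1fb>] pagevec_lookup+0x2b/0x40".
# The format is taken from the 2.6.15 log in this thread (assumption:
# other kernels may print frames differently).
FRAME_RE = re.compile(
    r"\[<(?P<addr>[0-9a-f]+)>\]\s+(?P<symbol>\w+)"
    r"\+0x(?P<offset>[0-9a-f]+)/0x(?P<size>[0-9a-f]+)"
)


def parse_call_trace(lines):
    """Return (symbol, offset, size) tuples for every frame-like line."""
    frames = []
    for line in lines:
        match = FRAME_RE.search(line)
        if match:
            frames.append((
                match.group("symbol"),
                int(match.group("offset"), 16),  # offset into the function
                int(match.group("size"), 16),    # total function size
            ))
    return frames


if __name__ == "__main__":
    log = [
        "Apr 11 02:11:16 alpha324 kernel: Call Trace:",
        "Apr 11 02:11:16 alpha324 kernel: [<c013f1fb>] pagevec_lookup+0x2b/0x40",
        "Apr 11 02:11:16 alpha324 kernel: [<c01411a4>] kswapd+0xb4/0xf0",
    ]
    for symbol, offset, size in parse_call_trace(log):
        print(symbol, offset, size)
```

Lines without a bracketed address followed by `symbol+0x...`, such as the register dump or the `EIP:` line, are skipped.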




Cheers,

Dave.

Attachment: kern.log.bz2
Description: Binary data
