David Chinner wrote:
On Fri, Apr 07, 2006 at 10:49:53AM +0200, Artur Makówka wrote:
Hello, I have a heavy-traffic server that is crashing every few days. When
it crashes I cannot log in through ssh and no services are working. One
time it 'crashed' while I was logged in, though (I was lucky), and I saw
'Input/Output Error' whenever I tried to run any command
(like ps, ls or anything).
It's not crashing, a filesystem has shut down....
It is a RAID 0 array made from two SATA drives.
Any I/O errors in the logs? i.e. is it a SATA issue and XFS is
shutting down to protect itself?
No, no I/O errors; the HDs seem to be fine.
My XFS filesystem is mounted like this:
/dev/md0 on / type xfs (rw,noatime)
Well, that explains why you can't log in - your root filesystem has
shut down. You need to separate your root filesystem from the data
filesystem so that when the data filesystem has a problem it doesn't
take the entire machine down (as you are currently experiencing).
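To check a layout like the one above, a minimal Python sketch along these lines could parse `mount` output in the format shown and report which device holds the root filesystem and whether any other block-device filesystems exist. The function names and the "/dev/" prefix heuristic are assumptions made for illustration; mount points containing spaces are not handled.

```python
import re

# Matches lines in the format printed by `mount`, for example:
#   /dev/md0 on / type xfs (rw,noatime)
MOUNT_LINE = re.compile(
    r"^(?P<device>\S+) on (?P<mountpoint>\S+) "
    r"type (?P<fstype>\S+) \((?P<options>[^)]*)\)$"
)


def parse_mount_line(line):
    """Return a dict describing one mount line, or None if it does not match."""
    m = MOUNT_LINE.match(line.strip())
    if m is None:
        return None
    entry = m.groupdict()
    entry["options"] = entry["options"].split(",")
    return entry


def describe_layout(mount_output):
    """Return (root_entry, other_block_device_entries).

    Assumption: a "real" filesystem is one whose device path starts
    with /dev/. An empty second element means root and data share one
    filesystem, so a shutdown of it takes everything down.
    """
    entries = [e for e in map(parse_mount_line, mount_output.splitlines()) if e]
    root = next((e for e in entries if e["mountpoint"] == "/"), None)
    others = [
        e for e in entries
        if e["mountpoint"] != "/" and e["device"].startswith("/dev/")
    ]
    return root, others
```

Fed the mount line from this thread, the second element comes back empty, which matches the situation described: everything lives on the root filesystem.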
Oh, I didn't know that. I have to find the cause of this, though.
Thanks in advance, and please let me know if you need any more info.
If there are no I/O errors being reported before the filesystem shuts down,
can you provide more information on the type of I/O the system is executing
when the shutdown occurs?
I see a lot of output similar to what I already posted, but that happened just
AFTER the first successful mount. The output I'm pasting now is (I
think) from just BEFORE the crash. Also, there is nothing in particular the
server is doing during that time. During the last 2 crashes it
was refreshing awstats for every account in the system, i.e. running
awstats.pl over the list of accounts. But it has also 'crashed' many times
during the day, when awstats was not running. From the 'after' logs
I don't see why this shows up: "Apr 11 09:47:53 alpha324 kernel: XFS
internal error XFS_WANT_CORRUPTED_RETURN at line 298 of file
fs/xfs/xfs_alloc.c. Caller 0xc01f5091"
What does it mean, and why didn't xfs_repair repair it?
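When a log contains many of these messages, it can help to count them by error name and source location. The following is only a sketch: the regular expression is written against the single message quoted above, so other kernel versions may format the line differently, and the function name is made up for illustration.

```python
import re
from collections import Counter

# Written against the one message quoted in this thread:
#   XFS internal error XFS_WANT_CORRUPTED_RETURN at line 298 of file
#   fs/xfs/xfs_alloc.c. Caller 0xc01f5091
XFS_ERROR = re.compile(
    r"XFS internal error (?P<error>\S+) at line (?P<line>\d+) "
    r"of file (?P<file>\S+?)\.\s+Caller (?P<caller>0x[0-9a-fA-F]+)"
)


def summarize_xfs_errors(log_lines):
    """Count XFS internal errors by (error name, source file, line number)."""
    counts = Counter()
    for text in log_lines:
        m = XFS_ERROR.search(text)
        if m:
            counts[(m.group("error"), m.group("file"), int(m.group("line")))] += 1
    return counts
```

Running it over the lines of the decompressed kern.log would show whether the shutdowns always come from the same place in xfs_alloc.c or from several different checks.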
Ok, this is the output I got just before the crash (at least I think it's
before); the output in the file I'm attaching is from after the crash.
Apr 11 02:11:16 alpha324 kernel: c0134b03
Apr 11 02:11:16 alpha324 kernel: Modules linked in:
Apr 11 02:11:16 alpha324 kernel: CPU: 0
Apr 11 02:11:16 alpha324 kernel: EIP: 0060:[<c0134b03>] Not tainted VLI
Apr 11 02:11:16 alpha324 kernel: EFLAGS: 00010002 (2.6.15.7)
Apr 11 02:11:16 alpha324 kernel: EIP is at find_get_pages+0x53/0x60
Apr 11 02:11:16 alpha324 kernel: eax: 80010028 ebx: 00000001 ecx: c2affe88 edx: 20090000
Apr 11 02:11:16 alpha324 kernel: esi: 00000002 edi: 0000004f ebp: c2affe7c esp: c2affe34
Apr 11 02:11:16 alpha324 kernel: ds: 007b es: 007b ss: 0068
Apr 11 02:11:16 alpha324 kernel: Process kswapd0 (pid: 71, threadinfo=c2afe000 task=c2ac50b0)
Apr 11 02:11:16 alpha324 kernel: Stack: e7fea7c0 c2affe84 00000000 0000000e c2affe7c 00000000 c013f1fb e7fea7bc
Apr 11 02:11:16 alpha324 kernel: 00000000 0000000e c2affe84 e7fea724 c013f687 c2affe7c e7fea7bc 00000000
Apr 11 02:11:16 alpha324 kernel: 0000000e 00000000 00000000 00000000 c13ce900 20090000 c2440e20 c17437c0
Apr 11 02:11:16 alpha324 kernel: Call Trace:
Apr 11 02:11:16 alpha324 kernel: [<c013f1fb>] pagevec_lookup+0x2b/0x40
Apr 11 02:11:16 alpha324 kernel: [<c013f687>] invalidate_mapping_pages+0xa7/0xf0
Apr 11 02:11:16 alpha324 kernel: [<c013f6ef>] invalidate_inode_pages+0x1f/0x30
Apr 11 02:11:16 alpha324 kernel: [<c016d763>] prune_icache+0x1a3/0x1b0
Apr 11 02:11:16 alpha324 kernel: [<c016d7b5>] shrink_icache_memory+0x45/0x50
Apr 11 02:11:16 alpha324 kernel: [<c013fc36>] shrink_slab+0x136/0x1c0
Apr 11 02:11:16 alpha324 kernel: [<c0140f12>] balance_pgdat+0x222/0x400
Apr 11 02:11:16 alpha324 kernel: [<c01411a4>] kswapd+0xb4/0xf0
Apr 11 02:11:16 alpha324 kernel: [<c01285e0>] autoremove_wake_function+0x0/0x60
Apr 11 02:11:16 alpha324 kernel: [<c01410f0>] kswapd+0x0/0xf0
Apr 11 02:11:16 alpha324 kernel: [<c0101009>] kernel_thread_helper+0x5/0xc
Apr 11 02:11:16 alpha324 kernel: Code: e8 e3 d9 14 00 85 c0 89 c6 75 0d fb 83 c4 10 89 f0 5b 5e c3 8d 74 26 00 89 d9 31 db eb 0b ff 42 04 43 83 c1 04 39 de 74 e2 8b 11 <8b> 02 f6 c4 40 74 ec 8b 52 0c eb e7 90 83 ec 24 89 7c 24 1c 89
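To compare several dumps like this one, the call chain can be pulled out of the log text. The sketch below assumes frames have the "[<address>] symbol+0xoffset/0xsize" shape seen in the trace above; the function name is hypothetical. Because the pattern allows any whitespace between the address and the symbol, it also copes with frames that the mailer wrapped onto two lines.

```python
import re

# One call-trace frame as it appears in the dump above, for example:
#   [<c013f1fb>] pagevec_lookup+0x2b/0x40
# \s+ also matches a newline, so mail-wrapped frames are still found.
FRAME = re.compile(
    r"\[<(?P<addr>[0-9a-f]+)>\]\s+"
    r"(?P<symbol>\w+)\+0x(?P<offset>[0-9a-f]+)/0x(?P<size>[0-9a-f]+)"
)


def call_trace_symbols(log_text):
    """Return the function names of all call-trace frames, in log order."""
    return [m.group("symbol") for m in FRAME.finditer(log_text)]
```

The EIP line is skipped on purpose, since "[<c0134b03>] Not tainted" has no symbol+offset part. Two dumps with the same symbol list most likely went through the same code path.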
Cheers,
Dave.
kern.log.bz2
Description: Binary data