Quoting Stephen Lord <lord@xxxxxxx>:
> On Thu, 2002-10-10 at 17:31, I.D.Hardy@xxxxxxxxxxx wrote:
> >
> > On 10 Oct 2002 12:23:43 -0500, Steve Lord wrote:
> >
> > >
> > > Ian, did you run xfs_check and repair before mounting the fs or
> > > after? You should mount again after reboot, then run check.
> > > The in memory corruption error means it failed an internal check
> > > on a memory buffer, not that it had found bad data on the disk.
> > >
> > > If you have the check/repair output, please send it, but there is
> > > no real way to tell if the issues in it were from running the
> > > commands with a dirty log or not.
> > >
> > > We really need to improve that particular trace message; there
> > > are 80-some places it could have originated. Let's see if we
> > > can do something about that.
> > >
> > > Steve
> > >
> > >
> >
> > Steve, thanks as always for your reply. Yes, the server was rebooted,
> > and the filesystem mounted and unmounted prior to running
> > xfs_check/repair. Also, an 'xfs_fsr' was run when we upgraded the
> > kernel (on an idle system); the system was then rebooted and
> > 'xfs_check' run on the filesystem (clean) before the server was put
> > back into service (and therefore prior to the 2 filesystem
> > shutdowns). I can therefore be confident that any FS corruption
> > occurred while running the new (Wed 8th) XFS CVS kernel.
> >
> > Here's the output from the xfs_check/xfs_repair runs following the
> > first filesystem shutdown (xfs_check was clean following the 2nd
> > shutdown).
> >
> > ..... (sorry I missed capturing the top of this xfs_check session,
> > though I think this was near the start).
>
> Hmm, you have ASCII data on top of inodes by the look of it. So this
> looks like a rogue write. You have 64 inodes here which look like they
> were completely overwritten. Working out where this came from is key
> here. I am not aware of anything which changed recently which might
> relate to this.
>
> Steve
> >
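To make Steve's point about internal buffer checks concrete: XFS
structures carry magic numbers (an on-disk inode, for instance, begins
with the two-byte value 0x494e, "IN"), so an inode buffer overwritten
with stray ASCII text fails that comparison. A minimal sketch of this
kind of check, illustrative only and not the actual XFS code:

    #include <stdint.h>
    #include <stdio.h>
    #include <string.h>

    #define XFS_DINODE_MAGIC 0x494e /* "IN", the on-disk inode magic */

    /* Stand-in for the first bytes of an on-disk inode. */
    struct dinode_head {
        uint16_t di_magic; /* stored big-endian on disk */
    };

    /* Return 1 if the buffer still looks like an inode, 0 if it has
     * been overwritten (e.g. by a rogue write of ASCII data). */
    static int inode_buffer_ok(const struct dinode_head *dip)
    {
        uint16_t m = dip->di_magic;
        uint16_t magic = (uint16_t)((m >> 8) | (m << 8)); /* be16 to host, x86 */
        if (magic != XFS_DINODE_MAGIC) {
            fprintf(stderr, "bad inode magic 0x%04x\n", magic);
            return 0;
        }
        return 1;
    }

    int main(void)
    {
        struct dinode_head good, bad;
        memcpy(&good, "IN", 2); /* big-endian 0x494e, as on disk */
        memcpy(&bad, "as", 2);  /* simulated rogue ASCII write */
        printf("good=%d bad=%d\n", inode_buffer_ok(&good),
               inode_buffer_ok(&bad));
        return 0;
    }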
Steve, it may be coincidence, but I have reverted to a slightly earlier
2.4.19 XFS CVS kernel (the shutdown/xfs_repair output above was from 'SGI
XFS CVS-10/08/02:05 with quota, no debug enabled'; I reverted to 'SGI XFS
CVS-09/15/02:17 with quota, no debug enabled') and the server seems to
have gone back to kernel panics/crashes. I have just captured the
following oops dump from it:
Using defaults from ksymoops -t elf32-i386 -a i386
Oct 11 13:01:21 blue01 kernel: kernel BUG at slab.c:1263!
Oct 11 13:01:21 blue01 kernel: invalid operand: 0000
Oct 11 13:01:21 blue01 kernel: CPU: 0
Oct 11 13:01:21 blue01 kernel: EIP: 0010:[<c0132c26>] Not tainted
Oct 11 13:01:21 blue01 kernel: EFLAGS: 00010013
Oct 11 13:01:21 blue01 kernel: eax: c8cf364b   ebx: c8cf3628   ecx: 00000020   edx: c8cf362c
Oct 11 13:01:21 blue01 kernel: esi: 00020c00   edi: 00000000   ebp: c1c0f060   esp: f7eaddf0
Oct 11 13:01:21 blue01 kernel: ds: 0018   es: 0018   ss: 0018
Oct 11 13:01:21 blue01 kernel: Process pagebuf_io_CPU0 (pid: 9, stackpage=f7ead000)
Oct 11 13:01:21 blue01 kernel: Stack: c8cf362c 00000020 00000000 00000040 c1c0f060 00000000 000000f0 c1c0f060
Oct 11 13:01:21 blue01 kernel:        c0133192 c1c0f060 f7edf000 000000f0 00000246 c56e3cf4 00000000 0002cd1a
Oct 11 13:01:21 blue01 kernel:        00000246 d051146c 00000000 d05114cc 00000001 c01e4f64 0000000c 000000f0
Oct 11 13:01:21 blue01 kernel: Call Trace: [<c0133192>] [<c01e4f64>] [<c01d973f>] [<c01e51dd>] [<c01e564c>]
Oct 11 13:01:21 blue01 kernel:    [<c01e52d9>] [<c01cce44>] [<c01cd6c0>] [<c01ced10>] [<c011f25d>] [<c01e58e5>]
Oct 11 13:01:21 blue01 kernel:    [<c0107296>] [<c01e5770>]
Oct 11 13:01:21 blue01 kernel:
Oct 11 13:01:21 blue01 kernel: Code: 0f 0b ef 04 a0 54 2b c0 81 e6 00 04 00 00 74 37 b8 a5 c2 0f
>>EIP; c0132c26 <kmem_cache_alloc_batch+f6/1a0> <=====
Trace; c0133192 <kmalloc+b2/250>
Trace; c01e4f64 <_pagebuf_page_io+304/450>
Trace; c01d973e <xfs_trans_chunk_committed+1be/1f4>
Trace; c01e51dc <_page_buf_page_apply+12c/140>
Trace; c01e564c <_pagebuf_segment_apply+ac/110>
Trace; c01e52d8 <pagebuf_iorequest+e8/140>
Trace; c01cce44 <xlog_bdstrat_cb+14/40>
Trace; c01cd6c0 <xlog_sync+210/400>
Trace; c01ced10 <xlog_sync_sched+10/20>
Trace; c011f25c <__run_task_queue+5c/70>
Trace; c01e58e4 <pagebuf_iodone_daemon+174/1c0>
Trace; c0107296 <kernel_thread+26/30>
Trace; c01e5770 <pagebuf_iodone_daemon+0/1c0>
Code; c0132c26 <kmem_cache_alloc_batch+f6/1a0>
00000000 <_EIP>:
Code; c0132c26 <kmem_cache_alloc_batch+f6/1a0> <=====
0: 0f 0b ud2a <=====
Code; c0132c28 <kmem_cache_alloc_batch+f8/1a0>
2: ef out %eax,(%dx)
Code; c0132c28 <kmem_cache_alloc_batch+f8/1a0>
3: 04 a0 add $0xa0,%al
Code; c0132c2a <kmem_cache_alloc_batch+fa/1a0>
5: 54 push %esp
Code; c0132c2c <kmem_cache_alloc_batch+fc/1a0>
6: 2b c0 sub %eax,%eax
Code; c0132c2e <kmem_cache_alloc_batch+fe/1a0>
8: 81 e6 00 04 00 00 and $0x400,%esi
Code; c0132c34 <kmem_cache_alloc_batch+104/1a0>
e: 74 37 je 47 <_EIP+0x47> c0132c6c <kmem_cache_alloc_batch+13c/1a0>
Code; c0132c36 <kmem_cache_alloc_batch+106/1a0>
10: b8 a5 c2 0f 00 mov $0xfc2a5,%eax
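For reference, a BUG() raised inside kmem_cache_alloc_batch means the
slab allocator found its own bookkeeping corrupted while refilling the
per-CPU batch, for example a clobbered guard word around an object. A
rough sketch of that style of red-zone check (purely illustrative, not
the 2.4 slab.c source; the RED_MAGIC value below is my reading of the
2.4 red-zone fill pattern):

    #include <assert.h>

    #define RED_MAGIC 0x5a2cf071UL /* assumed 2.4-style red-zone pattern */

    struct slab_obj_dbg {
        unsigned long red_before;  /* guard word ahead of the object */
        unsigned char payload[32]; /* memory handed to the caller */
        unsigned long red_after;   /* guard word behind the object */
    };

    /* Stand-in for the sanity check an allocator makes before handing
     * out an object; assert() plays the role of the kernel's BUG(). */
    static void *alloc_checked(struct slab_obj_dbg *o)
    {
        assert(o->red_before == RED_MAGIC); /* clobbered guard => BUG() */
        assert(o->red_after == RED_MAGIC);
        return o->payload;
    }

    int main(void)
    {
        struct slab_obj_dbg o = { RED_MAGIC, {0}, RED_MAGIC };
        void *p = alloc_checked(&o); /* intact guards: passes */
        o.red_after = 0;             /* a one-word overrun... */
        /* alloc_checked(&o) would now trip the assert, as above. */
        return p ? 0 : 1;
    }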
This is different, though, from the previous/recent crashes on the same
kernel, which have been more of the form:
Sep 25 20:36:50 blue01 kernel: kernel BUG at slab.c:1439!
Sep 25 20:36:50 blue01 kernel: invalid operand: 0000
Sep 25 20:36:50 blue01 kernel: CPU: 1
Sep 25 20:36:50 blue01 kernel: EIP: 0010:[<c0133a64>] Not tainted
Sep 25 20:36:50 blue01 kernel: EFLAGS: 00010016
Sep 25 20:36:50 blue01 kernel: eax: 5a2cf071   ebx: 00a59840   ecx: f7edec10   edx: c1c0f060
Sep 25 20:36:50 blue01 kernel: esi: f732c000   edi: f732cf5c   ebp: f732cee8   esp: f7eddf48
Sep 25 20:36:50 blue01 kernel: ds: 0018   es: 0018   ss: 0018
Sep 25 20:36:50 blue01 kernel: Process kswapd (pid: 5, stackpage=f7edd000)
Sep 25 20:36:50 blue01 kernel: Stack: 00008842 00000002 f7edec10 f7edec00 00000000 00000002 00000000 00000000
Sep 25 20:36:50 blue01 kernel:        00000000 c1c0f060 00000020 000001d0 00000006 00000000 c0134e59 c036fa08
Sep 25 20:36:50 blue01 kernel:        00000006 000001d0 c036fa08 00000000 c0134f0c 00000020 c036fa08 00000002
Sep 25 20:36:50 blue01 kernel: Call Trace: [<c0134e59>] [<c0134f0c>] [<c0134fb1>] [<c0135026>] [<c013515d>]
Sep 25 20:36:50 blue01 kernel:    [<c0105000>] [<c0107296>] [<c01350c0>]
Sep 25 20:36:50 blue01 kernel:
Sep 25 20:36:50 blue01 kernel: Code: 0f 0b 9f 05 a0 54 2b c0 8b 44 24 24 89 ea 8b 48 18 b8 71 f0
>>EIP; c0133a64 <kmem_cache_reap+1c4/490> <=====
Trace; c0134e58 <shrink_caches+18/90>
Trace; c0134f0c <try_to_free_pages+3c/60>
Trace; c0134fb0 <kswapd_balance_pgdat+50/a0>
Trace; c0135026 <kswapd_balance+26/40>
Trace; c013515c <kswapd+9c/b6>
Trace; c0105000 <_stext+0/0>
Trace; c0107296 <kernel_thread+26/30>
Trace; c01350c0 <kswapd+0/b6>
Code; c0133a64 <kmem_cache_reap+1c4/490>
00000000 <_EIP>:
Code; c0133a64 <kmem_cache_reap+1c4/490> <=====
0: 0f 0b ud2a <=====
Code; c0133a66 <kmem_cache_reap+1c6/490>
2: 9f lahf
Code; c0133a66 <kmem_cache_reap+1c6/490>
3: 05 a0 54 2b c0 add $0xc02b54a0,%eax
Code; c0133a6c <kmem_cache_reap+1cc/490>
8: 8b 44 24 24 mov 0x24(%esp,1),%eax
Code; c0133a70 <kmem_cache_reap+1d0/490>
c: 89 ea mov %ebp,%edx
Code; c0133a72 <kmem_cache_reap+1d2/490>
e: 8b 48 18 mov 0x18(%eax),%ecx
Code; c0133a74 <kmem_cache_reap+1d4/490>
11: b8 71 f0 00 00 mov $0xf071,%eax
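That second trace looks like the same class of failure caught from the
reclaim side: kmem_cache_reap, called while kswapd frees memory,
inspects free objects before releasing their slab pages (and eax
holding 5a2cf071 there looks like the same red-zone pattern as in the
sketch above). A sketch of a poison-style check at reclaim time, again
purely illustrative rather than the real slab.c logic:

    #include <assert.h>
    #include <stddef.h>

    #define POISON_BYTE 0x5a /* assumed fill written into freed objects */

    /* Stand-in for the check a reaper can make on a free object before
     * its page is released: any byte no longer holding the poison
     * pattern means something wrote to freed memory. */
    static void check_free_object(const unsigned char *obj, size_t size)
    {
        for (size_t i = 0; i < size; i++)
            assert(obj[i] == POISON_BYTE); /* stale write => BUG() */
    }

    int main(void)
    {
        unsigned char obj[16];
        for (size_t i = 0; i < sizeof obj; i++)
            obj[i] = POISON_BYTE; /* as the allocator left it at free time */
        check_free_object(obj, sizeof obj); /* clean object: passes */
        return 0;
    }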
Regards and thanks
Ian Hardy
Research Services
Information Systems Services
Southampton University