
Re: Xfs_force_shutdown on recent XFS CVS

To: Stephen Lord <lord@xxxxxxx>
Subject: Re: Xfs_force_shutdown on recent XFS CVS
From: <I.D.Hardy@xxxxxxxxxxx>
Date: Fri, 11 Oct 2002 14:32:25 +0100 (BST)
Cc: I.D.Hardy@xxxxxxxxxxx, linux-xfs@xxxxxxxxxxx, O.G.Parchment@xxxxxxxxxxx
In-reply-to: <1034295549.1073.10.camel@laptop.americas.sgi.com>
References: <E5CC9E66DAF2D411A0D700B0D079331BA994A0@exchange2.soton.ac.uk> <1034270623.1400.162.camel@jen.americas.sgi.com> <1034289105.3da5ffd17e21c@webmail.soton.ac.uk> <1034295549.1073.10.camel@laptop.americas.sgi.com>
Sender: linux-xfs-bounce@xxxxxxxxxxx
User-agent: IMP/PHP IMAP webmail program 2.2.6
Quoting Stephen Lord <lord@xxxxxxx>:

> On Thu, 2002-10-10 at 17:31, I.D.Hardy@xxxxxxxxxxx wrote:
> > 
> > On 10 Oct 2002 12:23:43 -0500: Steve Lord Wrote:
> > 
> > >
> > > Ian, did you run xfs_check and repair before mounting the fs or
> > > after? You should mount again after reboot, then run check.
> > > The in memory corruption error means it failed an internal check
> > > on a memory buffer, not that it had found bad data on the disk.
> > > 
> > > If you have the check/repair output, please send it, but there is
> > > no real way to tell if the issues in it were from running the
> > > commands with a dirty log or not.
> > > 
> > > We really need to improve that particular trace message, there
> > > are 80 some places it could have originated. Lets see if we
> > > can do something about that.
> > > 
> > > Steve
> > > 
> > > 
> > 
> > Steve, thanks as always for your reply. Yes, the server was rebooted,
> > and the filesystem mounted and unmounted prior to running
> > xfs_check/repair. Also, an 'xfs_fsr' was run when we upgraded the
> > kernel (on an idle system); the system was then rebooted and
> > 'xfs_check' run on the filesystem (clean) before the server was put
> > back into service (and therefore prior to the 2 filesystem
> > shutdowns). I can therefore be confident that any FS corruption
> > occurred while running the new (Wed 8th) XFS CVS kernel.
> > 
> > Here's the output from the xfs_check/xfs_repair runs following the
> > first filesystem shutdown (xfs_check was clean following the 2nd
> > shutdown).
> > 
> > ..... (sorry I missed capturing the top of this xfs_check session,
> >        though I think this was near the start).
> 
> Hmm, you have ascii data on top of inodes by the look of it. So this
> looks like a rogue write. You have 64 inodes here which look like they
> were completely overwritten. Working out where this came from is key
> here. I am not aware of anything which changed recently which might
> relate to this.
> 
> Steve
> > 

Steve, it may be coincidence, but I have reverted to a slightly earlier 2.4.19 
XFS-CVS (the shutdown/xfs_repair output above was from 'SGI XFS CVS-10/08/02:05 
with quota, no debug enabled'; I reverted to 'SGI XFS CVS-09/15/02:17 with 
quota, no debug enabled') and the server seems to have returned to kernel 
panics/crashes. I have just got the following Oops dump from it:

Oct 11 13:01:21 blue01 kernel: kernel BUG at slab.c:1263!
Oct 11 13:01:21 blue01 kernel: invalid operand: 0000
Oct 11 13:01:21 blue01 kernel: CPU:    0
Oct 11 13:01:21 blue01 kernel: EIP:    0010:[<c0132c26>]    Not tainted
Using defaults from ksymoops -t elf32-i386 -a i386
Oct 11 13:01:21 blue01 kernel: EFLAGS: 00010013
Oct 11 13:01:21 blue01 kernel: eax: c8cf364b   ebx: c8cf3628   ecx: 00000020   edx: c8cf362c
Oct 11 13:01:21 blue01 kernel: esi: 00020c00   edi: 00000000   ebp: c1c0f060   esp: f7eaddf0
Oct 11 13:01:21 blue01 kernel: ds: 0018   es: 0018   ss: 0018
Oct 11 13:01:21 blue01 kernel: Process pagebuf_io_CPU0 (pid: 9, stackpage=f7ead000)
Oct 11 13:01:21 blue01 kernel: Stack: c8cf362c 00000020 00000000 00000040 c1c0f060 00000000 000000f0 c1c0f060 
Oct 11 13:01:21 blue01 kernel:        c0133192 c1c0f060 f7edf000 000000f0 00000246 c56e3cf4 00000000 0002cd1a 
Oct 11 13:01:21 blue01 kernel:        00000246 d051146c 00000000 d05114cc 00000001 c01e4f64 0000000c 000000f0 
Oct 11 13:01:21 blue01 kernel: Call Trace:    [<c0133192>] [<c01e4f64>] [<c01d973f>] [<c01e51dd>] [<c01e564c>]
Oct 11 13:01:21 blue01 kernel:   [<c01e52d9>] [<c01cce44>] [<c01cd6c0>] [<c01ced10>] [<c011f25d>] [<c01e58e5>]
Oct 11 13:01:21 blue01 kernel:   [<c0107296>] [<c01e5770>]
Oct 11 13:01:21 blue01 kernel: 
Oct 11 13:01:21 blue01 kernel: Code: 0f 0b ef 04 a0 54 2b c0 81 e6 00 04 00 00 74 37 b8 a5 c2 0f 

>>EIP; c0132c26 <kmem_cache_alloc_batch+f6/1a0>   <=====
Trace; c0133192 <kmalloc+b2/250>
Trace; c01e4f64 <_pagebuf_page_io+304/450>
Trace; c01d973e <xfs_trans_chunk_committed+1be/1f4>
Trace; c01e51dc <_page_buf_page_apply+12c/140>
Trace; c01e564c <_pagebuf_segment_apply+ac/110>
Trace; c01e52d8 <pagebuf_iorequest+e8/140>
Trace; c01cce44 <xlog_bdstrat_cb+14/40>
Trace; c01cd6c0 <xlog_sync+210/400>
Trace; c01ced10 <xlog_sync_sched+10/20>
Trace; c011f25c <__run_task_queue+5c/70>
Trace; c01e58e4 <pagebuf_iodone_daemon+174/1c0>
Trace; c0107296 <kernel_thread+26/30>
Trace; c01e5770 <pagebuf_iodone_daemon+0/1c0>
Code;  c0132c26 <kmem_cache_alloc_batch+f6/1a0>
00000000 <_EIP>:
Code;  c0132c26 <kmem_cache_alloc_batch+f6/1a0>   <=====
   0:   0f 0b                     ud2a      <=====
Code;  c0132c28 <kmem_cache_alloc_batch+f8/1a0>
   2:   ef                        out    %eax,(%dx)
Code;  c0132c28 <kmem_cache_alloc_batch+f8/1a0>
   3:   04 a0                     add    $0xa0,%al
Code;  c0132c2a <kmem_cache_alloc_batch+fa/1a0>
   5:   54                        push   %esp
Code;  c0132c2c <kmem_cache_alloc_batch+fc/1a0>
   6:   2b c0                     sub    %eax,%eax
Code;  c0132c2e <kmem_cache_alloc_batch+fe/1a0>
   8:   81 e6 00 04 00 00         and    $0x400,%esi
Code;  c0132c34 <kmem_cache_alloc_batch+104/1a0>
   e:   74 37                     je     47 <_EIP+0x47> c0132c6c <kmem_cache_alloc_batch+13c/1a0>
Code;  c0132c36 <kmem_cache_alloc_batch+106/1a0>
  10:   b8 a5 c2 0f 00            mov    $0xfc2a5,%eax

This is different from previous/recent crashes with the same kernel, which 
have been more of the form:

Sep 25 20:36:50 blue01 kernel: kernel BUG at slab.c:1439!
Sep 25 20:36:50 blue01 kernel: invalid operand: 0000
Sep 25 20:36:50 blue01 kernel: CPU:    1
Sep 25 20:36:50 blue01 kernel: EIP:    0010:[<c0133a64>]    Not tainted
Sep 25 20:36:50 blue01 kernel: EFLAGS: 00010016
Sep 25 20:36:50 blue01 kernel: eax: 5a2cf071   ebx: 00a59840   ecx: f7edec10   edx: c1c0f060
Sep 25 20:36:50 blue01 kernel: esi: f732c000   edi: f732cf5c   ebp: f732cee8   esp: f7eddf48
Sep 25 20:36:50 blue01 kernel: ds: 0018   es: 0018   ss: 0018
Sep 25 20:36:50 blue01 kernel: Process kswapd (pid: 5, stackpage=f7edd000)
Sep 25 20:36:50 blue01 kernel: Stack: 00008842 00000002 f7edec10 f7edec00 00000000 00000002 00000000 00000000 
Sep 25 20:36:50 blue01 kernel:        00000000 c1c0f060 00000020 000001d0 00000006 00000000 c0134e59 c036fa08 
Sep 25 20:36:50 blue01 kernel:        00000006 000001d0 c036fa08 00000000 c0134f0c 00000020 c036fa08 00000002 
Sep 25 20:36:50 blue01 kernel: Call Trace:    [<c0134e59>] [<c0134f0c>] [<c0134fb1>] [<c0135026>] [<c013515d>]
Sep 25 20:36:50 blue01 kernel:   [<c0105000>] [<c0107296>] [<c01350c0>]
Sep 25 20:36:50 blue01 kernel: 
Sep 25 20:36:50 blue01 kernel: Code: 0f 0b 9f 05 a0 54 2b c0 8b 44 24 24 89 ea 8b 48 18 b8 71 f0 

>>EIP; c0133a64 <kmem_cache_reap+1c4/490>   <=====
Trace; c0134e58 <shrink_caches+18/90>
Trace; c0134f0c <try_to_free_pages+3c/60>
Trace; c0134fb0 <kswapd_balance_pgdat+50/a0>
Trace; c0135026 <kswapd_balance+26/40>
Trace; c013515c <kswapd+9c/b6>
Trace; c0105000 <_stext+0/0>
Trace; c0107296 <kernel_thread+26/30>
Trace; c01350c0 <kswapd+0/b6>
Code;  c0133a64 <kmem_cache_reap+1c4/490>
00000000 <_EIP>:
Code;  c0133a64 <kmem_cache_reap+1c4/490>   <=====
   0:   0f 0b                     ud2a      <=====
Code;  c0133a66 <kmem_cache_reap+1c6/490>
   2:   9f                        lahf   
Code;  c0133a66 <kmem_cache_reap+1c6/490>
   3:   05 a0 54 2b c0            add    $0xc02b54a0,%eax
Code;  c0133a6c <kmem_cache_reap+1cc/490>
   8:   8b 44 24 24               mov    0x24(%esp,1),%eax
Code;  c0133a70 <kmem_cache_reap+1d0/490>
   c:   89 ea                     mov    %ebp,%edx
Code;  c0133a72 <kmem_cache_reap+1d2/490>
   e:   8b 48 18                  mov    0x18(%eax),%ecx
Code;  c0133a74 <kmem_cache_reap+1d4/490>
  11:   b8 71 f0 00 00            mov    $0xf071,%eax
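
As an aside, the <symbol+offset/length> annotations in the decoded traces 
above come from ksymoops looking each address up in the kernel symbol table 
(System.map). A minimal Python sketch of that lookup follows; the symbol 
table here is a toy excerpt with addresses chosen only so the two EIPs above 
decode the same way (the real System.map values are specific to each kernel 
build):

```python
import bisect

# Toy System.map excerpt: (address, symbol), sorted by address.
# These addresses are hypothetical, picked to reproduce the decodes above.
SYMBOLS = [
    (0xc0132b30, "kmem_cache_alloc_batch"),
    (0xc0132cd0, "kmem_cache_alloc"),
    (0xc01338a0, "kmem_cache_reap"),
    (0xc0133d30, "kmem_cache_shrink"),
]

def resolve(eip):
    """Resolve an EIP to 'addr <symbol+offset/length>', ksymoops-style."""
    addrs = [a for a, _ in SYMBOLS]
    # Find the last symbol whose start address is <= eip.
    i = bisect.bisect_right(addrs, eip) - 1
    if i < 0:
        return None  # address below the first known symbol
    start, name = SYMBOLS[i]
    # The symbol's length is the gap to the next symbol, when one exists.
    length = SYMBOLS[i + 1][0] - start if i + 1 < len(SYMBOLS) else 0
    return "%x <%s+%x/%x>" % (eip, name, eip - start, length)

print(resolve(0xc0132c26))  # → c0132c26 <kmem_cache_alloc_batch+f6/1a0>
print(resolve(0xc0133a64))  # → c0133a64 <kmem_cache_reap+1c4/490>
```

This is just the address-to-symbol step; ksymoops additionally disassembles 
the Code: bytes and folds in module symbols.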


Regards and thanks
 
Ian Hardy
Research Services
Information Systems Services
Southampton University



