
RE: oops umounting full LVM snapshots

To: "FORRESTER,JUSTIN (HP-Loveland,ex1)" <justin_forrester@xxxxxx>, "'Eric Sandeen'" <sandeen@xxxxxxx>, "'Steve Lord'" <lord@xxxxxxx>
Subject: RE: oops umounting full LVM snapshots
From: "FORRESTER,JUSTIN (HP-Loveland,ex1)" <justin_forrester@xxxxxx>
Date: Wed, 27 Feb 2002 17:40:26 -0500
Cc: "DICKENS,CARY (HP-Loveland,ex2)" <cary_dickens2@xxxxxx>, "'Xfs \"Mailing List (E-mail)'" <linux-xfs@xxxxxxxxxxx>, "PATTERSON,ANDREW (HP-Loveland,ex2)" <andrew_patterson@xxxxxx>
Sender: owner-linux-xfs@xxxxxxxxxxx
I've got some more info about the umount oops that might be helpful.  It
looks like what is happening in our case is the following:

- overflow an LVM snapshot
- in response, LVM *de-activates* the snapshot's logical volume, such
that subsequent write attempts cause I/O errors
- XFS attempts to write log information, gets the I/O error, and shuts the
filesystem down
- something is corrupted at this point, such that a write to a *different*
XFS filesystem results in the kernel oops.  In our case this happens when
umount writes to /etc/mtab, which is not even on an LVM volume.
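For illustration, the sequence above could be reproduced roughly like this (a
hypothetical sketch, not our exact session; volume names, sizes, and mount
points are examples, and the origin volume is assumed to be an XFS filesystem
mounted at /mnt/orig):

```shell
# Hypothetical repro sketch (LVM 1.x tools; names and sizes are examples).
# Create a deliberately small snapshot of the origin volume and mount it.
lvcreate -s -L 16M -n lvol1-snap1 /dev/vg02/lvol1
mount -t xfs /dev/vg02/lvol1-snap1 /mnt/snap

# Dirty the origin with more data than the snapshot can hold; LVM then
# de-activates the snapshot volume.
dd if=/dev/zero of=/mnt/orig/bigfile bs=1M count=64
lvscan                  # snapshot now listed as "inactive"

# Unmounting triggers XFS log I/O against the now-inactive volume, which
# fails with an I/O error and forces a filesystem shutdown.
umount /mnt/snap
```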

I've included some log information below that substantiates this theory.

Thanks,
Justin



The following lvscan was executed after the snapshot was overflowed; you can
see that the snapshot volume is inactive:

--------------------------->
tom# lvscan
lvscan -- ACTIVE            "/dev/vg01/lvol1" [96.00 MB]
lvscan -- ACTIVE   Original "/dev/vg02/lvol1" [992.00 MB]
lvscan -- inactive Snapshot "/dev/vg02/lvol1-snap1" [15.94 MB] of
/dev/vg02/lvol1
lvscan -- 3 logical volumes with 1.08 GB total in 2 volume groups
lvscan -- 2 active / 1 inactive logical volumes
<---------------------------

Then, when the umount was executed, we see syslog messages from XFS shutting
the filesystem down:

--------------------------->
Feb 27 14:49:07 localhost kernel: lvm - lvm_map: ll_rw_blk for inactive LV
/dev/vg02/lvol1-snap1
Feb 27 14:49:07 localhost kernel: lvm - lvm_map: ll_rw_blk for inactive LV
/dev/vg02/lvol1-snap1
Feb 27 14:49:07 localhost kernel: I/O error in filesystem ("lvm(58,1)")
meta-data dev 0x3a01 block 0xf809f
Feb 27 14:49:07 localhost kernel:        ("xlog_iodone") error 5 buf count
1024
Feb 27 14:49:07 localhost kernel: xfs_force_shutdown(lvm(58,1),0x2) called
from line 939 of file xfs_log.c.  Return address = 0xc01bfe36
Feb 27 14:49:07 localhost kernel: Log I/O Error Detected.  Shutting down
filesystem: lvm(58,1)
Feb 27 14:49:07 localhost kernel: Please umount the filesystem, and rectify
the problem(s)
<----------------------------
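As a side note (this is my own annotation, not part of the log): the "error 5"
reported by xlog_iodone is the kernel errno EIO, i.e. the I/O error LVM
returns for writes to the inactive snapshot volume.  This can be checked with
Python's errno module:

```python
import errno
import os

# "error 5" in the xlog_iodone message above is the kernel errno EIO:
# the I/O error returned for writes to the de-activated snapshot volume.
print(errno.errorcode[5])   # EIO
print(os.strerror(5))       # e.g. "Input/output error" on Linux
```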

Then, in my original message below, we see the XFS write to /etc/mtab causing
the oops.



> -----Original Message-----
> From: FORRESTER,JUSTIN (HP-Loveland,ex1) [mailto:justin_forrester@xxxxxx]
> Sent: Wednesday, February 27, 2002 11:29 AM
> To: 'Eric Sandeen'; 'Steve Lord'
> Cc: DICKENS,CARY (HP-Loveland,ex2); 'Xfs "Mailing List (E-mail)';
> PATTERSON,ANDREW (HP-Loveland,ex2)
> Subject: RE: oops umounting full LVM snapshots
> 
> 
> 
> 
> We were able to capture an oops yesterday (as opposed to the machine just
> locking up hard).  Here's the oops that we got while umounting a full
> snapshot volume (lvm 1.0.2, kernel 2.4.17).
> 
> Thanks,
> Justin
> 
> 
> invalid operand: 0000
> CPU:    0
> EIP: 0010:[<c0135468>]   Not tainted
> Using defaults from ksymoops -t elf32-i386 -a i386
> EFLAGS: 00010282
> eax: 00000000 ebx: 00000002 ecx: f3bee6   edx: c03ff1ac
> esi: f3bee600 edi: c2defa00 ebp: f3c94080 esp: f2ecfd48
> ds: 0018 es: 0018 ss: 0018
> Process umount (pid: 7235, stackpage=f4141000)
> Call Trace: [<c0135f5c>] [<c0135f7b>] [<c0135ebe>] [<c01da7eb>]
> [<c01dac0b>]
> [<c01d9b80>] [<c01dadc5>] [<c01daf87>] [<c01db103>] [<c01dff8c>]
> [<c01e12c3>] [<c01dff8c>] [<c01dc464>] [<c0133f57>] [<c0106d6f>]
> Code: 0f 0b 83 3a 00 75 05 89 0a 89 49 24 8b 02 89 41 20 8b 02 8b
> 
> >>EIP; c0135468 <__insert_into_lru_list+1c/5c>   <=====
> Trace; c0135f5c <__refile_buffer+54/5c>
> Trace; c0135f7b <refile_buffer+17/24>
> Trace; c0135ebe <__mark_buffer_dirty+26/2c>
> Trace; c01da7eb <hook_buffers_to_page_delay+33/48>
> Trace; c01dac0b <__pb_block_commit_write_async+47/4c>
> Trace; c01d9b80 <pagebuf_commit_write+40/8c>
> Trace; c01dadc5 <__pagebuf_do_delwri+1b5/23c>
> Trace; c01daf87 <_pagebuf_file_write+13b/1f4>
> Trace; c01db103 <pagebuf_generic_file_write+c3/2f4>
> Trace; c01dff8c <linvfs_pb_bmap+0/c4>
> Trace; c01e12c3 <xfs_write+383/640>
> Trace; c01dff8c <linvfs_pb_bmap+0/c4>
> Trace; c01dc464 <linvfs_write+2c0/304>
> Trace; c0133f57 <sys_write+8f/f0>
> Trace; c0106d6f <system_call+33/38>
> Code;  c0135468 <__insert_into_lru_list+1c/5c>
> 00000000 <_EIP>:
> Code;  c0135468 <__insert_into_lru_list+1c/5c>   <=====
>    0:   0f 0b                     ud2a      <=====
> Code;  c013546a <__insert_into_lru_list+1e/5c>
>    2:   83 3a 00                  cmpl   $0x0,(%edx)
> Code;  c013546d <__insert_into_lru_list+21/5c>
>    5:   75 05                     jne    c <_EIP+0xc> c0135474
> <__insert_into_lru_list+28/5c>
> Code;  c013546f <__insert_into_lru_list+23/5c>
>    7:   89 0a                     mov    %ecx,(%edx)
> Code;  c0135471 <__insert_into_lru_list+25/5c>
>    9:   89 49 24                  mov    %ecx,0x24(%ecx)
> Code;  c0135474 <__insert_into_lru_list+28/5c>
>    c:   8b 02                     mov    (%edx),%eax
> Code;  c0135476 <__insert_into_lru_list+2a/5c>
>    e:   89 41 20                  mov    %eax,0x20(%ecx)
> Code;  c0135479 <__insert_into_lru_list+2d/5c>
>   11:   8b 02                     mov    (%edx),%eax
> Code;  c013547b <__insert_into_lru_list+2f/5c>
>   13:   8b 00                     mov    (%eax),%eax
> 
> 
> 
> 
> > -----Original Message-----
> > From: Eric Sandeen [mailto:sandeen@xxxxxxx]
> > Sent: Tuesday, February 26, 2002 5:34 PM
> > To: Steve Lord
> > Cc: DICKENS,CARY (HP-Loveland,ex2); Xfs Mailing List (E-mail);
> > PATTERSON,ANDREW (HP-Loveland,ex2)
> > Subject: RE: oops umounting full LVM snapshots
> >
> > I'm starting to wonder, now...  I patched the kernel to increase the
> > stack by 100% and I still get the oops.  The patch also allows me to see
> > stack depth, and things look ok.
> >
> > FWIW, it's even simpler to show the problem, it's not necessary to
> > overflow the snapshot or even copy anything to them.  Just create a
> > couple snapshot volumes, mount them, and unmount them.  Unmounting the
> > first snapshot does a forced shutdown, unmounting the second one does a
> > force shutdown and then oopses.
> >
> > Just for kicks I created 2 dirty xfs filesystems and mounted them
> > ro,norecovery, and unmounted - so at least that works.
> >
> > So it looks like maybe with lvm, xfs is trying to do more log flushing
> > than it should on an ro filesystem, which generates the i/o error, which
> > shuts us down - not sure about the oops yet.  I'm sure Steve will pipe
> > up if this theory is too far out of line.  :)
> >
> > Still looking...
> >
> > -Eric
> >
> >
> > On Tue, 2002-02-26 at 14:11, Steve Lord wrote:
> >
> > > If it is stack overflow as we suspect then different drivers may push
> it
> > > over the edge in different ways. What we need to do is catch it in the
> > > act and see if there isn't something we can push off the stack.
> > --
> > Eric Sandeen      XFS for Linux     http://oss.sgi.com/projects/xfs
> > sandeen@xxxxxxx   SGI, Inc.

