RE: oops umounting full LVM snapshots
- To: "'Eric Sandeen'" <sandeen@sgi.com>, "'Steve Lord'" <lord@sgi.com>
- Subject: RE: oops umounting full LVM snapshots
- From: "FORRESTER,JUSTIN (HP-Loveland,ex1)" <justin_forrester@hp.com>
- Date: Wed, 27 Feb 2002 10:28:35 -0800
- Cc: "DICKENS,CARY (HP-Loveland,ex2)" <cary_dickens2@hp.com>, "'Xfs \"Mailing List (E-mail)'" <linux-xfs@oss.sgi.com>, "PATTERSON,ANDREW (HP-Loveland,ex2)" <andrew_patterson@hp.com>
- Sender: owner-linux-xfs@oss.sgi.com
We were able to capture an oops yesterday (as opposed to the machine just
locking up hard). Here's the oops that we got while umounting a full
snapshot volume (lvm 1.0.2, kernel 2.4.17).
Thanks,
Justin
invalid operand: 0000
CPU: 0
EIP: 0010:[<c0135468>] Not tainted
Using defaults from ksymoops -t elf32-i386 -a i386
EFLAGS: 00010282
eax: 00000000 ebx: 00000002 ecx: f3bee6 edx: c03ff1ac
esi: f3bee600 edi: c2defa00 ebp: f3c94080 esp: f2ecfd48
ds: 0018 es: 0018 ss: 0018
Process umount (pid: 7235, stackpage=f4141000)
Call Trace: [<c0135f5c>] [<c0135f7b>] [<c0135ebe>] [<c01da7eb>] [<c01dac0b>]
[<c01d9b80>] [<c01dadc5>] [<c01daf87>] [<c01db103>] [<c01dff8c>]
[<c01e12c3>] [<c01dff8c>] [<c01dc464>] [<c0133f57>] [<c0106d6f>]
Code: 0f 0b 83 3a 00 75 05 89 0a 89 49 24 8b 02 89 41 20 8b 02 8b
>>EIP; c0135468 <__insert_into_lru_list+1c/5c> <=====
Trace; c0135f5c <__refile_buffer+54/5c>
Trace; c0135f7b <refile_buffer+17/24>
Trace; c0135ebe <__mark_buffer_dirty+26/2c>
Trace; c01da7eb <hook_buffers_to_page_delay+33/48>
Trace; c01dac0b <__pb_block_commit_write_async+47/4c>
Trace; c01d9b80 <pagebuf_commit_write+40/8c>
Trace; c01dadc5 <__pagebuf_do_delwri+1b5/23c>
Trace; c01daf87 <_pagebuf_file_write+13b/1f4>
Trace; c01db103 <pagebuf_generic_file_write+c3/2f4>
Trace; c01dff8c <linvfs_pb_bmap+0/c4>
Trace; c01e12c3 <xfs_write+383/640>
Trace; c01dff8c <linvfs_pb_bmap+0/c4>
Trace; c01dc464 <linvfs_write+2c0/304>
Trace; c0133f57 <sys_write+8f/f0>
Trace; c0106d6f <system_call+33/38>
Code; c0135468 <__insert_into_lru_list+1c/5c>
00000000 <_EIP>:
Code; c0135468 <__insert_into_lru_list+1c/5c> <=====
0: 0f 0b ud2a <=====
Code; c013546a <__insert_into_lru_list+1e/5c>
2: 83 3a 00 cmpl $0x0,(%edx)
Code; c013546d <__insert_into_lru_list+21/5c>
5: 75 05 jne c <_EIP+0xc> c0135474 <__insert_into_lru_list+28/5c>
Code; c013546f <__insert_into_lru_list+23/5c>
7: 89 0a mov %ecx,(%edx)
Code; c0135471 <__insert_into_lru_list+25/5c>
9: 89 49 24 mov %ecx,0x24(%ecx)
Code; c0135474 <__insert_into_lru_list+28/5c>
c: 8b 02 mov (%edx),%eax
Code; c0135476 <__insert_into_lru_list+2a/5c>
e: 89 41 20 mov %eax,0x20(%ecx)
Code; c0135479 <__insert_into_lru_list+2d/5c>
11: 8b 02 mov (%edx),%eax
Code; c013547b <__insert_into_lru_list+2f/5c>
13: 8b 00 mov (%eax),%eax
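For anyone comparing this trace against other oopses, here is a minimal sketch of pulling the decoded frames out of ksymoops output. It assumes the `Trace;` / `>>EIP;` line layout shown above; the names `Frame`, `TRACE_RE` and `parse_frames` are made up for illustration and are not part of ksymoops.

```python
import re
from typing import List, NamedTuple


class Frame(NamedTuple):
    address: int   # address reported for this frame
    symbol: str    # function the address falls in
    offset: int    # offset of the address inside the function
    size: int      # total size of the function


# Matches lines such as "Trace; c0135f5c <__refile_buffer+54/5c>"
# and ">>EIP; c0135468 <__insert_into_lru_list+1c/5c> <=====".
TRACE_RE = re.compile(
    r"^(?:Trace|>>EIP);\s+([0-9a-f]+)\s+<([^+>]+)\+([0-9a-f]+)/([0-9a-f]+)>"
)


def parse_frames(text: str) -> List[Frame]:
    """Return the decoded frames in the order they appear in the text."""
    frames = []
    for line in text.splitlines():
        match = TRACE_RE.match(line.strip())
        if match:
            addr, symbol, offset, size = match.groups()
            frames.append(
                Frame(int(addr, 16), symbol, int(offset, 16), int(size, 16))
            )
    return frames


# A few lines copied from the oops above.
SAMPLE = """\
>>EIP; c0135468 <__insert_into_lru_list+1c/5c> <=====
Trace; c0135f5c <__refile_buffer+54/5c>
Trace; c0135f7b <refile_buffer+17/24>
"""

if __name__ == "__main__":
    for frame in parse_frames(SAMPLE):
        start = frame.address - frame.offset
        print(f"{frame.symbol}: starts at {start:08x}, +0x{frame.offset:x}")
```

Subtracting the offset from the address gives the start of each function, which makes it easier to line frames up against a System.map from a different build.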
> -----Original Message-----
> From: Eric Sandeen [mailto:sandeen@sgi.com]
> Sent: Tuesday, February 26, 2002 5:34 PM
> To: Steve Lord
> Cc: DICKENS,CARY (HP-Loveland,ex2); Xfs Mailing List (E-mail);
> PATTERSON,ANDREW (HP-Loveland,ex2)
> Subject: RE: oops umounting full LVM snapshots
>
> I'm starting to wonder, now... I patched the kernel to increase the
> stack by 100% and I still get the oops. The patch also allows me to see
> stack depth, and things look ok.
>
> FWIW, the problem is even simpler to reproduce: it's not necessary to
> overflow the snapshots or even copy anything to them. Just create a
> couple of snapshot volumes, mount them, and unmount them. Unmounting the
> first snapshot does a forced shutdown; unmounting the second one does a
> forced shutdown and then oopses.
>
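The sequence Eric describes could be sketched roughly as below. This only builds and prints the command lines, it does not run them. The snapshot names, the 100M size, the `/mnt/...` mount points and the `ro` mount option are all assumptions for illustration; the original message does not say which were used.

```python
from typing import List


def snapshot_cycle_commands(
    origin: str,
    vg: str,
    count: int = 2,
    size: str = "100M",       # assumed snapshot size
    mount_opts: str = "ro",   # assumed mount options
) -> List[List[str]]:
    """Build the create / mount / unmount sequence for `count` snapshots."""
    names = [f"snap{i}" for i in range(1, count + 1)]  # hypothetical names
    commands: List[List[str]] = []

    # Create the snapshot volumes of the origin logical volume.
    for name in names:
        commands.append(["lvcreate", "-s", "-L", size, "-n", name, origin])

    # Mount each snapshot on its own (assumed) mount point.
    for name in names:
        commands.append(["mkdir", "-p", f"/mnt/{name}"])
        commands.append(
            ["mount", "-t", "xfs", "-o", mount_opts,
             f"/dev/{vg}/{name}", f"/mnt/{name}"]
        )

    # Unmount in the same order; the report is that the second umount oopses.
    for name in names:
        commands.append(["umount", f"/mnt/{name}"])

    return commands


if __name__ == "__main__":
    for cmd in snapshot_cycle_commands("/dev/vg0/data", "vg0"):
        print(" ".join(cmd))
```

Printing rather than executing is deliberate: the real sequence needs root and, per the report, takes the machine down.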
> Just for kicks I created 2 dirty xfs filesystems and mounted them
> ro,norecovery, and unmounted - so at least that works.
>
> So it looks like maybe with lvm, xfs is trying to do more log flushing
> than it should on an ro filesystem, which generates the i/o error, which
> shuts us down - not sure about the oops yet. I'm sure Steve will pipe
> up if this theory is too far out of line. :)
>
> Still looking...
>
> -Eric
>
>
> On Tue, 2002-02-26 at 14:11, Steve Lord wrote:
>
> > If it is stack overflow as we suspect then different drivers may push it
> > over the edge in different ways. What we need to do is catch it in the
> > act and see if there isn't something we can push off the stack.
> --
> Eric Sandeen XFS for Linux http://oss.sgi.com/projects/xfs
> sandeen@sgi.com SGI, Inc.