
[Fwd: Undeliverable: Re: Cant seem to create crash dumps .... help]

To: lkcd@xxxxxxxxxxx
Subject: [Fwd: Undeliverable: Re: Cant seem to create crash dumps .... help]
From: Larry Cohen <Larry.Cohen@xxxxxxxxxxxx>
Date: Fri, 02 Mar 2001 12:25:29 -0500
Sender: owner-lkcd@xxxxxxxxxxx
I tried to get this to Matt but the message was bounced.
-Larry Cohen
--- Begin Message ---
To: Larry Cohen <Larry.Cohen@xxxxxxxxxxxx>
Subject: Undeliverable: Re: Cant seem to create crash dumps .... help
From: System Administrator <postmaster@xxxxxxxxxxxx>
Date: Fri, 2 Mar 2001 12:29:57 -0500
Your message

  To:      Matt D. Robinson
  Subject: Re: Cant seem to create crash dumps .... help
  Sent:    Fri, 2 Mar 2001 12:18:45 -0500

did not reach the following recipient(s):

Matt D. Robinson on Fri, 2 Mar 2001 12:29:57 -0500
    The recipient name is not recognized
        The MTS-ID of the original message is: c=us;a= ;p=storigen systems;l=XCHANGESERVER0103021729F6F4RC8G
    MSEXCH:IMS:Storigen Systems Inc.:Lowell:XCHANGESERVER 3550 (000B09AA) 550 5.1.1 <matt@xxxxxxxxxxxxxx>... User unknown


--- Begin Message ---
To: "Matt D. Robinson" <matt@xxxxxxxxxxxxxx>
Subject: Re: Cant seem to create crash dumps .... help
From: Larry Cohen <Larry.Cohen@xxxxxxxxxxxx>
Date: Fri, 2 Mar 2001 12:18:45 -0500
Hi Matt,

Sorry I did not get back sooner.   Sadly the reason is I did not notice
this message ... sigh.
Anyhow it looks like "unused_list_lock" was 1   ... I remember diving
into that and recalling that it was not locked.

-Larry


"Matt D. Robinson" wrote:

> Larry Cohen wrote:
> >
> > Hi Matt,
> > It seems gdb works in some capacity.  I was able to break it just
> > before the hang point and got the following trace:
> >
> > (gdb) bt
> > #0  wait_kio (rw=1, nr=16, bh=0xcd797a38, size=4096) at buffer.c:2015
> > #1  0xc013d79b in brw_kiovec (rw=1, nr=1, iovec=0xc03831c8, dev=774,
> >     b=0xcd797c74, size=4096) at buffer.c:2147
> > #2  0xc019f836 in dump_kernel_write () at vmdump.c:558
> > #3  0xc019fb6d in dump_write_header () at vmdump.c:728
> > #4  0xc019fc3f in dump_execute_memdump (panic_str=0xc035d820 "testing crash",
> >     regs=0x0) at vmdump.c:780
> > #5  0xc019feb7 in dump_execute (panic_str=0xc035d820 "testing crash", regs=0x0)
> >     at vmdump.c:944
> > #6  0xc011a661 in panic (fmt=0xc02e3aa8 "testing crash") at panic.c:77
> > #7  0xc0236e54 in my_function(start=0xcd797f14) at  my_function.c:74
> > #8  0xd0909070 in ?? ()
> > #9  0xc011bad6 in sys_init_module (name_user=0x8058580 "my_function_module",
> >     mod_user=0x8060920) at module.c:544
> > #10 0xc01092ff in system_call () at af_packet.c:1876
> > #11 0x804ab0e in ?? () at af_packet.c:1876
> > #12 0x804b1a7 in ?? () at af_packet.c:1876
> > #13 0x804b3f2 in ?? () at af_packet.c:1876
> > #14 0x400349cb in ?? () at af_packet.c:1876
>
> Yea, looks like someone's holding one of the I/O locks in the
> kiobuf path.  Do you know what the value of unused_list_lock is?
> It's very likely that there's nothing LKCD can do (in this
> version of the driver) to resolve the problem, since it's the
> kiobuf code that's locked up, not LKCD. :(
>
> --Matt
>
> > "Matt D. Robinson" wrote:
> >
> > > Larry Cohen wrote:
> > > >
> > > > >
> > > >
> > > > Hi Matt,
> > > >
> > > > Thanks much for responding so quickly.
> > > > I did check below and I believe everything is correct.
> > > >
> > > > /proc/sys/dumpdev contains /dev/vmdump.
> > > > /dev/vmdump is a link to /dev/hda6, which is the swap device.
> > > > ls -l  /dev/hda6 yields:
> > > > brw-rw----    1 root     disk       3,   6 May  5  1998 /dev/hda6
> > > > swapon -s yields:
> > > > [root@ipa vmdump]# swapon -s
> > > > Filename                        Type            Size    Used    Priority
> > > > /dev/hda6                       partition       2097136 0       -1
> > > >
> > > > /proc/sys/kernel/panic is 5.
> > > >
> > > > rc.sysinit has been updated.
> > > >
> > > > I am running Linux 2.4.0 with the 3.1.1 patches, so I presume I do
> > > > not need the raw I/O patch.
> > > > I asked the fellow who put the system together to mail me the
> > > > hardware specs and I'll send those along as soon as I get them.
> > > > It is an Intel machine.
> > > >
> > > > The output I see is:
> > > >
> > > > (gdb) c
> > > > Continuing.
> > > > CPU:    1
> > > > EIP:    0010:[<c02207ee>]
> > > > EFLAGS: 00010246
> > > > eax: 00000000   ebx: c7c36000   ecx: 00000010   edx: 00000000
> > > > esi: c89c4000   edi: 0000009c   ebp: c7c97ef0   esp: c7c979a8
> > > > ds: 0018   es: 0018   ss: 0018
> > > > Process mount.sfs (pid: 1062, stackpage=c7c95000)
> > > > Stack: c89e2a00 00000008 c0302adc c037d680 c037d640 50193d03 c037d680 4088c800
> > > >        c7c97ae0 c01da8d8 c7c97a14 c01da935 c037d680 cff26ae0 00000001 c037d640
> > > >        c037d680 c1473e60 c0300980 20e466e0 0000003c 00000005 0000009c c037d640
> > > > Call Trace: [<c01da8d8>] [<c01da935>] [<c0139f14>] [<c014d88d>] [<c014daf5>] [<c013d5cc>] [<c013d735>]
> > > >        [<c013d9ff>] [<c013e583>] [<c013e394>] [<c013e737>] [<c0109183>]
> > > >
> > > > Code: f3 ab 8b 85 28 fb ff ff 83 b8 14 10 00 00 00 75 11 68 40 2e
> > > > Dumping to device 0x306 [ide0(3,6)] ...
> > > > Writing dump header ...
> > > >
> > > > .... the system is wedged at this point and I cannot break in with
> > > > gdb.  The hang appears to happen waiting in wait_on_buffer().  A few
> > > > blocks appear to be written, but then everything locks up.
> > >
> > > Looks like your configuration is correct.  My presumption here is
> > > that you're dying somewhere in the page cache code -- otherwise, you
> > > wouldn't be hung up waiting to write out to disk.  This means that
> > > someone's holding the io_request_lock (or one of the other locks in
> > > the I/O space) which is preventing us from being able to write out
> > > the dump pages.
> > >
> > > Since a dump may not be possible, what you should do is take the
> > > built kernel, and run 'lcrash' on the live system, and disassemble
> > > the instructions based on the "Call Trace" above, which should give
> > > you a good idea as to what the stack trace at the time of the panic
> > > is.  So if you 'dis c01da8d8', you should see the function name where
> > > the problem occurred.  Do the same for the rest of the addresses (or
> > > just use 'ksymoops' in the meantime).
> > >
> > > If this isn't a page cache/buffer/request queue lock scenario, then
> > > we shouldn't have hung up.
> > >
> > > Oh, one other thing ... gdb probably won't work after you call down
> > > into our function, because die() has been called, and you're beyond
> > > the kgdb breakpoint area (and smp_send_stop() should have stopped all
> > > CPUs except the one we are executing on).
> > >
> > > Let's see what the disassembled functions show ... let me know as
> > > soon as you can.  Thanks. :)
> > >
> > > --Matt
> > >
> > > > Thanks very much for your help,
> > > > -Larry
> > > >
> > > > > Larry Cohen wrote:
> > > > > >
> > > > > > Hi,  I'm pretty sure I have installed the lkcd patch and
> > > > > > utilities ok, but when the system panics it will hang when
> > > > > > trying to write out the dump to the swap device.
> > > > > > My disks are IDE, not SCSI; is this a problem?
> > > > > > Any other thoughts?
> > > > > >
> > > > > > Thanks,
> > > > > > Larry Cohen
> > > > >
> > > > > Do you have the output from the console?  It sounds like you
> > > > > have everything configured correctly.  If you run
> > > > > "/sbin/vmdump config", are the right values showing up in
> > > > > /proc/sys/vmdump?  IDE and SCSI disks should work with the 3.1.1
> > > > > patches.
> > > > >
> > > > > Typically there are problems when the right devices aren't
> > > > > configured in /proc/sys/vmdump (which are updated when
> > > > > "/sbin/vmdump config" is run), or /proc/sys/kernel/panic is zero,
> > > > > which means the system won't reset after taking a dump.
> > > > >
> > > > > Note that you have to modify your /etc/rc.d/rc.sysinit (or like
> > > > > script depending on the Linux variant you are running) to add the
> > > > > /sbin/vmdump calls.  Check out the README.
> > > > >
> > > > > If you've done all this and it still fails, send me the console
> > > > > output, along with your hardware specifications, and we'll see
> > > > > what we can do about getting to the bottom of your problem.
> > > > > Thanks, Larry.
> > > > >
> > > > > --Matt

--- End Message ---
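
The configuration checks discussed in the thread (the dump target is an
active swap device, and /proc/sys/kernel/panic is non-zero) can be sketched
as a small script. This is only a minimal sketch and not part of LKCD: the
function names active_swap_devices and check_dump_config are hypothetical,
and the sample input is the "swapon -s" output quoted in the thread.

```python
def active_swap_devices(swapon_output):
    """Return the device paths listed in `swapon -s` style output."""
    devices = []
    for line in swapon_output.splitlines():
        fields = line.split()
        # Skip blank lines and the "Filename Type Size Used Priority" header.
        if not fields or fields[0] == "Filename":
            continue
        devices.append(fields[0])
    return devices


def check_dump_config(dump_target, swapon_output, panic_timeout):
    """Return a list of problems; an empty list means both checks passed."""
    problems = []
    if dump_target not in active_swap_devices(swapon_output):
        problems.append(f"{dump_target} is not an active swap device")
    if panic_timeout == 0:
        # Per the thread: with a zero timeout the system won't reset
        # after taking a dump.
        problems.append("/proc/sys/kernel/panic is 0")
    return problems


# Sample values taken from the thread: /dev/vmdump links to /dev/hda6,
# and /proc/sys/kernel/panic is 5.
SWAPON_OUTPUT = """\
Filename                        Type            Size    Used    Priority
/dev/hda6                       partition       2097136 0       -1
"""

print(check_dump_config("/dev/hda6", SWAPON_OUTPUT, panic_timeout=5))
```

With the values reported in the thread, both checks pass, which matches the
conclusion that the configuration is correct and the hang lies elsewhere.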

--- End Message ---
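
The thread shows the dump device in two forms: the gdb trace has dev=774,
while the console prints "Dumping to device 0x306 [ide0(3,6)]". Assuming the
16-bit device number layout of 2.4-era kernels (major number in the high
byte, minor number in the low byte), the two can be reconciled as sketched
below. The helper name split_kdev is hypothetical.

```python
def split_kdev(dev):
    """Split a 16-bit device number into (major, minor).

    Assumes the 2.4-era layout: major in bits 8-15, minor in bits 0-7.
    """
    return (dev >> 8) & 0xFF, dev & 0xFF


print(split_kdev(0x306))  # (3, 6)
print(split_kdev(774))    # (3, 6), since 774 == 3 * 256 + 6
```

Both values decode to major 3, minor 6, which agrees with the "3,   6" shown
by ls -l for /dev/hda6.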