
Re: testing lkcd

To: Brian Hall <brianw.hall@xxxxxxxxxx>
Subject: Re: testing lkcd
From: Matt Robinson <yakker@xxxxxxxxxxxxxxxxxxxx>
Date: Tue, 9 Nov 1999 12:13:53 -0800 (PST)
Cc: lkcd@xxxxxxxxxxx, torvalds@xxxxxxxxxxxxx
In-reply-to: <XFMail.991109131959.brianw.hall@xxxxxxxxxx>
Sender: owner-lkcd@xxxxxxxxxxx
On Tue, 9 Nov 1999, Brian Hall wrote:
|>Thanks for the detailed reply. But will lkcd work if there is a big crash and
|>interrupts can't be handled? The worst crashes in Linux usually end with
|>"Aieee, killing interrupt handler" !

Right now, there's not a hook in do_exit(), but there is in die().  The
real issue is making sure we put the dump_execute() in the right
location if do_exit() is called directly from a driver.  We might also
want to make crash dumping in interrupt handlers optional (configurable
from user-land).  Under normal circumstances, if die() is called, we
actually never get to the "Aieee" message -- we'll dump system memory
and exit.  There are some cases where die() isn't called directly,
but those shouldn't matter too much (like sys_reboot()).
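
To make the ordering concrete, here is a rough user-space sketch of the
idea, not the actual LKCD patch.  die_sim() and do_exit_sim() are
hypothetical stand-ins for die() and do_exit(), the dump_in_interrupt
flag is an assumed name for the user-land tunable mentioned above, and
the dump_execute() signature shown here is invented for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

/* Assumed tunable: dump even when the fault hits in interrupt context. */
static bool dump_in_interrupt = true;

/* Counts how many dumps were taken, so the behavior can be checked. */
static int dumps_taken = 0;

/* Stand-in for dump_execute(); the real signature is not given here. */
static void dump_execute(const char *reason)
{
    assert(reason != NULL);
    dumps_taken++;
    printf("dumping system memory: %s\n", reason);
}

/* Stand-in for do_exit(): no dump hook on this path yet. */
static void do_exit_sim(bool in_interrupt)
{
    if (in_interrupt)
        printf("Aieee, killing interrupt handler\n");
}

/* Stand-in for die(): dump first, before any clean-up runs. */
static void die_sim(const char *msg, bool in_interrupt)
{
    if (!in_interrupt || dump_in_interrupt) {
        dump_execute(msg);
        return;  /* a real system would stop here; "Aieee" is never printed */
    }
    do_exit_sim(in_interrupt);  /* dumping disabled for interrupt context */
}
```

Calling die_sim("oops", true) with the flag set takes a dump and never
reaches the "Aieee" message; with the flag cleared, the same call falls
through to do_exit_sim(), which is the path that still needs a hook.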

If you have suggestions, let me know.  I'd typically add it before we
lock the kernel, and not terminate anything (as far as files, timers,
etc. are concerned), so we catch as much of the screwed up system state
as possible.  Of course, doing this means that you skip a whole
lot of clean-up work.

Remember, there is a hook in die().  This catches a TON of cases.  I'm
looking for crash cases where die() isn't called.

Again, we expect that most dumps will be saved correctly.  The biggest
issue as far as we are concerned is making sure 'lcrash' can read the
stack trace and give you output as to what happened both in the interrupt
handler and the kernel stack page for the associated task.  If you have
a very good example for an interrupt handler panic, pass it on.  I'm
in the process of doing a number of things all at once, and I haven't
created one yet.

Thanks for the feedback, and let us know what else we can provide.  Also,
for everyone else on this list, we'll be releasing an LKCD development
priority list in the next day or so which will list out what our biggest
concerns for future development are, and where we are focusing our
efforts.  If you have any desire to do development, speak up; we're
more than happy to share the load.  We feel this product has tremendous
potential in the future for commercial customers and support companies
who want reliable and fast ways of getting system crash information.

Time to start asking Linus for acceptance ... :)

--Matt

P.S.  For Linus, the link is http://oss.sgi.com/projects/lkcd/ ...

|>On 09-Nov-99 Tom Morano wrote:
|>> Brian Hall wrote:
|>>> 
|>>> Will the crash dump work if the interrupt handler dies? I have a script that
|>>> will kill a 2.2.5-2.2.10 kernel via a TCP exploit, but I don't know a way to
|>>> do that with 2.2.13. Can someone tell me how to cause this? I'd like to test
|>>> this case.
|>
|>--
|>Brian Hall <brianw.hall@xxxxxxxxxx>
|>Linux Consultant

