
Re: dump problem while debugging scsi hba driver

To: hiren_mehta@xxxxxxxxxxx
Subject: Re: dump problem while debugging scsi hba driver
From: "Matt D. Robinson" <yakker@xxxxxxxxxxxxxx>
Date: Tue, 07 Nov 2000 09:54:27 -0800
Cc: lkcd@xxxxxxxxxxx
Organization: Alacritech, Inc.
References: <FEEBE78C8360D411ACFD00D0B74779718808E1@xxxxxxxxxxxxxxxxxxxxx>
Sender: owner-lkcd@xxxxxxxxxxx
hiren_mehta@xxxxxxxxxxx wrote:
> 
> I am trying to use LKCD to debug a panic in a SCSI HBA driver (this
> driver is not the AIC7xxx driver). The dump device is on the AIC7xxx,
> as are /root, /usr, etc. If the SCSI HBA driver panics, can Linux
> dump to the dump device on the AIC7xxx?
> 
> -hiren

If the AIC7xxx driver panics, it's going to be hit or miss whether
you get a dump image at all.  The best solution is to dump through
some other disk driver (such as an IDE driver).  This makes
particular sense if you're debugging your own driver.  Let me know
more specifically what you're doing, and perhaps I can offer some
more details as to what you might be seeing.

With that said ...

Okay, I'm going to use this as an opportunity to open up a discussion
on this problem.  I'd like to hear people's feedback on what the right
direction for the future should be.  It's important that I hear
something back on this ...

Right now, as of 2.4, we call brw_kiovec() to get our pages out to
disk.  While this is great and all, it is hardly what I'd call
"acceptable" for dumping purposes.

The problem lies in a couple of areas.  First, Linus has said that,
for various reasons, he doesn't want raw I/O in the kernel.  While
kiobufs are a nice feature, they hardly come close to what I call
"raw I/O", because they don't get around the problems of buffer
head locks and device driver spinlocks.  Second, Linus has told me
that we shouldn't be going through the standard IDE driver when we
dump to disk, as he doesn't trust it (his words, not mine).

I've dealt with this problem long enough, and it is excruciatingly
annoying.  So where does this leave us, in terms of future development?

Here's what I propose, and I'd like to hear from those of you out there
who have an interest in this area.

*  I'd like to see us create a separate set of generic disk drivers
   whose sole purpose is writing raw data to disk: IDE and SCSI
   drivers initially, and then any other drivers we need after that.

*  These drivers would write raw to disk on the assumption that
   anyone using them understands they could clobber data if they
   write to a drive where buffered I/O is taking place (this should
   only happen through coder error, where someone uses both on the
   same disk partition).  The point is that they are supposed to be
   reliable; speed isn't a major consideration up front.

*  I don't want to take the path of adding "features" to the current
   set of drivers, because A) they may not be maintained properly,
   B) they will be weighed down by other opinions as to what raw I/O
   really is, and C) we can't guarantee some type of locking won't be
   thrown into the mix.

The complexities are probably:

1)  Inserting a duplicate driver stream into the kernel;
2)  Writing drivers small enough, yet complete enough, to perform basic
    raw I/O tasks (open, read, write, close) without locking;
3)  Getting this accepted as a standard part of the kernel (yes, I know
    Linus is against a kernel debugger, but this isn't a kernel debugger,
    and despite how awesome 'lcrash' is, it's a crash dump analyzer, not
    a kernel debugger) ... LKCD _needs_ to be part of the kernel.  For
    those of us who care about RAS initiatives, it isn't optional.
    And if not LKCD, then something like it.

I'd typically recommend just putting in an 'if (dumping)' mechanism to
avoid locks all the way down through the driver level, but there isn't
a real raw I/O driver to put that in, and the best solution I see is to
make one.  I've explored this, and I've written some stuff up, but I
wanted to get people's thoughts before I go running down one path only
to find that people think we should go down another.  Andre Hedrick
showed me some taskfile_wait() stuff that can do very low-level raw
I/O, but I'm not sure whether it's something we can use or not.

Can I get people's thoughts, please?  I don't ask for much. :)

--Matt
