Re: Dump driver interface

To: suparna@xxxxxxxxxx
Subject: Re: Dump driver interface
From: "Matt D. Robinson" <yakker@xxxxxxxxxxxxxx>
Date: Mon, 08 Oct 2001 23:46:42 -0700
Cc: lkcd@xxxxxxxxxxx
Organization: Alacritech, Inc.
References: <CA256ADF.002AA8AE.00@xxxxxxxxxxxxxxxxxxx> <20011008142303.A1275@xxxxxxxxxx>
Sender: owner-lkcd@xxxxxxxxxxx
Suparna Bhattacharya wrote:
> 
> Matt,
> 
> To carry on some of the discussions you'd initiated about having a block
> device dump interface.
> 
> A question that comes to mind when comparing the proposed interface with the
> AIX ddump interface, is figuring out the right place to implement
> some of the tasks that a driver needs to perform in order to implement dump
> support. Obviously we do want to keep the interface as simple and small
> as possible (to make acceptance easier), and from that angle what you've
> proposed sounds attractive. At the same time in the context of some of
> the past discussions just wanted to explore if there are any advantages
> of splitting up the interface and abstracting some common processing out
> of the driver into the standard dump code.
> 
> Looking for inputs/thoughts on this point to understand if it's worth
> making the interface more granular. I'm not sure what the right answer is.
> 
> Another question is how this interface would be extended for network
> dumps.
> 
> Anyway, here goes:
> The various steps involved in dumping from the perspective of the driver:
> 
> 1. Set aside resources/memory required for dump time (because the dump
>    code can't share any state with the normal driver path). This could happen:
>         - during driver/device initialization, or
>         - during dump device configuration
>    The latter happens only for configured dump devices, rather than all
>    active devices that support dump.
>    However, it requires an interface into the driver to be called when
>    dump device configuration is taking place.
> 
>    If we have a dedicated dump device, the device initialization could
>    happen at dump configuration time, so no separate interface would be
>    required.

Setting aside resources may be problematic, as the main developers will
balk (I believe) at taking system resources for something you may never
use.  But then again, not everyone will use LKCD in the base kernel
tree, so this may not be an issue (distributions can turn it on if they
want it in their releases).

> 2. Making the device ready for dump, just before starting dump writes.
>    If the device was in active use by the system then this step could
>    involve suspending/quiescing any existing i/o and making the h/w device
>    ready for i/o (could involve device resets in some cases for disruptive
>    dumps), and associated setup for non-interrupt based i/o.

If you call the dump block device operation, it can make sure the system
is silent.  The time needed to verify this should be small.  It's mostly
a matter of making sure that the device's request queue is shut down and
not in use, and that the driver can return a ready state.  A block
device read request may be issued to verify the hardware the first time
around.
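
To make the "once, depending on a state flag" idea concrete, here is a
rough userspace sketch.  Every name in it (dump_dev, dump_dev_prepare,
the fake_read_* helpers) is made up for illustration and is not LKCD or
kernel code; the queue_running flag merely stands in for shutting down
the request queue.

```c
#include <stdbool.h>

/* Hypothetical per-device dump state; none of these names come from LKCD. */
enum dump_dev_state { DUMP_DEV_NORMAL, DUMP_DEV_READY, DUMP_DEV_FAILED };

struct dump_dev {
    enum dump_dev_state state;
    bool queue_running;                    /* stands in for the request queue */
    int (*test_read)(struct dump_dev *);   /* returns 0 if the hardware answers */
    int prepare_calls;                     /* counts how often we really quiesced */
};

/*
 * Make the device ready for dump writes.  The real work happens only on
 * the first call; later calls return the result recorded in the state flag.
 */
int dump_dev_prepare(struct dump_dev *dev)
{
    if (dev->state == DUMP_DEV_READY)
        return 0;
    if (dev->state == DUMP_DEV_FAILED)
        return -1;

    dev->prepare_calls++;
    dev->queue_running = false;            /* stop normal request processing */

    /* Optional verification read, done only the first time around. */
    if (dev->test_read && dev->test_read(dev) != 0) {
        dev->state = DUMP_DEV_FAILED;
        return -1;
    }

    dev->state = DUMP_DEV_READY;
    return 0;
}

/* Fake verification reads, for illustration only. */
int fake_read_ok(struct dump_dev *dev)  { (void)dev; return 0; }
int fake_read_bad(struct dump_dev *dev) { (void)dev; return -1; }
```

The point is only that the dump entry point can call the prepare step
unconditionally and let the flag decide whether anything happens.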

>    Again, if this is a dedicated dump device then it is likely to be
>    ready for dump, though there could be some aspects like bus state to
>    think of in certain panic circumstances.

True.

> 3. Initiating dump writes to the device (without waiting for completion)
>    There would be a limit on how much i/o can be pushed to the driver/device
>    in one go, without waiting for i/o completion
>    Would use non-interrupt based mode of i/o

I think you can poll and sleep ... even set timeouts if you want.
Again, if you time out waiting for a write to complete, there is almost
certainly a problem with the drive/media you're targeting.  You've
already verified that you can read from the device, and that the status
of the device is acceptable.  For a write to fail after checking all
that is pretty fatal.

> 4. Checking for completion of earlier dump i/o. May have a timeout
>    up to which to try before returning.

I think timeouts are okay -- something on the order of the slowest
attached device (floppy?)
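
A bounded completion poll could look roughly like the sketch below.
It is plain userspace C with invented names (dump_poll_done, fake_io and
friends); the timeout is expressed as a maximum number of polls, and the
delay callback stands in for whatever short busy-wait the kernel side
would actually use.

```c
#include <stdbool.h>

/* Hypothetical: returns true once the outstanding write has completed. */
typedef bool (*dump_done_fn)(void *ctx);
/* Hypothetical: short wait between polls (a busy-wait in real dump code). */
typedef void (*dump_delay_fn)(void *ctx);

/*
 * Poll for completion without relying on interrupts.
 * Returns 0 on completion, -1 if max_polls attempts pass without one.
 */
int dump_poll_done(dump_done_fn done, dump_delay_fn delay, void *ctx,
                   unsigned max_polls)
{
    for (unsigned i = 0; i < max_polls; i++) {
        if (done(ctx))
            return 0;
        if (delay)
            delay(ctx);
    }
    return -1;
}

/* Fake device for illustration: reports done after `remaining` polls. */
struct fake_io {
    unsigned remaining;
    unsigned delays;
};

bool fake_done(void *ctx)
{
    struct fake_io *io = ctx;

    if (io->remaining == 0)
        return true;
    io->remaining--;
    return false;
}

void fake_delay(void *ctx)
{
    struct fake_io *io = ctx;

    io->delays++;
}
```

Sizing max_polls for the slowest device you expect keeps the caller
simple: a -1 here means the target is in trouble, not that we should
retry forever.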

> 5. Poll in a loop waiting for (4) to succeed (within certain timeout bounds),
>    and submit next batch dump writes (3) and keep repeating till dump is
>    complete.

Sure, this all sounds fine.  Batch mode, iterative, either way.

> 6. Release device for normal use (opposite of 2), once dump is complete and
>    has been saved in a safe location (in the non-disruptive case)

Right.

> 7. Release resources set aside for dumping if dump device is unconfigured
>    (or device is unconfigured on the system)

What if you're in a circular dumping loop?  Also, re-kick off the
request queue.

> With bd_op->dump() steps 3-5 happen in one shot every time the dump buffer
> is written out. Step 2 might happen (if needed/appropriate) once depending on
> a state flag maintained by the driver. Step 6, I'm not sure about - could
> be some way to do this the next time the request function is invoked for the
> device. Step 1 could happen during device initialization and 7 during device
> shutdown.
> (Matt, Let me know if I've guessed any of this wrong)

This all sounds fine.  Just about what I initially proposed, but with
some of the looping constructs more clearly defined.

> Could there be an advantage of pulling up the loop in 5 into the common dump
> code instead and have separate interfaces for 3 and 4 (could even be a simple
> rw flag to bd_op->dump()) ? Besides some commonality of code, the poll routine
> could perform some additional checks / activity (e.g touch_nmi_watchdog()),
> and with multiple dump devices, some degree of parallelism in i/o can be
> achieved depending on the kind of h/w support. In general building higher
> level functionality/infrastructure/protocol support could be a little easier.
> [BTW, the fact that we send i/o in units of the dump buffer means that some
> room already exists for some additional checks, but the level of control is
> sharper with the split approach.]
> How useful is that ?

I absolutely think 5 should be common -- that way the driver has
little to think about.  If each device driver is given the option
to dump in its own unique way (beyond location/size/time/etc),
that could lead to 5 needing to know whether it's talking to an
IDE or SCSI driver, for example.

Sounds good ...
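
As a rough illustration of the split, here is a userspace sketch in
which the common code owns the loop from step 5 and the driver supplies
only a submit (step 3) and a poll (step 4).  The dump_dev_ops structure,
dump_write_all(), and the in-memory mem_dev "device" are all
hypothetical; nothing here is an existing kernel or LKCD interface.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical split interface: the driver supplies only these two hooks. */
struct dump_dev_ops {
    /* Step 3: start a write and return without waiting; <0 on error. */
    int (*submit)(void *dev, const char *buf, size_t len);
    /* Step 4: 1 = previous write done, 0 = still pending, <0 = error. */
    int (*poll)(void *dev);
};

/* Step 5, kept in common code: submit a chunk, poll with a bound, repeat. */
int dump_write_all(const struct dump_dev_ops *ops, void *dev,
                   const char *buf, size_t len, size_t chunk,
                   unsigned max_polls)
{
    size_t off = 0;

    if (chunk == 0)
        return -1;

    while (off < len) {
        size_t n = (len - off < chunk) ? len - off : chunk;
        unsigned polls = 0;
        int rc = ops->submit(dev, buf + off, n);

        if (rc < 0)
            return rc;

        for (;;) {
            /* Common checks (e.g. touching the NMI watchdog) would go here. */
            rc = ops->poll(dev);
            if (rc < 0)
                return rc;
            if (rc > 0)
                break;
            if (++polls >= max_polls)
                return -1;             /* timed out waiting for the device */
        }
        off += n;
    }
    return 0;
}

/* Fake in-memory "device" for illustration: each write takes two polls. */
struct mem_dev {
    char data[64];
    size_t used;
    int pending;
};

int mem_submit(void *dev, const char *buf, size_t len)
{
    struct mem_dev *m = dev;

    if (m->used + len > sizeof m->data)
        return -1;
    memcpy(m->data + m->used, buf, len);
    m->used += len;
    m->pending = 2;
    return 0;
}

int mem_poll(void *dev)
{
    struct mem_dev *m = dev;

    if (m->pending > 0) {
        m->pending--;
        return 0;
    }
    return 1;
}
```

Note that dump_write_all() never needs to know what kind of device sits
behind the two hooks, which is the property we want to preserve.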

> Any other comments / observations / experiences to share ?

I think you've covered most of it.  The real key here is to keep as
little as possible in the dump functionality for each device driver.
That device operation is responsible for knowing what to do when it is
called, in terms of configuring the driver state, reading flags, and
writing out pages of data to disk.  For it to do more than that, such
as understanding higher level kernel structures, would not be a good
thing.

Great outline, Suparna.

--Matt

> Regards
> Suparna
> 
> Matt Robinson wrote:
> >
> > I would imagine something
> > like the following:
> >
> >
> > struct block_device_operations {
> >     int (*open) (struct inode *, struct file *);
> >     int (*release) (struct inode *, struct file *);
> >     int (*ioctl) (struct inode *, struct file *, unsigned, unsigned long);
> >     int (*check_media_change) (kdev_t);
> >     int (*revalidate) (kdev_t);
> >     int (*dump) (struct inode *, struct file *, const char *, size_t,
> >                  loff_t *);
> > };
> >
> > It should use the write() constructs (similar to file_operations), but do
> > any kind of polling required within the function itself.
> >
> > IRIX used to call this bddump(), which was an alias to bwrite(), with
> > DIRECT_IO set (and had real direct I/O available through the block device
> > path).  Of course, I haven't looked at the code in over two years, so
> > I'm a bit rusty.
> >
> > In any event, Suparna, I think we do the compression the same way we
> > currently do, but throw out anything related to the kiobuf path.  Then
> > we pass in the dump buffer (which we currently fill with compressed
> > data) down through dump_device->bd_op->dump().  Then dump() does a
> > poll on the disk (with a timeout) waiting for the data to write out.
> >
> > This makes life easy for disruptive as well as non-disruptive dumps.
> >
> > The way I did it before was creating the request structures within
> > vmdump.c, which was quite ugly.  But again, it was just for test.
> >
> 
>   Suparna Bhattacharya
>   IBM Software Lab, India
>   E-mail : bsuparna@xxxxxxxxxx
>   Phone : 91-80-5267117, Extn : 3961
> >
