Matt,
This carries on some of the discussions you'd initiated about having a block
device dump interface.
A question that comes to mind when comparing the proposed interface with the
AIX ddump interface is figuring out the right place to implement some of the
tasks that a driver needs to perform in order to support dumping.
Obviously we do want to keep the interface as simple and small
as possible (to make acceptance easier), and from that angle what you've
proposed sounds attractive. At the same time, in the context of some of the
past discussions, I wanted to explore whether there are any advantages to
splitting up the interface and abstracting some common processing out of the
driver into the standard dump code.
I'm looking for inputs/thoughts on this point to understand if it's worth
making the interface more granular. I'm not sure what the right answer is.
Another question is how this interface would be extended for network
dumps.
Anyway, here goes:
The various steps involved in dumping from the perspective of the driver:
1. Set aside resources/memory required for dump time (because the dump
code can't share any state with the normal driver path). This could happen:
- during driver/device initialization, or
- during dump device configuration
The latter happens only for configured dump devices, rather than all
active devices that support dump.
However, it requires an interface into the driver to be called when
dump device configuration is taking place.
If we have a dedicated dump device, the device initialization could
happen at dump configuration time, so no separate interface would be
required.
2. Making the device ready for dump, just before starting dump writes.
If the device was in active use by the system then this step could
involve suspending/quiescing any existing i/o and making the h/w device
ready for i/o (could involve device resets in some cases for disruptive
dumps), and associated setup for non-interrupt based i/o.
Again, if this is a dedicated dump device then it is likely to be
ready for dump, though there could be some aspects like bus state to
think of in certain panic circumstances.
3. Initiating dump writes to the device (without waiting for completion).
There would be a limit on how much i/o can be pushed to the driver/device in
one go without waiting for i/o completion. This would use a non-interrupt
based mode of i/o.
4. Checking for completion of earlier dump i/o. May have a timeout
up to which to try before returning.
5. Poll in a loop waiting for (4) to succeed (within certain timeout bounds),
and submit the next batch of dump writes (3), repeating until the dump is
complete.
6. Release the device for normal use (the opposite of 2), once the dump is
complete and has been saved in a safe location (in the non-disruptive case).
7. Release the resources set aside for dumping if the dump device is
unconfigured (or the device is unconfigured on the system).
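To make the granularity concrete, the seven steps above could map onto
per-step driver entry points, roughly like the sketch below. All names and
signatures here are invented for illustration (this is not a concrete
proposal), and the stubs just record the order in which common code would
drive the hooks:

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical per-step driver entry points for steps 1-7 above;
 * names/signatures are invented to illustrate the granularity only. */
struct dump_driver_ops {
	int  (*configure)(void);    /* 1: set aside memory/resources */
	int  (*quiesce)(void);      /* 2: suspend i/o, enter polled mode */
	int  (*write)(const char *buf, size_t count, long long pos);
	                            /* 3: initiate writes, don't wait */
	int  (*poll)(void);         /* 4: check completion of prior i/o */
	                            /* 5: loop over 3+4 in common code */
	int  (*resume)(void);       /* 6: release device for normal use */
	void (*unconfigure)(void);  /* 7: free dump-time resources */
};

/* Trivial stubs recording the call sequence the common dump code
 * would be expected to drive. */
static char seq[8];
static int n;
static int  s_configure(void)   { seq[n++] = 'c'; return 0; }
static int  s_quiesce(void)     { seq[n++] = 'q'; return 0; }
static int  s_write(const char *b, size_t c, long long p)
{
	(void)b; (void)c; (void)p;
	seq[n++] = 'w';
	return 0;
}
static int  s_poll(void)        { seq[n++] = 'p'; return 1; }
static int  s_resume(void)      { seq[n++] = 'r'; return 0; }
static void s_unconfigure(void) { seq[n++] = 'u'; }

static const struct dump_driver_ops ops = {
	s_configure, s_quiesce, s_write, s_poll, s_resume, s_unconfigure
};
```

Not every driver would need all of these (a dedicated dump device might fold
1 into configuration time and skip 2/6 almost entirely, per the notes above).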
With bd_op->dump(), steps 3-5 happen in one shot every time the dump buffer
is written out. Step 2 might happen once (if needed/appropriate), depending on
a state flag maintained by the driver. Step 6 I'm not sure about; there could
be some way to do this the next time the request function is invoked for the
device. Step 1 could happen during device initialization, and 7 during device
shutdown.
(Matt, let me know if I've guessed any of this wrong.)
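For reference, here is a toy reading of that one-shot dump() behaviour,
simulated in userspace with a memory area standing in for the disk. The
quiesce-once flag, names and return conventions are all my guesses, not
anything from the actual proposal:

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define DISK_SZ (1 << 16)

/* Simulated "device": flat memory for the disk, plus the driver's
 * state flag recording that step 2 has been done once. */
static char disk[DISK_SZ];
static int quiesced;

/* One-shot dump(): quiesce on first entry, then submit the buffer and
 * poll for completion before returning -- steps 2-5 folded together. */
static int dump(const char *buf, size_t count, long long *ppos)
{
	if (!quiesced) {
		/* step 2: suspend normal i/o, reset h/w if needed,
		 * switch to non-interrupt (polled) operation */
		quiesced = 1;
	}
	if (*ppos + (long long)count > DISK_SZ)
		return -1;	/* simulated out-of-space error */
	/* step 3: initiate the write (here just a memory copy) */
	memcpy(disk + *ppos, buf, count);
	/* steps 4-5: real hardware would poll a status register with a
	 * timeout here; the simulated device completes instantly */
	*ppos += count;
	return (int)count;
}
```

The write()-like signature (buffer, count, position pointer) follows the
file_operations analogy from the quoted mail below.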
Could there be an advantage to pulling the loop in 5 up into the common dump
code instead, and having separate interfaces for 3 and 4 (it could even be a
simple rw flag to bd_op->dump())? Besides some commonality of code, the poll
routine could perform some additional checks/activity (e.g.
touch_nmi_watchdog()), and with multiple dump devices some degree of
parallelism in i/o could be achieved, depending on the kind of h/w support.
In general, building higher level functionality/infrastructure/protocol
support could be a little easier.
[BTW, the fact that we send i/o in units of the dump buffer means that some
room already exists for additional checks, but the level of control is
sharper with the split approach.]
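A toy simulation of the "simple rw flag" variant, with the batching loop
living in common code; the flag names, the pretend 3-poll completion delay
and the memory-backed device are all invented for illustration:

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical flag selecting submit vs. poll in one dump() entry. */
#define DUMP_SUBMIT 0
#define DUMP_POLL   1

/* Simulated device: memory for the disk, and a counter pretending
 * that each write takes a few polls to complete. */
static char disk2[1 << 16];
static int pending;

static int dev_dump(const char *buf, size_t count, long long pos, int op)
{
	if (op == DUMP_POLL) {
		if (pending > 0)
			pending--;
		return pending == 0;	/* 1 = prior i/o complete */
	}
	if (pos + (long long)count > (long long)sizeof(disk2))
		return -1;
	memcpy(disk2 + pos, buf, count);
	pending = 3;	/* pretend completion takes 3 polls */
	return 0;
}

/* Common dump loop (step 5 pulled out of the driver): submit one
 * buffer-sized chunk at a time and poll until it completes. A real
 * loop would bound the polling with a timeout and could call
 * touch_nmi_watchdog() between iterations. */
static int common_dump(const char *data, size_t len, size_t bufsz)
{
	long long pos = 0;

	while (len > 0) {
		size_t chunk = len < bufsz ? len : bufsz;

		if (dev_dump(data, chunk, pos, DUMP_SUBMIT) != 0)
			return -1;
		while (dev_dump(NULL, 0, pos, DUMP_POLL) != 1)
			;	/* timeout check would go here */
		data += chunk;
		pos += chunk;
		len -= chunk;
	}
	return 0;
}
```

With multiple dump devices, this common loop is also the natural place to
interleave submissions across devices, which is the parallelism point above.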
How useful is that ?
Any other comments / observations / experiences to share ?
Regards
Suparna
Matt Robinson wrote:
>
> I would imagine something
> like the following:
>
>
> struct block_device_operations {
> int (*open) (struct inode *, struct file *);
> int (*release) (struct inode *, struct file *);
> int (*ioctl) (struct inode *, struct file *, unsigned, unsigned long);
> int (*check_media_change) (kdev_t);
> int (*revalidate) (kdev_t);
>         int (*dump) (struct inode *, struct file *, const char *, size_t,
>                      loff_t *);
> };
>
> It should use the write() constructs (similar to file_operations), but do
> any kind of polling required within the function itself.
>
> IRIX used to call this bddump(), which was an alias to bwrite(), with
> DIRECT_IO set (and had real direct I/O available through the block device
> path). Of course, I haven't looked at the code in over two years, so
> I'm a bit rusty.
>
> In any event, Suparna, I think we do the compression the same way we
> currently do, but throw out anything related to the kiobuf path. Then
> we pass in the dump buffer (which we currently fill with compressed
> data) down through dump_device->bd_op->dump(). Then dump() does a
> poll on the disk (with a timeout) waiting for the data to write out.
>
> This makes life easy for disruptive as well as non-disruptive dumps.
>
> The way I did it before was creating the request structures within
> vmdump.c, which was quite ugly. But again, it was just for test.
>
Suparna Bhattacharya
IBM Software Lab, India
E-mail : bsuparna@xxxxxxxxxx
Phone : 91-80-5267117, Extn : 3961