Suparna wrote:
The following applies to panic dump type situations only (not
non-disruptive dump, that is):
- Integration with Mission Critical's 2 kernel approach for standalone
dumping situations - for cases where the driver doesn't have a dump
interface or if it is detected that the interface can't be used for some
reason (based on some verification scheme) and the normal i/o path can't /
shouldn't be used.
I gathered some insight from some of the folks who were familiar with a
previous similar kernel dumping effort and this was the route that they
had chosen for the implementation, very similar to what MCL is doing.
The reasons had more to do with accuracy, dump reliability, and a cleaner
overall implementation; prior to this they were doing panic side dumping
and had investigated several alternatives to make it more reliable (we've
considered all of these here). But it is disruptive.
So, an MCL approach for the disruptive case, which is more accurate, together
with non-disruptive dumping working within the kernel makes sense, especially
since we should have more confidence that the system facilities are ok for a
non-disruptive dump. And configurable levels of intrusiveness are the right
way to go.
Regards,
Dave Howell
-----Original Message-----
From: bsuparna@xxxxxxxxxx [mailto:bsuparna@xxxxxxxxxx]
Sent: Wednesday, September 19, 2001 11:55 AM
To: Schaal, Richard
Cc: 'lkcd@xxxxxxxxxxx'
Subject: Re: Non Disruptive Dumps - Question
Valid observations !
As Richard Moore mentions, the direction we are looking at would be to
eventually have the degree of system quiescing configurable, because of the
tradeoffs inherent in the various choices that we make.
If we try to freeze everything, we do affect normal system operation (at
least timing sensitive operations) as you've observed and that may be an
issue depending on the environment. If we don't freeze things, then we have
a drift in the dump image. The more active the system is, the greater could
be the drift. If we freeze things exactly as is at the instant of dumping
we get an accurate snapshot, but this snapshot may contain data structures
that are in an inconsistent state which could make interpretation a little
difficult. In some situations an experienced debugger may even utilize the
drift to an advantage to interpret some things about the state of the
system. Thus, depending on the kind of problem determination that is
required in a given situation, one may prefer to choose a certain level of
quiescing.
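To make the drift tradeoff concrete, here is a toy sketch (not LKCD code; the
function names and the "one write lands per page copied" model are assumptions
made purely for illustration). It copies a small array of "pages" either with
the system frozen or with writes continuing, and counts how many dumped pages
differ from the state at the instant the dump was triggered.

```python
# Toy model of dump drift: not LKCD code, all names are hypothetical.

def take_dump(memory, activity, frozen):
    """Copy `memory` page by page; unless frozen, let one pending
    write land after each page is copied."""
    pending = list(activity)          # (page_index, new_value) writes
    dump = []
    for page in range(len(memory)):
        dump.append(memory[page])     # copy the page as it looks right now
        if not frozen and pending:
            idx, value = pending.pop(0)
            memory[idx] = value       # system keeps running during the dump
    return dump


def drift(dump, snapshot):
    """Number of dumped pages that differ from the trigger-time state."""
    return sum(1 for d, s in zip(dump, snapshot) if d != s)


initial = [0] * 8
# Writes that hit pages 7, 6, 5, 4 while the dump walks pages 0..7.
writes = [(7, 1), (6, 1), (5, 1), (4, 1)]

frozen_drift = drift(take_dump(list(initial), writes, frozen=True), initial)
live_drift = drift(take_dump(list(initial), writes, frozen=False), initial)
print(frozen_drift, live_drift)
```

In this model the frozen dump has no drift, while the live dump picks up every
write that lands on a page before the dumper reaches it.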
If we wish to utilize the existing code paths in the system for dumping,
and allow various kinds of dump devices / drivers, then there is also the
consideration of the level of system capabilities that need to be available
for dumping (interrupts, softirqs, relevant locks, even kernel worker
threads ?), as well as the changes in state that the dump i/o path
execution causes.
The livedump capability that exists in lcrash today is perhaps an example
where we get a dump while the system continues to operate without any
disruption, but could have inconsistent state with drift as the system
state changes while it is being dumped. What mission critical's dump
facility attempts to provide could in some ways (sans some details) be
thought to constitute the opposite end (though it doesn't support dump of the
entire physical memory - e.g. user memory areas, because it needs some extra
space for the second kernel boot), i.e. close to accurate memory state snapshot
under most conditions with complete disruption of system operation.
It is interesting that you mention mirrored memory, because that kind of
thing indeed could allow one to continue operation without losing
accuracy. In fact one of the ideas that we had been toying with a while
back (when we just started looking into crash dump) was to see if some kind
of copy-on-write memory snapshotting implementation could be used to
achieve the dual goals of accurate system snapshot and normal system
operation and dumping through normal system interfaces. The idea was to
make use of either page table or segment protection support to implement
such a scheme. One could think of multiple ways to keep track of the
modified portions. (I'm not familiar with the Stratus configuration, so I
don't know if this mirroring happens in hardware or in software) There is,
however, a tradeoff even in this case in terms of resource consumption and
complexity/intrusiveness of our solution. And that is also affected by the
extent of activity that we decide to allow on the system. It is easy enough
to notice that the amount of additional memory needed to maintain the
snapshot depends on the extent/spread of changes in system memory state all
the while from the instant the dump was triggered to the time it completes.
It would vary depending on how exactly we maintain the snapshot, but the
extra space and complexity, as well as the performance implications, do rise
with the activity level that we decide to support. For example, if we were
to allow normal scheduling and have all applications running without
interruption, then we need a more complicated scheme and more available
memory to maintain the parallel states. So we need to weigh the practical
benefits before attempting this.
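As a rough illustration of the copy-on-write idea, the following toy sketch
(not LKCD code; the class, its methods and the write pattern are hypothetical)
preserves the original contents of a page the first time it is written after
the dump is triggered. The dumper reads the preserved copy when one exists, so
the dump matches the trigger-time state, and the number of preserved pages
shows how the extra memory depends on the spread of changes rather than on the
number of writes.

```python
# Toy copy-on-write snapshot: not LKCD code, all names are hypothetical.

class CowSnapshot:
    def __init__(self, memory):
        self.memory = memory      # live pages, still written by the system
        self.saved = {}           # page index -> contents at trigger time

    def write(self, idx, value):
        """Write path taken while the dump is in progress."""
        if idx not in self.saved:
            self.saved[idx] = self.memory[idx]   # preserve the original once
        self.memory[idx] = value

    def read_for_dump(self, idx):
        """Dumper sees the preserved copy if the page changed since trigger."""
        return self.saved.get(idx, self.memory[idx])


def dump_with_activity(initial, writes):
    snap = CowSnapshot(list(initial))
    pending = list(writes)
    dump = []
    for page in range(len(initial)):
        dump.append(snap.read_for_dump(page))
        if pending:
            snap.write(*pending.pop(0))   # one write lands per page dumped
    return dump, len(snap.saved)


initial = [0] * 8
narrow = [(7, 1), (7, 2), (7, 3), (7, 4)]   # repeated writes to one page
wide = [(7, 1), (6, 1), (5, 1), (4, 1)]     # writes spread over four pages

narrow_dump, narrow_extra = dump_with_activity(initial, narrow)
wide_dump, wide_extra = dump_with_activity(initial, wide)
print(narrow_extra, wide_extra)
```

Both dumps equal the trigger-time state; the narrow pattern needs one extra
page and the wide pattern four. This sketch preserves every modified page, even
one that has already been dumped; a refinement could skip those.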
I guess what we would first try to achieve would be a reasonably working
solution - attempt to keep the drift low, but may not really freeze
everything (e.g. may allow interrupts, and perhaps some critical kernel
code); also attempt to keep the disruption of the system low, but again not
ideal unjittered operation (e.g. applications would be suspended to start
with). It would be an approximate solution, not an exact one, and the
degree of drift may also depend on the state/context from which dump is
triggered (because of the kind of context that is needed in order to use
the existing i/o path for dumping), but may be of some practical utility.
Then we could look into improving this further towards the ideal solution
for configurable quiesce levels, as we do further work on selective
dumping.
Regarding your specific question about servicing/routing interrupts, if we
knew exactly what interrupts are necessary for the dump driver, then we
could perhaps keep only those active, but again trying to support all kinds
of devices involves some extra considerations in deciding what exactly
needs to be active... And as you observe totally shutting off other
interrupts could have some side effects on system operation continuation.
Yes, we did also think of redirecting the concerned interrupts to take a
special execution path that wouldn't tie in with kernel resources/locks, or
switch to a poll based approach (which is probably what you mean by
"status drive the controller") but then again this needs some specific
knowledge of the IRQs that the device needs. And on Linux we have a wide
range of devices/controllers that we'd ideally like to support for dump.
One approach that is also under consideration is having devices export a
dump interface that could be poll based and avoid locks/resources used in
the normal i/o path (You may have seen some discussions about that on
this list earlier - AIX has a ddump interface for example), and do the
right things to ensure normal i/o path works after dump. Matt is probably
already exploring this possibility for IDE to start with. Something like
this could be used for network dumps too, as you might already know. Dave
Howell might have some further thoughts on that.
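A poll based dump write path could look roughly like the toy sketch below. It
is not a real driver interface: FakeController, its status bit and
dump_write_polled are all made-up names, and the controller is simulated in
software. The point is only that the dump path issues a write and then spins on
a status read, without depending on an interrupt being delivered or on any lock
from the normal i/o path, and gives up after a bounded number of polls.

```python
# Toy poll-based dump write path: not a real driver API, names hypothetical.

class FakeController:
    """Stands in for a disk controller with a busy/ready status bit."""

    def __init__(self, busy_polls):
        self.busy_polls = busy_polls   # status reads that report busy per write
        self._remaining = 0
        self.blocks = []               # data "written" to the device

    def start_write(self, data):
        self.blocks.append(data)
        self._remaining = self.busy_polls

    def is_busy(self):
        if self._remaining > 0:
            self._remaining -= 1
            return True
        return False


def dump_write_polled(ctrl, data, max_polls=1000):
    """Issue one write and spin on the status bit; no IRQ, no locks taken."""
    ctrl.start_write(data)
    polls = 0
    while ctrl.is_busy():
        polls += 1
        if polls >= max_polls:
            return False               # give up rather than hang the dump
    return True


ctrl = FakeController(busy_polls=3)
ok = all(dump_write_polled(ctrl, page) for page in [b"page0", b"page1"])
stuck = dump_write_polled(FakeController(busy_polls=10), b"x", max_polls=5)
print(ok, stuck)
```

The bounded poll count is a design choice for the sketch: a dump path that
cannot rely on timers or interrupts still needs some way to bail out if the
device never becomes ready.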
So some of the efforts would include:
- First attempting an approximate solution with practical tradeoffs between
drift and system continuation+minimal dump environment context setup &
activity requirements -- kind of best effort start and try refining as we
get better ideas
- Provide configurable levels of quiescing together with more granular
selective dumping feature introduction
- Device Driver dump interface evolution (towards more accurate snapshot
and minimizing system dependence)
The following applies to panic dump type situations only (not
non-disruptive dump, that is):
- Integration with mission critical's 2 kernel approach for standalone
dumping situations - for cases where the driver doesn't have a dump
interface or if it is detected that the interface can't be used for some
reason (based on some verification scheme) and the normal i/o path can't /
shouldn't be used.
Regards
Suparna
Suparna Bhattacharya
IBM Software Lab, India
E-mail : bsuparna@xxxxxxxxxx
Phone : 91-80-5267117, Extn : 3961
"Schaal, Richard" <richard.schaal@xxxxxxxxx> on 09/19/2001 06:22:37 AM
Please respond to "Schaal, Richard" <richard.schaal@xxxxxxxxx>
To: "'lkcd@xxxxxxxxxxx'" <lkcd@xxxxxxxxxxx>
cc: (bcc: Suparna Bhattacharya/India/IBM)
Subject: Non Disruptive Dumps - Question
I'm curious as to the external behaviour one would expect to see when taking
a non-disruptive dump.
Would you be able to start the dump and then continue working on your
application while the dump continues?
- Don't laugh - we could do that at Stratus because of the mirrored memory.
In the first cut on a general purpose system, I would expect to be able to
start a dump - the system would freeze during the dump, and then when
complete, the system would be responsive once more - and not require a reboot.
The reason I ask this, is that I see you folks poking about the IO_APIC area,
and I think you might be thinking about directing interrupts from all
sources to the one CPU that we want running in order to take the dump. I'm
coming from the other direction thinking that I don't want any interrupts at
all during the whole dump process.
Which is easier? Would one technique produce a better dump than the other?
Is freezing the system for the duration of the dump going to cause dropped
connections? - is that why you want to be servicing interrupts? If you
service interrupts for I/O chances are that you will blur the dump.
Bottom line - I wonder if it is easier to status drive the disk controller
or redirect and then restore interrupt routing on the fly.
I look forward to your views -
Richard