> Hello,
>
> Looking at the vmdump code, here is something that puzzles me.
> I'm not sure if I'm missing something obvious here.
>
> Since right now dump involves wait_kio calls, which involves a context
> switch to another runnable process, isn't there a chance of the memory
> state changing whilst the dump is going on. Couldn't the dump become
> inconsistent, or not correctly reflect the state of the system when the
> incident that triggered the dump happened ? (Since interrupts aren't turned
> off, even that could affect the state ... but to a lesser extent, I guess)
Yes -- the whole point behind adding smp_send_stop() into the panic()
and die_if_kernel() mechanism was to avoid having other processes run
while the dump was taking place. I didn't see a good hook in the
scheduler to say, "Okay, hold off, don't run any other jobs except
mine". Putting smp_send_stop() in place, though, messes up both x86
and ia64 systems, because the local APIC gets disabled (meaning, if
your system crashes on a CPU other than 0, you're toast).
This leads to the second problem -- even if you do stop all other
system processes and are able to disable interrupts to most devices,
you can't write out to disk in a "raw" fashion. Kiobufs are a hack
at best as far as raw I/O is concerned. They're just a page grouping
mechanism for good scatter/gather stuff, IMHO. Linux is immature as
far as raw device output is concerned.
> I had actually started with looking into the smp_send_stop issue and the
> more generic issue of getting a consistent system snapshot (as accurately
> reflecting the state at the time of the system crash as possible), when
> this question came to mind. BTW, is there some work going on in this area ?
> Or have the issues been sorted out already ?
There are two ways to do this:
1) Stop all system activity, shut down interrupts as much as possible,
and dump all of memory to disk.
2) Stop the system immediately, reset the system, and on the way back
up, early in the boot process, dump the memory to disk either from
the BIOS or in the setup of the kernel.
Both mechanisms have their problems in Linux. I don't like the second
solution, because many systems (most, in fact) do not preserve memory
state across a system reset. The first solution is as close as I can get at
this point to saving the memory dump accurately, and even with that,
we can have problems in some circumstances.
For example, what if you crash in a disk interrupt handler?
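
A rough user-space sketch of the ordering in option 1, for illustration
only. Every name below (struct machine, stop_other_cpus(), disable_irqs(),
polled_write_page(), dump_memory()) is hypothetical and is not taken from
the actual dump code. The point it tries to show is that the write path
busy-waits on the device instead of sleeping, so nothing else gets a chance
to run and change memory during the dump.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define PAGE_SZ 4096
#define NPAGES  4

/* Simulated machine; every field and function here is hypothetical. */
struct machine {
    int other_cpus_running;              /* nonzero while other CPUs may run */
    int irqs_enabled;                    /* nonzero while interrupts may fire */
    int device_busy;                     /* polls left before fake disk is ready */
    unsigned char mem[NPAGES][PAGE_SZ];  /* memory to be dumped */
    unsigned char disk[NPAGES][PAGE_SZ]; /* dump area on the fake disk */
};

static void stop_other_cpus(struct machine *m) { m->other_cpus_running = 0; }
static void disable_irqs(struct machine *m)    { m->irqs_enabled = 0; }

/* Write one page by busy-waiting on the device; never sleeps or reschedules. */
static int polled_write_page(struct machine *m, size_t pg)
{
    if (m->other_cpus_running || m->irqs_enabled)
        return -1;                       /* system not quiesced: refuse */
    m->device_busy = 3;                  /* pretend the disk needs a few polls */
    while (m->device_busy > 0)
        m->device_busy--;                /* spin instead of context switching */
    memcpy(m->disk[pg], m->mem[pg], PAGE_SZ);
    return 0;
}

/* Option 1: quiesce first, then copy every page. Returns pages written or -1. */
int dump_memory(struct machine *m)
{
    size_t pg;

    assert(m != NULL);
    stop_other_cpus(m);
    disable_irqs(m);
    for (pg = 0; pg < NPAGES; pg++)
        if (polled_write_page(m, pg) != 0)
            return -1;
    return NPAGES;
}
```

In this toy model, calling polled_write_page() before the machine is
quiesced fails, while dump_memory() quiesces first and then copies all
pages. A real implementation would still have to deal with the hard cases,
such as a crash inside the disk interrupt handler itself.
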
> Matt you had mentioned that you were working on a specialized IDE driver
> for dump, to avoid having to go through the normal kio/raw i/o path in the
> kernel. Is that still in the plan ?
Yes, although I sent it off to Andre Hedrick, and he sent me a
single line response saying (basically), "Why would you ever want
to do that?"
Needless to say, it wasn't very encouraging. I have it, it's written,
and it works on my disk here in the office, but I haven't tried it on
multiple IDE disks, or different IDE controllers, etc. It's basically
untested outside my office. :)
The best solution (IMHO) is to create:
    a raw device table
    raw device handles (open(), read(), write(), etc.)
    disk device driver handles (ide_rw_open(), etc.)
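
A minimal user-space sketch of how those three pieces could fit together,
assuming a small fixed-size table and a function-pointer struct per driver.
All names here (raw_dev_ops, raw_register(), raw_lookup(), raw_open(),
raw_write(), raw_read(), and the RAM-backed stand-in driver) are made up for
illustration; this is not the driver described in this mail, and the real
ide_rw_open() signature is not shown anywhere in this thread.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define RAW_SECTOR_SIZE 512
#define RAW_MAX_DEVICES 4
#define RAM_SECTORS     8

/* Hypothetical per-driver handles, in the spirit of ide_rw_open() etc. */
struct raw_dev_ops {
    int (*open)(void *priv);
    int (*read)(void *priv, unsigned long sector, void *buf, size_t nsect);
    int (*write)(void *priv, unsigned long sector, const void *buf, size_t nsect);
};

/* One slot in the raw device table. */
struct raw_dev {
    const char *name;
    const struct raw_dev_ops *ops;
    void *priv;
};

static struct raw_dev raw_table[RAW_MAX_DEVICES];
static int raw_count;

/* Register a driver; returns its table index, or -1 if the table is full. */
int raw_register(const char *name, const struct raw_dev_ops *ops, void *priv)
{
    assert(name != NULL && ops != NULL);
    if (raw_count >= RAW_MAX_DEVICES)
        return -1;
    raw_table[raw_count].name = name;
    raw_table[raw_count].ops = ops;
    raw_table[raw_count].priv = priv;
    return raw_count++;
}

/* Find a registered device by name; returns NULL if there is none. */
struct raw_dev *raw_lookup(const char *name)
{
    int i;

    for (i = 0; i < raw_count; i++)
        if (strcmp(raw_table[i].name, name) == 0)
            return &raw_table[i];
    return NULL;
}

/* Generic raw handles: thin wrappers that dispatch to the driver. */
int raw_open(struct raw_dev *d) { return d->ops->open(d->priv); }
int raw_read(struct raw_dev *d, unsigned long s, void *b, size_t n)
{ return d->ops->read(d->priv, s, b, n); }
int raw_write(struct raw_dev *d, unsigned long s, const void *b, size_t n)
{ return d->ops->write(d->priv, s, b, n); }

/* Stand-in "disk driver" backed by RAM, so the sketch runs anywhere. */
struct ram_disk {
    int is_open;
    unsigned char data[RAM_SECTORS * RAW_SECTOR_SIZE];
};

static int ram_open(void *priv)
{
    ((struct ram_disk *)priv)->is_open = 1;
    return 0;
}

static int ram_read(void *priv, unsigned long sector, void *buf, size_t nsect)
{
    struct ram_disk *d = priv;

    if (!d->is_open || sector > RAM_SECTORS || nsect > RAM_SECTORS - sector)
        return -1;
    memcpy(buf, d->data + sector * RAW_SECTOR_SIZE, nsect * RAW_SECTOR_SIZE);
    return 0;
}

static int ram_write(void *priv, unsigned long sector, const void *buf, size_t nsect)
{
    struct ram_disk *d = priv;

    if (!d->is_open || sector > RAM_SECTORS || nsect > RAM_SECTORS - sector)
        return -1;
    memcpy(d->data + sector * RAW_SECTOR_SIZE, buf, nsect * RAW_SECTOR_SIZE);
    return 0;
}

const struct raw_dev_ops ram_ops = { ram_open, ram_read, ram_write };
```

A dump routine would then only need raw_lookup() plus the generic handles,
and each disk driver would supply its own ops struct when it registers.
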
Right now, ide-disk.c interfacing to ide.c is horrific. Putting in
my raw disk mechanism is, again, doing things in an inelegant way,
but it does get the job done.
Anyway, I have something, and you're more than welcome to look it over
and tell me what you think. I was hoping to get Andre's impression
of things, but given the way kernel development has been going
lately, I'm never sure what's going to get in.
> Regards
> Suparna
--Matt