>> I'm having some issues with MP system dumping - I'm getting some
>> Oops-type issues in the ext2 file system in a stress test - the system
>> tries to dump, but then multiple processors get watchdog timeouts and
>> hose the dump - I'm hoping that the later code from the CVS tree will
>> begin to address this issue so I can get to work on the "real" problem.
>
>I'm fixing an issue with SMP right now. Suparna hasn't said much
>except "hmmm", but I should be able to move the sti() call in
>dump_silence_system() in front of the __dump_silence_system() call
>to re-enable the interrupts.
>
>I'm not sure the watchdog timeouts are the same. Can you mail me
>the stress test/configuration you're using?
Well, Matt, I think I said much more than that in my last mail on the SMP
issues! But perhaps it wasn't expressed too well? I'll go into more
detail again in a while.
But before that, I had thought that your suggestion of moving "sti" up only
had to do with triggering a dump from an interrupt handler on an SMP system,
to avoid calling smp_call_function with interrupts disabled, didn't it? To
us, it appears that making dump work reliably from interrupt context in
general (keeping in mind all kinds of drivers) involves more thought. What
Vamsi is experimenting with for kdb (dumping from a different thread
context) may be part of the solution (i.e. always ensure that we have a
legal context when we perform the dump i/o). Another approach could turn
out to be the dump driver interface that you've been working on. It is also
possible that an intermediate solution emerges. In any case, we plan to
share our observations and trial patches along the way.
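For what it's worth, the "dump from a different thread context" idea could
be sketched roughly as follows. This is illustrative pseudocode in C, not
actual kdb/LKCD code; the names (dump_thread, do_dump_io, request_dump,
dump_requested) are all made up. The point is that the trap/interrupt
handler only records the request and wakes a pre-created kernel thread,
which then performs the dump i/o from a legal process context:

```c
/* Rough sketch only -- illustrative names, not real LKCD/kdb code.
 * A kernel thread created at boot sleeps until a dump is requested. */
static DECLARE_WAIT_QUEUE_HEAD(dump_wait);
static volatile int dump_requested;

static int dump_thread(void *unused)
{
	for (;;) {
		wait_event(dump_wait, dump_requested);
		/* Now in process context, so dump i/o that may block
		 * or take locks is legal here. */
		do_dump_io();
		dump_requested = 0;
	}
	return 0;
}

/* Called from the trap/interrupt handler: do no i/o here, just
 * record the request and wake the thread. */
void request_dump(void)
{
	dump_requested = 1;
	wake_up(&dump_wait);
}
```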
Coming back to the MP issues.
I really believe that it is important to have a reliable solution which we
are confident will work under most situations, _and_ that we are at least
in a position to say for sure which situations it may not work under.
We have already made some of the changes that we think are needed for the
SMP issues (for non-disruptive dumping) and got them to run (yes, this is
with your latest code, with a modified version of lkcd_config with a few
fixes :)). Bharata could pass on the patch to you, if you'd like to take a
look at it.
But this is a tricky area - it's great to see it work when we try it out :),
but we know that this doesn't necessarily mean it will work under all
conditions. That's why we've been spending so much time trying to analyse
and understand the issues, and trying to close logical loopholes as far as
possible. And there are a few things we still need to work on before we have
our code ready for release. There are also a few tradeoffs that we end up
making in terms of dump accuracy as we try to fix some of the problems.
So, what are some of the pending concerns with MP system dumping?
For now, let's leave out the case of dumping from an interrupt handler, to
simplify the scope.
If we make the other CPUs spin inside the IPI (CALL_FUNCTION_VECTOR) while
dump is in progress, we need to take care of the following things:
1. If we spin the other CPUs with interrupts disabled, then we need to
make sure that the NMI watchdog timer doesn't report lockups (given that
dumping would take some time) on the CPUs which are intentionally made to
spin (we might want lockups to still be detected on the dumping CPU, just
in case the dump itself runs into problems).
We had this check in our patch.
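The check could be along these lines -- a sketch against assumed names
(dump_cpu_spinning, dump_in_progress, dump_ipi_spin, and the hook point in
nmi_watchdog_tick() are ours for illustration, not the actual patch):

```c
/* Per-CPU flag set while a CPU deliberately spins in the dump IPI. */
static volatile int dump_cpu_spinning[NR_CPUS];

/* Body of the CALL_FUNCTION_VECTOR handler on the non-dumping CPUs. */
void dump_ipi_spin(void)
{
	int cpu = smp_processor_id();

	dump_cpu_spinning[cpu] = 1;
	while (dump_in_progress)
		rep_nop();		/* spin quietly */
	dump_cpu_spinning[cpu] = 0;
}

/* In nmi_watchdog_tick(): bail out early for intentionally spinning
 * CPUs, so the watchdog still catches a real hang on the dumping CPU. */
	if (dump_cpu_spinning[cpu])
		return;		/* intentional spin, not a lockup */
```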
2. It is possible that a disk interrupt generated by the dump i/o process
gets delivered to one of the spinning CPUs rather than the dumping CPU.
This seems possible because the IRQ affinity for this interrupt indicates
that any CPU could receive the interrupt. If the other CPUs are spinning
with interrupts disabled, then they won't service such interrupts --
resulting in a potential deadlock as the dumping thread waits for i/o to
complete.
We spent some time delving through the APIC arbitration logic section of
the Intel manual :) to see if there is something that could be used to
avoid this. (If you look at my last posting in this context, you might
notice where some of the observations there came from ...)
Somehow, when we tested our code on a 2 way machine, we never seemed to be
hitting this case ... Bharata tried it on a 4 way yesterday and in one of
the trials he did run into a deadlock (this doesn't happen consistently,
though).
There are two ways to avoid this:
(a) Simply keep interrupts enabled on the other CPUs as they spin. This is
likely to cause a little more drift in the dump snapshot, but since we
already have interrupts enabled on the dumping CPU right now, it's
probably not too bad.
We've already tried this out.
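In sketch form, option (a) is just the same illustrative spin handler with
interrupts re-enabled before the loop (again, dump_ipi_spin and
dump_in_progress are made-up names, not the real patch):

```c
/* Option (a) sketch: spin with interrupts enabled, so a disk interrupt
 * routed to this CPU during the dump still gets serviced rather than
 * deadlocking the dump i/o. */
void dump_ipi_spin(void)
{
	__sti();		/* re-enable interrupts on this CPU */
	while (dump_in_progress)
		rep_nop();
}
```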
(b) Change the IRQ affinities (i.e. program the APIC) so that interrupts
get delivered only to the dumping CPU during the dump process, and then
revert back to the original affinities, after dumping is through.
This is what we are experimenting with now.
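The affinity-redirection variant might look something like this -- a
sketch assuming the 2.4-style irq_desc[].handler->set_affinity hook and
irq_affinity[] array; the saved_affinity bookkeeping and function names
are our own illustration:

```c
static unsigned long saved_affinity[NR_IRQS];

/* Before dumping: steer every steerable IRQ to the dumping CPU. */
void dump_grab_irqs(int dump_cpu)
{
	int irq;

	for (irq = 0; irq < NR_IRQS; irq++) {
		if (!irq_desc[irq].handler ||
		    !irq_desc[irq].handler->set_affinity)
			continue;
		saved_affinity[irq] = irq_affinity[irq];
		irq_desc[irq].handler->set_affinity(irq, 1UL << dump_cpu);
	}
}

/* After dumping: restore the original affinities. */
void dump_release_irqs(void)
{
	int irq;

	for (irq = 0; irq < NR_IRQS; irq++)
		if (irq_desc[irq].handler &&
		    irq_desc[irq].handler->set_affinity)
			irq_desc[irq].handler->set_affinity(irq,
					saved_affinity[irq]);
}
```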
Now, besides the above there is another issue to think about (not so much
for the non-disruptive case). Given that smp_call_function waits for the
other CPUs to receive the IPI, do we have a failsafe method to get a dump
in case another CPU is caught in a tight loop with interrupts disabled?
If we used an NMI IPI, this could cause potential deadlocks in the dump
path.
Another question is whether there is any possibility of our interrupting
another CPU in the midst of some i/o operations while it holds a lock
needed in the dump path. We have been studying the i/o code to check if
such a possibility could arise. I haven't noticed anything so far, but
would like to check this out further just to be absolutely sure. Locks
taken with spin_lock_irqsave shouldn't be a problem, so this can only
happen, if at all, for locks taken in the pre-requestfn stage of the
block i/o logic.
Anything else that I missed?
Regards
Suparna
Suparna Bhattacharya
IBM Software Lab, India
E-mail : bsuparna@xxxxxxxxxx
Phone : 91-80-5267117, Extn : 3961