bsuparna@xxxxxxxxxx wrote:
> >> I'm having some issues with MP system dumping - I'm getting some Oops
> >> type issues in the ext2 file system in a stress test - the system tries
> >> to dump, but then multiple processors get watchdog timeouts and hose the
> >> dump - I'm hoping that the later code from the CVS tree will begin to
> >> address this issue so I can get to work on the "real" problem.
> >
> >I'm fixing an issue with SMP right now. Suparna hasn't said much
> >except "hmmm", but I should be able to move the sti() call in
> >dump_silence_system() in front of the __dump_silence_system() call
> >to re-enable the interrupts.
> >
> >I'm not sure the watchdog timeouts are the same. Can you mail me
> >the stress test/configuration you're using?
>
> Well, Matt, I think I said much more than that in my last mail on the SMP
> issues ! But perhaps it wasn't expressed too well ? I'll go into more
> detail again, then in a while.
Actually, you did, my apologies. I was referring to the other night in
our discussions based on that recommendation I made (which again, is not
the inevitable fix, as you indicate below). :)
> But before that, I had thought that your suggestion of moving "sti" up only
> had to do with triggering dump from an interrupt handler on an SMP system,
> to avoid calling smp_call_function with interrupts disabled, didn't it ? To
> us, it appears that making sure that dump works in general from interrupt
> context in itself (keeping in mind all kinds of drivers) involves more
> thought. What Vamsi is experimenting with for kdb (dumping from a
> different thread context), may be part of the solution (i.e. always ensure
> that we have a legal context when we perform the dump i/o). Another
> approach could turn out to be the dump driver interface that you've been
> working on. It is also possible that an intermediate solution emerges. In
> any case, we plan to share our observations and trial patches along the
> way.
Great, I'm ready to test anything you may have, and of course, I'm
always sharing what I test out here.
> Coming back to the MP issues.
> I really believe that it is important to have a reliable solution which we
> are confident will work under most situations, _and_ that we are at least
> in a position to say for sure which situations it may not work under.
>
> We already have made some of the changes that we think are needed for the
> SMP issues (for non-disruptive dumping) and got them to run (yes this is
> with your latest code, with a modified version of lkcd_config with a few
> fixes :)). Bharata could pass on the patch to you, if you'd like to take a
> look at it.
Sure, although I fixed a few things in lkcd_config already per an earlier
E-mail. I'm not sure how many cross over, but I didn't test -q at all, and
Marty showed me (in so many words) that it wasn't ready for prime time.
Again, hopefully this is all corrected. I was mostly concerned with the
DIOS stuff.
> But this is a tricky area - it's great to see it work when we try it out :),
> but we know that this doesn't necessarily mean it will work under all
> conditions. That's why we've been spending so much time trying to analyse
> and understand the issues, and trying to close logical loopholes as far as
> possible. And there are a few things we still need to work on before we have
> our code ready for release. There are also a few tradeoffs that we end up
> making in terms of dump accuracy as we try to fix some of the problems.
I'd like to know a bit more of what those tradeoffs are. I assume
you're on #lkcd?
> So, what are some of the pending concerns with MP system dumping ?
> For now, let's leave out the case of dumping from an interrupt handler, to
> simplify the scope.
>
> If we make the other CPUs spin inside the IPI (CALL_FUNCTION_VECTOR) while
> dump is in progress, we need to take care of the following things:
>
> 1. If we spin the other cpu's with interrupts disabled, then we need to
> make sure that the NMI watchdog timer doesn't report lockups (given that
> dumping would take some time) on the CPUs which are intentionally made to
> spin (we might want lockups to be detected on the dumping cpu just in case
> dump itself runs into problems).
> We had this check in our patch.
Can this be as simple as dump_in_progress, or something more complex?
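
To make the "simple" version of that check concrete, here is a rough
user-space sketch. The names dump_in_progress, dumping_cpu,
last_progress[] and watchdog_should_report() are illustrative
assumptions, not the actual LKCD or kernel NMI watchdog code. The idea
is only that the watchdog stays armed on the dumping CPU and treats the
intentional spin on every other CPU as progress.

```c
#include <assert.h>
#include <stdbool.h>

#define NR_CPUS 4

/* Hypothetical state, set by the dump code before it silences the system. */
static volatile bool dump_in_progress = false;
static volatile int dumping_cpu = -1;

/* Tick count at which each CPU was last seen making progress. */
static unsigned long last_progress[NR_CPUS];

/*
 * Simulated watchdog tick for `cpu` at time `now`.
 * Returns true if a lockup should be reported for that CPU.
 */
static bool watchdog_should_report(int cpu, unsigned long now,
                                   unsigned long timeout)
{
    assert(cpu >= 0 && cpu < NR_CPUS);

    /*
     * CPUs that were deliberately parked for the dump are not locked up:
     * count the spin as progress so no stale timeout fires afterwards.
     */
    if (dump_in_progress && cpu != dumping_cpu) {
        last_progress[cpu] = now;
        return false;
    }

    /* The dumping CPU (and everyone, outside a dump) is checked as usual. */
    return now - last_progress[cpu] > timeout;
}
```

Refreshing last_progress[] while spinning is a design choice in this
sketch: without it, a parked CPU could be reported as locked up on the
first tick after the dump finishes.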
> 2. It is possible that a disk interrupt generated in the dump i/o process,
> gets delivered to one of the spinning CPUs rather than the dumping CPU.
> This seems possible because the IRQ affinity for this interrupt indicates
> that any CPU could receive the interrupt. If the other CPUs are spinning
> with interrupts disabled, then they won't service such interrupts --
> resulting in a potential deadlock as the dumping thread waits for i/o to
> complete.
Eww. This means we're going to have to re-program the APIC.
> We spent some time delving through APIC arbitration logic section of the
> Intel manual :) to see if there is something that could be used to avoid
> this. (If you look at my last posting in this context, you might notice
> where some of the observations there came from ...).
> Somehow, when we tested our code on a 2 way machine, we never seemed to be
> hitting this case ... Bharata tried it on a 4 way yesterday and in one of
> the trials he did run into a deadlock (this doesn't happen consistently,
> though).
> There are 2 ways to avoid this:
> (a) Simply keep interrupts enabled on the other CPUs as they spin. This is
> likely to cause a little more drift in the dump snapshot, but then right
> now since we already have interrupts enabled on the dumping CPU, it's
> probably not too bad.
> We've already tried this out.
Okay. This one is probably faster to implement (meaning less complex)
as well.
> (b) Change the IRQ affinities (i.e. program the APIC) so that interrupts
> get delivered only to the dumping CPU during the dump process, and then
> revert back to the original affinities, after dumping is through.
> This is what we are experimenting with now.
If this is possible, great, but given what you have to change while
going silent, how possible is this? Note, I'm just now going to look
at the code path.
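
For reference, the save/redirect/restore shape of option (b) can be
sketched as below. This is a user-space model only: irq_affinity[] is a
plain array standing in for the per-IRQ CPU masks, and
dump_route_irqs_to() and dump_restore_irqs() are hypothetical names. In
a real kernel each assignment would mean reprogramming the interrupt
controller, which is the hard part this sketch leaves out.

```c
#include <assert.h>

#define NR_IRQS 16

/* Stand-in for the per-IRQ CPU affinity masks (bit n set = CPU n allowed). */
static unsigned long irq_affinity[NR_IRQS];

/* Copy of the masks as they were before the dump started. */
static unsigned long saved_affinity[NR_IRQS];

/* Save every IRQ's mask, then route all IRQs to the dumping CPU only. */
static void dump_route_irqs_to(int cpu)
{
    assert(cpu >= 0 && cpu < (int)(8 * sizeof(unsigned long)));

    for (int irq = 0; irq < NR_IRQS; irq++) {
        saved_affinity[irq] = irq_affinity[irq];
        irq_affinity[irq] = 1UL << cpu;
    }
}

/* Put the original masks back once dumping is through. */
static void dump_restore_irqs(void)
{
    for (int irq = 0; irq < NR_IRQS; irq++)
        irq_affinity[irq] = saved_affinity[irq];
}
```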
> Now, besides the above there is another issue to think about (not so much
> for non-disruptive case). Given that smp_call_function waits for the other
> CPUs to receive the IPI, do we have a failsafe method to get a dump in case
> another CPU is caught in a tight loop with interrupts disabled ?
> If we used an NMI IPI, this could cause potential deadlocks in the dump
> path.
This will always be the case, though. Unless you add in a special NMI
dump handler to handle interrupts and system state at the point of the
NMI, you won't have much luck, and even then, an NMI dump after a new
dump type is risky (double panics are almost always useless).
> Another question is if there is any possibility of our interrupting another
> CPU in the midst of some i/o operations while it holds a lock needed in the
> dump path. We have been studying the i/o code to check if such a
> possibility could arise. I haven't noticed anything so far, but would like
> to check this out further just to be absolutely sure. For
> spin_lock_irqsave, this shouldn't be a problem, so obviously, this can
> happen for locks taken only in the pre-requestfn stage of the block i/o
> logic, if at all.
>
> Anything else that I missed ?
You've covered practically everything. I'll login now to chat with you
about this some more. I'm curious about your APIC thoughts.
Thanks, Suparna.
--Matt
> Regards
> Suparna
>
> Suparna Bhattacharya
> IBM Software Lab, India
> E-mail : bsuparna@xxxxxxxxxx
> Phone : 91-80-5267117, Extn : 3961