Keith Owens wrote:
> On Wed, 12 Sep 2001 23:25:19 -0700,
> "Matt D. Robinson" <yakker@xxxxxxxxxxxxxx> wrote:
> >bsuparna@xxxxxxxxxx wrote:
> >> 1. If we spin the other cpu's with interrupts disabled, then we need to
> >> make sure that the NMI watchdog timer doesn't report lockups (given that
> >
> >Can this be as simple as dump_in_progress, or something more complex?
>
> Andrew Morton has code in the -AC tree which is a generic fix for the
> problem of the NMI watchdog tripping on long events. kdb uses it in
> the -AC tree.
>
>     if (*f == NULL) {
>         /* Reset NMI watchdog once per poll loop */
>         touch_nmi_watchdog();
>         f = &poll_funcs[0];
>     }
>
> There is no equivalent in Linus's tree; you have to hack the NMI
> handler yourself :(. Time to push Andrew Morton and AC to get the NMI
> changes into Linus's tree.
Thanks, Keith ...
This looks like a reasonable patch to use, although shouldn't
touch_nmi_watchdog() reset both the last_irq_sums[] and the
alert_counter[] for all CPUs? Otherwise, won't you be
dropping back into this loop function over and over and
over again? Then again, you probably re-enter it anyway.
It looks like touch_nmi_watchdog() is needed in combination with
dump_in_progress. Suparna, in case you don't have the tree handy
(I didn't), here is what touch_nmi_watchdog() does:
void touch_nmi_watchdog (void)
{
        int i;

        /*
         * Just reset the alert counters, (other CPUs might be
         * spinning on locks we hold):
         */
        for (i = 0; i < smp_num_cpus; i++)
                alert_counter[i] = 0;
}
In AC's patch, alert_counter is moved out of nmi_watchdog_tick()
along with last_irq_sums (both become globals), so you can modify
them outside of nmi_watchdog_tick().
--Matt