
Re: Non disruptive dumps -- current work.

To: bharata@xxxxxxxxxx
Subject: Re: Non disruptive dumps -- current work.
From: "Matt D. Robinson" <yakker@xxxxxxxxxxxxxx>
Date: Sat, 15 Sep 2001 01:43:45 -0700
Cc: lkcd@xxxxxxxxxxx
References: <20010914174403.A1601@in.ibm.com>
Sender: owner-lkcd@xxxxxxxxxxx
Bharata B Rao wrote:
> 
> Hi Matt,
> 
> We have integrated our changes for non-disruptive dumps into the
> dump_silence/dump_resume framework. We have a kind of working version,
> but we still need to sort out some issues.
> 
> In our current approach, the dumping cpu sends call function vector ipi
> to other cpus to put them to spin. When the dump is complete, other cpus
> are released from spin and made to continue. We saw that spinning
> with interrupts disabled does not always work, because disk interrupts are
> sometimes delivered to the spinning cpus and get lost. This causes the dumping
> process to hang. As a workaround, we are now enabling the interrupts and making the
> local_irq_count zero (to make sure that disk interrupts are not missed
> and softirqs are not prevented from running) and restoring the local_irq_count
> at the end of spin. This approach is found to work in most cases.
> Here the system state during dump can drift to the extent that other cpus
> can handle interrupts and softirqs.
> 
> We are not sure if this is the right approach. As an alternative approach,
> we are also thinking of changing the irq affinity of disk interrupts
> (or all interrupts or some interrupts) to the dumping cpu. Currently we are
> trying this approach.
> 
> Comments ?

Trying to steer just the disk interrupts to the dumping CPU might be a pain.
Here's a thought, I haven't tried it yet, and I'm not going to get to
it tonight, so it's a weekend project now:

If we can temporarily set cpu_online_map to a mask containing only the
dumping CPU (1 << smp_processor_id()), and then call setup_IO_APIC_irqs(),
this may do the "right thing" in terms of redirecting interrupts to
only our CPU. In theory, this sets up all IRQs to point to the
dumping CPU, because in arch/i386/kernel/io_apic.c TARGET_CPUS gets
defined as cpu_online_map.

So ... again, I haven't tried this yet, and won't until tomorrow at
the earliest, but:

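        cli();
        saved_cpu_online_map = cpu_online_map;
        /* cpu_online_map is a bitmask, so set only the dumping CPU's bit */
        cpu_online_map = 1UL << smp_processor_id();
        setup_IO_APIC_irqs();   /* retarget IRQs using the one-CPU map */
        cpu_online_map = saved_cpu_online_map;
        sti();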

... then, with cpu_online_map restored, just re-call
setup_IO_APIC_irqs() after dumping to put things back.
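
One detail worth spelling out: cpu_online_map is a bitmask with one bit
per CPU, while smp_processor_id() returns a CPU number, so the
single-CPU value has to be built with a shift. Here is a rough
user-space sketch of the save/restrict/restore pattern, untested against
a real kernel. The names online_map, cpu_to_mask(), restrict_map_to()
and restore_map() are made up for illustration and are not kernel
symbols, and the four-CPU starting value is an assumption.

```c
/* Hypothetical stand-in for the kernel's cpu_online_map: one bit per CPU.
 * Assumption for this sketch: CPUs 0-3 are online. */
static unsigned long online_map = 0xfUL;

/* Build a mask with only the given CPU's bit set.
 * Note that CPU number 2 becomes mask 0x4, not 2. */
static unsigned long cpu_to_mask(unsigned int cpu)
{
    return 1UL << cpu;
}

/* Restrict the map to a single CPU and hand back the previous map
 * so the caller can restore it later. */
static unsigned long restrict_map_to(unsigned int cpu)
{
    unsigned long saved = online_map;

    online_map = cpu_to_mask(cpu);
    return saved;
}

/* Put the original map back. */
static void restore_map(unsigned long saved)
{
    online_map = saved;
}
```

Assigning the raw CPU number instead would give an empty map on CPU 0
and would select CPU 0 when running on CPU 1, which is why the shift
matters.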

If someone's got a hankering to try it tonight, go for it.  Or,
of course, if I'm off my rocker here, feel free to mention that
as well.  :)

Thanks, Bharata.

--Matt

> Regards,
> Bharata.
