RE: capturing cpu states on SMP

To: "James Washer" <washer@xxxxxxxxxx>
Subject: RE: capturing cpu states on SMP
From: "S Vamsikrishna" <vamsi_krishna@xxxxxxxxxx>
Date: Mon, 29 Oct 2001 13:20:00 +0530
Cc: lkcd@xxxxxxxxxxx, mvb@xxxxxxxxxx, bharata@xxxxxxxxxx
Importance: Normal
Sender: owner-lkcd@xxxxxxxxxxx

We send an NMI-class IPI to the other cpus to capture their registers and
stack. An NMI is delivered even when a cpu has interrupts disabled, so this
is the only reliable way to get the other cpus to respond. If a cpu doesn't
respond to the NMI, there is absolutely nothing we can do. We don't wait
around for our IPI to be handled, so even in that case we don't hang the
dump process.
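
To make the "don't wait" point concrete, here is a minimal user-space
sketch, not our kernel code. Each cpu's NMI handler fills in its own save
area and sets a flag last; the dumping cpu never blocks on those flags and
only checks them when it needs the data. All names (cpu_snapshot,
request_capture, nmi_capture, snapshot_valid, count_captured) and the
MAX_CPUS limit are made up for illustration, and the actual sending of the
IPI is left as a comment.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

#define MAX_CPUS 8   /* hypothetical limit for this sketch */

/* Hypothetical per-cpu save area; field names are illustrative only. */
struct cpu_snapshot {
    atomic_bool captured;   /* set last, by that cpu's own NMI handler */
    unsigned long ip;       /* instruction pointer at NMI time */
    unsigned long sp;       /* stack pointer at NMI time */
};

static struct cpu_snapshot snapshots[MAX_CPUS];

/* Dumping cpu: clear the flags, then (in a real kernel) send the NMI.
 * Returns immediately; it never waits for the handlers to run. */
void request_capture(int ncpus)
{
    assert(ncpus > 0 && ncpus <= MAX_CPUS);
    for (int cpu = 0; cpu < ncpus; cpu++)
        atomic_store(&snapshots[cpu].captured, false);
    /* the NMI-class IPI to the other cpus would be sent here */
}

/* What each remote cpu's NMI handler would do with its own registers. */
void nmi_capture(int cpu, unsigned long ip, unsigned long sp)
{
    assert(cpu >= 0 && cpu < MAX_CPUS);
    snapshots[cpu].ip = ip;
    snapshots[cpu].sp = sp;
    atomic_store(&snapshots[cpu].captured, true);   /* publish last */
}

/* Dump writer: is there saved state for this cpu? Never blocks. */
bool snapshot_valid(int cpu)
{
    assert(cpu >= 0 && cpu < MAX_CPUS);
    return atomic_load(&snapshots[cpu].captured);
}

/* Count the cpus (other than the dumping one) that responded so far. */
int count_captured(int ncpus, int self)
{
    int n = 0;
    for (int cpu = 0; cpu < ncpus; cpu++)
        if (cpu != self && snapshot_valid(cpu))
            n++;
    return n;
}
```

A cpu that never runs its handler simply leaves its flag clear, so the dump
goes ahead without saved state for that cpu.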

We need to capture the stack, even though we would prefer not to. It is an
additional complication we would gladly get rid of. The reason is that the
stack could change between the time the registers are captured and the time
that page is written out in the dumping process. The time between these two
events could be rather long when we do deferred dumps (1). The changes in
the stack could be significant enough to make backtracing impossible or
totally inaccurate.
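
As a rough illustration of why the copy matters, the user-space sketch
below saves the in-use part of a stack (from the stack pointer up to the
top) at capture time. The live stack can then change freely without
affecting the saved copy. STACK_SIZE, struct saved_stack and save_stack()
are hypothetical names for this sketch, and it assumes a fixed-size stack
that grows downward.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define STACK_SIZE 8192   /* assumed per-task stack size for this sketch */

/* Hypothetical save area holding a copy of the in-use part of a stack. */
struct saved_stack {
    size_t sp_offset;                 /* offset of sp from the stack base */
    size_t len;                       /* number of bytes copied */
    unsigned char data[STACK_SIZE];   /* copy of [sp, top of stack) */
};

/* Copy everything from the stack pointer up to the top of the stack.
 * The stack grows downward, so the live frames sit at and above sp. */
size_t save_stack(struct saved_stack *dst,
                  const unsigned char *stack_base, size_t sp_offset)
{
    assert(dst != NULL && stack_base != NULL);
    assert(sp_offset <= STACK_SIZE);
    dst->sp_offset = sp_offset;
    dst->len = STACK_SIZE - sp_offset;
    memcpy(dst->data, stack_base + sp_offset, dst->len);
    return dst->len;
}
```

A backtrace seeded with the saved registers would then read frames from
data[] rather than from the live page, which may have been overwritten by
the time the dump is written.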

(1) Deferred dumps: when we want a non-disruptive dump of a running system
(a snapshot), we have to ensure that the dump is actually written from a
known location in the kernel, one where we hold no locks required by the
dump-writing process, have not disabled interrupts, and are not running
inside the disk driver or a Dynamic Probes probe handler (a probe could be
at just about any location in the kernel). In these cases we plan to
capture the system state (registers/stack, for backtracing purposes)
immediately on a dump request and wake up a dump daemon (kernel thread),
which will do the actual dump writing when it is scheduled.
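
The split between "capture now" and "write later" can be sketched in plain
user-space C as below. This is only a model under simplifying assumptions:
it is single-threaded, the "wake up" is just a flag, and the names (struct
regs, dump_request, dump_daemon_poll, last_written, dumps_written) are
invented for the example.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Minimal stand-in for the captured cpu state (hypothetical layout). */
struct regs {
    unsigned long ip;
    unsigned long sp;
};

static struct regs saved_regs;     /* state captured at request time */
static bool dump_pending;          /* stands in for waking the daemon */
static struct regs last_written;   /* what the "dump" ended up recording */
static int dumps_written;

/* Called from wherever the dump is requested, possibly an unsafe context.
 * It only copies state and flags the request; it does no I/O. */
void dump_request(const struct regs *live)
{
    assert(live != NULL);
    saved_regs = *live;    /* capture immediately */
    dump_pending = true;   /* a real kernel would wake the dump thread */
}

/* Body of the dump daemon, run later from a safe, known context.
 * Returns true if a dump was written. */
bool dump_daemon_poll(void)
{
    if (!dump_pending)
        return false;
    dump_pending = false;
    last_written = saved_regs;   /* write the saved state, not live state */
    dumps_written++;
    return true;
}
```

Whatever happens to the live registers between dump_request() and
dump_daemon_poll(), the dump reflects the state at the time of the request.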

Regards,
Crash Dump Team,
Linux Technology Center,
IBM Software Lab, Bangalore.
Ph: +91 80 5262355 Extn: 3959
Internet: vamsi_krishna@xxxxxxxxxx


"James Washer" <washer@xxxxxxxxxx> on 10/26/2001 11:12:09 PM

Please respond to "James Washer" <washer@xxxxxxxxxx>

To:   lkcd@xxxxxxxxxxx
cc:    (bcc: S Vamsikrishna/India/IBM)
Subject:  RE: capturing cpu states on SMP

I'm interested in hearing HOW you capture the register information from
processors "(executing a tight loop, interrupts disabled)". Care to let me
(us) know?

 - jim


"Monty Vanderbilt" <mvb@xxxxxxxxxx>@oss.sgi.com on 10/25/2001 12:38:25 PM

Sent by:  owner-lkcd@xxxxxxxxxxx


To:   bharata@xxxxxxxxxxxxx, <lkcd@xxxxxxxxxxx>
cc:
Subject:  RE: capturing cpu states on SMP



Great idea!

Why is it necessary to capture the stacks? Those pages should already be in
the memory dump. With the registers you should be able to seed the backtrace
for any cpu.

-----Original Message-----
From: owner-lkcd@xxxxxxxxxxx [mailto:owner-lkcd@xxxxxxxxxxx]On Behalf Of
Bharata B Rao
Sent: Thursday, October 25, 2001 3:29 AM
To: lkcd@xxxxxxxxxxx
Subject: capturing cpu states on SMP


Hello,

This note is just a heads up to avoid duplicating our efforts. We are
working on capturing the registers and stack on all the cpus at the time of
dumping. This has been found to be crucial for debugging problems where some
of the cpus on an SMP system are hung (executing a tight loop, interrupts
disabled).

We have this working on the kernel side. We have also added a command to
lcrash to display the saved registers. We need to add some bits to lcrash
so that it can look at the right (saved) stack when backtracing.

Comments?
--
Crash Dump Team,
IBM Linux Technology Center,
IBM Software Lab, Bangalore.

Ph: 91-80-5262355 Ex: 3962
Mail: bharata@xxxxxxxxxx