Debugging Spinlocks on SMP via KDB

Scott Lurndal slurn at nanobiz.com
Tue Jul 18 10:23:27 PDT 2000


> 
> Ahh, so, kdb is keyed off of threads not pids or proc entries.

Linux doesn't have threads. Every schedulable entity has a pid
and a proc entry. Note that bottom-half processing, like interrupt
handling, happens in the context of an arbitrary process.

> 
> That's really helpful, the only problem being, setting a break
> point every time the spinlock code is called, slows it down so
> much that we never see the panic. We thought that if the

A breakpoint which is never hit will not cause any slowdown. In this
case, you must craft the breakpoint such that it only triggers when
the condition of interest (i.e. a long-term waiter on a spinlock)
occurs.

One could set the breakpoint inside the spinlock code rather than
at its entry point. This requires disassembling the spinlock to
find the instruction that is reached only after the first attempt
to take the lock has failed; a breakpoint there triggers only if
you must actually spin. With a bit of work, you can even modify
the spinlock to spin in two places: a short-term loop first and,
if the lock still isn't available when that loop gives up, a branch
to separate long-term spin code on which you set the breakpoint.
This way only long-term (infinite?) spins will trigger the breakpoint.
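
The two-stage spin described above can be sketched in user-space C.
This is not the kernel's spinlock implementation: every name here
(dbg_spinlock_t, dbg_spin_long, SHORT_SPIN_LIMIT) is hypothetical, the
threshold is an arbitrary assumption, and the long-term loop takes an
iteration limit only so the sketch can be exercised without hanging
(pass 0 to spin forever). It relies on C11 atomics and the GCC/Clang
noinline attribute.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Assumed threshold: how many attempts count as a "short" spin. */
#define SHORT_SPIN_LIMIT 1000

typedef struct {
    atomic_flag  locked;
    atomic_ulong long_spins;   /* how often the slow path was entered */
} dbg_spinlock_t;

#define DBG_SPINLOCK_INIT { ATOMIC_FLAG_INIT, 0 }

/*
 * Long-term spin code. Kept out of line so that it has an address of
 * its own: this is the function a breakpoint would be set on.
 * limit == 0 means spin until the lock is acquired.
 */
__attribute__((noinline))
static bool dbg_spin_long(dbg_spinlock_t *l, unsigned long limit)
{
    atomic_fetch_add(&l->long_spins, 1);
    for (unsigned long i = 0; limit == 0 || i < limit; i++) {
        if (!atomic_flag_test_and_set_explicit(&l->locked,
                                               memory_order_acquire))
            return true;
    }
    return false;               /* gave up (only when limit != 0) */
}

/* Short-term spin first; branch to the long-term code if that fails. */
static bool dbg_spin_lock(dbg_spinlock_t *l, unsigned long long_limit)
{
    for (int i = 0; i < SHORT_SPIN_LIMIT; i++) {
        if (!atomic_flag_test_and_set_explicit(&l->locked,
                                               memory_order_acquire))
            return true;
    }
    return dbg_spin_long(l, long_limit);
}

static void dbg_spin_unlock(dbg_spinlock_t *l)
{
    atomic_flag_clear_explicit(&l->locked, memory_order_release);
}
```

An uncontended acquisition never reaches dbg_spin_long(), so a
breakpoint placed there costs nothing until a waiter has already
exhausted the short spin.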

You may want to modify the spinlock code anyway to save the value
of 'current' for the thread/process which acquires the spinlock,
and modify the unlock code to reset the saved value. Note that the
value must be protected by the spinlock itself, i.e. written only
after the lock has been acquired and reset before it is released.
You could do this without the debugger, assuming your system isn't
deadlocking because of the recursive acquisition, by remembering
current and _return_address for each call to spin_lock and
spin_unlock.
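
A minimal user-space sketch of that bookkeeping follows. The names
(owned_spinlock_t, owned_spin_lock, owned_spin_unlock) are
hypothetical, the 'me' argument is a stand-in for the kernel's
'current', and the return address is taken with the GCC/Clang builtin
__builtin_return_address(0), which is an assumption about your
compiler rather than anything kdb provides.

```c
#include <stdatomic.h>
#include <stddef.h>

typedef struct {
    atomic_flag locked;
    const void *owner;    /* stand-in for 'current' of the holder */
    void       *caller;   /* return address of the acquiring call */
} owned_spinlock_t;

#define OWNED_SPINLOCK_INIT { ATOMIC_FLAG_INIT, NULL, NULL }

/* noinline so the recorded return address is that of the real caller. */
__attribute__((noinline))
static void owned_spin_lock(owned_spinlock_t *l, const void *me)
{
    while (atomic_flag_test_and_set_explicit(&l->locked,
                                             memory_order_acquire))
        ;                                   /* spin */

    /* Written only while the lock is held, so the lock protects them. */
    l->owner  = me;
    l->caller = __builtin_return_address(0);
}

static void owned_spin_unlock(owned_spinlock_t *l)
{
    /* Reset the saved state before the lock is released. */
    l->owner  = NULL;
    l->caller = NULL;
    atomic_flag_clear_explicit(&l->locked, memory_order_release);
}
```

After a hang or panic, the owner and caller fields of the lock show
who took it and from where, without any breakpoint in the lock path.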

In fact, if it is simply a recursive spinlock acquisition, you can
enter the debugger when the 'current' value of the process which is
acquiring the lock is the same as the 'current' value of the process
which owns the lock. See the 'KDB_ENTER()' macro in kdb.h. This
would require some trivial modifications to the spin_lock and
spin_unlock code, which may perturb the timing enough to mask the
problem :-(. Note that in v0.6 KDB_ENTER() will be a no-op if
interrupts are disabled.
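
The owner check can be sketched in user-space C as below. Nothing
here is kdb code: enter_debugger() is a hypothetical stand-in for
KDB_ENTER() that only counts calls, 'me' stands in for 'current', and
the lock function returns -1 instead of spinning after the hook fires
so that the sketch can be run without deadlocking.

```c
#include <stdatomic.h>
#include <stdint.h>

/* Hypothetical stand-in for KDB_ENTER(); here it only counts calls. */
static unsigned long debugger_entries;
static void enter_debugger(void) { debugger_entries++; }

typedef struct {
    atomic_flag      locked;
    atomic_uintptr_t owner;     /* 0 when nobody holds the lock */
} checked_spinlock_t;

#define CHECKED_SPINLOCK_INIT { ATOMIC_FLAG_INIT, 0 }

/*
 * Returns 0 once the lock is held, or -1 if the caller already owns it.
 * The unlocked read of 'owner' is safe for this comparison: the field
 * can only equal 'me' if this same caller stored it earlier.
 */
static int checked_spin_lock(checked_spinlock_t *l, uintptr_t me)
{
    if (atomic_load_explicit(&l->owner, memory_order_relaxed) == me) {
        enter_debugger();       /* recursive acquisition detected */
        return -1;
    }
    while (atomic_flag_test_and_set_explicit(&l->locked,
                                             memory_order_acquire))
        ;                       /* spin */
    atomic_store_explicit(&l->owner, me, memory_order_relaxed);
    return 0;
}

static void checked_spin_unlock(checked_spinlock_t *l)
{
    atomic_store_explicit(&l->owner, 0, memory_order_relaxed);
    atomic_flag_clear_explicit(&l->locked, memory_order_release);
}
```

The extra load and store on every lock and unlock are exactly the
kind of change that can shift the timing of the original problem.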

scott
> debugger is keyed off of thread ids, then we can grab that somehow
> and simply add it to the spinlock structure. Saving the state of
> the thread that has the lock and the thread that wants the lock.
> 
> -bmb-
> 
> 
> > -----Original Message-----
> > From: Scott Lurndal [mailto:slurn at nanobiz.com]
> > Sent: Tuesday, July 18, 2000 12:33 PM
> > To: Boerner, Brian
> > Cc: kdb at oss.sgi.com
> > Subject: Re: Debugging Spinlocks on SMP via KDB
> > 
> > 
> > > 
> > > Howdy folks..
> > > 
> > > I'm trying to debug a spinlock problem using kdb v0.6-2.2.13.
> > > Here is a basic summary of the problem.
> > > 
> > > When my driver is compiled as part of the resident kernel (i.e.
> > > not a module) on an SMP box, the same CPU comes along and tries
> > > to acquire a lock it already has. I, of course, panic the system.
> > > My question is more or less a theory question.
> > 
> > If this is a bohr bug (as opposed to a heisenbug), i.e. the bug is
> > reproducible 100% of the time, you can set a breakpoint on the
> > spinlock spin code (i.e. after it has been determined that the
> > caller must spin) and examine the saved state (the thread which
> > owns the lock). You will be able to acquire stack tracebacks for
> > all involved threads, albeit module symbols may be incomplete.
> > 
> > You must use the breakpoint if the spin lock disables interrupts
> > because you will otherwise be unable to enter the kernel debugger,
> > unless your hardware exposes an NMI button.
> > 
> > scott
> > > 
> > > Is it possible to use kdb to look at a given thread at the time
> > > the system panics?
> > > It is my hopes that I can squirrel away the thread of the process
> > > that has the lock and the thread of the process trying to acquire
> > > the lock and look at them in parallel to determine the conditions
> > > under which this is happening.
> > > 
> > > I'm fairly new to kernel level debuggers and any help would be
> > > greatly appreciated.
> > > 
> > > Brian M. Boerner
> > > System Software Developer
> > > Adaptec, Inc.
> > > Nashua, NH 03060
> > > (603) 579-4625
> > > 
> > 
> 



