Suggestion Box + SMP blues

To: "'Matt D. Robinson'" <yakker@xxxxxxxxxxxxxx>, "Schaal, Richard" <richard.schaal@xxxxxxxxx>
Subject: Suggestion Box + SMP blues
From: "Schaal, Richard" <richard.schaal@xxxxxxxxx>
Date: Mon, 17 Sep 2001 17:26:28 -0700
Cc: "'lkcd@xxxxxxxxxxx'" <lkcd@xxxxxxxxxxx>
Sender: owner-lkcd@xxxxxxxxxxx

Hi Matt,

Now that dump recovery seems to be working, I find that my disk fills up
pretty rapidly during development and testing, because the same dump is
recovered more than once. (I dump to a separate device, not to the swap
area.) I wonder whether the dump save step should set some sort of flag
in the dump header on the dump device to mark the dump as stale, so that
saving it again would require a --force flag.
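
To make the suggestion concrete, here is a rough sketch of the check I
have in mind. It is only an illustration in Python, not the actual save
code: the DUMP_SAVED bit, the DumpHeader class and the save_dump()
function are all hypothetical names, and the real header layout is not
modelled.

```python
from dataclasses import dataclass

# Hypothetical flag bit; this is NOT an existing LKCD dump header field.
DUMP_SAVED = 0x1


@dataclass
class DumpHeader:
    """Stand-in for the header stored on the dump device (assumed layout)."""
    flags: int = 0


def save_dump(header: DumpHeader, force: bool = False) -> bool:
    """Return True if the dump was saved, False if it was skipped as stale."""
    if (header.flags & DUMP_SAVED) and not force:
        # The dump was already recovered once; skip it unless forced.
        return False
    # ... the dump would be copied from the dump device to disk here ...
    # Mark the dump stale; in a real tool this flag would have to be
    # written back to the header on the dump device.
    header.flags |= DUMP_SAVED
    return True
```

With this, a second save of the same dump is skipped, and only an
explicit force=True (the --force case) saves it again.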

I seem to be dumping OK on my SMP system when I use a relatively simple
"oops" to trigger a dump for testing. With increased activity and
possibly multiple processors panicking, though, the dump still fails and
the console messages come out pretty well scrambled, apparently because
messages from multiple processors are being intermixed. Here's a
sample:


Red Hat Linux release 7.1 (Seawolf)
Kernel 2.4.8 on an 8-processor i686

dopey login: (scsi0:A:1:0): Locking max tag count at 64
U<n1a<>1b<l1eU> nUna<t<>b11U>nalb<oa<>1e1U bln> l>UethneoaaUU  tnb
tnhblaalbnoe 
de l ltaeehtobaloaon  n  edkdlth e hhrataonod enad llneee lnl kkhhed a el
aereN 
krknnkndeederlUelnlrn neerLNLe e lUnl k Lp ekeNLoleUiernL  lnLpNrtNUUoneee
iLlNl
LL UL nr  tpNLN  pdpUeLoroLLi   ni tpopniao
dneio tntreaLt te ir ndtrtev re e prirdvdo iredee
tefnedrriuarertteenfeelcerf ur
eaarel fre eefddrdeeeraerrernene aenftnecsccd cereseevdire  nre at tca
uetvsaa0l
it  sa0  vr ia00tdvrdt0ua0rut0i0 l0rvt00 ia0ea0rdlu0tdsua s0ard
l0 e
 s0dal s0 pi p0r0:pp
0ri0ni0ent0:s0ts0i0ie
 
s0nn0s g0 p 0g0e0i0e 0s00s0p e000:0000
0ir
00 i
pp0n0tr:000ii
n0n0

c820
npg pr ieren i sipen pttrs:iii pnnr0tg0i0i n
pg:e0ini nd00*ep0 =00c
1001090040619e06010c8
0
0
041908810040g :i p00c
i910118:p1
144:81
*=dppe9c8990*1=eee0 0> *p =
 ==0dp<*ed0p d10e0 e  0==0 =0>0O20c
00
000001101
104100dppd0*d0ep0e
ede  = = =*  =0000P :   C   0*=0 *00p1p00d0e
0 
0>*0=p0d 0d0e0e
:

Oddly enough, if you take every third or fourth character, you can
assemble some of the common error messages. :-)
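
A toy model of that effect, for illustration only: it assumes each
processor gets to write exactly one character in strict rotation, which
is far tidier than what the console really does, and the interleave()
and deinterleave() helpers are made up for this sketch. It does show why
picking every Nth character recovers a message.

```python
def interleave(messages):
    """Merge messages one character at a time, round-robin across CPUs."""
    width = max(len(m) for m in messages)
    # Pad shorter messages so every CPU contributes to every column.
    padded = [m.ljust(width) for m in messages]
    return "".join(ch for column in zip(*padded) for ch in column)


def deinterleave(scrambled, n_cpus, cpu):
    """Recover one CPU's message by taking every n_cpus-th character."""
    return scrambled[cpu::n_cpus].rstrip()


msg = "Unable to handle kernel NULL pointer dereference"
scrambled = interleave([msg] * 4)   # four CPUs printing the same oops text
recovered = deinterleave(scrambled, 4, 0)
```

Here recovered equals msg again. In the real output the rotation is
irregular, which is presumably why only some of the messages can be
pieced together this way.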

I'll take a look at the panic and dump path to see if there's a window
of opportunity for the processors to wander about after a panic.

Regards,
Richard