From owner-lkcd@oss.sgi.com Fri Jun 1 01:10:19 2001
Received: (from majordomo@localhost) by oss.sgi.com (8.11.3/8.11.3) id f518AJs25105 for lkcd-outgoing; Fri, 1 Jun 2001 01:10:19 -0700
Received: from rdgxch03.veritas.com ([62.172.234.2]) by oss.sgi.com (8.11.3/8.11.3) with SMTP id f518AEh25080 for ; Fri, 1 Jun 2001 01:10:14 -0700
Received: by RDGXCH03 with Internet Mail Service (5.5.2653.19) id ; Fri, 1 Jun 2001 09:07:36 +0100
Message-ID: <11BD31824E1DD511BCB000508BB9F75206BFFE@RDGXCH04>
From: Simon Falvey
To: "'lkcd@oss.sgi.com'"
Subject: Problems with 2.4.4 crash dumps
Date: Fri, 1 Jun 2001 09:09:59 +0100
MIME-Version: 1.0
X-Mailer: Internet Mail Service (5.5.2653.19)
Content-Type: text/plain; charset="iso-8859-1"
Sender: owner-lkcd@oss.sgi.com
Precedence: bulk

Hi.

This may be a bit of a newbie question. I have only just discovered lkcd and
I think I am going to get a lot of value out of it in my job (surely not?
work related linux who ever heard of such a thing).

I applied by hand the patch for 2.4.1 to the 2.4.4 kernel running under
RH7.1. I have also placed a hook under the Magic sysrq key sequence hook a
call to panic() on demand (This will be very useful for diagnosing hung
systems).

The system falls into the dump_execute routine which then calls
alloc_kiovec in the section "start walking through the page tables ". The
system then does not return (to dump_execute) from this. On the screen it
reports a BUG in slab.c line 1073 which is a call to BUG(). It then returns
back to panic() (I think) without finishing the dump but it does finish the
panic and reboot the system. Such that..

    /* debug: print markers to trace the problem */
    printk("Marker 1 ");
    /* start walking through the page tables */
    if (alloc_kiovec(1, &dump_iobuf)) {
        printk("Marker 2 ");
        printk("\n" KERN_WARNING
               "alloc_kiovec() failed!");
    } else {
        printk("Marker 4 ");

Neither Marker 2 nor Marker 4 are reached.

I am entirely satisfied that the patch has applied correctly.
The configuration of vmdump is a level 4 with compression to /dev/hda6 (the swap dev). Has anything similar been seen before? Has any one any ideas about the problem? Also any idea when the lkcd code will make the standard linux code base? Thanks Simon Simon Falvey Online Product Support Specialist VERITAS Software UK. Tel: +44 118 918 8105 From owner-lkcd@oss.sgi.com Fri Jun 1 07:54:35 2001 Received: (from majordomo@localhost) by oss.sgi.com (8.11.3/8.11.3) id f51EsZv26339 for lkcd-outgoing; Fri, 1 Jun 2001 07:54:35 -0700 Received: from ausmtp02.au.ibm.com (ausmtp02.au.ibm.COM [202.135.136.105]) by oss.sgi.com (8.11.3/8.11.3) with SMTP id f51EsXh26330 for ; Fri, 1 Jun 2001 07:54:33 -0700 Received: from f03n07e.au.ibm.com by ausmtp02.au.ibm.com (IBM AP 1.0) with ESMTP id AAA154428 for ; Sat, 2 Jun 2001 00:43:57 +1000 From: bsuparna@in.ibm.com Received: from d73mta01.au.ibm.com (f06n01s [9.185.166.65]) by f03n07e.au.ibm.com (8.8.8m3/NCO v4.96) with SMTP id AAA14618 for ; Sat, 2 Jun 2001 00:53:37 +1000 Received: by d73mta01.au.ibm.com(Lotus SMTP MTA v4.6.5 (863.2 5-20-1999)) id CA256A5E.0051CD99 ; Sat, 2 Jun 2001 00:53:30 +1000 X-Lotus-FromDomain: IBMIN@IBMAU To: lkcd@oss.sgi.com Message-ID: Date: Fri, 1 Jun 2001 20:14:07 +0530 Subject: Possibility of system state changes during wait_kio ? Mime-Version: 1.0 Content-type: text/plain; charset=us-ascii Content-Disposition: inline Sender: owner-lkcd@oss.sgi.com Precedence: bulk Hello, Looking at the vmdump code, here is something that puzzles me. I'm not sure if I'm missing something obvious here. Since right now dump involves wait_kio calls, which involves a context switch to another runnable process, isn't there a chance of the memory state changing whilst the dump is going on. Couldn't the dump become inconsistent, or not correctly reflect the state of the system when the incident that triggered the dump happened ? (Since interrupts aren't turned off, even that could affect the state ... 
but to a lesser extent, I guess) I had actually started with looking into the smp_send_stop issue and the more generic issue of getting a consistent system snapshot (as accurately reflecting the state at the time of the system crash as possible), when this question came to mind. BTW, is there some work going on in this area ? Or have the issues been sorted out already ? Matt you had mentioned that you were working on a specialized IDE driver for dump, to avoid having to go through the normal kio/raw i/o path in the kernel. Is that still in the plan ? Regards Suparna Suparna Bhattacharya IBM Software Lab, India E-mail : bsuparna@in.ibm.com Phone : 91-80-5267117, Extn : 2525 From owner-lkcd@oss.sgi.com Fri Jun 1 12:40:39 2001 Received: (from majordomo@localhost) by oss.sgi.com (8.11.3/8.11.3) id f51JedL07116 for lkcd-outgoing; Fri, 1 Jun 2001 12:40:39 -0700 Received: from smtp.alacritech.com (smtp.alacritech.com [209.10.208.82]) by oss.sgi.com (8.11.3/8.11.3) with SMTP id f51Jech07113 for ; Fri, 1 Jun 2001 12:40:38 -0700 Received: from alacritech.com (lambda.alacritech.com [10.1.1.32]) by smtp.alacritech.com (8.11.0/8.11.0) with ESMTP id f51Jc7K17952; Fri, 1 Jun 2001 12:38:07 -0700 Message-ID: <3B17EF2D.376E64DA@alacritech.com> Date: Fri, 01 Jun 2001 12:38:21 -0700 From: "Matt D. Robinson" Organization: Alacritech, Inc. X-Mailer: Mozilla 4.76 [en] (X11; U; Linux 2.2.17-14 i686) X-Accept-Language: en MIME-Version: 1.0 To: "Howell, David P" CC: Dave Craft , lkcd@oss.sgi.com Subject: Re: LKCD 3.1.2 released ... References: <10C8636AE359D4119118009027AE998705EA3FB1@FMSMSX34> Content-Type: text/plain; charset=us-ascii Content-Transfer-Encoding: 7bit Sender: owner-lkcd@oss.sgi.com Precedence: bulk "Howell, David P" wrote: > > Matt, > Thanks for the workaround, I stubbed out the smp_send_stop() calls in > panic() > and trap() and now I'm getting a full dump. Any word on when the official > fix > for RH 7.1 will be out? 
I've checked the LKCD page for updates a few times > and > didn't yet see an update posted. I'm finishing up the conflicts to traps.c, then I'll release it. I'll try to have it tonight. BTW, anyone still using linux-2.4.0 or linux-2.4.1 trees? If so, I'll keep those up to date with the latest set of patches. Otherwise, at some point I'm going to drop them. --Matt > > Regards, > Dave Howell > > -----Original Message----- > From: Matt D. Robinson [mailto:yakker@alacritech.com] > Sent: Thursday, May 17, 2001 2:20 PM > To: Dave Craft > Cc: lkcd@oss.sgi.com > Subject: Re: LKCD 3.1.2 released ... > > Dave Craft wrote: > > > > Any more info on timeline for availability of RH 7.1 > > diffs? The diff version I've produced (based off the > > official 2.4.1 kernel diff) is hanging right after attempting > > to dump the header. Probably something I've done but > > I won't spend the time debugging it if you've got something > > official coming out shortly. > > > > Regards, > > dave > > I'm working on it. I just got 7.1 installed. To get this > to work, you just have to turn off smp_send_stop() in both > panic.c and arch/i386/kernel/traps.c (the places where the > patch adds the code). > > I'm considering someone's workaround, or just to comment out > the disable_local_APIC(). Not sure what the right thing to > do is yet. > > --Matt > > > > > > >"Matt D. Robinson" wrote: > > >> > > >> The latest LKCD update is available. The only > > >> changes are to the lkcdutils RPM for x86. I'm still > > >> testing 2.4.3 kernels (and RH7.1), once that's done, > > >> I'll spin put out the kernel patches in a 3.1.3 > > >> release. > > >> > > >> I'm trying to address the smp_send_stop() issue, so > > >> please bear with me. > > >> > > >> --Matt > > > > > >For those of you that downloaded lkcdutils-1.0-3 ... 
> > >Please get lkcdutils-1.0-4, there's one additional fix > > >for reverse wraparound in some graphics screens, and > > >also I've fixed a spec file error for those people > > >trying to do 'rpm --rebuild'. > > > > > >--Matt > > > > > > > -- > > Mail : dave@austin.ibm.com Phone : 512-838-8248 > > I am Jack's email closing From owner-lkcd@oss.sgi.com Fri Jun 1 16:00:47 2001 Received: (from majordomo@localhost) by oss.sgi.com (8.11.3/8.11.3) id f51N0l003408 for lkcd-outgoing; Fri, 1 Jun 2001 16:00:47 -0700 Received: from dreamintek.com ([211.181.154.135]) by oss.sgi.com (8.11.3/8.11.3) with SMTP id f51N0jh03405 for ; Fri, 1 Jun 2001 16:00:46 -0700 Received: (qmail 6145 invoked from network); 1 Jun 2001 22:56:49 -0000 Received: from unknown (HELO tachoi) (211.181.154.245) by 211.181.154.135 with SMTP; 1 Jun 2001 22:56:49 -0000 Message-ID: <001a01c0eaee$86aee9e0$f59ab5d3@dreamintek.com> From: =?ks_c_5601-1987?B?w9bFwr7P?= To: Subject: lkcd has bug ? Date: Sat, 2 Jun 2001 07:59:34 +0900 MIME-Version: 1.0 Content-Type: multipart/alternative; boundary="----=_NextPart_000_0017_01C0EB39.F67E27E0" X-Priority: 3 X-MSMail-Priority: Normal X-Mailer: Microsoft Outlook Express 5.50.4133.2400 X-MimeOLE: Produced By Microsoft MimeOLE V5.50.4133.2400 Sender: owner-lkcd@oss.sgi.com Precedence: bulk This is a multi-part message in MIME format. 
Hi

I installed lkcd at kernel version 2.4.1 and 2.4.5.
Then, I made kernel panic. lkcd tryed to dump core, but made a problem.

The last message was
"Dumping to device 0x303 [ide0(3,3)] ...kernel BUG at slab.c:1095"
System was halted. I had to push reset button.

What is the problem?

Thanks

Tae Am CHOI

From owner-lkcd@oss.sgi.com Mon Jun 4 00:35:09 2001
Received: (from majordomo@localhost) by oss.sgi.com (8.11.3/8.11.3) id f547Z9F17072 for lkcd-outgoing; Mon, 4 Jun 2001 00:35:09 -0700
Received: from nakedeye.aparity.com (w032.z064001165.sjc-ca.dsl.cnc.net [64.1.165.32]) by oss.sgi.com (8.11.3/8.11.3) with SMTP id f547Z8h17068 for ; Mon, 4 Jun 2001 00:35:08 -0700
Received: from localhost (yakker@localhost) by nakedeye.aparity.com (8.11.2/8.11.2) with ESMTP id f547aWh25969; Mon, 4 Jun 2001 00:36:32 -0700
Date: Mon, 4 Jun 2001 00:36:32 -0700 (PDT)
From: "Matt D. Robinson"
To:
cc: Matt Robinson ,
Subject: Re: Problems with 2.4.4 crash dumps
Message-ID:
MIME-Version: 1.0
Content-Type: TEXT/PLAIN; charset=US-ASCII
Sender: owner-lkcd@oss.sgi.com
Precedence: bulk

Simon Falvey wrote:
> Hi.
>
> This may be a bit of a newbie question. I have only just discovered lkcd and
> I think I am going to get a lot of value out of it in my job (surely not?
> work related linux who ever heard of such a thing).

If you're in support, it can be useful ... :)

> I applied by hand the patch for 2.4.1 to the 2.4.4 kernel running under
> RH7.1. I have also placed a hook under the Magic sysrq key sequence hook a
> call to panic() on demand (This will be very useful for diagnosing hung
> systems).
>
> The system falls into the dump_execute routine which then calls
> alloc_kiovec in the section "start walking through the page tables ". The
> system then does not return (to dump_execute) from this. On the screen it
> reports a BUG in slab.c line 1073 which is a call to BUG(). It then returns
> back to panic() (I think) without finishing the dump but it does finish the
> panic and reboot the system. Such that..
> > /* debug: print markers to trace the problem */ > printk("Marker 1 "); > /* start walking through the page tables */ > if (alloc_kiovec(1, &dump_iobuf)) { > printk("Marker 2 "); > printk("\n" KERN_WARNING > "alloc_kiovec() failed!"); > } else { > printk("Marker 4 "); > > Neither Marker 2 nor Marker 4 are reached. Two things changed - the alloc_kiovec() functionality changed between 2.4.1 and 2.4.4. Since I've been porting the patches forward, I'm going to move the alloc_kiovec() code up into the dump_open() function so this problem doesn't occur again. That way you don't have to worry about the changes to the kiovec code. > I am entirely satisfied that the patch has applied correctly. The > configuration of vmdump is a level 4 with compression to /dev/hda6 (the swap > dev). > > Has anything similar been seen before? Has any one any ideas about the > problem? > Also any idea when the lkcd code will make the standard linux code base? I haven't seen it before, but then again, I haven't tried this on 2.4.4 (yet). I've built a 2.4.2 patch, and I'm building a kernel now. If this patch works, I'll put these on oss.sgi.com now. > Thanks > > Simon I hope this helps. More later tonight ... > Simon Falvey > Online Product Support Specialist > VERITAS Software UK. > Tel: +44 118 918 8105 --Matt From owner-lkcd@oss.sgi.com Mon Jun 4 00:49:50 2001 Received: (from majordomo@localhost) by oss.sgi.com (8.11.3/8.11.3) id f547no118817 for lkcd-outgoing; Mon, 4 Jun 2001 00:49:50 -0700 Received: from nakedeye.aparity.com (w032.z064001165.sjc-ca.dsl.cnc.net [64.1.165.32]) by oss.sgi.com (8.11.3/8.11.3) with SMTP id f547nmh18810 for ; Mon, 4 Jun 2001 00:49:48 -0700 Received: from localhost (yakker@localhost) by nakedeye.aparity.com (8.11.2/8.11.2) with ESMTP id f547pDq26028; Mon, 4 Jun 2001 00:51:13 -0700 Date: Mon, 4 Jun 2001 00:51:13 -0700 (PDT) From: "Matt D. 
Robinson" To: cc: , Matt Robinson Message-ID: MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII Sender: owner-lkcd@oss.sgi.com Precedence: bulk > Hello, > > Looking at the vmdump code, here is something that puzzles me. > I'm not sure if I'm missing something obvious here. > > Since right now dump involves wait_kio calls, which involves a context > switch to another runnable process, isn't there a chance of the memory > state changing whilst the dump is going on. Couldn't the dump become > inconsistent, or not correctly reflect the state of the system when the > incident that triggered the dump happened ? (Since interrupts aren't turned > off, even that could affect the state ... but to a lesser extent, I guess) Yes -- the whole point behind adding smp_send_stop() into the panic() and die_if_kernel() mechanism was to avoid having other processes run while the dump was taking place. I didn't see a good hook in the scheduler to say, "Okay, hold off, don't run any other jobs except mine", and putting smp_send_stop() into place messes up both x86 and ia64 systems, due to the local APIC being disabled (meaning, if your system crashes on a CPU other than 0, you're toast). This leads to the second problem -- even if you do stop all other system processes and are able to disable interrupts to most devices, you can't write out to disk in a "raw" fashion. Kiobufs are a hack at best as far as raw I/O is concerned. It's just a page grouping mechanism for good s/g stuff, IMHO. Linux is immature as far as raw device output is concerned. > I had actually started with looking into the smp_send_stop issue and the > more generic issue of getting a consistent system snapshot (as accurately > reflecting the state at the time of the system crash as possible), when > this question came to mind. BTW, is there some work going on in this area ? > Or have the issues been sorted out already ? 
There are two ways to do this:

  1) Stop all system activity, shut down interrupts as much as
     possible, and dump all of memory to disk.

  2) Stop the system immediately, reset the system, and on the way
     back up, early in the boot process, dump the memory to disk
     either at BIOS or in the setup of the kernel.

Both mechanisms have their problems in Linux. I don't like the second
solution, because many systems (most, in fact) do not preserve memory
state between system resets. The first solution is as close as I can get
at this point to saving the memory dump accurately, and even with that,
we can have problems in some circumstances. For example, what if you
crash in a disk interrupt handler?

> Matt you had mentioned that you were working on a specialized IDE driver
> for dump, to avoid having to go through the normal kio/raw i/o path in the
> kernel. Is that still in the plan ?

Yes, although I sent it off to Andre Hedrick, and he sent me a
single-line response saying (basically), "Why would you ever want to do
that?" Needless to say, it wasn't very encouraging. I have it, it's
written, and it works on my disk here in the office, but I haven't tried
it on multiple IDE disks, or different IDE controllers, etc. It's
basically untested outside my office. :)

The best solution (IMHO) is to create:

  raw device table
  raw device handles (open(), read(), write(), etc.)
  disk device driver handles (ide_rw_open(), etc.)

Right now, ide-disk.c interfacing to ide.c is horrific. Putting in my
raw disk mechanism is, again, doing things in a non-elegant way, but it
does get the job done.

Anyway, I have something, you're more than welcome to look it over and
tell me what you think. I was hoping to get Andre's impression on
things, but given the way the kernel development has been going on
lately, I'm never sure what's going to get in.
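
As a rough illustration of the three layers listed above (raw device
table, raw device handles, driver handles), here is a user-space sketch
in C. It is not the IDE dump driver mentioned in this message and it is
not any kernel API: raw_dev_ops, raw_dev_register(), raw_dev_lookup(),
raw_dump_sector() and the mock_ide_rw_*() functions are all made-up
names, and the "disk" is just an in-memory array. The only detail taken
from this list is the device-number split (0x303 is major 3, minor 3).

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define SECTOR_SIZE  512
#define RAW_DEV_MAX  8
#define MOCK_SECTORS 4

/* Hypothetical raw device handles: polled I/O, no scheduler involved. */
struct raw_dev_ops {
    int  (*open)(unsigned int dev);
    long (*write)(unsigned int dev, unsigned long sector,
                  const void *buf, size_t len);
    void (*close)(unsigned int dev);
};

/* Hypothetical raw device table: one slot per registered block major. */
struct raw_dev_entry {
    unsigned int major;
    const struct raw_dev_ops *ops;
};

static struct raw_dev_entry raw_dev_table[RAW_DEV_MAX];
static size_t raw_dev_count;

/* A disk driver registers its handles once; returns 0, or -1 on error. */
int raw_dev_register(unsigned int major, const struct raw_dev_ops *ops)
{
    if (ops == NULL || raw_dev_count >= RAW_DEV_MAX)
        return -1;
    raw_dev_table[raw_dev_count].major = major;
    raw_dev_table[raw_dev_count].ops = ops;
    raw_dev_count++;
    return 0;
}

/* Find the handles for a device number such as 0x303 (major 3, minor 3). */
const struct raw_dev_ops *raw_dev_lookup(unsigned int dev)
{
    unsigned int major = dev >> 8;
    size_t i;

    for (i = 0; i < raw_dev_count; i++)
        if (raw_dev_table[i].major == major)
            return raw_dev_table[i].ops;
    return NULL;
}

/* Mock "IDE" driver handles: an in-memory disk stands in for hardware. */
static unsigned char mock_disk[MOCK_SECTORS][SECTOR_SIZE];

static int mock_ide_rw_open(unsigned int dev) { (void)dev; return 0; }
static void mock_ide_rw_close(unsigned int dev) { (void)dev; }

static long mock_ide_rw_write(unsigned int dev, unsigned long sector,
                              const void *buf, size_t len)
{
    (void)dev;
    if (len != SECTOR_SIZE || sector >= MOCK_SECTORS)
        return -1;
    memcpy(mock_disk[sector], buf, len);
    return (long)len;
}

static const struct raw_dev_ops mock_ide_ops = {
    mock_ide_rw_open, mock_ide_rw_write, mock_ide_rw_close
};

/* Dump path: write one sector via the table; bytes written, or -1. */
long raw_dump_sector(unsigned int dev, unsigned long sector, const void *buf)
{
    const struct raw_dev_ops *ops = raw_dev_lookup(dev);
    long n;

    assert(buf != NULL);
    if (ops == NULL || ops->open(dev) != 0)
        return -1;
    n = ops->write(dev, sector, buf, SECTOR_SIZE);
    ops->close(dev);
    return n;
}

/* Usage: register the mock driver, then dump one sector to dev 0x303. */
long raw_dump_demo(void)
{
    unsigned char page[SECTOR_SIZE];

    memset(page, 0xAB, sizeof page);
    if (raw_dev_register(3, &mock_ide_ops) != 0)
        return -1;
    return raw_dump_sector(0x303, 1, page);
}
```

The point of the table is only that the dump path calls straight into
driver-supplied functions and never sleeps; a real version would also
have to deal with interrupts, locking and controller state, which this
sketch ignores.
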
> Regards
> Suparna

--Matt

From owner-lkcd@oss.sgi.com Mon Jun 4 09:46:24 2001
Received: (from majordomo@localhost) by oss.sgi.com (8.11.3/8.11.3) id f54GkOl10501 for lkcd-outgoing; Mon, 4 Jun 2001 09:46:24 -0700
Received: from smtp.alacritech.com (smtp.alacritech.com [209.10.208.82]) by oss.sgi.com (8.11.3/8.11.3) with SMTP id f54GkOh10498 for ; Mon, 4 Jun 2001 09:46:24 -0700
Received: from alacritech.com (lambda.alacritech.com [10.1.1.32]) by smtp.alacritech.com (8.11.0/8.11.0) with ESMTP id f54GhnK31214; Mon, 4 Jun 2001 09:43:53 -0700
Message-ID: <3B1BBA7F.17389309@alacritech.com>
Date: Mon, 04 Jun 2001 09:42:39 -0700
From: "Matt D. Robinson"
Organization: Alacritech, Inc.
X-Mailer: Mozilla 4.76 [en] (X11; U; Linux 2.2.17-14 i686)
X-Accept-Language: en
MIME-Version: 1.0
To: =?iso-8859-1?Q?=C3=D6=C5=C2=BE=CF?=
CC: lkcd@oss.sgi.com
Subject: Re: lkcd has bug ?
References: <001a01c0eaee$86aee9e0$f59ab5d3@dreamintek.com>
Content-Type: text/plain; charset=iso-8859-1
X-MIME-Autoconverted: from 8bit to quoted-printable by smtp.alacritech.com id f54GhnK31214
Content-Transfer-Encoding: 8bit
X-MIME-Autoconverted: from quoted-printable to 8bit by oss.sgi.com id f54GkOh10499
Sender: owner-lkcd@oss.sgi.com
Precedence: bulk

> Tae Am CHOI wrote:
>
> Hi
>
> I installed lkcd at kernel version 2.4.1 and 2.4.5.
> Then, I made kernel panic. lkcd tryed to dump core, but made a problem.
>
> The last message was
> "Dumping to device 0x303 [ide0(3,3)] ...kernel BUG at slab.c:1095"
> System was halted. I had to push reset button.
>
> What is the problem?
>
> Thanks
>
> Tae Am CHOI

This is most likely due to the free_kiovec() call in
drivers/block/vmdump.c. This by itself can cause problems when you exit
out. Comment out that function call, and you should be okay.

Note that you might have problems with 2.4.5 -- I've tested most of them
this morning, and panic()s work fine for dumping and analyzing, but
analyzing doesn't work on Oops cases.
I'm going to upload the patches to oss.sgi.com now, and I'll fix the
lkcdutils RPM next.

--Matt

From owner-lkcd@oss.sgi.com Mon Jun 4 23:40:16 2001
Received: (from majordomo@localhost) by oss.sgi.com (8.11.3/8.11.3) id f556eG718524 for lkcd-outgoing; Mon, 4 Jun 2001 23:40:16 -0700
Received: from nakedeye.aparity.com (w032.z064001165.sjc-ca.dsl.cnc.net [64.1.165.32]) by oss.sgi.com (8.11.3/8.11.3) with SMTP id f556eFh18521 for ; Mon, 4 Jun 2001 23:40:16 -0700
Received: from alacritech.com (localhost.localdomain [127.0.0.1]) by nakedeye.aparity.com (8.11.2/8.11.2) with ESMTP id f556fqg01258 for ; Mon, 4 Jun 2001 23:41:52 -0700
Message-ID: <3B1C7F30.B8660268@alacritech.com>
Date: Mon, 04 Jun 2001 23:41:52 -0700
From: "Matt D. Robinson"
Organization: Alacritech, Inc.
X-Mailer: Mozilla 4.76 [en] (X11; U; Linux 2.4.2-2smp i686)
X-Accept-Language: en
MIME-Version: 1.0
To: lkcd@oss.sgi.com
Subject: LKCD 3.1.3 available ...
Content-Type: text/plain; charset=us-ascii
Content-Transfer-Encoding: 7bit
Sender: owner-lkcd@oss.sgi.com
Precedence: bulk

I've created patches for:

  linux-2.4.2
  linux-2.4.3
  linux-2.4.4

I haven't tested against RH7.1 trees yet (for linux-2.4.2-2), but I have
tried out the linux-2.4.4 tree against a RH 7.1 installed system, and it
died in 'kudzu', believe it or not (got a great dump, too ...) So the
patches should work just fine. In any case, you can get it from:

  http://oss.sgi.com/projects/lkcd/download/current/v2.4-kernel-patches

Note I didn't fix the 3.1.3/README file yet, and I can't fix it until
tomorrow. Let me know if you have any problems.

Fixed issues:

  Moved alloc_kiovec() from dump_execute() to dump_open()
  (this should be sufficient for now -- although highmem
  is still a concern)

  Removed all smp_send_stop() calls. Sure, for i386, you can
  leave it in, but it was breaking ia64. So I decided to remove
  it until a more complete solution is found, even if it is
  architecture-dependent.

Thanks, all.
This should correct most of the little annoyances people were seeing with the 2.4.[234] series. BTW, is it just me, or is linux-2.4.5 broken in general? There were lots of problems with just building the tree, much less running against it. --Matt P.S. If people need a linux-2.4.0 or linux-2.4.1 diff, let me know. I didn't hear back from anyone. From owner-lkcd@oss.sgi.com Tue Jun 5 14:47:25 2001 Received: (from majordomo@localhost) by oss.sgi.com (8.11.3/8.11.3) id f55LlPw07049 for lkcd-outgoing; Tue, 5 Jun 2001 14:47:25 -0700 Received: from mx.webfountain.com (mx.digitalfountain.com [209.219.233.39]) by oss.sgi.com (8.11.3/8.11.3) with SMTP id f55LlMh07029 for ; Tue, 5 Jun 2001 14:47:22 -0700 Received: (qmail 27651 invoked from network); 5 Jun 2001 21:47:13 -0000 Received: from mail.intranet (10.1.1.37) by mx.digitalfountain.com with SMTP; 5 Jun 2001 21:47:13 -0000 Received: from belarus (belarus.intranet [10.1.3.53]) by mail.intranet (8.9.3/8.9.3) with SMTP id OAA29959; Tue, 5 Jun 2001 14:46:47 -0700 X-Authentication-Warning: mail.intranet: Host belarus.intranet [10.1.3.53] claimed to be belarus From: "Michael Walfish" To: "Matt D. Robinson" , Cc: "Yoel Inbar" Subject: RE: LKCD 3.1.3 available ... Date: Tue, 5 Jun 2001 14:59:01 -0700 Message-ID: MIME-Version: 1.0 Content-Type: text/plain; charset="us-ascii" Content-Transfer-Encoding: 7bit X-Priority: 3 (Normal) X-MSMail-Priority: Normal X-Mailer: Microsoft Outlook IMO, Build 9.0.2416 (9.0.2911.0) X-MimeOLE: Produced By Microsoft MimeOLE V5.50.4133.2400 Importance: Normal In-Reply-To: <3B1C7F30.B8660268@alacritech.com> Sender: owner-lkcd@oss.sgi.com Precedence: bulk Matt D. Robinson wrote: > Let me know if you have any problems. Hi Matt, We're big fans of lkcd (so far it's been really easy to use and understand). Any help you can provide is greatly appreciated. I applied the 2.4.2 patch this morning. 
Here are some observations: (dmesg and relevant parts of our .config at the end of this mail) 1) no problems when I force a kernel crash inside a user process 2) when I force a crash inside an interrupt (for a device driver), I get the following, in order: -->standard oops message -->interesting message, below -->another oops for the second CPU -->system reset, presumably driven by the lkcd patch Thanks again, Mike Walfish ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++ +++ Dumping to device 0x831 [sd(8,49)] ... Writing dump header ...Scheduling in interrupt kernel BUG at sched.c:681! wait_on_irq, CPU 1: irq: 1 [ 1 0 ] bh: 1 [ 0 1 ] Stack dumps: CPU 0: CPU 1:e7ff7ebc c0204a93 00000001 00000020 00000000 c010a56d c0204aa8 e7d2d9a0 00000003 00000001 c0179e7a 00000000 c0179e44 00000000 c02aac60 c011ddb5 00000000 00000000 00000020 00000000 c02aac60 e7d2db1c c0179df2 c025c84c Call Trace: [] [] [] [] [] [] [] [] [] [] [] [] [] [] [] [] [] [] [] ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++ ++++ ---------------------------------------------------------------------------- ----- CONFIG_X86=y CONFIG_ISA=y CONFIG_UID16=y CONFIG_MODULES=y CONFIG_KMOD=y CONFIG_MPENTIUMIII=y CONFIG_X86_WP_WORKS_OK=y CONFIG_X86_INVLPG=y CONFIG_X86_CMPXCHG=y CONFIG_X86_BSWAP=y CONFIG_X86_POPAD_OK=y CONFIG_X86_TSC=y CONFIG_X86_GOOD_APIC=y CONFIG_X86_PGE=y CONFIG_X86_USE_PPRO_CHECKSUM=y CONFIG_X86_MSR=y CONFIG_X86_CPUID=y CONFIG_NOHIGHMEM=y CONFIG_MTRR=y CONFIG_SMP=y CONFIG_HAVE_DEC_LOCK=y CONFIG_NET=y CONFIG_X86_IO_APIC=y CONFIG_X86_LOCAL_APIC=y CONFIG_PCI=y CONFIG_PCI_GOANY=y CONFIG_PCI_BIOS=y CONFIG_PCI_DIRECT=y CONFIG_PCI_NAMES=y CONFIG_HOTPLUG=y CONFIG_SYSVIPC=y CONFIG_SYSCTL=y CONFIG_KCORE_ELF=y CONFIG_BINFMT_AOUT=y CONFIG_BINFMT_ELF=y CONFIG_BINFMT_MISC=y CONFIG_PM=y CONFIG_PNP=y CONFIG_ISAPNP=y CONFIG_BLK_DEV_FD=y CONFIG_BLK_CPQ_DA=y CONFIG_BLK_DEV_LOOP=m CONFIG_MD=y CONFIG_BLK_DEV_MD=y CONFIG_MD_RAID0=y CONFIG_PACKET=y 
CONFIG_NETFILTER=y CONFIG_UNIX=y CONFIG_INET=y CONFIG_IP_MULTICAST=y CONFIG_IP_NF_CONNTRACK=m CONFIG_IP_NF_FTP=m CONFIG_IP_NF_IPTABLES=m CONFIG_IP_NF_MATCH_LIMIT=m CONFIG_IP_NF_MATCH_MAC=m CONFIG_IP_NF_MATCH_MARK=m CONFIG_IP_NF_MATCH_MULTIPORT=m CONFIG_IP_NF_MATCH_TOS=m CONFIG_IP_NF_MATCH_STATE=m CONFIG_IP_NF_FILTER=m CONFIG_IP_NF_TARGET_REJECT=m CONFIG_IP_NF_NAT=m CONFIG_IP_NF_NAT_NEEDED=y CONFIG_IP_NF_TARGET_MASQUERADE=m CONFIG_IP_NF_TARGET_REDIRECT=m CONFIG_IP_NF_NAT_FTP=m CONFIG_IP_NF_MANGLE=m CONFIG_IP_NF_TARGET_TOS=m CONFIG_IP_NF_TARGET_MARK=m CONFIG_IP_NF_TARGET_LOG=m CONFIG_IP_NF_COMPAT_IPCHAINS=m CONFIG_IP_NF_NAT_NEEDED=y CONFIG_IDE=y CONFIG_BLK_DEV_IDE=y CONFIG_BLK_DEV_IDEDISK=y CONFIG_BLK_DEV_IDECD=y CONFIG_BLK_DEV_CMD640=y CONFIG_BLK_DEV_RZ1000=y CONFIG_BLK_DEV_IDEPCI=y CONFIG_IDEPCI_SHARE_IRQ=y CONFIG_BLK_DEV_IDE_MODES=y CONFIG_SCSI=y CONFIG_BLK_DEV_SD=y CONFIG_SCSI_DEBUG_QUEUES=y CONFIG_SCSI_MULTI_LUN=y CONFIG_SCSI_CONSTANTS=y CONFIG_SCSI_AIC7XXX=y CONFIG_AIC7XXX_TCQ_ON_BY_DEFAULT=y CONFIG_SCSI_SYM53C8XX=y CONFIG_NETDEVICES=y CONFIG_DUMMY=m CONFIG_NET_ETHERNET=y CONFIG_NET_PCI=y CONFIG_EEPRO100=m CONFIG_VT=y CONFIG_VT_CONSOLE=y CONFIG_SERIAL=y CONFIG_SERIAL_CONSOLE=y CONFIG_UNIX98_PTYS=y CONFIG_MOUSE=y CONFIG_PSMOUSE=y CONFIG_AUTOFS_FS=m CONFIG_AUTOFS4_FS=m CONFIG_FAT_FS=m CONFIG_MSDOS_FS=m CONFIG_VFAT_FS=m CONFIG_ISO9660_FS=m CONFIG_JOLIET=y CONFIG_NTFS_FS=m CONFIG_PROC_FS=y CONFIG_DEVPTS_FS=y CONFIG_EXT2_FS=y CONFIG_NFS_FS=m CONFIG_NFS_V3=y CONFIG_NFSD=m CONFIG_NFSD_V3=y CONFIG_SUNRPC=m CONFIG_LOCKD=m CONFIG_LOCKD_V4=y CONFIG_SMB_FS=m CONFIG_MSDOS_PARTITION=y CONFIG_SMB_NLS=y CONFIG_NLS=y CONFIG_NLS_CODEPAGE_437=m CONFIG_NLS_ISO8859_1=m CONFIG_VGA_CONSOLE=y CONFIG_VMDUMP=y ---------------------------------------------------------------------------- ----- PCI: Probing PCI hardware PCI: Discovered peer bus 03 PCI: Device 00:00 not found by BIOS PCI: Device 00:01 not found by BIOS PCI: Device 00:78 not found by BIOS isapnp: Scanning for Pnp cards... 
isapnp: No Plug & Play device found Linux NET4.0 for Linux 2.4 Based upon Swansea University Computer Society NET3.039 Starting kswapd v1.8 pty: 256 Unix98 ptys configured block: queued sectors max/low 426005kB/294933kB, 1280 slots per queue Uniform Multi-Platform E-IDE driver Revision: 6.31 ide: Assuming 33MHz system bus speed for PIO modes; override with idebus=xx ServerWorks OSB4: IDE controller on PCI bus 00 dev 79 ServerWorks OSB4: chipset revision 0 ServerWorks OSB4: not 100% native mode: will probe irqs later hda: Compaq CRN-8241B, ATAPI CD/DVD-ROM drive ide0 at 0x1f0-0x1f7,0x3f6 on irq 14 hda: ATAPI 24X CD-ROM drive, 128kB Cache Uniform CD-ROM driver Revision: 3.12 Floppy drive(s): fd0 is 1.44M FDC 0 is a National Semiconductor PC87306 Serial driver version 5.02 (2000-08-09) with MANY_PORTS SHARE_IRQ SERIAL_PCI ISAPNP enabled ttyS00 at 0x03f8 (irq = 4) is a 16550A ttyS01 at 0x02f8 (irq = 3) is a 16550A SCSI subsystem driver Revision: 1.00 sym53c8xx: at PCI bus 0, device 1, function 0 sym53c8xx: setting PCI_COMMAND_PARITY...(fix-up) sym53c8xx: 53c1510D detected sym53c8xx: at PCI bus 0, device 1, function 1 sym53c8xx: setting PCI_COMMAND_PARITY...(fix-up) sym53c8xx: 53c1510D detected sym53c1510D-0: rev 0x2 on pci bus 0 device 1 function 0 irq 11 sym53c1510D-0: ID 7, Fast-40, Parity Checking sym53c1510D-0: on-chip RAM at 0xc3efe000 sym53c1510D-0: restart (scsi reset). sym53c1510D-0: Downloading SCSI SCRIPTS. sym53c1510D-1: rev 0x2 on pci bus 0 device 1 function 1 irq 11 sym53c1510D-1: ID 7, Fast-40, Parity Checking sym53c1510D-1: on-chip RAM at 0xc3efc000 sym53c1510D-1: restart (scsi reset). sym53c1510D-1: Downloading SCSI SCRIPTS. 
scsi0 : sym53c8xx - version 1.6b scsi1 : sym53c8xx - version 1.6b Vendor: COMPAQ Model: BD0186398C Rev: BC1P Type: Direct-Access ANSI SCSI revision: 02 Vendor: COMPAQ Model: BD0186398C Rev: BC1P Type: Direct-Access ANSI SCSI revision: 02 Vendor: COMPAQ Model: BD0186398C Rev: BC1P Type: Direct-Access ANSI SCSI revision: 02 Vendor: COMPAQ Model: BD0186398C Rev: BC1P Type: Direct-Access ANSI SCSI revision: 02 sym53c1510D-1-<0,0>: tagged command queue depth set to 32 sym53c1510D-1-<1,0>: tagged command queue depth set to 32 sym53c1510D-1-<2,0>: tagged command queue depth set to 32 sym53c1510D-1-<3,0>: tagged command queue depth set to 32 Detected scsi disk sda at scsi1, channel 0, id 0, lun 0 Detected scsi disk sdb at scsi1, channel 0, id 1, lun 0 Detected scsi disk sdc at scsi1, channel 0, id 2, lun 0 Detected scsi disk sdd at scsi1, channel 0, id 3, lun 0 sym53c1510D-1-<0,0>: wide msgout: 1-2-3-1. sym53c1510D-1-<0,0>: wide msgin: 1-2-3-1. sym53c1510D-1-<0,0>: wide: wide=1 chg=0. sym53c1510D-1-<0,0>: wide msgout: 1-2-3-1. sym53c1510D-1-<0,0>: wide msgin: 1-2-3-1. sym53c1510D-1-<0,0>: wide: wide=1 chg=0. sym53c1510D-1-<0,0>: sync msgout: 1-3-1-a-1f. sym53c1510D-1-<0,0>: sync msg in: 1-3-1-a-1f. sym53c1510D-1-<0,0>: sync: per=10 scntl3=0x90 scntl4=0x0 ofs=31 fak=0 chg=0. sym53c1510D-1-<0,*>: FAST-40 WIDE SCSI 80.0 MB/s (25 ns, offset 31) SCSI device sda: 35565080 512-byte hdwr sectors (18209 MB) Partition check: sda: sda1 sda2 sda3 sda4 < sda5 sda6 sda7 > sym53c1510D-1-<1,0>: wide msgout: 1-2-3-1. sym53c1510D-1-<1,0>: wide msgin: 1-2-3-1. sym53c1510D-1-<1,0>: wide: wide=1 chg=0. sym53c1510D-1-<1,0>: wide msgout: 1-2-3-1. sym53c1510D-1-<1,0>: wide msgin: 1-2-3-1. sym53c1510D-1-<1,0>: wide: wide=1 chg=0. sym53c1510D-1-<1,0>: sync msgout: 1-3-1-a-1f. sym53c1510D-1-<1,0>: sync msg in: 1-3-1-a-1f. sym53c1510D-1-<1,0>: sync: per=10 scntl3=0x90 scntl4=0x0 ofs=31 fak=0 chg=0. 
sym53c1510D-1-<1,*>: FAST-40 WIDE SCSI 80.0 MB/s (25 ns, offset 31) SCSI device sdb: 35565080 512-byte hdwr sectors (18209 MB) sdb: sdb1 sym53c1510D-1-<2,0>: wide msgout: 1-2-3-1. sym53c1510D-1-<2,0>: wide msgin: 1-2-3-1. sym53c1510D-1-<2,0>: wide: wide=1 chg=0. sym53c1510D-1-<2,0>: wide msgout: 1-2-3-1. sym53c1510D-1-<2,0>: wide msgin: 1-2-3-1. sym53c1510D-1-<2,0>: wide: wide=1 chg=0. sym53c1510D-1-<2,0>: sync msgout: 1-3-1-a-1f. sym53c1510D-1-<2,0>: sync msg in: 1-3-1-a-1f. sym53c1510D-1-<2,0>: sync: per=10 scntl3=0x90 scntl4=0x0 ofs=31 fak=0 chg=0. sym53c1510D-1-<2,*>: FAST-40 WIDE SCSI 80.0 MB/s (25 ns, offset 31) SCSI device sdc: 35565080 512-byte hdwr sectors (18209 MB) sdc: sdc1 sym53c1510D-1-<3,0>: wide msgout: 1-2-3-1. sym53c1510D-1-<3,0>: wide msgin: 1-2-3-1. sym53c1510D-1-<3,0>: wide: wide=1 chg=0. sym53c1510D-1-<3,0>: wide msgout: 1-2-3-1. sym53c1510D-1-<3,0>: wide msgin: 1-2-3-1. sym53c1510D-1-<3,0>: wide: wide=1 chg=0. sym53c1510D-1-<3,0>: sync msgout: 1-3-1-a-1f. sym53c1510D-1-<3,0>: sync msg in: 1-3-1-a-1f. sym53c1510D-1-<3,0>: sync: per=10 scntl3=0x90 scntl4=0x0 ofs=31 fak=0 chg=0. sym53c1510D-1-<3,*>: FAST-40 WIDE SCSI 80.0 MB/s (25 ns, offset 31) SCSI device sdd: 35565080 512-byte hdwr sectors (18209 MB) sdd: sdd1 raid0 personality registered md driver 0.90.0 MAX_MD_DEVS=256, MD_SB_DISKS=27 md.c: sizeof(mdp_super_t) = 4096 autodetecting RAID arrays (read) sdb1's sb offset: 17781632 [events: 00000001] (read) sdc1's sb offset: 17781632 [events: 00000001] autorun ... considering sdc1 ... adding sdc1 ... adding sdb1 ... created md0 bind bind running: now! 
sdc1's event counter: 00000001 sdb1's event counter: 00000001 md0: max total readahead window set to 4096k md0: 2 data-disks, max readahead per data-disk: 2048k raid0: looking at sdb1 raid0: comparing sdb1(17781248) with sdb1(17781248) raid0: END raid0: ==> UNIQUE raid0: 1 zones raid0: looking at sdc1 raid0: comparing sdc1(17781248) with sdb1(17781248) raid0: EQUAL raid0: FINAL 1 zones zone 0 checking sdb1 ... contained as device 0 (17781248) is smallest!. checking sdc1 ... contained as device 1 zone->nb_dev: 2, size: 35562496 current zone offset: 17781248 done. raid0 : md_size is 35562496 blocks. raid0 : conf->smallest->size is 35562496 blocks. raid0 : nb_zone is 1. raid0 : Allocating 8 bytes for hash. md: updating md0 RAID superblock on device sdc1 [events: 00000002](write) sdc1's sb offset: 17781632 sdb1 [events: 00000002](write) sdb1's sb offset: 17781632 . ... autorun DONE. NET4: Linux TCP/IP 1.0 for NET4.0 IP Protocols: ICMP, UDP, TCP, IGMP IP: routing cache hash table of 8192 buckets, 64Kbytes TCP: Hash tables configured (established 65536 bind 65536) NET4: Unix domain sockets 1.0/SMP for Linux NET4.0. VFS: Mounted root (ext2 filesystem) readonly. Freeing unused kernel memory: 232k freed Adding Swap: 2097136k swap-space (priority -1) eepro100.c:v1.09j-t 9/29/99 Donald Becker http://cesdis.gsfc.nasa.gov/linux/drivers/eepro100.html eepro100.c: $Revision: 1.36 $ 2000/11/17 Modified by Andrey V. Savochkin and others eth0: OEM i82557/i82558 10/100 Ethernet, 00:02:A5:34:96:CF, IRQ 15. Board assembly 010101-034, Physical connectors present: RJ45 Primary interface chip i82555 PHY #1. General self-test: passed. Serial sub-system self-test: passed. Internal registers self-test: passed. ROM checksum self-test: passed (0x04f4518b). 
vmdump: dump device opened: 0x831

From owner-lkcd@oss.sgi.com Tue Jun 5 15:08:51 2001
Received: (from majordomo@localhost) by oss.sgi.com (8.11.3/8.11.3) id f55M8pW10574 for lkcd-outgoing; Tue, 5 Jun 2001 15:08:51 -0700
Received: from smtp.alacritech.com (smtp.alacritech.com [209.10.208.82]) by oss.sgi.com (8.11.3/8.11.3) with SMTP id f55M8mh10560 for ; Tue, 5 Jun 2001 15:08:48 -0700
Received: from alacritech.com (lambda.alacritech.com [10.1.1.32]) by smtp.alacritech.com (8.11.0/8.11.0) with ESMTP id f55M6CK04515; Tue, 5 Jun 2001 15:06:14 -0700
Message-ID: <3B1D5775.98615A6A@alacritech.com>
Date: Tue, 05 Jun 2001 15:04:37 -0700
From: "Matt D. Robinson"
Organization: Alacritech, Inc.
X-Mailer: Mozilla 4.76 [en] (X11; U; Linux 2.2.17-14 i686)
X-Accept-Language: en
MIME-Version: 1.0
To: Michael Walfish
CC: lkcd@oss.sgi.com, Yoel Inbar
Subject: Re: LKCD 3.1.3 available ...
References:
Content-Type: text/plain; charset=us-ascii
Content-Transfer-Encoding: 7bit
Sender: owner-lkcd@oss.sgi.com
Precedence: bulk

Michael Walfish wrote:
>
> Matt D. Robinson wrote:
> > Let me know if you have any problems.
>
> Hi Matt,
>
> We're big fans of lkcd (so far it's been really easy to use and understand).
> Any help you can provide is greatly appreciated.
>
> I applied the 2.4.2 patch this morning. Here are some observations:
> (dmesg and relevant parts of our .config at the end of this mail)
>
> 1) no problems when I force a kernel crash inside a user process
> 2) when I force a crash inside an interrupt (for a device driver), I get the
> following, in order:
> -->standard oops message
> -->interesting message, below
> -->another oops for the second CPU
> -->system reset, presumably driven by the lkcd patch

Hmmm, I haven't seen this. I can see from schedule() as to what
might be happening, though. Can you send me the code that you're
running to generate the interrupt crash?
Basically what this means is, removing smp_send_stop() has messed things up.
I might have something that can fix this, but first I need to know, is this an SMP or non-SMP system? ... back to the drawing board. I'll fix this quickly. --Matt > Thanks again, > Mike Walfish > > ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++ > +++ > Dumping to device 0x831 [sd(8,49)] ... > Writing dump header ...Scheduling in interrupt > kernel BUG at sched.c:681! > > wait_on_irq, CPU 1: > irq: 1 [ 1 0 ] > bh: 1 [ 0 1 ] > Stack dumps: > CPU 0: > CPU 1:e7ff7ebc c0204a93 00000001 00000020 00000000 c010a56d c0204aa8 > e7d2d9a0 > 00000003 00000001 c0179e7a 00000000 c0179e44 00000000 c02aac60 > c011ddb5 > 00000000 00000000 00000020 00000000 c02aac60 e7d2db1c c0179df2 > c025c84c > Call Trace: [] [] [] [] [] > [ 011afd1>] [] > [] [] [] [] [] > [ 70>] [] [] > [] [] [] [] > ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++ > ++++ > > ---------------------------------------------------------------------------- > ----- > > CONFIG_X86=y > CONFIG_ISA=y > CONFIG_UID16=y > CONFIG_MODULES=y > CONFIG_KMOD=y > CONFIG_MPENTIUMIII=y > CONFIG_X86_WP_WORKS_OK=y > CONFIG_X86_INVLPG=y > CONFIG_X86_CMPXCHG=y > CONFIG_X86_BSWAP=y > CONFIG_X86_POPAD_OK=y > CONFIG_X86_TSC=y > CONFIG_X86_GOOD_APIC=y > CONFIG_X86_PGE=y > CONFIG_X86_USE_PPRO_CHECKSUM=y > CONFIG_X86_MSR=y > CONFIG_X86_CPUID=y > CONFIG_NOHIGHMEM=y > CONFIG_MTRR=y > CONFIG_SMP=y > CONFIG_HAVE_DEC_LOCK=y > CONFIG_NET=y > CONFIG_X86_IO_APIC=y > CONFIG_X86_LOCAL_APIC=y > CONFIG_PCI=y > CONFIG_PCI_GOANY=y > CONFIG_PCI_BIOS=y > CONFIG_PCI_DIRECT=y > CONFIG_PCI_NAMES=y > CONFIG_HOTPLUG=y > CONFIG_SYSVIPC=y > CONFIG_SYSCTL=y > CONFIG_KCORE_ELF=y > CONFIG_BINFMT_AOUT=y > CONFIG_BINFMT_ELF=y > CONFIG_BINFMT_MISC=y > CONFIG_PM=y > CONFIG_PNP=y > CONFIG_ISAPNP=y > CONFIG_BLK_DEV_FD=y > CONFIG_BLK_CPQ_DA=y > CONFIG_BLK_DEV_LOOP=m > CONFIG_MD=y > CONFIG_BLK_DEV_MD=y > CONFIG_MD_RAID0=y > CONFIG_PACKET=y > CONFIG_NETFILTER=y > CONFIG_UNIX=y > CONFIG_INET=y > CONFIG_IP_MULTICAST=y > 
CONFIG_IP_NF_CONNTRACK=m > CONFIG_IP_NF_FTP=m > CONFIG_IP_NF_IPTABLES=m > CONFIG_IP_NF_MATCH_LIMIT=m > CONFIG_IP_NF_MATCH_MAC=m > CONFIG_IP_NF_MATCH_MARK=m > CONFIG_IP_NF_MATCH_MULTIPORT=m > CONFIG_IP_NF_MATCH_TOS=m > CONFIG_IP_NF_MATCH_STATE=m > CONFIG_IP_NF_FILTER=m > CONFIG_IP_NF_TARGET_REJECT=m > CONFIG_IP_NF_NAT=m > CONFIG_IP_NF_NAT_NEEDED=y > CONFIG_IP_NF_TARGET_MASQUERADE=m > CONFIG_IP_NF_TARGET_REDIRECT=m > CONFIG_IP_NF_NAT_FTP=m > CONFIG_IP_NF_MANGLE=m > CONFIG_IP_NF_TARGET_TOS=m > CONFIG_IP_NF_TARGET_MARK=m > CONFIG_IP_NF_TARGET_LOG=m > CONFIG_IP_NF_COMPAT_IPCHAINS=m > CONFIG_IP_NF_NAT_NEEDED=y > CONFIG_IDE=y > CONFIG_BLK_DEV_IDE=y > CONFIG_BLK_DEV_IDEDISK=y > CONFIG_BLK_DEV_IDECD=y > CONFIG_BLK_DEV_CMD640=y > CONFIG_BLK_DEV_RZ1000=y > CONFIG_BLK_DEV_IDEPCI=y > CONFIG_IDEPCI_SHARE_IRQ=y > CONFIG_BLK_DEV_IDE_MODES=y > CONFIG_SCSI=y > CONFIG_BLK_DEV_SD=y > CONFIG_SCSI_DEBUG_QUEUES=y > CONFIG_SCSI_MULTI_LUN=y > CONFIG_SCSI_CONSTANTS=y > CONFIG_SCSI_AIC7XXX=y > CONFIG_AIC7XXX_TCQ_ON_BY_DEFAULT=y > CONFIG_SCSI_SYM53C8XX=y > CONFIG_NETDEVICES=y > CONFIG_DUMMY=m > CONFIG_NET_ETHERNET=y > CONFIG_NET_PCI=y > CONFIG_EEPRO100=m > CONFIG_VT=y > CONFIG_VT_CONSOLE=y > CONFIG_SERIAL=y > CONFIG_SERIAL_CONSOLE=y > CONFIG_UNIX98_PTYS=y > CONFIG_MOUSE=y > CONFIG_PSMOUSE=y > CONFIG_AUTOFS_FS=m > CONFIG_AUTOFS4_FS=m > CONFIG_FAT_FS=m > CONFIG_MSDOS_FS=m > CONFIG_VFAT_FS=m > CONFIG_ISO9660_FS=m > CONFIG_JOLIET=y > CONFIG_NTFS_FS=m > CONFIG_PROC_FS=y > CONFIG_DEVPTS_FS=y > CONFIG_EXT2_FS=y > CONFIG_NFS_FS=m > CONFIG_NFS_V3=y > CONFIG_NFSD=m > CONFIG_NFSD_V3=y > CONFIG_SUNRPC=m > CONFIG_LOCKD=m > CONFIG_LOCKD_V4=y > CONFIG_SMB_FS=m > CONFIG_MSDOS_PARTITION=y > CONFIG_SMB_NLS=y > CONFIG_NLS=y > CONFIG_NLS_CODEPAGE_437=m > CONFIG_NLS_ISO8859_1=m > CONFIG_VGA_CONSOLE=y > CONFIG_VMDUMP=y > > ---------------------------------------------------------------------------- > ----- > > PCI: Probing PCI hardware > PCI: Discovered peer bus 03 > PCI: Device 00:00 not found by BIOS > PCI: 
Device 00:01 not found by BIOS > PCI: Device 00:78 not found by BIOS > isapnp: Scanning for Pnp cards... > isapnp: No Plug & Play device found > Linux NET4.0 for Linux 2.4 > Based upon Swansea University Computer Society NET3.039 > Starting kswapd v1.8 > pty: 256 Unix98 ptys configured > block: queued sectors max/low 426005kB/294933kB, 1280 slots per queue > Uniform Multi-Platform E-IDE driver Revision: 6.31 > ide: Assuming 33MHz system bus speed for PIO modes; override with idebus=xx > ServerWorks OSB4: IDE controller on PCI bus 00 dev 79 > ServerWorks OSB4: chipset revision 0 > ServerWorks OSB4: not 100% native mode: will probe irqs later > hda: Compaq CRN-8241B, ATAPI CD/DVD-ROM drive > ide0 at 0x1f0-0x1f7,0x3f6 on irq 14 > hda: ATAPI 24X CD-ROM drive, 128kB Cache > Uniform CD-ROM driver Revision: 3.12 > Floppy drive(s): fd0 is 1.44M > FDC 0 is a National Semiconductor PC87306 > Serial driver version 5.02 (2000-08-09) with MANY_PORTS SHARE_IRQ > SERIAL_PCI ISAPNP enabled > ttyS00 at 0x03f8 (irq = 4) is a 16550A > ttyS01 at 0x02f8 (irq = 3) is a 16550A > SCSI subsystem driver Revision: 1.00 > sym53c8xx: at PCI bus 0, device 1, function 0 > sym53c8xx: setting PCI_COMMAND_PARITY...(fix-up) > sym53c8xx: 53c1510D detected > sym53c8xx: at PCI bus 0, device 1, function 1 > sym53c8xx: setting PCI_COMMAND_PARITY...(fix-up) > sym53c8xx: 53c1510D detected > sym53c1510D-0: rev 0x2 on pci bus 0 device 1 function 0 irq 11 > sym53c1510D-0: ID 7, Fast-40, Parity Checking > sym53c1510D-0: on-chip RAM at 0xc3efe000 > sym53c1510D-0: restart (scsi reset). > sym53c1510D-0: Downloading SCSI SCRIPTS. > sym53c1510D-1: rev 0x2 on pci bus 0 device 1 function 1 irq 11 > sym53c1510D-1: ID 7, Fast-40, Parity Checking > sym53c1510D-1: on-chip RAM at 0xc3efc000 > sym53c1510D-1: restart (scsi reset). > sym53c1510D-1: Downloading SCSI SCRIPTS. 
> scsi0 : sym53c8xx - version 1.6b > scsi1 : sym53c8xx - version 1.6b > Vendor: COMPAQ Model: BD0186398C Rev: BC1P > Type: Direct-Access ANSI SCSI revision: 02 > Vendor: COMPAQ Model: BD0186398C Rev: BC1P > Type: Direct-Access ANSI SCSI revision: 02 > Vendor: COMPAQ Model: BD0186398C Rev: BC1P > Type: Direct-Access ANSI SCSI revision: 02 > Vendor: COMPAQ Model: BD0186398C Rev: BC1P > Type: Direct-Access ANSI SCSI revision: 02 > sym53c1510D-1-<0,0>: tagged command queue depth set to 32 > sym53c1510D-1-<1,0>: tagged command queue depth set to 32 > sym53c1510D-1-<2,0>: tagged command queue depth set to 32 > sym53c1510D-1-<3,0>: tagged command queue depth set to 32 > Detected scsi disk sda at scsi1, channel 0, id 0, lun 0 > Detected scsi disk sdb at scsi1, channel 0, id 1, lun 0 > Detected scsi disk sdc at scsi1, channel 0, id 2, lun 0 > Detected scsi disk sdd at scsi1, channel 0, id 3, lun 0 > sym53c1510D-1-<0,0>: wide msgout: 1-2-3-1. > sym53c1510D-1-<0,0>: wide msgin: 1-2-3-1. > sym53c1510D-1-<0,0>: wide: wide=1 chg=0. > sym53c1510D-1-<0,0>: wide msgout: 1-2-3-1. > sym53c1510D-1-<0,0>: wide msgin: 1-2-3-1. > sym53c1510D-1-<0,0>: wide: wide=1 chg=0. > sym53c1510D-1-<0,0>: sync msgout: 1-3-1-a-1f. > sym53c1510D-1-<0,0>: sync msg in: 1-3-1-a-1f. > sym53c1510D-1-<0,0>: sync: per=10 scntl3=0x90 scntl4=0x0 ofs=31 fak=0 chg=0. > sym53c1510D-1-<0,*>: FAST-40 WIDE SCSI 80.0 MB/s (25 ns, offset 31) > SCSI device sda: 35565080 512-byte hdwr sectors (18209 MB) > Partition check: > sda: sda1 sda2 sda3 sda4 < sda5 sda6 sda7 > > sym53c1510D-1-<1,0>: wide msgout: 1-2-3-1. > sym53c1510D-1-<1,0>: wide msgin: 1-2-3-1. > sym53c1510D-1-<1,0>: wide: wide=1 chg=0. > sym53c1510D-1-<1,0>: wide msgout: 1-2-3-1. > sym53c1510D-1-<1,0>: wide msgin: 1-2-3-1. > sym53c1510D-1-<1,0>: wide: wide=1 chg=0. > sym53c1510D-1-<1,0>: sync msgout: 1-3-1-a-1f. > sym53c1510D-1-<1,0>: sync msg in: 1-3-1-a-1f. > sym53c1510D-1-<1,0>: sync: per=10 scntl3=0x90 scntl4=0x0 ofs=31 fak=0 chg=0. 
> sym53c1510D-1-<1,*>: FAST-40 WIDE SCSI 80.0 MB/s (25 ns, offset 31) > SCSI device sdb: 35565080 512-byte hdwr sectors (18209 MB) > sdb: sdb1 > sym53c1510D-1-<2,0>: wide msgout: 1-2-3-1. > sym53c1510D-1-<2,0>: wide msgin: 1-2-3-1. > sym53c1510D-1-<2,0>: wide: wide=1 chg=0. > sym53c1510D-1-<2,0>: wide msgout: 1-2-3-1. > sym53c1510D-1-<2,0>: wide msgin: 1-2-3-1. > sym53c1510D-1-<2,0>: wide: wide=1 chg=0. > sym53c1510D-1-<2,0>: sync msgout: 1-3-1-a-1f. > sym53c1510D-1-<2,0>: sync msg in: 1-3-1-a-1f. > sym53c1510D-1-<2,0>: sync: per=10 scntl3=0x90 scntl4=0x0 ofs=31 fak=0 chg=0. > sym53c1510D-1-<2,*>: FAST-40 WIDE SCSI 80.0 MB/s (25 ns, offset 31) > SCSI device sdc: 35565080 512-byte hdwr sectors (18209 MB) > sdc: sdc1 > sym53c1510D-1-<3,0>: wide msgout: 1-2-3-1. > sym53c1510D-1-<3,0>: wide msgin: 1-2-3-1. > sym53c1510D-1-<3,0>: wide: wide=1 chg=0. > sym53c1510D-1-<3,0>: wide msgout: 1-2-3-1. > sym53c1510D-1-<3,0>: wide msgin: 1-2-3-1. > sym53c1510D-1-<3,0>: wide: wide=1 chg=0. > sym53c1510D-1-<3,0>: sync msgout: 1-3-1-a-1f. > sym53c1510D-1-<3,0>: sync msg in: 1-3-1-a-1f. > sym53c1510D-1-<3,0>: sync: per=10 scntl3=0x90 scntl4=0x0 ofs=31 fak=0 chg=0. > sym53c1510D-1-<3,*>: FAST-40 WIDE SCSI 80.0 MB/s (25 ns, offset 31) > SCSI device sdd: 35565080 512-byte hdwr sectors (18209 MB) > sdd: sdd1 > raid0 personality registered > md driver 0.90.0 MAX_MD_DEVS=256, MD_SB_DISKS=27 > md.c: sizeof(mdp_super_t) = 4096 > autodetecting RAID arrays > (read) sdb1's sb offset: 17781632 [events: 00000001] > (read) sdc1's sb offset: 17781632 [events: 00000001] > autorun ... > considering sdc1 ... > adding sdc1 ... > adding sdb1 ... > created md0 > bind > bind > running: > now! 
> sdc1's event counter: 00000001 > sdb1's event counter: 00000001 > md0: max total readahead window set to 4096k > md0: 2 data-disks, max readahead per data-disk: 2048k > raid0: looking at sdb1 > raid0: comparing sdb1(17781248) with sdb1(17781248) > raid0: END > raid0: ==> UNIQUE > raid0: 1 zones > raid0: looking at sdc1 > raid0: comparing sdc1(17781248) with sdb1(17781248) > raid0: EQUAL > raid0: FINAL 1 zones > zone 0 > checking sdb1 ... contained as device 0 > (17781248) is smallest!. > checking sdc1 ... contained as device 1 > zone->nb_dev: 2, size: 35562496 > current zone offset: 17781248 > done. > raid0 : md_size is 35562496 blocks. > raid0 : conf->smallest->size is 35562496 blocks. > raid0 : nb_zone is 1. > raid0 : Allocating 8 bytes for hash. > md: updating md0 RAID superblock on device > sdc1 [events: 00000002](write) sdc1's sb offset: 17781632 > sdb1 [events: 00000002](write) sdb1's sb offset: 17781632 > . > ... autorun DONE. > NET4: Linux TCP/IP 1.0 for NET4.0 > IP Protocols: ICMP, UDP, TCP, IGMP > IP: routing cache hash table of 8192 buckets, 64Kbytes > TCP: Hash tables configured (established 65536 bind 65536) > NET4: Unix domain sockets 1.0/SMP for Linux NET4.0. > VFS: Mounted root (ext2 filesystem) readonly. > Freeing unused kernel memory: 232k freed > Adding Swap: 2097136k swap-space (priority -1) > eepro100.c:v1.09j-t 9/29/99 Donald Becker > http://cesdis.gsfc.nasa.gov/linux/drivers/eepro100.html > eepro100.c: $Revision: 1.36 $ 2000/11/17 Modified by Andrey V. Savochkin > and others > eth0: OEM i82557/i82558 10/100 Ethernet, 00:02:A5:34:96:CF, IRQ 15. > Board assembly 010101-034, Physical connectors present: RJ45 > Primary interface chip i82555 PHY #1. > General self-test: passed. > Serial sub-system self-test: passed. > Internal registers self-test: passed. > ROM checksum self-test: passed (0x04f4518b). 
> vmdump: dump device opened: 0x831 From owner-lkcd@oss.sgi.com Tue Jun 5 15:23:19 2001 Received: (from majordomo@localhost) by oss.sgi.com (8.11.3/8.11.3) id f55MNJf12955 for lkcd-outgoing; Tue, 5 Jun 2001 15:23:19 -0700 Received: from mx.webfountain.com (mx.digitalfountain.com [209.219.233.39]) by oss.sgi.com (8.11.3/8.11.3) with SMTP id f55MNJh12951 for ; Tue, 5 Jun 2001 15:23:19 -0700 Received: (qmail 28898 invoked from network); 5 Jun 2001 22:23:13 -0000 Received: from mail.intranet (10.1.1.37) by mx.digitalfountain.com with SMTP; 5 Jun 2001 22:23:13 -0000 Received: from belarus (belarus.intranet [10.1.3.53]) by mail.intranet (8.9.3/8.9.3) with SMTP id PAA01021; Tue, 5 Jun 2001 15:22:47 -0700 X-Authentication-Warning: mail.intranet: Host belarus.intranet [10.1.3.53] claimed to be belarus From: "Michael Walfish" To: "Matt D. Robinson" Cc: , "Yoel Inbar" Subject: RE: LKCD 3.1.3 available ... Date: Tue, 5 Jun 2001 15:35:02 -0700 Message-ID: MIME-Version: 1.0 Content-Type: text/plain; charset="us-ascii" Content-Transfer-Encoding: 7bit X-Priority: 3 (Normal) X-MSMail-Priority: Normal X-Mailer: Microsoft Outlook IMO, Build 9.0.2416 (9.0.2911.0) X-MimeOLE: Produced By Microsoft MimeOLE V5.50.4133.2400 Importance: Normal In-Reply-To: <3B1D5775.98615A6A@alacritech.com> Sender: owner-lkcd@oss.sgi.com Precedence: bulk > Can you send me the code that you're > running to generate the interrupt crash? The driver depends on a custom NIC, and the relevant snippets are below. If you'd like the whole driver, please let me know. > is this an SMP or non-SMP system? SMP. Two x86 processors. Not using interrupt-CPU affinities feature (so the interrupts can go to both CPUs). 
-Mike

+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
int nic_open(struct net_device* dev)
{
    [snip]
    if (request_irq(dev->irq, &nic_interrupt, SA_SHIRQ, dev->name, dev)) {
        printk(KERN_DEBUG "REPLICATOR request_irq failed\n");
        return -EAGAIN;
    }
    [snip]
}

/* a user-level program causes this flag to be set to 1 */
int global_crash_flag = 0;

/* these interrupts are generated by the hardware */
void nic_interrupt(int irq, void *dev_id, struct pt_regs *regs)
{
    volatile char* foo = 0;

    [snip]

    if (global_crash_flag)
        *foo = 1;

    [snip]

}

+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++

From owner-lkcd@oss.sgi.com Tue Jun 5 15:32:12 2001
Received: (from majordomo@localhost) by oss.sgi.com (8.11.3/8.11.3) id f55MWCd13759 for lkcd-outgoing; Tue, 5 Jun 2001 15:32:12 -0700
Received: from smtp.alacritech.com (smtp.alacritech.com [209.10.208.82]) by oss.sgi.com (8.11.3/8.11.3) with SMTP id f55MWAh13756 for ; Tue, 5 Jun 2001 15:32:10 -0700
Received: from alacritech.com (lambda.alacritech.com [10.1.1.32]) by smtp.alacritech.com (8.11.0/8.11.0) with ESMTP id f55MTkK24112; Tue, 5 Jun 2001 15:29:46 -0700
Message-ID: <3B1D5CFB.57719CBE@alacritech.com>
Date: Tue, 05 Jun 2001 15:28:11 -0700
From: "Matt D. Robinson"
Organization: Alacritech, Inc.
X-Mailer: Mozilla 4.76 [en] (X11; U; Linux 2.2.17-14 i686)
X-Accept-Language: en
MIME-Version: 1.0
To: Michael Walfish
CC: lkcd@oss.sgi.com, Yoel Inbar
Subject: Re: LKCD 3.1.3 available ...
References:
Content-Type: multipart/mixed; boundary="------------E26E38C44AF036229D3D4E54"
Sender: owner-lkcd@oss.sgi.com
Precedence: bulk

This is a multi-part message in MIME format.
--------------E26E38C44AF036229D3D4E54
Content-Type: text/plain; charset=us-ascii
Content-Transfer-Encoding: 7bit

Michael Walfish wrote:
>
> > Can you send me the code that you're
> > running to generate the interrupt crash?
>
> The driver depends on a custom NIC, and the relevant snippets are below.
If > you'd like the whole driver, please let me know. > > > is this an SMP or non-SMP system? > > SMP. Two x86 processors. Not using interrupt-CPU affinities feature (so the > interrupts can go to both CPUs). If you want to try something I'm thinking about, try this patch. Let me know if this makes any difference whatsoever (UNTESTED, BTW). I'm sure you'll get the gist of what I'm trying to do ... just to stop schedule() from happening. Of course, this can lead to lots of other "issues" which I haven't even tried out yet. Otherwise, I'll work on this later tonight when I get home. --Matt > > -Mike > > +++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++ > int nic_open(struct net_device* dev) > { > [snip] > if (request_irq(dev->irq, &nic_interrupt, SA_SHIRQ, dev->name, dev)) { > printk(KERN_DEBUG "REPLICATOR request_irq failed\n"); > return -EAGAIN; > } > [snip] > } > > /* a user-level program causes this flag to be set to 1 */ > int global_crash_flag = 0; > > /* these interrupts are generated by the hardware */ > void nic_interrupt(int irq, void *dev_id, struct pt_regs *regs) > { > volatile char* foo = 0; > > [snip] > > if (global_crash_flag) > *foo = 1; > > [snip] > > } > > +++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++ --------------E26E38C44AF036229D3D4E54 Content-Type: text/plain; charset=us-ascii; name="patch" Content-Transfer-Encoding: 7bit Content-Disposition: inline; filename="patch" --- kernel/sched.c.orig Tue Jun 5 15:21:57 2001 +++ kernel/sched.c Tue Jun 5 15:11:50 2001 @@ -105,16 +105,27 @@ struct kernel_stat kstat; +#if defined(CONFIG_VMDUMP) +extern int dump_in_progress; +#endif + #ifdef CONFIG_SMP #define idle_task(cpu) (init_tasks[cpu_number_map(cpu)]) #define can_schedule(p,cpu) ((!(p)->has_cpu) && \ +#if defined(CONFIG_VMDUMP) + (!dump_in_progress) && \ +#endif ((p)->cpus_allowed & (1 << cpu))) #else #define idle_task(cpu) (&init_task) +#if defined(CONFIG_VMDUMP) +#define can_schedule(p,cpu) 
(!(dump_in_progress)) +#else #define can_schedule(p,cpu) (1) +#endif #endif @@ -512,6 +523,11 @@ struct list_head *tmp; int this_cpu, c; +#if defined(CONFIG_VMDUMP) + if (dump_in_progress) { + goto dump_in_progress; + } +#endif if (!current->active_mm) BUG(); need_resched_back: prev = current; @@ -686,6 +702,9 @@ scheduling_in_interrupt: printk("Scheduling in interrupt\n"); BUG(); +#if defined(CONFIG_VMDUMP) +dump_in_progress: +#endif return; } --- drivers/block/vmdump.c.orig Tue Jun 5 15:24:21 2001 +++ drivers/block/vmdump.c Tue Jun 5 15:25:46 2001 @@ -99,6 +99,7 @@ char dumpdev_name[PATH_MAX]; /* the name of the dump device */ struct file *dump_file; /* the file pointer of the dump device */ kdev_t dumpdev; /* the actual kdev_t device number */ +int dump_in_progress = 0; /* are we really dumping now? */ void *dump_page_buf; /* dump page buffer for memcpy()! */ int dump_level = DUMP_KERN; /* the current dump level */ int dump_compress_pages = TRUE; /* whether to try to compress each page */ @@ -916,6 +917,7 @@ /* we set this to FALSE so we don't ever re-enter this code! */ dump_okay = FALSE; + dump_in_progress = 1; /* make sure the dump_level variable is in range (0 - 4) */ if ((dump_level > DUMP_ALL) || (dump_level < DUMP_NONE)) { --------------E26E38C44AF036229D3D4E54-- From owner-lkcd@oss.sgi.com Tue Jun 5 16:10:19 2001 Received: (from majordomo@localhost) by oss.sgi.com (8.11.3/8.11.3) id f55NAJT20202 for lkcd-outgoing; Tue, 5 Jun 2001 16:10:19 -0700 Received: from smtp.alacritech.com (smtp.alacritech.com [209.10.208.82]) by oss.sgi.com (8.11.3/8.11.3) with SMTP id f55NAGh20196 for ; Tue, 5 Jun 2001 16:10:16 -0700 Received: from alacritech.com (lambda.alacritech.com [10.1.1.32]) by smtp.alacritech.com (8.11.0/8.11.0) with ESMTP id f55M6CK04515; Tue, 5 Jun 2001 15:06:14 -0700 Message-ID: <3B1D5775.98615A6A@alacritech.com> Date: Tue, 05 Jun 2001 15:04:37 -0700 From: "Matt D. Robinson" Organization: Alacritech, Inc. 
X-Mailer: Mozilla 4.76 [en] (X11; U; Linux 2.2.17-14 i686) X-Accept-Language: en MIME-Version: 1.0 To: Michael Walfish CC: lkcd@oss.sgi.com, Yoel Inbar Subject: Re: LKCD 3.1.3 available ... References: Content-Type: text/plain; charset=us-ascii Content-Transfer-Encoding: 7bit Sender: owner-lkcd@oss.sgi.com Precedence: bulk Michael Walfish wrote: > > Matt D. Robinson wrote: > > Let me know if you have any problems. > > Hi Matt, > > We're big fans of lkcd (so far it's been really easy to use and understand). > Any help you can provide is greatly appreciated. > > I applied the 2.4.2 patch this morning. Here are some observations: > (dmesg and relevant parts of our .config at the end of this mail) > > 1) no problems when I force a kernel crash inside a user process > 2) when I force a crash inside an interrupt (for a device driver), I get the > following, in order: > -->standard oops message > -->interesting message, below > -->another oops for the second CPU > -->system reset, presumably driven by the lkcd patch Hmmm, I haven't seen this. I can see from schedule() as to what might be happening, though. Can you send me the code that you're running to generate the interrupt crash? Basically what this means is, removing smp_send_stop() has messed things up. I might have something that can fix this, but first I need to know, is this an SMP or non-SMP system? ... back to the drawing board. I'll fix this quickly. --Matt > Thanks again, > Mike Walfish > > ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++ > +++ > Dumping to device 0x831 [sd(8,49)] ... > Writing dump header ...Scheduling in interrupt > kernel BUG at sched.c:681! 
> > wait_on_irq, CPU 1: > irq: 1 [ 1 0 ] > bh: 1 [ 0 1 ] > Stack dumps: > CPU 0: > CPU 1:e7ff7ebc c0204a93 00000001 00000020 00000000 c010a56d c0204aa8 > e7d2d9a0 > 00000003 00000001 c0179e7a 00000000 c0179e44 00000000 c02aac60 > c011ddb5 > 00000000 00000000 00000020 00000000 c02aac60 e7d2db1c c0179df2 > c025c84c > Call Trace: [] [] [] [] [] > [ 011afd1>] [] > [] [] [] [] [] > [ 70>] [] [] > [] [] [] [] > ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++ > ++++ > > ---------------------------------------------------------------------------- > ----- > > CONFIG_X86=y > CONFIG_ISA=y > CONFIG_UID16=y > CONFIG_MODULES=y > CONFIG_KMOD=y > CONFIG_MPENTIUMIII=y > CONFIG_X86_WP_WORKS_OK=y > CONFIG_X86_INVLPG=y > CONFIG_X86_CMPXCHG=y > CONFIG_X86_BSWAP=y > CONFIG_X86_POPAD_OK=y > CONFIG_X86_TSC=y > CONFIG_X86_GOOD_APIC=y > CONFIG_X86_PGE=y > CONFIG_X86_USE_PPRO_CHECKSUM=y > CONFIG_X86_MSR=y > CONFIG_X86_CPUID=y > CONFIG_NOHIGHMEM=y > CONFIG_MTRR=y > CONFIG_SMP=y > CONFIG_HAVE_DEC_LOCK=y > CONFIG_NET=y > CONFIG_X86_IO_APIC=y > CONFIG_X86_LOCAL_APIC=y > CONFIG_PCI=y > CONFIG_PCI_GOANY=y > CONFIG_PCI_BIOS=y > CONFIG_PCI_DIRECT=y > CONFIG_PCI_NAMES=y > CONFIG_HOTPLUG=y > CONFIG_SYSVIPC=y > CONFIG_SYSCTL=y > CONFIG_KCORE_ELF=y > CONFIG_BINFMT_AOUT=y > CONFIG_BINFMT_ELF=y > CONFIG_BINFMT_MISC=y > CONFIG_PM=y > CONFIG_PNP=y > CONFIG_ISAPNP=y > CONFIG_BLK_DEV_FD=y > CONFIG_BLK_CPQ_DA=y > CONFIG_BLK_DEV_LOOP=m > CONFIG_MD=y > CONFIG_BLK_DEV_MD=y > CONFIG_MD_RAID0=y > CONFIG_PACKET=y > CONFIG_NETFILTER=y > CONFIG_UNIX=y > CONFIG_INET=y > CONFIG_IP_MULTICAST=y > CONFIG_IP_NF_CONNTRACK=m > CONFIG_IP_NF_FTP=m > CONFIG_IP_NF_IPTABLES=m > CONFIG_IP_NF_MATCH_LIMIT=m > CONFIG_IP_NF_MATCH_MAC=m > CONFIG_IP_NF_MATCH_MARK=m > CONFIG_IP_NF_MATCH_MULTIPORT=m > CONFIG_IP_NF_MATCH_TOS=m > CONFIG_IP_NF_MATCH_STATE=m > CONFIG_IP_NF_FILTER=m > CONFIG_IP_NF_TARGET_REJECT=m > CONFIG_IP_NF_NAT=m > CONFIG_IP_NF_NAT_NEEDED=y > CONFIG_IP_NF_TARGET_MASQUERADE=m > 
CONFIG_IP_NF_TARGET_REDIRECT=m > CONFIG_IP_NF_NAT_FTP=m > CONFIG_IP_NF_MANGLE=m > CONFIG_IP_NF_TARGET_TOS=m > CONFIG_IP_NF_TARGET_MARK=m > CONFIG_IP_NF_TARGET_LOG=m > CONFIG_IP_NF_COMPAT_IPCHAINS=m > CONFIG_IP_NF_NAT_NEEDED=y > CONFIG_IDE=y > CONFIG_BLK_DEV_IDE=y > CONFIG_BLK_DEV_IDEDISK=y > CONFIG_BLK_DEV_IDECD=y > CONFIG_BLK_DEV_CMD640=y > CONFIG_BLK_DEV_RZ1000=y > CONFIG_BLK_DEV_IDEPCI=y > CONFIG_IDEPCI_SHARE_IRQ=y > CONFIG_BLK_DEV_IDE_MODES=y > CONFIG_SCSI=y > CONFIG_BLK_DEV_SD=y > CONFIG_SCSI_DEBUG_QUEUES=y > CONFIG_SCSI_MULTI_LUN=y > CONFIG_SCSI_CONSTANTS=y > CONFIG_SCSI_AIC7XXX=y > CONFIG_AIC7XXX_TCQ_ON_BY_DEFAULT=y > CONFIG_SCSI_SYM53C8XX=y > CONFIG_NETDEVICES=y > CONFIG_DUMMY=m > CONFIG_NET_ETHERNET=y > CONFIG_NET_PCI=y > CONFIG_EEPRO100=m > CONFIG_VT=y > CONFIG_VT_CONSOLE=y > CONFIG_SERIAL=y > CONFIG_SERIAL_CONSOLE=y > CONFIG_UNIX98_PTYS=y > CONFIG_MOUSE=y > CONFIG_PSMOUSE=y > CONFIG_AUTOFS_FS=m > CONFIG_AUTOFS4_FS=m > CONFIG_FAT_FS=m > CONFIG_MSDOS_FS=m > CONFIG_VFAT_FS=m > CONFIG_ISO9660_FS=m > CONFIG_JOLIET=y > CONFIG_NTFS_FS=m > CONFIG_PROC_FS=y > CONFIG_DEVPTS_FS=y > CONFIG_EXT2_FS=y > CONFIG_NFS_FS=m > CONFIG_NFS_V3=y > CONFIG_NFSD=m > CONFIG_NFSD_V3=y > CONFIG_SUNRPC=m > CONFIG_LOCKD=m > CONFIG_LOCKD_V4=y > CONFIG_SMB_FS=m > CONFIG_MSDOS_PARTITION=y > CONFIG_SMB_NLS=y > CONFIG_NLS=y > CONFIG_NLS_CODEPAGE_437=m > CONFIG_NLS_ISO8859_1=m > CONFIG_VGA_CONSOLE=y > CONFIG_VMDUMP=y > > ---------------------------------------------------------------------------- > ----- > > PCI: Probing PCI hardware > PCI: Discovered peer bus 03 > PCI: Device 00:00 not found by BIOS > PCI: Device 00:01 not found by BIOS > PCI: Device 00:78 not found by BIOS > isapnp: Scanning for Pnp cards... 
> isapnp: No Plug & Play device found > Linux NET4.0 for Linux 2.4 > Based upon Swansea University Computer Society NET3.039 > Starting kswapd v1.8 > pty: 256 Unix98 ptys configured > block: queued sectors max/low 426005kB/294933kB, 1280 slots per queue > Uniform Multi-Platform E-IDE driver Revision: 6.31 > ide: Assuming 33MHz system bus speed for PIO modes; override with idebus=xx > ServerWorks OSB4: IDE controller on PCI bus 00 dev 79 > ServerWorks OSB4: chipset revision 0 > ServerWorks OSB4: not 100% native mode: will probe irqs later > hda: Compaq CRN-8241B, ATAPI CD/DVD-ROM drive > ide0 at 0x1f0-0x1f7,0x3f6 on irq 14 > hda: ATAPI 24X CD-ROM drive, 128kB Cache > Uniform CD-ROM driver Revision: 3.12 > Floppy drive(s): fd0 is 1.44M > FDC 0 is a National Semiconductor PC87306 > Serial driver version 5.02 (2000-08-09) with MANY_PORTS SHARE_IRQ > SERIAL_PCI ISAPNP enabled > ttyS00 at 0x03f8 (irq = 4) is a 16550A > ttyS01 at 0x02f8 (irq = 3) is a 16550A > SCSI subsystem driver Revision: 1.00 > sym53c8xx: at PCI bus 0, device 1, function 0 > sym53c8xx: setting PCI_COMMAND_PARITY...(fix-up) > sym53c8xx: 53c1510D detected > sym53c8xx: at PCI bus 0, device 1, function 1 > sym53c8xx: setting PCI_COMMAND_PARITY...(fix-up) > sym53c8xx: 53c1510D detected > sym53c1510D-0: rev 0x2 on pci bus 0 device 1 function 0 irq 11 > sym53c1510D-0: ID 7, Fast-40, Parity Checking > sym53c1510D-0: on-chip RAM at 0xc3efe000 > sym53c1510D-0: restart (scsi reset). > sym53c1510D-0: Downloading SCSI SCRIPTS. > sym53c1510D-1: rev 0x2 on pci bus 0 device 1 function 1 irq 11 > sym53c1510D-1: ID 7, Fast-40, Parity Checking > sym53c1510D-1: on-chip RAM at 0xc3efc000 > sym53c1510D-1: restart (scsi reset). > sym53c1510D-1: Downloading SCSI SCRIPTS. 
> scsi0 : sym53c8xx - version 1.6b > scsi1 : sym53c8xx - version 1.6b > Vendor: COMPAQ Model: BD0186398C Rev: BC1P > Type: Direct-Access ANSI SCSI revision: 02 > Vendor: COMPAQ Model: BD0186398C Rev: BC1P > Type: Direct-Access ANSI SCSI revision: 02 > Vendor: COMPAQ Model: BD0186398C Rev: BC1P > Type: Direct-Access ANSI SCSI revision: 02 > Vendor: COMPAQ Model: BD0186398C Rev: BC1P > Type: Direct-Access ANSI SCSI revision: 02 > sym53c1510D-1-<0,0>: tagged command queue depth set to 32 > sym53c1510D-1-<1,0>: tagged command queue depth set to 32 > sym53c1510D-1-<2,0>: tagged command queue depth set to 32 > sym53c1510D-1-<3,0>: tagged command queue depth set to 32 > Detected scsi disk sda at scsi1, channel 0, id 0, lun 0 > Detected scsi disk sdb at scsi1, channel 0, id 1, lun 0 > Detected scsi disk sdc at scsi1, channel 0, id 2, lun 0 > Detected scsi disk sdd at scsi1, channel 0, id 3, lun 0 > sym53c1510D-1-<0,0>: wide msgout: 1-2-3-1. > sym53c1510D-1-<0,0>: wide msgin: 1-2-3-1. > sym53c1510D-1-<0,0>: wide: wide=1 chg=0. > sym53c1510D-1-<0,0>: wide msgout: 1-2-3-1. > sym53c1510D-1-<0,0>: wide msgin: 1-2-3-1. > sym53c1510D-1-<0,0>: wide: wide=1 chg=0. > sym53c1510D-1-<0,0>: sync msgout: 1-3-1-a-1f. > sym53c1510D-1-<0,0>: sync msg in: 1-3-1-a-1f. > sym53c1510D-1-<0,0>: sync: per=10 scntl3=0x90 scntl4=0x0 ofs=31 fak=0 chg=0. > sym53c1510D-1-<0,*>: FAST-40 WIDE SCSI 80.0 MB/s (25 ns, offset 31) > SCSI device sda: 35565080 512-byte hdwr sectors (18209 MB) > Partition check: > sda: sda1 sda2 sda3 sda4 < sda5 sda6 sda7 > > sym53c1510D-1-<1,0>: wide msgout: 1-2-3-1. > sym53c1510D-1-<1,0>: wide msgin: 1-2-3-1. > sym53c1510D-1-<1,0>: wide: wide=1 chg=0. > sym53c1510D-1-<1,0>: wide msgout: 1-2-3-1. > sym53c1510D-1-<1,0>: wide msgin: 1-2-3-1. > sym53c1510D-1-<1,0>: wide: wide=1 chg=0. > sym53c1510D-1-<1,0>: sync msgout: 1-3-1-a-1f. > sym53c1510D-1-<1,0>: sync msg in: 1-3-1-a-1f. > sym53c1510D-1-<1,0>: sync: per=10 scntl3=0x90 scntl4=0x0 ofs=31 fak=0 chg=0. 
> sym53c1510D-1-<1,*>: FAST-40 WIDE SCSI 80.0 MB/s (25 ns, offset 31) > SCSI device sdb: 35565080 512-byte hdwr sectors (18209 MB) > sdb: sdb1 > sym53c1510D-1-<2,0>: wide msgout: 1-2-3-1. > sym53c1510D-1-<2,0>: wide msgin: 1-2-3-1. > sym53c1510D-1-<2,0>: wide: wide=1 chg=0. > sym53c1510D-1-<2,0>: wide msgout: 1-2-3-1. > sym53c1510D-1-<2,0>: wide msgin: 1-2-3-1. > sym53c1510D-1-<2,0>: wide: wide=1 chg=0. > sym53c1510D-1-<2,0>: sync msgout: 1-3-1-a-1f. > sym53c1510D-1-<2,0>: sync msg in: 1-3-1-a-1f. > sym53c1510D-1-<2,0>: sync: per=10 scntl3=0x90 scntl4=0x0 ofs=31 fak=0 chg=0. > sym53c1510D-1-<2,*>: FAST-40 WIDE SCSI 80.0 MB/s (25 ns, offset 31) > SCSI device sdc: 35565080 512-byte hdwr sectors (18209 MB) > sdc: sdc1 > sym53c1510D-1-<3,0>: wide msgout: 1-2-3-1. > sym53c1510D-1-<3,0>: wide msgin: 1-2-3-1. > sym53c1510D-1-<3,0>: wide: wide=1 chg=0. > sym53c1510D-1-<3,0>: wide msgout: 1-2-3-1. > sym53c1510D-1-<3,0>: wide msgin: 1-2-3-1. > sym53c1510D-1-<3,0>: wide: wide=1 chg=0. > sym53c1510D-1-<3,0>: sync msgout: 1-3-1-a-1f. > sym53c1510D-1-<3,0>: sync msg in: 1-3-1-a-1f. > sym53c1510D-1-<3,0>: sync: per=10 scntl3=0x90 scntl4=0x0 ofs=31 fak=0 chg=0. > sym53c1510D-1-<3,*>: FAST-40 WIDE SCSI 80.0 MB/s (25 ns, offset 31) > SCSI device sdd: 35565080 512-byte hdwr sectors (18209 MB) > sdd: sdd1 > raid0 personality registered > md driver 0.90.0 MAX_MD_DEVS=256, MD_SB_DISKS=27 > md.c: sizeof(mdp_super_t) = 4096 > autodetecting RAID arrays > (read) sdb1's sb offset: 17781632 [events: 00000001] > (read) sdc1's sb offset: 17781632 [events: 00000001] > autorun ... > considering sdc1 ... > adding sdc1 ... > adding sdb1 ... > created md0 > bind > bind > running: > now! 
> sdc1's event counter: 00000001 > sdb1's event counter: 00000001 > md0: max total readahead window set to 4096k > md0: 2 data-disks, max readahead per data-disk: 2048k > raid0: looking at sdb1 > raid0: comparing sdb1(17781248) with sdb1(17781248) > raid0: END > raid0: ==> UNIQUE > raid0: 1 zones > raid0: looking at sdc1 > raid0: comparing sdc1(17781248) with sdb1(17781248) > raid0: EQUAL > raid0: FINAL 1 zones > zone 0 > checking sdb1 ... contained as device 0 > (17781248) is smallest!. > checking sdc1 ... contained as device 1 > zone->nb_dev: 2, size: 35562496 > current zone offset: 17781248 > done. > raid0 : md_size is 35562496 blocks. > raid0 : conf->smallest->size is 35562496 blocks. > raid0 : nb_zone is 1. > raid0 : Allocating 8 bytes for hash. > md: updating md0 RAID superblock on device > sdc1 [events: 00000002](write) sdc1's sb offset: 17781632 > sdb1 [events: 00000002](write) sdb1's sb offset: 17781632 > . > ... autorun DONE. > NET4: Linux TCP/IP 1.0 for NET4.0 > IP Protocols: ICMP, UDP, TCP, IGMP > IP: routing cache hash table of 8192 buckets, 64Kbytes > TCP: Hash tables configured (established 65536 bind 65536) > NET4: Unix domain sockets 1.0/SMP for Linux NET4.0. > VFS: Mounted root (ext2 filesystem) readonly. > Freeing unused kernel memory: 232k freed > Adding Swap: 2097136k swap-space (priority -1) > eepro100.c:v1.09j-t 9/29/99 Donald Becker > http://cesdis.gsfc.nasa.gov/linux/drivers/eepro100.html > eepro100.c: $Revision: 1.36 $ 2000/11/17 Modified by Andrey V. Savochkin > and others > eth0: OEM i82557/i82558 10/100 Ethernet, 00:02:A5:34:96:CF, IRQ 15. > Board assembly 010101-034, Physical connectors present: RJ45 > Primary interface chip i82555 PHY #1. > General self-test: passed. > Serial sub-system self-test: passed. > Internal registers self-test: passed. > ROM checksum self-test: passed (0x04f4518b). 
> vmdump: dump device opened: 0x831

From owner-lkcd@oss.sgi.com Tue Jun 5 19:10:19 2001
Received: (from majordomo@localhost) by oss.sgi.com (8.11.3/8.11.3) id f562AJP12630 for lkcd-outgoing; Tue, 5 Jun 2001 19:10:19 -0700
Received: from smtp.alacritech.com (smtp.alacritech.com [209.10.208.82]) by oss.sgi.com (8.11.3/8.11.3) with SMTP id f562AFh12615 for ; Tue, 5 Jun 2001 19:10:15 -0700
Received: from alacritech.com (lambda.alacritech.com [10.1.1.32]) by smtp.alacritech.com (8.11.0/8.11.0) with ESMTP id f55M6CK04515; Tue, 5 Jun 2001 15:06:14 -0700
Message-ID: <3B1D5775.98615A6A@alacritech.com>
Date: Tue, 05 Jun 2001 15:04:37 -0700
From: "Matt D. Robinson"
Organization: Alacritech, Inc.
X-Mailer: Mozilla 4.76 [en] (X11; U; Linux 2.2.17-14 i686)
X-Accept-Language: en
MIME-Version: 1.0
To: Michael Walfish
CC: lkcd@oss.sgi.com, Yoel Inbar
Subject: Re: LKCD 3.1.3 available ...
References:
Content-Type: text/plain; charset=us-ascii
Content-Transfer-Encoding: 7bit
Sender: owner-lkcd@oss.sgi.com
Precedence: bulk

Michael Walfish wrote:
>
> Matt D. Robinson wrote:
> > Let me know if you have any problems.
>
> Hi Matt,
>
> We're big fans of lkcd (so far it's been really easy to use and understand).
> Any help you can provide is greatly appreciated.
>
> I applied the 2.4.2 patch this morning. Here are some observations:
> (dmesg and relevant parts of our .config at the end of this mail)
>
> 1) no problems when I force a kernel crash inside a user process
> 2) when I force a crash inside an interrupt (for a device driver), I get the
> following, in order:
> -->standard oops message
> -->interesting message, below
> -->another oops for the second CPU
> -->system reset, presumably driven by the lkcd patch

Hmmm, I haven't seen this. I can see from schedule() as to what
might be happening, though. Can you send me the code that you're
running to generate the interrupt crash?

Basically what this means is, removing smp_send_stop() has messed
things up. I might have something that can fix this, but first I
need to know, is this an SMP or non-SMP system?

... back to the drawing board. I'll fix this quickly.

--Matt

> Thanks again,
> Mike Walfish
>
> ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
> +++
> Dumping to device 0x831 [sd(8,49)] ...
> Writing dump header ...Scheduling in interrupt
> kernel BUG at sched.c:681!
>
> wait_on_irq, CPU 1:
> irq: 1 [ 1 0 ]
> bh: 1 [ 0 1 ]
> Stack dumps:
> CPU 0:
> CPU 1:e7ff7ebc c0204a93 00000001 00000020 00000000 c010a56d c0204aa8
> e7d2d9a0
> 00000003 00000001 c0179e7a 00000000 c0179e44 00000000 c02aac60
> c011ddb5
> 00000000 00000000 00000020 00000000 c02aac60 e7d2db1c c0179df2
> c025c84c
> Call Trace: [] [] [] [] []
> [ 011afd1>] []
> [] [] [] [] []
> [ 70>] [] []
> [] [] [] []
> ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
> ++++
>
> ----------------------------------------------------------------------------
> -----
>
> CONFIG_X86=y
> CONFIG_ISA=y
> CONFIG_UID16=y
> CONFIG_MODULES=y
> CONFIG_KMOD=y
> CONFIG_MPENTIUMIII=y
> CONFIG_X86_WP_WORKS_OK=y
> CONFIG_X86_INVLPG=y
> CONFIG_X86_CMPXCHG=y
> CONFIG_X86_BSWAP=y
> CONFIG_X86_POPAD_OK=y
> CONFIG_X86_TSC=y
> CONFIG_X86_GOOD_APIC=y
> CONFIG_X86_PGE=y
> CONFIG_X86_USE_PPRO_CHECKSUM=y
> CONFIG_X86_MSR=y
> CONFIG_X86_CPUID=y
> CONFIG_NOHIGHMEM=y
> CONFIG_MTRR=y
> CONFIG_SMP=y
> CONFIG_HAVE_DEC_LOCK=y
> CONFIG_NET=y
> CONFIG_X86_IO_APIC=y
> CONFIG_X86_LOCAL_APIC=y
> CONFIG_PCI=y
> CONFIG_PCI_GOANY=y
> CONFIG_PCI_BIOS=y
> CONFIG_PCI_DIRECT=y
> CONFIG_PCI_NAMES=y
> CONFIG_HOTPLUG=y
> CONFIG_SYSVIPC=y
> CONFIG_SYSCTL=y
> CONFIG_KCORE_ELF=y
> CONFIG_BINFMT_AOUT=y
> CONFIG_BINFMT_ELF=y
> CONFIG_BINFMT_MISC=y
> CONFIG_PM=y
> CONFIG_PNP=y
> CONFIG_ISAPNP=y
> CONFIG_BLK_DEV_FD=y
> CONFIG_BLK_CPQ_DA=y
> CONFIG_BLK_DEV_LOOP=m
> CONFIG_MD=y
> CONFIG_BLK_DEV_MD=y
> CONFIG_MD_RAID0=y
> CONFIG_PACKET=y
> CONFIG_NETFILTER=y
> CONFIG_UNIX=y
> CONFIG_INET=y
> CONFIG_IP_MULTICAST=y
> CONFIG_IP_NF_CONNTRACK=m
> CONFIG_IP_NF_FTP=m
> CONFIG_IP_NF_IPTABLES=m
> CONFIG_IP_NF_MATCH_LIMIT=m
> CONFIG_IP_NF_MATCH_MAC=m
> CONFIG_IP_NF_MATCH_MARK=m
> CONFIG_IP_NF_MATCH_MULTIPORT=m
> CONFIG_IP_NF_MATCH_TOS=m
> CONFIG_IP_NF_MATCH_STATE=m
> CONFIG_IP_NF_FILTER=m
> CONFIG_IP_NF_TARGET_REJECT=m
> CONFIG_IP_NF_NAT=m
> CONFIG_IP_NF_NAT_NEEDED=y
> CONFIG_IP_NF_TARGET_MASQUERADE=m
> CONFIG_IP_NF_TARGET_REDIRECT=m
> CONFIG_IP_NF_NAT_FTP=m
> CONFIG_IP_NF_MANGLE=m
> CONFIG_IP_NF_TARGET_TOS=m
> CONFIG_IP_NF_TARGET_MARK=m
> CONFIG_IP_NF_TARGET_LOG=m
> CONFIG_IP_NF_COMPAT_IPCHAINS=m
> CONFIG_IP_NF_NAT_NEEDED=y
> CONFIG_IDE=y
> CONFIG_BLK_DEV_IDE=y
> CONFIG_BLK_DEV_IDEDISK=y
> CONFIG_BLK_DEV_IDECD=y
> CONFIG_BLK_DEV_CMD640=y
> CONFIG_BLK_DEV_RZ1000=y
> CONFIG_BLK_DEV_IDEPCI=y
> CONFIG_IDEPCI_SHARE_IRQ=y
> CONFIG_BLK_DEV_IDE_MODES=y
> CONFIG_SCSI=y
> CONFIG_BLK_DEV_SD=y
> CONFIG_SCSI_DEBUG_QUEUES=y
> CONFIG_SCSI_MULTI_LUN=y
> CONFIG_SCSI_CONSTANTS=y
> CONFIG_SCSI_AIC7XXX=y
> CONFIG_AIC7XXX_TCQ_ON_BY_DEFAULT=y
> CONFIG_SCSI_SYM53C8XX=y
> CONFIG_NETDEVICES=y
> CONFIG_DUMMY=m
> CONFIG_NET_ETHERNET=y
> CONFIG_NET_PCI=y
> CONFIG_EEPRO100=m
> CONFIG_VT=y
> CONFIG_VT_CONSOLE=y
> CONFIG_SERIAL=y
> CONFIG_SERIAL_CONSOLE=y
> CONFIG_UNIX98_PTYS=y
> CONFIG_MOUSE=y
> CONFIG_PSMOUSE=y
> CONFIG_AUTOFS_FS=m
> CONFIG_AUTOFS4_FS=m
> CONFIG_FAT_FS=m
> CONFIG_MSDOS_FS=m
> CONFIG_VFAT_FS=m
> CONFIG_ISO9660_FS=m
> CONFIG_JOLIET=y
> CONFIG_NTFS_FS=m
> CONFIG_PROC_FS=y
> CONFIG_DEVPTS_FS=y
> CONFIG_EXT2_FS=y
> CONFIG_NFS_FS=m
> CONFIG_NFS_V3=y
> CONFIG_NFSD=m
> CONFIG_NFSD_V3=y
> CONFIG_SUNRPC=m
> CONFIG_LOCKD=m
> CONFIG_LOCKD_V4=y
> CONFIG_SMB_FS=m
> CONFIG_MSDOS_PARTITION=y
> CONFIG_SMB_NLS=y
> CONFIG_NLS=y
> CONFIG_NLS_CODEPAGE_437=m
> CONFIG_NLS_ISO8859_1=m
> CONFIG_VGA_CONSOLE=y
> CONFIG_VMDUMP=y
>
> ----------------------------------------------------------------------------
> -----
>
> PCI: Probing PCI hardware
> PCI: Discovered peer bus 03
> PCI: Device 00:00 not found by BIOS
> PCI: Device 00:01 not found by BIOS
> PCI: Device 00:78 not found by BIOS
> isapnp: Scanning for Pnp cards...
> isapnp: No Plug & Play device found
> Linux NET4.0 for Linux 2.4
> Based upon Swansea University Computer Society NET3.039
> Starting kswapd v1.8
> pty: 256 Unix98 ptys configured
> block: queued sectors max/low 426005kB/294933kB, 1280 slots per queue
> Uniform Multi-Platform E-IDE driver Revision: 6.31
> ide: Assuming 33MHz system bus speed for PIO modes; override with idebus=xx
> ServerWorks OSB4: IDE controller on PCI bus 00 dev 79
> ServerWorks OSB4: chipset revision 0
> ServerWorks OSB4: not 100% native mode: will probe irqs later
> hda: Compaq CRN-8241B, ATAPI CD/DVD-ROM drive
> ide0 at 0x1f0-0x1f7,0x3f6 on irq 14
> hda: ATAPI 24X CD-ROM drive, 128kB Cache
> Uniform CD-ROM driver Revision: 3.12
> Floppy drive(s): fd0 is 1.44M
> FDC 0 is a National Semiconductor PC87306
> Serial driver version 5.02 (2000-08-09) with MANY_PORTS SHARE_IRQ
> SERIAL_PCI ISAPNP enabled
> ttyS00 at 0x03f8 (irq = 4) is a 16550A
> ttyS01 at 0x02f8 (irq = 3) is a 16550A
> SCSI subsystem driver Revision: 1.00
> sym53c8xx: at PCI bus 0, device 1, function 0
> sym53c8xx: setting PCI_COMMAND_PARITY...(fix-up)
> sym53c8xx: 53c1510D detected
> sym53c8xx: at PCI bus 0, device 1, function 1
> sym53c8xx: setting PCI_COMMAND_PARITY...(fix-up)
> sym53c8xx: 53c1510D detected
> sym53c1510D-0: rev 0x2 on pci bus 0 device 1 function 0 irq 11
> sym53c1510D-0: ID 7, Fast-40, Parity Checking
> sym53c1510D-0: on-chip RAM at 0xc3efe000
> sym53c1510D-0: restart (scsi reset).
> sym53c1510D-0: Downloading SCSI SCRIPTS.
> sym53c1510D-1: rev 0x2 on pci bus 0 device 1 function 1 irq 11
> sym53c1510D-1: ID 7, Fast-40, Parity Checking
> sym53c1510D-1: on-chip RAM at 0xc3efc000
> sym53c1510D-1: restart (scsi reset).
> sym53c1510D-1: Downloading SCSI SCRIPTS.
> scsi0 : sym53c8xx - version 1.6b
> scsi1 : sym53c8xx - version 1.6b
> Vendor: COMPAQ Model: BD0186398C Rev: BC1P
> Type: Direct-Access ANSI SCSI revision: 02
> Vendor: COMPAQ Model: BD0186398C Rev: BC1P
> Type: Direct-Access ANSI SCSI revision: 02
> Vendor: COMPAQ Model: BD0186398C Rev: BC1P
> Type: Direct-Access ANSI SCSI revision: 02
> Vendor: COMPAQ Model: BD0186398C Rev: BC1P
> Type: Direct-Access ANSI SCSI revision: 02
> sym53c1510D-1-<0,0>: tagged command queue depth set to 32
> sym53c1510D-1-<1,0>: tagged command queue depth set to 32
> sym53c1510D-1-<2,0>: tagged command queue depth set to 32
> sym53c1510D-1-<3,0>: tagged command queue depth set to 32
> Detected scsi disk sda at scsi1, channel 0, id 0, lun 0
> Detected scsi disk sdb at scsi1, channel 0, id 1, lun 0
> Detected scsi disk sdc at scsi1, channel 0, id 2, lun 0
> Detected scsi disk sdd at scsi1, channel 0, id 3, lun 0
> sym53c1510D-1-<0,0>: wide msgout: 1-2-3-1.
> sym53c1510D-1-<0,0>: wide msgin: 1-2-3-1.
> sym53c1510D-1-<0,0>: wide: wide=1 chg=0.
> sym53c1510D-1-<0,0>: wide msgout: 1-2-3-1.
> sym53c1510D-1-<0,0>: wide msgin: 1-2-3-1.
> sym53c1510D-1-<0,0>: wide: wide=1 chg=0.
> sym53c1510D-1-<0,0>: sync msgout: 1-3-1-a-1f.
> sym53c1510D-1-<0,0>: sync msg in: 1-3-1-a-1f.
> sym53c1510D-1-<0,0>: sync: per=10 scntl3=0x90 scntl4=0x0 ofs=31 fak=0 chg=0.
> sym53c1510D-1-<0,*>: FAST-40 WIDE SCSI 80.0 MB/s (25 ns, offset 31)
> SCSI device sda: 35565080 512-byte hdwr sectors (18209 MB)
> Partition check:
> sda: sda1 sda2 sda3 sda4 < sda5 sda6 sda7 >
> sym53c1510D-1-<1,0>: wide msgout: 1-2-3-1.
> sym53c1510D-1-<1,0>: wide msgin: 1-2-3-1.
> sym53c1510D-1-<1,0>: wide: wide=1 chg=0.
> sym53c1510D-1-<1,0>: wide msgout: 1-2-3-1.
> sym53c1510D-1-<1,0>: wide msgin: 1-2-3-1.
> sym53c1510D-1-<1,0>: wide: wide=1 chg=0.
> sym53c1510D-1-<1,0>: sync msgout: 1-3-1-a-1f.
> sym53c1510D-1-<1,0>: sync msg in: 1-3-1-a-1f.
> sym53c1510D-1-<1,0>: sync: per=10 scntl3=0x90 scntl4=0x0 ofs=31 fak=0 chg=0.
> sym53c1510D-1-<1,*>: FAST-40 WIDE SCSI 80.0 MB/s (25 ns, offset 31)
> SCSI device sdb: 35565080 512-byte hdwr sectors (18209 MB)
> sdb: sdb1
> sym53c1510D-1-<2,0>: wide msgout: 1-2-3-1.
> sym53c1510D-1-<2,0>: wide msgin: 1-2-3-1.
> sym53c1510D-1-<2,0>: wide: wide=1 chg=0.
> sym53c1510D-1-<2,0>: wide msgout: 1-2-3-1.
> sym53c1510D-1-<2,0>: wide msgin: 1-2-3-1.
> sym53c1510D-1-<2,0>: wide: wide=1 chg=0.
> sym53c1510D-1-<2,0>: sync msgout: 1-3-1-a-1f.
> sym53c1510D-1-<2,0>: sync msg in: 1-3-1-a-1f.
> sym53c1510D-1-<2,0>: sync: per=10 scntl3=0x90 scntl4=0x0 ofs=31 fak=0 chg=0.
> sym53c1510D-1-<2,*>: FAST-40 WIDE SCSI 80.0 MB/s (25 ns, offset 31)
> SCSI device sdc: 35565080 512-byte hdwr sectors (18209 MB)
> sdc: sdc1
> sym53c1510D-1-<3,0>: wide msgout: 1-2-3-1.
> sym53c1510D-1-<3,0>: wide msgin: 1-2-3-1.
> sym53c1510D-1-<3,0>: wide: wide=1 chg=0.
> sym53c1510D-1-<3,0>: wide msgout: 1-2-3-1.
> sym53c1510D-1-<3,0>: wide msgin: 1-2-3-1.
> sym53c1510D-1-<3,0>: wide: wide=1 chg=0.
> sym53c1510D-1-<3,0>: sync msgout: 1-3-1-a-1f.
> sym53c1510D-1-<3,0>: sync msg in: 1-3-1-a-1f.
> sym53c1510D-1-<3,0>: sync: per=10 scntl3=0x90 scntl4=0x0 ofs=31 fak=0 chg=0.
> sym53c1510D-1-<3,*>: FAST-40 WIDE SCSI 80.0 MB/s (25 ns, offset 31)
> SCSI device sdd: 35565080 512-byte hdwr sectors (18209 MB)
> sdd: sdd1
> raid0 personality registered
> md driver 0.90.0 MAX_MD_DEVS=256, MD_SB_DISKS=27
> md.c: sizeof(mdp_super_t) = 4096
> autodetecting RAID arrays
> (read) sdb1's sb offset: 17781632 [events: 00000001]
> (read) sdc1's sb offset: 17781632 [events: 00000001]
> autorun ...
> considering sdc1 ...
> adding sdc1 ...
> adding sdb1 ...
> created md0
> bind
> bind
> running:
> now!
> sdc1's event counter: 00000001
> sdb1's event counter: 00000001
> md0: max total readahead window set to 4096k
> md0: 2 data-disks, max readahead per data-disk: 2048k
> raid0: looking at sdb1
> raid0: comparing sdb1(17781248) with sdb1(17781248)
> raid0: END
> raid0: ==> UNIQUE
> raid0: 1 zones
> raid0: looking at sdc1
> raid0: comparing sdc1(17781248) with sdb1(17781248)
> raid0: EQUAL
> raid0: FINAL 1 zones
> zone 0
> checking sdb1 ... contained as device 0
> (17781248) is smallest!.
> checking sdc1 ... contained as device 1
> zone->nb_dev: 2, size: 35562496
> current zone offset: 17781248
> done.
> raid0 : md_size is 35562496 blocks.
> raid0 : conf->smallest->size is 35562496 blocks.
> raid0 : nb_zone is 1.
> raid0 : Allocating 8 bytes for hash.
> md: updating md0 RAID superblock on device
> sdc1 [events: 00000002](write) sdc1's sb offset: 17781632
> sdb1 [events: 00000002](write) sdb1's sb offset: 17781632
> .
> ... autorun DONE.
> NET4: Linux TCP/IP 1.0 for NET4.0
> IP Protocols: ICMP, UDP, TCP, IGMP
> IP: routing cache hash table of 8192 buckets, 64Kbytes
> TCP: Hash tables configured (established 65536 bind 65536)
> NET4: Unix domain sockets 1.0/SMP for Linux NET4.0.
> VFS: Mounted root (ext2 filesystem) readonly.
> Freeing unused kernel memory: 232k freed
> Adding Swap: 2097136k swap-space (priority -1)
> eepro100.c:v1.09j-t 9/29/99 Donald Becker
> http://cesdis.gsfc.nasa.gov/linux/drivers/eepro100.html
> eepro100.c: $Revision: 1.36 $ 2000/11/17 Modified by Andrey V. Savochkin
> and others
> eth0: OEM i82557/i82558 10/100 Ethernet, 00:02:A5:34:96:CF, IRQ 15.
> Board assembly 010101-034, Physical connectors present: RJ45
> Primary interface chip i82555 PHY #1.
> General self-test: passed.
> Serial sub-system self-test: passed.
> Internal registers self-test: passed.
> ROM checksum self-test: passed (0x04f4518b).
> vmdump: dump device opened: 0x831

From owner-lkcd@oss.sgi.com Tue Jun 5 23:10:18 2001
Received: (from majordomo@localhost) by oss.sgi.com (8.11.3/8.11.3) id f566AIt10484 for lkcd-outgoing; Tue, 5 Jun 2001 23:10:18 -0700
Received: from smtp.alacritech.com (smtp.alacritech.com [209.10.208.82]) by oss.sgi.com (8.11.3/8.11.3) with SMTP id f566AFh10476 for ; Tue, 5 Jun 2001 23:10:15 -0700
Received: from alacritech.com (lambda.alacritech.com [10.1.1.32]) by smtp.alacritech.com (8.11.0/8.11.0) with ESMTP id f55M6CK04515; Tue, 5 Jun 2001 15:06:14 -0700
Message-ID: <3B1D5775.98615A6A@alacritech.com>
Date: Tue, 05 Jun 2001 15:04:37 -0700
From: "Matt D. Robinson"
Organization: Alacritech, Inc.
X-Mailer: Mozilla 4.76 [en] (X11; U; Linux 2.2.17-14 i686)
X-Accept-Language: en
MIME-Version: 1.0
To: Michael Walfish
CC: lkcd@oss.sgi.com, Yoel Inbar
Subject: Re: LKCD 3.1.3 available ...
References:
Content-Type: text/plain; charset=us-ascii
Content-Transfer-Encoding: 7bit
Sender: owner-lkcd@oss.sgi.com
Precedence: bulk

Michael Walfish wrote:
>
> Matt D. Robinson wrote:
> > Let me know if you have any problems.
>
> Hi Matt,
>
> We're big fans of lkcd (so far it's been really easy to use and understand).
> Any help you can provide is greatly appreciated.
>
> I applied the 2.4.2 patch this morning. Here are some observations:
> (dmesg and relevant parts of our .config at the end of this mail)
>
> 1) no problems when I force a kernel crash inside a user process
> 2) when I force a crash inside an interrupt (for a device driver), I get the
> following, in order:
> -->standard oops message
> -->interesting message, below
> -->another oops for the second CPU
> -->system reset, presumably driven by the lkcd patch

Hmmm, I haven't seen this. I can see from schedule() as to what might be happening, though. Can you send me the code that you're running to generate the interrupt crash? Basically what this means is, removing smp_send_stop() has messed things up.
I might have something that can fix this, but first I need to know, is this an SMP or non-SMP system? ... back to the drawing board. I'll fix this quickly. --Matt > Thanks again, > Mike Walfish > > ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++ > +++ > Dumping to device 0x831 [sd(8,49)] ... > Writing dump header ...Scheduling in interrupt > kernel BUG at sched.c:681! > > wait_on_irq, CPU 1: > irq: 1 [ 1 0 ] > bh: 1 [ 0 1 ] > Stack dumps: > CPU 0: > CPU 1:e7ff7ebc c0204a93 00000001 00000020 00000000 c010a56d c0204aa8 > e7d2d9a0 > 00000003 00000001 c0179e7a 00000000 c0179e44 00000000 c02aac60 > c011ddb5 > 00000000 00000000 00000020 00000000 c02aac60 e7d2db1c c0179df2 > c025c84c > Call Trace: [] [] [] [] [] > [ 011afd1>] [] > [] [] [] [] [] > [ 70>] [] [] > [] [] [] [] > ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++ > ++++ > > ---------------------------------------------------------------------------- > ----- > > CONFIG_X86=y > CONFIG_ISA=y > CONFIG_UID16=y > CONFIG_MODULES=y > CONFIG_KMOD=y > CONFIG_MPENTIUMIII=y > CONFIG_X86_WP_WORKS_OK=y > CONFIG_X86_INVLPG=y > CONFIG_X86_CMPXCHG=y > CONFIG_X86_BSWAP=y > CONFIG_X86_POPAD_OK=y > CONFIG_X86_TSC=y > CONFIG_X86_GOOD_APIC=y > CONFIG_X86_PGE=y > CONFIG_X86_USE_PPRO_CHECKSUM=y > CONFIG_X86_MSR=y > CONFIG_X86_CPUID=y > CONFIG_NOHIGHMEM=y > CONFIG_MTRR=y > CONFIG_SMP=y > CONFIG_HAVE_DEC_LOCK=y > CONFIG_NET=y > CONFIG_X86_IO_APIC=y > CONFIG_X86_LOCAL_APIC=y > CONFIG_PCI=y > CONFIG_PCI_GOANY=y > CONFIG_PCI_BIOS=y > CONFIG_PCI_DIRECT=y > CONFIG_PCI_NAMES=y > CONFIG_HOTPLUG=y > CONFIG_SYSVIPC=y > CONFIG_SYSCTL=y > CONFIG_KCORE_ELF=y > CONFIG_BINFMT_AOUT=y > CONFIG_BINFMT_ELF=y > CONFIG_BINFMT_MISC=y > CONFIG_PM=y > CONFIG_PNP=y > CONFIG_ISAPNP=y > CONFIG_BLK_DEV_FD=y > CONFIG_BLK_CPQ_DA=y > CONFIG_BLK_DEV_LOOP=m > CONFIG_MD=y > CONFIG_BLK_DEV_MD=y > CONFIG_MD_RAID0=y > CONFIG_PACKET=y > CONFIG_NETFILTER=y > CONFIG_UNIX=y > CONFIG_INET=y > CONFIG_IP_MULTICAST=y > 
CONFIG_IP_NF_CONNTRACK=m > CONFIG_IP_NF_FTP=m > CONFIG_IP_NF_IPTABLES=m > CONFIG_IP_NF_MATCH_LIMIT=m > CONFIG_IP_NF_MATCH_MAC=m > CONFIG_IP_NF_MATCH_MARK=m > CONFIG_IP_NF_MATCH_MULTIPORT=m > CONFIG_IP_NF_MATCH_TOS=m > CONFIG_IP_NF_MATCH_STATE=m > CONFIG_IP_NF_FILTER=m > CONFIG_IP_NF_TARGET_REJECT=m > CONFIG_IP_NF_NAT=m > CONFIG_IP_NF_NAT_NEEDED=y > CONFIG_IP_NF_TARGET_MASQUERADE=m > CONFIG_IP_NF_TARGET_REDIRECT=m > CONFIG_IP_NF_NAT_FTP=m > CONFIG_IP_NF_MANGLE=m > CONFIG_IP_NF_TARGET_TOS=m > CONFIG_IP_NF_TARGET_MARK=m > CONFIG_IP_NF_TARGET_LOG=m > CONFIG_IP_NF_COMPAT_IPCHAINS=m > CONFIG_IP_NF_NAT_NEEDED=y > CONFIG_IDE=y > CONFIG_BLK_DEV_IDE=y > CONFIG_BLK_DEV_IDEDISK=y > CONFIG_BLK_DEV_IDECD=y > CONFIG_BLK_DEV_CMD640=y > CONFIG_BLK_DEV_RZ1000=y > CONFIG_BLK_DEV_IDEPCI=y > CONFIG_IDEPCI_SHARE_IRQ=y > CONFIG_BLK_DEV_IDE_MODES=y > CONFIG_SCSI=y > CONFIG_BLK_DEV_SD=y > CONFIG_SCSI_DEBUG_QUEUES=y > CONFIG_SCSI_MULTI_LUN=y > CONFIG_SCSI_CONSTANTS=y > CONFIG_SCSI_AIC7XXX=y > CONFIG_AIC7XXX_TCQ_ON_BY_DEFAULT=y > CONFIG_SCSI_SYM53C8XX=y > CONFIG_NETDEVICES=y > CONFIG_DUMMY=m > CONFIG_NET_ETHERNET=y > CONFIG_NET_PCI=y > CONFIG_EEPRO100=m > CONFIG_VT=y > CONFIG_VT_CONSOLE=y > CONFIG_SERIAL=y > CONFIG_SERIAL_CONSOLE=y > CONFIG_UNIX98_PTYS=y > CONFIG_MOUSE=y > CONFIG_PSMOUSE=y > CONFIG_AUTOFS_FS=m > CONFIG_AUTOFS4_FS=m > CONFIG_FAT_FS=m > CONFIG_MSDOS_FS=m > CONFIG_VFAT_FS=m > CONFIG_ISO9660_FS=m > CONFIG_JOLIET=y > CONFIG_NTFS_FS=m > CONFIG_PROC_FS=y > CONFIG_DEVPTS_FS=y > CONFIG_EXT2_FS=y > CONFIG_NFS_FS=m > CONFIG_NFS_V3=y > CONFIG_NFSD=m > CONFIG_NFSD_V3=y > CONFIG_SUNRPC=m > CONFIG_LOCKD=m > CONFIG_LOCKD_V4=y > CONFIG_SMB_FS=m > CONFIG_MSDOS_PARTITION=y > CONFIG_SMB_NLS=y > CONFIG_NLS=y > CONFIG_NLS_CODEPAGE_437=m > CONFIG_NLS_ISO8859_1=m > CONFIG_VGA_CONSOLE=y > CONFIG_VMDUMP=y > > ---------------------------------------------------------------------------- > ----- > > PCI: Probing PCI hardware > PCI: Discovered peer bus 03 > PCI: Device 00:00 not found by BIOS > PCI: 
Device 00:01 not found by BIOS > PCI: Device 00:78 not found by BIOS > isapnp: Scanning for Pnp cards... > isapnp: No Plug & Play device found > Linux NET4.0 for Linux 2.4 > Based upon Swansea University Computer Society NET3.039 > Starting kswapd v1.8 > pty: 256 Unix98 ptys configured > block: queued sectors max/low 426005kB/294933kB, 1280 slots per queue > Uniform Multi-Platform E-IDE driver Revision: 6.31 > ide: Assuming 33MHz system bus speed for PIO modes; override with idebus=xx > ServerWorks OSB4: IDE controller on PCI bus 00 dev 79 > ServerWorks OSB4: chipset revision 0 > ServerWorks OSB4: not 100% native mode: will probe irqs later > hda: Compaq CRN-8241B, ATAPI CD/DVD-ROM drive > ide0 at 0x1f0-0x1f7,0x3f6 on irq 14 > hda: ATAPI 24X CD-ROM drive, 128kB Cache > Uniform CD-ROM driver Revision: 3.12 > Floppy drive(s): fd0 is 1.44M > FDC 0 is a National Semiconductor PC87306 > Serial driver version 5.02 (2000-08-09) with MANY_PORTS SHARE_IRQ > SERIAL_PCI ISAPNP enabled > ttyS00 at 0x03f8 (irq = 4) is a 16550A > ttyS01 at 0x02f8 (irq = 3) is a 16550A > SCSI subsystem driver Revision: 1.00 > sym53c8xx: at PCI bus 0, device 1, function 0 > sym53c8xx: setting PCI_COMMAND_PARITY...(fix-up) > sym53c8xx: 53c1510D detected > sym53c8xx: at PCI bus 0, device 1, function 1 > sym53c8xx: setting PCI_COMMAND_PARITY...(fix-up) > sym53c8xx: 53c1510D detected > sym53c1510D-0: rev 0x2 on pci bus 0 device 1 function 0 irq 11 > sym53c1510D-0: ID 7, Fast-40, Parity Checking > sym53c1510D-0: on-chip RAM at 0xc3efe000 > sym53c1510D-0: restart (scsi reset). > sym53c1510D-0: Downloading SCSI SCRIPTS. > sym53c1510D-1: rev 0x2 on pci bus 0 device 1 function 1 irq 11 > sym53c1510D-1: ID 7, Fast-40, Parity Checking > sym53c1510D-1: on-chip RAM at 0xc3efc000 > sym53c1510D-1: restart (scsi reset). > sym53c1510D-1: Downloading SCSI SCRIPTS. 
> scsi0 : sym53c8xx - version 1.6b > scsi1 : sym53c8xx - version 1.6b > Vendor: COMPAQ Model: BD0186398C Rev: BC1P > Type: Direct-Access ANSI SCSI revision: 02 > Vendor: COMPAQ Model: BD0186398C Rev: BC1P > Type: Direct-Access ANSI SCSI revision: 02 > Vendor: COMPAQ Model: BD0186398C Rev: BC1P > Type: Direct-Access ANSI SCSI revision: 02 > Vendor: COMPAQ Model: BD0186398C Rev: BC1P > Type: Direct-Access ANSI SCSI revision: 02 > sym53c1510D-1-<0,0>: tagged command queue depth set to 32 > sym53c1510D-1-<1,0>: tagged command queue depth set to 32 > sym53c1510D-1-<2,0>: tagged command queue depth set to 32 > sym53c1510D-1-<3,0>: tagged command queue depth set to 32 > Detected scsi disk sda at scsi1, channel 0, id 0, lun 0 > Detected scsi disk sdb at scsi1, channel 0, id 1, lun 0 > Detected scsi disk sdc at scsi1, channel 0, id 2, lun 0 > Detected scsi disk sdd at scsi1, channel 0, id 3, lun 0 > sym53c1510D-1-<0,0>: wide msgout: 1-2-3-1. > sym53c1510D-1-<0,0>: wide msgin: 1-2-3-1. > sym53c1510D-1-<0,0>: wide: wide=1 chg=0. > sym53c1510D-1-<0,0>: wide msgout: 1-2-3-1. > sym53c1510D-1-<0,0>: wide msgin: 1-2-3-1. > sym53c1510D-1-<0,0>: wide: wide=1 chg=0. > sym53c1510D-1-<0,0>: sync msgout: 1-3-1-a-1f. > sym53c1510D-1-<0,0>: sync msg in: 1-3-1-a-1f. > sym53c1510D-1-<0,0>: sync: per=10 scntl3=0x90 scntl4=0x0 ofs=31 fak=0 chg=0. > sym53c1510D-1-<0,*>: FAST-40 WIDE SCSI 80.0 MB/s (25 ns, offset 31) > SCSI device sda: 35565080 512-byte hdwr sectors (18209 MB) > Partition check: > sda: sda1 sda2 sda3 sda4 < sda5 sda6 sda7 > > sym53c1510D-1-<1,0>: wide msgout: 1-2-3-1. > sym53c1510D-1-<1,0>: wide msgin: 1-2-3-1. > sym53c1510D-1-<1,0>: wide: wide=1 chg=0. > sym53c1510D-1-<1,0>: wide msgout: 1-2-3-1. > sym53c1510D-1-<1,0>: wide msgin: 1-2-3-1. > sym53c1510D-1-<1,0>: wide: wide=1 chg=0. > sym53c1510D-1-<1,0>: sync msgout: 1-3-1-a-1f. > sym53c1510D-1-<1,0>: sync msg in: 1-3-1-a-1f. > sym53c1510D-1-<1,0>: sync: per=10 scntl3=0x90 scntl4=0x0 ofs=31 fak=0 chg=0. 
> sym53c1510D-1-<1,*>: FAST-40 WIDE SCSI 80.0 MB/s (25 ns, offset 31) > SCSI device sdb: 35565080 512-byte hdwr sectors (18209 MB) > sdb: sdb1 > sym53c1510D-1-<2,0>: wide msgout: 1-2-3-1. > sym53c1510D-1-<2,0>: wide msgin: 1-2-3-1. > sym53c1510D-1-<2,0>: wide: wide=1 chg=0. > sym53c1510D-1-<2,0>: wide msgout: 1-2-3-1. > sym53c1510D-1-<2,0>: wide msgin: 1-2-3-1. > sym53c1510D-1-<2,0>: wide: wide=1 chg=0. > sym53c1510D-1-<2,0>: sync msgout: 1-3-1-a-1f. > sym53c1510D-1-<2,0>: sync msg in: 1-3-1-a-1f. > sym53c1510D-1-<2,0>: sync: per=10 scntl3=0x90 scntl4=0x0 ofs=31 fak=0 chg=0. > sym53c1510D-1-<2,*>: FAST-40 WIDE SCSI 80.0 MB/s (25 ns, offset 31) > SCSI device sdc: 35565080 512-byte hdwr sectors (18209 MB) > sdc: sdc1 > sym53c1510D-1-<3,0>: wide msgout: 1-2-3-1. > sym53c1510D-1-<3,0>: wide msgin: 1-2-3-1. > sym53c1510D-1-<3,0>: wide: wide=1 chg=0. > sym53c1510D-1-<3,0>: wide msgout: 1-2-3-1. > sym53c1510D-1-<3,0>: wide msgin: 1-2-3-1. > sym53c1510D-1-<3,0>: wide: wide=1 chg=0. > sym53c1510D-1-<3,0>: sync msgout: 1-3-1-a-1f. > sym53c1510D-1-<3,0>: sync msg in: 1-3-1-a-1f. > sym53c1510D-1-<3,0>: sync: per=10 scntl3=0x90 scntl4=0x0 ofs=31 fak=0 chg=0. > sym53c1510D-1-<3,*>: FAST-40 WIDE SCSI 80.0 MB/s (25 ns, offset 31) > SCSI device sdd: 35565080 512-byte hdwr sectors (18209 MB) > sdd: sdd1 > raid0 personality registered > md driver 0.90.0 MAX_MD_DEVS=256, MD_SB_DISKS=27 > md.c: sizeof(mdp_super_t) = 4096 > autodetecting RAID arrays > (read) sdb1's sb offset: 17781632 [events: 00000001] > (read) sdc1's sb offset: 17781632 [events: 00000001] > autorun ... > considering sdc1 ... > adding sdc1 ... > adding sdb1 ... > created md0 > bind > bind > running: > now! 
> sdc1's event counter: 00000001 > sdb1's event counter: 00000001 > md0: max total readahead window set to 4096k > md0: 2 data-disks, max readahead per data-disk: 2048k > raid0: looking at sdb1 > raid0: comparing sdb1(17781248) with sdb1(17781248) > raid0: END > raid0: ==> UNIQUE > raid0: 1 zones > raid0: looking at sdc1 > raid0: comparing sdc1(17781248) with sdb1(17781248) > raid0: EQUAL > raid0: FINAL 1 zones > zone 0 > checking sdb1 ... contained as device 0 > (17781248) is smallest!. > checking sdc1 ... contained as device 1 > zone->nb_dev: 2, size: 35562496 > current zone offset: 17781248 > done. > raid0 : md_size is 35562496 blocks. > raid0 : conf->smallest->size is 35562496 blocks. > raid0 : nb_zone is 1. > raid0 : Allocating 8 bytes for hash. > md: updating md0 RAID superblock on device > sdc1 [events: 00000002](write) sdc1's sb offset: 17781632 > sdb1 [events: 00000002](write) sdb1's sb offset: 17781632 > . > ... autorun DONE. > NET4: Linux TCP/IP 1.0 for NET4.0 > IP Protocols: ICMP, UDP, TCP, IGMP > IP: routing cache hash table of 8192 buckets, 64Kbytes > TCP: Hash tables configured (established 65536 bind 65536) > NET4: Unix domain sockets 1.0/SMP for Linux NET4.0. > VFS: Mounted root (ext2 filesystem) readonly. > Freeing unused kernel memory: 232k freed > Adding Swap: 2097136k swap-space (priority -1) > eepro100.c:v1.09j-t 9/29/99 Donald Becker > http://cesdis.gsfc.nasa.gov/linux/drivers/eepro100.html > eepro100.c: $Revision: 1.36 $ 2000/11/17 Modified by Andrey V. Savochkin > and others > eth0: OEM i82557/i82558 10/100 Ethernet, 00:02:A5:34:96:CF, IRQ 15. > Board assembly 010101-034, Physical connectors present: RJ45 > Primary interface chip i82555 PHY #1. > General self-test: passed. > Serial sub-system self-test: passed. > Internal registers self-test: passed. > ROM checksum self-test: passed (0x04f4518b). 
> vmdump: dump device opened: 0x831 From owner-lkcd@oss.sgi.com Wed Jun 6 08:11:36 2001 Received: (from majordomo@localhost) by oss.sgi.com (8.11.3/8.11.3) id f56FBaX26948 for lkcd-outgoing; Wed, 6 Jun 2001 08:11:36 -0700 Received: from nakedeye.aparity.com (w032.z064001165.sjc-ca.dsl.cnc.net [64.1.165.32]) by oss.sgi.com (8.11.3/8.11.3) with SMTP id f56FBZh26945 for ; Wed, 6 Jun 2001 08:11:35 -0700 Received: from alacritech.com (localhost.localdomain [127.0.0.1]) by nakedeye.aparity.com (8.11.2/8.11.2) with ESMTP id f56FCrm02286; Wed, 6 Jun 2001 08:12:53 -0700 Message-ID: <3B1E4875.C90572B9@alacritech.com> Date: Wed, 06 Jun 2001 08:12:53 -0700 From: "Matt D. Robinson" Organization: Alacritech, Inc. X-Mailer: Mozilla 4.76 [en] (X11; U; Linux 2.4.2-2smp i686) X-Accept-Language: en MIME-Version: 1.0 To: Michael Walfish , lkcd@oss.sgi.com, Yoel Inbar Subject: Re: LKCD 3.1.3 available ... References: <3B1D5775.98615A6A@alacritech.com> Content-Type: text/plain; charset=us-ascii Content-Transfer-Encoding: 7bit Sender: owner-lkcd@oss.sgi.com Precedence: bulk I apologize for all the repeats. It appears our mail server is reflecting some E-mail, and we're looking into it. Did you get the patch, Michael? --Matt "Matt D. Robinson" wrote: > > Michael Walfish wrote: > > > > Matt D. Robinson wrote: > > > Let me know if you have any problems. > > > > Hi Matt, > > > > We're big fans of lkcd (so far it's been really easy to use and understand). > > Any help you can provide is greatly appreciated. > > > > I applied the 2.4.2 patch this morning. 
Here are some observations: > > (dmesg and relevant parts of our .config at the end of this mail) > > > > 1) no problems when I force a kernel crash inside a user process > > 2) when I force a crash inside an interrupt (for a device driver), I get the > > following, in order: > > -->standard oops message > > -->interesting message, below > > -->another oops for the second CPU > > -->system reset, presumably driven by the lkcd patch > > Hmmm, I haven't seen this. I can see from schedule() as to what > might be happening, though. Can you send me the code that you're > running to generate the interrupt crash? > > Basically what this means is, removing smp_send_stop() has messed > things up. > > I might have something that can fix this, but first I need to know, > is this an SMP or non-SMP system? > > ... back to the drawing board. I'll fix this quickly. > > --Matt > > > Thanks again, > > Mike Walfish From owner-lkcd@oss.sgi.com Thu Jun 7 00:55:57 2001 Received: (from majordomo@localhost) by oss.sgi.com (8.11.3/8.11.3) id f577tvN27426 for lkcd-outgoing; Thu, 7 Jun 2001 00:55:57 -0700 Received: from chexch04.uk.veritas.com ([62.172.234.2]) by oss.sgi.com (8.11.3/8.11.3) with SMTP id f577tuh27423 for ; Thu, 7 Jun 2001 00:55:56 -0700 Received: by CHEXCH04 with Internet Mail Service (5.5.2653.19) id <20JDBAJK>; Thu, 7 Jun 2001 08:59:20 +0100 Message-ID: <11BD31824E1DD511BCB000508BB9F75206C006@RDGXCH04> From: Simon Falvey To: "Lkcd (E-mail)" Subject: crashing under 2.4.4 Date: Thu, 7 Jun 2001 08:55:50 +0100 MIME-Version: 1.0 X-Mailer: Internet Mail Service (5.5.2653.19) Content-Type: text/plain; charset="iso-8859-1" Sender: owner-lkcd@oss.sgi.com Precedence: bulk Hi, When a system is panicking and writing a crash dump file, is it expected that the kernel keeps servicing user processes? It appears that you are able to switch virtual consoles, log in, run things while the vmdump process is doing its stuff.
Surely this will severely confuse the dump image being created, it won't be so much of a snapshot of the system when the panic took place as a smudge. Also, I have noticed that inside lcrash >> ? id --- produces the help text for dis >> ? idcmds -- produces an error Also, other help texts such as bt produce help texts for commands for which they are possible abbreviations; however, as the help text does not say this, it could be that they too are coming up with the wrong information. Cheers Simon Simon Falvey Online Product Support Specialist VERITAS Software UK. Tel: +44 118 918 8105 From owner-lkcd@oss.sgi.com Thu Jun 7 11:10:04 2001 Received: (from majordomo@localhost) by oss.sgi.com (8.11.3/8.11.3) id f57IA4E31869 for lkcd-outgoing; Thu, 7 Jun 2001 11:10:04 -0700 Received: from smtp.alacritech.com (smtp.alacritech.com [209.10.208.82]) by oss.sgi.com (8.11.3/8.11.3) with SMTP id f57IA3h31863 for ; Thu, 7 Jun 2001 11:10:03 -0700 Received: from alacritech.com (lambda.alacritech.com [10.1.1.32]) by smtp.alacritech.com (8.11.0/8.11.0) with ESMTP id f57I7fM13809; Thu, 7 Jun 2001 11:07:41 -0700 Message-ID: <3B1FC259.1EEED9F0@alacritech.com> Date: Thu, 07 Jun 2001 11:05:13 -0700 From: "Matt D. Robinson" Organization: Alacritech, Inc. X-Mailer: Mozilla 4.76 [en] (X11; U; Linux 2.2.17-14 i686) X-Accept-Language: en MIME-Version: 1.0 To: Simon Falvey CC: "Lkcd (E-mail)" Subject: Re: crashing under 2.4.4 References: <11BD31824E1DD511BCB000508BB9F75206C006@RDGXCH04> Content-Type: text/plain; charset=us-ascii Content-Transfer-Encoding: 7bit Sender: owner-lkcd@oss.sgi.com Precedence: bulk Simon Falvey wrote: > > Hi, > > When a system is panicking and writing a crash dump file, is it expected > that the kernel keeps servicing user processes? It appears that you are able > to switch virtual consoles, log in, run things while the vmdump process is > doing its stuff.
Surely this will severely confuse the dump image being > created, it won't be so much of a snapshot of the system when the panic took > place as a smudge. That's what the patch I sent to Michael is trying to address. It attempts to stop all scheduling when a dump is taking place, and tries to avoid schedule() being activated inside an interrupt handler. I/O on terminals is a good point, however. > Also, I have noticed that inside lcrash > > >> ? id --- produces the help text for dis > >> ? idcmds -- produces an error > > also other help texts such as bt produce help texts for commands for which > they are possible abbreviations however as the helptext does not say this it > could be that they too are coming up with the wrong information. These were meant to document 'kdb' commands which are really just aliases inside of 'lcrash'. For example, old System 5 crash used 'dump', but other analyzers used 'od', etc. The point was to try and give people the commands they were used to. I can fix the help commands. 'idcmds' shouldn't give an error. > Cheers > > Simon > > Simon Falvey > Online Product Support Specialist > VERITAS Software UK. > Tel: +44 118 918 8105 Thanks, Simon.
--Matt From owner-lkcd@oss.sgi.com Tue Jun 12 07:28:35 2001 Received: (from majordomo@localhost) by oss.sgi.com (8.11.2/8.11.3) id f5CESZS05719 for lkcd-outgoing; Tue, 12 Jun 2001 07:28:35 -0700 Received: from e4.ny.us.ibm.com ([32.97.182.104]) by oss.sgi.com (8.11.2/8.11.3) with SMTP id f5CESXV05716 for ; Tue, 12 Jun 2001 07:28:33 -0700 Received: from f03n05e.au.ibm.com (f03n05s.au.ibm.com [9.185.166.73]) by e4.ny.us.ibm.com (8.9.3/8.9.3) with ESMTP id KAA177574 for ; Tue, 12 Jun 2001 10:26:33 -0400 From: bsuparna@in.ibm.com Received: from d73mta01.au.ibm.com (f06n01s [9.185.166.65]) by f03n05e.au.ibm.com (8.11.1m3/NCO v4.96) with SMTP id f5CEQax27790 for ; Wed, 13 Jun 2001 00:26:36 +1000 Received: by d73mta01.au.ibm.com(Lotus SMTP MTA v4.6.5 (863.2 5-20-1999)) id CA256A69.004F53EF ; Wed, 13 Jun 2001 00:26:28 +1000 X-Lotus-FromDomain: IBMIN@IBMAU To: "Matt D. Robinson" cc: lkcd@oss.sgi.com Message-ID: Date: Tue, 12 Jun 2001 19:45:39 +0530 Subject: Re: Mime-Version: 1.0 Content-type: text/plain; charset=us-ascii Content-Disposition: inline Sender: owner-lkcd@oss.sgi.com Precedence: bulk Sorry I was travelling last week, and couldn't respond earlier. I can see that you've already thought through most of the same issues that have been troubling me, on this subject :-) >> Hello, >> >> Looking at the vmdump code, here is something that puzzles me. >> I'm not sure if I'm missing something obvious here. >> >> Since right now dump involves wait_kio calls, which involves a context >> switch to another runnable process, isn't there a chance of the memory >> state changing whilst the dump is going on. Couldn't the dump become >> inconsistent, or not correctly reflect the state of the system when the >> incident that triggered the dump happened ? (Since interrupts aren't turned >> off, even that could affect the state ...
but to a lesser extent, I guess) > >Yes -- the whole point behind adding smp_send_stop() into the panic() >and die_if_kernel() mechanism was to avoid having other processes run >while the dump was taking place. I didn't see a good hook in the >scheduler to say, "Okay, hold off, don't run any other jobs except >mine", and putting smp_send_stop() into place messes up both x86 and >ia64 systems, due to the local APIC being disabled (meaning, if your >system crashes on a CPU other than 0, you're toast). > In your response to Simon Falvey's question, you mentioned a patch that you had sent that addresses this problem of stopping all scheduling at the time of a dump. That sounds interesting. Could I take a look at it too ? (Have you already posted it somewhere ?) >This leads to the second problem -- even if you do stop all other >system processes and are able to disable interrupts to most devices, >you can't write out to disk in a "raw" fashion. Kiobufs are a hack >at best as far as raw I/O is concerned. It's just a page grouping >mechanism for good s/g stuff, IMHO. Linux is immature as far as >raw device output is concerned. > >> I had actually started with looking into the smp_send_stop issue and the >> more generic issue of getting a consistent system snapshot (as accurately >> reflecting the state at the time of the system crash as possible), when >> this question came to mind. BTW, is there some work going on in this area ? >> Or have the issues been sorted out already ? > >There are two ways to do this: > >1) Stop all system activity, shut down interrupts as much as possible, > and dump all of memory to disk. > >2) Stop the system immediately, reset the system, and on the way back > up, early in the boot process, dump the memory to disk either at > bios or in the setup of the kernel. > >Both mechanisms have their problems in Linux. I don't like the second >solution, because not every system (most, in fact) preserve memory state >between system resets. 
The first solution is as close as I can get at >this point to saving the memory dump accurately, and even with that, >we can have problems in some circumstances. > >For example, what if you crash in a disk interrupt handler? > Yes, the situation of a crash in an interrupt handler was the first problem that came to mind -- made me look in further and come up with the question in the first place. It's a tricky thing to handle. >> Matt you had mentioned that you were working on a specialized IDE driver >> for dump, to avoid having to go through the normal kio/raw i/o path in the >> kernel. Is that still in the plan ? > >Yes, although I sent it off to Andre Hedrick, and he sent me a >single line response saying (basically), "Why would you ever want >to do that?" :-) > >Needless to say, it wasn't very encouraging. I have it, it's written, >and it works on my disk here in the office, but I haven't tried it on >multiple IDE disks, or different IDE controllers, etc. It's basically >untested outside my office. :) > >The best solution (IMHO) is to create: > > raw device table > raw device handles (open(), read(), write(), etc.) > disk device driver handles (ide_rw_open(), etc.) > >Right now, ide-disk.c interfacing to ide.c is horrific. Putting in >my raw disk mechanism is again, doing things in a non-elegant way, >but it does get the job done. > >Anyway, I have something, you're more than welcome to look it over >and tell me what you think. I was hoping to get Andre's impression >on things, but given the way the kernel development has been going >on lately, I'm never sure what's going to get in. > Thanks. Yes, I'd like to take a look at what you have. Not that I know much about IDE drivers (I have been looking at the generic block layer code and raw i/o for a while, but not specific drivers so far), but it'll help me understand the approach that you've taken a little better. Ideally it would be nice to arrive at some kind of solution that isn't very device specific ...
(oh yes, I realize it's not all that easy) so we know it'll just work on all kinds of h/w right away, so to say. Or worst case, something very minimalist for each kind of dump device (sort of like AIX does now). What you have may be a good start. Do let me know how I can access it. > >--Matt Suparna Bhattacharya IBM Software Lab, India E-mail : bsuparna@in.ibm.com Phone : 91-80-5267117, Extn : 2525 From owner-lkcd@oss.sgi.com Wed Jun 13 00:17:22 2001 Received: (from majordomo@localhost) by oss.sgi.com (8.11.2/8.11.3) id f5D7HMN19647 for lkcd-outgoing; Wed, 13 Jun 2001 00:17:22 -0700 Received: from e1.ny.us.ibm.com ([32.97.182.101]) by oss.sgi.com (8.11.2/8.11.3) with SMTP id f5D7HLP19644 for ; Wed, 13 Jun 2001 00:17:21 -0700 Received: from f03n05e.au.ibm.com (f03n05s.au.ibm.com [9.185.166.73]) by e1.ny.us.ibm.com (8.9.3/8.9.3) with ESMTP id DAA180980 for ; Wed, 13 Jun 2001 03:15:14 -0400 From: bsuparna@in.ibm.com Received: from d73mta01.au.ibm.com (f06n01s [9.185.166.65]) by f03n05e.au.ibm.com (8.11.1m3/NCO v4.96) with SMTP id f5D7G1x34358 for ; Wed, 13 Jun 2001 17:16:02 +1000 Received: by d73mta01.au.ibm.com(Lotus SMTP MTA v4.6.5 (863.2 5-20-1999)) id CA256A6A.00276B89 ; Wed, 13 Jun 2001 17:10:34 +1000 X-Lotus-FromDomain: IBMIN@IBMAU To: "Matt D. Robinson" cc: lkcd@oss.sgi.com Message-ID: Date: Wed, 13 Jun 2001 12:29:50 +0530 Subject: Re: Your patch to stop scheduling during dump Mime-Version: 1.0 Content-type: text/plain; charset=us-ascii Content-Disposition: inline Sender: owner-lkcd@oss.sgi.com Precedence: bulk Matt, Oops, I'd missed going through the note you'd sent to Michael with the patch. Found it now.
Regards Suparna Suparna Bhattacharya IBM Software Lab, India E-mail : bsuparna@in.ibm.com Phone : 91-80-5267117, Extn : 2525 From owner-lkcd@oss.sgi.com Wed Jun 13 09:35:53 2001 Received: (from majordomo@localhost) by oss.sgi.com (8.11.2/8.11.3) id f5DGZrq15412 for lkcd-outgoing; Wed, 13 Jun 2001 09:35:53 -0700 Received: from smtp.alacritech.com (smtp.alacritech.com [209.10.208.82]) by oss.sgi.com (8.11.2/8.11.3) with SMTP id f5DGZrP15408 for ; Wed, 13 Jun 2001 09:35:53 -0700 Received: from alacritech.com (lambda.alacritech.com [10.1.1.32]) by smtp.alacritech.com (8.11.0/8.11.0) with ESMTP id f5DGXaM02236; Wed, 13 Jun 2001 09:33:36 -0700 Message-ID: <3B279493.65F1F8E9@alacritech.com> Date: Wed, 13 Jun 2001 09:28:03 -0700 From: "Matt D. Robinson" Organization: Alacritech, Inc. X-Mailer: Mozilla 4.76 [en] (X11; U; Linux 2.2.17-14 i686) X-Accept-Language: en MIME-Version: 1.0 To: bsuparna@in.ibm.com CC: lkcd@oss.sgi.com, yakker@alacritech.com Subject: Re: Your patch to stop scheduling during dump References: Content-Type: text/plain; charset=us-ascii Content-Transfer-Encoding: 7bit Sender: owner-lkcd@oss.sgi.com Precedence: bulk bsuparna@in.ibm.com wrote: > > Matt, > > Oops, I'd missed going through the note you'd sent to Michael with the > patch. > Found it now. > > Regards > Suparna This will only correct the case where scheduling is called during an interrupt. I have a 2.4.2 patch that will correct that problem as well as the dump interrupts case. If you'd like to give that a shot (or anyone), let me know. I've been working with this for the last two evenings, with some small success. Larry Sendlosky also provided a patch for changing when we flush the TLBs, along with what we do for disable_local_APIC(). I have to incorporate those changes as well (thanks, Larry). The other good thing to do is to add in 'int dump_cpu', to indicate which CPU is dumping. 
It's basically the disabling of the local APIC, but I decided to move all of this into arch/i386/kernel/vmdump.c, instead of depending on arch/i386/kernel/{traps,smp}.c for stuff. I guess one thing we need to also do is to create some sort of GKHI mechanism for allowing people to get a dump and continue normal operation. Suparna, do you (and the rest of the IBM folks) need write access to the SourceForge tree? --Matt From owner-lkcd@oss.sgi.com Thu Jun 14 06:57:11 2001 Received: (from majordomo@localhost) by oss.sgi.com (8.11.2/8.11.3) id f5EDvBQ19306 for lkcd-outgoing; Thu, 14 Jun 2001 06:57:11 -0700 Received: from e1.ny.us.ibm.com ([32.97.182.101]) by oss.sgi.com (8.11.2/8.11.3) with SMTP id f5EDv9P19302 for ; Thu, 14 Jun 2001 06:57:09 -0700 Received: from f03n05e.au.ibm.com (f03n05s.au.ibm.com [9.185.166.73]) by e1.ny.us.ibm.com (8.9.3/8.9.3) with ESMTP id JAA339560; Thu, 14 Jun 2001 09:53:23 -0400 From: bsuparna@in.ibm.com Received: from d73mta01.au.ibm.com (f06n01s [9.185.166.65]) by f03n05e.au.ibm.com (8.11.1m3/NCO v4.96) with SMTP id f5EDrtV78480; Thu, 14 Jun 2001 23:53:55 +1000 Received: by d73mta01.au.ibm.com(Lotus SMTP MTA v4.6.5 (863.2 5-20-1999)) id CA256A6B.004C581B ; Thu, 14 Jun 2001 23:53:52 +1000 X-Lotus-FromDomain: IBMIN@IBMAU To: "Matt D. Robinson" cc: lkcd@oss.sgi.com, yakker@alacritech.com Message-ID: Date: Thu, 14 Jun 2001 19:13:08 +0530 Subject: Re: Your patch to stop scheduling during dump Mime-Version: 1.0 Content-type: text/plain; charset=us-ascii Content-Disposition: inline Sender: owner-lkcd@oss.sgi.com Precedence: bulk >This will only correct the case where scheduling is called during >an interrupt. I have a 2.4.2 patch that will correct that problem >as well as the dump interrupts case. If you'd like to give that >a shot (or anyone), let me know. I've been working with this for >the last two evenings, with some small success. 
Larry Sendlosky >also provided a patch for changing when we flush the TLBs, along with >what we do for disable_local_APIC(). I have to incorporate those >changes as well (thanks, Larry). The other good thing to do is to >add in 'int dump_cpu', to indicate which CPU is dumping. > >It's basically the disabling of the local APIC, but I decided to >move all of this into arch/i386/kernel/vmdump.c, instead of >depending on arch/i386/kernel/{traps,smp}.c for stuff. > Sounds good. We would surely like to give it a shot. >I guess one thing we need to also do is to create some sort of >GKHI mechanism for allowing people to get a dump and continue >normal operation. > Dprobes today can be used as a way to trigger crash dump, as it lets people take a dump from within a probe handler. It calls dump_execute directly, so at the moment a system restart happens after the dump. However we've been playing around with changing things a little to get it to continue normal operation after a dump. In this case we only stop the other CPUs temporarily while the dump is going on. It's just sort of a hack right now - we've only tried it once on a 2 way system so far and it needs some work ... (I'm not sure if it covers all the possibilities we need to think of, even besides probes in interrupt handlers). I would like to take a look at what you and Larry have done for disable_local_APIC ... GKHI would fit in more as a mechanism for built-in (compiled-in) probe points (efficient/fast especially when the hooks are not active/enabled). It could for example be used for having some well-known/fixed trigger points in the kernel, from where a dump may get triggered if those points are enabled. (This can be used to customize the kind of events where a dump would get triggered automatically). We had in mind the possibility of using GKHI as a hooking mechanism for RAS facilities, i.e.
having some common hooks built into the kernel - as an alternative to patching the kernel for various facilities independently, though that's probably a different issue. >Suparna, do you (and the rest of the IBM folks) need write access >to the SourceForge tree? > Yes, that would be good. >--Matt Suparna Bhattacharya IBM Software Lab, India E-mail : bsuparna@in.ibm.com Phone : 91-80-5267117, Extn : 2525 From owner-lkcd@oss.sgi.com Mon Jun 25 15:13:22 2001 Received: (from majordomo@localhost) by oss.sgi.com (8.11.2/8.11.3) id f5PMDMU17824 for lkcd-outgoing; Mon, 25 Jun 2001 15:13:22 -0700 Received: from mx.webfountain.com (mx.digitalfountain.com [209.219.233.39]) by oss.sgi.com (8.11.2/8.11.3) with SMTP id f5PMDMV17820 for ; Mon, 25 Jun 2001 15:13:22 -0700 Received: (qmail 10211 invoked from network); 25 Jun 2001 22:13:16 -0000 Received: from mail.intranet (10.1.1.37) by mx.digitalfountain.com with SMTP; 25 Jun 2001 22:13:16 -0000 Received: from digitalfountain.com (yoel@ricci.intranet [10.1.3.25]) by mail.intranet (8.9.3/8.9.3) with ESMTP id PAA26043; Mon, 25 Jun 2001 15:12:49 -0700 X-Authentication-Warning: mail.intranet: Host yoel@ricci.intranet [10.1.3.25] claimed to be digitalfountain.com Message-ID: <3B37B761.6080803@digitalfountain.com> Date: Mon, 25 Jun 2001 15:12:49 -0700 From: Yoel Inbar User-Agent: Mozilla/5.0 (X11; U; Linux 2.4.3-XFS i686; en-US; rv:0.9.1) Gecko/20010607 X-Accept-Language: en-us MIME-Version: 1.0 To: lkcd@oss.sgi.com CC: Michael Walfish Subject: intermittent crash dump failures Content-Type: text/plain; charset=us-ascii; format=flowed Content-Transfer-Encoding: 7bit Sender: owner-lkcd@oss.sgi.com Precedence: bulk Hello, I'm using lkcd 3.1.3 on a machine with 640 MB of RAM, and I'm dumping to a 512 MB swap partition. 
Sometimes, crash dumps will fail, and console messages like the following will be printed: attempt to access past end of device 08:02: rw=1, want=13439964, limit=498015 I do have DUMP_LEVEL=4, so I'm afraid I might be running out of room on the partition I'm dumping to. However: 1. I also have DUMP_COMPRESS_PAGES=1. 2. Dumps don't fail consistently. So I think there might be something else going on. Any help would be greatly appreciated. Thanks, Yoel From owner-lkcd@oss.sgi.com Mon Jun 25 16:41:31 2001 Received: (from majordomo@localhost) by oss.sgi.com (8.11.2/8.11.3) id f5PNfV727390 for lkcd-outgoing; Mon, 25 Jun 2001 16:41:31 -0700 Received: from smtp.alacritech.com (smtp.alacritech.com [209.10.208.82]) by oss.sgi.com (8.11.2/8.11.3) with SMTP id f5PNfUV27387 for ; Mon, 25 Jun 2001 16:41:30 -0700 Received: from alacritech.com (lambda.alacritech.com [10.1.1.32]) by smtp.alacritech.com (8.11.0/8.11.0) with ESMTP id f5PNb6s22658; Mon, 25 Jun 2001 16:37:06 -0700 Message-ID: <3B37CC30.DA704B20@alacritech.com> Date: Mon, 25 Jun 2001 16:41:36 -0700 From: "Matt D. Robinson" Organization: Alacritech, Inc. X-Mailer: Mozilla 4.76 [en] (X11; U; Linux 2.2.17-14 i686) X-Accept-Language: en MIME-Version: 1.0 To: Yoel Inbar CC: lkcd@oss.sgi.com, Michael Walfish Subject: Re: intermittent crash dump failures References: <3B37B761.6080803@digitalfountain.com> Content-Type: text/plain; charset=us-ascii Content-Transfer-Encoding: 7bit Sender: owner-lkcd@oss.sgi.com Precedence: bulk Yoel Inbar wrote: > > Hello, > I'm using lkcd 3.1.3 on a machine with 640 MB of RAM, and I'm dumping to > a 512 MB swap partition. Sometimes, crash dumps will fail, and console > messages like the following will be printed: > attempt to access past end of device > 08:02: rw=1, want=13439964, limit=498015 > I do have DUMP_LEVEL=4, so I'm afraid I might be running out of room on > the partition I'm dumping to. However: > 1. I also have DUMP_COMPRESS_PAGES=1. > 2. Dumps don't fail consistently. 
> So I think there might be something else going on. Any help would be > greatly appreciated. > > Thanks, > > Yoel Two things: - I'm going to release a new patch on Wednesday that includes a few changes, one to throw back in a smp_send_stop() mechanism, addresses interrupts on CPU 0, turns off schedule(), and states which CPU is dumping (in dump_execute()). - There are now a number of individuals that will be working on LKCD, thanks to developers at IBM, SGI and other locations. We'll be starting to open up more discussion on this list as to how to improve LKCD exponentially over its current level. More on this to follow. Now, as to what you're seeing, I suspect you are trying to write beyond the end of your dump device. What's strange is, the return value from brw_kiovec() wasn't < 0 (in fact, it returned the wrong value). I'll have to look at this some more, but can you tell me the size of your dump device? Just 'cat /proc/swaps', if you are using your swap device. I'll get on this as soon as I can. :) --Matt