From owner-lkcd@oss.sgi.com Fri Mar 2 09:36:41 2001
Received: by oss.sgi.com id ; Fri, 2 Mar 2001 09:36:32 -0800
Received: from [65.193.106.66] ([65.193.106.66]:42521 "EHLO XCHANGESERVER.storigen.com")
        by oss.sgi.com with ESMTP id ; Fri, 2 Mar 2001 09:36:23 -0800
Received: from storigen.com (vmlager1.storigen.com [192.168.0.111])
        by XCHANGESERVER.storigen.com with SMTP (Microsoft Exchange Internet Mail Service Version 5.5.2653.13)
        id F6F4RC8S; Fri, 2 Mar 2001 12:36:16 -0500
Message-ID: <3A9FD789.5C32397B@storigen.com>
Date: Fri, 02 Mar 2001 12:25:29 -0500
From: Larry Cohen
X-Mailer: Mozilla 4.72 [en] (X11; U; Linux 2.2.14-5.0 i686)
X-Accept-Language: en
MIME-Version: 1.0
To: lkcd@oss.sgi.com
Subject: [Fwd: Undeliverable: Re: Cant seem to create crash dumps .... help]
Content-Type: multipart/mixed; boundary="------------E51821107214A885D6A2499A"
Sender: owner-lkcd@oss.sgi.com
Precedence: bulk
Return-Path:
X-Orcpt: rfc822;lkcd-outgoing

This is a multi-part message in MIME format.
--------------E51821107214A885D6A2499A
Content-Type: text/plain; charset=us-ascii
Content-Transfer-Encoding: 7bit

I tried to get this to Matt but the message was bounced.

-Larry Cohen

--------------E51821107214A885D6A2499A
Content-Type: message/rfc822
Content-Transfer-Encoding: 7bit
Content-Disposition: inline

Received: by XCHANGESERVER.storigen.com id <01C0A33E.669CB813@XCHANGESERVER.storigen.com>; Fri, 2 Mar 2001 12:29:57 -0500
Message-ID: <88D2015B3AF7BF4B91272EC25A9FE0970D89CB@XCHANGESERVER.storigen.com>
From: System Administrator
To: Larry Cohen
Subject: Undeliverable: Re: Cant seem to create crash dumps .... help
Date: Fri, 2 Mar 2001 12:29:57 -0500
MIME-Version: 1.0
Content-Type: multipart/mixed; boundary="----_=_NextPart_000_01C0A33E.669CB813"

This message is in MIME format. Since your mail reader does not understand
this format, some or all of this message may not be legible.
------_=_NextPart_000_01C0A33E.669CB813
Content-Type: text/plain; charset="iso-8859-1"

Your message

  To:      Matt D. Robinson
  Subject: Re: Cant seem to create crash dumps .... help
  Sent:    Fri, 2 Mar 2001 12:18:45 -0500

did not reach the following recipient(s):

Matt D. Robinson on Fri, 2 Mar 2001 12:29:57 -0500
    The recipient name is not recognized
    The MTS-ID of the original message is: c=us;a= ;p=storigen systems;l=XCHANGESERVER0103021729F6F4RC8G
    MSEXCH:IMS:Storigen Systems Inc.:Lowell:XCHANGESERVER 3550 (000B09AA) 550 5.1.1 ... User unknown

------_=_NextPart_000_01C0A33E.669CB813
Content-Type: message/rfc822

Message-ID: <3A9FD5F5.82AF084F@storigen.com>
From: Larry Cohen
To: "Matt D. Robinson"
Subject: Re: Cant seem to create crash dumps .... help
Date: Fri, 2 Mar 2001 12:18:45 -0500
MIME-Version: 1.0
Content-Type: text/plain; charset="iso-8859-1"

Hi Matt,
Sorry I did not get back sooner. Sadly the reason is I did not notice
this message ... sigh.

Anyhow it looks like "unused_list_lock" was 1 ... I remember diving into
that and recalling that it was not locked.

-Larry

"Matt D. Robinson" wrote:

> Larry Cohen wrote:
> >
> > Hi Matt,
> > It seems gdb works in some capacity. I was able to break it just before the
> > hang point and got the
> > following trace :
> >
> > (gdb) bt
> > #0  wait_kio (rw=1, nr=16, bh=0xcd797a38, size=4096) at buffer.c:2015
> > #1  0xc013d79b in brw_kiovec (rw=1, nr=1, iovec=0xc03831c8, dev=774,
> >     b=0xcd797c74, size=4096) at buffer.c:2147
> > #2  0xc019f836 in dump_kernel_write () at vmdump.c:558
> > #3  0xc019fb6d in dump_write_header () at vmdump.c:728
> > #4  0xc019fc3f in dump_execute_memdump (panic_str=0xc035d820 "testing crash",
> >     regs=0x0) at vmdump.c:780
> > #5  0xc019feb7 in dump_execute (panic_str=0xc035d820 "testing crash", regs=0x0)
> >     at vmdump.c:944
> > #6  0xc011a661 in panic (fmt=0xc02e3aa8 "testing crash") at panic.c:77
> > #7  0xc0236e54 in my_function(start=0xcd797f14) at my_function.c:74
> > #8  0xd0909070 in ?? ()
> > #9  0xc011bad6 in sys_init_module (name_user=0x8058580 "my_function_module",
> >     mod_user=0x8060920) at module.c:544
> > #10 0xc01092ff in system_call () at af_packet.c:1876
> > #11 0x804ab0e in ?? () at af_packet.c:1876
> > #12 0x804b1a7 in ?? () at af_packet.c:1876
> > #13 0x804b3f2 in ?? () at af_packet.c:1876
> > #14 0x400349cb in ?? () at af_packet.c:1876
> >
> Yea, looks like someone's holding one of the I/O locks in the
> kiobuf path. Do you know what the value of unused_list_lock is?
> It's very likely that there's nothing LKCD can do (in this
> version of the driver) to resolve the problem, since it's the
> kiobuf code that's locked up, not LKCD. :(
>
> --Matt
>
> > "Matt D. Robinson" wrote:
> >
> > > Larry Cohen wrote:
> > > >
> > > > Hi Matt,
> > > >
> > > > Thanks much for responding so quickly.
> > > > I did check below and I believe everything is correct.
> > > >
> > > > /proc/sys/dumpdev contains /dev/vmdump.
> > > > /dev/vmdump is a link to /dev/hda6 which is the swap device.
> > > > ls -l /dev/hda6 yields:
> > > > brw-rw---- 1 root disk 3, 6 May 5 1998 /dev/hda6
> > > > swapon -s yields:
> > > > [root@ipa vmdump]# swapon -s
> > > > Filename        Type            Size    Used    Priority
> > > > /dev/hda6       partition       2097136 0       -1
> > > >
> > > > /proc/sys/kernel/panic is 5.
> > > >
> > > > rc.sysinit has been updated.
> > > >
> > > > I am running Linux 2.4.0 with the 3.1.1 patches so I presume I do not
> > > > need the raw I/O patch.
> > > > I asked the fellow who put the system together to mail me the hardware
> > > > specs and I'll send those along
> > > > as soon as I get them. It is an Intel machine.
> > > >
> > > > The output I see is:
> > > >
> > > > (gdb) c
> > > > Continuing.
> > > > CPU: 1
> > > > EIP: 0010:[]
> > > > EFLAGS: 00010246
> > > > eax: 00000000 ebx: c7c36000 ecx: 00000010 edx: 00000000
> > > > esi: c89c4000 edi: 0000009c ebp: c7c97ef0 esp: c7c979a8
> > > > ds: 0018 es: 0018 ss: 0018
> > > > Process mount.sfs (pid: 1062, stackpage=c7c95000)
> > > > Stack: c89e2a00 00000008 c0302adc c037d680 c037d640 50193d03 c037d680 4088c800
> > > >        c7c97ae0 c01da8d8 c7c97a14 c01da935 c037d680 cff26ae0 00000001 c037d640
> > > >        c037d680 c1473e60 c0300980 20e466e0 0000003c 00000005 0000009c c037d640
> > > > Call Trace: [] [] [] []
> > > > [] [] []
> > > > [] [] [] [] []
> > > >
> > > > Code: f3 ab 8b 85 28 fb ff ff 83 b8 14 10 00 00 00 75 11 68 40 2e
> > > > Dumping to device 0x306 [ide0(3,6)] ...
> > > > Writing dump header ...
> > > >
> > > > .... the system is wedged at this point and I can not break in with gdb.
> > > > The hang appears to happen waiting in wait_on_buffer (). A few blocks
> > > > appear to be written
> > > > but then everything locks up.
> > >
> > > Looks like your configuration is correct. My presumption here is that
> > > you're dying somewhere in the page cache code -- otherwise, you wouldn't
> > > be hung up waiting to write out to disk. This means that someone's
> > > holding the io_request_lock (or one of the other locks in the I/O space)
> > > which is preventing us from being able to write out the dump pages.
> > >
> > > Since a dump may not be possible, what you should do is take the built
> > > kernel, and run 'lcrash' on the live system, and disassemble the
> > > instructions based on the "Call Trace" above, which should give you a
> > > good idea as to what the stack trace at the time of the panic is. So
> > > if you 'dis c01da8d8', you should see the function name where the problem
> > > occurred. Do the same for the rest of the arguments (or just use
> > > 'ksymoops' in the meantime).
> > >
> > > If this isn't a page cache/buffer/request queue lock scenario, then we
> > > shouldn't have hung up.
> > >
> > > Oh, one other thing ... gdb probably won't work after you call down
> > > into our function, because die() has been called, and you're beyond the
> > > kgdb breakpoint area (and smp_send_stop() should have stopped all CPUs
> > > except the one we are executing on).
> > >
> > > Let's see what the disassembled functions show ... let me know as
> > > soon as you can. Thanks. :)
> > >
> > > --Matt
> > >
> > > > Thanks very much for your help,
> > > > -Larry
> > > >
> > > > > Larry Cohen wrote:
> > > > > >
> > > > > > Hi, I'm pretty sure I have installed the lkcd patch and utilities ok
> > > > > > but when the system panics it
> > > > > > will hang when trying to write out the dump to the swap device.
> > > > > > My disks are IDE, not SCSI; is this a problem?
> > > > > > Any other thoughts?
> > > > > >
> > > > > > Thanks,
> > > > > > Larry Cohen
> > > > >
> > > > > Do you have the output from the console? It sounds like you
> > > > > have everything configured correctly. If you run "/sbin/vmdump config",
> > > > > are the right values showing up in /proc/sys/vmdump? IDE and SCSI
> > > > > disks should work with the 3.1.1 patches.
> > > > >
> > > > > Typically there are problems when the right devices aren't configured
> > > > > in /proc/sys/vmdump (which are updated when "/sbin/vmdump config" is
> > > > > run), or /proc/sys/kernel/panic is zero, which means the system won't
> > > > > reset after taking a dump.
> > > > >
> > > > > Note that you have to modify your /etc/rc.d/rc.sysinit (or like script
> > > > > depending on the Linux variant you are running) to add the /sbin/vmdump
> > > > > calls. Check out the README.
> > > > >
> > > > > If you've done all this and it still fails, send me the console output,
> > > > > along with your hardware specifications, and we'll see what we can do
> > > > > about getting to the bottom of your problem.
> > > > > Thanks, Larry.
> > > > >
> > > > > --Matt

------_=_NextPart_000_01C0A33E.669CB813--

--------------E51821107214A885D6A2499A--

From owner-lkcd@oss.sgi.com Wed Mar 7 08:18:46 2001
Received: by oss.sgi.com id ; Wed, 7 Mar 2001 08:18:26 -0800
Received: from [65.193.106.66] ([65.193.106.66]:28185 "EHLO XCHANGESERVER.storigen.com")
        by oss.sgi.com with ESMTP id ; Wed, 7 Mar 2001 08:18:24 -0800
Received: from storigen.com (vmlager1.storigen.com [192.168.0.72])
        by XCHANGESERVER.storigen.com with SMTP (Microsoft Exchange Internet Mail Service Version 5.5.2653.13)
        id GNFLGQDQ; Wed, 7 Mar 2001 11:18:21 -0500
Message-ID: <3AA65F06.5C33EF45@storigen.com>
Date: Wed, 07 Mar 2001 11:17:10 -0500
From: Larry Cohen
X-Mailer: Mozilla 4.72 [en] (X11; U; Linux 2.2.14-5.0 i686)
X-Accept-Language: en
MIME-Version: 1.0
To: lkcd@oss.sgi.com
Subject: New information regarding hung crash dump
Content-Type: text/plain; charset=us-ascii
Content-Transfer-Encoding: 7bit
Sender: owner-lkcd@oss.sgi.com
Precedence: bulk
Return-Path:
X-Orcpt: rfc822;lkcd-outgoing
Content-Length: 619
Lines: 22

In smp_send_stop(), I commented out the call to disable_local_APIC().

The core dumps now succeed.

Note. I just read the archives and saw that the kernel patch was
updated. I had already noticed
the __SMP__ bug and fixed my local copy (forgot to mention this when
talking with Matt). So I was
really calling smp_send_stop() which called disable_local_APIC().

Any ideas on why this is causing me grief?
Is commenting out the routine going to cause other major problems?

Also, when I finally got to use lcrash I noticed that there were no
stack traces. Am I missing
something here?

Thanks,

Larry Cohen

From owner-lkcd@oss.sgi.com Wed Mar 7 10:50:58 2001
Received: by oss.sgi.com id ; Wed, 7 Mar 2001 10:50:48 -0800
Received: from w032.z064001165.sjc-ca.dsl.cnc.net ([64.1.165.32]:6405 "EHLO www.aparity.com")
        by oss.sgi.com with ESMTP id ; Wed, 7 Mar 2001 10:50:31 -0800
Received: from alacritech.com (localhost [127.0.0.1])
        by www.aparity.com (8.9.3/8.9.3) with ESMTP id KAA03900; Wed, 7 Mar 2001 10:49:12 -0800
Message-ID: <3AA682A8.1F7913E5@alacritech.com>
Date: Wed, 07 Mar 2001 10:49:12 -0800
From: "Matt D. Robinson"
Organization: Alacritech, Inc.
X-Mailer: Mozilla 4.72 [en] (X11; U; Linux 2.2.18 i686)
X-Accept-Language: en
MIME-Version: 1.0
To: Larry Cohen
CC: lkcd@oss.sgi.com
Subject: Re: New information regarding hung crash dump
References: <3AA65F06.5C33EF45@storigen.com>
Content-Type: text/plain; charset=us-ascii
Content-Transfer-Encoding: 7bit
Sender: owner-lkcd@oss.sgi.com
Precedence: bulk
Return-Path:
X-Orcpt: rfc822;lkcd-outgoing
Content-Length: 1063
Lines: 33

Larry Cohen wrote:
>
> In smp_send_stop(), I commented out the call to disable_local_APIC().
>
> The core dumps now succeed.
>
> Note. I just read the archives and saw that the kernel patch was
> updated. I had already noticed
> the __SMP__ bug and fixed my local copy (forgot to mention this when
> talking with Matt). So I was
> really calling smp_send_stop() which called disable_local_APIC().
>
> Any ideas on why this is causing me grief?
> Is commenting out the routine going to cause other major problems?

I don't know -- I'll have to look at it more closely to
find out why (I'm tracking down another lcrash bug at the
moment from Mark Price). Unless the APIC is affecting the
IDE driver directly ...

> Also, when I finally got to use lcrash I noticed that there were no
> stack traces. Am I missing
> something here?

No stack traces at all, or no stack traces for the failing
process?
I corrected a KDB patch issue earlier which caused
problems for the failing process, but not every stack trace.

> Thanks,
>
> Larry Cohen

--Matt

From owner-lkcd@oss.sgi.com Wed Mar 7 12:38:19 2001
Received: by oss.sgi.com id ; Wed, 7 Mar 2001 12:38:09 -0800
Received: from [65.193.106.66] ([65.193.106.66]:55334 "EHLO XCHANGESERVER.storigen.com")
        by oss.sgi.com with ESMTP id ; Wed, 7 Mar 2001 12:37:42 -0800
Received: from storigen.com (vmlager1.storigen.com [192.168.0.72])
        by XCHANGESERVER.storigen.com with SMTP (Microsoft Exchange Internet Mail Service Version 5.5.2653.13)
        id GPXGH2ZV; Wed, 7 Mar 2001 15:37:40 -0500
Message-ID: <3AA69BD4.530E9F51@storigen.com>
Date: Wed, 07 Mar 2001 15:36:36 -0500
From: Larry Cohen
X-Mailer: Mozilla 4.72 [en] (X11; U; Linux 2.2.14-5.0 i686)
X-Accept-Language: en
MIME-Version: 1.0
To: "Matt D. Robinson"
CC: lkcd@oss.sgi.com
Subject: Re: New information regarding hung crash dump
References: <3AA65F06.5C33EF45@storigen.com> <3AA682A8.1F7913E5@alacritech.com>
Content-Type: text/plain; charset=us-ascii
Content-Transfer-Encoding: 7bit
Sender: owner-lkcd@oss.sgi.com
Precedence: bulk
Return-Path:
X-Orcpt: rfc822;lkcd-outgoing
Content-Length: 1742
Lines: 53

"Matt D. Robinson" wrote:

> Larry Cohen wrote:
> >
> > In smp_send_stop(), I commented out the call to disable_local_APIC().
> >
> > The core dumps now succeed.
> >
> > Note. I just read the archives and saw that the kernel patch was
> > updated. I had already noticed
> > the __SMP__ bug and fixed my local copy (forgot to mention this when
> > talking with Matt). So I was
> > really calling smp_send_stop() which called disable_local_APIC().
> >
> > Any ideas on why this is causing me grief?
> > Is commenting out the routine going to cause other major problems?
>
> I don't know -- I'll have to look at it more closely to
> find out why (I'm tracking down another lcrash bug at the
> moment from Mark Price). Unless the APIC is affecting the
> IDE driver directly ...

Unfortunately, commenting out disable_local_APIC() did not work as well
as I thought. It worked fine when I had pulled one of my CPUs out of the
motherboard as an experiment (is there really no way to disable a CPU
from the BIOS?). When I plugged the CPU back in and then tried a crash,
the system would panic in flush_tlb_others() because:

        if ((cpumask & cpu_online_map) != cpumask)
                BUG();

It's getting closer but I'm not there yet ....

> >
> > Also, when I finally got to use lcrash I noticed that there were no
> > stack traces. Am I missing
> > something here?
>
> No stack traces at all, or no stack traces for the failing
> process? I corrected a KDB patch issue earlier which caused
> problems for the failing process, but not every stack trace.
>

No stack trace for any process. I saw the KDB patch and applied it.
It did not make any difference (should it have?).

Thanks again for your help Matt,
-Larry

From owner-lkcd@oss.sgi.com Fri Mar 9 09:00:52 2001
Received: by oss.sgi.com id ; Fri, 9 Mar 2001 09:00:42 -0800
Received: from sp2.spcorp.com ([198.16.9.6]:56830 "EHLO ans2.spri.sp.com")
        by oss.sgi.com with ESMTP id ; Fri, 9 Mar 2001 09:00:26 -0800
Received: from kenmsgrly01.schp.com by sp2.spcorp.com with SMTP id MAA19060
        (InterLock SMTP Gateway 4.2 for ); Fri, 9 Mar 2001 12:00:24 -0500 (EST)
Received: FROM kenmsgbham03.schp.com BY kenmsgrly01.schp.com ; Fri Mar 09 11:58:38 2001 -0500
Received: by kenmsgbham03.schp.com with Internet Mail Service (5.5.2448.0)
        id ; Fri, 9 Mar 2001 12:00:07 -0500
Message-ID: <19CA78C6099AD4119EF300508B952C30D315C0@KENMSG30.schp.com>
From: "Scott, Colin"
To: "'lkcd@oss.sgi.com'"
Cc: "'goemon@anime.net'"
Subject: lkcd & I2c
Date: Fri, 9 Mar 2001 11:53:13 -0500
MIME-Version: 1.0
X-Mailer: Internet Mail Service (5.5.2448.0)
Content-Type: text/plain; charset="iso-8859-1"
Sender: owner-lkcd@oss.sgi.com
Precedence: bulk
Return-Path:
X-Orcpt: rfc822;lkcd-outgoing
Content-Length: 1444
Lines: 28

Hi,

Are there any plans to have lkcd dump the system
hardware and environment
status info using the I2c protocol? The current 2.4.x kernels already have
I2c code and drivers. Maybe they could be used to get the hardware and
environment status info at the exact time of a crash or kernel panic? It
would be a good idea to be able to eliminate for example CPU/Chipset
overheating problems or memory parity errors as the cause of a crash before
starting a painstaking investigation into what might actually be bug free
kernel code. This code should be added to lkcd because the syslog daemon
would probably not get a chance to report most hardware errors before the
system dumps memory and reboots. I would also like to see ECC memory status
messages if bad ECC memory was the cause of the system panic. Michael
O'Reilly has written some code to do ECC error reporting to the kernel
logfile. Is there any chance that this code could be merged into lkcd? See
http://www.anime.net/~goemon/linux-ecc/ for details. The addition of
hardware failure reporting code to lkcd could make Linux into a more
reliable and more Highly Available UNIX variant by allowing us to identify
and replace bad hardware in the system and would take Linux one more step
closer to becoming a mature and reliable OS.

Colin Scott
Senior Technical Specialist
Schering-Plough Corp.

Disclaimer: This email does not represent the opinions or interests of
Schering-Plough Corp.

From owner-lkcd@oss.sgi.com Fri Mar 9 10:50:02 2001
Received: by oss.sgi.com id ; Fri, 9 Mar 2001 10:49:43 -0800
Received: from smtp.alacritech.com ([209.10.208.82]:61707 "EHLO smtp.alacritech.com")
        by oss.sgi.com with ESMTP id ; Fri, 9 Mar 2001 10:49:20 -0800
Received: from alacritech.com (alpha.alacritech.com [10.1.1.27])
        by smtp.alacritech.com (8.11.0/8.11.0) with ESMTP id f29IlNh13488; Fri, 9 Mar 2001 10:47:23 -0800
Message-ID: <3AA92704.A2AA397A@alacritech.com>
Date: Fri, 09 Mar 2001 10:55:00 -0800
From: "Matt D. Robinson"
Organization: Alacritech, Inc.
X-Mailer: Mozilla 4.75 [en] (X11; U; Linux 2.2.16-22 i686)
X-Accept-Language: en
MIME-Version: 1.0
To: "Scott, Colin"
CC: "'lkcd@oss.sgi.com'" , "'goemon@anime.net'"
Subject: Re: lkcd & I2c
References: <19CA78C6099AD4119EF300508B952C30D315C0@KENMSG30.schp.com>
Content-Type: text/plain; charset=us-ascii
Content-Transfer-Encoding: 7bit
Sender: owner-lkcd@oss.sgi.com
Precedence: bulk
Return-Path:
X-Orcpt: rfc822;lkcd-outgoing
Content-Length: 2158
Lines: 47

This sounds like a great idea. There's a few mechanisms we
can follow to do this -- either save this in the architecture
dependent dump header (best location), or save this in the
standard vmdump process.

I can modify __dump_configure_header() to save this for the
x86 platforms to start. Do you have a set of code I can look
at for saving this stuff?

Oh, and what interrupt level do you need to run at? Can you
access the I2C master at the time of the failure to send out
requests to the I2C slave?

BTW, Tom, this also applies to IA64, since Merced has an I2C
interface to the chip where you can get status information.

--Matt

"Scott, Colin" wrote:
>
> Hi,
>
> Are there any plans to have lkcd dump the system hardware and environment
> status info using the I2c protocol? The current 2.4.x kernels already have
> I2c code and drivers. Maybe they could be used to get the hardware and
> environment status info at the exact time of a crash or kernel panic? It
> would be a good idea to be able to eliminate for example CPU/Chipset
> overheating problems or memory parity errors as the cause of a crash before
> starting a painstaking investigation into what might actually be bug free
> kernel code. This code should be added to lkcd because the syslog daemon
> would probably not get a chance to report most hardware errors before the
> system dumps memory and reboots. I would also like to see ECC memory status
> messages if bad ECC memory was the cause of the system panic.
> Michael O'Reilly has written some code to do ECC error reporting to the kernel
> logfile. Is there any chance that this code could be merged into lkcd? See
> http://www.anime.net/~goemon/linux-ecc/ for details. The addition of
> hardware failure reporting code to lkcd could make Linux into a more
> reliable and more Highly Available UNIX variant by allowing us to identify
> and replace bad hardware in the system and would take Linux one more step
> closer to becoming a mature and reliable OS.
>
> Colin Scott
> Senior Technical Specialist
> Schering-Plough Corp.
>
> Disclaimer: This email does not represent the opinions or interests of
> Schering-Plough Corp.

From owner-lkcd@oss.sgi.com Fri Mar 9 12:48:23 2001
Received: by oss.sgi.com id ; Fri, 9 Mar 2001 12:48:13 -0800
Received: from anime.net ([63.172.78.150]:13836 "EHLO anime.net")
        by oss.sgi.com with ESMTP id ; Fri, 9 Mar 2001 12:47:57 -0800
Received: from localhost (goemon@localhost)
        by anime.net (8.9.3/8.9.3) with ESMTP id MAA32266; Fri, 9 Mar 2001 12:49:04 -0800
Date: Fri, 9 Mar 2001 12:49:04 -0800 (PST)
From: Dan Hollis
To: "Matt D. Robinson"
cc: "Scott, Colin" , "'lkcd@oss.sgi.com'"
Subject: Re: lkcd & I2c
In-Reply-To: <3AA92704.A2AA397A@alacritech.com>
Message-ID:
MIME-Version: 1.0
Content-Type: TEXT/PLAIN; charset=US-ASCII
Sender: owner-lkcd@oss.sgi.com
Precedence: bulk
Return-Path:
X-Orcpt: rfc822;lkcd-outgoing
Content-Length: 2311
Lines: 55

What's lkcd?

-Dan

On Fri, 9 Mar 2001, Matt D. Robinson wrote:

> This sounds like a great idea. There's a few mechanisms we
> can follow to do this -- either save this in the architecture
> dependent dump header (best location), or save this in the
> standard vmdump process.
>
> I can modify __dump_configure_header() to save this for the
> x86 platforms to start. Do you have a set of code I can look
> at for saving this stuff?
>
> Oh, and what interrupt level do you need to run at?
> Can you access the I2C master at the time of the failure to send out
> requests to the I2C slave?
>
> BTW, Tom, this also applies to IA64, since Merced has an I2C
> interface to the chip where you can get status information.
>
> --Matt
>
> "Scott, Colin" wrote:
> >
> > Hi,
> >
> > Are there any plans to have lkcd dump the system hardware and environment
> > status info using the I2c protocol? The current 2.4.x kernels already have
> > I2c code and drivers. Maybe they could be used to get the hardware and
> > environment status info at the exact time of a crash or kernel panic? It
> > would be a good idea to be able to eliminate for example CPU/Chipset
> > overheating problems or memory parity errors as the cause of a crash before
> > starting a painstaking investigation into what might actually be bug free
> > kernel code. This code should be added to lkcd because the syslog daemon
> > would probably not get a chance to report most hardware errors before the
> > system dumps memory and reboots. I would also like to see ECC memory status
> > messages if bad ECC memory was the cause of the system panic. Michael
> > O'Reilly has written some code to do ECC error reporting to the kernel
> > logfile. Is there any chance that this code could be merged into lkcd? See
> > http://www.anime.net/~goemon/linux-ecc/ for details. The addition of
> > hardware failure reporting code to lkcd could make Linux into a more
> > reliable and more Highly Available UNIX variant by allowing us to identify
> > and replace bad hardware in the system and would take Linux one more step
> > closer to becoming a mature and reliable OS.
> >
> > Colin Scott
> > Senior Technical Specialist
> > Schering-Plough Corp.
> >
> > Disclaimer: This email does not represent the opinions or interests of
> > Schering-Plough Corp.
>

From owner-lkcd@oss.sgi.com Fri Mar 9 15:53:43 2001
Received: by oss.sgi.com id ; Fri, 9 Mar 2001 15:53:23 -0800
Received: from smtp.alacritech.com ([209.10.208.82]:17164 "EHLO smtp.alacritech.com")
        by oss.sgi.com with ESMTP id ; Fri, 9 Mar 2001 15:53:10 -0800
Received: from alacritech.com (alpha.alacritech.com [10.1.1.27])
        by smtp.alacritech.com (8.11.0/8.11.0) with ESMTP id f29NpCh20464; Fri, 9 Mar 2001 15:51:12 -0800
Message-ID: <3AA96E39.9DFA97A9@alacritech.com>
Date: Fri, 09 Mar 2001 15:58:49 -0800
From: "Matt D. Robinson"
Organization: Alacritech, Inc.
X-Mailer: Mozilla 4.75 [en] (X11; U; Linux 2.2.16-22 i686)
X-Accept-Language: en
MIME-Version: 1.0
To: Dan Hollis
CC: "Scott, Colin" , lkcd@oss.sgi.com
Subject: Re: lkcd & I2c
References:
Content-Type: text/plain; charset=us-ascii
Content-Transfer-Encoding: 7bit
Sender: owner-lkcd@oss.sgi.com
Precedence: bulk
Return-Path:
X-Orcpt: rfc822;lkcd-outgoing
Content-Length: 184
Lines: 12

Dan Hollis wrote:
>
> What's lkcd?
>
> -Dan

http://oss.sgi.com/projects/lkcd

Just our little contribution to the world which we hope
makes it in the Linux kernel someday.

--Matt

From owner-lkcd@oss.sgi.com Fri Mar 9 16:22:43 2001
Received: by oss.sgi.com id ; Fri, 9 Mar 2001 16:22:33 -0800
Received: from anime.net ([63.172.78.150]:5649 "EHLO anime.net")
        by oss.sgi.com with ESMTP id ; Fri, 9 Mar 2001 16:22:19 -0800
Received: from localhost (goemon@localhost)
        by anime.net (8.9.3/8.9.3) with ESMTP id QAA03919; Fri, 9 Mar 2001 16:23:33 -0800
Date: Fri, 9 Mar 2001 16:23:33 -0800 (PST)
From: Dan Hollis
To: "Matt D. Robinson"
cc: "Scott, Colin" ,
Subject: Re: lkcd & I2c
In-Reply-To: <3AA96E39.9DFA97A9@alacritech.com>
Message-ID:
MIME-Version: 1.0
Content-Type: TEXT/PLAIN; charset=US-ASCII
Sender: owner-lkcd@oss.sgi.com
Precedence: bulk
Return-Path:
X-Orcpt: rfc822;lkcd-outgoing
Content-Length: 296
Lines: 13

On Fri, 9 Mar 2001, Matt D. Robinson wrote:
> Dan Hollis wrote:
> > What's lkcd?
> http://oss.sgi.com/projects/lkcd
> Just our little contribution to the world which we hope
> makes it in the Linux kernel someday.

Ah, crashdumps. I can't see how ECC monitoring is related to crashdumps.

-Dan

From owner-lkcd@oss.sgi.com Wed Mar 14 09:16:28 2001
Received: by oss.sgi.com id ; Wed, 14 Mar 2001 09:16:18 -0800
Received: from pneumatic-tube.sgi.com ([204.94.214.22]:17675 "EHLO pneumatic-tube.sgi.com")
        by oss.sgi.com with ESMTP id ; Wed, 14 Mar 2001 09:16:07 -0800
Received: from raptor.nova.sgi.com (raptor.nova.sgi.com [169.238.23.130])
        by pneumatic-tube.sgi.com (980327.SGI.8.8.8-aspam/980310.SGI-aspam) via ESMTP id JAA03130
        for ; Wed, 14 Mar 2001 09:25:58 -0800 (PST)
        mail_from (pearlste@nova.sgi.com)
Received: from nova.sgi.com (dhcp-163-154-6-207.engr.sgi.com [163.154.6.207])
        by raptor.nova.sgi.com (980427.SGI.8.8.8/970903.SGI.AUTOCF) via ESMTP id MAA25073
        for ; Wed, 14 Mar 2001 12:14:49 -0500 (EST)
Message-ID: <3AAFA6E2.AA550281@nova.sgi.com>
Date: Wed, 14 Mar 2001 12:14:10 -0500
From: Ken Pearlstein
Reply-To: pearlste@nova.sgi.com
X-Mailer: Mozilla 4.72 [en] (X11; U; Linux 2.2.14-5.0r1 i686)
X-Accept-Language: en
MIME-Version: 1.0
To: lkcd@oss.sgi.com
Subject: snia lcrash questions
Content-Type: text/plain; charset=us-ascii
Content-Transfer-Encoding: 7bit
Sender: owner-lkcd@oss.sgi.com
Precedence: bulk
Return-Path:
X-Orcpt: rfc822;lkcd-outgoing
Content-Length: 1844
Lines: 70

Here is my /etc/sysconfig/vmdump ....
DUMP_ACTIVE=1
DUMPDEV=/dev/vmdump
DUMPDIR=/var/log/vmdump
DUMP_SAVE=1
DUMP_LEVEL=4
DUMP_COMPRESS_PAGES=0
PANIC_TIMEOUT=60

The first question is)
when I force a panic of the system (I used a module which calls panic)
sometimes I get the message "Warning: vmdump only contains a dump
header. Printing dump header:" and sometimes I don't ...

The vmdump."BOUNDS" files are the same size .....
-rw-r--r-- 1 root root 60344741 Mar 13 10:50 vmdump.2
-rw-r--r-- 1 root root 60344741 Mar 13 06:54 vmdump.1

vmdump.1 works and vmdump.2 doesn't ????

The second question is) do you know if an NMI will produce a dump ??

and the last question) .... when $? is non zero why do the following
lines in save_vmdump

        $LCRASH -r map.$BOUNDS vmdump.$BOUNDS kerntypes.$BOUNDS \
                > analysis.$BOUNDS 2>&1
        if [ $? -ne 0 ] ; then
                echo "Error: could not create crash report." >&2

                exit 1
        fi

produce the following output

Saving crash dump data (if any)
Error: could not create crash reAdding Swap: 530112k swap-space
(priority -1)
port.ut to the sniaconsole

I am not sure if this happens on all linux consoles or is only an snia
console problem .....

--
==========================================================================
Ken Pearlstein                    Field Tech Support Analyst

cell phone 703-362-9793           pager 888 308-0734
C/O SILICON GRAPHICS              pearlste@amgems.nova.sgi.com
14160 NEWBROOK DRIVE, SUITE 100   voice: (703) 227-8531 430-5403
CHANTILLY, VA 20151               fax: (703) 277-8500 430-5403
=========================================================================

From owner-lkcd@oss.sgi.com Wed Mar 14 09:33:18 2001
Received: by oss.sgi.com id ; Wed, 14 Mar 2001 09:33:08 -0800
Received: from smtp.alacritech.com ([209.10.208.82]:16398 "EHLO smtp.alacritech.com")
        by oss.sgi.com with ESMTP id ; Wed, 14 Mar 2001 09:32:56 -0800
Received: from alacritech.com (alpha.alacritech.com [10.1.1.27])
        by smtp.alacritech.com (8.11.0/8.11.0) with ESMTP id f2EHUjh27979; Wed, 14 Mar 2001 09:30:45 -0800
Message-ID: <3AAFAC82.6FFB0C78@alacritech.com>
Date: Wed, 14 Mar 2001 09:38:10 -0800
From: "Matt D. Robinson"
Organization: Alacritech, Inc.
X-Mailer: Mozilla 4.75 [en] (X11; U; Linux 2.2.16-22 i686)
X-Accept-Language: en
MIME-Version: 1.0
To: pearlste@nova.sgi.com
CC: lkcd@oss.sgi.com
Subject: Re: snia lcrash questions
References: <3AAFA6E2.AA550281@nova.sgi.com>
Content-Type: text/plain; charset=us-ascii
Content-Transfer-Encoding: 7bit
Sender: owner-lkcd@oss.sgi.com
Precedence: bulk
Return-Path:
X-Orcpt: rfc822;lkcd-outgoing
Content-Length: 2472
Lines: 70

Ken Pearlstein wrote:
>
> Here is my /etc/sysconfig/vmdump ....
> DUMP_ACTIVE=1
> DUMPDEV=/dev/vmdump
> DUMPDIR=/var/log/vmdump
> DUMP_SAVE=1
> DUMP_LEVEL=4
> DUMP_COMPRESS_PAGES=0
> PANIC_TIMEOUT=60
>
> The first question is)
> when I force a panic of the system (I used a module which calls panic)
> sometimes I get the message "Warning: vmdump only contains a dump
> header. Printing dump header:" and sometimes I don't ...

You can't use a module to call panic(). lcrash isn't capable (yet)
of debugging kernel modules. The resulting crash dump may not be
usable by lcrash. The memory will be there, of course ...

> The vmdump."BOUNDS" files are the same size .....
> -rw-r--r-- 1 root root 60344741 Mar 13 10:50 vmdump.2
> -rw-r--r-- 1 root root 60344741 Mar 13 06:54 vmdump.1
>
> vmdump.1 works and vmdump.2 doesn't ????

Is the second one an exception dump, and not a panic() dump?

> The second question is) do you know if an NMI will produce a dump ??

It _should_ ... I put in some code to catch the NMI watchdog, but I
haven't reproduced the case as of yet.

> and the last question) .... when $? is non zero why do the following
> lines in save_vmdump
>
>         $LCRASH -r map.$BOUNDS vmdump.$BOUNDS kerntypes.$BOUNDS \
>                 > analysis.$BOUNDS 2>&1
>         if [ $? -ne 0 ] ; then
>                 echo "Error: could not create crash report." >&2
>
>                 exit 1
>         fi
>
> produce the following output
>
> Saving crash dump data (if any)
> Error: could not create crash reAdding Swap: 530112k swap-space
> (priority -1)
> port.ut to the sniaconsole
>
> I am not sure if this happens on all linux consoles or is only an snia
> console problem .....

This might be a console issue on your end. Looks like the kernel log
buffer (dmesg) and the boot-up script information is going to the same
screen.

--Matt

> --
> ==========================================================================
> Ken Pearlstein                    Field Tech Support Analyst
>
> cell phone 703-362-9793           pager 888 308-0734
> C/O SILICON GRAPHICS              pearlste@amgems.nova.sgi.com
> 14160 NEWBROOK DRIVE, SUITE 100   voice: (703) 227-8531 430-5403
> CHANTILLY, VA 20151               fax: (703) 277-8500 430-5403
> =========================================================================

From owner-lkcd@oss.sgi.com Wed Mar 14 09:39:47 2001
Received: by oss.sgi.com id ; Wed, 14 Mar 2001 09:39:38 -0800
Received: from deliverator.sgi.com ([204.94.214.10]:29732 "EHLO deliverator.sgi.com")
        by oss.sgi.com with ESMTP id ; Wed, 14 Mar 2001 09:39:33 -0800
Received: from raptor.nova.sgi.com (raptor.nova.sgi.com [169.238.23.130])
        by deliverator.sgi.com (980309.SGI.8.8.8-aspam-6.2/980310.SGI-aspam) via ESMTP id JAA07314
        for ; Wed, 14 Mar 2001 09:38:21 -0800 (PST)
        mail_from (pearlste@nova.sgi.com)
Received: from nova.sgi.com (dhcp-163-154-6-207.engr.sgi.com [163.154.6.207])
        by raptor.nova.sgi.com (980427.SGI.8.8.8/970903.SGI.AUTOCF) via ESMTP id MAA25341;
        Wed, 14 Mar 2001 12:38:13 -0500 (EST)
Message-ID: <3AAFAC5E.45D089F6@nova.sgi.com>
Date: Wed, 14 Mar 2001 12:37:34 -0500
From: Ken Pearlstein
Reply-To: pearlste@nova.sgi.com
X-Mailer: Mozilla 4.72 [en] (X11; U; Linux 2.2.14-5.0r1 i686)
X-Accept-Language: en
MIME-Version: 1.0
To: "Matt D.
Robinson" CC: lkcd@oss.sgi.com Subject: Re: snia lcrash questions References: <3AAFA6E2.AA550281@nova.sgi.com> <3AAFAC82.6FFB0C78@alacritech.com> Content-Type: text/plain; charset=us-ascii Content-Transfer-Encoding: 7bit Sender: owner-lkcd@oss.sgi.com Precedence: bulk Return-Path: X-Orcpt: rfc822;lkcd-outgoing Content-Length: 3427 Lines: 92 "Matt D. Robinson" wrote: In all cases I used the module panic .... The lcrash using vmdump.1 from the module panic worked well !! I used the identical panic for vmdump.2 ...... The vmdump.2 size was identical to the vmdump.1 size however lcrash thought only the headers were dumped !! I will go test NMI now .... > Ken Pearlstein wrote: > > > > Here is my /etc/sysconfig/vmdump .... > > DUMP_ACTIVE=1 > > DUMPDEV=/dev/vmdump > > DUMPDIR=/var/log/vmdump > > DUMP_SAVE=1 > > DUMP_LEVEL=4 > > DUMP_COMPRESS_PAGES=0 > > PANIC_TIMEOUT=60 > > > > The first question is ) > > when I force a panic of the system ( I used a module which calls panic) > > sometimes I get the message "Warning: vmdump only contains a dump > > header. Printing dump header:" and sometimes I dont ... > > You can't use a module to call panic(). lcrash isn't capable (yet) > of debugging kernel modules. The resulting crash dump may not be > usable by lcrash. The memory will be there, of course ... > > > The vmdump."BOUNDS" files are the same size ..... > > -rw-r--r-- 1 root root 60344741 Mar 13 10:50 vmdump.2 > > -rw-r--r-- 1 root root 60344741 Mar 13 06:54 vmdump.1 > > > > vmdump.1 works and vmdump.2 doesnt ???? > > Is the second one an exception dump, and not a panic() dump? > > > The second question is) do you know if an NMI will produce a dump ?? > > It _should_ ... I put in some code to catch the NMI watchdog, but I > haven't reproduced the case as of yet. > > > and the last question) .... when $? is non zero why do the following > > lines in save_vmdump > > > > $LCRASH -r map.$BOUNDS vmdump.$BOUNDS kerntypes.$BOUNDS > > \ > > > analysis.$BOUNDS 2>&1 > > if [ $? 
-ne 0 ] ; then > > echo "Error: could not create crash report." >&2 > > > > exit 1 > > fi > > > > produce the following output > > > > Saving crash dump data (if any) > > Error: could not create crash reAdding Swap: 530112k swap-space > > (priority -1) > > port.ut to the sniaconsole > > > > I am not sure if this happens on all linux consoles or is only an snia > > console problem ..... > > This might be a console issue on your end. Looks like the kernel log > buffer (dmesg) and the boot-up script information is going to the same > screen. > > --Matt > > > -- > > ========================================================================== > > Ken Pearlstein Field Tech Support Analyst > > > > cell phone 703-362-9793 pager 888 308-0734 > > C/O SILICON GRAPHICS pearlste@amgems.nova.sgi.com > > 14160 NEWBROOK DRIVE, SUITE 100 voice: (703) 227-8531 430-5403 > > CHANTILLY, VA 20151 fax: (703) 277-8500 430-5403 > > ========================================================================= -- ========================================================================== Ken Pearlstein Field Tech Support Analyst cell phone 703-362-9793 pager 888 308-0734 C/O SILICON GRAPHICS pearlste@amgems.nova.sgi.com 14160 NEWBROOK DRIVE, SUITE 100 voice: (703) 227-8531 430-5403 CHANTILLY, VA 20151 fax: (703) 277-8500 430-5403 ========================================================================= From owner-lkcd@oss.sgi.com Wed Mar 14 09:49:38 2001 Received: by oss.sgi.com id ; Wed, 14 Mar 2001 09:49:28 -0800 Received: from mg03.austin.ibm.com ([192.35.232.20]:46030 "EHLO mg03.austin.ibm.com") by oss.sgi.com with ESMTP id ; Wed, 14 Mar 2001 09:49:19 -0800 Received: from austin.ibm.com (netmail1.austin.ibm.com [9.53.250.96]) by mg03.austin.ibm.com (AIX4.3/8.9.3/8.9.3) with ESMTP id LAA22138; Wed, 14 Mar 2001 11:51:36 -0600 Received: from craft.austin.ibm.com (craft.austin.ibm.com [9.53.145.12]) by austin.ibm.com (AIX4.3/8.9.3/8.9.3) with ESMTP id LAA27612; Wed, 14 Mar 2001 11:49:05 -0600 
Received: (from dcraft@localhost) by craft.austin.ibm.com (AIX4.3/8.9.3/8.7-client1.01) id LAA45332; Wed, 14 Mar 2001 11:49:05 -0600 From: Dave Craft Message-Id: <200103141749.LAA45332@craft.austin.ibm.com> Subject: Re: snia lcrash questions To: yakker@alacritech.com (Matt D. Robinson) Date: Wed, 14 Mar 2001 11:49:04 -0600 (CST) Cc: pearlste@nova.sgi.com, lkcd@oss.sgi.com In-Reply-To: <3AAFAC82.6FFB0C78@alacritech.com> from "Matt D. Robinson" at Mar 14, 2001 09:38:10 AM X-Mailer: ELM [version 2.5 PL2] MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Transfer-Encoding: 7bit Sender: owner-lkcd@oss.sgi.com Precedence: bulk Return-Path: X-Orcpt: rfc822;lkcd-outgoing Content-Length: 596 Lines: 16 >You can't use a module to call panic(). lcrash isn't capable (yet) >of debugging kernel modules. The resulting crash dump may not be Regarding the above statement about inability to debug kernel modules. Did you guys ever get around to looking/rewriting the patch I supplied for getting tracebacks in kernel modules? The patch still applies and still works. Its buried in the archives of this mailing list. Is there any idea on when we'd see comparable function make it into lcrash? 
-- Mail : dave@austin.ibm.com Phone : 512-838-8248 I am Jack's email closing From owner-lkcd@oss.sgi.com Wed Mar 14 09:51:48 2001 Received: by oss.sgi.com id ; Wed, 14 Mar 2001 09:51:39 -0800 Received: from deliverator.sgi.com ([204.94.214.10]:31016 "EHLO deliverator.sgi.com") by oss.sgi.com with ESMTP id ; Wed, 14 Mar 2001 09:51:31 -0800 Received: from raptor.nova.sgi.com (raptor.nova.sgi.com [169.238.23.130]) by deliverator.sgi.com (980309.SGI.8.8.8-aspam-6.2/980310.SGI-aspam) via ESMTP id JAA09359 for ; Wed, 14 Mar 2001 09:50:18 -0800 (PST) mail_from (pearlste@nova.sgi.com) Received: from nova.sgi.com (dhcp-163-154-6-207.engr.sgi.com [163.154.6.207]) by raptor.nova.sgi.com (980427.SGI.8.8.8/970903.SGI.AUTOCF) via ESMTP id MAA25480; Wed, 14 Mar 2001 12:50:08 -0500 (EST) Message-ID: <3AAFAF27.B9998AB1@nova.sgi.com> Date: Wed, 14 Mar 2001 12:49:27 -0500 From: Ken Pearlstein Reply-To: pearlste@nova.sgi.com X-Mailer: Mozilla 4.72 [en] (X11; U; Linux 2.2.14-5.0r1 i686) X-Accept-Language: en MIME-Version: 1.0 To: "Matt D. Robinson" , lkcd@oss.sgi.com Subject: Re: snia lcrash questions References: <3AAFA6E2.AA550281@nova.sgi.com> <3AAFAC82.6FFB0C78@alacritech.com> <3AAFAC5E.45D089F6@nova.sgi.com> Content-Type: text/plain; charset=us-ascii Content-Transfer-Encoding: 7bit Sender: owner-lkcd@oss.sgi.com Precedence: bulk Return-Path: X-Orcpt: rfc822;lkcd-outgoing Content-Length: 4293 Lines: 109 Ken Pearlstein wrote: Fron the snia console ... I did a control-t which put me to the l1 prompt ..... I typed nmi .... The resulting nmi of the system did NOT produce a dump !! > "Matt D. Robinson" wrote: > > In all cases I used the module panic .... The lcrash using vmdump.1 from the > module panic worked well !! > I used the identical panic for vmdump.2 ...... The vmdump.2 size was identical > to the vmdump.1 size however lcrash thought only the headers were dumped !! > > I will go test NMI now .... > > > Ken Pearlstein wrote: > > > > > > Here is my /etc/sysconfig/vmdump .... 
> > > DUMP_ACTIVE=1 > > > DUMPDEV=/dev/vmdump > > > DUMPDIR=/var/log/vmdump > > > DUMP_SAVE=1 > > > DUMP_LEVEL=4 > > > DUMP_COMPRESS_PAGES=0 > > > PANIC_TIMEOUT=60 > > > > > > The first question is ) > > > when I force a panic of the system ( I used a module which calls panic) > > > sometimes I get the message "Warning: vmdump only contains a dump > > > header. Printing dump header:" and sometimes I dont ... > > > > You can't use a module to call panic(). lcrash isn't capable (yet) > > of debugging kernel modules. The resulting crash dump may not be > > usable by lcrash. The memory will be there, of course ... > > > > > The vmdump."BOUNDS" files are the same size ..... > > > -rw-r--r-- 1 root root 60344741 Mar 13 10:50 vmdump.2 > > > -rw-r--r-- 1 root root 60344741 Mar 13 06:54 vmdump.1 > > > > > > vmdump.1 works and vmdump.2 doesnt ???? > > > > Is the second one an exception dump, and not a panic() dump? > > > > > The second question is) do you know if an NMI will produce a dump ?? > > > > It _should_ ... I put in some code to catch the NMI watchdog, but I > > haven't reproduced the case as of yet. > > > > > and the last question) .... when $? is non zero why do the following > > > lines in save_vmdump > > > > > > $LCRASH -r map.$BOUNDS vmdump.$BOUNDS kerntypes.$BOUNDS > > > \ > > > > analysis.$BOUNDS 2>&1 > > > if [ $? -ne 0 ] ; then > > > echo "Error: could not create crash report." >&2 > > > > > > exit 1 > > > fi > > > > > > produce the following output > > > > > > Saving crash dump data (if any) > > > Error: could not create crash reAdding Swap: 530112k swap-space > > > (priority -1) > > > port.ut to the sniaconsole > > > > > > I am not sure if this happens on all linux consoles or is only an snia > > > console problem ..... > > > > This might be a console issue on your end. Looks like the kernel log > > buffer (dmesg) and the boot-up script information is going to the same > > screen. 
> > > > --Matt > > > > > -- > > > ========================================================================== > > > Ken Pearlstein Field Tech Support Analyst > > > > > > cell phone 703-362-9793 pager 888 308-0734 > > > C/O SILICON GRAPHICS pearlste@amgems.nova.sgi.com > > > 14160 NEWBROOK DRIVE, SUITE 100 voice: (703) 227-8531 430-5403 > > > CHANTILLY, VA 20151 fax: (703) 277-8500 430-5403 > > > ========================================================================= > > -- > ========================================================================== > Ken Pearlstein Field Tech Support Analyst > > cell phone 703-362-9793 pager 888 308-0734 > C/O SILICON GRAPHICS pearlste@amgems.nova.sgi.com > 14160 NEWBROOK DRIVE, SUITE 100 voice: (703) 227-8531 430-5403 > CHANTILLY, VA 20151 fax: (703) 277-8500 430-5403 > ========================================================================= -- ========================================================================== Ken Pearlstein Field Tech Support Analyst cell phone 703-362-9793 pager 888 308-0734 C/O SILICON GRAPHICS pearlste@amgems.nova.sgi.com 14160 NEWBROOK DRIVE, SUITE 100 voice: (703) 227-8531 430-5403 CHANTILLY, VA 20151 fax: (703) 277-8500 430-5403 ========================================================================= From owner-lkcd@oss.sgi.com Wed Mar 14 10:25:48 2001 Received: by oss.sgi.com id ; Wed, 14 Mar 2001 10:25:28 -0800 Received: from tan7.ncr.com ([192.127.94.7]:56522 "EHLO esssol013.elsegundoca.ncr.com") by oss.sgi.com with ESMTP id ; Wed, 14 Mar 2001 10:25:24 -0800 Received: from eswssol002.elsegundoca.ncr.com (eswssol002 [141.206.1.4]) by esssol013.elsegundoca.ncr.com (8.9.2/8.9.2) with ESMTP id KAA29020; Wed, 14 Mar 2001 10:25:11 -0800 (PST) Received: (from kim@localhost) by eswssol002.elsegundoca.ncr.com (8.8.7/8.8.5) id KAA02058; Wed, 14 Mar 2001 10:25:14 -0800 (PST) Date: Wed, 14 Mar 2001 10:25:14 -0800 From: Moo Kim To: Dave Craft Cc: "Matt D. 
Robinson" , pearlste@nova.sgi.com, lkcd@oss.sgi.com Subject: Re: snia lcrash questions Message-ID: <20010314102514.L18766@mailbox.ElSegundoCA.NCR.COM> References: <3AAFAC82.6FFB0C78@alacritech.com> <200103141749.LAA45332@craft.austin.ibm.com> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline User-Agent: Mutt/1.2.5i In-Reply-To: <200103141749.LAA45332@craft.austin.ibm.com>; from dcraft@austin.ibm.com on Wed, Mar 14, 2001 at 11:49:04AM -0600 Sender: owner-lkcd@oss.sgi.com Precedence: bulk Return-Path: X-Orcpt: rfc822;lkcd-outgoing Content-Length: 982 Lines: 29 I somehow assumed that lcrash was capable of debugging kernel modules. I guess not. Is it possible to incorporate Dave's patch into lcrash distribution because it will be extremely useful for kernel module developer ? Thanks, Moo Kim Moo.Kim@NCR.COM NCR Corporation On Wed, Mar 14, 2001 at 11:49:04AM -0600, Dave Craft wrote: > > >You can't use a module to call panic(). lcrash isn't capable (yet) > >of debugging kernel modules. The resulting crash dump may not be > > Regarding the above statement about inability to > debug kernel modules. Did you guys ever get around > to looking/rewriting the patch I supplied for getting tracebacks > in kernel modules? The patch still applies and still > works. Its buried in the archives of this mailing list. > > Is there any idea on when we'd see comparable > function make it into lcrash? 
> > -- > Mail : dave@austin.ibm.com Phone : 512-838-8248 > I am Jack's email closing From owner-lkcd@oss.sgi.com Wed Mar 14 10:36:28 2001 Received: by oss.sgi.com id ; Wed, 14 Mar 2001 10:36:18 -0800 Received: from pneumatic-tube.sgi.com ([204.94.214.22]:13080 "EHLO pneumatic-tube.sgi.com") by oss.sgi.com with ESMTP id ; Wed, 14 Mar 2001 10:36:10 -0800 Received: from raptor.nova.sgi.com (raptor.nova.sgi.com [169.238.23.130]) by pneumatic-tube.sgi.com (980327.SGI.8.8.8-aspam/980310.SGI-aspam) via ESMTP id KAA09406 for ; Wed, 14 Mar 2001 10:46:01 -0800 (PST) mail_from (pearlste@nova.sgi.com) Received: from nova.sgi.com (dhcp-163-154-6-207.engr.sgi.com [163.154.6.207]) by raptor.nova.sgi.com (980427.SGI.8.8.8/970903.SGI.AUTOCF) via ESMTP id NAA25980; Wed, 14 Mar 2001 13:34:48 -0500 (EST) Message-ID: <3AAFB9A0.1A2A313E@nova.sgi.com> Date: Wed, 14 Mar 2001 13:34:08 -0500 From: Ken Pearlstein Reply-To: pearlste@nova.sgi.com X-Mailer: Mozilla 4.72 [en] (X11; U; Linux 2.2.14-5.0r1 i686) X-Accept-Language: en MIME-Version: 1.0 To: Moo Kim CC: Dave Craft , "Matt D. Robinson" , lkcd@oss.sgi.com Subject: Re: snia lcrash questions References: <3AAFAC82.6FFB0C78@alacritech.com> <200103141749.LAA45332@craft.austin.ibm.com> <20010314102514.L18766@mailbox.ElSegundoCA.NCR.COM> Content-Type: text/plain; charset=us-ascii Content-Transfer-Encoding: 7bit Sender: owner-lkcd@oss.sgi.com Precedence: bulk Return-Path: X-Orcpt: rfc822;lkcd-outgoing Content-Length: 2001 Lines: 56 Moo Kim wrote: Here is the output from lcrash of the trace command ... I called panic when the rmmod was done >> trace e000000049420000 ================================================================ STACK TRACE FOR TASK: 0xe000000049420000 (rmmod) 0 dump_execute+380 [0xe00000000071b9dc] 1 panic+508 [0xe0000000005bfc9c] this trace seems correct .... What part of lcrash is not working in analyzing modules ?? > I somehow assumed that lcrash was capable of debugging > kernel modules. I guess not. 
Is it possible to > incorporate Dave's patch into lcrash distribution > because it will be extremely useful for kernel module > developer ? > > Thanks, > > Moo Kim Moo.Kim@NCR.COM > NCR Corporation > > On Wed, Mar 14, 2001 at 11:49:04AM -0600, Dave Craft wrote: > > > > >You can't use a module to call panic(). lcrash isn't capable (yet) > > >of debugging kernel modules. The resulting crash dump may not be > > > > Regarding the above statement about inability to > > debug kernel modules. Did you guys ever get around > > to looking/rewriting the patch I supplied for getting tracebacks > > in kernel modules? The patch still applies and still > > works. Its buried in the archives of this mailing list. > > > > Is there any idea on when we'd see comparable > > function make it into lcrash? > > > > -- > > Mail : dave@austin.ibm.com Phone : 512-838-8248 > > I am Jack's email closing -- ========================================================================== Ken Pearlstein Field Tech Support Analyst cell phone 703-362-9793 pager 888 308-0734 C/O SILICON GRAPHICS pearlste@amgems.nova.sgi.com 14160 NEWBROOK DRIVE, SUITE 100 voice: (703) 227-8531 430-5403 CHANTILLY, VA 20151 fax: (703) 277-8500 430-5403 ========================================================================= From owner-lkcd@oss.sgi.com Wed Mar 14 10:47:29 2001 Received: by oss.sgi.com id ; Wed, 14 Mar 2001 10:47:19 -0800 Received: from mg02.austin.ibm.com ([192.35.232.12]:9929 "EHLO mg02.austin.ibm.com") by oss.sgi.com with ESMTP id ; Wed, 14 Mar 2001 10:47:01 -0800 Received: from austin.ibm.com (netmail2.austin.ibm.com [9.53.250.97]) by mg02.austin.ibm.com (AIX4.3/8.9.3/8.9.3) with ESMTP id MAA20286; Wed, 14 Mar 2001 12:50:53 -0600 Received: from craft.austin.ibm.com (craft.austin.ibm.com [9.53.145.12]) by austin.ibm.com (AIX4.3/8.9.3/8.9.3) with ESMTP id MAA28118; Wed, 14 Mar 2001 12:46:52 -0600 Received: (from dcraft@localhost) by craft.austin.ibm.com (AIX4.3/8.9.3/8.7-client1.01) id MAA37960; Wed, 14 Mar 
2001 12:46:51 -0600 From: Dave Craft Message-Id: <200103141846.MAA37960@craft.austin.ibm.com> Subject: Re: snia lcrash questions To: pearlste@nova.sgi.com Date: Wed, 14 Mar 2001 12:46:51 -0600 (CST) Cc: Moo.Kim@NCR.COM (Moo Kim), dcraft@austin.ibm.com (Dave Craft), yakker@alacritech.com (Matt D. Robinson), lkcd@oss.sgi.com In-Reply-To: <3AAFB9A0.1A2A313E@nova.sgi.com> from "Ken Pearlstein" at Mar 14, 2001 01:34:08 PM X-Mailer: ELM [version 2.5 PL2] MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Transfer-Encoding: 7bit Sender: owner-lkcd@oss.sgi.com Precedence: bulk Return-Path: X-Orcpt: rfc822;lkcd-outgoing Content-Length: 2329 Lines: 69 The two routines you cite are in the kernel. If your crash emanates from routines in a module, you won't get a traceback of any routine name in your module. dave > >Moo Kim wrote: > >Here is the output from lcrash of the trace command ... I called panic when >the rmmod was done > >>> trace e000000049420000 >================================================================ >STACK TRACE FOR TASK: 0xe000000049420000 (rmmod) > > 0 dump_execute+380 [0xe00000000071b9dc] > 1 panic+508 [0xe0000000005bfc9c] > >this trace seems correct .... What part of lcrash is not working in >analyzing modules ?? > >> I somehow assumed that lcrash was capable of debugging >> kernel modules. I guess not. Is it possible to >> incorporate Dave's patch into lcrash distribution >> because it will be extremely useful for kernel module >> developer ? >> >> Thanks, >> >> Moo Kim Moo.Kim@NCR.COM >> NCR Corporation >> >> On Wed, Mar 14, 2001 at 11:49:04AM -0600, Dave Craft wrote: >> > >> > >You can't use a module to call panic(). lcrash isn't capable (yet) >> > >of debugging kernel modules. The resulting crash dump may not be >> > >> > Regarding the above statement about inability to >> > debug kernel modules. Did you guys ever get around >> > to looking/rewriting the patch I supplied for getting tracebacks >> > in kernel modules? 
The patch still applies and still >> > works. Its buried in the archives of this mailing list. >> > >> > Is there any idea on when we'd see comparable >> > function make it into lcrash? >> > >> > -- >> > Mail : dave@austin.ibm.com Phone : 512-838-8248 >> > I am Jack's email closing > >-- >========================================================================== >Ken Pearlstein Field Tech Support Analyst > >cell phone 703-362-9793 pager 888 308-0734 >C/O SILICON GRAPHICS pearlste@amgems.nova.sgi.com >14160 NEWBROOK DRIVE, SUITE 100 voice: (703) 227-8531 430-5403 >CHANTILLY, VA 20151 fax: (703) 277-8500 430-5403 >========================================================================= > > > > Mail : dave@austin.ibm.com Phone : 512-838-8248 I am Jack's email closing From owner-lkcd@oss.sgi.com Wed Mar 14 10:54:18 2001 Received: by oss.sgi.com id ; Wed, 14 Mar 2001 10:54:09 -0800 Received: from tan7.ncr.com ([192.127.94.7]:45676 "EHLO esssol013.elsegundoca.ncr.com") by oss.sgi.com with ESMTP id ; Wed, 14 Mar 2001 10:53:58 -0800 Received: from eswssol002.elsegundoca.ncr.com (eswssol002 [141.206.1.4]) by esssol013.elsegundoca.ncr.com (8.9.2/8.9.2) with ESMTP id KAA29526; Wed, 14 Mar 2001 10:53:47 -0800 (PST) Received: (from kim@localhost) by eswssol002.elsegundoca.ncr.com (8.8.7/8.8.5) id KAA08846; Wed, 14 Mar 2001 10:53:50 -0800 (PST) Date: Wed, 14 Mar 2001 10:53:50 -0800 From: Moo Kim To: Ken Pearlstein Cc: Moo Kim , Dave Craft , "Matt D. 
Robinson" , lkcd@oss.sgi.com Subject: Re: snia lcrash questions Message-ID: <20010314105350.N18766@mailbox.ElSegundoCA.NCR.COM> References: <3AAFAC82.6FFB0C78@alacritech.com> <200103141749.LAA45332@craft.austin.ibm.com> <20010314102514.L18766@mailbox.ElSegundoCA.NCR.COM> <3AAFB9A0.1A2A313E@nova.sgi.com> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline User-Agent: Mutt/1.2.5i In-Reply-To: <3AAFB9A0.1A2A313E@nova.sgi.com>; from pearlste@nova.sgi.com on Wed, Mar 14, 2001 at 01:34:08PM -0500 Sender: owner-lkcd@oss.sgi.com Precedence: bulk Return-Path: X-Orcpt: rfc822;lkcd-outgoing Content-Length: 2475 Lines: 68 Hello Ken, I was simply following up with Matt and Dave's email below. I have not tried out (yet) whether lcrash is capable of debugging kernel modules. If lcrash already supports kernel modules for analyzing panic dumps as well as live debugging, that is a great news. Thanks, Moo On Wed, Mar 14, 2001 at 01:34:08PM -0500, Ken Pearlstein wrote: > Moo Kim wrote: > > Here is the output from lcrash of the trace command ... I called panic when > the rmmod was done > > >> trace e000000049420000 > ================================================================ > STACK TRACE FOR TASK: 0xe000000049420000 (rmmod) > > 0 dump_execute+380 [0xe00000000071b9dc] > 1 panic+508 [0xe0000000005bfc9c] > > this trace seems correct .... What part of lcrash is not working in > analyzing modules ?? > > > I somehow assumed that lcrash was capable of debugging > > kernel modules. I guess not. Is it possible to > > incorporate Dave's patch into lcrash distribution > > because it will be extremely useful for kernel module > > developer ? > > > > Thanks, > > > > Moo Kim Moo.Kim@NCR.COM > > NCR Corporation > > > > On Wed, Mar 14, 2001 at 11:49:04AM -0600, Dave Craft wrote: > > > > > > >You can't use a module to call panic(). lcrash isn't capable (yet) > > > >of debugging kernel modules. 
The resulting crash dump may not be > > > > > > Regarding the above statement about inability to > > > debug kernel modules. Did you guys ever get around > > > to looking/rewriting the patch I supplied for getting tracebacks > > > in kernel modules? The patch still applies and still > > > works. Its buried in the archives of this mailing list. > > > > > > Is there any idea on when we'd see comparable > > > function make it into lcrash? > > > > > > -- > > > Mail : dave@austin.ibm.com Phone : 512-838-8248 > > > I am Jack's email closing > > -- > ========================================================================== > Ken Pearlstein Field Tech Support Analyst > > cell phone 703-362-9793 pager 888 308-0734 > C/O SILICON GRAPHICS pearlste@amgems.nova.sgi.com > 14160 NEWBROOK DRIVE, SUITE 100 voice: (703) 227-8531 430-5403 > CHANTILLY, VA 20151 fax: (703) 277-8500 430-5403 > ========================================================================= > From owner-lkcd@oss.sgi.com Wed Mar 14 10:59:19 2001 Received: by oss.sgi.com id ; Wed, 14 Mar 2001 10:59:08 -0800 Received: from smtp.alacritech.com ([209.10.208.82]:22542 "EHLO smtp.alacritech.com") by oss.sgi.com with ESMTP id ; Wed, 14 Mar 2001 10:58:59 -0800 Received: from alacritech.com (alpha.alacritech.com [10.1.1.27]) by smtp.alacritech.com (8.11.0/8.11.0) with ESMTP id f2EIuBh29719; Wed, 14 Mar 2001 10:56:11 -0800 Message-ID: <3AAFC087.5CE1E152@alacritech.com> Date: Wed, 14 Mar 2001 11:03:35 -0800 From: "Matt D. Robinson" Organization: Alacritech, Inc. 
X-Mailer: Mozilla 4.75 [en] (X11; U; Linux 2.2.16-22 i686) X-Accept-Language: en MIME-Version: 1.0 To: pearlste@nova.sgi.com CC: lkcd@oss.sgi.com Subject: Re: snia lcrash questions References: <3AAFA6E2.AA550281@nova.sgi.com> <3AAFAC82.6FFB0C78@alacritech.com> <3AAFAC5E.45D089F6@nova.sgi.com> <3AAFAF27.B9998AB1@nova.sgi.com> Content-Type: text/plain; charset=us-ascii Content-Transfer-Encoding: 7bit Sender: owner-lkcd@oss.sgi.com Precedence: bulk Return-Path: X-Orcpt: rfc822;lkcd-outgoing Content-Length: 1056 Lines: 30 Ken Pearlstein wrote: > > Ken Pearlstein wrote: > > Fron the snia console ... I did a control-t which put me to the l1 prompt ..... I > typed nmi .... > > The resulting nmi of the system did NOT produce a dump !! > > > "Matt D. Robinson" wrote: > > > > In all cases I used the module panic .... The lcrash using vmdump.1 from the > > module panic worked well !! > > I used the identical panic for vmdump.2 ...... The vmdump.2 size was identical > > to the vmdump.1 size however lcrash thought only the headers were dumped !! > > > > I will go test NMI now .... Ken, I can come over to SGI to check out the SNIA boxes to see what's going on, but you'll have to talk to Kanoj or one of those engineers to figure out what's up for changes. I can't change the kernel they are using without working with them. As far as I know, Simon's not there, but John still is, so maybe he can let me in on what's going on so I can help out. I'm copying him ... I assume you're doing this through an L1? BTW, Bruno might be working on the fixes for this. 
--Matt From owner-lkcd@oss.sgi.com Wed Mar 14 10:59:29 2001 Received: by oss.sgi.com id ; Wed, 14 Mar 2001 10:59:19 -0800 Received: from pneumatic-tube.sgi.com ([204.94.214.22]:6171 "EHLO pneumatic-tube.sgi.com") by oss.sgi.com with ESMTP id ; Wed, 14 Mar 2001 10:59:03 -0800 Received: from loco.corp.sgi.com (loco.corp.sgi.com [130.62.172.38]) by pneumatic-tube.sgi.com (980327.SGI.8.8.8-aspam/980310.SGI-aspam) via ESMTP id LAA08606 for ; Wed, 14 Mar 2001 11:08:54 -0800 (PST) mail_from (tjm@sgi.com) Received: from dslstriker (dsl-striker.corp.sgi.com [192.132.129.237]) by loco.corp.sgi.com (980427.SGI.8.8.8/970903.SGI.AUTOCF) via SMTP id KAA20337; Wed, 14 Mar 2001 10:58:19 -0800 (PST) Message-ID: <001f01c0acb8$829e8280$ed8184c0@corp.sgi.com> From: "Tom Morano" To: "Moo Kim" , "Ken Pearlstein" Cc: "Dave Craft" , "Matt D. Robinson" , References: <3AAFAC82.6FFB0C78@alacritech.com> <200103141749.LAA45332@craft.austin.ibm.com> <20010314102514.L18766@mailbox.ElSegundoCA.NCR.COM> <3AAFB9A0.1A2A313E@nova.sgi.com> <20010314105350.N18766@mailbox.ElSegundoCA.NCR.COM> Subject: Re: snia lcrash questions Date: Wed, 14 Mar 2001 10:56:41 -0800 MIME-Version: 1.0 Content-Type: text/plain; charset="iso-8859-1" Content-Transfer-Encoding: 7bit X-Priority: 3 X-MSMail-Priority: Normal X-Mailer: Microsoft Outlook Express 5.50.4133.2400 X-MimeOLE: Produced By Microsoft MimeOLE V5.50.4133.2400 Sender: owner-lkcd@oss.sgi.com Precedence: bulk Return-Path: X-Orcpt: rfc822;lkcd-outgoing Content-Length: 2810 Lines: 88 > > Hello Ken, > > I was simply following up with Matt and Dave's email > below. I have not tried out (yet) whether lcrash > is capable of debugging kernel modules. If lcrash > already supports kernel modules for analyzing panic > dumps as well as live debugging, that is a great news. It doesn't yet, but I do have plans of implementing Dave's suggested modifications. I have been somewhat sidetracked of late and have not had a chance to get back to this. It is on my plate though. 
Thanks, Tom > > Thanks, > > Moo > > On Wed, Mar 14, 2001 at 01:34:08PM -0500, Ken Pearlstein wrote: > > Moo Kim wrote: > > > > Here is the output from lcrash of the trace command ... I called panic when > > the rmmod was done > > > > >> trace e000000049420000 > > ================================================================ > > STACK TRACE FOR TASK: 0xe000000049420000 (rmmod) > > > > 0 dump_execute+380 [0xe00000000071b9dc] > > 1 panic+508 [0xe0000000005bfc9c] > > > > this trace seems correct .... What part of lcrash is not working in > > analyzing modules ?? > > > > > I somehow assumed that lcrash was capable of debugging > > > kernel modules. I guess not. Is it possible to > > > incorporate Dave's patch into lcrash distribution > > > because it will be extremely useful for kernel module > > > developer ? > > > > > > Thanks, > > > > > > Moo Kim Moo.Kim@NCR.COM > > > NCR Corporation > > > > > > On Wed, Mar 14, 2001 at 11:49:04AM -0600, Dave Craft wrote: > > > > > > > > >You can't use a module to call panic(). lcrash isn't capable (yet) > > > > >of debugging kernel modules. The resulting crash dump may not be > > > > > > > > Regarding the above statement about inability to > > > > debug kernel modules. Did you guys ever get around > > > > to looking/rewriting the patch I supplied for getting tracebacks > > > > in kernel modules? The patch still applies and still > > > > works. Its buried in the archives of this mailing list. > > > > > > > > Is there any idea on when we'd see comparable > > > > function make it into lcrash? 
> > > > > > > > -- > > > > Mail : dave@austin.ibm.com Phone : 512-838-8248 > > > > I am Jack's email closing > > > > -- > > ========================================================================== > > Ken Pearlstein Field Tech Support Analyst > > > > cell phone 703-362-9793 pager 888 308-0734 > > C/O SILICON GRAPHICS pearlste@amgems.nova.sgi.com > > 14160 NEWBROOK DRIVE, SUITE 100 voice: (703) 227-8531 430-5403 > > CHANTILLY, VA 20151 fax: (703) 277-8500 430-5403 > > ========================================================================= > > > From owner-lkcd@oss.sgi.com Wed Mar 14 11:02:08 2001 Received: by oss.sgi.com id ; Wed, 14 Mar 2001 11:01:59 -0800 Received: from smtp.alacritech.com ([209.10.208.82]:24078 "EHLO smtp.alacritech.com") by oss.sgi.com with ESMTP id ; Wed, 14 Mar 2001 11:01:45 -0800 Received: from alacritech.com (alpha.alacritech.com [10.1.1.27]) by smtp.alacritech.com (8.11.0/8.11.0) with ESMTP id f2EIxCh29797; Wed, 14 Mar 2001 10:59:12 -0800 Message-ID: <3AAFC13D.A7DB0AD7@alacritech.com> Date: Wed, 14 Mar 2001 11:06:37 -0800 From: "Matt D. Robinson" Organization: Alacritech, Inc. X-Mailer: Mozilla 4.75 [en] (X11; U; Linux 2.2.16-22 i686) X-Accept-Language: en MIME-Version: 1.0 To: Dave Craft CC: pearlste@nova.sgi.com, lkcd@oss.sgi.com Subject: Re: snia lcrash questions References: <200103141749.LAA45332@craft.austin.ibm.com> Content-Type: text/plain; charset=us-ascii Content-Transfer-Encoding: 7bit Sender: owner-lkcd@oss.sgi.com Precedence: bulk Return-Path: X-Orcpt: rfc822;lkcd-outgoing Content-Length: 650 Lines: 17 Dave Craft wrote: > > >You can't use a module to call panic(). lcrash isn't capable (yet) > >of debugging kernel modules. The resulting crash dump may not be > > Regarding the above statement about inability to > debug kernel modules. Did you guys ever get around > to looking/rewriting the patch I supplied for getting tracebacks > in kernel modules? The patch still applies and still > works. 
Its buried in the archives of this mailing list.
>
> Is there any idea on when we'd see comparable
> function make it into lcrash?

Working on this with Dave directly (right now) ...

--Matt

From owner-lkcd@oss.sgi.com Mon Mar 19 07:25:03 2001 Received: (from majordomo@localhost) by oss.sgi.com (8.11.3/8.11.3) id f2JFP3m00741 for lkcd-outgoing; Mon, 19 Mar 2001 07:25:03 -0800 Received: from XCHANGESERVER.storigen.com ([65.193.106.66]) by oss.sgi.com (8.11.3/8.11.3) with ESMTP id f2JFP3M00738 for ; Mon, 19 Mar 2001 07:25:03 -0800 Received: from storigen.com (vmlager1.storigen.com [192.168.0.72]) by XCHANGESERVER.storigen.com with SMTP (Microsoft Exchange Internet Mail Service Version 5.5.2653.13) id G0SJZ1S1; Mon, 19 Mar 2001 10:25:02 -0500 Message-ID: <3AB62196.EC2CB041@storigen.com> Date: Mon, 19 Mar 2001 10:11:18 -0500 From: Larry Cohen X-Mailer: Mozilla 4.72 [en] (X11; U; Linux 2.2.14-5.0 i686) X-Accept-Language: en MIME-Version: 1.0 To: lkcd@oss.sgi.com Subject: large crash dumps Content-Type: text/plain; charset=us-ascii Content-Transfer-Encoding: 7bit Sender: owner-lkcd@oss.sgi.com Precedence: bulk Content-Length: 1208 Lines: 41

Just trying to get the final answer on reducing the size of core dumps (my 250-megabyte dumps fill up a disk really fast). I think I read two conflicting messages about this. The FAQ says that the entire memory has to be dumped, but in the archive I see a message from Dave Winchell saying that Mission Critical Linux's mcore can selectively dump pages, which dramatically reduces the dump sizes.

I also noticed that I could further reduce the LKCD dumps with gzip (by 50%). Would it be possible at least to improve the compression by using zlib?

It's been a challenge, but with the hardware I have I could not get mcore working. In order to get lkcd working I had to comment out code in arch/i386/kernel/apic.c.

void disable_local_APIC(void)
{
	unsigned long value;

	clear_local_APIC();

	/*
	 * Disable APIC (implies clearing of registers
	 * for 82489DX!).
	 */
#ifdef notdef
	value = apic_read(APIC_SPIV);
	value &= ~(1<<8);
	apic_write_around(APIC_SPIV, value);
#endif
}

I'm not really sure why this works or what the side effects are. But if
I don't do it, the system will hang trying to write out the dump header.

-Larry Cohen

From owner-lkcd@oss.sgi.com Mon Mar 19 11:28:41 2001
Received: (from majordomo@localhost) by oss.sgi.com (8.11.3/8.11.3) id f2JJSf307305 for lkcd-outgoing; Mon, 19 Mar 2001 11:28:41 -0800
Received: from XCHANGESERVER.storigen.com ([65.193.106.66]) by oss.sgi.com (8.11.3/8.11.3) with ESMTP id f2JJSeM07300 for ; Mon, 19 Mar 2001 11:28:40 -0800
Received: from storigen.com (vmlager1.storigen.com [192.168.0.72]) by XCHANGESERVER.storigen.com with SMTP (Microsoft Exchange Internet Mail Service Version 5.5.2653.13) id G0SJZ17V; Mon, 19 Mar 2001 14:28:34 -0500
Message-ID: <3AB65A79.8B38B275@storigen.com>
Date: Mon, 19 Mar 2001 14:14:01 -0500
From: Larry Cohen
X-Mailer: Mozilla 4.72 [en] (X11; U; Linux 2.2.14-5.0 i686)
X-Accept-Language: en
MIME-Version: 1.0
To: lkcd@oss.sgi.com
Subject: crash dumps hangs with modules ...
Content-Type: text/plain; charset=us-ascii
Content-Transfer-Encoding: 7bit
Sender: owner-lkcd@oss.sgi.com
Precedence: bulk
Content-Length: 371
Lines: 14

Yet another twist. When I statically build my code into the kernel
and cause a panic I get a dump successfully ... well, most of the time
(it truncates occasionally without reporting an error).

When I load my code as a module and cause a panic, the dump hangs after
successfully writing out the dump header and, I believe, some of the
dump pages.

Any clues?

Thanks,
Larry

From owner-lkcd@oss.sgi.com Tue Mar 20 06:44:05 2001
Received: (from majordomo@localhost) by oss.sgi.com (8.11.3/8.11.3) id f2KEi5A27574 for lkcd-outgoing; Tue, 20 Mar 2001 06:44:05 -0800
Received: from yog-sothoth.sgi.com (eugate.sgi.com [192.48.160.10]) by oss.sgi.com (8.11.3/8.11.3) with ESMTP id f2KEi2M27571 for ; Tue, 20 Mar 2001 06:44:03 -0800
Received: from raptor.nova.sgi.com (raptor.nova.sgi.com [169.238.23.130]) by yog-sothoth.sgi.com (980305.SGI.8.8.8-aspam-6.2/980304.SGI-aspam-europe) via ESMTP id PAA5960135 for ; Tue, 20 Mar 2001 15:44:01 +0100 (CET) mail_from (pearlste@nova.sgi.com)
Received: from nova.sgi.com (dhcp-163-154-6-207.engr.sgi.com [163.154.6.207]) by raptor.nova.sgi.com (980427.SGI.8.8.8/970903.SGI.AUTOCF) via ESMTP id JAA24328 for ; Tue, 20 Mar 2001 09:42:40 -0500 (EST)
Message-ID: <3AB76BF3.AE341379@nova.sgi.com>
Date: Tue, 20 Mar 2001 09:40:51 -0500
From: Ken Pearlstein
Reply-To: pearlste@nova.sgi.com
X-Mailer: Mozilla 4.72 [en] (X11; U; Linux 2.2.14-5.0r1 i686)
X-Accept-Language: en
MIME-Version: 1.0
To: lkcd@oss.sgi.com
Subject: /sbin/vmdump in ia64
Content-Type: text/plain; charset=us-ascii
Content-Transfer-Encoding: 7bit
Sender: owner-lkcd@oss.sgi.com
Precedence: bulk
Content-Length: 722
Lines: 16

This script determines primary_dumpdev by doing an awk on /etc/fstab.
This keeps one from using another swap formatted partition. Perhaps
primary_dumpdev is what should be configurable instead of DUMPDEV.

--
==========================================================================
Ken Pearlstein Field Tech Support Analyst

cell phone 703-362-9793 pager 888 308-0734
C/O SILICON GRAPHICS pearlste@amgems.nova.sgi.com
14160 NEWBROOK DRIVE, SUITE 100 voice: (703) 227-8531 430-5403
CHANTILLY, VA 20151 fax: (703) 277-8500 430-5403
=========================================================================

From owner-lkcd@oss.sgi.com Tue Mar 20 14:40:38 2001
Received: (from majordomo@localhost) by oss.sgi.com (8.11.3/8.11.3) id f2KMecB05002 for lkcd-outgoing; Tue, 20 Mar 2001 14:40:38 -0800
Received: from gull.prod.itd.earthlink.net (gull.prod.itd.earthlink.net [207.217.121.85]) by oss.sgi.com (8.11.3/8.11.3) with ESMTP id f2KMecM04999 for ; Tue, 20 Mar 2001 14:40:38 -0800
Received: from alacritech.com (pool0468.cvx32-bradley.dialup.earthlink.net [209.179.158.213]) by gull.prod.itd.earthlink.net (EL-8_9_3_3/8.9.3) with ESMTP id OAA15958; Tue, 20 Mar 2001 14:40:28 -0800 (PST)
Message-ID: <3AB7DDB0.6EF050E2@alacritech.com>
Date: Tue, 20 Mar 2001 14:46:08 -0800
From: "Matt D. Robinson"
Organization: Alacritech, Inc.
X-Mailer: Mozilla 4.75 [en] (X11; U; Linux 2.2.16-22 i686)
X-Accept-Language: en
MIME-Version: 1.0
To: Larry Cohen
CC: lkcd@oss.sgi.com
Subject: Re: large crash dumps
References: <3AB62196.EC2CB041@storigen.com>
Content-Type: text/plain; charset=us-ascii
Content-Transfer-Encoding: 7bit
Sender: owner-lkcd@oss.sgi.com
Precedence: bulk
Content-Length: 2280
Lines: 66

Larry Cohen wrote:
>
> Just trying to get the final answer on trying to reduce the size of core
> dumps (my 250-megabyte dumps
> fill up a disk really fast).
> I think I read two conflicting messages about this. The FAQ says that
> the entire memory has to
> be dumped but in the archive I see a message from Dave Winchell that
> Mission Critical Linux's mcore
> can selectively dump pages which dramatically reduces the dump sizes.
> I also noticed that I could further reduce the LKCD dumps with gzip (by
> 50%). Would it be possible at least
> to improve the compression by using zlib?

Sure -- I can add this as an option, but I'll have to see which
code I can use in the kernel as the system's coming down. Let me
look at this. I did at one point, as there was something in the
kernel that was using zlib, but it wasn't a guaranteed part of the
kernel and I didn't want to make the vmdump.o module any bigger
than it already was. I'll go back and take another look.

There's also a patch to add in selective pages. I have to go back
into the archives and add it.

I'll also look at the APIC code. I'm out on business right now,
so I've got nothing better to do in the evenings but code ... :)
The last word I got out of Andi is that smp_send_stop() was
sufficient, but again, 2.2 vs. 2.4 is different.

My list of things to do is:

  1) Add the selective page dump code
  2) Look into APIC problem
  3) Add fixes for module code to libklib if Tom doesn't do it first
  4) Add CONFIG_DISCONTIGMEM code options
  5) Add rest of alpha system dependent code

--Matt

> It's been a challenge but with the hardware I have I could not get mcore
> working.
> In order to get lkcd working I had to comment out code in
> arch/i386/kernel/apic.c.
>
> void disable_local_APIC(void)
> {
> 	unsigned long value;
>
> 	clear_local_APIC();
>
> 	/*
> 	 * Disable APIC (implies clearing of registers
> 	 * for 82489DX!).
> 	 */
> #ifdef notdef
> 	value = apic_read(APIC_SPIV);
> 	value &= ~(1<<8);
> 	apic_write_around(APIC_SPIV, value);
> #endif
> }
>
> I'm not really sure why this works or what the side effects are. But if
> I don't do it the system will
> hang trying to write out the dump header.
>
> -Larry Cohen

From owner-lkcd@oss.sgi.com Sun Mar 25 18:26:45 2001
Received: (from majordomo@localhost) by oss.sgi.com (8.11.3/8.11.3) id f2Q2Qjp31821 for lkcd-outgoing; Sun, 25 Mar 2001 18:26:45 -0800
Received: from majestix.cmr.no (majestix.cmr.no [129.177.31.53]) by oss.sgi.com (8.11.3/8.11.3) with ESMTP id f2Q2QhM31815 for ; Sun, 25 Mar 2001 18:26:43 -0800
Received: from cmr.no (rusbrus.intra.cmr.no [10.0.1.14]) by majestix.cmr.no id EAA04959 for ; Mon, 26 Mar 2001 04:26:41 +0200 (CEST)
Message-ID: <3ABEA8C3.5F2CA47@cmr.no>
Date: Mon, 26 Mar 2001 04:26:11 +0200
From: "David J. M. Karlsen"
Organization: Christian Michelsen Research AS
X-Mailer: Mozilla 4.76 [en] (X11; U; Linux 2.4.2 i686)
X-Accept-Language: no, en
MIME-Version: 1.0
To: lkcd@oss.sgi.com
Subject: utils in .deb or at least .tgz
Content-Type: multipart/alternative; boundary="------------309503FBFC2BEC6EEB5C0D27"
Sender: owner-lkcd@oss.sgi.com
Precedence: bulk
Content-Length: 1278
Lines: 34

--------------309503FBFC2BEC6EEB5C0D27
Content-Type: text/plain; charset=us-ascii
Content-Transfer-Encoding: 7bit

are these available in any of the given formats?

is the 2.4.1 patch comp. with 2.4.2?

--
---
David J. M. Karlsen [david@cmr.no]      -*-     http://datalab.intra.cmr.no
fon: [+47] 55 57 43 29                  -*-     fax: [+47] 55 57 40 41
Christian Michelsen Research AS         -*-     http://www.cmr.no

--------------309503FBFC2BEC6EEB5C0D27
Content-Type: text/html; charset=us-ascii
Content-Transfer-Encoding: 7bit

are these available in any of the given formats?

is the 2.4.1 patch comp. with 2.4.2?

-- 
---
David J. M. Karlsen [david@cmr.no]      -*-     http://datalab.intra.cmr.no
fon: [+47] 55 57 43 29                  -*-     fax: [+47] 55 57 40 41
Christian Michelsen Research AS         -*-     http://www.cmr.no

--------------309503FBFC2BEC6EEB5C0D27--

From owner-lkcd@oss.sgi.com Tue Mar 27 09:37:04 2001
Received: (from majordomo@localhost) by oss.sgi.com (8.11.3/8.11.3) id f2RHb4h03210 for lkcd-outgoing; Tue, 27 Mar 2001 09:37:04 -0800
Received: from smtp.alacritech.com (smtp.alacritech.com [209.10.208.82]) by oss.sgi.com (8.11.3/8.11.3) with ESMTP id f2RHb3M03203 for ; Tue, 27 Mar 2001 09:37:03 -0800
Received: from alacritech.com (lambda.alacritech.com [10.1.1.32]) by smtp.alacritech.com (8.11.0/8.11.0) with ESMTP id f2RIY5Z11595; Tue, 27 Mar 2001 10:34:06 -0800
Message-ID: <3AC0CC78.95B722C@alacritech.com>
Date: Tue, 27 Mar 2001 09:23:04 -0800
From: "Matt D. Robinson"
Organization: Alacritech, Inc.
X-Mailer: Mozilla 4.76 [en] (X11; U; Linux 2.2.17-14 i686)
X-Accept-Language: en
MIME-Version: 1.0
To: "David J. M. Karlsen"
CC: lkcd@oss.sgi.com
Subject: Re: utils in .deb or at least .tgz
References: <3ABEA8C3.5F2CA47@cmr.no>
Content-Type: text/plain; charset=us-ascii
Content-Transfer-Encoding: 7bit
Sender: owner-lkcd@oss.sgi.com
Precedence: bulk
Content-Length: 683
Lines: 22

"David J. M. Karlsen" wrote:
>
> are these available in any of the given formats?
>
> is the 2.4.1 patch comp. with 2.4.2?

There is no .deb package yet (I don't have any Debian packaging
utilities installed anywhere), but you can get the latest lkcdutils
off of the SourceForge tree:

	cvs -d:pserver:anonymous@cvs.lkcd.sourceforge.net:/cvsroot/lkcd login
	cvs -z3 -d:pserver:anonymous@cvs.lkcd.sourceforge.net:/cvsroot/lkcd co lkcdutils

The 2.4.1 patch should be compatible with the 2.4.2 tree, with the
possibility of some overlaps in the Makefiles. Nothing in the code
should really need to change.

Sorry for the delay in response.

--Matt