From owner-lkcd@oss.sgi.com Tue Oct 2 12:59:37 2001
Received: (from majordomo@localhost)
    by oss.sgi.com (8.11.2/8.11.3) id f92Jxbc23827
    for lkcd-outgoing; Tue, 2 Oct 2001 12:59:37 -0700
Received: from palrel13.hp.com (palrel13.hp.com [156.153.255.238])
    by oss.sgi.com (8.11.2/8.11.3) with SMTP id f92JxYD23824
    for ; Tue, 2 Oct 2001 12:59:34 -0700
Received: from core.rose.hp.com (core.rose.hp.com [15.43.208.100])
    by palrel13.hp.com (Postfix) with ESMTP id D3E821F8C7
    for ; Tue, 2 Oct 2001 12:59:28 -0700 (PDT)
Received: (from nava@localhost)
    by core.rose.hp.com (8.9.3 (PHNE_22672)/8.8.6 SMKit7.02) id NAA08351;
    Tue, 2 Oct 2001 13:00:50 -0700 (PDT)
From: Nava Navaruparajah
Message-Id: <200110022000.NAA08351@core.rose.hp.com>
Subject: module debug supported?
To: lkcd@oss.sgi.com
Date: Tue, 02 Oct 2001 13:00:50 PDT
Cc: nava@core.rose.hp.com
X-Mailer: Elm [revision: 212.5]
Sender: owner-lkcd@oss.sgi.com
Precedence: bulk

Hi,

I am new to lkcd and just installed it and tested it on Redhat 7.1 and a
2.4.2 base on a Pentium system. I just created a system panic with my
module type driver. When I generated the report I got the following:

================================================================
STACK TRACE FOR TASK: 0xc14c4000 (soe_mgr)

 0 dump_execute+189 [0xc016adc9]
 1 panic+131 [0xc0116b23]
 2 panic+131 [0xc0116b23]
TRACE ERROR 0x1
================================================================

I think that since my driver was a module, lcrash couldn't find the symbols.
Can you show me how I can use lcrash and lkcd to debug module type drivers?
Are there any documents that show how to use lcrash and how to create new
macros to generate traces?

Thanks,
Nava
--
--------------------------------------------------------------------
Nava Navaruparajah                     Telephone: (916) 785 1647
Hewlett Packard, SIL
8000 Foothills Blvd., M/S 5668
Roseville, CA 95747
--------------------------------------------------------------------

From owner-lkcd@oss.sgi.com Tue Oct 2 15:30:29 2001
Received: (from majordomo@localhost)
    by oss.sgi.com (8.11.2/8.11.3) id f92MUTv28888
    for lkcd-outgoing; Tue, 2 Oct 2001 15:30:29 -0700
Received: from atlrel1.hp.com (atlrel1.hp.com [156.153.255.210])
    by oss.sgi.com (8.11.2/8.11.3) with SMTP id f92MUQD28877
    for ; Tue, 2 Oct 2001 15:30:26 -0700
Received: from core.rose.hp.com (core.rose.hp.com [15.43.208.100])
    by atlrel1.hp.com (Postfix) with ESMTP id EC0BCA04
    for ; Tue, 2 Oct 2001 18:30:24 -0400 (EDT)
Received: (from nava@localhost)
    by core.rose.hp.com (8.9.3 (PHNE_22672)/8.8.6 SMKit7.02) id PAA14743
    for lkcd@oss.sgi.com; Tue, 2 Oct 2001 15:31:47 -0700 (PDT)
From: Nava Navaruparajah
Message-Id: <200110022231.PAA14743@core.rose.hp.com>
Subject: module debug supported? kernel module supported?
To: lkcd@oss.sgi.com
Date: Tue, 02 Oct 2001 15:31:46 PDT
X-Mailer: Elm [revision: 212.5]
Sender: owner-lkcd@oss.sgi.com
Precedence: bulk

Hi,

I am new to lkcd and just installed it and tested it on Redhat 7.1 and a
2.4.2 base on a Pentium system. I just created a system panic with my
module type driver. When I generated the report I got the following:

================================================================
STACK TRACE FOR TASK: 0xc14c4000 (soe_mgr)

 0 dump_execute+189 [0xc016adc9]
 1 panic+131 [0xc0116b23]
 2 panic+131 [0xc0116b23]
TRACE ERROR 0x1
================================================================

Since my driver is a kernel module, I think lcrash couldn't find the symbols.
Can someone help me with how I can use lcrash and lkcd to debug module type
drivers? Are there any documents that show how to use lcrash and how to
create new macros to generate traces?

Thanks,
Nava
--
--------------------------------------------------------------------
Nava Navaruparajah                     Telephone: (916) 785 1647
Hewlett Packard, SIL
8000 Foothills Blvd., M/S 5668
Roseville, CA 95747
--------------------------------------------------------------------
--
--------------------------------------------------------------------
Nava Navaruparajah                     Telephone: (916) 785 1647
Hewlett Packard, SIL
8000 Foothills Blvd., M/S 5601
Roseville, CA 95747
--------------------------------------------------------------------

From owner-lkcd@oss.sgi.com Wed Oct 3 03:29:15 2001
Received: (from majordomo@localhost)
    by oss.sgi.com (8.11.2/8.11.3) id f93ATFX11083
    for lkcd-outgoing; Wed, 3 Oct 2001 03:29:15 -0700
Received: from e2.ny.us.ibm.com (e2.ny.us.ibm.com [32.97.182.102])
    by oss.sgi.com (8.11.2/8.11.3) with SMTP id f93AT7D11080
    for ; Wed, 3 Oct 2001 03:29:07 -0700
Received: from northrelay03.pok.ibm.com (northrelay03.pok.ibm.com [9.117.200.23])
    by e2.ny.us.ibm.com (8.9.3/8.9.3) with ESMTP id GAA172404
    for ; Wed, 3 Oct 2001 06:26:38 -0400
Received: from bharata.in.ibm.com (bharata.in.ibm.com [9.186.133.24])
    by northrelay03.pok.ibm.com (8.11.1m3/NCO v4.97.1) with ESMTP id f93AReX72172
    for ; Wed, 3 Oct 2001 06:27:41 -0400
Received: (from bharata@localhost)
    by bharata.in.ibm.com (8.11.2/8.11.2) id f93ARvk13767
    for lkcd@oss.sgi.com; Wed, 3 Oct 2001 15:57:57 +0530
Date: Wed, 3 Oct 2001 15:57:56 +0530
From: Bharata B Rao
To: lkcd@oss.sgi.com
Subject: [PATCH] fix for incomplete backtrace due to wrong esp value + some cleanups
Message-ID: <20011003155756.A13611@in.ibm.com>
Reply-To: bharata@in.ibm.com
Mime-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
User-Agent: Mutt/1.2.5i
Sender: owner-lkcd@oss.sgi.com
Precedence: bulk

Matt,
Here is a patch for dump_base.c and dump_i386.c which does the following:

1. When a dump is triggered from kernel traps (and when the regs pointer is
valid), the current method of getting the esp value is not correct.
Currently, esp is collected as dha->dha_regs.esp = (unsigned long) (regs + 1);
Here you go down the stack by the entire pt_regs structure. In the case of
kernel mode traps, since a stack switch doesn't happen, the esp and xss
members of pt_regs won't be filled in by the processor. So here you are going
down the stack by an additional 8 bytes. This has been fixed in the patch below.

I have seen that this eliminates the TRACE ERROR that is observed when the
bt command is used in lcrash.

2. Move the arch specific inline assembler code in dump_base.c to dump_i386.c

3. Since collecting esp/eip happens twice, move it into
__dump_save_panic_regs(), which is called twice.

Let me know if I can commit these into cvs.

Regards,
Bharata.
--

drivers/dump/dump_base.c
--- lkcd_cvs/2.4/drivers/dump/dump_base.c Wed Sep 26 15:29:25 2001
+++ dump_base.c Wed Oct 3 15:37:37 2001
@@ -912,16 +912,7 @@
      * So for that reason, we save the eip/esp
      * now so we can re-build the trace later.
      */
-    __asm__ __volatile__("
-        pushl %%ecx\n
-        movl %%esp, %%ecx\n
-        movl %%ecx, %0\n
-        popl %%ecx\n"
-        : "=g" (dump_header_asm.dha_esp)
-    );
-    __asm__ __volatile__("pushl %ecx\n");
     __dump_save_panic_regs(&dump_header_asm);
-    __asm__ __volatile__("popl %ecx\n");
 #endif
 
     /* update header to disk for the last time */


drivers/dump/dump_i386.c
--- lkcd_cvs/2.4/drivers/dump/dump_i386.c Tue Sep 25 10:47:03 2001
+++ dump_i386.c Wed Oct 3 15:35:34 2001
@@ -34,23 +34,20 @@
 /*
  * Name: __dump_save_panic_regs()
  * Func: Save the EIP (really the RA). We may pass an argument later.
+ *       Save ESP also here.
  */
-void
+inline void
 __dump_save_panic_regs(dump_header_asm_t *dha)
 {
+    __asm__ __volatile__("movl %%esp, %0\n"
+        : "=r" (dha->dha_esp));
     /* hate to do this, but ... */
 #ifdef CONFIG_FRAME_POINTER
-    __asm__ __volatile__("
-        movl 4(%%esp), %%ecx\n
-        movl %%ecx, %0\n"
-        : "=g" (dha->dha_eip)
-    );
+    __asm__ __volatile__("movl 4(%%esp), %0\n"
+        : "=r" (dha->dha_eip));
 #else
-    __asm__ __volatile__("
-        movl (%%esp), %%ecx\n
-        movl %%ecx, %0\n"
-        : "=g" (dha->dha_eip)
-    );
+    __asm__ __volatile__("movl (%%esp), %0\n"
+        : "=r" (dha->dha_eip));
 #endif
 }
 
@@ -62,20 +59,11 @@
 __dump_configure_header(dump_header_asm_t *dha, struct pt_regs *regs)
 {
     /* save the dump specific esp/eip */
-    __asm__ __volatile__("
-        pushl %%ecx\n
-        movl %%esp, %%ecx\n
-        movl %%ecx, %0\n
-        popl %%ecx\n"
-        : "=g" (dha->dha_esp)
-    );
-    __asm__ __volatile__("pushl %ecx\n");
     __dump_save_panic_regs(dha);
-    __asm__ __volatile__("popl %ecx\n");
 
     /* one final check -- modify if we're in user mode */
     if ((regs) && (!user_mode(regs))) {
-        dha->dha_regs.esp = (unsigned long) (regs + 1);
+        dha->dha_regs.esp = (unsigned long) &(regs->esp);
     }
 
     return (1);

From owner-lkcd@oss.sgi.com Wed Oct 3 10:09:43 2001
Received: (from majordomo@localhost)
    by oss.sgi.com (8.11.2/8.11.3) id f93H9h119495
    for lkcd-outgoing; Wed, 3 Oct 2001 10:09:43 -0700
Received: from philotas.hosting.pacbell.net (philotas.hosting.pacbell.net [216.100.99.24])
    by oss.sgi.com (8.11.2/8.11.3) with SMTP id f93H9fD19492
    for ; Wed, 3 Oct 2001 10:09:41 -0700
Received: from KimPC (LE-WIZCOMMUNICATIONS-cust-2196412.cust-rtr.pacbell.net [63.199.86.102])
    by philotas.hosting.pacbell.net id NAA25700;
    Wed, 3 Oct 2001 13:09:40 -0400 (EDT) [ConcentricHost SMTP Relay 1.7]
From: "Kim Le"
To:
Subject: trying to configure LKCD need help
Date: Wed, 3 Oct 2001 10:08:20 -0700
Message-ID:
MIME-Version: 1.0
Content-Type: text/plain; charset="iso-8859-1"
Content-Transfer-Encoding: 7bit
X-Priority: 3 (Normal)
X-MSMail-Priority: Normal
X-Mailer: Microsoft Outlook IMO, Build 9.0.2416 (9.0.2910.0)
Importance: Normal
X-MimeOLE: Produced By Microsoft MimeOLE V6.00.2600.0000
Sender: owner-lkcd@oss.sgi.com
Precedence: bulk

Hi All,

I am trying to
configure LKCD for my system (Redhat 6.2) by following the instructions from
the LKCD site. I tested it by opening /dev/vmdump and got the correct message
(as instructed) but was not able to get anything after my system crashed and
reset. The dump file in /var/log/vmdump/ is only 796 bytes and there is an
error message "Could not save dump data" displayed at boot time.

Has anyone experienced this problem? Will LKCD work if the system hangs
completely?
Thanks a lot in advance for any help.

Kim

Ps: I used /dev/hda8 for my swap partition.
    ln -s /dev/hda8 /dev/vmdump

From owner-lkcd@oss.sgi.com Wed Oct 3 22:55:05 2001
Received: (from majordomo@localhost)
    by oss.sgi.com (8.11.2/8.11.3) id f945t5402004
    for lkcd-outgoing; Wed, 3 Oct 2001 22:55:05 -0700
Received: from nakedeye.aparity.com (w032.z064001165.sjc-ca.dsl.cnc.net [64.1.165.32])
    by oss.sgi.com (8.11.2/8.11.3) with SMTP id f945sxD01996
    for ; Wed, 3 Oct 2001 22:54:59 -0700
Received: from localhost (yakker@localhost)
    by nakedeye.aparity.com (8.11.2/8.11.2) with ESMTP id f945tm910755;
    Wed, 3 Oct 2001 22:55:48 -0700
Date: Wed, 3 Oct 2001 22:55:48 -0700 (PDT)
From: "Matt D. Robinson"
To: Nava Navaruparajah
cc:
Subject: Re: module debug supported? kernel module supported?
In-Reply-To: <200110022231.PAA14743@core.rose.hp.com>
Message-ID:
MIME-Version: 1.0
Content-Type: TEXT/PLAIN; charset=US-ASCII
Sender: owner-lkcd@oss.sgi.com
Precedence: bulk

On Tue, 2 Oct 2001, Nava Navaruparajah wrote:
|>Hi,
|>
|> I am new to lkcd and just installed it and tested on Redhat 7.1 and
|>2.4.2 base on a Pentium system. I just created the system panic with
|>my module type driver. When I generated report I got the following:
|>================================================================
|>STACK TRACE FOR TASK: 0xc14c4000 (soe_mgr)
|>
|> 0 dump_execute+189 [0xc016adc9]
|> 1 panic+131 [0xc0116b23]
|> 2 panic+131 [0xc0116b23]
|>TRACE ERROR 0x1
|>================================================================
|>
|>Since my driver is a kernel module, I think lcrash couldn't find the symbols.
|>Can someone help me how I can use lcrash and lkcd to debug module type drivers.
|>Are there any documents to show the usage of lcrash and create new
|>macros to generate traces?
|>
|>Thanks,
|>Nava

Hi, Nava. Can you tell me which version of LKCD you've
downloaded, and which version of 'lcrash' you're using?
At least one of the issues you're seeing has been fixed
in the CVS tree.

--Matt

From owner-lkcd@oss.sgi.com Wed Oct 3 22:54:59 2001
Received: (from majordomo@localhost)
    by oss.sgi.com (8.11.2/8.11.3) id f945sxV01997
    for lkcd-outgoing; Wed, 3 Oct 2001 22:54:59 -0700
Received: from nakedeye.aparity.com (w032.z064001165.sjc-ca.dsl.cnc.net [64.1.165.32])
    by oss.sgi.com (8.11.2/8.11.3) with SMTP id f945snD01990
    for ; Wed, 3 Oct 2001 22:54:50 -0700
Received: from localhost (yakker@localhost)
    by nakedeye.aparity.com (8.11.2/8.11.2) with ESMTP id f945vvX10760;
    Wed, 3 Oct 2001 22:57:57 -0700
Date: Wed, 3 Oct 2001 22:57:57 -0700 (PDT)
From: "Matt D. Robinson"
To: Bharata B Rao
cc:
Subject: Re: [PATCH] fix for incomplete backtrace due to wrong esp value + some cleanups
In-Reply-To: <20011003155756.A13611@in.ibm.com>
Message-ID:
MIME-Version: 1.0
Content-Type: TEXT/PLAIN; charset=US-ASCII
Sender: owner-lkcd@oss.sgi.com
Precedence: bulk

Go ahead and check it in, Bharata. Again, it was two years
ago, but at one point the __asm__ required a register for
moving %esp; otherwise I wouldn't have used %ecx. :)

One more 'cvs update' ...

--Matt

From owner-lkcd@oss.sgi.com Wed Oct 3 23:18:44 2001
Received: (from majordomo@localhost)
    by oss.sgi.com (8.11.2/8.11.3) id f946IiU02394
    for lkcd-outgoing; Wed, 3 Oct 2001 23:18:44 -0700
Received: from d12lmsgate.de.ibm.com (d12lmsgate.de.ibm.com [195.212.91.199])
    by oss.sgi.com (8.11.2/8.11.3) with SMTP id f946IbD02390
    for ; Wed, 3 Oct 2001 23:18:38 -0700
Received: from d12relay02.de.ibm.com (d12relay02.de.ibm.com [9.165.215.23])
    by d12lmsgate.de.ibm.com (1.0.0) with ESMTP id IAA186980;
    Thu, 4 Oct 2001 08:18:29 +0200
Received: from d12ml004.de.ibm.com (d12ml004_cs0 [9.165.223.50])
    by d12relay02.de.ibm.com (8.11.1m3/NCO v4.97.1) with ESMTP id f946ITW21768;
    Thu, 4 Oct 2001 08:18:29 +0200
Importance: Normal
Subject: Re: module debug supported?
    kernel module supported?
To: Nava Navaruparajah
Cc: lkcd@oss.sgi.com
X-Mailer: Lotus Notes Release 5.0.3 March 21, 2000
Message-ID:
From: "Michael Holzheu"
Date: Thu, 4 Oct 2001 08:17:21 +0200
X-MIMETrack: Serialize by Router on D12ML004/12/M/IBM(Release 5.0.8 |June 18, 2001) at
    04/10/2001 08:18:28
MIME-Version: 1.0
Content-type: text/plain; charset=us-ascii
Sender: owner-lkcd@oss.sgi.com
Precedence: bulk

Nava,

There is module support for lcrash in the latest SF CVS version. You can
add module symbols using the following commands:

> nm mymodule.o > mymodule.lst
> lcrash ...
>> symtab -a mymodule.lst mymodule

There is also an additional command "module" which can list all kernel
modules and their symbols.

Michael

------------------------------------------------------------------------
Linux/390 Development
Phone: +49-7031-16-2360, Bld 71032-06-109
Email: holzheu@de.ibm.com


Nava Navaruparajah @oss.sgi.com on 10/03/2001 12:31:46 AM

Please respond to Nava Navaruparajah

Sent by: owner-lkcd@oss.sgi.com


To: lkcd@oss.sgi.com
cc:
Subject: module debug supported? kernel module supported?


Hi,

I am new to lkcd and just installed it and tested on Redhat 7.1 and
2.4.2 base on a Pentium system. I just created the system panic with
my module type driver. When I generated report I got the following:
================================================================
STACK TRACE FOR TASK: 0xc14c4000 (soe_mgr)

 0 dump_execute+189 [0xc016adc9]
 1 panic+131 [0xc0116b23]
 2 panic+131 [0xc0116b23]
TRACE ERROR 0x1
================================================================

Since my driver is a kernel module, I think lcrash couldn't find the symbols.
Can someone help me how I can use lcrash and lkcd to debug module type drivers.
Are there any documents to show the usage of lcrash and create new
macros to generate traces?

Thanks,
Nava
--
--------------------------------------------------------------------
Nava Navaruparajah                     Telephone: (916) 785 1647
Hewlett Packard, SIL
8000 Foothills Blvd., M/S 5668
Roseville, CA 95747
--------------------------------------------------------------------
--
--------------------------------------------------------------------
Nava Navaruparajah                     Telephone: (916) 785 1647
Hewlett Packard, SIL
8000 Foothills Blvd., M/S 5601
Roseville, CA 95747
--------------------------------------------------------------------

From owner-lkcd@oss.sgi.com Thu Oct 4 01:54:45 2001
Received: (from majordomo@localhost)
    by oss.sgi.com (8.11.2/8.11.3) id f948sjw05602
    for lkcd-outgoing; Thu, 4 Oct 2001 01:54:45 -0700
Received: from e1.ny.us.ibm.com (e1.ny.us.ibm.com [32.97.182.101])
    by oss.sgi.com (8.11.2/8.11.3) with SMTP id f948sZD05598
    for ; Thu, 4 Oct 2001 01:54:35 -0700
Received: from northrelay03.pok.ibm.com (northrelay03.pok.ibm.com [9.117.200.23])
    by e1.ny.us.ibm.com (8.9.3/8.9.3) with ESMTP id EAA377230;
    Thu, 4 Oct 2001 04:51:54 -0400
Received: from bharata.in.ibm.com (bharata.in.ibm.com [9.186.133.24])
    by northrelay03.pok.ibm.com (8.11.1m3/NCO v4.97.1) with ESMTP id f948sCb82782;
    Thu, 4 Oct 2001 04:54:13 -0400
Received: (from bharata@localhost)
    by bharata.in.ibm.com (8.11.2/8.11.2) id f948sBS16062;
    Thu, 4 Oct 2001 14:24:11 +0530
Date: Thu, 4 Oct 2001 14:24:11 +0530
From: Bharata B Rao
To: "Matt D. Robinson"
Cc: lkcd@oss.sgi.com
Subject: Re: [PATCH] fix for incomplete backtrace due to wrong esp value + some cleanups
Message-ID: <20011004142411.A15722@in.ibm.com>
Reply-To: bharata@in.ibm.com
References: <20011003155756.A13611@in.ibm.com>
Mime-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
User-Agent: Mutt/1.2.5i
In-Reply-To: ; from yakker@aparity.com on Wed, Oct 03, 2001 at 10:57:57PM -0700
Sender: owner-lkcd@oss.sgi.com
Precedence: bulk

This code is now checked into cvs (dump_base.c and dump_i386.c).

Regards,
Bharata.

--
Bharata B Rao,
IBM Linux Technology Center,
IBM Software Lab, Bangalore.
Ph: 91-80-5262355 Ex: 3962
Mail: bharata@in.ibm.com

From owner-lkcd@oss.sgi.com Thu Oct 4 12:52:21 2001
Received: (from majordomo@localhost)
    by oss.sgi.com (8.11.2/8.11.3) id f94JqLQ21707
    for lkcd-outgoing; Thu, 4 Oct 2001 12:52:21 -0700
Received: from ducati.amazon.com (ducati.amazon.com [209.191.164.152])
    by oss.sgi.com (8.11.2/8.11.3) with SMTP id f94JqFD21701
    for ; Thu, 4 Oct 2001 12:52:15 -0700
Received: from kawasaki.amazon.com (kawasaki.amazon.com [10.16.42.209])
    by ducati.amazon.com (Postfix) with ESMTP id 95515639
    for ; Thu, 4 Oct 2001 12:52:12 -0700 (PDT)
Received: from AMZN097255X (us1-dhcp-134-56.amazon.com [10.21.134.56])
    by kawasaki.amazon.com (Postfix) with SMTP id 93B2648062
    for ; Thu, 4 Oct 2001 12:52:10 -0700 (PDT)
From: "Monty Vanderbilt"
To:
Subject: Patch to prevent double free in lcrash
Date: Thu, 4 Oct 2001 12:52:10 -0700
Message-ID:
MIME-Version: 1.0
Content-Type: text/plain; charset="iso-8859-1"
Content-Transfer-Encoding: 7bit
X-Priority: 3 (Normal)
X-MSMail-Priority: Normal
X-Mailer: Microsoft Outlook IMO, Build 9.0.2416 (9.0.2911.0)
X-MimeOLE: Produced By Microsoft MimeOLE V6.00.2600.0000
Importance: Normal
Sender: owner-lkcd@oss.sgi.com
Precedence: bulk

This patch prevents a segment violation from a double free when lcrash can't
read data from /dev/mem.

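The idiom behind this kind of fix can be sketched in isolation. This is a minimal illustration under stated assumptions, not libklib code: demo_alloc_block, demo_free_block, demo_get_block and demo_caller_cleanup are hypothetical names, and the sketch assumes that a helper which releases the caller's block on an error path should also clear the caller's pointer, and that the free routine ignores a null pointer.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Counts real frees so the behaviour can be checked (hypothetical helper). */
static int demo_free_count = 0;

/* Hypothetical stand-in for a block allocator; not the libklib API. */
static void *demo_alloc_block(size_t size)
{
    return calloc(1, size);
}

/* Hypothetical stand-in for a block free routine. Assumption: it ignores
 * NULL, which is what makes a cleared pointer safe to pass in again. */
static void demo_free_block(void *blk)
{
    if (blk != NULL) {
        free(blk);
        demo_free_count++;
    }
}

/* Allocates a block for the caller. When 'fail' is non-zero it simulates a
 * failed read: the block is released and the caller's pointer is cleared,
 * so the caller is not left holding a dangling pointer. Returns 1 on error. */
static int demo_get_block(int fail, size_t size, void **ptr)
{
    assert(ptr != NULL);
    *ptr = demo_alloc_block(size);
    if (*ptr == NULL)
        return 1;
    if (fail) {
        demo_free_block(*ptr);
        *ptr = NULL;    /* without this, the caller's cleanup frees twice */
        return 1;
    }
    return 0;
}

/* Caller-side cleanup that runs on every path, including after an error. */
static void demo_caller_cleanup(void **ptr)
{
    demo_free_block(*ptr);
    *ptr = NULL;
}
```

With the pointer cleared on the error path, a caller can run the same cleanup after success or failure; in this sketch the failed call followed by cleanup releases the block exactly once.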
--- libklib/kl_util.c Wed Sep 12 12:21:21 2001
+++ libklib/kl_util.c Tue Oct 2 12:54:10 2001
@@ -466,8 +466,9 @@
                     &size, ptr_module)){
                 kl_free_block(dump_page);
                 if(free_ptr_module){
                     kl_free_block(*ptr_module);
+                    *ptr_module = 0;
                 }
                 return(1);
             }
             *vaddr= addr_mod;
@@ -484,8 +485,9 @@
                     &size, ptr_module)){
                 kl_free_block(dump_page);
                 if(free_ptr_module){
                     kl_free_block(*ptr_module);
+                    *ptr_module = 0;
                 }
                 return(1);
             }
             mod_found = 1;
@@ -495,8 +497,9 @@
     kl_free_block(dump_page);
     if(!mod_found){
         if(free_ptr_module){
             kl_free_block(*ptr_module);
+            *ptr_module = 0;
         }
         return(1);
     }
     return(0);
@@ -534,8 +537,9 @@
         GET_BLOCK(vaddr, *size, *ptr);
         if (KL_ERROR) {
             if(free_ptr){
                 kl_free_block(*ptr);
+                *ptr = 0;
             }
             return(1);
         }
     } else {

Monty VanderBilt

From owner-lkcd@oss.sgi.com Fri Oct 5 00:10:03 2001
Received: (from majordomo@localhost)
    by oss.sgi.com (8.11.2/8.11.3) id f957A3t30551
    for lkcd-outgoing; Fri, 5 Oct 2001 00:10:03 -0700
Received: from d12lmsgate-2.de.ibm.com (d12lmsgate-2.de.ibm.com [195.212.91.200])
    by oss.sgi.com (8.11.2/8.11.3) with SMTP id f9579qD30533
    for ; Fri, 5 Oct 2001 00:09:52 -0700
Received: from d12relay02.de.ibm.com (d12relay02.de.ibm.com [9.165.215.23])
    by d12lmsgate-2.de.ibm.com (1.0.0) with ESMTP id JAA85324;
    Fri, 5 Oct 2001 09:09:44 +0200
Received: from d12ml004.de.ibm.com (d12ml004_cs0 [9.165.223.50])
    by d12relay02.de.ibm.com (8.11.1m3/NCO v4.97.1) with ESMTP id f9579hF28250;
    Fri, 5 Oct 2001 09:09:43 +0200
Importance: Normal
Subject: Re: Patch to prevent double free in lcrash
To: "Monty Vanderbilt"
Cc: lkcd@oss.sgi.com
X-Mailer: Lotus Notes Release 5.0.3 March 21, 2000
Message-ID:
From: "Michael Holzheu"
Date: Fri, 5 Oct 2001 09:08:36 +0200
X-MIMETrack: Serialize by Router on D12ML004/12/M/IBM(Release 5.0.8 |June 18, 2001) at
    05/10/2001 09:09:42
MIME-Version: 1.0
Content-type: text/plain; charset=us-ascii
Sender: owner-lkcd@oss.sgi.com
Precedence: bulk

Thanks Monty,

I checked in the following, since free_ptr_module is always 0 in
kl_get_module():


--- kl_util.c 2001/09/12 19:21:21 1.5
+++ kl_util.c 2001/10/05 07:03:55
@@ -426,7 +426,7 @@
 {
     syment_t *sym_module_list = NULL;
     void *dump_page = NULL;
-    int free_ptr_module = 0, mod_found = 0;
+    int mod_found = 0;
     kaddr_t dump_modname = 0;
     kaddr_t addr_mod = 0;
     size_t size=0;
@@ -465,9 +465,6 @@
             if(kl_get_structure(addr_mod, "module",
                     &size, ptr_module)){
                 kl_free_block(dump_page);
-                if(free_ptr_module){
-                    kl_free_block(*ptr_module);
-                }
                 return(1);
             }
             *vaddr= addr_mod;
@@ -483,9 +480,6 @@
             if(kl_get_structure(addr_mod, "module",
                     &size, ptr_module)){
                 kl_free_block(dump_page);
-                if(free_ptr_module){
-                    kl_free_block(*ptr_module);
-                }
                 return(1);
             }
             mod_found = 1;
@@ -494,9 +488,6 @@
 
     kl_free_block(dump_page);
     if(!mod_found){
-        if(free_ptr_module){
-            kl_free_block(*ptr_module);
-        }
         return(1);
     }
     return(0);
@@ -535,6 +526,7 @@
         if (KL_ERROR) {
             if(free_ptr){
                 kl_free_block(*ptr);
+                *ptr = NULL;
             }
             return(1);
         }


Regards

 Michael

------------------------------------------------------------------------
Linux/390 Development
Phone: +49-7031-16-2360, Bld 71032-06-109
Email: holzheu@de.ibm.com


"Monty Vanderbilt" @oss.sgi.com on 10/04/2001 09:52:10 PM

Please respond to "Monty Vanderbilt"

Sent by: owner-lkcd@oss.sgi.com


To:
cc:
Subject: Patch to prevent double free in lcrash



This patch prevents a segment violation from a double free when lcrash
can't read data from /dev/mem.

--- libklib/kl_util.c Wed Sep 12 12:21:21 2001
+++ libklib/kl_util.c Tue Oct 2 12:54:10 2001
@@ -466,8 +466,9 @@
                     &size, ptr_module)){
                 kl_free_block(dump_page);
                 if(free_ptr_module){
                     kl_free_block(*ptr_module);
+                    *ptr_module = 0;
                 }
                 return(1);
             }
             *vaddr= addr_mod;
@@ -484,8 +485,9 @@
                     &size, ptr_module)){
                 kl_free_block(dump_page);
                 if(free_ptr_module){
                     kl_free_block(*ptr_module);
+                    *ptr_module = 0;
                 }
                 return(1);
             }
             mod_found = 1;
@@ -495,8 +497,9 @@
     kl_free_block(dump_page);
     if(!mod_found){
         if(free_ptr_module){
             kl_free_block(*ptr_module);
+            *ptr_module = 0;
         }
         return(1);
     }
     return(0);
@@ -534,8 +537,9 @@
         GET_BLOCK(vaddr, *size, *ptr);
         if (KL_ERROR) {
             if(free_ptr){
                 kl_free_block(*ptr);
+                *ptr = 0;
             }
             return(1);
         }
     } else {

Monty VanderBilt

From owner-lkcd@oss.sgi.com Fri Oct 5 00:17:19 2001
Received: (from majordomo@localhost)
    by oss.sgi.com (8.11.2/8.11.3) id f957HJA30634
    for lkcd-outgoing; Fri, 5 Oct 2001 00:17:19 -0700
Received: from nakedeye.aparity.com (w032.z064001165.sjc-ca.dsl.cnc.net [64.1.165.32])
    by oss.sgi.com (8.11.2/8.11.3) with SMTP id f957H8D30629
    for ; Fri, 5 Oct 2001 00:17:08 -0700
Received: from localhost (yakker@localhost)
    by nakedeye.aparity.com (8.11.2/8.11.2) with ESMTP id f957LXg12279;
    Fri, 5 Oct 2001 00:21:33 -0700
Date: Fri, 5 Oct 2001 00:21:32 -0700 (PDT)
From: "Matt D. Robinson"
To: Michael Holzheu
cc: Monty Vanderbilt ,
Subject: Re: Patch to prevent double free in lcrash
In-Reply-To:
Message-ID:
MIME-Version: 1.0
Content-Type: TEXT/PLAIN; charset=US-ASCII
Sender: owner-lkcd@oss.sgi.com
Precedence: bulk

Thanks for catching this before me, Michael. Want me to spin 4.0-2?

--Matt

From owner-lkcd@oss.sgi.com Fri Oct 5 11:26:38 2001
Received: (from majordomo@localhost)
    by oss.sgi.com (8.11.2/8.11.3) id f95IQcV12922
    for lkcd-outgoing; Fri, 5 Oct 2001 11:26:38 -0700
Received: from smtp.alacritech.com (smtp.alacritech.com [209.10.208.82])
    by oss.sgi.com (8.11.2/8.11.3) with SMTP id f95IQZD12917
    for ; Fri, 5 Oct 2001 11:26:35 -0700
Received: from alacritech.com (lambda.alacritech.com [10.1.1.32])
    by smtp.alacritech.com (8.11.2/8.11.2) with ESMTP id f95IQd220153;
    Fri, 5 Oct 2001 11:26:40 -0700
Message-ID: <3BBDFC5F.BDF0A349@alacritech.com>
Date: Fri, 05 Oct 2001 11:30:55 -0700
From: "Matt D. Robinson"
Organization: Alacritech, Inc.
X-Mailer: Mozilla 4.76 [en] (X11; U; Linux 2.4.2-2smp i686)
X-Accept-Language: en
MIME-Version: 1.0
To: Kim Le
CC: lkcd@oss.sgi.com
Subject: Re: trying to configure LKCD need help
References:
Content-Type: text/plain; charset=us-ascii
Content-Transfer-Encoding: 7bit
Sender: owner-lkcd@oss.sgi.com
Precedence: bulk

Kim Le wrote:
>
> Hi All,
>
> I am trying to configure LKCD for my system (Redhat 6.2) by following
> instruction from LKCD site.
I tested it by openning /dev/vmdump and got > correct > message (as instructed) but was not able to get any thing after my system > crash and reset. > The dump file in /var/log/vmdump/ is only 796 bytes and there is an error > message "Could not save dump data" displayed at boot time. > > Has anyone experience this problem? Will LKCD work if system completely > hang. > Thanks a lot in advance for any help. > > Kim > > Ps: I used /dev/hda8 for my swap partition. > ln -s /dev/hda8 /dev/vmdump Hi, Kim. If you try to run 'lcrash' against /dev/vmdump as your vmdump file ('lcrash -d /boot/System.map /dev/vmdump /boot/Kerntypes'), what does it report? Also, can you report what the output for: dd if=/dev/hda8 bs=1 count=1000 skip=4096 | od -x is? This will give a little more insight as to what's in the dump header. --Matt From owner-lkcd@oss.sgi.com Fri Oct 5 15:19:35 2001 Received: (from majordomo@localhost) by oss.sgi.com (8.11.2/8.11.3) id f95MJZj23312 for lkcd-outgoing; Fri, 5 Oct 2001 15:19:35 -0700 Received: from e33.bld.us.ibm.com (e33.co.us.ibm.com [32.97.110.131]) by oss.sgi.com (8.11.2/8.11.3) with SMTP id f95MJYD23309 for ; Fri, 5 Oct 2001 15:19:34 -0700 Received: from westrelay02.boulder.ibm.com (westrelay02.boulder.ibm.com [9.99.140.23]) by e33.bld.us.ibm.com (8.9.3/8.9.3) with ESMTP id RAA42418 for ; Fri, 5 Oct 2001 17:17:09 -0500 Received: from d03nm038.boulder.ibm.com (d03nm038.boulder.ibm.com [9.99.140.38]) by westrelay02.boulder.ibm.com (8.11.1m3/NCO v4.98) with ESMTP id f95MIHH230524 for ; Fri, 5 Oct 2001 16:18:17 -0600 Importance: Normal Subject: Question on lcrash usage To: lkcd@oss.sgi.com X-Mailer: Lotus Notes Release 5.0.4 June 8, 2000 Message-ID: From: "James Washer" Date: Fri, 5 Oct 2001 15:18:22 -0700 X-MIMETrack: Serialize by Router on D03NM038/03/M/IBM(Release 5.0.8 |June 18, 2001) at 10/05/2001 04:18:16 PM MIME-Version: 1.0 Content-type: text/plain; charset=us-ascii Sender: owner-lkcd@oss.sgi.com Precedence: bulk First, if this is the WRONG 
list to be asking usage questions... please direct me to the correct place. Ok, if you've read this far... Is there a nice (and easy) way to find user address space pages? (or do I have to walk the page tables myself?) - jim From owner-lkcd@oss.sgi.com Mon Oct 8 06:47:14 2001 Received: (from majordomo@localhost) by oss.sgi.com (8.11.2/8.11.3) id f98DlE814694 for lkcd-outgoing; Mon, 8 Oct 2001 06:47:14 -0700 Received: from e1.ny.us.ibm.com (e1.ny.us.ibm.com [32.97.182.101]) by oss.sgi.com (8.11.2/8.11.3) with SMTP id f98Dl2D14691 for ; Mon, 8 Oct 2001 06:47:03 -0700 Received: from northrelay03.pok.ibm.com (northrelay03.pok.ibm.com [9.117.200.23]) by e1.ny.us.ibm.com (8.9.3/8.9.3) with ESMTP id JAA138120; Mon, 8 Oct 2001 09:44:28 -0400 Received: from sparklet.in.ibm.com (sparklet.in.ibm.com [9.186.133.17]) by northrelay03.pok.ibm.com (8.11.1m3/NCO v4.97.1) with ESMTP id f98DkhY74620; Mon, 8 Oct 2001 09:46:44 -0400 Received: (from suparna@localhost) by sparklet.in.ibm.com (8.11.0/8.11.0) id f98JN5708176; Mon, 8 Oct 2001 14:23:05 -0500 Date: Mon, 8 Oct 2001 14:23:03 -0500 From: Suparna Bhattacharya To: yakker@alacritech.com Cc: lkcd@oss.sgi.com Subject: Dump driver interface Message-ID: <20011008142303.A1275@in.ibm.com> Reply-To: suparna@in.ibm.com References: Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline User-Agent: Mutt/1.2.5i In-Reply-To: ; from bsuparna@in.ibm.com on Mon, Oct 08, 2001 at 01:16:56PM +0530 Sender: owner-lkcd@oss.sgi.com Precedence: bulk Matt, To carry on some of the discussions you'd initiated about having a block device dump interface. A question that comes to mind when comparing the proposed interface with the AIX ddump interface, is figuring out the right place to implement some the tasks that a driver needs to perform in order to implement dump support. 
Obviously we do want to keep the interface as simple and small
as possible (to make acceptance easier), and from that angle what you've
proposed sounds attractive. At the same time, in the context of some of
the past discussions, I just wanted to explore whether there are any
advantages to splitting up the interface and abstracting some common
processing out of the driver into the standard dump code.

Looking for inputs/thoughts on this point to understand if it's worth
making the interface more granular. I'm not sure what the right answer is.

Another question is how this interface would be extended for network
dumps.

Anyway, here goes:
The various steps involved in dumping from the perspective of the driver:

1. Set aside resources/memory required for dump time (because the dump
   code can't share any state with the normal driver path). This could happen:
   - during driver/device initialization, or
   - during dump device configuration
   The latter happens only for configured dump devices, rather than all
   active devices that support dump.
   However, it requires an interface into the driver to be called when
   dump device configuration is taking place.

   If we have a dedicated dump device, the device initialization could
   happen at dump configuration time, so no separate interface would be
   required.

2. Making the device ready for dump, just before starting dump writes.
   If the device was in active use by the system then this step could
   involve suspending/quiescing any existing i/o and making the h/w device
   ready for i/o (could involve device resets in some cases for disruptive
   dumps), and associated setup for non-interrupt based i/o.

   Again, if this is a dedicated dump device then it is likely to be
   ready for dump, though there could be some aspects like bus state to
   think of in certain panic circumstances.

3. Initiating dump writes to the device (without waiting for completion).
   There would be a limit on how much i/o can be pushed to the driver/device
   in one go, without waiting for i/o completion.
   Would use non-interrupt based mode of i/o.

4. Checking for completion of earlier dump i/o. May have a timeout
   up to which to try before returning.

5. Poll in a loop waiting for (4) to succeed (within certain timeout bounds),
   and submit the next batch of dump writes (3) and keep repeating till the
   dump is complete.

6. Release device for normal use (opposite of 2), once dump is complete and
   has been saved in a safe location (in the non-disruptive case).

7. Release resources set aside for dumping if dump device is unconfigured
   (or device is unconfigured on the system).

With bd_op->dump(), steps 3-5 happen in one shot every time the dump buffer
is written out. Step 2 might happen (if needed/appropriate) once, depending on
a state flag maintained by the driver. Step 6, I'm not sure about - there could
be some way to do this the next time the request function is invoked for the
device. Step 1 could happen during device initialization and 7 during device
shutdown.
(Matt, let me know if I've guessed any of this wrong.)

Could there be an advantage in pulling up the loop in 5 into the common dump
code instead and having separate interfaces for 3 and 4 (could even be a simple
rw flag to bd_op->dump()) ? Besides some commonality of code, the poll routine
could perform some additional checks / activity (e.g. touch_nmi_watchdog()),
and with multiple dump devices, some degree of parallelism in i/o can be
achieved depending on the kind of h/w support. In general, building higher
level functionality/infrastructure/protocol support could be a little easier.
[BTW, the fact that we send i/o in units of the dump buffer means that some
room already exists for some additional checks, but the level of control is
sharper with the split approach.]
How useful is that ?

Any other comments / observations / experiences to share ?
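
The split discussed above (step 3 as an asynchronous start, step 4 as a
completion check, step 5 as a loop in the common dump code) could be sketched
roughly as follows. This is only a user-space simulation under stated
assumptions: the names dump_dev, start_write, poll_done and dump_write_all are
hypothetical and are not part of LKCD or of the proposed bd_op->dump()
interface, and the "device" is just a counter that reports busy a few times
before completing.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical split interface; every name here is illustrative only. */
struct dump_dev {
    /* step 3: start a write and return without waiting for completion */
    int (*start_write)(struct dump_dev *dev, const char *buf, size_t len);
    /* step 4: 1 = previous write finished, 0 = still busy, <0 = error */
    int (*poll_done)(struct dump_dev *dev);
    int busy_polls;   /* simulated device state: polls left until done */
    int writes;       /* simulated device state: batches accepted */
};

/*
 * Step 5, kept in common code: submit one batch, poll the driver until it
 * reports completion or max_polls is exceeded, then submit the next batch.
 * Returns 0 on success, -1 on driver error, -2 on timeout.
 */
static int dump_write_all(struct dump_dev *dev, const char *buf, size_t len,
                          size_t batch, int max_polls)
{
    size_t off = 0;

    assert(batch > 0);
    while (off < len) {
        size_t n = (len - off < batch) ? len - off : batch;
        int polls, rc;

        if (dev->start_write(dev, buf + off, n) < 0)
            return -1;
        for (polls = 0; ; polls++) {
            rc = dev->poll_done(dev);
            if (rc < 0)
                return -1;
            if (rc > 0)
                break;
            if (polls >= max_polls)
                return -2;
            /* a kernel version could call touch_nmi_watchdog() here */
        }
        off += n;
    }
    return 0;
}

/* Simulated driver: each write stays busy for three polls. */
static int sim_start_write(struct dump_dev *dev, const char *buf, size_t len)
{
    (void)buf;
    (void)len;
    dev->busy_polls = 3;
    dev->writes++;
    return 0;
}

static int sim_poll_done(struct dump_dev *dev)
{
    if (dev->busy_polls > 0) {
        dev->busy_polls--;
        return 0;
    }
    return 1;
}
```

With this shape the driver only knows how to start a write and how to report
completion; the timeout policy and any extra checks live in one place.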

Regards
Suparna

Matt Robinson wrote:
>
> I would imagine something
> like the following:
>
>
> struct block_device_operations {
>         int (*open) (struct inode *, struct file *);
>         int (*release) (struct inode *, struct file *);
>         int (*ioctl) (struct inode *, struct file *, unsigned, unsigned long);
>         int (*check_media_change) (kdev_t);
>         int (*revalidate) (kdev_t);
>         int (*dump) (struct inode *, struct file *, const char *, size_t,
>                      loff_t *);
> };
>
> It should use the write() constructs (similar to file_operations), but do
> any kind of polling required within the function itself.
>
> IRIX used to call this bddump(), which was an alias to bwrite(), with
> DIRECT_IO set (and had real direct I/O available through the block device
> path). Of course, I haven't looked at the code in over two years, so
> I'm a bit rusty.
>
> In any event, Suparna, I think we do the compression the same way we
> currently do, but throw out anything related to the kiobuf path. Then
> we pass in the dump buffer (which we currently fill with compressed
> data) down through dump_device->bd_op->dump(). Then dump() does a
> poll on the disk (with a timeout) waiting for the data to write out.
>
> This makes life easy for disruptive as well as non-disruptive dumps.
>
> The way I did it before was creating the request structures within
> vmdump.c, which was quite ugly. But again, it was just for test.
> Suparna Bhattacharya IBM Software Lab, India E-mail : bsuparna@in.ibm.com Phone : 91-80-5267117, Extn : 3961 > From owner-lkcd@oss.sgi.com Mon Oct 8 09:58:34 2001 Received: (from majordomo@localhost) by oss.sgi.com (8.11.2/8.11.3) id f98GwYS21080 for lkcd-outgoing; Mon, 8 Oct 2001 09:58:34 -0700 Received: from d12lmsgate-2.de.ibm.com (d12lmsgate-2.de.ibm.com [195.212.91.200]) by oss.sgi.com (8.11.2/8.11.3) with SMTP id f98GwPD21077 for ; Mon, 8 Oct 2001 09:58:25 -0700 Received: from d12relay01.de.ibm.com (d12relay01.de.ibm.com [9.165.215.22]) by d12lmsgate-2.de.ibm.com (1.0.0) with ESMTP id SAA58718 for ; Mon, 8 Oct 2001 18:58:19 +0200 Received: from d12ml033.de.ibm.com (d12ml033_cs0 [9.165.223.11]) by d12relay01.de.ibm.com (8.11.1m3/NCO v4.97.1) with ESMTP id f98GwHQ52134 for ; Mon, 8 Oct 2001 18:58:17 +0200 Importance: Normal Subject: Re: Question on lcrash usage To: "James Washer" Cc: lkcd@oss.sgi.com X-Mailer: Lotus Notes Release 5.0.4a July 24, 2000 Message-ID: From: "Andreas Herrmann" Date: Mon, 8 Oct 2001 18:58:15 +0200 X-MIMETrack: Serialize by Router on D12ML033/12/M/IBM(Release 5.0.8 |June 18, 2001) at 08/10/2001 18:58:18 MIME-Version: 1.0 Content-type: text/plain; charset=us-ascii Sender: owner-lkcd@oss.sgi.com Precedence: bulk James Washer wrote: >First, if this is the WRONG list to be asking usage questions... please >direct me to the correct place. > >Ok, if you've read this far... > >Is there a nice (and easy) way to find user address space pages? (or do I >have to walk the page tables myself?) > > > - jim If you want to get a list of user pages: Currently there is no nice and easy way to do it. But there is a way to look at memory of user processes. (But currently it is neither really nice nor easy ...) The problem is that address translation of virtual addresses to physical ones is automatically done only for kernel addresses. For user addresses you have to translate the virtual address of the user space using the "vtop" command. 
You have to know the address of the mm_struct of the user process of
interest. Then you can use the resulting kernel address as input for further
commands like "print" or "dump". Here is an example. Obviously, there is much
room for improvement with respect to memory mapping. But currently, this is
the way it works.

Example:
I used lcrash as a user process. Using lcrash itself, I want to read the
value of the global variable "iter_threshold" (type long int), which is set
to 10000 and which is stored at virtual memory address 0x8169730 of the
user process. (I've used gdb to get the address of the variable.)
The corresponding lcrash session follows (comments start with #):

>> task
ACTIVE TASKS:

      ADDR   UID   PID  PPID  STATE     FLAGS  CPU  NAME
===============================================================================
0xc0256000     0     0     0      0         0    -  swapper
0xcffea000     0     1     0      1     0x100    -  init
0xcff34000     0     2     1      1      0x40    -  kflushd
0xcff32000     0     3     1      1      0x40    -  kupdate
...
0xcdddc000     0  3110  3103      1         0    -  gdb
0xc47aa000     0  3112  3110      8  0x200010    -  lcrash
===============================================================================
61 active task structs found

>> task -f 0xc47aa000
      ADDR   UID   PID  PPID  STATE     FLAGS  CPU  NAME
===============================================================================
0xc47aa000     0  3112  3110      8  0x200010    -  lcrash

   MM:0xca6103c0    # this is the address of lcrash's mm_struct structure

   THREAD:
     ESP0:0xc47ac000, ESP:0xc47abed8, EIP:0xc0111786
     FS:0, GS:0
===============================================================================
1 active task struct found

>> vtop -m 0xca6103c0 0x8169730   # translate virtual address to physical one
     VADDR       KADDR      PADDR       PFN
=========================================
 0x8169730  0xc1cec730  0x1cec730  30328624
=========================================
# We have to look at KADDR.
# PADDR would be translated again, because it does not fall between
# PAGE_OFFSET (0xc0000000) and value of high_memory (0xcfff0000)

>> dump 0xc1cec730 2
0xc1cec730: 00002710 00000000 : .'......

>> print *(long int *) 0xc1cec730
10000
# this is exactly what I expected at this address

Hope this is of some use for you.

Regards,

Andreas
--
Linux for eServer Development
Tel : +49-7031-16-4640
Notes mail : Andreas Herrmann/GERMANY/IBM@IBMDE
email : aherrman@de.ibm.com

From owner-lkcd@oss.sgi.com Mon Oct 8 23:50:39 2001
Received: (from majordomo@localhost) by oss.sgi.com (8.11.2/8.11.3) id f996odF06269 for lkcd-outgoing; Mon, 8 Oct 2001 23:50:39 -0700
Received: from nakedeye.aparity.com (w032.z064001165.sjc-ca.dsl.cnc.net [64.1.165.32]) by oss.sgi.com (8.11.2/8.11.3) with SMTP id f996oND06252 for ; Mon, 8 Oct 2001 23:50:23 -0700
Received: from alacritech.com (w032.z064001165.sjc-ca.dsl.cnc.net [64.1.165.32]) by nakedeye.aparity.com (8.11.2/8.11.2) with ESMTP id f996sjN25754; Mon, 8 Oct 2001 23:54:45 -0700
Message-ID: <3BC29D52.D0202AA5@alacritech.com>
Date: Mon, 08 Oct 2001 23:46:42 -0700
From: "Matt D. Robinson"
Organization: Alacritech, Inc.
X-Mailer: Mozilla 4.78 [en] (Windows NT 5.0; U)
X-Accept-Language: en
MIME-Version: 1.0
To: suparna@in.ibm.com
CC: lkcd@oss.sgi.com
Subject: Re: Dump driver interface
References: <20011008142303.A1275@in.ibm.com>
Content-Type: text/plain; charset=us-ascii
Content-Transfer-Encoding: 7bit
Sender: owner-lkcd@oss.sgi.com
Precedence: bulk

Suparna Bhattacharya wrote:
>
> Matt,
>
> To carry on some of the discussions you'd initiated about having a block
> device dump interface.
>
> A question that comes to mind when comparing the proposed interface with the
> AIX ddump interface, is figuring out the right place to implement
> some the tasks that a driver needs to perform in order to implement dump
> support.
Obviously we do want to keep the interface as simple and small > as possible (to make acceptance easier), and from that angle what you've > proposed sounds attractive. At the same time in the context of some of > the past discussions just wanted to explore if there are any advantages > of splitting up the interface and abstracting some common processing out > of the driver into the standard dump code. > > Looking for inputs/thoughts on this point to understand if its worth > making the interface more granular. I'm not sure what the right answer is. > > Another question is how this interface would be extended for network > dumps. > > Anyway, here goes: > The various steps involved in dumping from the perspective of the driver: > > 1. Set aside resources/memory required for dump time (because the dump > code can't share any state with the normal driver path). This could happen: > - during driver/device initialization, or > - during dump device configuration > The latter happens only for configured dump devices, rather than all > active devices that support dump. > However, it requires an interface into the driver to be called when > dump device configuration is taking place. > > If we have a dedicated dump device, the device initialization could > happen at dump configuration time, so no separate interface would be > required. Setting aside resources may be problematic, as the main developers will balk (I believe) at taking system resources for something you may never use. But then again, not everyone will use LKCD in the base kernel tree, so this may not be an issue (distributions can turn it on if they want it in their releases). > 2. Making the device ready for dump, just before starting dump writes. 
> If the device was in active use by the system then this step could > involve suspending/quiescing any existing i/o and making the h/w device > ready for i/o (could involve device resets in some cases for disruptive > dumps), and associated setup for non-interrupt based i/o. If you call the dump block device operation, it can make sure the system is silent. The amount of time to verify this should be small enough. It's more making sure that the device's request queue is shut down and not in use, and that the driver can return a ready state. A block device read request may be issued to verify the hardware the first time around. > Again, if this is a dedicated dump device then it is likely to be > ready for dump, though there could be some aspects like bus state to > think of in certain panic circumstances. True. > 3. Initiating dump writes to the device (without waiting for completion) > There would be a limit on how much i/o can be pushed to the driver/device in > one go, without waiting for i/o completion > Would use non-interrupt based mode of i/o I think you can poll and sleep ... even set timeouts if you want. Again, if you time out waiting for a write to complete, there's in almost every case a problem with the drive/media you're targeting. You've already verified that you can read from the device, and that the status of the device is acceptable. For a write to fail after checking all that is pretty fatal. > 4. Checking for completion of earlier dump i/o. May have a timeout > upto which to try before returning. I think timeouts are okay -- something on the order of the slowest attached device (floppy?) > 5. Poll in a loop waiting for (4) to succeed (within certain timeout bounds), > and submit next batch dump writes (3) and keep repeating till dump is > complete. Sure, this sounds all fine. Batch mode, iterative, either way. > 6. 
Release device for normal use (opposite of 2), once dump is complete and > has been saved in a safe location (in the non-disruptive case) Right. > 7. Release resources set aside for dumping if dump device is unconfigured > (or device is unconfigured on the system) What if you're in a circular dumping loop? Also, re-kick off the request queue. > With bd_op->dump() steps 3-5 happen in one shot every time the dump buffer > is written out. Step 2 might happen (if needed/appropriate) once depending on > a state flag maintained by the driver. Step 6, I'm not sure about - could > be some way to do this the next time the request function is invoked for the > device. Step 1 could happen during device initialization and 7 during device > shutdown. > (Matt, Let me know if I've guessed any of this wrong) This all sounds fine. Just about what I initially proposed, but with some of the looping constructs more clearly defined. > Could there be an advantage of pulling up the loop in 5 into the common dump > code instead and have separate interfaces for 3 and 4 (could even be a simple > rw flag to bd_op->dump()) ? Besides some commonality of code, the poll routine > could perform some additional checks / activity (e.g touch_nmi_watchdog()), > and with multiple dump devices, some degree of parallelism in i/o can be > achieved depending on the kind of h/w support. In general building higher > level functionality/infrastructure/protocol support could be a little easier. > [BTW, the fact that we send i/o in units of the dump buffer means that some > room already exists for some additional checks, but the level of control is > sharper with the split approach.] > How useful is that ? I absolutely think 5 should be common -- that way the driver has little to think about. If each device driver is given the option to dump in its own unique way (beyond location/size/time/etc), that could lead to 5 needing to know whether it's talking to an IDE or SCSI driver, for example. Sounds good ... 
> Any other comments / observations / experiences to share ? I think you've covered most of it. The real key here is making sure to maintain as little stuff as possible in the dump functionality for each device driver. That device operation is responsible for knowing what to do when it is called, in terms of configuring the driver state, reading flags, and writing out pages of data to disk. For it to do more than that, such as understanding higher level kernel structures, would not be a good thing. Great outline, Suparna. --Matt > Regards > Suparna > > Matt Robinson wrote: > > > > I would imagine something > > like the following: > > > > > > struct block_device_operations { > > int (*open) (struct inode *, struct file *); > > int (*release) (struct inode *, struct file *); > > int (*ioctl) (struct inode *, struct file *, unsigned, unsigned long); > > int (*check_media_change) (kdev_t); > > int (*revalidate) (kdev_t); > > int (*dump) (struct inode *, struct file *, const char *, size_t, > > loff_t > > *); > > }; > > > > It should use the write() constructs (similar to file_operations), but do > > any kind of polling required within the function itself. > > > > IRIX used to call this bddump(), which was an alias to bwrite(), with > > DIRECT_IO set (and had real direct I/O available through the block device > > path). Of course, I haven't looked at the code in over two years, so > > I'm a bit rusty. > > > > In any event, Suparna, I think we do the compression the same way we > > currently do, but throw out anything related to the kiobuf path. Then > > we pass in the dump buffer (which we currently fill with compressed > > data) down through dump_device->bd_op->dump(). Then dump() does a > > poll on the disk (with a timeout) waiting for the data to write out. > > > > This makes life easy for disruptive as well as non-disruptive dumps. > > > > The way I did it before was creating the request structures within > > vmdump.c, which was quite ugly. 
But again, it was just for test. > > > > Suparna Bhattacharya > IBM Software Lab, India > E-mail : bsuparna@in.ibm.com > Phone : 91-80-5267117, Extn : 3961 > > From owner-lkcd@oss.sgi.com Tue Oct 9 04:30:50 2001 Received: (from majordomo@localhost) by oss.sgi.com (8.11.2/8.11.3) id f99BUo512800 for lkcd-outgoing; Tue, 9 Oct 2001 04:30:50 -0700 Received: from e21.nc.us.ibm.com (e21.nc.us.ibm.com [32.97.136.227]) by oss.sgi.com (8.11.2/8.11.3) with SMTP id f99BUkD12796 for ; Tue, 9 Oct 2001 04:30:47 -0700 Received: from southrelay03.raleigh.ibm.com (southrelay03.raleigh.ibm.com [9.37.3.210]) by e21.nc.us.ibm.com (8.9.3/8.9.3) with ESMTP id GAA117840; Tue, 9 Oct 2001 06:28:14 -0500 Received: from sparklet.in.ibm.com ([9.186.133.17]) by southrelay03.raleigh.ibm.com (8.11.1m3/NCO v4.97.1) with ESMTP id f99BUYm247396; Tue, 9 Oct 2001 07:30:35 -0400 Received: (from suparna@localhost) by sparklet.in.ibm.com (8.11.0/8.11.0) id f99H6Li03047; Tue, 9 Oct 2001 12:06:21 -0500 Date: Tue, 9 Oct 2001 12:06:20 -0500 From: Suparna Bhattacharya To: "Matt D. Robinson" Cc: lkcd@oss.sgi.com Subject: Re: Dump driver interface Message-ID: <20011009120620.A2365@in.ibm.com> Reply-To: suparna@in.ibm.com References: <20011008142303.A1275@in.ibm.com> <3BC29D52.D0202AA5@alacritech.com> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline User-Agent: Mutt/1.2.5i In-Reply-To: <3BC29D52.D0202AA5@alacritech.com>; from yakker@alacritech.com on Mon, Oct 08, 2001 at 11:46:42PM -0700 Sender: owner-lkcd@oss.sgi.com Precedence: bulk On Mon, Oct 08, 2001 at 11:46:42PM -0700, Matt D. Robinson wrote: > > I absolutely think 5 should be common -- that way the driver has > little to think about. If each device driver is given the option > to dump in its own unique way (beyond location/size/time/etc), > that could lead to 5 needing to know whether it's talking to an > IDE or SCSI driver, for example. > > Sounds good ... 
> > > Any other comments / observations / experiences to share ? > > I think you've covered most of it. The real key here is making sure > to maintain as little stuff as possible in the dump functionality for > each device driver. That device operation is responsible for knowing > what to do when it is called, in terms of configuring the driver > state, reading flags, and writing out pages of data to disk. For it > to do more than that, such as understanding higher level kernel > structures, > would not be a good thing. Exactly ! That was the reason for trying to outline the tasks - so we can bring up common stuff into the generic dump functionality, providing more control & flexibility in the fray. So the write call, 3 is just essentially asynchronous (doesn't do any polling - but just initiates the i/o) and polling/waiting happens through the common dump code (rather than in the driver) as part of 5, which calls 4 (implemented in the driver) to check for completion ( perhaps optionally ask it to wait for completion within legitimate bounds ?). Regards Suparna From owner-lkcd@oss.sgi.com Tue Oct 9 10:47:05 2001 Received: (from majordomo@localhost) by oss.sgi.com (8.11.2/8.11.3) id f99Hl5A20321 for lkcd-outgoing; Tue, 9 Oct 2001 10:47:05 -0700 Received: from mail.3pardata.com (dnai-216-15-110-218.cust.dnai.com [216.15.110.218]) by oss.sgi.com (8.11.2/8.11.3) with SMTP id f99Hl2D20318 for ; Tue, 9 Oct 2001 10:47:02 -0700 Received: from postal.3pardata.com (3pardata.com [192.168.1.19]) by mail.3pardata.com (8.9.3+Sun/8.9.3) with ESMTP id KAA14980; Tue, 9 Oct 2001 10:46:54 -0700 (PDT) Received: from marais (marais.3pardata.com [192.168.1.107]) by postal.3pardata.com with SMTP (Microsoft Exchange Internet Mail Service Version 5.5.2653.13) id TXYBWX2G; Tue, 9 Oct 2001 10:46:54 -0700 Date: Tue, 9 Oct 2001 10:46:53 -0700 (PDT) From: Castor Fu X-X-Sender: To: Suparna Bhattacharya cc: "Matt D. 
Robinson" , Subject: Re: Dump driver interface In-Reply-To: <20011009120620.A2365@in.ibm.com> Message-ID: MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII Sender: owner-lkcd@oss.sgi.com Precedence: bulk On Tue, 9 Oct 2001, Suparna Bhattacharya wrote: > On Mon, Oct 08, 2001 at 11:46:42PM -0700, Matt D. Robinson wrote: > > > > I think you've covered most of it. The real key here is making sure > > to maintain as little stuff as possible in the dump functionality for > > each device driver. That device operation is responsible for knowing > > what to do when it is called, in terms of configuring the driver > > state, reading flags, and writing out pages of data to disk. For it > > to do more than that, such as understanding higher level kernel > > structures, > > would not be a good thing. One thing that might be nice, (I'm still on the fence on this matter), is to structure the code so it could be standalone code outside of the linux kernel. This would imply factoring out the iteration through pages of memory and device write routines. In this case, various linux kernel parameters might also not be available. From owner-lkcd@oss.sgi.com Tue Oct 9 16:12:34 2001 Received: (from majordomo@localhost) by oss.sgi.com (8.11.2/8.11.3) id f99NCYF26494 for lkcd-outgoing; Tue, 9 Oct 2001 16:12:34 -0700 Received: from smtp.alacritech.com (smtp.alacritech.com [209.10.208.82]) by oss.sgi.com (8.11.2/8.11.3) with SMTP id f99NCVD26488 for ; Tue, 9 Oct 2001 16:12:31 -0700 Received: from alacritech.com (lambda.alacritech.com [10.1.1.32]) by smtp.alacritech.com (8.11.2/8.11.2) with ESMTP id f99NC2223873; Tue, 9 Oct 2001 16:12:02 -0700 Message-ID: <3BC38569.61E9B580@alacritech.com> Date: Tue, 09 Oct 2001 16:16:57 -0700 From: "Matt D. Robinson" Organization: Alacritech, Inc. 
X-Mailer: Mozilla 4.76 [en] (X11; U; Linux 2.4.2-2smp i686) X-Accept-Language: en MIME-Version: 1.0 To: Castor Fu CC: Suparna Bhattacharya , lkcd@oss.sgi.com Subject: Re: Dump driver interface References: Content-Type: text/plain; charset=us-ascii Content-Transfer-Encoding: 7bit Sender: owner-lkcd@oss.sgi.com Precedence: bulk Castor Fu wrote: > > On Tue, 9 Oct 2001, Suparna Bhattacharya wrote: > > > On Mon, Oct 08, 2001 at 11:46:42PM -0700, Matt D. Robinson wrote: > > > > > > I think you've covered most of it. The real key here is making sure > > > to maintain as little stuff as possible in the dump functionality for > > > each device driver. That device operation is responsible for knowing > > > what to do when it is called, in terms of configuring the driver > > > state, reading flags, and writing out pages of data to disk. For it > > > to do more than that, such as understanding higher level kernel > > > structures, > > > would not be a good thing. > One thing that might be nice, (I'm still on the fence on this matter), is to > structure the code so it could be standalone code outside of the linux kernel. In what sense? Right now, we're a module, but I suspect you aren't referring to this. > This would imply factoring out the iteration through pages of memory and > device write routines. In this case, various linux kernel parameters > might also not be available. Are you referring to something like LinuxBIOS? Just curious. We have briefly discussed something like this, along with putting in MCL's code. 
--Matt From owner-lkcd@oss.sgi.com Tue Oct 9 16:23:04 2001 Received: (from majordomo@localhost) by oss.sgi.com (8.11.2/8.11.3) id f99NN4626785 for lkcd-outgoing; Tue, 9 Oct 2001 16:23:04 -0700 Received: from mail.3pardata.com (dnai-216-15-110-218.cust.dnai.com [216.15.110.218]) by oss.sgi.com (8.11.2/8.11.3) with SMTP id f99NN1D26779 for ; Tue, 9 Oct 2001 16:23:02 -0700 Received: from postal.3pardata.com (3pardata.com [192.168.1.19]) by mail.3pardata.com (8.9.3+Sun/8.9.3) with ESMTP id QAA18281; Tue, 9 Oct 2001 16:22:54 -0700 (PDT) Received: from marais (marais.3pardata.com [192.168.1.107]) by postal.3pardata.com with SMTP (Microsoft Exchange Internet Mail Service Version 5.5.2653.13) id TXYBWYJ1; Tue, 9 Oct 2001 16:22:54 -0700 Date: Tue, 9 Oct 2001 16:22:54 -0700 (PDT) From: Castor Fu X-X-Sender: To: "Matt D. Robinson" cc: Suparna Bhattacharya , Subject: Re: Dump driver interface In-Reply-To: <3BC38569.61E9B580@alacritech.com> Message-ID: MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII Sender: owner-lkcd@oss.sgi.com Precedence: bulk On Tue, 9 Oct 2001, Matt D. Robinson wrote: > In what sense? Right now, we're a module, but I suspect you aren't referring > to this. > > > This would imply factoring out the iteration through pages of memory and > > device write routines. In this case, various linux kernel parameters > > might also not be available. > > Are you referring to something like LinuxBIOS? Just curious. We have > briefly discussed something like this, along with putting in MCL's code. Well, not LinuxBIOS, but yes, from a custom BIOS. With things like the SMI interrupt one could potentially get at a pretty wedged system from a BIOS. Another factor to consider, (pardon my unfamiliarity with Linux and its VM system), is that it looks like a single virtual memory address is associated with each page in the dump file. It doesn't seem like one could easily access all user pages, because pages from different tasks would have the same address. 
Is this address somehow unique in the mem_map table? -castor From owner-lkcd@oss.sgi.com Tue Oct 9 18:04:50 2001 Received: (from majordomo@localhost) by oss.sgi.com (8.11.2/8.11.3) id f9A14oq28934 for lkcd-outgoing; Tue, 9 Oct 2001 18:04:50 -0700 Received: from fgwmail5.fujitsu.co.jp (fgwmail5.fujitsu.co.jp [192.51.44.35]) by oss.sgi.com (8.11.2/8.11.3) with SMTP id f9A145D28925 for ; Tue, 9 Oct 2001 18:04:06 -0700 Received: from m2.gw.fujitsu.co.jp by fgwmail5.fujitsu.co.jp (8.9.3/3.7W-MX0109-Fujitsu Gateway) id KAA00690 for ; Wed, 10 Oct 2001 10:03:56 +0900 (JST) (envelope-from naomi@pst.fujitsu.com) From: naomi@pst.fujitsu.com Received: from naomi.aoi.pst.fujitsu.com by m2.gw.fujitsu.co.jp (8.9.3/3.7W-0110-Fujitsu Domain Master) id KAA14991 for ; Wed, 10 Oct 2001 10:03:51 +0900 (JST) (envelope-from naomi@pst.fujitsu.com) Received: from localhost (IDENT:naomi@localhost [127.0.0.1]) by naomi.aoi.pst.fujitsu.com (8.9.3/8.9.3) with ESMTP id KAA26359 for ; Wed, 10 Oct 2001 10:04:17 +0900 To: lkcd@oss.sgi.com Subject: Re: lcrash sub-commands line completion In-Reply-To: Your message of "Tue, 04 Sep 2001 16:27:53 +0900" <20010904162753R.naomi@pst.fujitsu.com> References: <20010904162753R.naomi@pst.fujitsu.com> X-Mailer: Mew version 1.92.4 on Emacs 19.34 / Mule 2.3 (SUETSUMUHANA) Mime-Version: 1.0 Content-Type: Text/Plain; charset=us-ascii Content-Transfer-Encoding: 7bit Message-Id: <20011010100417W.naomi@pst.fujitsu.com> Date: Wed, 10 Oct 2001 10:04:17 +0900 X-Dispatcher: imput version 980905(IM100) Lines: 568 Sender: owner-lkcd@oss.sgi.com Precedence: bulk > Hello. > Recently, I think that lcrash should have "sub-commands line completion". > > Lcrash has many sub-commands. And almost sub-commands have parameters such as > filename or symbol name which should be specified. > The present lcrash cannot complete on sub-commands line. > For this reason, we have to memorize sub-commands names and parameters exactly. > It is very inconvenient. 
> So I'll add completion capability to librl.
>
> I'm considering as follows.
> While editing sub-commands line, if TAB key is pressed, lcrash completes the
> line (or do something as bash does).
> Lcrash will complete on sub-commands names with behavior almost equivalent to
> bash.
> And I consider that parameters of sub-commands have different characteristic
> each other, I'll add the mechanism let you be able to make your own completion
> function. Using this mechanism, you can call the function that behaves as you
> want when TAB key is pressed.

Hi, all.

I implemented the first phase of the lcrash completion mechanism that I
mentioned in the above mail.

Here is a description of the mechanism.

While editing a sub-command line, if the TAB key is pressed, lcrash completes
the line.
  - When TAB is pressed at the head of the line, lcrash prints the list of
    sub-command names.
  - When TAB is pressed in the middle of the first word of the line, lcrash
    completes the sub-command name.
      * When there is no candidate, it prints a beep character.
      * When there is exactly one candidate, it prints the rest of that name.
      * When there are two or more candidates,
          + if they share a common part after the typed text, it prints
            that common part;
          + otherwise it prints the list of candidates.
  - When TAB is pressed on the second or a later word of the line, lcrash
    completes sub-command arguments.
    Note) This is not implemented yet; for now lcrash prints a beep character.

The patch below is against the files taken from the SourceForge CVS of lkcd.
Any comments and suggestions are welcome.
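For readers following the description above, the first-word rule can be
sketched as a small standalone C function. This is an illustrative sketch
only, not the code in the patch: the names complete_name, enum
complete_result, MAX_COMPLETION and demo_cmds are assumptions made up for
this example, and the command table is hypothetical.

```c
#include <stdio.h>
#include <string.h>

#define MAX_COMPLETION 64

/* Outcome of one completion attempt (illustrative names, not librl's). */
enum complete_result { COMPLETE_NONE, COMPLETE_TEXT, COMPLETE_LIST };

/* Hypothetical command table, used only for this example. */
static const char *const demo_cmds[] = {
    "dis", "dump", "help", "symbol", "symtab", "trace", NULL
};

/*
 * Complete `typed` against a NULL-terminated table of command names.
 * COMPLETE_NONE: nothing matches (the caller would beep).
 * COMPLETE_TEXT: `out` holds the characters to append at the cursor.
 * COMPLETE_LIST: several names match but share nothing beyond `typed`
 *                (the caller would print the candidate list).
 * `outlen` must be at least 1.
 */
enum complete_result
complete_name(const char *const *names, const char *typed,
              char *out, size_t outlen)
{
    size_t tlen = strlen(typed);
    size_t i, j;
    int matches = 0;

    out[0] = '\0';
    for (i = 0; names[i] != NULL; i++) {
        const char *rest;

        if (strncmp(names[i], typed, tlen) != 0)
            continue;
        rest = names[i] + tlen;
        if (matches == 0) {
            /* first candidate: keep everything after the typed text */
            snprintf(out, outlen, "%s", rest);
        } else {
            /* later candidates: keep only the part they all share */
            for (j = 0; out[j] != '\0' && out[j] == rest[j]; j++)
                ;
            out[j] = '\0';
        }
        matches++;
    }

    if (matches == 0)
        return COMPLETE_NONE;
    if (matches == 1) {
        /* unique match: add a trailing space, as a shell would */
        size_t len = strlen(out);
        if (len + 1 < outlen) {
            out[len] = ' ';
            out[len + 1] = '\0';
        }
        return COMPLETE_TEXT;
    }
    return out[0] != '\0' ? COMPLETE_TEXT : COMPLETE_LIST;
}
```

With the hypothetical table above, typing "sy" matches symbol and symtab and
yields the shared text "m"; typing "t" matches only trace and yields "race "
with a trailing space; typing "d" matches dis and dump, which share nothing
after the "d", so the caller would list both; typing "x" matches nothing.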
diff -Naur lkcdutils/lcrash/cmds/command.c lkcdutils+completion_func/lcrash/cmds/command.c --- lkcdutils/lcrash/cmds/command.c Mon Aug 27 18:41:16 2001 +++ lkcdutils+completion_func/lcrash/cmds/command.c Mon Oct 1 11:50:43 2001 @@ -156,45 +156,15 @@ } /* - * get_cmd() + * line_to_words() */ -static void -get_cmd(command_t *cmd) +void +line_to_words(command_t *cmd) { - int i = 0, len; - char *cp, *ecp, *empty=""; - char *rl_getline(void); - - static char* cmd_buf; - static size_t cmd_len; - - clean_cmd(cmd); - cmd->ofp = stdout; - cmd->efp = stderr; - - if (!ql_have_terminal) { - fprintf(stdout, "\n>> "); - fflush(stdout); - - if ((len = getline(&cmd_buf, &cmd_len, stdin)) <= 0) { - /* stdin closed */ - exit(0); - } - else { /* get rid off the newline character */ - cmd_buf[len - 1] = '\0'; - cmd->command = cmd_buf; - } - } - else { - /* if we hit EOT we take that as a empty command line ] - */ - if(!(cmd->command = rl_getline())) { - cmd->command = empty; - } - } + int i; + char *cp, *ecp; i = 0; - /* Get the command name. Make sure we strip off any * leading blank space */ @@ -207,7 +177,7 @@ cmd->args[0] = 0; cmd->nargs = 0; return; - } + } cp++; } *cp++ = 0; @@ -244,7 +214,6 @@ } ecp++; } - len = (uaddr_t)ecp - (uaddr_t)cp + 1; ecp++; } else { ecp = cp; @@ -257,7 +226,6 @@ } ecp++; } - len = (uaddr_t)ecp - (uaddr_t)cp; } cmd->args[i] = cp; cp = ecp; @@ -271,7 +239,7 @@ * return. 
*/ if (!(*cp)) { - if (i <= 128) { + if (i <= MAX_ARGS) { cmd->args[i] = 0; } cmd->nargs = i; @@ -287,6 +255,46 @@ } /* + * get_cmd() + */ +static void +get_cmd(command_t *cmd) +{ + int len; + char *empty=""; + char *rl_getline(void); + + static char* cmd_buf; + static size_t cmd_len; + + clean_cmd(cmd); + cmd->ofp = stdout; + cmd->efp = stderr; + + if (!ql_have_terminal) { + fprintf(stdout, "\n>> "); + fflush(stdout); + + if ((len = getline(&cmd_buf, &cmd_len, stdin)) <= 0) { + /* stdin closed */ + exit(0); + } + else { /* get rid off the newline character */ + cmd_buf[len - 1] = '\0'; + cmd->command = cmd_buf; + } + } + else { + /* if we hit EOT we take that as a empty command line ] + */ + if(!(cmd->command = rl_getline())) { + cmd->command = empty; + } + } + line_to_words(cmd); +} + +/* * parse_options() */ static int @@ -663,4 +671,159 @@ t[index++] = '\n'; t[index] = 0; return (t); +} + +/* + * complete_cmds() -- Devide the string from the head of command line to the + * position of TAB into words. + * Call the completion function for command names, if TAB + * is pressed on the first word. + * This function is registered to librl by + * rl_register_complete_func(). + * + * Call the completion function for command arguments, if + * TAB is pressed on the word after the second it. + * Note that it is not implemented. + */ +char * +complete_cmds(char *inputline, int tabpos) +{ + static command_t cmd; + static char cline[DEF_LENGTH]; + int help_list(command_t *); + + /* copy string from the head of command line to the position of TAB to + * buffer. (tabpos + 1) does not exceed DEF_LENGTH, so error check is + * not needed. 
+ */ + clean_cmd(&cmd); + strncpy(cline, inputline, tabpos + 1); + cline[tabpos] = '\0'; + cmd.command = cline; + + /* command line is devided into a command name and arguments */ + line_to_words(&cmd); + + /* TAB is pressed on the head of argument */ + if (*cmd.command && inputline[tabpos - 1] == ' ') { + cmd.nargs++; + } + + if (cmd.nargs == 0) { + /* TAB is pressed on the command name */ + if (!(*(cmd.command))) { + /* if TAB is pressed at the head of a command name, + * display the list of command names */ + cmd.ofp = stdout; + fprintf(stdout, "\n"); + help_list(&cmd); /* display the list of command names */ + return(DRAW_NEW_ENTIRE_LINE); + } else { + /* if TAB is pressed in the middle of a command name, + call completion function for a command name */ + return(complete_subcmd_name(cmd.command)); + } + } else { + /* TAB is pressed on the command arguments */ + /* call completion function for command arguments */ + /* -- not implemented -- */ + return(PRINT_BEEP); + } +} + +/* + * complete_subcmd_name() -- This function completes command names. + * When there is no candidate, return PRINT_BEEP. + * When there is a candidate, return the string. + * When there are two or more candidates, return the + * identical part of string of them. + * When there isn't the identical part of string, + * display the list of candidates and return + * DRAW_NEW_ENTIRE_LINE. 
+ */ +#define SV_CNM(i) save_crp[i]->cmd_name +char * +complete_subcmd_name(char *string) +{ + int slen, i, j, index; + static int cmdnum = 0; + static cmd_rec_t **save_crp = (cmd_rec_t **)0; + static char cptstr[DEF_LENGTH]; + cmd_rec_t *ncrp; + int candidates_cnt = 0; + + /* get number of commands */ + if (!cmdnum) { + if ((ncrp = first_cmd_rec())) { + do { + cmdnum++; + } while ((ncrp = next_cmd_rec(ncrp))); + } + } + /* allocate pointer array */ + if (!save_crp) { + save_crp = kl_alloc_block(cmdnum * sizeof(cmd_rec_t *), K_PERM); + if (klib_error) { + fprintf(KL_ERRORFP, "Could not allocate memory for completion\n"); + return(PRINT_BEEP); + } + } + + slen = strlen(string); + cptstr[0] = '\0'; + + /* find candidates */ + ncrp = first_cmd_rec(); + for (i = 0; i < cmdnum; i++) { + if (!strncmp(ncrp->cmd_name, string, slen)) { + save_crp[candidates_cnt] = ncrp; + /* get a string to complete */ + if (candidates_cnt == 0) { + strcpy(cptstr, SV_CNM(candidates_cnt)+slen); + } else if (cptstr[0] != '\0') { + /* if there are two or more candidates, get the identical part + of string of them */ + for (j = 0; cptstr[j] != '\0' && + *(SV_CNM(candidates_cnt)+slen+j) != '\0' && + cptstr[j] == *(SV_CNM(candidates_cnt)+slen+j); j++); + cptstr[j] = '\0'; + } + candidates_cnt++; + } + ncrp = next_cmd_rec(ncrp); + } + + if (candidates_cnt == 0) { + /* there is no candidate */ + return(PRINT_BEEP); + } else if (candidates_cnt == 1) { + /* there is a candidate */ + strcat(cptstr, " "); + return(cptstr); + } else { + /* there are two or more candidates */ + if (cptstr[0] == '\0') { /* there is no the identical part of string */ + goto print_list; + } else { /* there is the identical part of string */ + return(cptstr); + } + } +print_list: + fprintf(stdout, "\n"); + index = candidates_cnt / 4 + (candidates_cnt % 4 ? 
1 : 0); + for (i = 0; i < index; i++) { + fprintf(stdout, "%-17s", SV_CNM(i)); + if ((j = index + i) < candidates_cnt) { + fprintf(stdout, "%-17s", SV_CNM(j)); + } + if ((j = index * 2 + i) < candidates_cnt) { + fprintf(stdout, "%-17s", SV_CNM(j)); + } + if ((j = index * 3 + i) < candidates_cnt) { + fprintf(stdout, "%-17s", SV_CNM(j)); + } + fprintf(stdout, "\n"); + } + fflush(stdout); + return(DRAW_NEW_ENTIRE_LINE); } diff -Naur lkcdutils/lcrash/commondefs lkcdutils+completion_func/lcrash/commondefs --- lkcdutils/lcrash/commondefs Mon Aug 27 18:41:15 2001 +++ lkcdutils+completion_func/lcrash/commondefs Mon Oct 1 11:49:30 2001 @@ -15,7 +15,7 @@ HPATH = $(DEPTH)/include LKCDDIR = $(DEPTH)/.. HEADERS = $(HPATH)/command.h $(HPATH)/lcrash.h $(HPATH)/arch/trace.h -EXTRA_CFLAGS += -I$(HPATH) -I$(LKCDDIR)/libklib/include -I$(TOPDIR)/include \ - -I$(LKCDDIR)/libsial +EXTRA_CFLAGS += -I$(HPATH) -I$(LKCDDIR)/librl -I$(LKCDDIR)/libklib/include \ + -I$(TOPDIR)/include -I$(LKCDDIR)/libsial include $(LKCDDIR)/Rules.make diff -Naur lkcdutils/lcrash/include/command.h lkcdutils+completion_func/lcrash/include/command.h --- lkcdutils/lcrash/include/command.h Mon Aug 27 18:41:15 2001 +++ lkcdutils+completion_func/lcrash/include/command.h Mon Oct 1 11:52:05 2001 @@ -1,6 +1,8 @@ /* * Copyright 1999 Silicon Graphics, Inc. All rights reserved. 
*/ +#include + #define MAX_ARGS 128 #define MAX_CMDLINE 256 @@ -158,3 +160,5 @@ char *helpformat(char *); int process_cmds(void); int register_cmds(_command_t *); +char *complete_cmds(char *, int); +char *complete_subcmd_name(char *); diff -Naur lkcdutils/lcrash/main.c lkcdutils+completion_func/lcrash/main.c --- lkcdutils/lcrash/main.c Tue Sep 18 10:12:17 2001 +++ lkcdutils+completion_func/lcrash/main.c Mon Oct 1 11:49:37 2001 @@ -51,6 +51,7 @@ { int i; int rl_init(char *, int, int); + void rl_register_complete_func(rl_complete_func_t); program = argv[0]; ofp = stdout; @@ -255,6 +256,8 @@ if(!rl_init(">> ", 0, 0)) { exit(1); } + /* register sub-commands-line completion fuction */ + rl_register_complete_func(complete_cmds); } /* fire up sial interpreter and load macros */ diff -Naur lkcdutils/librl/rl.c lkcdutils+completion_func/librl/rl.c --- lkcdutils/librl/rl.c Mon Aug 27 18:41:16 2001 +++ lkcdutils+completion_func/librl/rl.c Mon Oct 1 11:47:15 2001 @@ -4,7 +4,8 @@ /* Leaner command input module. - Support basic command line editing and history mecanism. + Support basic command line editing, history mechanism and completion + mechanism. History mechanism supported: @@ -40,8 +41,14 @@ ^R : redraw input line ESC-f : forward one word ESC-b : backward one word - ESC-d : delete next work + ESC-d : delete next word ESC-DEL : delete previous word + + + Completion mechanism supported: + + When TAB is pressed while having inputted line, call the function + registered beforehand to complete line. 
*/ #include #include @@ -71,6 +78,7 @@ static char *kwf="\033d"; static char *fw="\033f"; static char *bw="\033b"; +static rl_complete_func_t rl_complete_func; /* function for completing line */ /* setup terminal characteristics and allocate initial stuff @@ -194,6 +202,7 @@ #define KILL_WORD_FORWARD 1016 #define WORD_BACKWARD 1017 #define WORD_FORWARD 1018 +#define COMPLETE_LINE 1019 /* for completing line */ #define NCTRL 16 static int ctrls[NCTRL][2]= @@ -238,6 +247,10 @@ int i; int found=0; + if (c == '\t') { + /* completing line */ + return COMPLETE_LINE; + } /* check the control characters */ for(i=0;i 0) { + /* insert string that returned before position of cursor */ + retstr_len = strlen(ret); + /* resume cursor position */ + curleft(maxpos - save_curpos); + if (maxpos+retstr_len>maxl) { + /* if we exceed maximum command length, bip */ + buz(); + } else { + /* insert string that returned */ + memmove(buf+curpos+retstr_len, buf+curpos, maxl-curpos-retstr_len); + strncpy(buf+curpos, ret, retstr_len); + maxpos+=retstr_len; + showbuf(1); + curright(retstr_len); + } + } + } + } + break; + default: { if(maxpos==maxl) buz(); @@ -574,6 +636,13 @@ return buf; } } +} + +/* register function which complete line */ +void +rl_register_complete_func(rl_complete_func_t complete_func) +{ + rl_complete_func = complete_func; } diff -Naur lkcdutils/librl/rl.h lkcdutils+completion_func/librl/rl.h --- lkcdutils/librl/rl.h Mon Aug 27 18:41:16 2001 +++ lkcdutils+completion_func/librl/rl.h Mon Oct 1 11:47:18 2001 @@ -9,5 +9,12 @@ char *hist_cmd(char *); char *hist_getcmd(int); int hist_init(int, int); -char *rl_getline(); +char *rl_getline(void); int rl_init(char*, int, int); + +#define PRINT_BEEP (char *)-1 +#define DRAW_NEW_ENTIRE_LINE (char *)0 + +typedef char *(*rl_complete_func_t)(char *, int); + +void rl_register_complete_func(rl_complete_func_t); diff -Naur lkcdutils/lkcd_config/commondefs lkcdutils+completion_func/lkcd_config/commondefs --- lkcdutils/lkcd_config/commondefs 
Thu Sep 6 13:41:17 2001 +++ lkcdutils+completion_func/lkcd_config/commondefs Mon Oct 1 14:08:50 2001 @@ -16,7 +16,7 @@ HPATH = $(DEPTH)/../lcrash/include LKCDDIR = $(DEPTH)/.. HEADERS = $(HPATH)/command.h $(HPATH)/lcrash.h $(HPATH)/arch/trace.h -EXTRA_CFLAGS += -I$(HPATH) -I$(LKCDDIR)/libklib/include \ +EXTRA_CFLAGS += -I$(HPATH) -I$(LKCDDIR)/librl -I$(LKCDDIR)/libklib/include \ -I$(TOPDIR)/include -I$(LKCDDIR)/libsial diff -Naur lkcdutils/lkcd_ksyms/commondefs lkcdutils+completion_func/lkcd_ksyms/commondefs --- lkcdutils/lkcd_ksyms/commondefs Thu Sep 6 13:41:21 2001 +++ lkcdutils+completion_func/lkcd_ksyms/commondefs Mon Oct 1 14:09:05 2001 @@ -15,7 +15,7 @@ HPATH = $(DEPTH)/../lcrash/include LKCDDIR = $(DEPTH)/.. HEADERS = $(HPATH)/command.h $(HPATH)/lcrash.h $(HPATH)/arch/trace.h -EXTRA_CFLAGS += -I$(HPATH) -I$(LKCDDIR)/libklib/include \ +EXTRA_CFLAGS += -I$(HPATH) -I$(LKCDDIR)/librl -I$(LKCDDIR)/libklib/include \ -I$(TOPDIR)/include -I$(LKCDDIR)/libsial Regards, Naomi Haseo From owner-lkcd@oss.sgi.com Wed Oct 10 02:28:42 2001 Received: (from majordomo@localhost) by oss.sgi.com (8.11.2/8.11.3) id f9A9Sg612867 for lkcd-outgoing; Wed, 10 Oct 2001 02:28:42 -0700 Received: from fgwmail5.fujitsu.co.jp (fgwmail5.fujitsu.co.jp [192.51.44.35]) by oss.sgi.com (8.11.2/8.11.3) with SMTP id f9A9SYD12863 for ; Wed, 10 Oct 2001 02:28:35 -0700 Received: from m3.gw.fujitsu.co.jp by fgwmail5.fujitsu.co.jp (8.9.3/3.7W-MX0109-Fujitsu Gateway) id SAA16522; Wed, 10 Oct 2001 18:28:23 +0900 (JST) (envelope-from m-kotani@pst.fujitsu.com) Received: from classic.aoi.pst.fujitsu.com by m3.gw.fujitsu.co.jp (8.9.3/3.7W-0110-Fujitsu Domain Master) id SAA11045; Wed, 10 Oct 2001 18:28:21 +0900 (JST) (envelope-from m-kotani@pst.fujitsu.com) Received: from doll (doll.aoi.pst.fujitsu.com [172.23.72.214]) by classic.aoi.pst.fujitsu.com (8.9.3/8.9.3) with SMTP id SAA29788; Wed, 10 Oct 2001 18:28:15 +0900 Message-ID: <007d01c1516d$cb03e600$d64817ac@aoi.pst.fujitsu.com> From: "Masashige Kotani" To: 
"Matt D. Robinson" Cc: Subject: Dump method Date: Wed, 10 Oct 2001 18:26:03 +0900 MIME-Version: 1.0 Content-Type: text/plain; charset="iso-2022-jp" Content-Transfer-Encoding: 7bit X-Priority: 3 X-MSMail-Priority: Normal X-Mailer: Microsoft Outlook Express 5.50.4522.1200 X-MimeOLE: Produced By Microsoft MimeOLE V5.50.4522.1200 Sender: owner-lkcd@oss.sgi.com Precedence: bulk Hi, Matt. I'd like to know more about dump methods. Could you give me further information about below 3 points ? Because we are planning to support of multiple dump devices as shown in former mail (Message-ID:<006201c13518$869ce140$d64817ac@aoi.pst.fujitsu.com>), I want to consider the program composition being conscious of module structure of dumping method. Point 1: relation of dump devices and dump methods from Message-ID: <3B9563E8.9A432B7B@alacritech.com>: > Here's where I'm going with this. I just finished the code to allow > people to install their own dump compression mechanisms (right now, it'll > be RLE, I have to check in the GZIP compression module, and people can > put in whatever one they want). Do you want to take the next step and > let people have chains of dump mechanisms based on the dump condition? > I realize multiple dump devices is good, but what if you could plug in > your own dump method with it? Then that dump method could query the > available dump devices configured. > > So you'd have: > > dump methods (one standard, but plug-and-play) > dump devices (requires at least one, multiples allowed, maybe > access lists for methods?) > dump compressions (configurable, usable by some methods) > > Would this be the eventual goal? That way, everything is tunable to > their own liking. I figured I'd ask, since if you're going to add in > multiple dump devices, and we've gone to multiple compression types, > you might as well go all the way and add dump methods as well. I > don't know what the rest of the group thinks, but this could be > very useful. 
I'm not sure what "maybe access lists for methods?" means. Please describe it
further. In particular, I want to know the relation between "dump devices" and
"dump methods".

Does each dump method register dump devices? For example:
    methodA : /dev/sda5 & /dev/sda6 & /dev/sda7
    methodB : /dev/sda8 & /dev/sda9

Or does each dump device register a dump method? For example:
    /dev/sda5 : methodA
    /dev/sda6 : methodA
    /dev/sda7 : methodA
    /dev/sda8 : methodB
    /dev/sda9 : methodB

Or do all dump methods share all dump devices? For example:
    shared dump devices : /dev/sda5 & /dev/sda6 & /dev/sda7
    methodA : from shared dump devices
    methodB : from shared dump devices

Or some other way?

Point 2: Saving to dump file

How about lcrash? If you make your own dump method module, do you have to
build the function dealing with that dump method into lcrash? Or should lcrash
have the ability to add functions, the same as a kernel module?

Point 3: How is it going?

Do you have any plans to complete this facility?

Regards,
Masashige

From owner-lkcd@oss.sgi.com Wed Oct 10 05:50:48 2001 Received: (from majordomo@localhost) by oss.sgi.com (8.11.2/8.11.3) id f9ACom317946 for lkcd-outgoing; Wed, 10 Oct 2001 05:50:48 -0700 Received: from XCHANGESERVER.storigen.com ([65.193.106.66]) by oss.sgi.com (8.11.2/8.11.3) with SMTP id f9ACokD17943 for ; Wed, 10 Oct 2001 05:50:46 -0700 X-MimeOLE: Produced By Microsoft Exchange V6.0.4712.0 content-class: urn:content-classes:message Subject: LKCD problems on SMP configurations? MIME-Version: 1.0 Content-Type: text/plain; charset="iso-8859-1" Date: Wed, 10 Oct 2001 08:50:44 -0400 Message-ID: <88D2015B3AF7BF4B91272EC25A9FE09769F308@XCHANGESERVER.storigen.com> X-MS-Has-Attach: X-MS-TNEF-Correlator: Thread-Topic: LKCD problems on SMP configurations?
Thread-Index: AcFRiiz3RQo6RVKbTwepPUCuLq9kCA== From: "Tony Dziedzic" To: Content-Transfer-Encoding: 8bit X-MIME-Autoconverted: from quoted-printable to 8bit by oss.sgi.com id f9ACokD17944 Sender: owner-lkcd@oss.sgi.com Precedence: bulk I've integrated the latest LKCD code from SourceForge into our 2.4.4 kernel sources and have noticed that dumping on SMP systems isn't very reliable. The test that I've been using is the Alt-SysRq-C magic key sequence to generate a "sysrq" panic. The symptom that I see is that the system hangs after printing the "Writing dump header ..." message. Is anyone aware of pending issues on SMP systems? I've found that if I comment out the __cli(); disable_local_APIC(); __sti(); sequence in smp_send_stop the hangs do not occur and I can reproducibly generate a crash dump. Does this ring any bells? FWIW, the system that I'm testing on uses a Tyan S2510 motherboard (dual CPU). Thanks, Tony Dziedzic Storigen Systems, Inc. From owner-lkcd@oss.sgi.com Wed Oct 10 07:45:16 2001 Received: (from majordomo@localhost) by oss.sgi.com (8.11.2/8.11.3) id f9AEjGM21051 for lkcd-outgoing; Wed, 10 Oct 2001 07:45:16 -0700 Received: from ausmtp01.au.ibm.com (ausmtp01.au.ibm.COM [202.135.136.97]) by oss.sgi.com (8.11.2/8.11.3) with SMTP id f9AEjAD21047 for ; Wed, 10 Oct 2001 07:45:10 -0700 Received: from f02n15e.au.ibm.com by ausmtp01.au.ibm.com (IBM AP 2.0) with ESMTP id f9AEeXH56302 for ; Thu, 11 Oct 2001 00:40:33 +1000 Received: from d73mta01.au.ibm.com (f06n01s [9.185.166.65]) by f02n15e.au.ibm.com (8.11.1m3/NCO v4.97.1) with SMTP id f9AEhuL47400 for ; Thu, 11 Oct 2001 00:43:56 +1000 Received: by d73mta01.au.ibm.com(Lotus SMTP MTA v4.6.5 (863.2 5-20-1999)) id CA256AE1.0051056E ; Thu, 11 Oct 2001 00:44:58 +1000 X-Lotus-FromDomain: IBMIN@IBMAU From: bsuparna@in.ibm.com To: "Tony Dziedzic" cc: lkcd@oss.sgi.com Message-ID: Date: Wed, 10 Oct 2001 20:07:44 +0530 Subject: Re: LKCD problems on SMP configurations? 
Mime-Version: 1.0 Content-type: text/plain; charset=us-ascii Content-Disposition: inline Sender: owner-lkcd@oss.sgi.com Precedence: bulk

Tony,

Did you try the latest code from CVS? We no longer call smp_send_stop() as
part of dump now ...

But we've faced other problems with dumping from interrupt context, which
could be encountered with the Alt-SysRq-c trigger, depending on the dump
device type, or rather the driver involved. What device are you dumping to?

I did hack things a little to work around some of the problems to get
Alt+SysRq+c dumping working in our test setup, but it's probably not quite the
right way to do this. In the long run - a dump device interface or second
kernel soft boot approach in its absence for panic style dumps and maybe the
deferred dump option for non-disruptive dumps are possibilities being looked
at.

(After all, you do seem to be able to dump after making the changes you
mention.)

Regards
Suparna

Suparna Bhattacharya
IBM Software Lab, India
E-mail : bsuparna@in.ibm.com
Phone : 91-80-5267117, Extn : 3961

"Tony Dziedzic" on 10/10/2001 06:20:44 PM
Please respond to "Tony Dziedzic"
To: lkcd@oss.sgi.com
cc: (bcc: Suparna Bhattacharya/India/IBM)
Subject: LKCD problems on SMP configurations?

I've integrated the latest LKCD code from SourceForge into our 2.4.4 kernel
sources and have noticed that dumping on SMP systems isn't very reliable. The
test that I've been using is the Alt-SysRq-C magic key sequence to generate a
"sysrq" panic. The symptom that I see is that the system hangs after printing
the "Writing dump header ..." message.

Is anyone aware of pending issues on SMP systems? I've found that if I comment
out the __cli(); disable_local_APIC(); __sti(); sequence in smp_send_stop the
hangs do not occur and I can reproducibly generate a crash dump. Does this
ring any bells?

FWIW, the system that I'm testing on uses a Tyan S2510 motherboard (dual CPU).
Thanks, Tony Dziedzic Storigen Systems, Inc. From owner-lkcd@oss.sgi.com Wed Oct 10 08:50:49 2001 Received: (from majordomo@localhost) by oss.sgi.com (8.11.2/8.11.3) id f9AFon623334 for lkcd-outgoing; Wed, 10 Oct 2001 08:50:49 -0700 Received: from XCHANGESERVER.storigen.com ([65.193.106.66]) by oss.sgi.com (8.11.2/8.11.3) with SMTP id f9AFogD23315 for ; Wed, 10 Oct 2001 08:50:42 -0700 X-MimeOLE: Produced By Microsoft Exchange V6.0.4712.0 content-class: urn:content-classes:message Subject: RE: LKCD problems on SMP configurations? MIME-Version: 1.0 Content-Type: text/plain; charset="iso-8859-1" Date: Wed, 10 Oct 2001 11:50:35 -0400 Message-ID: <88D2015B3AF7BF4B91272EC25A9FE097691DC5@XCHANGESERVER.storigen.com> X-MS-Has-Attach: X-MS-TNEF-Correlator: Thread-Topic: LKCD problems on SMP configurations? Thread-Index: AcFRmqpCJzGsPnobT2SHEAqWpcE3KgACF7QQ From: "Tony Dziedzic" To: Cc: Content-Transfer-Encoding: 8bit X-MIME-Autoconverted: from quoted-printable to 8bit by oss.sgi.com id f9AFogD23317 Sender: owner-lkcd@oss.sgi.com Precedence: bulk Suparna - Thanks for the note about not calling smp_send_stop. While I did have the latest code from CVS, it turned out that I had a stale patch from a previous LKCD left in my copy of panic.c - which was calling smp_send_stop. Tony -----Original Message----- From: bsuparna@in.ibm.com [mailto:bsuparna@in.ibm.com] Sent: Wednesday, October 10, 2001 10:38 AM To: Tony Dziedzic Cc: lkcd@oss.sgi.com Subject: Re: LKCD problems on SMP configurations? Tony, Did you try the latest code from CVS ? We no longer call smp_send_stop() as part of dump now ... But we've faced other problems with dumping from interrupt context, which could be encountered Alt-SysRq-c trigger, depending on the dump device type, or rather the driver involved. What device are you dumping to ? I did hack things a little to work around some of the problems to get Alt+SysRq+c dumping working in our test setup, but its probably not quite the right way to do this. 
In the long run - a dump device interface or second kernel soft boot approach
in its absence for panic style dumps and maybe the deferred dump option for
non-disruptive dumps are possibilities being looked at.

(After all, you do seem to be able to dump after making the changes you
mention.)

Regards
Suparna

Suparna Bhattacharya
IBM Software Lab, India
E-mail : bsuparna@in.ibm.com
Phone : 91-80-5267117, Extn : 3961

"Tony Dziedzic" on 10/10/2001 06:20:44 PM
Please respond to "Tony Dziedzic"
To: lkcd@oss.sgi.com
cc: (bcc: Suparna Bhattacharya/India/IBM)
Subject: LKCD problems on SMP configurations?

I've integrated the latest LKCD code from SourceForge into our 2.4.4 kernel
sources and have noticed that dumping on SMP systems isn't very reliable. The
test that I've been using is the Alt-SysRq-C magic key sequence to generate a
"sysrq" panic. The symptom that I see is that the system hangs after printing
the "Writing dump header ..." message.

Is anyone aware of pending issues on SMP systems? I've found that if I comment
out the __cli(); disable_local_APIC(); __sti(); sequence in smp_send_stop the
hangs do not occur and I can reproducibly generate a crash dump. Does this
ring any bells?

FWIW, the system that I'm testing on uses a Tyan S2510 motherboard (dual CPU).

Thanks,
Tony Dziedzic
Storigen Systems, Inc.
From owner-lkcd@oss.sgi.com Fri Oct 12 10:49:27 2001 Received: (from majordomo@localhost) by oss.sgi.com (8.11.2/8.11.3) id f9CHnRc26571 for lkcd-outgoing; Fri, 12 Oct 2001 10:49:27 -0700 Received: from smtp.alacritech.com (smtp.alacritech.com [209.10.208.82]) by oss.sgi.com (8.11.2/8.11.3) with SMTP id f9CHnKD26568 for ; Fri, 12 Oct 2001 10:49:20 -0700 Received: from alacritech.com (lambda.alacritech.com [10.1.1.32]) by smtp.alacritech.com (8.11.2/8.11.2) with ESMTP id f9CHmd206482; Fri, 12 Oct 2001 10:48:39 -0700 Message-ID: <3BC72E37.BF7F6A98@alacritech.com> Date: Fri, 12 Oct 2001 10:53:59 -0700 From: "Matt D. Robinson" Organization: Alacritech, Inc. X-Mailer: Mozilla 4.76 [en] (X11; U; Linux 2.4.2-2smp i686) X-Accept-Language: en MIME-Version: 1.0 To: naomi@pst.fujitsu.com, lkcd@oss.sgi.com Subject: Re: lcrash sub-commands line completion References: <20010904162753R.naomi@pst.fujitsu.com> <20011010100417W.naomi@pst.fujitsu.com> Content-Type: text/plain; charset=us-ascii Content-Transfer-Encoding: 7bit Sender: owner-lkcd@oss.sgi.com Precedence: bulk Hello, Naomi-san. I will finish review of this patch and add it into the code as soon as I can (probably Sunday night given my current set of work). --Matt naomi@pst.fujitsu.com wrote: > > > Hello. > > Recently, I think that lcrash should have "sub-commands line completion". > > > > Lcrash has many sub-commands. And almost sub-commands have parameters such as > > filename or symbol name which should be specified. > > The present lcrash cannot complete on sub-commands line. > > For this reason, we have to memorize sub-commands names and parameters exactly. > > It is very inconvenient. > > So I'll add completion capability to librl. > > > > I'm considering as follows. > > While editing sub-commands line, if TAB key is pressed, lcrash completes the > > line (or do something as bash does). > > Lcrash will complete on sub-commands names with behavior almost equivalent to > > bash. 
> > And I consider that parameters of sub-commands have different characteristic > > each other, I'll add the mechanism let you be able to make your own completion > > function. Using this mechanism, you can call the function that behaves as you > > want when TAB key is pressed. > > Hi, all. > > I implemented the lcrash completion mechanism(first phase) that I mentioned > in the above mail. > > Here is the description of the mechanism. > > While editing sub-commands line, if TAB key is pressed, lcrash completes the > line. > - When TAB key is pressed at the head of line, lcrash prints the list of > sub-commands names. > - When TAB key is pressed in the middle of the first word of line, lcrash > completes sub-commands names. > * When there is no candidate, prints beep character. > * When there is a candidate, prints the string of it. > * When there are two or more candidates, > + When there is the identical part of string of them, prints > the string. > + When there isn't the identical part of string of them, > prints the list of them. > - When TAB key is pressed on the word after the second it of line, lcrash > completes sub-commands arguments. > Note) This function is not implemented, now lcrash prints beep character. > > Given below is patch against the files taken from sourceforge cvs of lkcd. > Any comments and suggestions are welcomed. From owner-lkcd@oss.sgi.com Mon Oct 15 00:08:06 2001 Received: (from majordomo@localhost) by oss.sgi.com (8.11.2/8.11.3) id f9F786H29776 for lkcd-outgoing; Mon, 15 Oct 2001 00:08:06 -0700 Received: from nakedeye.aparity.com (w032.z064001165.sjc-ca.dsl.cnc.net [64.1.165.32]) by oss.sgi.com (8.11.2/8.11.3) with SMTP id f9F781D29773 for ; Mon, 15 Oct 2001 00:08:04 -0700 Received: from localhost (yakker@localhost) by nakedeye.aparity.com (8.11.2/8.11.2) with ESMTP id f9F7CDo11431; Mon, 15 Oct 2001 00:12:14 -0700 Date: Mon, 15 Oct 2001 00:12:13 -0700 (PDT) From: "Matt D. Robinson" To: "Matt D. 
Robinson" cc: , Subject: Re: lcrash sub-commands line completion In-Reply-To: <3BC72E37.BF7F6A98@alacritech.com> Message-ID: MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII Sender: owner-lkcd@oss.sgi.com Precedence: bulk On Fri, 12 Oct 2001, Matt D. Robinson wrote: |>Hello, Naomi-san. I will finish review of this patch and add it into the |>code as soon as I can (probably Sunday night given my current set of |>work). |> |>--Matt I've reviewed the patch, I didn't see anything obvious while walking through it, so I've added it to the tree. It's now checked in and available (works pretty well, too, I might add). Thank you, Naomi-san. :) --Matt From owner-lkcd@oss.sgi.com Tue Oct 16 07:20:29 2001 Received: (from majordomo@localhost) by oss.sgi.com (8.11.2/8.11.3) id f9GEKTr32622 for lkcd-outgoing; Tue, 16 Oct 2001 07:20:29 -0700 Received: from XCHANGESERVER.storigen.com ([65.193.106.66]) by oss.sgi.com (8.11.2/8.11.3) with SMTP id f9GEKRD32619 for ; Tue, 16 Oct 2001 07:20:27 -0700 content-class: urn:content-classes:message Subject: kl_dump_erase commented out in kl_util.c? MIME-Version: 1.0 Content-Type: text/plain; charset="iso-8859-1" Date: Tue, 16 Oct 2001 10:20:25 -0400 X-MimeOLE: Produced By Microsoft Exchange V6.0.4712.0 Message-ID: <88D2015B3AF7BF4B91272EC25A9FE09769F309@XCHANGESERVER.storigen.com> X-MS-Has-Attach: X-MS-TNEF-Correlator: Thread-Topic: kl_dump_erase commented out in kl_util.c? Thread-Index: AcFWTbM4J8xeNUusSMe8ciL8pCQIsw== From: "Tony Dziedzic" To: Content-Transfer-Encoding: 8bit X-MIME-Autoconverted: from quoted-printable to 8bit by oss.sgi.com id f9GEKSD32620 Sender: owner-lkcd@oss.sgi.com Precedence: bulk My test system is attempting to copy a "stale" crash dump each time I reboot the system normally after having previously forced a crash dump. 
The call to kl_dump_erase in kl_util.c is commented out (in an #if 0 conditional), which means if the swapper doesn't overwrite the dump header lcrash will think it needs to copy the crash dump again the next time it runs. This appears to be an artifact from testing that should be removed for the released code. I removed the conditionals and lcrash works as expected. FYI, Tony Dziedzic Storigen Systems, Inc. From owner-lkcd@oss.sgi.com Tue Oct 16 10:48:45 2001 Received: (from majordomo@localhost) by oss.sgi.com (8.11.2/8.11.3) id f9GHmjC04157 for lkcd-outgoing; Tue, 16 Oct 2001 10:48:45 -0700 Received: from smtp.alacritech.com (smtp.alacritech.com [209.10.208.82]) by oss.sgi.com (8.11.2/8.11.3) with SMTP id f9GHmgD04154 for ; Tue, 16 Oct 2001 10:48:42 -0700 Received: from alacritech.com (lambda.alacritech.com [10.1.1.32]) by smtp.alacritech.com (8.11.2/8.11.2) with ESMTP id f9GHle225482; Tue, 16 Oct 2001 10:47:40 -0700 Message-ID: <3BCC7421.CD8395BA@alacritech.com> Date: Tue, 16 Oct 2001 10:53:37 -0700 From: "Matt D. Robinson" Organization: Alacritech, Inc. X-Mailer: Mozilla 4.76 [en] (X11; U; Linux 2.4.2-2smp i686) X-Accept-Language: en MIME-Version: 1.0 To: Tony Dziedzic CC: lkcd@oss.sgi.com Subject: Re: kl_dump_erase commented out in kl_util.c? References: <88D2015B3AF7BF4B91272EC25A9FE09769F309@XCHANGESERVER.storigen.com> Content-Type: text/plain; charset=us-ascii Content-Transfer-Encoding: 7bit Sender: owner-lkcd@oss.sgi.com Precedence: bulk Tony Dziedzic wrote: > > My test system is attempting to copy a "stale" crash dump each time I > reboot the system normally after having previously forced a crash dump. > The call to kl_dump_erase in kl_util.c is commented out (in an #if 0 > conditional), which means if the swapper doesn't overwrite the dump > header lcrash will think it needs to copy the crash dump again the next > time it runs. > > This appears to be an artifact from testing that should be removed for > the released code. 
I removed the conditionals and lcrash works as > expected. > > FYI, > Tony Dziedzic > Storigen Systems, Inc. Fixed. Thanks, Tony. --Matt From owner-lkcd@oss.sgi.com Tue Oct 16 16:28:06 2001 Received: (from majordomo@localhost) by oss.sgi.com (8.11.2/8.11.3) id f9GNS6K11918 for lkcd-outgoing; Tue, 16 Oct 2001 16:28:06 -0700 Received: from antigonus.hosting.pacbell.net (antigonus.hosting.pacbell.net [216.100.98.13]) by oss.sgi.com (8.11.2/8.11.3) with SMTP id f9GNS4D11915 for ; Tue, 16 Oct 2001 16:28:04 -0700 Received: from KimPC (LE-WIZCOMMUNICATIONS-cust-2196412.cust-rtr.pacbell.net [63.199.86.102]) by antigonus.hosting.pacbell.net id TAA27539; Tue, 16 Oct 2001 19:28:03 -0400 (EDT) [ConcentricHost SMTP Relay 1.7] From: "Kim Le" To: Subject: New to lkcd - need help Date: Tue, 16 Oct 2001 16:26:57 -0700 Message-ID: MIME-Version: 1.0 Content-Type: text/plain; charset="iso-8859-1" Content-Transfer-Encoding: 7bit X-Priority: 3 (Normal) X-MSMail-Priority: Normal X-Mailer: Microsoft Outlook IMO, Build 9.0.2416 (9.0.2910.0) X-MimeOLE: Produced By Microsoft MimeOLE V6.00.2600.0000 Importance: Normal Sender: owner-lkcd@oss.sgi.com Precedence: bulk Hi All, I am new to lkcd and not sure how to use it. I hope anyone can help me to get started. I installed lkcd and able to collect crash dump in to /var/log/vmdump/vmdump.1. Along with that is analysis.1 crash.1. Now my question is how can I perform stack trace to know where my code is crash. I tried to run crash.1 then trace command but it keep complaining that no default task defined. Is there any quick usage guide for lkcd that I can look at ? Thanks in advance for any help. 
Kim From owner-lkcd@oss.sgi.com Wed Oct 17 02:44:33 2001 Received: (from majordomo@localhost) by oss.sgi.com (8.11.2/8.11.3) id f9H9iX022785 for lkcd-outgoing; Wed, 17 Oct 2001 02:44:33 -0700 Received: from nakedeye.aparity.com (w032.z064001165.sjc-ca.dsl.cnc.net [64.1.165.32]) by oss.sgi.com (8.11.2/8.11.3) with SMTP id f9H9iTD22782 for ; Wed, 17 Oct 2001 02:44:29 -0700 Received: from localhost (yakker@localhost) by nakedeye.aparity.com (8.11.2/8.11.2) with ESMTP id f9H9mr714546; Wed, 17 Oct 2001 02:48:53 -0700 Date: Wed, 17 Oct 2001 02:48:53 -0700 (PDT) From: "Matt D. Robinson" To: Kim Le cc: Subject: Re: New to lkcd - need help In-Reply-To: Message-ID: MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII Sender: owner-lkcd@oss.sgi.com Precedence: bulk A few questions: - How did the system crash? Does 'stat' show an exception, or a panic case, or ... ? - Based on that, can you 'deftask' to the failing process' address. Let's say that the failing process ID is 1538. From there, you can 'task | grep 1538'. The first address given (0xdefdb000, for example ... it will more than likely be different for your case) can be used as the argument to deftask. Then 't' will report the stack trace for the default task. So, in short, 'stat' to find the failing process/process ID, 'task' to show the current set of processes, using 'grep' to find the right task based on the process ID, failing CPU, etc., and 'deftask' to the base address of the task. Thanks, Kim. --Matt On Tue, 16 Oct 2001, Kim Le wrote: |>Hi All, |> |>I am new to lkcd and not sure how to use it. I hope anyone can help me to |>get started. |> |>I installed lkcd and able to collect crash dump in to |>/var/log/vmdump/vmdump.1. Along with that is analysis.1 crash.1. |> |>Now my question is how can I perform stack trace to know where my code is |>crash. I tried to run crash.1 then trace command but it keep complaining |>that no default task defined. 
|> |>Is there any quick usage guide for lkcd that I can look at ? |> |>Thanks in advance for any help. |> |>Kim |> From owner-lkcd@oss.sgi.com Wed Oct 17 02:49:54 2001 Received: (from majordomo@localhost) by oss.sgi.com (8.11.2/8.11.3) id f9H9nsp22890 for lkcd-outgoing; Wed, 17 Oct 2001 02:49:54 -0700 Received: from nakedeye.aparity.com (w032.z064001165.sjc-ca.dsl.cnc.net [64.1.165.32]) by oss.sgi.com (8.11.2/8.11.3) with SMTP id f9H9njD22887 for ; Wed, 17 Oct 2001 02:49:45 -0700 Received: from localhost (yakker@localhost) by nakedeye.aparity.com (8.11.2/8.11.2) with ESMTP id f9H9rx914557; Wed, 17 Oct 2001 02:53:59 -0700 Date: Wed, 17 Oct 2001 02:53:59 -0700 (PDT) From: "Matt D. Robinson" To: Masashige Kotani cc: "Matt D. Robinson" , Subject: Re: Dump method In-Reply-To: <007d01c1516d$cb03e600$d64817ac@aoi.pst.fujitsu.com> Message-ID: MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII Sender: owner-lkcd@oss.sgi.com Precedence: bulk On Wed, 10 Oct 2001, Masashige Kotani wrote: |>Hi, Matt. |> |>I'd like to know more about dump methods. |>Could you give me further information about below 3 points ? |> |>Because we are planning to support of multiple dump devices as shown in |>former mail |>(Message-ID:<006201c13518$869ce140$d64817ac@aoi.pst.fujitsu.com>), |>I want to consider the program composition being conscious of |>module structure of dumping method. |> |> |>Point 1: relation of dump devices and dump methods |> |>from Message-ID: <3B9563E8.9A432B7B@alacritech.com>: |> |>> Here's where I'm going with this. I just finished the code to allow |>> people to install their own dump compression mechanisms (right now, it'll |>> be RLE, I have to check in the GZIP compression module, and people can |>> put in whatever one they want). Do you want to take the next step and |>> let people have chains of dump mechanisms based on the dump condition? |>> I realize multiple dump devices is good, but what if you could plug in |>> your own dump method with it? 
Then that dump method could query the |>> available dump devices configured. |>> |>> So you'd have: |>> |>> dump methods (one standard, but plug-and-play) |>> dump devices (requires at least one, multiples allowed, maybe |>> access lists for methods?) |>> dump compressions (configurable, usable by some methods) |>> |>> Would this be the eventual goal? That way, everything is tunable to |>> their own liking. I figured I'd ask, since if you're going to add in |>> multiple dump devices, and we've gone to multiple compression types, |>> you might as well go all the way and add dump methods as well. I |>> don't know what the rest of the group thinks, but this could be |>> very useful. |> |>I'm not sure about "maybe access lists for methods?". |>Please give me further description. |> |>For example, I want to know the relation between "dump devices" and |>"dump methods". |> |>Does each dump method register dump devices ? |> For example |> methodA : /dev/sda5 & /dev/sda6 & /dev/sda7 |> methodB : /dev/sda8 & /dev/sda9 |> |>Or, Does each dump device register dump method ? |> For example |> /dev/sda5 : mothodA |> /dev/sda6 : mothodA |> /dev/sda7 : mothodA |> /dev/sda8 : mothodB |> /dev/sda9 : mothodB I was proposing this one. Dump devices (/dev/dump/dump1, for example) refers to a dump disk method, followed by either a dump device, network address, or whatever the storage location might be, and any flags, etc. /dev/dump becomes a directory, /dev/dump/dump[0-N] becomes the base dump device for performing open()s, ioctl()s, close()s, etc. |>Or, Does each dump method share all dump devices? |> For example |> shared dump devices : /dev/sda5 & /dev/sda6 & /dev/sda7 |> methodA : from shared dump devices |> methodB : from shared dump devices |> |>Or, the other way? |> |> |>Point 2: Saving to dump file |> |>How about lcrash? |>If you make own dump method module, you have to build the function |>dealing with the dump method in lcrash ? 
|>Or, should lcrash have the ability of adding function the same as |>kernel module ? All that would be added to the dump header or some identifier so that 'lcrash' can decipher the dump methodology. I think a universal dump header is a good thing to have. |>Point 3: How does it goes ? |> |>Do you have any plans to complete this facility ? This is for the 5.0 time frame. 4.0 is basically done. I'm all for getting this done in the 5.0 release. We have to finish the dump() device driver implementation for at least IDE first before we implement this. Thanks, Masashige-san. I hope this E-mail doesn't arrive too late for any planning needs you may have. --Matt |>Regards, |>Masashige From owner-lkcd@oss.sgi.com Wed Oct 17 08:09:18 2001 Received: (from majordomo@localhost) by oss.sgi.com (8.11.2/8.11.3) id f9HF9Id01482 for lkcd-outgoing; Wed, 17 Oct 2001 08:09:18 -0700 Received: from e4.ny.us.ibm.com (e4.ny.us.ibm.com [32.97.182.104]) by oss.sgi.com (8.11.2/8.11.3) with SMTP id f9HF91D01474 for ; Wed, 17 Oct 2001 08:09:01 -0700 Received: from northrelay03.pok.ibm.com (northrelay03.pok.ibm.com [9.117.200.23]) by e4.ny.us.ibm.com (8.9.3/8.9.3) with ESMTP id LAA29028; Wed, 17 Oct 2001 11:06:23 -0400 Received: from sparklet.in.ibm.com (sparklet.in.ibm.com [9.186.133.17]) by northrelay03.pok.ibm.com (8.11.1m3/NCO v4.98) with ESMTP id f9HF8kA23062; Wed, 17 Oct 2001 11:08:47 -0400 Received: (from suparna@localhost) by sparklet.in.ibm.com (8.11.0/8.11.0) id f9HKj6u07603; Wed, 17 Oct 2001 15:45:06 -0500 Date: Wed, 17 Oct 2001 15:45:04 -0500 From: Suparna Bhattacharya To: yakker@alacritech.com Cc: lkcd@oss.sgi.com Subject: Dump from interrupt context - some hacks to cover a few other cases Message-ID: <20011017154503.A4764@in.ibm.com> Reply-To: suparna@in.ibm.com Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline User-Agent: Mutt/1.2.5i Sender: owner-lkcd@oss.sgi.com Precedence: bulk Matt, Attached is a patch with some hacks related to dumping 
from interrupt context in the situation where the underlying device driver relies on bottom-halfs/softirqs for its i/o completion processing. We may not hit this case with all drivers. I discovered today that Alt+Sysrq+c dumping seemed to be working on our setup now even when I took out these patches, and spent a lot of time puzzling over why it did, till it turned out that we were using the old aic7xxx driver which doesn't appear to be depending on bottom-halfs for i/o completion. That may be the case with IDE too. The new aic7xxx driver seems to use scsi_done rather than scsi_old_done which does appears to use bottom-halfs. We haven't had a chance to try to test this again with the new aic7xxx driver to verify this today with 4.0 patch. But since I'll be on vacation from day-after (19th-29th), just decided to send this out to you, so you can look through it and see if this makes sense, and perhaps try this out in your setup and let us know what you think. The changes are fairly simple, as you can see: - Set the irq and bh counts to zero during dumping in addition to sti() complete the pretence that we aren't in interrupt context. Restore the original settings back after dump. (See comments in patch) - Avoid wakeups on the dumping thread happening during i/o completion. It is not necessary since we are never scheduled out, and it could result in unexpected side effects when we continue after dump (non-disruptive dump case) if dump i/o happens in the context of the idle thread (which could be the case for Alt+Sysrq+c interrupts). In order to achieve this with minimal intrusion, kiobuf_end_io has been modified to issue wakeups only if an end_io routine has not been specified. This would be in line with the way it happens with bh end_io, and appears to be an acceptable change in general (from what I could tell when I checked with a few people) I don't see any users of kiobuf->end_io in the kernel tree, but there could be some outside. If so, they would be impacted. 
So that's something to look out for. So dump_iobuf now uses a dummy end_io function, so that it can bypass the need for wakeup.
- Also changed dump_kernel_write() to use iobuf->blocks[] instead of allocating b[KIO_MAX_SECTORS] on stack, while at it.
- Call dump() with the correct regs from sysrq, instead of panic, so that we can support both disruptive and non-disruptive dumps correctly.

This is obviously only a temporary workaround, not a reliable solution for longer term - just until we have the dump driver interface or other solution and a deferred dumping scheme for non-disruptive dumps. But it could possibly be used for getting Alt+Sysrq+c dumps for now.

The patch is with respect to the latest lkcd4.0 tree that we tried out earlier today.

------------- PATCH HERE -------------------------------------
diff -ur linux-2.4.8+lkcd/drivers/char/sysrq.c suparna/drivers/char/sysrq.c
--- linux-2.4.8+lkcd/drivers/char/sysrq.c Wed Oct 17 16:05:56 2001
+++ suparna/drivers/char/sysrq.c Wed Oct 17 16:02:23 2001
@@ -93,7 +93,7 @@
 #if defined(CONFIG_DUMP) || defined(CONFIG_DUMP_MODULE)
         case 'c':
-                panic("sysrq");
+                dump("sysrq", pt_regs);
                 break;
 #endif
diff -ur linux-2.4.8+lkcd/drivers/dump/dump_base.c suparna/drivers/dump/dump_base.c
--- linux-2.4.8+lkcd/drivers/dump/dump_base.c Wed Oct 17 16:04:18 2001
+++ suparna/drivers/dump/dump_base.c Wed Oct 17 16:01:34 2001
@@ -252,6 +252,10 @@
 static char dpcpage[DUMP_DPC_PAGE_SIZE]; /* buffer used for compression */
 static unsigned long dump_save_flags; /* save_flags()/restore_flags() */
+/* for dumping from interrupt context (Fixme) */
+static int saved_irq_count; /* remember the current irq nesting level */
+static int saved_bh_count; /* remember if we were in soft irq context */
+
 /* used for dump compressors */
 static struct list_head dump_compress_list = LIST_HEAD_INIT(dump_compress_list);
@@ -383,7 +387,8 @@
 dump_kernel_write(void)
 {
         int err = 0, iosize, i;
-        unsigned long b[KIO_MAX_SECTORS], blocknr, blocks, limit;
+        unsigned long *b=dump_iobuf->blocks;
+        unsigned long blocknr, blocks, limit;
         /* check the device size to make sure we are in-line */
         if (blk_size[MAJOR(dump_device)]) {
@@ -565,6 +570,15 @@
         unsigned int stage = 0;
         int cpu = smp_processor_id();
+        if (in_interrupt()) {
+                printk("Dumping from interrupt handler !\n");
+                printk("Uncertain scenario - but will try my best\n");
+                /*
+                 * Must be an unrelated interrupt, not in the middle of io !
+                 * If we've panic'ed in the middle of io we should take
+                 * another approach
+                 */
+        }
         /* see if there's something to do before we re-enable interrupts */
         (void)__dump_silence_system(stage);
@@ -573,6 +587,26 @@
         dumping_cpu = cpu;
         dump_in_progress = TRUE;
         save_flags(dump_save_flags);
+
+        /* -------------------------------------------------- */
+        /* Kludge - dump from interrupt context is unreliable (Fixme)
+         *
+         * We do this so that softirqs initiated for dump i/o
+         * get processed and we don't hang while waiting for i/o
+         * to complete or in any irq synchronization attempt.
+         *
+         * This is not quite legal of course, as it has the side
+         * effect of making all interrupts & softirqs triggered
+         * while dump is in progress complete before currently
+         * pending softirqs and the currently executing interrupt
+         * code.
+         */
+        saved_irq_count = local_irq_count(cpu);
+        saved_bh_count = local_bh_count(cpu);
+        local_irq_count(cpu) = 0;
+        local_bh_count(cpu) = 0;
+        /* -----------------------------------------------------*/
+
         sti(); /* enable interrupts just in case ... */
         /* now increment the stage and do stuff after interrupts are enabled */
@@ -597,6 +631,8 @@
         /* restore flags and other dump state fields */
         restore_flags(dump_save_flags);
+        local_irq_count(dumping_cpu) = saved_irq_count;
+        local_bh_count(dumping_cpu) = saved_bh_count;
         dump_in_progress = FALSE;
         dump_okay = TRUE;
@@ -1003,6 +1039,11 @@
         return (-EAGAIN);
 }
+void dump_iobuf_end_io(struct kiobuf *iobuf)
+{
+        /* No wakeup needed since we've stopped scheduling */
+        return;
+}
 /*
  * Name: dump_open_kdev()
  * Func: Try to open the kdev_t argument as the real dump device.
@@ -1076,6 +1117,7 @@
         dump_iobuf->offset = 0;
         dump_iobuf->length = DUMP_PAGE_SZ;
         dump_iobuf->nr_pages = (DUMP_PAGE_SZ >> PAGE_SHIFT);
+        dump_iobuf->end_io = dump_iobuf_end_io;
         /* assign the new dump file structure */
         dump_device = tmp_dump_device;
diff -ur linux-2.4.8+lkcd/fs/iobuf.c suparna/fs/iobuf.c
--- linux-2.4.8+lkcd/fs/iobuf.c Wed Oct 17 16:05:03 2001
+++ suparna/fs/iobuf.c Wed Oct 17 16:01:49 2001
@@ -17,8 +17,10 @@
         if (atomic_dec_and_test(&kiobuf->io_count)) {
                 if (kiobuf->end_io)
+                        /* the end_io fn should take care of waiters too */
                         kiobuf->end_io(kiobuf);
-                wake_up(&kiobuf->wait_queue);
+                else
+                        wake_up(&kiobuf->wait_queue);
         }
 }
@@ -121,7 +123,6 @@
         iobuf->array_len = wanted;
         return 0;
 }
-
 void kiobuf_wait_for_io(struct kiobuf *kiobuf)
 {

From owner-lkcd@oss.sgi.com Thu Oct 18 13:12:52 2001
Received: (from majordomo@localhost) by oss.sgi.com (8.11.2/8.11.3) id f9IKCqh05072 for lkcd-outgoing; Thu, 18 Oct 2001 13:12:52 -0700
Received: from XCHANGESERVER.storigen.com ([65.193.106.66]) by oss.sgi.com (8.11.2/8.11.3) with SMTP id f9IKCoD05068 for ; Thu, 18 Oct 2001 13:12:50 -0700
content-class: urn:content-classes:message
Subject: Anyone actively working on gzip compression in LKCD?
MIME-Version: 1.0 Content-Type: text/plain; charset="iso-8859-1" Date: Thu, 18 Oct 2001 16:12:48 -0400 X-MimeOLE: Produced By Microsoft Exchange V6.0.4712.0 Message-ID: <88D2015B3AF7BF4B91272EC25A9FE097691DFD@XCHANGESERVER.storigen.com> X-MS-Has-Attach: X-MS-TNEF-Correlator: Thread-Topic: Anyone actively working on gzip compression in LKCD? Thread-Index: AcFYEUI+uDmsemznS2S3DMzbqcZIvg== From: "Tony Dziedzic" To: Content-Transfer-Encoding: 8bit X-MIME-Autoconverted: from quoted-printable to 8bit by oss.sgi.com id f9IKCoD05069 Sender: owner-lkcd@oss.sgi.com Precedence: bulk Is anyone actively working on getting gzip compression to work with LKCD (dump and lcrash)? Before I spend too much time looking at it I thought I'd check that I'm not duplicating anyone else's effort. I've fixed the bugs that I've found in the drivers/dump gzip code, and the resulting code manages about a 4:1 compression ratio on my test system. Of course, since I can't read the dump with lcrash (yet), it's a little hard to pin any validity on that number ... Thanks, Tony Dziedzic Storigen Systems, Inc. From owner-lkcd@oss.sgi.com Fri Oct 19 01:16:27 2001 Received: (from majordomo@localhost) by oss.sgi.com (8.11.2/8.11.3) id f9J8GRJ20462 for lkcd-outgoing; Fri, 19 Oct 2001 01:16:27 -0700 Received: from nakedeye.aparity.com (w032.z064001165.sjc-ca.dsl.cnc.net [64.1.165.32]) by oss.sgi.com (8.11.2/8.11.3) with SMTP id f9J8GND20459 for ; Fri, 19 Oct 2001 01:16:23 -0700 Received: from localhost (yakker@localhost) by nakedeye.aparity.com (8.11.2/8.11.2) with ESMTP id f9J8Khq02616 for ; Fri, 19 Oct 2001 01:20:43 -0700 Date: Fri, 19 Oct 2001 01:20:43 -0700 (PDT) From: "Matt D. 
Robinson" To: Subject: LKCD 4.0 Release Message-ID: MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII Sender: owner-lkcd@oss.sgi.com Precedence: bulk You can obtain LKCD version 4.0, the patch and the RPM, from http://lkcd.sourceforge.net/download/4.0/ This includes module support for dumping, a considerable number of changes to improve 'lcrash' as a whole, a move to /dev/dump mechanisms over /dev/vmdump, etc. If you have a previous version of LKCD on your system, please remove it before installing this version. 4.0 and beyond should require few changes to your system configuration. My thanks for the kind folks who have made contributions, additions, bug fixes, or comments to improve LKCD over the last few months. Now that there are a considerable number of people working on LKCD, we hope to have more frequent releases, with major improvements moving forward, until eventually LKCD becomes a default part of the Linux kernel. My special thanks to IBM, who have contributed a lot of time, energy, comments, code, people, and general good will towards LKCD and all of its open source community developers. I can't say enough about them -- it's been great having them work on LKCD. One other last point, we will be moving the mailing list for LKCD from 'lkcd@oss.sgi.com' to 'lkcd-general@lists.sourceforge.net' in the next few weeks. This is a heads-up that the mailing list is going to change slightly. We're migrating everything off of oss.sgi.com onto a more easily maintained server. --Matt P.S. 4.0.1 will probably be around the corner shortly, since there were a few last-minute bugs that people pointed out that, while annoying, didn't need to hold up 4.0. 
From owner-lkcd@oss.sgi.com Fri Oct 19 10:02:55 2001 Received: (from majordomo@localhost) by oss.sgi.com (8.11.2/8.11.3) id f9JH2tQ00975 for lkcd-outgoing; Fri, 19 Oct 2001 10:02:55 -0700 Received: from nakedeye.aparity.com (w032.z064001165.sjc-ca.dsl.cnc.net [64.1.165.32]) by oss.sgi.com (8.11.2/8.11.3) with SMTP id f9JH2qD00972 for ; Fri, 19 Oct 2001 10:02:52 -0700 Received: from localhost (yakker@localhost) by nakedeye.aparity.com (8.11.2/8.11.2) with ESMTP id f9JH7Gv03097; Fri, 19 Oct 2001 10:07:16 -0700 Date: Fri, 19 Oct 2001 10:07:16 -0700 (PDT) From: "Matt D. Robinson" To: Tony Dziedzic cc: Subject: Re: Anyone actively working on gzip compression in LKCD? In-Reply-To: <88D2015B3AF7BF4B91272EC25A9FE097691DFD@XCHANGESERVER.storigen.com> Message-ID: MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII Sender: owner-lkcd@oss.sgi.com Precedence: bulk On Thu, 18 Oct 2001, Tony Dziedzic wrote: |>Is anyone actively working on getting gzip compression to work with LKCD |>(dump and lcrash)? Before I spend too much time looking at it I thought |>I'd check that I'm not duplicating anyone else's effort. I currently am, but I'm waiting for some of this stuff with zlib in the kernel to finalize. Looks like you've taken my initial check-ins and moved on them. Cool. |>I've fixed the bugs that I've found in the drivers/dump gzip code, and |>the resulting code manages about a 4:1 compression ratio on my test |>system. Of course, since I can't read the dump with lcrash (yet), it's |>a little hard to pin any validity on that number ... Heh. :) How about this, do you want to do the kernel space, and I'll add the 'lcrash' support? I need to restructure 'lcrash' to handle multiple dump types anyway ... |>Thanks, |>Tony Dziedzic |>Storigen Systems, Inc. Thanks, Tony. 
--Matt From owner-lkcd@oss.sgi.com Fri Oct 19 11:34:05 2001 Received: (from majordomo@localhost) by oss.sgi.com (8.11.2/8.11.3) id f9JIY5903926 for lkcd-outgoing; Fri, 19 Oct 2001 11:34:05 -0700 Received: from XCHANGESERVER.storigen.com ([65.193.106.66]) by oss.sgi.com (8.11.2/8.11.3) with SMTP id f9JIXxD03921 for ; Fri, 19 Oct 2001 11:34:00 -0700 content-class: urn:content-classes:message Subject: RE: Anyone actively working on gzip compression in LKCD? MIME-Version: 1.0 Content-Type: text/plain; charset="iso-8859-1" Date: Fri, 19 Oct 2001 14:33:56 -0400 X-MimeOLE: Produced By Microsoft Exchange V6.0.4712.0 Message-ID: <88D2015B3AF7BF4B91272EC25A9FE09769F313@XCHANGESERVER.storigen.com> X-MS-Has-Attach: X-MS-TNEF-Correlator: Thread-Topic: Anyone actively working on gzip compression in LKCD? Thread-Index: AcFYv+M9XBUUriwMSgqsJU7UCBvvngACrvgA From: "Tony Dziedzic" To: "Matt D. Robinson" Cc: Content-Transfer-Encoding: 8bit X-MIME-Autoconverted: from quoted-printable to 8bit by oss.sgi.com id f9JIY0D03922 Sender: owner-lkcd@oss.sgi.com Precedence: bulk Hi Matt: That would be fine with me. I'll e-mail you in a separate reply a tarball containing my current drivers/dump directory that writes gzip-compressed dumps (based on LKCD 4 release). I'll also include a quick and dirty version of kl_cmp.c that can read gzip-compressed data. Until the zlib in kernel issue is resolved I simply grabbed a copy of the zlib code from drivers/net and dumped it in drivers/dump as zlib_dump.c. One caveat that Dave Anderson at MCLinux pointed out: dump_gzip will need to avoid calling the gz compression code when the page being written is the current stack page. Since the compression code uses (and modifies) stack variables, the data that is being compressed will be modified, resulting in a subsequent Z_DATA_ERROR when uncompressing that page. The nasty side-effect of this issue is that a backtrace of the current task fails. 
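The caveat above can be illustrated with a minimal sketch in C. It is not taken from the LKCD sources: every name in it (SKETCH_PAGE_SIZE, sketch_page_overlaps, sketch_choose_mode) is hypothetical, the 4096-byte page size is an assumption, and writing the affected page uncompressed is only one possible way to "avoid calling the gz compression code"; the thread does not say how dump_gzip will actually handle it. The sketch only shows the address check: a page that overlaps the stack range in use by the dumping code is kept away from the compressor.

```c
#include <assert.h>
#include <stdint.h>

#define SKETCH_PAGE_SIZE 4096u /* assumed page size (hypothetical name) */

enum sketch_mode {
    SKETCH_COMPRESS, /* page may be handed to the compressor */
    SKETCH_RAW       /* page must be written without compressing it */
};

/*
 * Return 1 if the page starting at page_base overlaps the half-open
 * address range [lo, hi), 0 otherwise.
 */
static int sketch_page_overlaps(uintptr_t page_base, uintptr_t lo, uintptr_t hi)
{
    assert(lo < hi);
    return page_base < hi && lo < page_base + SKETCH_PAGE_SIZE;
}

/*
 * Decide how to write one page. [stack_lo, stack_hi) is the stack range
 * the dumping code is running on; compressing a page inside that range
 * would modify the data while it is being read.
 */
static enum sketch_mode sketch_choose_mode(uintptr_t page_base,
                                           uintptr_t stack_lo,
                                           uintptr_t stack_hi)
{
    if (sketch_page_overlaps(page_base, stack_lo, stack_hi))
        return SKETCH_RAW;
    return SKETCH_COMPRESS;
}
```

A caller would pass the base address of the page about to be written together with the bounds of the current stack, and skip the compressor whenever SKETCH_RAW comes back.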
Other than that, my modified lcrash seemed happy with a gzip-compressed dump. I'll fix this issue within the next few days and get an updated dump_gzip.c out to you. Tony > -----Original Message----- > From: Matt D. Robinson [mailto:yakker@aparity.com] > Sent: Friday, October 19, 2001 1:07 PM > To: Tony Dziedzic > Cc: lkcd@oss.sgi.com > Subject: Re: Anyone actively working on gzip compression in LKCD? > > > > I currently am, but I'm waiting for some of this stuff with > zlib in the kernel to finalize. Looks like you've taken > my initial check-ins and moved on them. Cool. > > Heh. :) How about this, do you want to do the kernel space, and > I'll add the 'lcrash' support? I need to restructure 'lcrash' to > handle multiple dump types anyway ... > > |>Thanks, > |>Tony Dziedzic > |>Storigen Systems, Inc. > > Thanks, Tony. > > --Matt > > From owner-lkcd@oss.sgi.com Mon Oct 22 21:05:59 2001 Received: (from majordomo@localhost) by oss.sgi.com (8.11.2/8.11.3) id f9N45xo24522 for lkcd-outgoing; Mon, 22 Oct 2001 21:05:59 -0700 Received: from e31.bld.us.ibm.com (e31.co.us.ibm.com [32.97.110.129]) by oss.sgi.com (8.11.2/8.11.3) with SMTP id f9N45tD24519 for ; Mon, 22 Oct 2001 21:05:55 -0700 Received: from westrelay03.boulder.ibm.com (westrelay03.boulder.ibm.com [9.99.140.24]) by e31.bld.us.ibm.com (8.9.3/8.9.3) with ESMTP id AAA62404 for ; Tue, 23 Oct 2001 00:03:23 -0400 Received: from gateway.beaverton.ibm.com (gateway.beaverton.ibm.com [138.95.180.1]) by westrelay03.boulder.ibm.com (8.11.1m3/NCO v4.98) with ESMTP id f9N45qp60130 for ; Mon, 22 Oct 2001 22:05:53 -0600 Received: from crg8.beaverton.ibm.com (crg8.beaverton.ibm.com [138.95.19.9]) by gateway.beaverton.ibm.com (8.10.0.Beta10/8.10.0.Beta10) with ESMTP id f9N45nG03697; Mon, 22 Oct 2001 21:05:50 -0700 (PDT) Received: (from washer@localhost) by crg8.beaverton.ibm.com (8.10.0.Beta10/8.8.5/token.aware-1.2) id f9N45na05365; Mon, 22 Oct 2001 21:05:49 -0700 (PDT) From: James Washer Message-Id: 
 <200110230405.f9N45na05365@crg8.beaverton.ibm.com>
Subject: A couple of quick notes about lkcd4.0
To: lkcd@oss.sgi.com
Date: Mon, 22 Oct 2001 21:05:48 -0700 (PDT)
X-Mailer: ELM [version 2.5 PL2]
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Transfer-Encoding: 7bit
Sender: owner-lkcd@oss.sgi.com
Precedence: bulk

I hope these are not too small to bring to your attention.. but here's a
couple of things I noticed today, after downing lkcd 4

The README.lkcd refers several times to "/sbin/vmdump", however it seems
that in rev 4, this should now be /sbin/lkcd

Also, the webpage (http://lkcd.sourceforge.net/) has a link to
the "Mailing List" which does not work.

On a slightly more technical note, installing the kernel patch
(lkcd-2.4.8-4.0.diff) resulted in a broken arch/i386/config.in (at
least 'make xconfig' thought it was broken).

The following block was added near the end of the file:

tristate 'Linux Kernel Crash Dump (LKCD) Support' CONFIG_DUMP
if [ "$CONFIG_DUMP" = "y" ]; then
   dep_bool ' LKCD RLE compression' CONFIG_DUMP_COMPRESS_RLE $CONFIG_DUMP
elif [ "$CONFIG_DUMP" = "m" ]; then
   dep_tristate ' LKCD RLE compression' CONFIG_DUMP_COMPRESS_RLE $CONFIG_DUMP
fi

A quick check of Documentation/kbuild/config-language.txt shows no
"elif" (this is with a 2.4.8 tree). The workaround was trivial, given
that I wasn't intending to compile lkcd as a module.

Sorry to bring up such silly little things, but I suppose they should get fixed.

 - jim

p.s. If I were to have changes for lkcd in the future, should I submit those
as a 'patch' file, to this alias, or is there a different/preferred method.
--
James Washer
IBM Linux Change Team

From owner-lkcd@oss.sgi.com Mon Oct 22 23:05:09 2001
Received: (from majordomo@localhost) by oss.sgi.com (8.11.2/8.11.3) id f9N659K26533 for lkcd-outgoing; Mon, 22 Oct 2001 23:05:09 -0700
Received: from smtp.alacritech.com (smtp.alacritech.com [209.10.208.82]) by oss.sgi.com (8.11.2/8.11.3) with SMTP id f9N654D26530 for ; Mon, 22 Oct 2001 23:05:04 -0700
Received: from alacritech.com ([10.1.10.37]) by smtp.alacritech.com (8.11.2/8.11.2) with ESMTP id f9N63J206949; Mon, 22 Oct 2001 23:03:19 -0700
Message-ID: <3BD507BA.CECB6A@alacritech.com>
Date: Mon, 22 Oct 2001 23:01:30 -0700
From: "Matt D. Robinson"
Organization: Alacritech, Inc.
X-Mailer: Mozilla 4.78 [en] (Windows NT 5.0; U)
X-Accept-Language: en
MIME-Version: 1.0
To: James Washer
CC: lkcd@oss.sgi.com
Subject: Re: A couple of quick notes about lkcd4.0
References: <200110230405.f9N45na05365@crg8.beaverton.ibm.com>
Content-Type: text/plain; charset=us-ascii
Content-Transfer-Encoding: 7bit
Sender: owner-lkcd@oss.sgi.com
Precedence: bulk

James Washer wrote:
>
> I hope these are not too small to bring to your attention.. but here's a
> couple of things I noticed today, after downing lkcd 4
>
> The README.lkcd refers several times to "/sbin/vmdump", however it seems
> that in rev 4, this should now be /sbin/lkcd

I've fixed most of these -- it needs more content, and the FAQ
really requires some major changes. It's now checked in.

> Also, the webpage (http://lkcd.sourceforge.net/) has a link to
> the "Mailing List" which does not work.

The mailing list has not moved yet -- see my previous E-mail
in the announcement. :)

> On a slightly more technical note, installing the kernel patch
> (lkcd-2.4.8-4.0.diff) resulted in a broken arch/i386/config.in (at
> least 'make xconfig' thought it was broken).
>
> The following block was added near the end of the file:
>
> tristate 'Linux Kernel Crash Dump (LKCD) Support' CONFIG_DUMP
> if [ "$CONFIG_DUMP" = "y" ]; then
>    dep_bool ' LKCD RLE compression' CONFIG_DUMP_COMPRESS_RLE $CONFIG_DUMP
> elif [ "$CONFIG_DUMP" = "m" ]; then
>    dep_tristate ' LKCD RLE compression' CONFIG_DUMP_COMPRESS_RLE $CONFIG_DUMP
> fi
>
> A quick check of Documentation/kbuild/config-language.txt shows no
> "elif" (this is with a 2.4.8 tree). The workaround was trivial, given
> that I wasn't intending to compile lkcd as a module.

I've corrected this in the source tree. I've changed 'elif' to
an 'fi' followed by an 'if', and that seems to do the trick for now.

> Sorry to bring up such silly little things, but I suppose they should get fixed.

They're fixed in the CVS tree, and I'll roll a version later this
week when I get back from SNW in Orlando.

> - jim
>
> p.s. If I were to have changes for lkcd in the future, should I submit those
> as a 'patch' file, to this alias, or is there a different/preferred method.

Submit them here -- that's always fine, or if you think you'll be
making a lot of modifications, we should talk to discuss the process
(it's pretty simple, just to make sure people don't step on each other).
We take lots of patches, and in almost every case, we review them, but
we don't get everything.

Thanks, Jim.
:)

--Matt

> --
> James Washer
> IBM Linux Change Team

From owner-lkcd@oss.sgi.com Tue Oct 23 06:05:58 2001
Received: (from majordomo@localhost)
    by oss.sgi.com (8.11.2/8.11.3) id f9ND5w703860
    for lkcd-outgoing; Tue, 23 Oct 2001 06:05:58 -0700
Received: from XCHANGESERVER.storigen.com ([65.193.106.66])
    by oss.sgi.com (8.11.2/8.11.3) with SMTP id f9ND5sD03856
    for ; Tue, 23 Oct 2001 06:05:54 -0700
X-MimeOLE: Produced By Microsoft Exchange V6.0.4712.0
content-class: urn:content-classes:message
MIME-Version: 1.0
Content-Type: text/plain; charset="iso-8859-1"
Subject: [PATCH] LKCD causes BUG in highmem.h when dumping to a large memory system with HIGHMEM_DEBUG enabled
Date: Tue, 23 Oct 2001 09:05:53 -0400
Message-ID: <88D2015B3AF7BF4B91272EC25A9FE09769F315@XCHANGESERVER.storigen.com>
X-MS-Has-Attach:
X-MS-TNEF-Correlator:
Thread-Topic: [PATCH] LKCD causes BUG in highmem.h when dumping to a large memory system with HIGHMEM_DEBUG enabled
Thread-Index: AcFbw3HC43Hg4GqeQ4ivhWN5p6jk/w==
From: "Tony Dziedzic"
To:
Content-Transfer-Encoding: 8bit
X-MIME-Autoconverted: from quoted-printable to 8bit by oss.sgi.com id f9ND5sD03857
Sender: owner-lkcd@oss.sgi.com
Precedence: bulk

This patch addresses a BUG in highmem.h that LKCD triggers when dumping to
a large memory system with HIGHMEM_DEBUG enabled. The BUG occurs the
second time dump_add_page is called for a high memory page, when
dump_add_page calls kmap_atomic. If HIGHMEM_DEBUG is enabled,
kmap_atomic verifies that the caller has released the temporary mapping
via a call to kunmap_atomic. Since this hasn't happened, highmem.h
BUGs.

The patch adds a call to kunmap_atomic at the end of dump_add_page.
Note that the patch is not required unless your kernel has HIGHMEM_DEBUG
enabled. There is a slight performance hit associated with this patch
(due to kunmap_atomic's TLB flush). Those who object to the performance
hit may choose to enclose the #ifdef CONFIG_HIGHMEM conditional in a
#ifdef HIGHMEM_DEBUG conditional.
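
For readers who want to see the failure mode in isolation, here is a
minimal user-space sketch in C of the check described above. It is not
kernel code and is only a model under stated assumptions: every name in it
(sim_kmap_atomic, sim_kunmap_atomic, sim_add_page, debug_violations) is
hypothetical, and a counter stands in for the BUG that highmem.h raises.
It shows only that a second map of the same slot without an intervening
unmap trips the check, and that releasing the mapping after the copy
avoids it.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical user-space model; none of these names are kernel APIs. */
enum { SIM_SLOTS = 4 };

static bool slot_in_use[SIM_SLOTS];  /* one flag per temporary mapping slot */
static int debug_violations;         /* stands in for the highmem.h BUG */

/* Clear all slots and the violation counter. */
static void sim_reset(void)
{
    memset(slot_in_use, 0, sizeof slot_in_use);
    debug_violations = 0;
}

/* Model of kmap_atomic() with debugging on: mapping into a slot that was
 * never released is recorded as a violation. */
static void *sim_kmap_atomic(void *page, int slot)
{
    if (slot_in_use[slot])
        debug_violations++;
    slot_in_use[slot] = true;
    return page;
}

/* Model of kunmap_atomic(): release the slot. */
static void sim_kunmap_atomic(void *vaddr, int slot)
{
    (void)vaddr;
    slot_in_use[slot] = false;
}

/* Copy one "page" into dst. release_mapping = false models the unpatched
 * dump_add_page; true models the patched one. */
static void sim_add_page(char *dst, char *page, size_t size,
                         bool release_mapping)
{
    char *vaddr = sim_kmap_atomic(page, 0);

    memcpy(dst, vaddr, size);
    if (release_mapping)
        sim_kunmap_atomic(vaddr, 0);
}
```

Calling sim_add_page twice with release_mapping set to false leaves
debug_violations at 1 after the second call, mirroring the "second time
dump_add_page is called" behaviour; with release_mapping set to true the
counter stays at 0.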
FYI,
Tony Dziedzic
Storigen Systems, Inc.

--- lkcd/2.4/drivers/dump/dump_base.c Tue Oct 16 05:33:38 2001
+++ new-lkcd/2.4/drivers/dump/dump_base.c Tue Oct 23 08:53:12 2001
@@ -549,6 +549,15 @@
 		memcpy((void *)(dump_page_buf + *toffset),
 			(const void *)vaddr, size);
 	}
+#ifdef CONFIG_HIGHMEM
+	if (PageHighMem(p)) {
+		/*
+		 * Since this can be executed from IRQ context,
+		 * reentrance on the same CPU must be avoided:
+		 */
+		kunmap_atomic(vaddr, KM_BOUNCE_WRITE);
+	}
+#endif
 	*toffset += size;
 	dump_header.dh_num_pages++;
 	return (0);

From owner-lkcd@oss.sgi.com Tue Oct 23 10:15:11 2001
Received: (from majordomo@localhost)
    by oss.sgi.com (8.11.2/8.11.3) id f9NHFBJ09339
    for lkcd-outgoing; Tue, 23 Oct 2001 10:15:11 -0700
Received: from e31.bld.us.ibm.com (e31.co.us.ibm.com [32.97.110.129])
    by oss.sgi.com (8.11.2/8.11.3) with SMTP id f9NHF9D09336
    for ; Tue, 23 Oct 2001 10:15:09 -0700
Received: from westrelay02.boulder.ibm.com (westrelay02.boulder.ibm.com [9.99.140.23])
    by e31.bld.us.ibm.com (8.9.3/8.9.3) with ESMTP id NAA54748
    for ; Tue, 23 Oct 2001 13:12:38 -0400
Received: from d03nm038.boulder.ibm.com (d03nm038.boulder.ibm.com [9.99.140.38])
    by westrelay02.boulder.ibm.com (8.11.1m3/NCO v4.98) with ESMTP id f9NHF8p194450
    for ; Tue, 23 Oct 2001 11:15:08 -0600
Importance: Normal
Subject: Which first? config or save
To: lkcd@oss.sgi.com
X-Mailer: Lotus Notes Release 5.0.4 June 8, 2000
Message-ID:
From: "James Washer"
Date: Tue, 23 Oct 2001 10:14:32 -0700
X-MIMETrack: Serialize by Router on D03NM038/03/M/IBM(Release 5.0.8 |June 18, 2001) at 10/23/2001 11:15:07 AM
MIME-Version: 1.0
Content-type: text/plain; charset=us-ascii
Sender: owner-lkcd@oss.sgi.com
Precedence: bulk

The patch files for sysinit place '/sbin/lkcd config' before '/sbin/lkcd save',
however README.lkcd implies the opposite order.

Which is correct?
From owner-lkcd@oss.sgi.com Tue Oct 23 11:11:54 2001 Received: (from majordomo@localhost) by oss.sgi.com (8.11.2/8.11.3) id f9NIBsV10190 for lkcd-outgoing; Tue, 23 Oct 2001 11:11:54 -0700 Received: from XCHANGESERVER.storigen.com ([65.193.106.66]) by oss.sgi.com (8.11.2/8.11.3) with SMTP id f9NIBqD10187 for ; Tue, 23 Oct 2001 11:11:52 -0700 X-MimeOLE: Produced By Microsoft Exchange V6.0.4712.0 content-class: urn:content-classes:message MIME-Version: 1.0 Content-Type: text/plain; charset="iso-8859-1" Subject: RE: Which first? config or save Date: Tue, 23 Oct 2001 14:11:51 -0400 Message-ID: <88D2015B3AF7BF4B91272EC25A9FE097691E12@XCHANGESERVER.storigen.com> X-MS-Has-Attach: X-MS-TNEF-Correlator: Thread-Topic: Which first? config or save Thread-Index: AcFb5qlR/1kgr+0gTA6tuzp8Nt6BtwAB11Lw From: "Tony Dziedzic" To: "James Washer" , Content-Transfer-Encoding: 8bit X-MIME-Autoconverted: from quoted-printable to 8bit by oss.sgi.com id f9NIBqD10188 Sender: owner-lkcd@oss.sgi.com Precedence: bulk Correct order is in the sysinit patch: config first, then save. (Save relies on configuration parameters set by config.) Tony Dziedzic Storigen Systems, Inc. > -----Original Message----- > From: James Washer [mailto:washer@us.ibm.com] > Sent: Tuesday, October 23, 2001 1:15 PM > To: lkcd@oss.sgi.com > Subject: Which first? config or save > > > The patch files for sysinit place '/sbin/lkcd config' before > '/sbin/lkcd > save', > however README.lkcd implies the opposite order. > > Which is correct? 
> > From owner-lkcd@oss.sgi.com Tue Oct 23 14:29:23 2001 Received: (from majordomo@localhost) by oss.sgi.com (8.11.2/8.11.3) id f9NLTNL15946 for lkcd-outgoing; Tue, 23 Oct 2001 14:29:23 -0700 Received: from e33.bld.us.ibm.com (e33.co.us.ibm.com [32.97.110.131]) by oss.sgi.com (8.11.2/8.11.3) with SMTP id f9NLTID15934 for ; Tue, 23 Oct 2001 14:29:18 -0700 Received: from westrelay02.boulder.ibm.com (westrelay02.boulder.ibm.com [9.99.140.23]) by e33.bld.us.ibm.com (8.9.3/8.9.3) with ESMTP id QAA57714; Tue, 23 Oct 2001 16:26:46 -0500 Received: from d03nm038.boulder.ibm.com (d03nm038.boulder.ibm.com [9.99.140.38]) by westrelay02.boulder.ibm.com (8.11.1m3/NCO v4.98) with ESMTP id f9NLS0v271470; Tue, 23 Oct 2001 15:28:00 -0600 Importance: Normal Subject: RE: Which first? config or save To: "Tony Dziedzic" Cc: X-Mailer: Lotus Notes Release 5.0.4 June 8, 2000 Message-ID: From: "James Washer" Date: Tue, 23 Oct 2001 14:26:24 -0700 X-MIMETrack: Serialize by Router on D03NM038/03/M/IBM(Release 5.0.8 |June 18, 2001) at 10/23/2001 03:27:59 PM MIME-Version: 1.0 Content-type: text/plain; charset=us-ascii Sender: owner-lkcd@oss.sgi.com Precedence: bulk Tony, Are you sure that "Save relies on configuration parameters set by config"?? I've looked ( briefly ) at the dump save (lcrash) and don't see that it requires anything from the config step. What did I miss? - jim "Tony Dziedzic" @oss.sgi.com on 10/23/2001 11:11:51 AM Sent by: owner-lkcd@oss.sgi.com To: James Washer/Beaverton/IBM@IBMUS, cc: Subject: RE: Which first? config or save Correct order is in the sysinit patch: config first, then save. (Save relies on configuration parameters set by config.) Tony Dziedzic Storigen Systems, Inc. > -----Original Message----- > From: James Washer [mailto:washer@us.ibm.com] > Sent: Tuesday, October 23, 2001 1:15 PM > To: lkcd@oss.sgi.com > Subject: Which first? 
config or save > > > The patch files for sysinit place '/sbin/lkcd config' before > '/sbin/lkcd > save', > however README.lkcd implies the opposite order. > > Which is correct? > > From owner-lkcd@oss.sgi.com Wed Oct 24 00:21:26 2001 Received: (from majordomo@localhost) by oss.sgi.com (8.11.2/8.11.3) id f9O7LQN30159 for lkcd-outgoing; Wed, 24 Oct 2001 00:21:26 -0700 Received: from ausmtp02.au.ibm.com (ausmtp02.au.ibm.COM [202.135.136.105]) by oss.sgi.com (8.11.2/8.11.3) with SMTP id f9O7LHD30147 for ; Wed, 24 Oct 2001 00:21:17 -0700 Received: from f02n15e.au.ibm.com by ausmtp02.au.ibm.com (IBM AP 2.0) with ESMTP id f9O7IEc488822 for ; Wed, 24 Oct 2001 17:18:14 +1000 Received: from d73mta01.au.ibm.com (f06n01s [9.185.166.65]) by f02n15e.au.ibm.com (8.11.1m3/NCO v4.98) with SMTP id f9O7JmJ72074; Wed, 24 Oct 2001 17:19:48 +1000 Received: by d73mta01.au.ibm.com(Lotus SMTP MTA v4.6.5 (863.2 5-20-1999)) id CA256AEF.00284289 ; Wed, 24 Oct 2001 17:19:44 +1000 X-Lotus-FromDomain: IBMIN@IBMAU From: shaider@in.ibm.com To: lkcd@oss.sgi.com cc: vamsi@linux.ibm.com Message-ID: Date: Wed, 24 Oct 2001 13:13:54 +0530 Subject: dump_rle.o doesn't get compiled for new lkcd patch Mime-Version: 1.0 Content-type: text/plain; charset=us-ascii Content-Disposition: inline Sender: owner-lkcd@oss.sgi.com Precedence: bulk Hi, I'm having some problems with the new lkcd-4.0 patch. The dump_rle.c file in the linux/drivers/dump directory is not getting compiled when I build the kernel. I have configured the kernel with both CONFIG_DUMP and CONFIG_DUMP_COMPRESS_RLE set to y (and not loadable modules). However, the dump_rle.o never gets built. When I try to run the kernel with the new lkcd-4.0kernel patch, I get a failure in lkcd_config.c as it does the ioctl to check for compression. This is because dump_rle.o has not been linked into the kernel. I tried creating a rule in the drivers/dump/Makefile to force the building of dump_rle.o. 
This caused the compile to happen but the dump_rle.o was never linked into the kernel. Any clues/fix for the same. thanks and regards... Nadeem IBM Software Lab, India From owner-lkcd@oss.sgi.com Wed Oct 24 00:54:08 2001 Received: (from majordomo@localhost) by oss.sgi.com (8.11.2/8.11.3) id f9O7s8030972 for lkcd-outgoing; Wed, 24 Oct 2001 00:54:08 -0700 Received: from e21.nc.us.ibm.com (e21.nc.us.ibm.com [32.97.136.227]) by oss.sgi.com (8.11.2/8.11.3) with SMTP id f9O7s2D30965 for ; Wed, 24 Oct 2001 00:54:02 -0700 Received: from southrelay03.raleigh.ibm.com (southrelay03.raleigh.ibm.com [9.37.3.210]) by e21.nc.us.ibm.com (8.9.3/8.9.3) with ESMTP id CAA128836; Wed, 24 Oct 2001 02:51:25 -0500 Received: from vamsiks.in.ibm.com (vamsiks.in.ibm.com [9.186.133.18]) by southrelay03.raleigh.ibm.com (8.11.1m3/NCO v4.98) with ESMTP id f9O7ron34704; Wed, 24 Oct 2001 03:53:51 -0400 Received: (from vamsi@localhost) by vamsiks.in.ibm.com (8.11.2/8.11.2) id f9O8J5022366; Wed, 24 Oct 2001 13:49:05 +0530 Date: Wed, 24 Oct 2001 13:49:05 +0530 From: "Vamsi Krishna S ." To: shaider@in.ibm.com Cc: lkcd , "Matt D. Robinson" Subject: Re: dump_rle.o doesn't get compiled for new lkcd patch Message-ID: <20011024134905.A22363@in.ibm.com> Reply-To: vamsi@in.ibm.com References: Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline User-Agent: Mutt/1.2.5i In-Reply-To: ; from shaider@in.ibm.com on Wed, Oct 24, 2001 at 01:13:54PM +0530 Sender: owner-lkcd@oss.sgi.com Precedence: bulk The patch given below should fix it. Matt, If it is okay with you, I will check this in. Vamsi Krishna S. Linux Technology Center, IBM Software Lab, Bangalore. Ph: +91 80 5262355 Extn: 3959 Internet: vamsi@in.ibm.com On Wed, Oct 24, 2001 at 01:13:54PM +0530, shaider@in.ibm.com wrote: > I'm having some problems with the new lkcd-4.0 patch. The dump_rle.c > file in the linux/drivers/dump directory is not getting compiled when I > build the kernel. 
I have configured the kernel with both CONFIG_DUMP and
> CONFIG_DUMP_COMPRESS_RLE set to y (and not loadable modules). However, the
> dump_rle.o never gets built.
>
--
diff -urN -X dontdiff lkcd_cvs_orig/2.4/Makefile lkcd_cvs/2.4/Makefile
--- lkcd_cvs_orig/2.4/Makefile Mon Sep 24 15:09:00 2001
+++ lkcd_cvs/2.4/Makefile Wed Oct 24 13:39:01 2001
@@ -146,7 +146,7 @@
 DRIVERS-$(CONFIG_ARCNET) += drivers/net/arcnet/arcnetdrv.o
 DRIVERS-$(CONFIG_ATM) += drivers/atm/atm.o
 DRIVERS-$(CONFIG_IDE) += drivers/ide/idedriver.o
-DRIVERS-$(CONFIG_DUMP) += drivers/dump/dump.o
+DRIVERS-$(CONFIG_DUMP) += drivers/dump/dumpdrv.o
 DRIVERS-$(CONFIG_SCSI) += drivers/scsi/scsidrv.o
 DRIVERS-$(CONFIG_FUSION_BOOT) += drivers/message/fusion/fusion.o
 DRIVERS-$(CONFIG_IEEE1394) += drivers/ieee1394/ieee1394drv.o
diff -urN -X dontdiff lkcd_cvs_orig/2.4/drivers/dump/Makefile lkcd_cvs/2.4/drivers/dump/Makefile
--- lkcd_cvs_orig/2.4/drivers/dump/Makefile Mon Sep 24 15:49:56 2001
+++ lkcd_cvs/2.4/drivers/dump/Makefile Wed Oct 24 13:39:40 2001
@@ -6,15 +6,16 @@
 # the dump directory.
 #
 
+O_TARGET := dumpdrv.o
 export-objs :=
 
 list-multi := dump.o
 dump-objs := dump_base.o
 
 # get the base dump module and compression modules out of the way
+obj-$(CONFIG_DUMP) += dump.o
 obj-$(CONFIG_DUMP_COMPRESS_RLE) += dump_rle.o
 obj-$(CONFIG_DUMP_COMPRESS_GZIP) += dump_gzip.o
-obj-$(CONFIG_DUMP) += dump.o
 
 # now deal with each individual architecture.
 ifeq ($(ARCH),i386)
@@ -29,7 +30,7 @@
 dump-objs += dump_ia64.o
 endif
 
+include $(TOPDIR)/Rules.make
+
 dump.o: $(dump-objs)
 	$(LD) -r -o $@ $(dump-objs)
-
-include $(TOPDIR)/Rules.make

From owner-lkcd@oss.sgi.com Wed Oct 24 04:11:05 2001
Received: (from majordomo@localhost)
    by oss.sgi.com (8.11.2/8.11.3) id f9OBB5n06046
    for lkcd-outgoing; Wed, 24 Oct 2001 04:11:05 -0700
Received: from XCHANGESERVER.storigen.com ([65.193.106.66])
    by oss.sgi.com (8.11.2/8.11.3) with SMTP id f9OBAwD06042
    for ; Wed, 24 Oct 2001 04:10:58 -0700
X-MimeOLE: Produced By Microsoft Exchange V6.0.4712.0
content-class: urn:content-classes:message
MIME-Version: 1.0
Content-Type: text/plain; charset="iso-8859-1"
Subject: RE: Which first? config or save
Date: Wed, 24 Oct 2001 07:10:57 -0400
Message-ID: <88D2015B3AF7BF4B91272EC25A9FE097691E15@XCHANGESERVER.storigen.com>
X-MS-Has-Attach:
X-MS-TNEF-Correlator:
Thread-Topic: Which first? config or save
Thread-Index: AcFcCcZ3SUq2Q/8LRjSG/DG8NM+CAQAcl01Q
From: "Tony Dziedzic"
To: "James Washer"
Cc:
Content-Transfer-Encoding: 8bit
X-MIME-Autoconverted: from quoted-printable to 8bit by oss.sgi.com id f9OBAxD06043
Sender: owner-lkcd@oss.sgi.com
Precedence: bulk

I'm not aware of any dependencies in lcrash itself on the configuration
parameters set by "/sbin/lkcd config". However, doesn't "/sbin/lkcd
save" rely on some values set by the config step? Perhaps I missed
something in the /sbin/lkcd script processing.

Tony

> -----Original Message-----
> From: James Washer [mailto:washer@us.ibm.com]
> Sent: Tuesday, October 23, 2001 5:26 PM
> To: Tony Dziedzic
> Cc: lkcd@oss.sgi.com
> Subject: RE: Which first? config or save
>
>
>
> Tony,
>
> Are you sure that "Save relies on configuration parameters
> set by config"??
>
> I've looked ( briefly ) at the dump save (lcrash) and don't
> see that it
> requires
> anything from the config step.
>
>
> What did I miss?
> > - jim > > > > "Tony Dziedzic" @oss.sgi.com on 10/23/2001 > 11:11:51 AM > > Sent by: owner-lkcd@oss.sgi.com > > > To: James Washer/Beaverton/IBM@IBMUS, > cc: > Subject: RE: Which first? config or save > > > > Correct order is in the sysinit patch: config first, then save. (Save > relies on configuration parameters set by config.) > > Tony Dziedzic > Storigen Systems, Inc. > > > -----Original Message----- > > From: James Washer [mailto:washer@us.ibm.com] > > Sent: Tuesday, October 23, 2001 1:15 PM > > To: lkcd@oss.sgi.com > > Subject: Which first? config or save > > > > > > The patch files for sysinit place '/sbin/lkcd config' before > > '/sbin/lkcd > > save', > > however README.lkcd implies the opposite order. > > > > Which is correct? > > > > > > > > From owner-lkcd@oss.sgi.com Wed Oct 24 07:53:06 2001 Received: (from majordomo@localhost) by oss.sgi.com (8.11.2/8.11.3) id f9OEr6U12168 for lkcd-outgoing; Wed, 24 Oct 2001 07:53:06 -0700 Received: from e32.bld.us.ibm.com (e32.co.us.ibm.com [32.97.110.130]) by oss.sgi.com (8.11.2/8.11.3) with SMTP id f9OEqtD12159 for ; Wed, 24 Oct 2001 07:52:55 -0700 Received: from westrelay02.boulder.ibm.com (westrelay02.boulder.ibm.com [9.99.140.23]) by e32.bld.us.ibm.com (8.9.3/8.9.3) with ESMTP id KAA80190; Wed, 24 Oct 2001 10:50:18 -0400 Received: from d03nm038.boulder.ibm.com (d03nm038.boulder.ibm.com [9.99.140.38]) by westrelay02.boulder.ibm.com (8.11.1m3/NCO v4.98) with ESMTP id f9OEpVP156620; Wed, 24 Oct 2001 08:51:31 -0600 Importance: Normal Subject: RE: Which first? 
config or save
To: "Tony Dziedzic"
Cc:
X-Mailer: Lotus Notes Release 5.0.4 June 8, 2000
Message-ID:
From: "James Washer"
Date: Wed, 24 Oct 2001 07:47:26 -0700
X-MIMETrack: Serialize by Router on D03NM038/03/M/IBM(Release 5.0.8 |June 18, 2001) at 10/24/2001 08:51:31 AM
MIME-Version: 1.0
Content-type: text/plain; charset=us-ascii
Sender: owner-lkcd@oss.sgi.com
Precedence: bulk

The config_lkcd function in /sbin/lkcd checks to make sure things are sane,
then it sets the panic_timeout, and finally calls /sbin/lkcd_config.

I've not looked too deeply at /sbin/lkcd_config, but my guess/belief is that
it just issues ioctls against /dev/dump to inform the kernel where to dump,
etc.

Looking at the "/sbin/lkcd save" side of things, I don't think any of the
'config' side steps are a prereq. This is further evidenced by the fact that
on my system, I do the save first, and have been successfully collecting
dumps.

So, I think my question still stands... Which one is supposed to happen
first?.. And once we decide, we need to either fix the rc.sysinit scripts,
and/or fix the README.lkcd.

- jim

p.s. I hope I'm not bothering people with these 'details'. My goal is to
have lkcd easily installable by end-users. Confusion in the documentation
will (I feel) cause users to shy away.

"Tony Dziedzic" @oss.sgi.com on 10/24/2001 04:10:57 AM

Sent by: owner-lkcd@oss.sgi.com

To: James Washer/Beaverton/IBM@IBMUS
cc:
Subject: RE: Which first? config or save

I'm not aware of any dependencies in lcrash itself on the configuration
parameters set by "/sbin/lkcd config". However, doesn't "/sbin/lkcd
save" rely on some values set by the config step? Perhaps I missed
something in the /sbin/lkcd script processing.

Tony

> -----Original Message-----
> From: James Washer [mailto:washer@us.ibm.com]
> Sent: Tuesday, October 23, 2001 5:26 PM
> To: Tony Dziedzic
> Cc: lkcd@oss.sgi.com
> Subject: RE: Which first?
config or save > > > > Tony, > > Are you sure that "Save relies on configuration parameters > set by config"?? > > I've looked ( briefly ) at the dump save (lcrash) and don't > see that it > requires > anything from the config step. > > > What did I miss? > > - jim > > > > "Tony Dziedzic" @oss.sgi.com on 10/23/2001 > 11:11:51 AM > > Sent by: owner-lkcd@oss.sgi.com > > > To: James Washer/Beaverton/IBM@IBMUS, > cc: > Subject: RE: Which first? config or save > > > > Correct order is in the sysinit patch: config first, then save. (Save > relies on configuration parameters set by config.) > > Tony Dziedzic > Storigen Systems, Inc. > > > -----Original Message----- > > From: James Washer [mailto:washer@us.ibm.com] > > Sent: Tuesday, October 23, 2001 1:15 PM > > To: lkcd@oss.sgi.com > > Subject: Which first? config or save > > > > > > The patch files for sysinit place '/sbin/lkcd config' before > > '/sbin/lkcd > > save', > > however README.lkcd implies the opposite order. > > > > Which is correct? 
> > > > > > > > From owner-lkcd@oss.sgi.com Thu Oct 25 03:32:39 2001 Received: (from majordomo@localhost) by oss.sgi.com (8.11.2/8.11.3) id f9PAWdS00392 for lkcd-outgoing; Thu, 25 Oct 2001 03:32:39 -0700 Received: from e21.nc.us.ibm.com (e21.nc.us.ibm.com [32.97.136.227]) by oss.sgi.com (8.11.2/8.11.3) with SMTP id f9PAWZD00389 for ; Thu, 25 Oct 2001 03:32:35 -0700 Received: from southrelay03.raleigh.ibm.com (southrelay03.raleigh.ibm.com [9.37.3.210]) by e21.nc.us.ibm.com (8.9.3/8.9.3) with ESMTP id FAA26186 for ; Thu, 25 Oct 2001 05:29:57 -0500 Received: from bharata.in.ibm.com (bharata.in.ibm.com [9.186.133.24]) by southrelay03.raleigh.ibm.com (8.11.1m3/NCO v4.98) with ESMTP id f9PAWPU54830 for ; Thu, 25 Oct 2001 06:32:26 -0400 Received: (from bharata@localhost) by bharata.in.ibm.com (8.11.2/8.11.2) id f9PAT4c21590 for lkcd@oss.sgi.com; Thu, 25 Oct 2001 15:59:04 +0530 Date: Thu, 25 Oct 2001 15:59:03 +0530 From: Bharata B Rao To: lkcd@oss.sgi.com Subject: capturing cpu states on SMP Message-ID: <20011025155903.C21306@in.ibm.com> Reply-To: bharata@in.ibm.com Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline User-Agent: Mutt/1.2.5i Sender: owner-lkcd@oss.sgi.com Precedence: bulk Hello, This note is just a heads up to avoid duplicating our efforts. We are working on capturing the registers and stack on all the cpus at the time of dumping. This has been found to be crucial to debug problems where some of the cpus on an SMP are hung (executing a tight loop, interrupts disabled). We have this working in the kernel side. We have also added a command to display the saved registers in the lcrash. We need to add some bits to lcrash so that it can look at the right (saved) stack when back tracing. Comments? -- Crash Dump Team, IBM Linux Technology Center, IBM Software Lab, Bangalore. 
Ph: 91-80-5262355 Ex: 3962 Mail: bharata@in.ibm.com From owner-lkcd@oss.sgi.com Thu Oct 25 12:38:32 2001 Received: (from majordomo@localhost) by oss.sgi.com (8.11.2/8.11.3) id f9PJcWf29229 for lkcd-outgoing; Thu, 25 Oct 2001 12:38:32 -0700 Received: from guzzi.amazon.com (guzzi.amazon.com [209.191.164.151]) by oss.sgi.com (8.11.2/8.11.3) with SMTP id f9PJcTD29223 for ; Thu, 25 Oct 2001 12:38:29 -0700 Received: from kawasaki.amazon.com (kawasaki.amazon.com [10.16.42.209]) by guzzi.amazon.com (Postfix) with ESMTP id 1995E26B; Thu, 25 Oct 2001 12:38:26 -0700 (PDT) Received: from AMZN097255X (us1-dhcp-134-56.amazon.com [10.21.134.56]) by kawasaki.amazon.com (Postfix) with SMTP id EBB074805A; Thu, 25 Oct 2001 12:38:25 -0700 (PDT) From: "Monty Vanderbilt" To: , Subject: RE: capturing cpu states on SMP Date: Thu, 25 Oct 2001 12:38:25 -0700 Message-ID: MIME-Version: 1.0 Content-Type: text/plain; charset="us-ascii" Content-Transfer-Encoding: 7bit X-Priority: 3 (Normal) X-MSMail-Priority: Normal X-Mailer: Microsoft Outlook IMO, Build 9.0.2416 (9.0.2911.0) In-Reply-To: <20011025155903.C21306@in.ibm.com> X-MimeOLE: Produced By Microsoft MimeOLE V6.00.2600.0000 Importance: Normal Sender: owner-lkcd@oss.sgi.com Precedence: bulk Great idea! Why is it necessary to capture the stacks? Those pages should already be in the memory dump. With the registers you should be able to seed the backtrace for any cpu. -----Original Message----- From: owner-lkcd@oss.sgi.com [mailto:owner-lkcd@oss.sgi.com]On Behalf Of Bharata B Rao Sent: Thursday, October 25, 2001 3:29 AM To: lkcd@oss.sgi.com Subject: capturing cpu states on SMP Hello, This note is just a heads up to avoid duplicating our efforts. We are working on capturing the registers and stack on all the cpus at the time of dumping. This has been found to be crucial to debug problems where some of the cpus on an SMP are hung (executing a tight loop, interrupts disabled). We have this working in the kernel side. 
We have also added a command to display the saved registers in the lcrash. We need to add some bits to lcrash so that it can look at the right (saved) stack when back tracing. Comments? -- Crash Dump Team, IBM Linux Technology Center, IBM Software Lab, Bangalore. Ph: 91-80-5262355 Ex: 3962 Mail: bharata@in.ibm.com From owner-lkcd@oss.sgi.com Fri Oct 26 00:45:16 2001 Received: (from majordomo@localhost) by oss.sgi.com (8.11.2/8.11.3) id f9Q7jG213570 for lkcd-outgoing; Fri, 26 Oct 2001 00:45:16 -0700 Received: from smtp.alacritech.com (smtp.alacritech.com [209.10.208.82]) by oss.sgi.com (8.11.2/8.11.3) with SMTP id f9Q7jA013567 for ; Fri, 26 Oct 2001 00:45:10 -0700 Received: from alacritech.com ([10.1.10.35]) by smtp.alacritech.com (8.11.2/8.11.2) with ESMTP id f9Q7jMK08698; Fri, 26 Oct 2001 00:45:23 -0700 Message-ID: <3BD913AA.6764287F@alacritech.com> Date: Fri, 26 Oct 2001 00:41:30 -0700 From: "Matt D. Robinson" Organization: Alacritech, Inc. X-Mailer: Mozilla 4.78 [en] (Windows NT 5.0; U) X-Accept-Language: en MIME-Version: 1.0 To: James Washer CC: Tony Dziedzic , lkcd@oss.sgi.com Subject: Re: Which first? config or save References: Content-Type: text/plain; charset=us-ascii Content-Transfer-Encoding: 7bit Sender: owner-lkcd@oss.sgi.com Precedence: bulk 'lkcd config' followed by 'lkcd save' was the order required a long time ago, due to the way the dump parameters were set. /proc/sys/vmdump/??* were used in the save case. Today, that isn't the case. When 5.0 comes out, dump devices will have independent methods of retrieval based on their configuration. As long as that data doesn't change between configuration and retrieval, it won't be a problem. I can remove that sentence. It's nice to see someone re-validating the documentation based on the current set of code. :) My biggest concern is re-writing everything now, as with 5.0 it's just going to change again to allow for multiple dump devices. Either way, it should be done. 
The most important thing is to configure/save the dump before swap space is enabled, otherwise you could potentially write over the dump if you're using swap space for storage of a dump. Thanks, guys. --Matt James Washer wrote: > > Tony, > > Are you sure that "Save relies on configuration parameters set by config"?? > > I've looked ( briefly ) at the dump save (lcrash) and don't see that it > requires > anything from the config step. > > What did I miss? > > - jim > > "Tony Dziedzic" @oss.sgi.com on 10/23/2001 > 11:11:51 AM > > Sent by: owner-lkcd@oss.sgi.com > > To: James Washer/Beaverton/IBM@IBMUS, > cc: > Subject: RE: Which first? config or save > > Correct order is in the sysinit patch: config first, then save. (Save > relies on configuration parameters set by config.) > > Tony Dziedzic > Storigen Systems, Inc. > > > -----Original Message----- > > From: James Washer [mailto:washer@us.ibm.com] > > Sent: Tuesday, October 23, 2001 1:15 PM > > To: lkcd@oss.sgi.com > > Subject: Which first? config or save > > > > > > The patch files for sysinit place '/sbin/lkcd config' before > > '/sbin/lkcd > > save', > > however README.lkcd implies the opposite order. > > > > Which is correct? > > > > From owner-lkcd@oss.sgi.com Fri Oct 26 00:48:23 2001 Received: (from majordomo@localhost) by oss.sgi.com (8.11.2/8.11.3) id f9Q7mNn13654 for lkcd-outgoing; Fri, 26 Oct 2001 00:48:23 -0700 Received: from smtp.alacritech.com (smtp.alacritech.com [209.10.208.82]) by oss.sgi.com (8.11.2/8.11.3) with SMTP id f9Q7mG013646 for ; Fri, 26 Oct 2001 00:48:16 -0700 Received: from alacritech.com ([10.1.10.35]) by smtp.alacritech.com (8.11.2/8.11.2) with ESMTP id f9Q7l0K08714; Fri, 26 Oct 2001 00:47:00 -0700 Message-ID: <3BD9140C.941A1D4F@alacritech.com> Date: Fri, 26 Oct 2001 00:43:08 -0700 From: "Matt D. Robinson" Organization: Alacritech, Inc. 
X-Mailer: Mozilla 4.78 [en] (Windows NT 5.0; U) X-Accept-Language: en MIME-Version: 1.0 To: vamsi@in.ibm.com CC: shaider@in.ibm.com, lkcd Subject: Re: dump_rle.o doesn't get compiled for new lkcd patch References: <20011024134905.A22363@in.ibm.com> Content-Type: text/plain; charset=us-ascii Content-Transfer-Encoding: 7bit Sender: owner-lkcd@oss.sgi.com Precedence: bulk Before you check it in ... is this fixed with the config.in modification we just made? --Matt "Vamsi Krishna S ." wrote: > > The patch given below should fix it. > > Matt, > > If it is okay with you, I will check this in. > > Vamsi Krishna S. > Linux Technology Center, > IBM Software Lab, Bangalore. > Ph: +91 80 5262355 Extn: 3959 > Internet: vamsi@in.ibm.com > > On Wed, Oct 24, 2001 at 01:13:54PM +0530, shaider@in.ibm.com wrote: > > > I'm having some problems with the new lkcd-4.0 patch. The dump_rle.c > > file in the linux/drivers/dump directory is not getting compiled when I > > build the kernel. I have configured the kernel with both CONFIG_DUMP and > > CONFIG_DUMP_COMPRESS_RLE set to y (and not loadable modules). However, the > > dump_rle.o never gets built. 
> > > > -- > > diff -urN -X dontdiff lkcd_cvs_orig/2.4/Makefile lkcd_cvs/2.4/Makefile > --- lkcd_cvs_orig/2.4/Makefile Mon Sep 24 15:09:00 2001 > +++ lkcd_cvs/2.4/Makefile Wed Oct 24 13:39:01 2001 > @@ -146,7 +146,7 @@ > DRIVERS-$(CONFIG_ARCNET) += drivers/net/arcnet/arcnetdrv.o > DRIVERS-$(CONFIG_ATM) += drivers/atm/atm.o > DRIVERS-$(CONFIG_IDE) += drivers/ide/idedriver.o > -DRIVERS-$(CONFIG_DUMP) += drivers/dump/dump.o > +DRIVERS-$(CONFIG_DUMP) += drivers/dump/dumpdrv.o > DRIVERS-$(CONFIG_SCSI) += drivers/scsi/scsidrv.o > DRIVERS-$(CONFIG_FUSION_BOOT) += drivers/message/fusion/fusion.o > DRIVERS-$(CONFIG_IEEE1394) += drivers/ieee1394/ieee1394drv.o > diff -urN -X dontdiff lkcd_cvs_orig/2.4/drivers/dump/Makefile lkcd_cvs/2.4/drivers/dump/Makefile > --- lkcd_cvs_orig/2.4/drivers/dump/Makefile Mon Sep 24 15:49:56 2001 > +++ lkcd_cvs/2.4/drivers/dump/Makefile Wed Oct 24 13:39:40 2001 > @@ -6,15 +6,16 @@ > # the dump directory. > # > > +O_TARGET := dumpdrv.o > export-objs := > > list-multi := dump.o > dump-objs := dump_base.o > > # get the base dump module and compression modules out of the way > +obj-$(CONFIG_DUMP) += dump.o > obj-$(CONFIG_DUMP_COMPRESS_RLE) += dump_rle.o > obj-$(CONFIG_DUMP_COMPRESS_GZIP) += dump_gzip.o > -obj-$(CONFIG_DUMP) += dump.o > > # now deal with each individual architecture. 
> ifeq ($(ARCH),i386) > @@ -29,7 +30,7 @@ > dump-objs += dump_ia64.o > endif > > +include $(TOPDIR)/Rules.make > + > dump.o: $(dump-objs) > $(LD) -r -o $@ $(dump-objs) > - > -include $(TOPDIR)/Rules.make From owner-lkcd@oss.sgi.com Fri Oct 26 00:50:29 2001 Received: (from majordomo@localhost) by oss.sgi.com (8.11.2/8.11.3) id f9Q7oTH13727 for lkcd-outgoing; Fri, 26 Oct 2001 00:50:29 -0700 Received: from nakedeye.aparity.com (w032.z064001165.sjc-ca.dsl.cnc.net [64.1.165.32]) by oss.sgi.com (8.11.2/8.11.3) with SMTP id f9Q7oQ013722 for ; Fri, 26 Oct 2001 00:50:26 -0700 Received: from localhost (yakker@localhost) by nakedeye.aparity.com (8.11.2/8.11.2) with ESMTP id f9Q7smh18193; Fri, 26 Oct 2001 00:54:48 -0700 Date: Fri, 26 Oct 2001 00:54:48 -0700 (PDT) From: "Matt D. Robinson" To: Tony Dziedzic cc: Subject: Re: [PATCH] LKCD causes BUG in highmem.h when dumping to a large memory system with HIGHMEM_DEBUG enabled In-Reply-To: <88D2015B3AF7BF4B91272EC25A9FE09769F315@XCHANGESERVER.storigen.com> Message-ID: MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII Sender: owner-lkcd@oss.sgi.com Precedence: bulk On Tue, 23 Oct 2001, Tony Dziedzic wrote: |>This patch address a BUG in highmem.h that LKCD triggers when dumping to |>a large memory system with HIGHMEM_DEBUG enabled. The BUG occurs the |>second time dump_add_page is called for a high memory page, when |>dump_add_page calls kmap_atomic. If HIGHMEM_DEBUG is enabled |>kmap_atomic verifies that the caller has released the temporary mapping |>via a call to kunmap_atomic. Since this hasn't happened, highmem.h |>BUGs. |> |>The patch adds a call to kunmap_atomic at the end of dump_add_page. |>Note that the patch is not required unless your kernel has HIGHMEM_DEBUG |>enabled. There is a slight performance hit associated with this patch |>(due to kunmap_atomic's TLB flush). Those who object to the performance |>hit may choose to enclose the #ifdef CONFIG_HIGHMEM conditional in a |>#ifdef HIGHMEM_DEBUG conditional. 
|> |>FYI, |>Tony Dziedzic |>Storigen Systems, Inc. Checked in. --Matt From owner-lkcd@oss.sgi.com Fri Oct 26 10:25:23 2001 Received: (from majordomo@localhost) by oss.sgi.com (8.11.2/8.11.3) id f9QHPNu29571 for lkcd-outgoing; Fri, 26 Oct 2001 10:25:23 -0700 Received: from e34.bld.us.ibm.com (e34.co.us.ibm.com [32.97.110.132]) by oss.sgi.com (8.11.2/8.11.3) with SMTP id f9QHPC029564 for ; Fri, 26 Oct 2001 10:25:12 -0700 Received: from westrelay02.boulder.ibm.com (westrelay02.boulder.ibm.com [9.99.140.23]) by e34.bld.us.ibm.com (8.9.3/8.9.3) with ESMTP id NAA20504; Fri, 26 Oct 2001 13:22:34 -0400 Received: from d03nm038.boulder.ibm.com (d03nm038.boulder.ibm.com [9.99.140.38]) by westrelay02.boulder.ibm.com (8.11.1m3/NCO v4.98) with ESMTP id f9QHOkN185702; Fri, 26 Oct 2001 11:24:46 -0600 Importance: Normal Subject: Re: Which first? config or save To: "Matt D. Robinson" Cc: Tony Dziedzic , lkcd@oss.sgi.com X-Mailer: Lotus Notes Release 5.0.4 June 8, 2000 Message-ID: From: "James Washer" Date: Fri, 26 Oct 2001 10:22:26 -0700 X-MIMETrack: Serialize by Router on D03NM038/03/M/IBM(Release 5.0.8 |June 18, 2001) at 10/26/2001 11:24:46 AM MIME-Version: 1.0 Content-type: text/plain; charset=us-ascii Sender: owner-lkcd@oss.sgi.com Precedence: bulk I kinda suspected the order didn't matter ( by testing it both ways ;-) One concern I still have though, is the requirements for saving the dump before enabling swap. I cannot say for sure that this is an issue with linux, but on other unices I've used, particularly on multi proc boxes with LOTS of filesystems, there will be enough (parallel) fsck activity after a crash, to cause swapping... In other words, many systems add swap BEFORE fscking the disks ( i believe rh71 does this) ... and of course you need to fsck /var/log/dump, before you can mount it, and therefore before you can save the dump... Catch-22 Of course, there are LOTS of workarounds.. 
like keeping /var/log/dump unmounted during normal operations, so that it is CLEAN after a crash and can be mounted, or dumping to a secondary ( or unused ) swap partition, etc etc... I only bring this up here in the spirit of robustness, as an issue we need to be aware of. - jim "Matt D. Robinson" @oss.sgi.com on 10/26/2001 12:41:30 AM Sent by: owner-lkcd@oss.sgi.com To: James Washer/Beaverton/IBM@IBMUS cc: Tony Dziedzic , lkcd@oss.sgi.com Subject: Re: Which first? config or save 'lkcd config' followed by 'lkcd save' was the order required a long time ago, due to the way the dump parameters were set. /proc/sys/vmdump/??* were used in the save case. Today, that isn't the case. When 5.0 comes out, dump devices will have independent methods of retrieval based on their configuration. As long as that data doesn't change between configuration and retrieval, it won't be a problem. I can remove that sentence. It's nice to see someone re-validating the documentation based on the current set of code. :) My biggest concern is re-writing everything now, as with 5.0 it's just going to change again to allow for multiple dump devices. Either way, it should be done. The most important thing is to configure/save the dump before swap space is enabled, otherwise you could potentially write over the dump if you're using swap space for storage of a dump. Thanks, guys. --Matt James Washer wrote: > > Tony, > > Are you sure that "Save relies on configuration parameters set by config"?? > > I've looked ( briefly ) at the dump save (lcrash) and don't see that it > requires > anything from the config step. > > What did I miss? > > - jim > > "Tony Dziedzic" @oss.sgi.com on 10/23/2001 > 11:11:51 AM > > Sent by: owner-lkcd@oss.sgi.com > > To: James Washer/Beaverton/IBM@IBMUS, > cc: > Subject: RE: Which first? config or save > > Correct order is in the sysinit patch: config first, then save. (Save > relies on configuration parameters set by config.) > > Tony Dziedzic > Storigen Systems, Inc. 
> > > -----Original Message----- > > From: James Washer [mailto:washer@us.ibm.com] > > Sent: Tuesday, October 23, 2001 1:15 PM > > To: lkcd@oss.sgi.com > > Subject: Which first? config or save > > > > > > The patch files for sysinit place '/sbin/lkcd config' before > > '/sbin/lkcd > > save', > > however README.lkcd implies the opposite order. > > > > Which is correct? > > > > From owner-lkcd@oss.sgi.com Fri Oct 26 10:46:37 2001 Received: (from majordomo@localhost) by oss.sgi.com (8.11.2/8.11.3) id f9QHkb530233 for lkcd-outgoing; Fri, 26 Oct 2001 10:46:37 -0700 Received: from e31.bld.us.ibm.com (e31.co.us.ibm.com [32.97.110.129]) by oss.sgi.com (8.11.2/8.11.3) with SMTP id f9QHkW030230 for ; Fri, 26 Oct 2001 10:46:32 -0700 Received: from westrelay02.boulder.ibm.com (westrelay02.boulder.ibm.com [9.99.140.23]) by e31.bld.us.ibm.com (8.9.3/8.9.3) with ESMTP id NAA84260 for ; Fri, 26 Oct 2001 13:43:59 -0400 Received: from d03nm038.boulder.ibm.com (d03nm038.boulder.ibm.com [9.99.140.38]) by westrelay02.boulder.ibm.com (8.11.1m3/NCO v4.98) with ESMTP id f9QHkUN173576 for ; Fri, 26 Oct 2001 11:46:30 -0600 Importance: Normal Subject: RE: capturing cpu states on SMP To: lkcd@oss.sgi.com X-Mailer: Lotus Notes Release 5.0.4 June 8, 2000 Message-ID: From: "James Washer" Date: Fri, 26 Oct 2001 10:42:09 -0700 X-MIMETrack: Serialize by Router on D03NM038/03/M/IBM(Release 5.0.8 |June 18, 2001) at 10/26/2001 11:46:29 AM MIME-Version: 1.0 Content-type: text/plain; charset=us-ascii Sender: owner-lkcd@oss.sgi.com Precedence: bulk I'm interested in hearing HOW you capture the register information from processors "(executing a tight loop, interrupts disabled)" Care to let me (us) know? - jim "Monty Vanderbilt" @oss.sgi.com on 10/25/2001 12:38:25 PM Sent by: owner-lkcd@oss.sgi.com To: bharata@linux.ibm.com, cc: Subject: RE: capturing cpu states on SMP Great idea! Why is it necessary to capture the stacks? Those pages should already be in the memory dump. 
With the registers you should be able to seed the backtrace for any cpu. -----Original Message----- From: owner-lkcd@oss.sgi.com [mailto:owner-lkcd@oss.sgi.com]On Behalf Of Bharata B Rao Sent: Thursday, October 25, 2001 3:29 AM To: lkcd@oss.sgi.com Subject: capturing cpu states on SMP Hello, This note is just a heads up to avoid duplicating our efforts. We are working on capturing the registers and stack on all the cpus at the time of dumping. This has been found to be crucial to debug problems where some of the cpus on an SMP are hung (executing a tight loop, interrupts disabled). We have this working in the kernel side. We have also added a command to display the saved registers in the lcrash. We need to add some bits to lcrash so that it can look at the right (saved) stack when back tracing. Comments? -- Crash Dump Team, IBM Linux Technology Center, IBM Software Lab, Bangalore. Ph: 91-80-5262355 Ex: 3962 Mail: bharata@in.ibm.com From owner-lkcd@oss.sgi.com Fri Oct 26 12:05:16 2001 Received: (from majordomo@localhost) by oss.sgi.com (8.11.2/8.11.3) id f9QJ5Gr03812 for lkcd-outgoing; Fri, 26 Oct 2001 12:05:16 -0700 Received: from smtp.alacritech.com (smtp.alacritech.com [209.10.208.82]) by oss.sgi.com (8.11.2/8.11.3) with SMTP id f9QJ55003809 for ; Fri, 26 Oct 2001 12:05:05 -0700 Received: from alacritech.com (lambda.alacritech.com [10.1.1.32]) by smtp.alacritech.com (8.11.2/8.11.2) with ESMTP id f9QJ5CK17221; Fri, 26 Oct 2001 12:05:12 -0700 Message-ID: <3BD9B525.F137678E@alacritech.com> Date: Fri, 26 Oct 2001 12:10:29 -0700 From: "Matt D. Robinson" Organization: Alacritech, Inc. X-Mailer: Mozilla 4.76 [en] (X11; U; Linux 2.4.2-2smp i686) X-Accept-Language: en MIME-Version: 1.0 To: James Washer CC: Tony Dziedzic , lkcd@oss.sgi.com Subject: Re: Which first? 
config or save References: Content-Type: text/plain; charset=us-ascii Content-Transfer-Encoding: 7bit Sender: owner-lkcd@oss.sgi.com Precedence: bulk James Washer wrote: > > I kinda suspected the order didn't matter ( by testing it both ways ;-) > > One concern I still have though, is the requirements for saving the dump > before enabling swap. I cannot say for sure that this is an issue with > linux, but on other unices I've used, particularly on multi proc boxes with > LOTS of filesystems, there will be enough (parallel) fsck activity after a > crash, to cause swapping... In other words, many systems add swap BEFORE > fscking the disks ( i believe rh71 does this) ... and of course you need to > fsck /var/log/dump, before you can mount it, and therefore before you can > save the dump... Catch-22 This has been a problem for years in Unices. If you don't save the dump to a dedicated dump partition, you have the potential to lose it on the way up. The reason for moving the swapon is to make sure that Linux doesn't prove too aggressive (which it has been in the past) in using swap when it may not need to. > Of course, there are LOTS of workarounds.. like keeping /var/log/dump > unmounted during normal operations, so that it is CLEAN after a crash and > can be mounted, or dumping to a secondary ( or unused ) swap partition, > etc etc... Right. All these are options and are usable by any customer. > I only bring this up here in the spirit of robustness, as an issue we need > to be aware of. Yes, we're very aware of this problem ... :) While at SGI, this was an issue we ran into all the time (before XFS, when efs could use up enough memory to touch swap). We didn't see it that often, but it did occur on occasion. One potential item would be to start writing at the _end_ of swap and then move towards the head of the partition. It's ugly and weird, but it's an option. It doesn't buy you a whole lot if you really, really have to swap a lot.
Most systems these days have sufficient memory during the boot-up process to avoid this problem. Those systems that don't are normally large enough to run a log-volume based filesystem. Those caught in between are normally going to be okay, but there is the very uncommon case where they will be caught by this. --Matt > - jim > > "Matt D. Robinson" @oss.sgi.com on 10/26/2001 > 12:41:30 AM > > Sent by: owner-lkcd@oss.sgi.com > > To: James Washer/Beaverton/IBM@IBMUS > cc: Tony Dziedzic , lkcd@oss.sgi.com > Subject: Re: Which first? config or save > > 'lkcd config' followed by 'lkcd save' was the order required > a long time ago, due to the way the dump parameters were set. > /proc/sys/vmdump/??* were used in the save case. > > Today, that isn't the case. When 5.0 comes out, dump devices > will have independent methods of retrieval based on their > configuration. As long as that data doesn't change between > configuration and retrieval, it won't be a problem. > > I can remove that sentence. It's nice to see someone > re-validating the documentation based on the current > set of code. :) My biggest concern is re-writing everything > now, as with 5.0 it's just going to change again to allow > for multiple dump devices. Either way, it should be done. > > The most important thing is to configure/save the dump before > swap space is enabled, otherwise you could potentially write > over the dump if you're using swap space for storage of a dump. > > Thanks, guys. > > --Matt > > James Washer wrote: > > > > Tony, > > > > Are you sure that "Save relies on configuration parameters set by > config"?? > > > > I've looked ( briefly ) at the dump save (lcrash) and don't see that it > > requires > > anything from the config step. > > > > What did I miss? > > > > - jim > > > > "Tony Dziedzic" @oss.sgi.com on 10/23/2001 > > 11:11:51 AM > > > > Sent by: owner-lkcd@oss.sgi.com > > > > To: James Washer/Beaverton/IBM@IBMUS, > > cc: > > Subject: RE: Which first? 
config or save > > > > Correct order is in the sysinit patch: config first, then save. (Save > > relies on configuration parameters set by config.) > > > > Tony Dziedzic > > Storigen Systems, Inc. > > > > > -----Original Message----- > > > From: James Washer [mailto:washer@us.ibm.com] > > > Sent: Tuesday, October 23, 2001 1:15 PM > > > To: lkcd@oss.sgi.com > > > Subject: Which first? config or save > > > > > > > > > The patch files for sysinit place '/sbin/lkcd config' before > > > '/sbin/lkcd > > > save', > > > however README.lkcd implies the opposite order. > > > > > > Which is correct? > > > > > > From owner-lkcd@oss.sgi.com Fri Oct 26 13:14:43 2001 Received: (from majordomo@localhost) by oss.sgi.com (8.11.2/8.11.3) id f9QKEhV07327 for lkcd-outgoing; Fri, 26 Oct 2001 13:14:43 -0700 Received: from XCHANGESERVER.storigen.com ([65.193.106.66]) by oss.sgi.com (8.11.2/8.11.3) with SMTP id f9QKEb007323 for ; Fri, 26 Oct 2001 13:14:37 -0700 content-class: urn:content-classes:message Subject: Thoughts on LKCD and large memory systems MIME-Version: 1.0 Content-Type: text/plain; charset="iso-8859-1" Date: Fri, 26 Oct 2001 16:14:35 -0400 X-MimeOLE: Produced By Microsoft Exchange V6.0.4712.0 Message-ID: <88D2015B3AF7BF4B91272EC25A9FE09769F317@XCHANGESERVER.storigen.com> X-MS-Has-Attach: X-MS-TNEF-Correlator: Thread-Topic: Thoughts on LKCD and large memory systems Thread-Index: AcFeWtS42nrtvs9lRqy5FbeYmsbh8g== From: "Tony Dziedzic" To: Content-Transfer-Encoding: 8bit X-MIME-Autoconverted: from quoted-printable to 8bit by oss.sgi.com id f9QKEb007324 Sender: owner-lkcd@oss.sgi.com Precedence: bulk I've been testing LKCD release 4 on large memory systems and have run into a few problematic areas. I thought I'd throw them to the winds to stimulate some discussion. 
Issue 1: It takes a LONG time to dump a system with 4 GB of memory. Even when using fairly fast disks and assuming something like 20 MB/sec as an optimal transfer rate, you're going to be looking at several minutes to write a crash dump file. In reality the throughput is much less since the data is not being streamed to the drive; my average time is hovering around ten minutes. This is with RLE compression. If you turn on GZIP compression, multiply that time by a factor of three or four. In this type of configuration writing an uncompressed (or RLE-compressed) dump makes a lot more sense. The total elapsed time may still be an issue. Issue 2: It takes a LONG time to copy a crash dump for 4 GB of memory. Copying the crash dump can take significantly longer than it took to write the crash dump in the first place. I've been seeing copy times that approach (or exceed!) an hour. Part of this is likely due to the relatively simple I/O loop in kl_cmp.c. It may be advisable to consider using a mechanism similar to the LKCD kernel code that writes the dump; i.e., use a large staging buffer to collect the individual pages and write the buffer in one swell foop. An x86 system with 4096-byte pages requires somewhere around one million pairs of write operations during the dump copy (a short write of about 26 bytes for the page header and up to 4096 bytes for the page data). Reducing the number of discrete I/O operations should help quite a bit. Here are a couple of thoughts I had: 1) Add another configurable parameter which tells lcrash the format in which a crash dump should be saved separately from how the dump is written by the kernel; e.g., DUMP_SAVE_COMPRESS. This would allow deferring the expensive GZIP algorithm until after the system has rebooted. 2) Allow the user to configure the system such that the dump copy would proceed in parallel with system restart. Today the execution of rc.sysinit stalls until lcrash finishes copying the crash dump.
If the crash dump was written in the swap partition this could be problematic. One thought I had in this area would be to teach lcrash (and the swapper) to co-operatively use the swap pages. Initially all pages would be marked as unavailable, and as lcrash progresses through the swap partition it would release those pages to the swapper. Since the system load (and thus the need for swap space) (theoretically) increases over time, this staged release of swap space should help avoid cases where swap space was needed but lcrash was still busy copying a crash dump. Comments? Tony Dziedzic Storigen Systems, Inc. From owner-lkcd@oss.sgi.com Sat Oct 27 18:43:30 2001 Received: (from majordomo@localhost) by oss.sgi.com (8.11.2/8.11.3) id f9S1hU512151 for lkcd-outgoing; Sat, 27 Oct 2001 18:43:30 -0700 Received: from smtp.alacritech.com (smtp.alacritech.com [209.10.208.82]) by oss.sgi.com (8.11.2/8.11.3) with SMTP id f9S1hS012148 for ; Sat, 27 Oct 2001 18:43:28 -0700 Received: from alacritech.com ([10.1.10.37]) by smtp.alacritech.com (8.11.2/8.11.2) with ESMTP id f9S1hTK02070 for ; Sat, 27 Oct 2001 18:43:29 -0700 Message-ID: <3BDB61D0.C0D40E07@alacritech.com> Date: Sat, 27 Oct 2001 18:39:28 -0700 From: "Matt D. Robinson" Organization: Alacritech, Inc. X-Mailer: Mozilla 4.78 [en] (Windows NT 5.0; U) X-Accept-Language: en MIME-Version: 1.0 To: lkcd@oss.sgi.com Subject: Gzip compression code checked in ... Content-Type: text/plain; charset=us-ascii Content-Transfer-Encoding: 7bit Sender: owner-lkcd@oss.sgi.com Precedence: bulk I've checked in Tony's fixes to complete the gzip compression mechanism, along with a few changes to accommodate current's stack page. So, you should now be able to change your dump configuration parameters (/etc/sysconfig/dump) to allow for gzip compression (DUMP_COMPRESS=2). Feel free to test, I've done some testing myself already. I've also fixed a few other problems along the way, mostly minor modifications that had to get fixed.
Assuming this (and a few more documentation changes) covers all the bases, I'm going to make a 4.1 release. Let me know, thanks, everyone. I think that release will be good enough to put out on linux-kernel. --Matt From owner-lkcd@oss.sgi.com Sun Oct 28 23:40:47 2001 Received: (from majordomo@localhost) by oss.sgi.com (8.11.2/8.11.3) id f9T7elx25534 for lkcd-outgoing; Sun, 28 Oct 2001 23:40:47 -0800 Received: from e21.nc.us.ibm.com (e21.nc.us.ibm.com [32.97.136.227]) by oss.sgi.com (8.11.2/8.11.3) with SMTP id f9T7ee025530 for ; Sun, 28 Oct 2001 23:40:40 -0800 Received: from southrelay03.raleigh.ibm.com (southrelay03.raleigh.ibm.com [9.37.3.210]) by e21.nc.us.ibm.com (8.9.3/8.9.3) with ESMTP id BAA165466; Mon, 29 Oct 2001 01:27:55 -0600 Received: from vamsiks.in.ibm.com (vamsiks.in.ibm.com [9.186.133.18]) by southrelay03.raleigh.ibm.com (8.11.1m3/NCO v5.00) with ESMTP id f9T7ULM72376; Mon, 29 Oct 2001 02:30:22 -0500 Received: (from vamsi@localhost) by vamsiks.in.ibm.com (8.11.2/8.11.2) id f9T7u1X02410; Mon, 29 Oct 2001 13:26:01 +0530 Date: Mon, 29 Oct 2001 13:26:01 +0530 From: "Vamsi Krishna S ." To: "Matt D. Robinson" Cc: shaider@in.ibm.com, lkcd Subject: Re: dump_rle.o doesn't get compiled for new lkcd patch Message-ID: <20011029132601.A2400@in.ibm.com> Reply-To: vamsi@in.ibm.com References: <20011024134905.A22363@in.ibm.com> <3BD9140C.941A1D4F@alacritech.com> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline User-Agent: Mutt/1.2.5i In-Reply-To: <3BD9140C.941A1D4F@alacritech.com>; from yakker@alacritech.com on Fri, Oct 26, 2001 at 12:43:08AM -0700 Sender: owner-lkcd@oss.sgi.com Precedence: bulk Hi Matt, I see that you have already checked in my patch. Yes, it is needed and is not affected by the config.in changes. The problem was that there was no target in the drivers/dump/Makefile that depends on dump_rle.o, so it would never have been compiled, irrespective of the config settings.
The patch I sent previously added a top level target dumpdrv.o that includes all the .o files ever built in the drivers/dump directory (of course, based on config settings). Thanks for checking the patch in yourself :-) Regards, Vamsi. On Fri, Oct 26, 2001 at 12:43:08AM -0700, Matt D. Robinson wrote: > Before you check it in ... is this fixed with the config.in > modification we just made? > > --Matt > > "Vamsi Krishna S ." wrote: > > > > The patch given below should fix it. > > > > Matt, > > > > If it is okay with you, I will check this in. > > > > Vamsi Krishna S. > > Linux Technology Center, > > IBM Software Lab, Bangalore. > > Ph: +91 80 5262355 Extn: 3959 > > Internet: vamsi@in.ibm.com > > > > On Wed, Oct 24, 2001 at 01:13:54PM +0530, shaider@in.ibm.com wrote: > > > > > I'm having some problems with the new lkcd-4.0 patch. The dump_rle.c > > > file in the linux/drivers/dump directory is not getting compiled when I > > > build the kernel. I have configured the kernel with both CONFIG_DUMP and > > > CONFIG_DUMP_COMPRESS_RLE set to y (and not loadable modules). However, the > > > dump_rle.o never gets built. > > > > > -- Vamsi Krishna S. Linux Technology Center, IBM Software Lab, Bangalore. Ph: +91 80 5262355 Extn: 3959 Internet: vamsi@in.ibm.com From owner-lkcd@oss.sgi.com Mon Oct 29 00:02:21 2001 Received: (from majordomo@localhost) by oss.sgi.com (8.11.2/8.11.3) id f9T82LY25923 for lkcd-outgoing; Mon, 29 Oct 2001 00:02:21 -0800 Received: from sgi.com (sgi.SGI.COM [192.48.153.1]) by oss.sgi.com (8.11.2/8.11.3) with SMTP id f9T82B025920 for ; Mon, 29 Oct 2001 00:02:11 -0800 Received: from ausmtp02.au.ibm.com (ausmtp02.au.ibm.COM [202.135.136.105]) by sgi.com (980327.SGI.8.8.8-aspam/980304.SGI-aspam: SGI does not authorize the use of its proprietary systems or networks for unsolicited or bulk email from the Internet.) 
via ESMTP id AAA06185 for ; Mon, 29 Oct 2001 00:01:51 -0800 (PST) mail_from (vamsi_krishna@in.ibm.com) Received: from f02n16e.au.ibm.com by ausmtp02.au.ibm.com (IBM AP 2.0) with ESMTP id f9TIjVc487822; Tue, 30 Oct 2001 04:45:31 +1000 Received: from d23m0060.in.ibm.com (d23m0060.in.ibm.com [9.184.199.175]) by f02n16e.au.ibm.com (8.11.1m3/NCO v4.98) with ESMTP id f9T7loP24544; Mon, 29 Oct 2001 18:47:51 +1100 Importance: Normal Subject: RE: capturing cpu states on SMP To: "James Washer" Cc: lkcd@oss.sgi.com, mvb@amazon.com, bharata@in.ibm.com X-Mailer: Lotus Notes Release 5.0.4a July 24, 2000 Message-ID: From: "S Vamsikrishna" Date: Mon, 29 Oct 2001 13:20:00 +0530 X-MIMETrack: Serialize by Router on d23m0060/23/M/IBM(Release 5.0.8 |June 18, 2001) at 29/10/2001 01:19:08 PM MIME-Version: 1.0 Content-type: text/plain; charset=us-ascii Sender: owner-lkcd@oss.sgi.com Precedence: bulk We send an NMI-class IPI to other cpus to capture the registers and stack. This is the only guaranteed way to ensure that other cpus respond. If they don't respond to NMI, there is absolutely nothing we can do. We don't wait around for our IPI to be handled so even in that case, we don't hang the dump process. We need to capture the stack, even though we would prefer not to. It is an additional complication we would gladly get rid of. The reason is that the stack could change between the time the registers are captured and the time that page is written out in the dumping process. The time between these two events could be rather long when we do deferred dumps (1). The changes in the stack could be so significant as to render backtracing impossible/totally inaccurate. 
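The reason for saving the stack together with the registers can be shown with a toy user-space sketch. This is not the LKCD kernel code: the struct layout, the tiny stack size, and the names capture_cpu_state and snapshot_survives_stack_change are all hypothetical, chosen only to illustrate why a copy taken at capture time stays usable after the live stack has moved on.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define STACK_WORDS 8   /* toy size; a real kernel stack is far larger */

/* Hypothetical register set: just enough to seed a backtrace. */
struct cpu_regs {
    uintptr_t ip;       /* instruction pointer at capture time */
    size_t    sp;       /* stack pointer, kept as an index into the stack */
};

/* State saved for one CPU at the moment the dump is requested. */
struct cpu_snapshot {
    struct cpu_regs regs;
    uintptr_t       stack[STACK_WORDS];
};

/* Copy the registers AND the stack contents, so that later changes to the
 * live stack cannot invalidate the saved stack pointer. */
static void capture_cpu_state(struct cpu_snapshot *snap,
                              const struct cpu_regs *regs,
                              const uintptr_t *live_stack)
{
    snap->regs = *regs;
    memcpy(snap->stack, live_stack, sizeof(snap->stack));
}

/* Returns 1 if the snapshot still holds the return address that was on the
 * stack at capture time, even after the live stack was overwritten. */
static int snapshot_survives_stack_change(void)
{
    uintptr_t live_stack[STACK_WORDS] = { 0 };
    struct cpu_regs regs = { 0xc016adc9u, 5 };
    struct cpu_snapshot snap;

    live_stack[5] = 0xc0116b23u;          /* return address on the stack */
    capture_cpu_state(&snap, &regs, live_stack);

    /* The CPU keeps running before the dump pages are written out. */
    live_stack[5] = 0xdeadbeefu;

    return snap.stack[snap.regs.sp] == 0xc0116b23u &&
           live_stack[regs.sp] != snap.stack[snap.regs.sp];
}
```

With registers alone, the saved stack pointer would point at whatever the live stack holds by the time its page is dumped; the copy keeps the two consistent, at the cost of extra space per CPU.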
(1) Deferred dumps: when we want non-disruptive dumps of a running system for capturing snapshots, we have to ensure that the dump is actually written from a known location in the kernel, one where we hold no locks required by the dump-writing process, have not disabled interrupts, and are not running inside the disk driver or a Dynamic Probes probe handler (a probe could fire from just about any location in the kernel). In these cases what we plan to do is capture the system state (registers/stack for backtracing purposes) immediately on a dump request and wake up a dump daemon (kernel thread) which will do the actual dump writing when it is scheduled. Regards, Crash Dump Team, Linux Technology Center, IBM Software Lab, Bangalore. Ph: +91 80 5262355 Extn: 3959 Internet: vamsi_krishna@in.ibm.com "James Washer" on 10/26/2001 11:12:09 PM Please respond to "James Washer" To: lkcd@oss.sgi.com cc: (bcc: S Vamsikrishna/India/IBM) Subject: RE: capturing cpu states on SMP I'm interested in hearing HOW you capture the register information from processors "(executing a tight loop, interrupts disabled)" Care to let me (us) know? - jim "Monty Vanderbilt" @oss.sgi.com on 10/25/2001 12:38:25 PM Sent by: owner-lkcd@oss.sgi.com To: bharata@linux.ibm.com, cc: Subject: RE: capturing cpu states on SMP Great idea! Why is it necessary to capture the stacks? Those pages should already be in the memory dump. With the registers you should be able to seed the backtrace for any cpu. -----Original Message----- From: owner-lkcd@oss.sgi.com [mailto:owner-lkcd@oss.sgi.com]On Behalf Of Bharata B Rao Sent: Thursday, October 25, 2001 3:29 AM To: lkcd@oss.sgi.com Subject: capturing cpu states on SMP Hello, This note is just a heads up to avoid duplicating our efforts. We are working on capturing the registers and stack on all the cpus at the time of dumping. 
This has been found to be crucial to debug problems where some of the cpus on an SMP are hung (executing a tight loop, interrupts disabled). We have this working in the kernel side. We have also added a command to display the saved registers in the lcrash. We need to add some bits to lcrash so that it can look at the right (saved) stack when back tracing. Comments? -- Crash Dump Team, IBM Linux Technology Center, IBM Software Lab, Bangalore. Ph: 91-80-5262355 Ex: 3962 Mail: bharata@in.ibm.com From owner-lkcd@oss.sgi.com Mon Oct 29 17:49:54 2001 Received: (from majordomo@localhost) by oss.sgi.com (8.11.2/8.11.3) id f9U1nsU29857 for lkcd-outgoing; Mon, 29 Oct 2001 17:49:54 -0800 Received: from smtp.alacritech.com (smtp.alacritech.com [209.10.208.82]) by oss.sgi.com (8.11.2/8.11.3) with SMTP id f9U1nm029854 for ; Mon, 29 Oct 2001 17:49:49 -0800 Received: from alacritech.com (lambda.alacritech.com [10.1.1.32]) by smtp.alacritech.com (8.11.2/8.11.2) with ESMTP id f9U1ncK03205; Mon, 29 Oct 2001 17:49:38 -0800 Message-ID: <3BDE088C.A96AF1F1@alacritech.com> Date: Mon, 29 Oct 2001 17:55:24 -0800 From: "Matt D. Robinson" Organization: Alacritech, Inc. X-Mailer: Mozilla 4.76 [en] (X11; U; Linux 2.4.2-2smp i686) X-Accept-Language: en MIME-Version: 1.0 To: vamsi@in.ibm.com CC: shaider@in.ibm.com, lkcd Subject: Re: dump_rle.o doesn't get compiled for new lkcd patch References: <20011024134905.A22363@in.ibm.com> <3BD9140C.941A1D4F@alacritech.com> <20011029132601.A2400@in.ibm.com> Content-Type: text/plain; charset=us-ascii Content-Transfer-Encoding: 7bit Sender: owner-lkcd@oss.sgi.com Precedence: bulk "Vamsi Krishna S ." wrote: > > Hi Matt, > > I see that you have already checked in my patch. Yes, it is needed > and is not affected by the config.in changes. The problem was > that there was no target in the drivers/dump/Makefile that depends > on dump_rle.o, so it would never have been compiled, irrespective > of the config settings.
The patch I sent previously added a > top level target dumpdrv.o that includes all the .o files ever > built in the drivers/dump directory (of course, based on config settings). > > Thanks for checking the patch in yourself :-) Yes, thanks for sending it along. :) I figured I'd check it in after validating what you put in -- yes, you're right, it was necessary (follows the scsidrv.o model). --Matt > Regards, > Vamsi. > > On Fri, Oct 26, 2001 at 12:43:08AM -0700, Matt D. Robinson wrote: > > Before you check it in ... is this fixed with the config.in > > modification we just made? > > > > --Matt > > > > "Vamsi Krishna S ." wrote: > > > > > > The patch given below should fix it. > > > > > > Matt, > > > > > > If it is okay with you, I will check this in. > > > > > > Vamsi Krishna S. > > > Linux Technology Center, > > > IBM Software Lab, Bangalore. > > > Ph: +91 80 5262355 Extn: 3959 > > > Internet: vamsi@in.ibm.com > > > > > > On Wed, Oct 24, 2001 at 01:13:54PM +0530, shaider@in.ibm.com wrote: > > > > > > > I'm having some problems with the new lkcd-4.0 patch. The dump_rle.c > > > > file in the linux/drivers/dump directory is not getting compiled when I > > > > build the kernel. I have configured the kernel with both CONFIG_DUMP and > > > > CONFIG_DUMP_COMPRESS_RLE set to y (and not loadable modules). However, the > > > > dump_rle.o never gets built. > > > > > > > > > -- > Vamsi Krishna S. > Linux Technology Center, > IBM Software Lab, Bangalore. > Ph: +91 80 5262355 Extn: 3959 > Internet: vamsi@in.ibm.com