From owner-lkcd@oss.sgi.com Mon Sep 3 02:22:10 2001
Received: (from majordomo@localhost) by oss.sgi.com (8.11.2/8.11.3) id f839MAq21266 for lkcd-outgoing; Mon, 3 Sep 2001 02:22:10 -0700
Received: from ausmtp02.au.ibm.com (ausmtp02.au.ibm.COM [202.135.136.105]) by oss.sgi.com (8.11.2/8.11.3) with SMTP id f839M4d21262 for ; Mon, 3 Sep 2001 02:22:04 -0700
Received: from f02n16e.au.ibm.com by ausmtp02.au.ibm.com (IBM AP 2.0) with ESMTP id f839JSo153952; Mon, 3 Sep 2001 19:19:29 +1000
Received: from d73mta01.au.ibm.com (f06n01s [9.185.166.65]) by f02n16e.au.ibm.com (8.11.1m3/NCO v4.97.1) with SMTP id f839LrB34114; Mon, 3 Sep 2001 19:21:53 +1000
Received: by d73mta01.au.ibm.com(Lotus SMTP MTA v4.6.5 (863.2 5-20-1999)) id CA256ABC.00337123 ; Mon, 3 Sep 2001 19:21:52 +1000
X-Lotus-FromDomain: IBMIN@IBMAU
From: r1vamsi@in.ibm.com
To: "Matt D. Robinson"
cc: richard.schaal@intel.com, lkcd@oss.sgi.com
Message-ID:
Date: Mon, 3 Sep 2001 15:25:04 +0530
Subject: Re: LKCD + KDB ?
Mime-Version: 1.0
Content-type: text/plain; charset=us-ascii
Content-Disposition: inline
Sender: owner-lkcd@oss.sgi.com
Precedence: bulk

When both KDB and LKCD patches are applied, we drop into KDB on an oops.
dump_execute will be called after we exit the debugger.

If all you want is to disable dump taking after exiting the debugger, that is
easy enough by editing the dump_okay flag from within the debugger (or by
adding a kdb command to do this), as Matt points out. Assuming there is a good
reason for wanting to take the dump from within the debugger, one should
add a simple dump command to kdb, which will just call dump_execute with
proper regs. What you could do today is to set eip to dump_execute from
within the kernel, editing the stack to push correct params :-) (not as
hard as it sounds, really)

However, the cleaner approach obviously is to add the kdb dump command,
once we understand a little better why exactly one would want to dump from
within the debugger (on an oops).

Regards.. Vamsi.

Vamsi Krishna S.
Linux Technology Center, IBM Software Lab, Bangalore. Ph: +91 80 5262355 Extn: 3959 Internet: r1vamsi@in.ibm.com Please respond to "Matt D. Robinson" To: richard.schaal@intel.com cc: lkcd@oss.sgi.com (bcc: S Vamsikrishna/India/IBM) Subject: Re: LKCD + KDB ? Richard Schaal wrote: > > > My question is this - I have been a fan of the kernel debugger for some > time, and have had a bit of difficulty > resolving how to configure both capabilities into my kernel. I guess > what I'd like to have happen is to > have the system enter the debugger on an oops, then have the option of > dumping the system from the debugger, or > to dump the system automatically after the debugger is exited. There's no great way to do this right now. If in kdb you can set the field of 'dump_okay' field to FALSE, then reset it after dropping back from the debugger state, that'd be fine. I guess we could also add in something for kdb, a one-time thing, so kdb can set dump_kdb to TRUE, and when dump_execute() gets called, dump_kdb is checked, and if set to TRUE, resets it to FALSE. Then add a kdb command that sets the field for you ... Would that work? --Matt > What is your thinking on this? Did I goof something up in applying the > patches for the two features? > > Thanks, > Richard > > -- > Richard.Schaal@intel.com Intel Corporation > Ph: (408)765-1579 Richard Schaal > Mail Stop SC12-308 > 3600 Juliette Lane > "I can type faster than I think!" 
Santa Clara, CA 95052

From owner-lkcd@oss.sgi.com Mon Sep 3 07:07:38 2001
Received: (from majordomo@localhost) by oss.sgi.com (8.11.2/8.11.3) id f83E7c008761 for lkcd-outgoing; Mon, 3 Sep 2001 07:07:38 -0700
Received: from exg.allot.com (mail.allot.com [199.203.223.202]) by oss.sgi.com (8.11.2/8.11.3) with SMTP id f83E7Zd08742 for ; Mon, 3 Sep 2001 07:07:36 -0700
Received: from allot.com (FELIX [172.16.1.37]) by exg.allot.com with SMTP (Microsoft Exchange Internet Mail Service Version 5.5.2653.13) id RBFMTW0N; Mon, 3 Sep 2001 17:13:42 +0200
Message-ID: <3B938EAD.9C8D4E92@allot.com>
Date: Mon, 03 Sep 2001 17:07:41 +0300
From: Felix Radensky
Organization: Allot Communications Ltd.
X-Mailer: Mozilla 4.77 [en] (X11; U; Linux 2.2.19c i686)
X-Accept-Language: en
MIME-Version: 1.0
To: lkcd@oss.sgi.com
Subject: Using latest CVS sources
Content-Type: text/plain; charset=us-ascii
Content-Transfer-Encoding: 7bit
Sender: owner-lkcd@oss.sgi.com
Precedence: bulk

Hi,

Can someone please explain how I can use the latest CVS sources
with kernel 2.2.19?

Thanks in advance.

Felix.
From owner-lkcd@oss.sgi.com Tue Sep 4 00:28:23 2001
Received: (from majordomo@localhost) by oss.sgi.com (8.11.2/8.11.3) id f847SNT12709 for lkcd-outgoing; Tue, 4 Sep 2001 00:28:23 -0700
Received: from fgwmail7.fujitsu.co.jp (fgwmail7.fujitsu.co.jp [192.51.44.37]) by oss.sgi.com (8.11.2/8.11.3) with SMTP id f847SKd12706 for ; Tue, 4 Sep 2001 00:28:20 -0700
Received: from m5.gw.fujitsu.co.jp by fgwmail7.fujitsu.co.jp (8.9.3/3.7W-MX0108-Fujitsu Gateway) id QAA23899 for ; Tue, 4 Sep 2001 16:28:14 +0900 (JST) (envelope-from naomi@pst.fujitsu.com)
From: naomi@pst.fujitsu.com
Received: from naomi.aoi.pst.fujitsu.com by m5.gw.fujitsu.co.jp (8.9.3/3.7W-0108-Fujitsu Domain Master) id QAA31473 for ; Tue, 4 Sep 2001 16:28:13 +0900 (envelope-from naomi@pst.fujitsu.com)
Received: from localhost (IDENT:naomi@localhost [127.0.0.1]) by naomi.aoi.pst.fujitsu.com (8.9.3/8.9.3) with ESMTP id QAA16409 for ; Tue, 4 Sep 2001 16:27:53 +0900
To: lkcd@oss.sgi.com
Subject: lcrash sub-commands line completion
X-Mailer: Mew version 1.92.4 on Emacs 19.34 / Mule 2.3 (SUETSUMUHANA)
Mime-Version: 1.0
Content-Type: Text/Plain; charset=us-ascii
Content-Transfer-Encoding: 7bit
Message-Id: <20010904162753R.naomi@pst.fujitsu.com>
Date: Tue, 04 Sep 2001 16:27:53 +0900
X-Dispatcher: imput version 980905(IM100)
Lines: 34
Sender: owner-lkcd@oss.sgi.com
Precedence: bulk

Hello.
Recently I have been thinking that lcrash should have "sub-command line
completion".

Lcrash has many sub-commands, and almost all of them take parameters, such as
a file name or symbol name, that must be specified.
The present lcrash cannot complete a sub-command line.
For this reason, we have to memorize sub-command names and parameters exactly.
This is very inconvenient.
So I will add completion capability to librl.

My plan is as follows.
While a sub-command line is being edited, if the TAB key is pressed, lcrash
completes the line (or does something similar to what bash does).
Lcrash will complete sub-command names with behavior almost equivalent to
bash.
Because the parameters of the sub-commands differ in character from one
another, I will also add a mechanism that lets you write your own completion
function. Using this mechanism, you can have a function of your choice called
when the TAB key is pressed.

As the first phase, I will show completion of sub-command names by the
middle of September.
As the next phase, I will show the mechanism for sub-command parameter
completion, with some sample source that uses it.

Is anybody else considering sub-command line completion?
Any comments and suggestions are welcome.

Naomi Haseo

From owner-lkcd@oss.sgi.com Tue Sep 4 00:44:51 2001
Received: (from majordomo@localhost) by oss.sgi.com (8.11.2/8.11.3) id f847ipV12960 for lkcd-outgoing; Tue, 4 Sep 2001 00:44:51 -0700
Received: from d12lmsgate.de.ibm.com (d12lmsgate.de.ibm.com [195.212.91.199]) by oss.sgi.com (8.11.2/8.11.3) with SMTP id f847ikd12957 for ; Tue, 4 Sep 2001 00:44:46 -0700
Received: from d12relay02.de.ibm.com (d12relay02.de.ibm.com [9.165.215.23]) by d12lmsgate.de.ibm.com (1.0.0) with ESMTP id JAA41950; Tue, 4 Sep 2001 09:44:32 +0200
Received: from d12ml004.de.ibm.com (d12ml004_cs0 [9.165.223.50]) by d12relay02.de.ibm.com (8.11.1m3/NCO v4.97.1) with ESMTP id f847hC4217940; Tue, 4 Sep 2001 09:43:12 +0200
Importance: Normal
Subject: Re: lcrash sub-commands line completion
To: naomi@pst.fujitsu.com
Cc: lkcd@oss.sgi.com
X-Mailer: Lotus Notes Release 5.0.3 March 21, 2000
Message-ID:
From: "Michael Holzheu"
Date: Tue, 4 Sep 2001 09:40:53 +0200
X-MIMETrack: Serialize by Router on D12ML004/12/M/IBM(Release 5.0.8 |June 18, 2001) at 04/09/2001 09:39:03
MIME-Version: 1.0
Content-type: text/plain; charset=us-ascii
Sender: owner-lkcd@oss.sgi.com
Precedence: bulk

Naomi,

Great! Command completion is really a feature that makes lcrash much
more user-friendly!
Michael ------------------------------------------------------------------------ Linux/390 Development Phone: +49-7031-16-2360, Bld 71032-06-109 Email: holzheu@de.ibm.com naomi@pst.fujitsu.com@oss.sgi.com on 09/04/2001 09:27:53 AM Please respond to naomi@pst.fujitsu.com Sent by: owner-lkcd@oss.sgi.com To: lkcd@oss.sgi.com cc: Subject: lcrash sub-commands line completion Hello. Recently, I think that lcrash should have "sub-commands line completion". Lcrash has many sub-commands. And almost sub-commands have parameters such as filename or symbol name which should be specified. The present lcrash cannot complete on sub-commands line. For this reason, we have to memorize sub-commands names and parameters exactly. It is very inconvenient. So I'll add completion capability to librl. I'm considering as follows. While editing sub-commands line, if TAB key is pressed, lcrash completes the line (or do something as bash does). Lcrash will complete on sub-commands names with behavior almost equivalent to bash. And I consider that parameters of sub-commands have different characteristic each other, I'll add the mechanism let you be able to make your own completion function. Using this mechanism, you can call the function that behaves as you want when TAB key is pressed. As the first phase, I will show the completion on sub-commands names by the middle of the month in September. And as the next phase, I will show the mechanism of sub-commands parameters completion with some sample source using it. Is anybody considering sub-commands line completion? Any comments and suggestions are welcomed. 
Naomi Haseo

From owner-lkcd@oss.sgi.com Tue Sep 4 01:06:10 2001
Received: (from majordomo@localhost) by oss.sgi.com (8.11.2/8.11.3) id f8486Ag13342 for lkcd-outgoing; Tue, 4 Sep 2001 01:06:10 -0700
Received: from fgwmail7.fujitsu.co.jp (fgwmail7.fujitsu.co.jp [192.51.44.37]) by oss.sgi.com (8.11.2/8.11.3) with SMTP id f84867d13338 for ; Tue, 4 Sep 2001 01:06:07 -0700
Received: from m5.gw.fujitsu.co.jp by fgwmail7.fujitsu.co.jp (8.9.3/3.7W-MX0108-Fujitsu Gateway) id RAA03736 for ; Tue, 4 Sep 2001 17:06:01 +0900 (JST) (envelope-from m-kotani@pst.fujitsu.com)
Received: from classic.aoi.pst.fujitsu.com by m5.gw.fujitsu.co.jp (8.9.3/3.7W-0108-Fujitsu Domain Master) id RAA14789 for ; Tue, 4 Sep 2001 17:06:00 +0900 (envelope-from m-kotani@pst.fujitsu.com)
Received: from doll (doll.aoi.pst.fujitsu.com [172.23.72.214]) by classic.aoi.pst.fujitsu.com (8.9.3/8.9.3) with SMTP id RAA06417 for ; Tue, 4 Sep 2001 17:06:00 +0900
Message-ID: <006201c13518$869ce140$d64817ac@aoi.pst.fujitsu.com>
From: "Masashige Kotani"
To:
Subject: multiple dump devices
Date: Tue, 4 Sep 2001 17:06:39 +0900
MIME-Version: 1.0
Content-Type: text/plain; charset="iso-2022-jp"
Content-Transfer-Encoding: 7bit
X-Priority: 3
X-MSMail-Priority: Normal
X-Mailer: Microsoft Outlook Express 5.00.2615.200
X-MimeOLE: Produced By Microsoft MimeOLE V5.00.2615.200
Sender: owner-lkcd@oss.sgi.com
Precedence: bulk

Hello.

I have been thinking that the reliability of memory dumping, that is,
capturing as much memory as possible, should be improved.

LKCD uses only one dump device during a memory dump. When that device does
not have enough capacity for the dump, or cannot be used for some reason,
the memory dump fails.

- When additional memory is attached, the capacity of the dump device must
  be increased.
- To avoid a failed memory dump caused by disk failure, we want to add
  alternative dump devices.

In these cases, I think LKCD has to handle multiple dump devices to be
useful in different environments.
This would address the following problems:
- When one dump device runs short of capacity:
  dump data can be divided and written to two or more dump devices.
- When a dump device is broken:
  LKCD can still dump if it can use at least one of the dump devices.

I think that such an extension is indispensable for enterprise use. What do
you think?

If possible, LKCD could also write to multiple dump devices with parallel
I/O, which would decrease the dump time, but this is still under examination.

--Masashige

From owner-lkcd@oss.sgi.com Tue Sep 4 01:16:26 2001
Received: (from majordomo@localhost) by oss.sgi.com (8.11.2/8.11.3) id f848GQj13505 for lkcd-outgoing; Tue, 4 Sep 2001 01:16:26 -0700
Received: from nakedeye.aparity.com (w032.z064001165.sjc-ca.dsl.cnc.net [64.1.165.32]) by oss.sgi.com (8.11.2/8.11.3) with SMTP id f848GMd13500 for ; Tue, 4 Sep 2001 01:16:22 -0700
Received: from alacritech.com (w032.z064001165.sjc-ca.dsl.cnc.net [64.1.165.32]) by nakedeye.aparity.com (8.11.2/8.11.2) with ESMTP id f5G641J02782; Fri, 15 Jun 2001 23:04:01 -0700
Message-ID: <3B948D4F.9D7257B1@alacritech.com>
Date: Tue, 04 Sep 2001 01:14:07 -0700
From: "Matt D. Robinson"
X-Mailer: Mozilla 4.75 [en] (Win98; U)
X-Accept-Language: en
MIME-Version: 1.0
To: naomi@pst.fujitsu.com
CC: lkcd@oss.sgi.com
Subject: Re: lcrash sub-commands line completion
References: <20010904162753R.naomi@pst.fujitsu.com>
Content-Type: text/plain; charset=us-ascii
Content-Transfer-Encoding: 7bit
Sender: owner-lkcd@oss.sgi.com
Precedence: bulk

This sounds like a great thing to add. I have no problems with
it. Note that we used to have a readline capability, but we
removed it due to some of the GPL/LGPL licensing conflicts.

Please let me know if you complete this in the future. I'm
still planning to roll a 4.0 release as soon as I talk to the
IBM folks about the last code drop I gave them.
For those who are working directly in the tree, you'll note we're now moving from 'vmdump' to 'dump' conventions, and hopefully all the future scripts will use this as well. Also, I spoke to someone at MCL, and we'll see how we can roll in mcore into the LKCD project in some capacity. Have at it, Naomi-san. :) --Matt naomi@pst.fujitsu.com wrote: > > Hello. > Recently, I think that lcrash should have "sub-commands line completion". > > Lcrash has many sub-commands. And almost sub-commands have parameters such as > filename or symbol name which should be specified. > The present lcrash cannot complete on sub-commands line. > For this reason, we have to memorize sub-commands names and parameters exactly. > It is very inconvenient. > So I'll add completion capability to librl. > > I'm considering as follows. > While editing sub-commands line, if TAB key is pressed, lcrash completes the > line (or do something as bash does). > Lcrash will complete on sub-commands names with behavior almost equivalent to > bash. > And I consider that parameters of sub-commands have different characteristic > each other, I'll add the mechanism let you be able to make your own completion > function. Using this mechanism, you can call the function that behaves as you > want when TAB key is pressed. > > As the first phase, I will show the completion on sub-commands names by the > middle of the month in September. > And as the next phase, I will show the mechanism of sub-commands parameters > completion with some sample source using it. > > Is anybody considering sub-commands line completion? > Any comments and suggestions are welcomed. 
> > Naomi Haseo From owner-lkcd@oss.sgi.com Tue Sep 4 06:53:51 2001 Received: (from majordomo@localhost) by oss.sgi.com (8.11.2/8.11.3) id f84Drpu21154 for lkcd-outgoing; Tue, 4 Sep 2001 06:53:51 -0700 Received: from baucis.sc.intel.com (ns3.intel.com [143.183.152.22]) by oss.sgi.com (8.11.2/8.11.3) with SMTP id f84Drjd21140 for ; Tue, 4 Sep 2001 06:53:45 -0700 Received: from SMTP (fmsmsxvs03-1.fm.intel.com [132.233.42.203]) by baucis.sc.intel.com (8.9.1a+p1/8.9.1/d: relay.m4,v 1.41 2001/07/09 21:06:22 root Exp $) with SMTP id NAA07723; Tue, 4 Sep 2001 13:53:38 GMT Received: from fmsmsx26.fm.intel.com ([132.233.48.26]) by 132.233.48.203 (Norton AntiVirus for Internet Email Gateways 1.0) ; Tue, 04 Sep 2001 13:53:37 0000 (GMT) Received: by fmsmsx26.fm.intel.com with Internet Mail Service (5.5.2653.19) id ; Tue, 4 Sep 2001 06:53:37 -0700 Message-ID: <10C8636AE359D4119118009027AE99870CE2F95B@FMSMSX34> From: "Howell, David P" To: "'Masashige Kotani'" , lkcd@oss.sgi.com Subject: RE: multiple dump devices Date: Tue, 4 Sep 2001 06:53:33 -0700 MIME-Version: 1.0 X-Mailer: Internet Mail Service (5.5.2653.19) Content-Type: text/plain; charset="iso-2022-jp" Sender: owner-lkcd@oss.sgi.com Precedence: bulk We are working on a proposal for redundant dump device support that I plan to share in the next few weeks; I've got a prototype mostly working that can be contributed. Let me know how you are approaching this, I'll send details of what we are doing here later this week. Sounds like a good opportunity for collaboration on this. Regards, Dave Howell -----Original Message----- From: Masashige Kotani [mailto:m-kotani@pst.fujitsu.com] Sent: Tuesday, September 04, 2001 4:07 AM To: lkcd@oss.sgi.com Subject: multiple dump devices Hello. Nowadays, I think that the reliability of memory dumping: extracting memory as much as possible will be improved. LKCD uses "only one dump device" in process of memory dump. 
When it does not have enough capacity for memory dump it is not able to be used by some reasons, memory dumping is failure. - When additional memory devices are attached, The capacity of the dump device must be increased. - To avoid failing memory dump by disk failure, want to add alternative dump devices. In these cases, I consider that LKCD have to be handle multiple dump devices to be useful in different environments. It can improve the following problems: - When it runs short of capacity in one dump device Dump data can be divided and written in two or more dump devices. - When the dump device is broken The LKCD can dump, If it can use at least one among dump devices. I think that such expansion is indispensable for enterprises use, what do you think? LKCD dumps to multiple dump devices with parallel I/O if possible and time of dumping can be decreased. but it is still under examination. --Masashige From owner-lkcd@oss.sgi.com Tue Sep 4 08:21:52 2001 Received: (from majordomo@localhost) by oss.sgi.com (8.11.2/8.11.3) id f84FLqo24064 for lkcd-outgoing; Tue, 4 Sep 2001 08:21:52 -0700 Received: from socal.sandiegoca.ncr.com (tan7.ncr.com [192.127.94.7]) by oss.sgi.com (8.11.2/8.11.3) with SMTP id f84FLjd24060 for ; Tue, 4 Sep 2001 08:21:45 -0700 Received: from eswssol002.elsegundoca.ncr.com (eswssol002 [141.206.1.4]) by socal.sandiegoca.ncr.com (8.9.3+Sun/8.9.2) with ESMTP id IAA11433; Tue, 4 Sep 2001 08:21:37 -0700 (PDT) Received: (from kim@localhost) by eswssol002.elsegundoca.ncr.com (8.9.3+Sun/8.9.2) id IAA18430; Tue, 4 Sep 2001 08:21:35 -0700 (PDT) Date: Tue, 4 Sep 2001 08:21:35 -0700 From: Moo Kim To: r1vamsi@in.ibm.com Cc: "Matt D. Robinson" , richard.schaal@intel.com, lkcd@oss.sgi.com Subject: Re: LKCD + KDB ? 
Message-ID: <20010904082135.A17366@mailbox.ElSegundoCA.NCR.COM>
References:
Mime-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
User-Agent: Mutt/1.2.5i
In-Reply-To: ; from r1vamsi@in.ibm.com on Mon, Sep 03, 2001 at 03:25:04PM +0530
Sender: owner-lkcd@oss.sgi.com
Precedence: bulk

I agree that adding a dump (or sysdump) command to KDB would be very
useful. When the node drops into KDB from an oops, one may not have time
to examine the oops online (or the person at the console may be a test
engineer rather than the developer), so one may choose (or be asked) to
take a memory dump instead and analyze the problem later.

Thanks,

Moo Kim
Moo.Kim@NCR.COM
NCR Corporation

On Mon, Sep 03, 2001 at 03:25:04PM +0530, r1vamsi@in.ibm.com wrote:
> When both KDB and LKCD patches are applied, we drop into KDB on an oops.
> dump_execute will be called after we exit the debugger.
>
> If all you want is to disable dump taking after exiting debugger, that is
> easy enough with editing the dump_okay flag from within the debugger (or
> add a kdb command to do this) as Matt points out. Assuming there is a good
> reason for wanting to take the dump from within the debugger, one should
> add a simple dump command to kdb, which will just call dump_execute with
> proper regs. What you could do today is to set eip to dump_execute from
> with in the kernel, editing the stack to push correct params :-) (not as
> hard as it sounds, really)
>
> However, the cleaner approach obviously is to add the kdb dump command,
> once we understand a little better why exactly would one want to dump from
> within the debugger (on an oops).
>
> Regards.. Vamsi.
>
> Vamsi Krishna S.
> Linux Technology Center,
> IBM Software Lab, Bangalore.
> Ph: +91 80 5262355 Extn: 3959
> Internet: r1vamsi@in.ibm.com
>
>
> Please respond to "Matt D. Robinson"
>
> To: richard.schaal@intel.com
> cc: lkcd@oss.sgi.com (bcc: S Vamsikrishna/India/IBM)
> Subject: Re: LKCD + KDB ?
> > > > Richard Schaal wrote: > > > > > > My question is this - I have been a fan of the kernel debugger for some > > time, and have had a bit of difficulty > > resolving how to configure both capabilities into my kernel. I guess > > what I'd like to have happen is to > > have the system enter the debugger on an oops, then have the option of > > dumping the system from the debugger, or > > to dump the system automatically after the debugger is exited. > > There's no great way to do this right now. If in kdb you can set the > field of 'dump_okay' field to FALSE, then reset it after dropping back > from the debugger state, that'd be fine. I guess we could also add in > something for kdb, a one-time thing, so kdb can set dump_kdb to TRUE, > and when dump_execute() gets called, dump_kdb is checked, and if set > to TRUE, resets it to FALSE. Then add a kdb command that sets the > field for you ... > > Would that work? > > --Matt > > > What is your thinking on this? Did I goof something up in applying the > > patches for the two features? > > > > Thanks, > > Richard > > > > -- > > Richard.Schaal@intel.com Intel Corporation > > Ph: (408)765-1579 Richard Schaal > > Mail Stop SC12-308 > > 3600 Juliette Lane > > "I can type faster than I think!" 
Santa Clara, CA 95052
>

From owner-lkcd@oss.sgi.com Tue Sep 4 09:13:27 2001
Received: (from majordomo@localhost) by oss.sgi.com (8.11.2/8.11.3) id f84GDR826197 for lkcd-outgoing; Tue, 4 Sep 2001 09:13:27 -0700
Received: from baucis.sc.intel.com (ns3.intel.com [143.183.152.22]) by oss.sgi.com (8.11.2/8.11.3) with SMTP id f84GDLd26192 for ; Tue, 4 Sep 2001 09:13:21 -0700
Received: from SMTP (fmsmsxvs03-1.fm.intel.com [132.233.42.203]) by baucis.sc.intel.com (8.9.1a+p1/8.9.1/d: relay.m4,v 1.41 2001/07/09 21:06:22 root Exp $) with SMTP id QAA25092; Tue, 4 Sep 2001 16:13:13 GMT
Received: from fmsmsx26.fm.intel.com ([132.233.48.26]) by 132.233.48.203 (Norton AntiVirus for Internet Email Gateways 1.0) ; Tue, 04 Sep 2001 16:13:11 0000 (GMT)
Received: by fmsmsx26.fm.intel.com with Internet Mail Service (5.5.2653.19) id ; Tue, 4 Sep 2001 09:13:10 -0700
Message-ID: <68843F808BE5D311AC6100A0C9C5786648485A@fmsmsx50.fm.intel.com>
From: "Schaal, Richard"
To: "'r1vamsi@in.ibm.com'" , "Matt D. Robinson"
Cc: "Schaal, Richard" , lkcd@oss.sgi.com
Subject: RE: LKCD + KDB ?
Date: Tue, 4 Sep 2001 09:13:05 -0700
MIME-Version: 1.0
X-Mailer: Internet Mail Service (5.5.2653.19)
Content-Type: text/plain; charset="iso-8859-1"
Sender: owner-lkcd@oss.sgi.com
Precedence: bulk

I think it would be relatively simple to have the dump_init code register a
dump system function with the kernel debugger so that you could dump the
system on demand. Note that not all problems are oops related: for a hung
system, or one that is grossly underperforming, it would be useful to get a
snapshot of the activity at that time. Manual entry to the debugger and a
manual dump would seem to be a useful thing. System survivability after such
a dump would be nice, but is not a show stopper at this point.
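
[Editorial sketch] The registration idea in the paragraph above can be modelled in userspace roughly as follows. This is only an illustration under stated assumptions: `dbg_register`, `dbg_run`, `sim_dump_init` and `sim_dump_execute` are hypothetical names invented for this sketch, not the KDB or LKCD interfaces, and the stand-in dump routine only counts calls where the real `dump_execute` would be passed the register state.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical userspace model; none of these names are real KDB or LKCD APIs. */

typedef int (*dbg_cmd_fn)(int argc, const char **argv);

struct dbg_cmd {
    const char *name;
    dbg_cmd_fn  fn;
};

#define DBG_MAX_CMDS 16

static struct dbg_cmd dbg_cmds[DBG_MAX_CMDS];
static size_t dbg_ncmds;

/* Register a debugger command.
 * Returns 0 on success, -1 if the table is full or the name is already taken. */
int dbg_register(const char *name, dbg_cmd_fn fn)
{
    assert(name != NULL && fn != NULL);
    if (dbg_ncmds == DBG_MAX_CMDS)
        return -1;
    for (size_t i = 0; i < dbg_ncmds; i++)
        if (strcmp(dbg_cmds[i].name, name) == 0)
            return -1;
    dbg_cmds[dbg_ncmds].name = name;
    dbg_cmds[dbg_ncmds].fn = fn;
    dbg_ncmds++;
    return 0;
}

/* Look up and run a command typed at the debugger prompt; -1 if unknown. */
int dbg_run(const char *name)
{
    const char *argv[] = { name, NULL };
    for (size_t i = 0; i < dbg_ncmds; i++)
        if (strcmp(dbg_cmds[i].name, name) == 0)
            return dbg_cmds[i].fn(1, argv);
    return -1;
}

int sim_dumps_taken;   /* stands in for the side effect of writing a dump */

/* Stand-in for dump_execute(): only counts how often a dump was requested. */
int sim_dump_execute(const char *reason)
{
    (void)reason;
    sim_dumps_taken++;
    return 0;
}

/* The "dump" command body: a manual, on-demand dump from the debugger. */
int dbg_cmd_dump(int argc, const char **argv)
{
    (void)argc;
    (void)argv;
    return sim_dump_execute("manual dump from debugger");
}

/* Stand-in for dump_init(): the dump driver registers its command once. */
int sim_dump_init(void)
{
    return dbg_register("dump", dbg_cmd_dump);
}
```

In this model the debugger knows nothing about dumping until `sim_dump_init()` has run; before that, `dbg_run("dump")` simply reports an unknown command, which mirrors a kernel built with the debugger but without the dump driver.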
So far as the dumping or not after an oops and entering kdb, there is a differentiation as to the reason for entering the debugger - you might derive a dump/no dump directive from whether you enter the debugger by reason of breakpoint or oops? I used to work for Stratus Computer - at that time, a panic or oops would put us into the debugger, and if we were successful in patching up the problem, the system could resume execution. In Linux, after an oops, maybe a "nodump" command would be useful as well to disable the dumping that might normally occur. Regards, Richard -----Original Message----- From: r1vamsi@in.ibm.com [mailto:r1vamsi@in.ibm.com] Sent: Monday, September 03, 2001 2:55 AM To: Matt D. Robinson Cc: richard.schaal@intel.com; lkcd@oss.sgi.com Subject: Re: LKCD + KDB ? When both KDB and LKCD patches are applied, we drop into KDB on an oops. dump_execute will be called after we exit the debugger. If all you want is to disable dump taking after exiting debugger, that is easy enough with editing the dump_okay flag from within the debugger (or add a kdb command to do this) as Matt points out. Assuming there is a good reason for wanting to take the dump from within the debugger, one should add a simple dump command to kdb, which will just call dump_execute with proper regs. What you could do today is to set eip to dump_execute from with in the kernel, editing the stack to push correct params :-) (not as hard as it sounds, really) However, the cleaner approach obviously is to add the kdb dump command, once we understand a little better why exactly would one want to dump from within the debugger (on an oops). Regards.. Vamsi. Vamsi Krishna S. Linux Technology Center, IBM Software Lab, Bangalore. Ph: +91 80 5262355 Extn: 3959 Internet: r1vamsi@in.ibm.com Please respond to "Matt D. Robinson" To: richard.schaal@intel.com cc: lkcd@oss.sgi.com (bcc: S Vamsikrishna/India/IBM) Subject: Re: LKCD + KDB ? 
Richard Schaal wrote: > > > My question is this - I have been a fan of the kernel debugger for some > time, and have had a bit of difficulty > resolving how to configure both capabilities into my kernel. I guess > what I'd like to have happen is to > have the system enter the debugger on an oops, then have the option of > dumping the system from the debugger, or > to dump the system automatically after the debugger is exited. There's no great way to do this right now. If in kdb you can set the field of 'dump_okay' field to FALSE, then reset it after dropping back from the debugger state, that'd be fine. I guess we could also add in something for kdb, a one-time thing, so kdb can set dump_kdb to TRUE, and when dump_execute() gets called, dump_kdb is checked, and if set to TRUE, resets it to FALSE. Then add a kdb command that sets the field for you ... Would that work? --Matt > What is your thinking on this? Did I goof something up in applying the > patches for the two features? > > Thanks, > Richard > > -- > Richard.Schaal@intel.com Intel Corporation > Ph: (408)765-1579 Richard Schaal > Mail Stop SC12-308 > 3600 Juliette Lane > "I can type faster than I think!" Santa Clara, CA 95052 From owner-lkcd@oss.sgi.com Tue Sep 4 16:27:18 2001 Received: (from majordomo@localhost) by oss.sgi.com (8.11.2/8.11.3) id f84NRIc03514 for lkcd-outgoing; Tue, 4 Sep 2001 16:27:18 -0700 Received: from smtp.alacritech.com (smtp.alacritech.com [209.10.208.82]) by oss.sgi.com (8.11.2/8.11.3) with SMTP id f84NR8d03511 for ; Tue, 4 Sep 2001 16:27:08 -0700 Received: from alacritech.com (lambda.alacritech.com [10.1.1.32]) by smtp.alacritech.com (8.11.0/8.11.0) with ESMTP id f84NMDO00430; Tue, 4 Sep 2001 16:22:13 -0700 Message-ID: <3B9563E8.9A432B7B@alacritech.com> Date: Tue, 04 Sep 2001 16:29:44 -0700 From: "Matt D. Robinson" Organization: Alacritech, Inc. 
X-Mailer: Mozilla 4.76 [en] (X11; U; Linux 2.4.2-2 i686)
X-Accept-Language: en
MIME-Version: 1.0
To: "Howell, David P"
CC: "'Masashige Kotani'" , lkcd@oss.sgi.com
Subject: Re: multiple dump devices
References: <10C8636AE359D4119118009027AE99870CE2F95B@FMSMSX34>
Content-Type: text/plain; charset=us-ascii
Content-Transfer-Encoding: 7bit
Sender: owner-lkcd@oss.sgi.com
Precedence: bulk

"Howell, David P" wrote:
>
> We are working on a proposal for redundant dump device support that I plan
> to share in the next few weeks; I've got a prototype mostly working that
> can be contributed. Let me know how you are approaching this, I'll send
> details of what we are doing here later this week. Sounds like a good
> opportunity for collaboration on this.
>
> Regards,
> Dave Howell

I'm really curious about the proposal. It sounds like a good idea;
the real question becomes, do you want to chain multiple dump
devices with multiple dump mechanisms?

Here's where I'm going with this. I just finished the code to
allow people to install their own dump compression mechanisms
(right now, it'll be RLE, I have to check in the GZIP compression
module, and people can put in whatever one they want). Do you
want to take the next step and let people have chains of dump
mechanisms based on the dump condition? I realize multiple dump
devices are good, but what if you could plug in your own dump
method with them? Then that dump method could query the available
dump devices configured. So you'd have:

    dump methods (one standard, but plug-and-play)
    dump devices (requires at least one, multiples allowed,
                  maybe access lists for methods?)
    dump compressions (configurable, usable by some methods)

Would this be the eventual goal? That way, everything is tunable
to their own liking. I figured I'd ask, since if you're going to
add in multiple dump devices, and we've gone to multiple compression
types, you might as well go all the way and add dump methods as well.
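
[Editorial sketch] The three layers described above (methods, devices, compressions) can be modelled in userspace roughly as follows. Everything here is an assumption made for illustration: the struct and function names are hypothetical, the RLE is a toy byte-pair encoding and not LKCD's on-disk format, and a "device" is just a capacity counter with a failure flag.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Illustrative model only: these types and names are hypothetical, not LKCD's. */

/* A compression plug-in: returns bytes written to dst, or 0 on failure. */
struct dump_compress {
    const char *name;
    size_t (*compress)(const unsigned char *src, size_t len,
                       unsigned char *dst, size_t dst_cap);
};

/* A dump device, modelled as a byte budget plus a "broken disk" flag. */
struct dump_device {
    const char *name;
    size_t capacity;
    size_t used;
    int    failed;
};

/* "none": copy the page unchanged. */
size_t compress_none(const unsigned char *src, size_t len,
                     unsigned char *dst, size_t dst_cap)
{
    if (len > dst_cap)
        return 0;
    memcpy(dst, src, len);
    return len;
}

/* Toy RLE: emits (count, value) byte pairs with count in 1..255. */
size_t compress_rle(const unsigned char *src, size_t len,
                    unsigned char *dst, size_t dst_cap)
{
    size_t out = 0;
    for (size_t i = 0; i < len; ) {
        size_t run = 1;
        while (i + run < len && run < 255 && src[i + run] == src[i])
            run++;
        if (out + 2 > dst_cap)
            return 0;
        dst[out++] = (unsigned char)run;
        dst[out++] = src[i];
        i += run;
    }
    return out;
}

/* Account n bytes to the first healthy device with room.
 * Returns the device index, or -1 if no device can take the data. */
int devices_write(struct dump_device *devs, size_t ndevs, size_t n)
{
    for (size_t i = 0; i < ndevs; i++) {
        if (devs[i].failed || devs[i].capacity - devs[i].used < n)
            continue;
        devs[i].used += n;
        return (int)i;
    }
    return -1;
}

/* One possible "dump method": compress each page, then spill across the
 * configured devices. Returns the number of pages stored. */
size_t method_dump_pages(const struct dump_compress *c,
                         struct dump_device *devs, size_t ndevs,
                         const unsigned char *mem, size_t npages,
                         size_t page_size)
{
    unsigned char buf[512];
    size_t stored = 0;

    /* Toy RLE can double the size in the worst case, so bound the page size. */
    assert(c != NULL && page_size * 2 <= sizeof buf);
    for (size_t p = 0; p < npages; p++) {
        size_t n = c->compress(mem + p * page_size, page_size, buf, sizeof buf);
        if (n == 0 || devices_write(devs, ndevs, n) < 0)
            break;
        stored++;
    }
    return stored;
}
```

The point of the split is that the method only sees a compression callback and a device list, so swapping RLE for another compressor, or adding a second device, does not touch the method itself. A real design would also need to record which device holds which page so that lcrash can find it again.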
I don't know what the rest of the group thinks, but this could be very useful. I'd definitely like to get some feedback ... this is all doable, as long as the dump compression code is in 'lcrash', and the pages are dumped in a way that we can find the location in memory, this can work pretty sweet for everyone here. --Matt > -----Original Message----- > From: Masashige Kotani [mailto:m-kotani@pst.fujitsu.com] > Sent: Tuesday, September 04, 2001 4:07 AM > To: lkcd@oss.sgi.com > Subject: multiple dump devices > > Hello. > > Nowadays, I think that the reliability of memory dumping: extracting memory > as much as possible will be improved. > > LKCD uses "only one dump device" in process of memory dump. When it does not > have enough capacity for memory dump it is not able to be used by some > reasons, memory dumping is failure. > > - When additional memory devices are attached, The capacity of the dump > device must be increased. > - To avoid failing memory dump by disk failure, want to add alternative dump > devices. > In these cases, I consider that LKCD have to be handle multiple dump devices > to be useful in different environments. > > It can improve the following problems: > - When it runs short of capacity in one dump device > Dump data can be divided and written in two or more dump devices. > - When the dump device is broken > The LKCD can dump, If it can use at least one among dump devices. > > I think that such expansion is indispensable for enterprises use, what do > you think? > > LKCD dumps to multiple dump devices with parallel I/O if possible and time > of dumping can be decreased. but it is still under examination. 
> > --Masashige From owner-lkcd@oss.sgi.com Tue Sep 4 16:29:03 2001 Received: (from majordomo@localhost) by oss.sgi.com (8.11.2/8.11.3) id f84NT3h03554 for lkcd-outgoing; Tue, 4 Sep 2001 16:29:03 -0700 Received: from smtp.alacritech.com (smtp.alacritech.com [209.10.208.82]) by oss.sgi.com (8.11.2/8.11.3) with SMTP id f84NT1d03551 for ; Tue, 4 Sep 2001 16:29:01 -0700 Received: from alacritech.com (lambda.alacritech.com [10.1.1.32]) by smtp.alacritech.com (8.11.0/8.11.0) with ESMTP id f84NO3O00476; Tue, 4 Sep 2001 16:24:03 -0700 Message-ID: <3B956457.DDBBE9CF@alacritech.com> Date: Tue, 04 Sep 2001 16:31:35 -0700 From: "Matt D. Robinson" Organization: Alacritech, Inc. X-Mailer: Mozilla 4.76 [en] (X11; U; Linux 2.4.2-2 i686) X-Accept-Language: en MIME-Version: 1.0 To: Felix Radensky CC: lkcd@oss.sgi.com Subject: Re: Using latest CVS sources References: <3B938EAD.9C8D4E92@allot.com> Content-Type: text/plain; charset=us-ascii Content-Transfer-Encoding: 7bit Sender: owner-lkcd@oss.sgi.com Precedence: bulk Felix Radensky wrote: > > Hi, > > Can someone please explain how can I use the latest CVS sources > with kernel 2.2.19. > > Thanks in advance. > > Felix. Hi, Felix. The latest 2.2 tree is a bit behind what we're currently doing, and I haven't tried applying some of this stuff to 2.2 as of yet. The last state I left the 2.2 tree in was to at least allow you to dump to IDE disks as well, and has the Kerntypes mechanism in place. Is there some feature you're looking for in particular? 
--Matt From owner-lkcd@oss.sgi.com Tue Sep 4 16:34:46 2001 Received: (from majordomo@localhost) by oss.sgi.com (8.11.2/8.11.3) id f84NYku03702 for lkcd-outgoing; Tue, 4 Sep 2001 16:34:46 -0700 Received: from smtp.alacritech.com (smtp.alacritech.com [209.10.208.82]) by oss.sgi.com (8.11.2/8.11.3) with SMTP id f84NYdd03699 for ; Tue, 4 Sep 2001 16:34:39 -0700 Received: from alacritech.com (lambda.alacritech.com [10.1.1.32]) by smtp.alacritech.com (8.11.0/8.11.0) with ESMTP id f84NThO00620; Tue, 4 Sep 2001 16:29:43 -0700 Message-ID: <3B9565AA.FA9C1B2D@alacritech.com> Date: Tue, 04 Sep 2001 16:37:14 -0700 From: "Matt D. Robinson" Organization: Alacritech, Inc. X-Mailer: Mozilla 4.76 [en] (X11; U; Linux 2.4.2-2 i686) X-Accept-Language: en MIME-Version: 1.0 To: "Schaal, Richard" CC: "'r1vamsi@in.ibm.com'" , lkcd@oss.sgi.com, akale@users.sourceforge.net, kaos@ocs.com.au Subject: Re: LKCD + KDB ? References: <68843F808BE5D311AC6100A0C9C5786648485A@fmsmsx50.fm.intel.com> Content-Type: text/plain; charset=us-ascii Content-Transfer-Encoding: 7bit Sender: owner-lkcd@oss.sgi.com Precedence: bulk "Schaal, Richard" wrote: > > I think it would be relatively simple to have the dump_init code register a > dump system > function with the kernel debugger so that you could dump the system on > demand. Note that > not all problems are Oops related, and that a hung system, or one that is > grossly under performing > would be useful to get a snapshot of the activity at that time. Manual > entry to the debugger > and manual dump would seem to be a useful thing. - System survivability > after such a dump would be > nice, but not a show stopper at this point. You should already be able to do this with dump_function_ptr in the latest code. This should be assigned to dump_execute (at least in the last check-in I made). So if you call that address, you'll get the dump function pointer. 
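The hook described above can be pictured with a small userspace sketch. It is only an illustration under stated assumptions: `struct mock_regs`, the `dump_fn_t` signature, `trigger_dump()` and the mock bodies of `dump_execute()` and `dump_init()` are stand-ins invented here, since the real prototypes are not shown in this thread. The point is that a caller such as a debugger command goes through the pointer and checks that it has been set, rather than naming `dump_execute` directly.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Stand-in for the kernel's struct pt_regs; only here so the sketch compiles. */
struct mock_regs { unsigned long ip; };

/* Assumed signature; the real dump_execute() prototype is not shown in the thread. */
typedef void (*dump_fn_t)(const char *reason, const struct mock_regs *regs);

static int dumps_taken;

/* Mock of dump_execute(): records that it ran instead of writing a dump. */
static void dump_execute(const char *reason, const struct mock_regs *regs)
{
    (void)regs;
    dumps_taken++;
    printf("dump requested: %s\n", reason);
}

/* The hook stays NULL until the dump driver has initialised. */
static dump_fn_t dump_function_ptr = NULL;

/* Mock of the driver init: point the hook at the dump routine. */
static void dump_init(void)
{
    dump_function_ptr = dump_execute;
}

/*
 * What a debugger command could do (hypothetical helper): call through
 * the hook if it is set. Returns 0 on success, -1 if no dump driver
 * has registered yet.
 */
static int trigger_dump(const char *reason, const struct mock_regs *regs)
{
    if (dump_function_ptr == NULL)
        return -1;
    dump_function_ptr(reason, regs);
    return 0;
}
```

Calling `trigger_dump()` before `dump_init()` fails cleanly, which is the property a manual "dump" command would presumably want when the dump driver is not configured.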
> So far as the dumping or not after an oops and entering kdb, there is a > differentiation as to the reason > for entering the debugger - you might derive a dump/no dump directive from > whether you enter the debugger > by reason of breakpoint or oops? I'm curious, how many people drop into kdb, and then want to take a dump? I'd think that this is very useful for developers, but not as useful for customers who want to crash and reboot. > I used to work for Stratus Computer - at that time, a panic or oops would > put us into the debugger, and if we > were successful in patching up the problem, the system could resume > execution. In Linux, after an oops, maybe > a "nodump" command would be useful as well to disable the dumping that might > normally occur. This is fine -- I think these are all reasonable extensions to KDB, and I can work with that developer if need be to make that happen. There's an easy solution, one way or another. --Matt > Regards, > Richard > > -----Original Message----- > From: r1vamsi@in.ibm.com [mailto:r1vamsi@in.ibm.com] > Sent: Monday, September 03, 2001 2:55 AM > To: Matt D. Robinson > Cc: richard.schaal@intel.com; lkcd@oss.sgi.com > Subject: Re: LKCD + KDB ? > > When both KDB and LKCD patches are applied, we drop into KDB on an oops. > dump_execute will be called after we exit the debugger. > > If all you want is to disable dump taking after exiting debugger, that is > easy enough with editing the dump_okay flag from within the debugger (or > add a kdb command to do this) as Matt points out. Assuming there is a good > reason for wanting to take the dump from within the debugger, one should > add a simple dump command to kdb, which will just call dump_execute with > proper regs. 
What you could do today is to set eip to dump_execute from > with in the kernel, editing the stack to push correct params :-) (not as > hard as it sounds, really) > > However, the cleaner approach obviously is to add the kdb dump command, > once we understand a little better why exactly would one want to dump from > within the debugger (on an oops). > > Regards.. Vamsi. > > Vamsi Krishna S. > Linux Technology Center, > IBM Software Lab, Bangalore. > Ph: +91 80 5262355 Extn: 3959 > Internet: r1vamsi@in.ibm.com > > Please respond to "Matt D. Robinson" > > To: richard.schaal@intel.com > cc: lkcd@oss.sgi.com (bcc: S Vamsikrishna/India/IBM) > Subject: Re: LKCD + KDB ? > > Richard Schaal wrote: > > > > > > My question is this - I have been a fan of the kernel debugger for some > > time, and have had a bit of difficulty > > resolving how to configure both capabilities into my kernel. I guess > > what I'd like to have happen is to > > have the system enter the debugger on an oops, then have the option of > > dumping the system from the debugger, or > > to dump the system automatically after the debugger is exited. > > There's no great way to do this right now. If in kdb you can set the > field of 'dump_okay' field to FALSE, then reset it after dropping back > from the debugger state, that'd be fine. I guess we could also add in > something for kdb, a one-time thing, so kdb can set dump_kdb to TRUE, > and when dump_execute() gets called, dump_kdb is checked, and if set > to TRUE, resets it to FALSE. Then add a kdb command that sets the > field for you ... > > Would that work? > > --Matt > > > What is your thinking on this? Did I goof something up in applying the > > patches for the two features? > > > > Thanks, > > Richard > > > > -- > > Richard.Schaal@intel.com Intel Corporation > > Ph: (408)765-1579 Richard Schaal > > Mail Stop SC12-308 > > 3600 Juliette Lane > > "I can type faster than I think!" 
Santa Clara, CA 95052 From owner-lkcd@oss.sgi.com Tue Sep 4 16:48:23 2001 Received: (from majordomo@localhost) by oss.sgi.com (8.11.2/8.11.3) id f84NmNE03878 for lkcd-outgoing; Tue, 4 Sep 2001 16:48:23 -0700 Received: from mail.ocs.com.au (ppp0.ocs.com.au [203.34.97.3]) by oss.sgi.com (8.11.2/8.11.3) with SMTP id f84NmKd03875 for ; Tue, 4 Sep 2001 16:48:20 -0700 Received: (qmail 620 invoked from network); 4 Sep 2001 23:48:17 -0000 Received: from ocs3.intra.ocs.com.au (192.168.255.3) by mail.ocs.com.au with SMTP; 4 Sep 2001 23:48:17 -0000 Received: by ocs3.intra.ocs.com.au (Postfix, from userid 16331) id C1DF630008C; Wed, 5 Sep 2001 09:47:36 +1000 (EST) Received: from ocs3.intra.ocs.com.au (localhost [127.0.0.1]) by ocs3.intra.ocs.com.au (Postfix) with ESMTP id B4F4C9E; Wed, 5 Sep 2001 09:47:36 +1000 (EST) X-Mailer: exmh version 2.2 06/23/2000 with nmh-1.0.4 From: Keith Owens To: "Matt D. Robinson" Cc: "Schaal, Richard" , lkcd@oss.sgi.com Subject: Re: LKCD + KDB ? In-reply-to: Your message of "Tue, 04 Sep 2001 16:37:14 MST." <3B9565AA.FA9C1B2D@alacritech.com> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Date: Wed, 05 Sep 2001 09:47:31 +1000 Message-ID: <14389.999647251@ocs3.intra.ocs.com.au> Sender: owner-lkcd@oss.sgi.com Precedence: bulk On Tue, 04 Sep 2001 16:37:14 -0700, "Matt D. Robinson" wrote: >"Schaal, Richard" wrote: >> I used to work for Stratus Computer - at that time, a panic or oops would >> put us into the debugger, and if we >> were successful in patching up the problem, the system could resume >> execution. In Linux, after an oops, maybe >> a "nodump" command would be useful as well to disable the dumping that might >> normally occur. > >This is fine -- I think these are all reasonable extensions to KDB, and >I can work with that developer if need be to make that happen. There's >an easy solution, one way or another. No need to involve me. Any code can register its own kdb commands as long as it runs after kdb init. 
IOW, the nodump command can be part of lkcd, no changes to kdb required. Just wrap it in #ifdef CONFIG_KDB. From owner-lkcd@oss.sgi.com Tue Sep 4 17:04:16 2001 Received: (from majordomo@localhost) by oss.sgi.com (8.11.2/8.11.3) id f8504GP04324 for lkcd-outgoing; Tue, 4 Sep 2001 17:04:16 -0700 Received: from thalia.fm.intel.com (fmfdns02.fm.intel.com [132.233.247.11]) by oss.sgi.com (8.11.2/8.11.3) with SMTP id f85044d04320 for ; Tue, 4 Sep 2001 17:04:04 -0700 Received: from fmsmsxvs040.fm.intel.com (fmsmsxv040-1.fm.intel.com [132.233.48.108]) by thalia.fm.intel.com (8.9.1a+p1/8.9.1/d: relay.m4,v 1.42 2001/09/04 16:24:19 root Exp $) with SMTP id AAA04098 for ; Wed, 5 Sep 2001 00:04:02 GMT Received: from fmsmsx17.intel.com ([132.233.48.17]) by fmsmsxvs040.fm.intel.com (NAVGW 2.5.1.6) with SMTP id M2001090417030616419 ; Tue, 04 Sep 2001 17:03:06 -0700 Received: by fmsmsx17.fm.intel.com with Internet Mail Service (5.5.2653.19) id ; Tue, 4 Sep 2001 17:03:05 -0700 Message-ID: <68843F808BE5D311AC6100A0C9C5786648485D@fmsmsx50.fm.intel.com> From: "Schaal, Richard" To: "'Matt D. Robinson'" , "Schaal, Richard" Cc: "'r1vamsi@in.ibm.com'" , lkcd@oss.sgi.com, akale@users.sourceforge.net, kaos@ocs.com.au Subject: RE: LKCD + KDB ? Date: Tue, 4 Sep 2001 17:01:26 -0700 MIME-Version: 1.0 X-Mailer: Internet Mail Service (5.5.2653.19) Content-Type: text/plain; charset="iso-8859-1" Sender: owner-lkcd@oss.sgi.com Precedence: bulk Hi Matt, When you refer to the "latest code", what is that? I don't see anything on source forge as released code, and the latest from the SGI site has patches up to linux-2.4.4 is that what you were referring to? Thanks, Richard
From owner-lkcd@oss.sgi.com Tue Sep 4 17:09:39 2001 Received: (from majordomo@localhost) by oss.sgi.com (8.11.2/8.11.3) id f8509d004558 for lkcd-outgoing; Tue, 4 Sep 2001 17:09:39 -0700 Received: from smtp.alacritech.com (smtp.alacritech.com [209.10.208.82]) by oss.sgi.com (8.11.2/8.11.3) with SMTP id f8509Ud04554 for ; Tue, 4 Sep 2001 17:09:30 -0700 Received: from alacritech.com (lambda.alacritech.com [10.1.1.32]) by smtp.alacritech.com (8.11.0/8.11.0) with ESMTP id f8504KO01594; Tue, 4 Sep 2001 17:04:20 -0700 Message-ID: <3B956DC7.F5F3559F@alacritech.com> Date: Tue, 04 Sep 2001 17:11:51 -0700 From: "Matt D. Robinson" Organization: Alacritech, Inc. X-Mailer: Mozilla 4.76 [en] (X11; U; Linux 2.4.2-2 i686) X-Accept-Language: en MIME-Version: 1.0 To: "Schaal, Richard" CC: "'r1vamsi@in.ibm.com'" , lkcd@oss.sgi.com, akale@users.sourceforge.net, kaos@ocs.com.au Subject: Re: LKCD + KDB ? References: <68843F808BE5D311AC6100A0C9C5786648485D@fmsmsx50.fm.intel.com> Content-Type: text/plain; charset=us-ascii Content-Transfer-Encoding: 7bit Sender: owner-lkcd@oss.sgi.com Precedence: bulk "Schaal, Richard" wrote: > > Hi Matt, > When you refer to the "latest code", what is that? I don't see anything on > source forge as released code, and the > latest from the SGI site has patches up to linux-2.4.4 is that what you were > referring to?
> > Thanks, > Richard The latest code is in the SourceForge tree ... look in 2.4/drivers/block/dump.c, and you'll see the restructuring changes. 'lcrash' has also changed a bit. I copied the LKCD group on my last check-in. If you didn't get a copy of it, let me know. It touched a bunch of files. I have to check in new scripts and a new dumpconfig utility next (and fix this bloody SMP problem now that I actually have an SMP system again to test against). --Matt
From owner-lkcd@oss.sgi.com Tue Sep 4 22:28:10 2001 Received: (from majordomo@localhost) by oss.sgi.com (8.11.2/8.11.3) id f855SAo13551 for lkcd-outgoing; Tue, 4 Sep 2001 22:28:10 -0700 Received: from smtp02.vsnl.net (smtp02.vsnl.net [203.197.12.8]) by oss.sgi.com (8.11.2/8.11.3) with SMTP id f855Rwd13548 for ; Tue, 4 Sep 2001 22:27:58 -0700 Received: from vsnl.net ([203.199.156.60]) by smtp02.vsnl.net (Netscape Messaging Server 4.15) with ESMTP id GJ69RN01.JCY; Wed, 5 Sep 2001 09:58:35 +0530 Message-ID: <3B959ED0.E33BA3BC@vsnl.net> Date: Wed, 05 Sep 2001 09:11:04 +0530 From: "Amit S. Kale" X-Mailer: Mozilla 4.72 [en] (X11; U; Linux 2.4.6 i686) X-Accept-Language: en MIME-Version: 1.0 To: "Matt D. Robinson" CC: "Schaal, Richard" , "'r1vamsi@in.ibm.com'" , lkcd@oss.sgi.com, akale@users.sourceforge.net, kaos@ocs.com.au Subject: Re: LKCD + KDB ? References: <68843F808BE5D311AC6100A0C9C5786648485D@fmsmsx50.fm.intel.com> <3B956DC7.F5F3559F@alacritech.com> Content-Type: text/plain; charset=us-ascii Content-Transfer-Encoding: 7bit Sender: owner-lkcd@oss.sgi.com Precedence: bulk Hi Matt, I have faced several times the problem of crash dumps not being available in kgdb. Many a times I don't have time to debug a panic immediately, so I keep the machine inside the debugger. A crash dump will enable me to save a crash dump and continue testing. I can get back to the core dump later. Usually it's a good idea to save cores for all non-trivial problems once a product goes alpha. If a problem which is supposedly fixed resurfaces, it's very difficult to say whether it's the same problem in absence of a core dump. In ideal world, all problems should be fixed immediately and completely using a debugger and we wouldn't need crash dumps. I guess it's time to think about making kgdb understand lkcd interface.
-- Amit S. Kale Linux Consultant, Pune, India. (kgdb@vsnl.net) Linux kernel source level debugger http://kgdb.sourceforge.net/ Translation filesystem http://trfs.sourceforge.net/ From owner-lkcd@oss.sgi.com Tue Sep 4 22:49:29 2001 Received: (from majordomo@localhost) by oss.sgi.com (8.11.2/8.11.3) id f855nT913939 for lkcd-outgoing; Tue, 4 Sep 2001 22:49:29 -0700 Received: from ausmtp01.au.ibm.com (ausmtp01.au.ibm.COM [202.135.136.97]) by oss.sgi.com (8.11.2/8.11.3) with SMTP id f855nLd13924 for ; Tue, 4 Sep 2001 22:49:21 -0700 Received: from f02n16e.au.ibm.com by ausmtp01.au.ibm.com (IBM AP 2.0) with ESMTP id f855jPT212220; Wed, 5 Sep 2001 15:45:25 +1000 Received: from d73mta01.au.ibm.com (f06n01s [9.185.166.65]) by f02n16e.au.ibm.com (8.11.1m3/NCO v4.97.1) with SMTP id f855mq571344; Wed, 5 Sep 2001 15:48:52 +1000 Received: by d73mta01.au.ibm.com(Lotus SMTP MTA v4.6.5 (863.2 5-20-1999)) id CA256ABE.001FEE76 ; Wed, 5 Sep 2001 15:48:46 +1000 X-Lotus-FromDomain: IBMIN@IBMAU From: r1vamsi@in.ibm.com To: Keith Owens cc: "Matt D. Robinson" , "Schaal, Richard" , lkcd@oss.sgi.com Message-ID: Date: Wed, 5 Sep 2001 11:12:59 +0530 Subject: Re: LKCD + KDB ? (link/init order) Mime-Version: 1.0 Content-type: text/plain; charset=us-ascii Content-Disposition: inline Sender: owner-lkcd@oss.sgi.com Precedence: bulk Keith, Lkcd could very well register kdb command and do what ever. However, when lkcd is linked into the kernel (which is the case most of the time), how can it be sure that kdb is initialized before lkcd's init (where in it could call kdb_register()) ? Is there any other way to ensure correct ordering of init calls, besides linking the objects in the desired sequence in the Makefiles? Regards.. Vamsi.
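The ordering concern can be made concrete with a userspace mock. This sketch does not use the real kdb API: `mock_kdb_init()`, `mock_kdb_register()`, `mock_kdb_run()` and `dump_register_kdb_cmds()` are hypothetical names with simplified signatures, and `dump_okay` is modelled as a plain int. It shows one defensive pattern, assuming registration simply fails while the debugger is not yet initialised: make the dump side's registration idempotent so it can be retried later, with the "nodump" command wrapped in `#ifdef CONFIG_KDB` as suggested in the thread.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define CONFIG_KDB 1   /* normally set by the kernel configuration */

/* --- mock debugger side (hypothetical; not the real kdb API) --- */
typedef int (*mock_kdb_func_t)(int argc, const char **argv);

#define MOCK_KDB_MAX_CMDS 8
static struct {
    const char *name;
    mock_kdb_func_t func;
} mock_cmds[MOCK_KDB_MAX_CMDS];
static int mock_ncmds;
static int mock_kdb_ready;

static void mock_kdb_init(void)
{
    mock_kdb_ready = 1;
}

/* Fails with -1 if the debugger is not initialised or the table is full. */
static int mock_kdb_register(const char *name, mock_kdb_func_t func)
{
    if (!mock_kdb_ready || mock_ncmds == MOCK_KDB_MAX_CMDS)
        return -1;
    mock_cmds[mock_ncmds].name = name;
    mock_cmds[mock_ncmds].func = func;
    mock_ncmds++;
    return 0;
}

/* Look a command up by name and run it; -1 if it is not registered. */
static int mock_kdb_run(const char *name)
{
    for (int i = 0; i < mock_ncmds; i++)
        if (strcmp(mock_cmds[i].name, name) == 0)
            return mock_cmds[i].func(0, NULL);
    return -1;
}

/* --- dump side --- */
static int dump_okay = 1;        /* modelled as a plain flag */
static int nodump_registered;

#ifdef CONFIG_KDB
/* "nodump": suppress the dump that would normally follow an oops. */
static int kdb_nodump(int argc, const char **argv)
{
    (void)argc;
    (void)argv;
    dump_okay = 0;
    return 0;
}
#endif

/*
 * Safe to call more than once: registers the command the first time the
 * debugger accepts it. Returns 1 once registered, 0 otherwise.
 */
static int dump_register_kdb_cmds(void)
{
#ifdef CONFIG_KDB
    if (!nodump_registered && mock_kdb_register("nodump", kdb_nodump) == 0)
        nodump_registered = 1;
#endif
    return nodump_registered;
}
```

With this shape the dump code does not need to know which init ran first: an early call is a harmless no-op, and a later call (for example from the first dump configuration request) completes the registration. Whether a retry point like that fits the real lkcd init path is an open question here.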
Vamsi Krishna S. Linux Technology Center, IBM Software Lab, Bangalore. Ph: +91 80 5262355 Extn: 3959 Internet: r1vamsi@in.ibm.com Keith Owens on 09/05/2001 05:17:31 AM Please respond to Keith Owens To: "Matt D. Robinson" cc: "Schaal, Richard" , lkcd@oss.sgi.com (bcc: S Vamsikrishna/India/IBM) Subject: Re: LKCD + KDB ? On Tue, 04 Sep 2001 16:37:14 -0700, "Matt D. Robinson" wrote: >"Schaal, Richard" wrote: >> I used to work for Stratus Computer - at that time, a panic or oops would >> put us into the debugger, and if we >> were successful in patching up the problem, the system could resume >> execution. In Linux, after an oops, maybe >> a "nodump" command would be useful as well to disable the dumping that might >> normally occur. > >This is fine -- I think these are all reasonable extensions to KDB, and >I can work with that developer if need be to make that happen. There's >an easy solution, one way or another. No need to involve me. Any code can register its own kdb commands as long as it runs after kdb init. IOW, the nodump command can be part of lkcd, no changes to kdb required. Just wrap it in #ifdef CONFIG_KDB. From owner-lkcd@oss.sgi.com Tue Sep 4 23:20:43 2001 Received: (from majordomo@localhost) by oss.sgi.com (8.11.2/8.11.3) id f856Kh314406 for lkcd-outgoing; Tue, 4 Sep 2001 23:20:43 -0700 Received: from ausmtp01.au.ibm.com (ausmtp01.au.ibm.COM [202.135.136.97]) by oss.sgi.com (8.11.2/8.11.3) with SMTP id f856Eid14341 for ; Tue, 4 Sep 2001 23:20:21 -0700 Received: from f02n16e.au.ibm.com by ausmtp01.au.ibm.com (IBM AP 2.0) with ESMTP id f8565KT127370; Wed, 5 Sep 2001 16:05:21 +1000 Received: from d73mta01.au.ibm.com (f06n01s [9.185.166.65]) by f02n16e.au.ibm.com (8.11.1m3/NCO v4.97.1) with SMTP id f8568k535402; Wed, 5 Sep 2001 16:08:46 +1000 Received: by d73mta01.au.ibm.com(Lotus SMTP MTA v4.6.5 (863.2 5-20-1999)) id CA256ABE.0021C384 ; Wed, 5 Sep 2001 16:08:47 +1000 X-Lotus-FromDomain: IBMIN@IBMAU From: r1vamsi@in.ibm.com To: "Matt D. 
    Robinson"
cc: "Schaal, Richard" , lkcd@oss.sgi.com, akale@users.sourceforge.net,
    kaos@ocs.com.au
Message-ID:
Date: Wed, 5 Sep 2001 11:24:31 +0530
Subject: Re: LKCD + KDB ?
Mime-Version: 1.0
Content-type: text/plain; charset=us-ascii
Content-Disposition: inline
Sender: owner-lkcd@oss.sgi.com
Precedence: bulk

Richard,

I agree with you completely on the rationale for wanting to dump from
kdb. In fact, one could choose to trigger a dump (after which the system
will likely continue to run), not just from KDB, which requires manual
intervention, but from other debugging tools such as the IBM Dynamic
Probes, where this could be done automatically.

We are building a "non-disruptive" dump capability into lkcd, which will
let the system continue normal execution after the dump is taken.

These features will probably find more use when dumps are used for
debugging other problem situations, such as performance-related problems,
besides oops/panics.

Regards.. Vamsi.

Vamsi Krishna S.
Linux Technology Center,
IBM Software Lab, Bangalore.
Ph: +91 80 5262355 Extn: 3959
Internet: r1vamsi@in.ibm.com

"Matt D. Robinson" on 09/05/2001 05:07:14 AM

Please respond to "Matt D. Robinson"

To:      "Schaal, Richard"
cc:      S Vamsikrishna/India/IBM@IBMIN, lkcd@oss.sgi.com,
         akale@users.sourceforge.net, kaos@ocs.com.au
Subject: Re: LKCD + KDB ?

"Schaal, Richard" wrote:
>
> I think it would be relatively simple to have the dump_init code
> register a dump system function with the kernel debugger so that you
> could dump the system on demand. Note that not all problems are Oops
> related, and that a hung system, or one that is grossly under performing
> would be useful to get a snapshot of the activity at that time. Manual
> entry to the debugger and manual dump would seem to be a useful thing.
> - System survivability after such a dump would be nice, but not a show
> stopper at this point.

You should already be able to do this with dump_function_ptr in the
latest code.
This should be assigned to dump_execute (at least in the last check-in
I made). So if you call that address, you'll get the dump function
pointer.

> So far as the dumping or not after an oops and entering kdb, there is
> a differentiation as to the reason for entering the debugger - you
> might derive a dump/no dump directive from whether you enter the
> debugger by reason of breakpoint or oops?

I'm curious, how many people drop into kdb, and then want to take a dump?
I'd think that this is very useful for developers, but not as useful for
customers who want to crash and reboot.

> I used to work for Stratus Computer - at that time, a panic or oops
> would put us into the debugger, and if we were successful in patching
> up the problem, the system could resume execution. In Linux, after an
> oops, maybe a "nodump" command would be useful as well to disable the
> dumping that might normally occur.

This is fine -- I think these are all reasonable extensions to KDB, and
I can work with that developer if need be to make that happen. There's
an easy solution, one way or another.

--Matt

> Regards,
> Richard
>
> -----Original Message-----
> From: r1vamsi@in.ibm.com [mailto:r1vamsi@in.ibm.com]
> Sent: Monday, September 03, 2001 2:55 AM
> To: Matt D. Robinson
> Cc: richard.schaal@intel.com; lkcd@oss.sgi.com
> Subject: Re: LKCD + KDB ?
>
> When both KDB and LKCD patches are applied, we drop into KDB on an oops.
> dump_execute will be called after we exit the debugger.
>
> If all you want is to disable dump taking after exiting debugger, that is
> easy enough with editing the dump_okay flag from within the debugger (or
> add a kdb command to do this) as Matt points out. Assuming there is a good
> reason for wanting to take the dump from within the debugger, one should
> add a simple dump command to kdb, which will just call dump_execute with
> proper regs.
> What you could do today is to set eip to dump_execute from
> with in the kernel, editing the stack to push correct params :-) (not as
> hard as it sounds, really)
>
> However, the cleaner approach obviously is to add the kdb dump command,
> once we understand a little better why exactly would one want to dump from
> within the debugger (on an oops).
>
> Regards.. Vamsi.
>
> Vamsi Krishna S.
> Linux Technology Center,
> IBM Software Lab, Bangalore.
> Ph: +91 80 5262355 Extn: 3959
> Internet: r1vamsi@in.ibm.com
>
> Please respond to "Matt D. Robinson"
>
> To: richard.schaal@intel.com
> cc: lkcd@oss.sgi.com (bcc: S Vamsikrishna/India/IBM)
> Subject: Re: LKCD + KDB ?
>
> Richard Schaal wrote:
> >
> >
> > My question is this - I have been a fan of the kernel debugger for
> > some time, and have had a bit of difficulty resolving how to configure
> > both capabilities into my kernel. I guess what I'd like to have happen
> > is to have the system enter the debugger on an oops, then have the
> > option of dumping the system from the debugger, or to dump the system
> > automatically after the debugger is exited.
>
> There's no great way to do this right now. If in kdb you can set the
> field of 'dump_okay' field to FALSE, then reset it after dropping back
> from the debugger state, that'd be fine. I guess we could also add in
> something for kdb, a one-time thing, so kdb can set dump_kdb to TRUE,
> and when dump_execute() gets called, dump_kdb is checked, and if set
> to TRUE, resets it to FALSE. Then add a kdb command that sets the
> field for you ...
>
> Would that work?
>
> --Matt
>
> > What is your thinking on this? Did I goof something up in applying the
> > patches for the two features?
> >
> > Thanks,
> > Richard
> >
> > --
> > Richard.Schaal@intel.com          Intel Corporation
> > Ph: (408)765-1579                 Richard Schaal
> >                                   Mail Stop SC12-308
> >                                   3600 Juliette Lane
> > "I can type faster than I think!"
> >                                   Santa Clara, CA 95052

From owner-lkcd@oss.sgi.com Tue Sep 4 23:52:11 2001
Received: (from majordomo@localhost)
    by oss.sgi.com (8.11.2/8.11.3) id f856qBV14802
    for lkcd-outgoing; Tue, 4 Sep 2001 23:52:11 -0700
Received: from pneumatic-tube.sgi.com (pneumatic-tube.sgi.com [204.94.214.22])
    by oss.sgi.com (8.11.2/8.11.3) with SMTP id f856q9d14799
    for ; Tue, 4 Sep 2001 23:52:09 -0700
Received: from nodin.corp.sgi.com (nodin.corp.sgi.com [192.26.51.193])
    by pneumatic-tube.sgi.com (980327.SGI.8.8.8-aspam/980310.SGI-aspam)
    via ESMTP id XAA09709
    for ; Tue, 4 Sep 2001 23:50:33 -0700 (PDT)
    mail_from (kaos@ocs.com.au)
Received: from kao2.melbourne.sgi.com (kao2.melbourne.sgi.com [134.14.55.180])
    by nodin.corp.sgi.com (8.11.4/8.11.2/nodin-1.0) with ESMTP
    id f856p7F39410330; Tue, 4 Sep 2001 23:51:07 -0700 (PDT)
Received: by kao2.melbourne.sgi.com (Postfix, from userid 16331)
    id 81393300095; Wed, 5 Sep 2001 16:50:20 +1000 (EST)
Received: from kao2.melbourne.sgi.com (localhost [127.0.0.1])
    by kao2.melbourne.sgi.com (Postfix) with ESMTP id E904AA6;
    Wed, 5 Sep 2001 16:50:20 +1000 (EST)
X-Mailer: exmh version 2.2 06/23/2000 with nmh-1.0.4
From: Keith Owens
To: r1vamsi@in.ibm.com
Cc: lkcd@oss.sgi.com
Subject: Re: LKCD + KDB ? (link/init order)
In-reply-to: Your message of "Wed, 05 Sep 2001 11:12:59 +0530."
Mime-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Date: Wed, 05 Sep 2001 16:50:14 +1000
Message-ID: <28201.999672614@kao2.melbourne.sgi.com>
Sender: owner-lkcd@oss.sgi.com
Precedence: bulk

On Wed, 5 Sep 2001 11:12:59 +0530,
r1vamsi@in.ibm.com wrote:
>Lkcd could very well register kdb command and do what ever. However, when
>lkcd is linked into the kernel (which is the case most of the time), how
>can it be sure that kdb is initialized before lkcd's init (where in it
>could call kdb_register()) ?

kdb is initialized just after mem_init(), in init/main.c::start_kernel().
If lkcd is called from start_kernel() then call it after kdb.
If lkcd uses __initcall then it is initialized long after kdb has started.

>Is there any other way to ensure correct ordering of init calls, besides
>linking the objects in the desired sequence in the Makefiles?

Either hand code the call sequence in start_kernel() or use __initcall
and control the init order using the link order in the makefiles. Those
are the only two choices.

From owner-lkcd@oss.sgi.com Wed Sep 5 06:21:41 2001
Received: (from majordomo@localhost)
    by oss.sgi.com (8.11.2/8.11.3) id f85DLfp21614
    for lkcd-outgoing; Wed, 5 Sep 2001 06:21:41 -0700
Received: from ausmtp02.au.ibm.com (ausmtp02.au.ibm.COM [202.135.136.105])
    by oss.sgi.com (8.11.2/8.11.3) with SMTP id f85DLZd21611
    for ; Wed, 5 Sep 2001 06:21:36 -0700
Received: from f02n16e.au.ibm.com
    by ausmtp02.au.ibm.com (IBM AP 2.0) with ESMTP id f85DJ0o73274
    for ; Wed, 5 Sep 2001 23:19:00 +1000
Received: from d73mta01.au.ibm.com (f06n01s [9.185.166.65])
    by f02n16e.au.ibm.com (8.11.1m3/NCO v4.97.1) with SMTP id f85DLN930066
    for ; Wed, 5 Sep 2001 23:21:23 +1000
Received: by d73mta01.au.ibm.com(Lotus SMTP MTA v4.6.5 (863.2 5-20-1999))
    id CA256ABE.00495DC8 ; Wed, 5 Sep 2001 23:21:21 +1000
X-Lotus-FromDomain: IBMIN@IBMAU
From: bsuparna@in.ibm.com
To: "Matt D. Robinson"
cc: lkcd@oss.sgi.com
Message-ID:
Date: Wed, 5 Sep 2001 18:39:09 +0530
Subject: Latest lkcd code and planned changes
Mime-Version: 1.0
Content-type: text/plain; charset=us-ascii
Content-Disposition: inline
Sender: owner-lkcd@oss.sgi.com
Precedence: bulk

Matt,

Responding to two of your notes (about the last lkcd code drop) in one
shot.

>I'm still planning to roll a 4.0 release as soon as I talk to
>the IBM folks about the last code drop I gave them.

We've tried this code out only on a UP, and have just started trying it
on an SMP system. We had a few initial hiccups with dump configuration
with the original scripts, but using test.c and modifying the device
number as you suggested did the trick.
As you've mentioned below new scripts and a new dumpconfig utility would
be required for the 4.0 release.

Before moving to SMP, we decided to first merge in our changes to enable
system continuation after a dump, by making the other CPUs spin for the
duration of the dump and then release them, rather than making them stop.
(We are now using dprobes to trigger the dump from a probe point to test
our changes.)

I'm hoping that we can include this in the 4.0 release together with SMP
problem fixes that you are working on. When are you planning on the
release ?

The next thing that we are trying to implement is to get non-disruptive
dumps to work from any context, including interrupt context, based on some
of the ideas we'd discussed earlier. We are attempting to get this to work
with the current basic dump i/o model for the non-disruptive dumps case.
(We may need to relook at it later once the dump driver interface is in
place, though only for devices that implement/register such an interface)
Will discuss this in more detail after we've tried out a few things ...

>For those who are working directly in the tree, you'll note we're
>now moving from 'vmdump' to 'dump' conventions, and hopefully all
>the future scripts will use this as well.

BTW, I did try directly accessing the CVS tree, which works.

>Also, I spoke to someone at MCL, and we'll see how we can roll in
>mcore into the LKCD project in some capacity.

That's good news! We wanted to check with you on this. Do we now have a
contact at MCL whom we can work with to do this, so that we have a
fallback standalone dump feature ?

>The latest code is in the SourceForge tree ... look in
>2.4/drivers/block/dump.c,
>and you'll see the restructuring changes. 'lcrash' has also
>changed a bit.I copied the LKCD group on my last check-in.
>If you didn't get a copy of it,let me know. It touched a bunch
>of files.
>I have to check in new scripts and a new dumpconfig utility next
>(and fix this bloody SMP problem now that I actually have an SMP
>system again to test against).

Do let us know how this goes. We had to give some thought to a few of the
SMP issues for the non-disruptive case (not that we're sure if we've got
it right or thought of all subtle race possibilities ! ), so it would be
interesting to discuss this more (I remember you mentioned fixing the
CPU 0 special cases when we talked last).

Regards
Suparna

Suparna Bhattacharya
IBM Software Lab, India
E-mail : bsuparna@in.ibm.com
Phone : 91-80-5267117, Extn : 3961

From owner-lkcd@oss.sgi.com Wed Sep 5 10:38:53 2001
Received: (from majordomo@localhost)
    by oss.sgi.com (8.11.2/8.11.3) id f85Hcrk27489
    for lkcd-outgoing; Wed, 5 Sep 2001 10:38:53 -0700
Received: from exg.allot.com (mail.allot.com [199.203.223.202])
    by oss.sgi.com (8.11.2/8.11.3) with SMTP id f85Hcld27486
    for ; Wed, 5 Sep 2001 10:38:48 -0700
Received: from allot.com (FELIX [172.16.1.37])
    by exg.allot.com with SMTP
    (Microsoft Exchange Internet Mail Service Version 5.5.2653.13)
    id SKQVL6G1; Wed, 5 Sep 2001 20:44:58 +0200
Message-ID: <3B966319.56C10FE1@allot.com>
Date: Wed, 05 Sep 2001 20:38:33 +0300
From: Felix Radensky
Organization: Allot Communications Ltd.
X-Mailer: Mozilla 4.77 [en] (X11; U; Linux 2.2.19c i686)
X-Accept-Language: en
MIME-Version: 1.0
To: "Matt D. Robinson"
CC: lkcd@oss.sgi.com
Subject: Re: Using latest CVS sources
References: <3B938EAD.9C8D4E92@allot.com> <3B956457.DDBBE9CF@alacritech.com>
Content-Type: text/plain; charset=us-ascii
Content-Transfer-Encoding: 7bit
Sender: owner-lkcd@oss.sgi.com
Precedence: bulk

Hi, Matt

I'm mostly looking for more reliable dumps. I used version 3.1.3
with kernel 2.2.18 and noticed that dumps are not always created.
E.g. if a crash occurred in net_bh, the system just hung and no dump
was created. On the other hand, a crash which occurred at the module
loading stage was dumped successfully.
I was hoping that the latest CVS code will allow more reliable dump
creation in all contexts.

Thanks.

Felix.

"Matt D. Robinson" wrote:

> Felix Radensky wrote:
> >
> > Hi,
> >
> > Can someone please explain how can I use the latest CVS sources
> > with kernel 2.2.19.
> >
> > Thanks in advance.
> >
> > Felix.
>
> Hi, Felix. The latest 2.2 tree is a bit behind what we're currently
> doing, and I haven't tried applying some of this stuff to 2.2 as of
> yet. The last state I left the 2.2 tree in was to at least allow you
> to dump to IDE disks as well, and has the Kerntypes mechanism in place.
>
> Is there some feature you're looking for in particular?
>
> --Matt

From owner-lkcd@oss.sgi.com Wed Sep 5 12:32:40 2001
Received: (from majordomo@localhost)
    by oss.sgi.com (8.11.2/8.11.3) id f85JWew30014
    for lkcd-outgoing; Wed, 5 Sep 2001 12:32:40 -0700
Received: from smtp.alacritech.com (smtp.alacritech.com [209.10.208.82])
    by oss.sgi.com (8.11.2/8.11.3) with SMTP id f85JW5d30007
    for ; Wed, 5 Sep 2001 12:32:06 -0700
Received: from alacritech.com (lambda.alacritech.com [10.1.1.32])
    by smtp.alacritech.com (8.11.0/8.11.0) with ESMTP id f85JQvO28172;
    Wed, 5 Sep 2001 12:26:57 -0700
Message-ID: <3B967E47.32BAE3D9@alacritech.com>
Date: Wed, 05 Sep 2001 12:34:31 -0700
From: "Matt D. Robinson"
Organization: Alacritech, Inc.
X-Mailer: Mozilla 4.76 [en] (X11; U; Linux 2.4.2-2 i686)
X-Accept-Language: en
MIME-Version: 1.0
To: bsuparna@in.ibm.com
CC: lkcd@oss.sgi.com
Subject: Re: Latest lkcd code and planned changes
References:
Content-Type: text/plain; charset=us-ascii
Content-Transfer-Encoding: 7bit
Sender: owner-lkcd@oss.sgi.com
Precedence: bulk

bsuparna@in.ibm.com wrote:
>
> Matt,
>
> Responding to two of your notes (about the last lkcd code drop) in one
> shot.
>
> >I'm still planning to roll a 4.0 release as soon as I talk to
> >the IBM folks about the last code drop I gave them.
>
> We've tried this code out only a UP, and have just started trying it on a
> SMP system.
> We had a few initial hiccups with dump configuration with the
> original scripts, but using test.c and modifying the device number as you
> suggested did the trick. As you've mentioned below new scripts and a new
> dumpconfig utility would be required for the 4.0 release.

The dump configuration utility is checked in. It's in
lkcdutils/lkcd_config. All the appropriate scripts/spec files
have changed to use it. There's even a manual page, if you can
believe that.

> Before moving to SMP, we decided to first merge in our changes to enable
> system continuation after a dump, by making the other CPUs spin for the
> duration of the dump and then release them, rather than making them stop.
> (We are now using dprobes to trigger the dump from a probe point to test
> our changes.)

How's this working? I'd like to get this into the tree if at all
possible so we can get rid of the current "stop" method and get
rid of the SMP bugs.

> I'm hoping that we can include this in the 4.0 release together with SMP
> problem fixes that you are working on. When are you planning on the
> release ?

I'm ready to release it now, believe it or not. I don't have to
release the dump_gzip.c code just yet, as I'm still improving it,
but at least everything will work with the new methodology.

> The next thing that we are trying to implement is to get non-disruptive
> dumps to work from any context, including interrupt context, based on some
> of the ideas we'd discussed earlier. We are attempting to get this to work
> with the current basic dump i/o model for the non-disruptive dumps case.
> (We may need to relook at it later once the dump driver interface is in
> place, though only for devices that implement/register such an interface)
> Will discuss this in more detail after we've tried out a few things ...

Okay.

Hey, I was thinking. Right now, we open up /dev/dump (227,0) to do
our ioctl()s against.
If we make that our major number by default, we could have multiple
dump instantiations in the kernel by working against the minor number.
Would that work for you, David, or how were you planning to do this?

> >For those who are working directly in the tree, you'll note we're
> >now moving from 'vmdump' to 'dump' conventions, and hopefully all
> >the future scripts will use this as well.
>
> BTW, I did try directly accessing the CVS tree, which works.

Great.

> >Also, I spoke to someone at MCL, and we'll see how we can roll in
> >mcore into the LKCD project in some capacity.
>
> That's good news! We wanted to check with you on this. Do we now have a
> contact at MCL whom we can work with to do this, so that we have a
> fallback standalone dump feature ?

I believe so. I've just started communicating with Mike Keefe. He's
sent me a patch (among other things), and I'm in the process of
review and seeing how we can integrate it, and then mcore.

> >The latest code is in the SourceForge tree ... look in
> >2.4/drivers/block/dump.c,
> >and you'll see the restructuring changes. 'lcrash' has also
> >changed a bit.I copied the LKCD group on my last check-in.
> >If you didn't get a copy of it,let me know. It touched a bunch
> >of files.
> >I have to check in new scripts and a new dumpconfig utility next
> >(and fix this bloody SMP problem now that I actually have an SMP
> >system again to test against).
>
> Do let us know how this goes. We had to give some thought to a few of the
> SMP issues for the non-disruptive case (not that we're sure if we've got
> it right or thought of all subtle race possibilities ! ), so it would be
> interesting to discuss this more (I remember you mentioned fixing the
> CPU 0 special cases when we talked last).

I've checked in almost everything you can imagine now:

    - lkcd_config
    - new /sbin/lkcd (instead of /sbin/vmdump)
    - modifications to rc.sysinit scripts
    - manual page modifications for lcrash/lkcd_config
    - updated spec file to build new lkcdutils-4.0
    - all 2.4 code is checked in, all header mods done

The _only_ things left to fix on my plate include:

    SMP issue (not always dumping)
    gzip dump compression changes (kernel/lcrash)

After those two are done, then we talk about multiple dump devices,
multiple dump methods, integrating all your non-disruptive dumping
code, new kdb/kgdb/dprobes hooks, and adding in the dump() functionality
to the block_device_operations structure, and then finishing up an IDE
dump function.

Should be fun!

> Regards
> Suparna

Thanks, Suparna. :)

BTW, I'll be out on #lkcd late tonight to discuss some of this. For
those that are curious, we're currently connecting to
irc.kernel.org/#lkcd with IRC to talk about this stuff pretty late
in the evening.

--Matt

From owner-lkcd@oss.sgi.com Wed Sep 5 15:44:18 2001
Received: (from majordomo@localhost)
    by oss.sgi.com (8.11.2/8.11.3) id f85MiIn01349
    for lkcd-outgoing; Wed, 5 Sep 2001 15:44:18 -0700
Received: from smtp.alacritech.com (smtp.alacritech.com [209.10.208.82])
    by oss.sgi.com (8.11.2/8.11.3) with SMTP id f85Mi2d01340
    for ; Wed, 5 Sep 2001 15:44:02 -0700
Received: from alacritech.com (lambda.alacritech.com [10.1.1.32])
    by smtp.alacritech.com (8.11.0/8.11.0) with ESMTP id f85Md8O01891;
    Wed, 5 Sep 2001 15:39:08 -0700
Message-ID: <3B96AB52.7D527466@alacritech.com>
Date: Wed, 05 Sep 2001 15:46:42 -0700
From: "Matt D. Robinson"
Organization: Alacritech, Inc.
X-Mailer: Mozilla 4.76 [en] (X11; U; Linux 2.4.2-2 i686)
X-Accept-Language: en
MIME-Version: 1.0
To: r1vamsi@in.ibm.com
CC: "Schaal, Richard" , lkcd@oss.sgi.com, akale@users.sourceforge.net,
    kaos@ocs.com.au
Subject: Re: LKCD + KDB ?
References:
Content-Type: text/plain; charset=us-ascii
Content-Transfer-Encoding: 7bit
Sender: owner-lkcd@oss.sgi.com
Precedence: bulk

As far as 'kdb' and 'lkcd' are concerned (excluding 'kgdb' for the
moment), anyone hankering to work on this? Otherwise, it goes on
the list of things-to-do. I've got a few things on my plate at
the moment so I can't go off and do this right now. Later, yes,
but if someone wants this in 4.0, please speak up now so it's on
the list of included items. :)

--Matt

r1vamsi@in.ibm.com wrote:
>
> Richard,
>
> I agree with you completely on the rationale for wanting to dump from kdb.
> In fact, one could choose to trigger a dump (after which the system will
> likely continue to run), not just from KDB, which requires manual
> intervention, but from other debugging tools such as the IBM Dynamic
> Probes, where this could be done automatically.
>
> We are building "non-disruptive" dumps capability into lkcd, which will
> let the system continue normal execution after the dump is taken.
>
> These features will probably find more use when dumps are used for
> debugging other problem situations like performace related problems
> besides oops/panics.
>
> Regards.. Vamsi.
>
> Vamsi Krishna S.
> Linux Technology Center,
> IBM Software Lab, Bangalore.
> Ph: +91 80 5262355 Extn: 3959
> Internet: r1vamsi@in.ibm.com
>
> "Matt D. Robinson" on 09/05/2001 05:07:14 AM
>
> Please respond to "Matt D. Robinson"
>
> To: "Schaal, Richard"
> cc: S Vamsikrishna/India/IBM@IBMIN, lkcd@oss.sgi.com,
> akale@users.sourceforge.net, kaos@ocs.com.au
> Subject: Re: LKCD + KDB ?
>
> "Schaal, Richard" wrote:
> >
> > I think it would be relatively simple to have the dump_init code
> > register a dump system function with the kernel debugger so that you
> > could dump the system on demand. Note that not all problems are Oops
> > related, and that a hung system, or one that is grossly under
> > performing would be useful to get a snapshot of the activity at that
> > time.
> > Manual entry to the debugger and manual dump would seem to be a useful
> > thing. - System survivability after such a dump would be nice, but not
> > a show stopper at this point.
>
> You should already be able to do this with dump_function_ptr in the
> latest code. This should be assigned to dump_execute (at least in
> the last check-in I made). So if you call that address, you'll get
> the dump function pointer.
>
> > So far as the dumping or not after an oops and entering kdb, there is
> > a differentiation as to the reason for entering the debugger - you
> > might derive a dump/no dump directive from whether you enter the
> > debugger by reason of breakpoint or oops?
>
> I'm curious, how many people drop into kdb, and then want to take a dump?
> I'd think that this is very useful for developers, but not as useful for
> customers who want to crash and reboot.
>
> > I used to work for Stratus Computer - at that time, a panic or oops
> > would put us into the debugger, and if we were successful in patching
> > up the problem, the system could resume execution. In Linux, after an
> > oops, maybe a "nodump" command would be useful as well to disable the
> > dumping that might normally occur.
>
> This is fine -- I think these are all reasonable extensions to KDB, and
> I can work with that developer if need be to make that happen. There's
> an easy solution, one way or another.
>
> --Matt
>
> > Regards,
> > Richard
> >
> > -----Original Message-----
> > From: r1vamsi@in.ibm.com [mailto:r1vamsi@in.ibm.com]
> > Sent: Monday, September 03, 2001 2:55 AM
> > To: Matt D. Robinson
> > Cc: richard.schaal@intel.com; lkcd@oss.sgi.com
> > Subject: Re: LKCD + KDB ?
> >
> > When both KDB and LKCD patches are applied, we drop into KDB on an oops.
> > dump_execute will be called after we exit the debugger.
> >
> > If all you want is to disable dump taking after exiting debugger, that
> > is easy enough with editing the dump_okay flag from within the debugger
> > (or add a kdb command to do this) as Matt points out. Assuming there is
> > a good reason for wanting to take the dump from within the debugger,
> > one should add a simple dump command to kdb, which will just call
> > dump_execute with proper regs. What you could do today is to set eip to
> > dump_execute from with in the kernel, editing the stack to push correct
> > params :-) (not as hard as it sounds, really)
> >
> > However, the cleaner approach obviously is to add the kdb dump command,
> > once we understand a little better why exactly would one want to dump
> > from within the debugger (on an oops).
> >
> > Regards.. Vamsi.
> >
> > Vamsi Krishna S.
> > Linux Technology Center,
> > IBM Software Lab, Bangalore.
> > Ph: +91 80 5262355 Extn: 3959
> > Internet: r1vamsi@in.ibm.com
> >
> > Please respond to "Matt D. Robinson"
> >
> > To: richard.schaal@intel.com
> > cc: lkcd@oss.sgi.com (bcc: S Vamsikrishna/India/IBM)
> > Subject: Re: LKCD + KDB ?
> >
> > Richard Schaal wrote:
> > >
> > >
> > > My question is this - I have been a fan of the kernel debugger for
> > > some time, and have had a bit of difficulty resolving how to
> > > configure both capabilities into my kernel. I guess what I'd like to
> > > have happen is to have the system enter the debugger on an oops, then
> > > have the option of dumping the system from the debugger, or to dump
> > > the system automatically after the debugger is exited.
> >
> > There's no great way to do this right now. If in kdb you can set the
> > field of 'dump_okay' field to FALSE, then reset it after dropping back
> > from the debugger state, that'd be fine.
> > I guess we could also add in
> > something for kdb, a one-time thing, so kdb can set dump_kdb to TRUE,
> > and when dump_execute() gets called, dump_kdb is checked, and if set
> > to TRUE, resets it to FALSE. Then add a kdb command that sets the
> > field for you ...
> >
> > Would that work?
> >
> > --Matt
> >
> > > What is your thinking on this? Did I goof something up in applying
> > > the patches for the two features?
> > >
> > > Thanks,
> > > Richard
> > >
> > > --
> > > Richard.Schaal@intel.com          Intel Corporation
> > > Ph: (408)765-1579                 Richard Schaal
> > >                                   Mail Stop SC12-308
> > >                                   3600 Juliette Lane
> > > "I can type faster than I think!" Santa Clara, CA 95052

From owner-lkcd@oss.sgi.com Wed Sep 5 15:49:54 2001
Received: (from majordomo@localhost)
    by oss.sgi.com (8.11.2/8.11.3) id f85Mnsk01455
    for lkcd-outgoing; Wed, 5 Sep 2001 15:49:54 -0700
Received: from smtp.alacritech.com (smtp.alacritech.com [209.10.208.82])
    by oss.sgi.com (8.11.2/8.11.3) with SMTP id f85Mnhd01451
    for ; Wed, 5 Sep 2001 15:49:43 -0700
Received: from alacritech.com (lambda.alacritech.com [10.1.1.32])
    by smtp.alacritech.com (8.11.0/8.11.0) with ESMTP id f85MilO02101;
    Wed, 5 Sep 2001 15:44:47 -0700
Message-ID: <3B96ACA5.9399AB57@alacritech.com>
Date: Wed, 05 Sep 2001 15:52:21 -0700
From: "Matt D. Robinson"
Organization: Alacritech, Inc.
X-Mailer: Mozilla 4.76 [en] (X11; U; Linux 2.4.2-2 i686)
X-Accept-Language: en
MIME-Version: 1.0
To: "Amit S. Kale"
CC: "Schaal, Richard" , "'r1vamsi@in.ibm.com'" , lkcd@oss.sgi.com,
    akale@users.sourceforge.net, kaos@ocs.com.au
Subject: Re: LKCD + KDB ?
References: <68843F808BE5D311AC6100A0C9C5786648485D@fmsmsx50.fm.intel.com>
    <3B956DC7.F5F3559F@alacritech.com> <3B959ED0.E33BA3BC@vsnl.net>
Content-Type: text/plain; charset=us-ascii
Content-Transfer-Encoding: 7bit
Sender: owner-lkcd@oss.sgi.com
Precedence: bulk

"Amit S. Kale" wrote:
>
> Hi Matt,
>
> I have faced several times the problem of crash dumps not being
> available in kgdb.
> Many a times I don't have time to debug a panic immediately, so
> I keep the machine inside the debugger. A crash dump will enable me to
> save a crash dump and continue testing. I can get back to the core dump
> later.
>
> Usually it's a good idea to save cores for all non-trivial problems once
> a product goes alpha. If a problem which is supposedly fixed resurfaces,
> it's very difficult to say whether it's the same problem in absence of a
> core dump.
>
> In ideal world, all problems should be fixed immediately and completely
> using a debugger and we wouldn't need crash dumps.
>
> I guess it's time to think about making kgdb understand lkcd interface.

This should be a pretty straightforward thing to add. If you don't
want to do it, let me know, so I can put it on my list. I think it's
a high enough priority to get done for the next release, if humanly
possible.

Your latest stuff is in the source tarballs on kgdb.sourceforge.net?

--Matt

> "Matt D. Robinson" wrote:
> >
> > "Schaal, Richard" wrote:
> > >
> > > Hi Matt,
> > > When you refer to the "latest code", what is that? I don't see
> > > anything on source forge as released code, and the latest from the
> > > SGI site has patches up to linux-2.4.4 is that what you were
> > > referring to?
> > >
> > > Thanks,
> > > Richard
> >
> > The latest code is in the SourceForge tree ... look in
> > 2.4/drivers/block/dump.c,
> > and you'll see the restructuring changes. 'lcrash' has also changed a
> > bit. I copied the LKCD group on my last check-in. If you didn't get a
> > copy of it, let me know. It touched a bunch of files.
> >
> > I have to check in new scripts and a new dumpconfig utility next (and
> > fix this bloody SMP problem now that I actually have an SMP system
> > again to test against).
> >
> > --Matt
> >
> > >
> > > -----Original Message-----
> > > From: Matt D.
> > >     Robinson [mailto:yakker@alacritech.com]
> > > Sent: Tuesday, September 04, 2001 4:37 PM
> > > To: Schaal, Richard
> > > Cc: 'r1vamsi@in.ibm.com'; lkcd@oss.sgi.com; akale@users.sourceforge.net;
> > > kaos@ocs.com.au
> > > Subject: Re: LKCD + KDB ?
> > >
> > > "Schaal, Richard" wrote:
> > > >
> > > > I think it would be relatively simple to have the dump_init code
> > > > register a dump system function with the kernel debugger so that
> > > > you could dump the system on demand. Note that not all problems are
> > > > Oops related, and that a hung system, or one that is grossly under
> > > > performing would be useful to get a snapshot of the activity at
> > > > that time. Manual entry to the debugger and manual dump would seem
> > > > to be a useful thing. - System survivability after such a dump
> > > > would be nice, but not a show stopper at this point.
> > >
> > > You should already be able to do this with dump_function_ptr in the
> > > latest code. This should be assigned to dump_execute (at least in
> > > the last check-in I made). So if you call that address, you'll get
> > > the dump function pointer.
> > >
> > > > So far as the dumping or not after an oops and entering kdb, there
> > > > is a differentiation as to the reason for entering the debugger -
> > > > you might derive a dump/no dump directive from whether you enter
> > > > the debugger by reason of breakpoint or oops?
> > >
> > > I'm curious, how many people drop into kdb, and then want to take a
> > > dump? I'd think that this is very useful for developers, but not as
> > > useful for customers who want to crash and reboot.
> > >
> > > > I used to work for Stratus Computer - at that time, a panic or oops
> > > > would put us into the debugger, and if we were successful in
> > > > patching up the problem, the system could resume execution.
In Linux, after an oops, maybe > > > > a "nodump" command would be useful as well to disable the dumping that > > > might > > > > normally occur. > > > > > > This is fine -- I think these are all reasonable extensions to KDB, and > > > I can work with that developer if need be to make that happen. There's > > > an easy solution, one way or another. > > > > > > --Matt > > > > > > > Regards, > > > > Richard > > > > > > > > -----Original Message----- > > > > From: r1vamsi@in.ibm.com [mailto:r1vamsi@in.ibm.com] > > > > Sent: Monday, September 03, 2001 2:55 AM > > > > To: Matt D. Robinson > > > > Cc: richard.schaal@intel.com; lkcd@oss.sgi.com > > > > Subject: Re: LKCD + KDB ? > > > > > > > > When both KDB and LKCD patches are applied, we drop into KDB on an oops. > > > > dump_execute will be called after we exit the debugger. > > > > > > > > If all you want is to disable dump taking after exiting debugger, that is > > > > easy enough with editing the dump_okay flag from within the debugger (or > > > > add a kdb command to do this) as Matt points out. Assuming there is a good > > > > reason for wanting to take the dump from within the debugger, one should > > > > add a simple dump command to kdb, which will just call dump_execute with > > > > proper regs. What you could do today is to set eip to dump_execute from > > > > with in the kernel, editing the stack to push correct params :-) (not as > > > > hard as it sounds, really) > > > > > > > > However, the cleaner approach obviously is to add the kdb dump command, > > > > once we understand a little better why exactly would one want to dump from > > > > within the debugger (on an oops). > > > > > > > > Regards.. Vamsi. > > > > > > > > Vamsi Krishna S. > > > > Linux Technology Center, > > > > IBM Software Lab, Bangalore. > > > > Ph: +91 80 5262355 Extn: 3959 > > > > Internet: r1vamsi@in.ibm.com > > > > > > > > Please respond to "Matt D. 
Robinson" > > > > > > > > To: richard.schaal@intel.com > > > > cc: lkcd@oss.sgi.com (bcc: S Vamsikrishna/India/IBM) > > > > Subject: Re: LKCD + KDB ? > > > > > > > > Richard Schaal wrote: > > > > > > > > > > > > > > > My question is this - I have been a fan of the kernel debugger for some > > > > > time, and have had a bit of difficulty > > > > > resolving how to configure both capabilities into my kernel. I guess > > > > > what I'd like to have happen is to > > > > > have the system enter the debugger on an oops, then have the option of > > > > > dumping the system from the debugger, or > > > > > to dump the system automatically after the debugger is exited. > > > > > > > > There's no great way to do this right now. If in kdb you can set the > > > > field of 'dump_okay' field to FALSE, then reset it after dropping back > > > > from the debugger state, that'd be fine. I guess we could also add in > > > > something for kdb, a one-time thing, so kdb can set dump_kdb to TRUE, > > > > and when dump_execute() gets called, dump_kdb is checked, and if set > > > > to TRUE, resets it to FALSE. Then add a kdb command that sets the > > > > field for you ... > > > > > > > > Would that work? > > > > > > > > --Matt > > > > > > > > > What is your thinking on this? Did I goof something up in applying the > > > > > patches for the two features? > > > > > > > > > > Thanks, > > > > > Richard > > > > > > > > > > -- > > > > > Richard.Schaal@intel.com Intel Corporation > > > > > Ph: (408)765-1579 Richard Schaal > > > > > Mail Stop SC12-308 > > > > > 3600 Juliette Lane > > > > > "I can type faster than I think!" Santa Clara, CA 95052 > > -- > Amit S. Kale > Linux Consultant, Pune, India. 
(kgdb@vsnl.net) > Linux kernel source level debugger http://kgdb.sourceforge.net/ > Translation filesystem http://trfs.sourceforge.net/ From owner-lkcd@oss.sgi.com Wed Sep 5 22:30:25 2001 Received: (from majordomo@localhost) by oss.sgi.com (8.11.2/8.11.3) id f865UP708194 for lkcd-outgoing; Wed, 5 Sep 2001 22:30:25 -0700 Received: from ausmtp01.au.ibm.com (ausmtp01.au.ibm.COM [202.135.136.97]) by oss.sgi.com (8.11.2/8.11.3) with SMTP id f865U2d08188 for ; Wed, 5 Sep 2001 22:30:03 -0700 Received: from f02n15e.au.ibm.com by ausmtp01.au.ibm.com (IBM AP 2.0) with ESMTP id f865KMT350160; Thu, 6 Sep 2001 15:20:22 +1000 Received: from d73mta01.au.ibm.com (f06n01s [9.185.166.65]) by f02n15e.au.ibm.com (8.11.1m3/NCO v4.97.1) with SMTP id f865NlI125110; Thu, 6 Sep 2001 15:23:47 +1000 Received: by d73mta01.au.ibm.com(Lotus SMTP MTA v4.6.5 (863.2 5-20-1999)) id CA256ABF.001DA6A3 ; Thu, 6 Sep 2001 15:23:52 +1000 X-Lotus-FromDomain: IBMIN@IBMAU From: vamsi_krishna@in.ibm.com To: "Matt D. Robinson" cc: "Schaal, Richard" , lkcd@oss.sgi.com, akale@users.sourceforge.net, kaos@ocs.com.au, hanrahat@us.ibm.com, richardj_moore@uk.ibm.com Message-ID: Date: Thu, 6 Sep 2001 11:04:32 +0530 Subject: Re: LKCD + KDB ? Mime-Version: 1.0 Content-type: text/plain; charset=us-ascii Content-Disposition: inline Sender: owner-lkcd@oss.sgi.com Precedence: bulk I will have a go at it. What is the time frame for 4.0? Regards.. Vamsi. Vamsi Krishna S. Linux Technology Center, IBM Software Lab, Bangalore. Ph: +91 80 5262355 Extn: 3959 Internet: r1vamsi@in.ibm.com "Matt D. Robinson" on 09/06/2001 04:16:42 AM Please respond to "Matt D. Robinson" To: S Vamsikrishna/India/IBM@IBMIN cc: "Schaal, Richard" , lkcd@oss.sgi.com, akale@users.sourceforge.net, kaos@ocs.com.au Subject: Re: LKCD + KDB ? As far as 'kdb' and 'lkcd' is concerned (excluding 'kgdb' for the moment), anyone hankering to work on this? Otherwise, it goes on the list of things-to-do. 
I've got a few things on my plate at the moment so I can't go off and do this right now. Later, yes, but if someone wants this in 4.0, please speak up now so it's on the list of included items. :) --Matt r1vamsi@in.ibm.com wrote: > > Richard, > > I agree with you completely on the rationale for wanting to dump from kdb. > In fact, one could choose to trigger a dump (after which the system will > likely continue to run), not just from KDB, which requires manual > intervention, but from other debugging tools such as the IBM Dynamic > Probes, where this could be done automatically. > > We are building "non-disruptive" dumps capability into lkcd, which will let > the system continue normal execution after the dump is taken. > > These features will probably find more use when dumps are used for > debugging other problem situations like performace related problems besides > oops/panics. > > Regards.. Vamsi. > > Vamsi Krishna S. > Linux Technology Center, > IBM Software Lab, Bangalore. > Ph: +91 80 5262355 Extn: 3959 > Internet: r1vamsi@in.ibm.com > > "Matt D. Robinson" on 09/05/2001 05:07:14 AM > > Please respond to "Matt D. Robinson" > > To: "Schaal, Richard" > cc: S Vamsikrishna/India/IBM@IBMIN, lkcd@oss.sgi.com, > akale@users.sourceforge.net, kaos@ocs.com.au > Subject: Re: LKCD + KDB ? > > "Schaal, Richard" wrote: > > > > I think it would be relatively simple to have the dump_init code register > a > > dump system > > function with the kernel debugger so that you could dump the system on > > demand. Note that > > not all problems are Oops related, and that a hung system, or one that is > > grossly under performing > > would be useful to get a snapshot of the activity at that time. Manual > > entry to the debugger > > and manual dump would seem to be a useful thing. - System survivability > > after such a dump would be > > nice, but not a show stopper at this point. > > You should already be able to do this with dump_function_ptr in the > latest code. 
This should be assigned to dump_execute (at least in > the last check-in I made). So if you call that address, you'll get > the dump function pointer. > > > So far as the dumping or not after an oops and entering kdb, there is a > > differentiation as to the reason > > for entering the debugger - you might derive a dump/no dump directive > from > > whether you enter the debugger > > by reason of breakpoint or oops? > > I'm curious, how many people drop into kdb, and then want to take a dump? > I'd think that this is very useful for developers, but not as useful for > customers who want to crash and reboot. > > > I used to work for Stratus Computer - at that time, a panic or oops would > > put us into the debugger, and if we > > were successful in patching up the problem, the system could resume > > execution. In Linux, after an oops, maybe > > a "nodump" command would be useful as well to disable the dumping that > might > > normally occur. > > This is fine -- I think these are all reasonable extensions to KDB, and > I can work with that developer if need be to make that happen. There's > an easy solution, one way or another. > > --Matt > > > Regards, > > Richard > > > > -----Original Message----- > > From: r1vamsi@in.ibm.com [mailto:r1vamsi@in.ibm.com] > > Sent: Monday, September 03, 2001 2:55 AM > > To: Matt D. Robinson > > Cc: richard.schaal@intel.com; lkcd@oss.sgi.com > > Subject: Re: LKCD + KDB ? > > > > When both KDB and LKCD patches are applied, we drop into KDB on an oops. > > dump_execute will be called after we exit the debugger. > > > > If all you want is to disable dump taking after exiting debugger, that is > > easy enough with editing the dump_okay flag from within the debugger (or > > add a kdb command to do this) as Matt points out. Assuming there is a > good > > reason for wanting to take the dump from within the debugger, one should > > add a simple dump command to kdb, which will just call dump_execute with > > proper regs. 
What you could do today is to set eip to dump_execute from > > with in the kernel, editing the stack to push correct params :-) (not as > > hard as it sounds, really) > > > > However, the cleaner approach obviously is to add the kdb dump command, > > once we understand a little better why exactly would one want to dump > from > > within the debugger (on an oops). > > > > Regards.. Vamsi. > > > > Vamsi Krishna S. > > Linux Technology Center, > > IBM Software Lab, Bangalore. > > Ph: +91 80 5262355 Extn: 3959 > > Internet: r1vamsi@in.ibm.com > > > > Please respond to "Matt D. Robinson" > > > > To: richard.schaal@intel.com > > cc: lkcd@oss.sgi.com (bcc: S Vamsikrishna/India/IBM) > > Subject: Re: LKCD + KDB ? > > > > Richard Schaal wrote: > > > > > > > > > My question is this - I have been a fan of the kernel debugger for some > > > time, and have had a bit of difficulty > > > resolving how to configure both capabilities into my kernel. I guess > > > what I'd like to have happen is to > > > have the system enter the debugger on an oops, then have the option of > > > dumping the system from the debugger, or > > > to dump the system automatically after the debugger is exited. > > > > There's no great way to do this right now. If in kdb you can set the > > field of 'dump_okay' field to FALSE, then reset it after dropping back > > from the debugger state, that'd be fine. I guess we could also add in > > something for kdb, a one-time thing, so kdb can set dump_kdb to TRUE, > > and when dump_execute() gets called, dump_kdb is checked, and if set > > to TRUE, resets it to FALSE. Then add a kdb command that sets the > > field for you ... > > > > Would that work? > > > > --Matt > > > > > What is your thinking on this? Did I goof something up in applying the > > > patches for the two features? 
> > > > > > Thanks, > > > Richard > > > > > > -- > > > Richard.Schaal@intel.com Intel Corporation > > > Ph: (408)765-1579 Richard Schaal > > > Mail Stop SC12-308 > > > 3600 Juliette Lane > > > "I can type faster than I think!" Santa Clara, CA 95052 From owner-lkcd@oss.sgi.com Wed Sep 5 23:10:38 2001 Received: (from majordomo@localhost) by oss.sgi.com (8.11.2/8.11.3) id f866AcB08671 for lkcd-outgoing; Wed, 5 Sep 2001 23:10:38 -0700 Received: from nakedeye.aparity.com (w032.z064001165.sjc-ca.dsl.cnc.net [64.1.165.32]) by oss.sgi.com (8.11.2/8.11.3) with SMTP id f866Aad08668 for ; Wed, 5 Sep 2001 23:10:36 -0700 Received: from localhost (yakker@localhost) by nakedeye.aparity.com (8.11.2/8.11.2) with ESMTP id f866Bc716570; Wed, 5 Sep 2001 23:11:38 -0700 Date: Wed, 5 Sep 2001 23:11:38 -0700 (PDT) From: "Matt D. Robinson" To: cc: "Matt D. Robinson" , "Schaal, Richard" , , , , , Subject: Re: LKCD + KDB ? In-Reply-To: Message-ID: MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII Sender: owner-lkcd@oss.sgi.com Precedence: bulk On Thu, 6 Sep 2001 vamsi_krishna@in.ibm.com wrote: |>I will have a go at it. What is the time frame for 4.0? |> |>Regards.. Vamsi. First off, thanks, Vamsi ... if you can get it done, great. It's as soon as I've got something from Suparna for LKCD, and 'lcrash' can go right now as-is. I'd like to get non-disruptive dumping in there, and if at all possible, the MCL changes will go in along with the gzip compression code. I'm looking at about a week. I'd like to not stretch it out too much further if at all possible. Let's say 9/14. 
--Matt From owner-lkcd@oss.sgi.com Thu Sep 6 00:09:20 2001 Received: (from majordomo@localhost) by oss.sgi.com (8.11.2/8.11.3) id f8679Kr09917 for lkcd-outgoing; Thu, 6 Sep 2001 00:09:20 -0700 Received: from fgwmail6.fujitsu.co.jp (fgwmail6.fujitsu.co.jp [192.51.44.36]) by oss.sgi.com (8.11.2/8.11.3) with SMTP id f8679Gd09914 for ; Thu, 6 Sep 2001 00:09:17 -0700 Received: from m3.gw.fujitsu.co.jp by fgwmail6.fujitsu.co.jp (8.9.3/3.7W-MX0108-Fujitsu Gateway) id QAA20789; Thu, 6 Sep 2001 16:08:59 +0900 (JST) (envelope-from naomi@pst.fujitsu.com) From: naomi@pst.fujitsu.com Received: from naomi.aoi.pst.fujitsu.com by m3.gw.fujitsu.co.jp (8.9.3/3.7W-0108-Fujitsu Domain Master) id QAA16055; Thu, 6 Sep 2001 16:08:55 +0900 (JST) (envelope-from naomi@pst.fujitsu.com) Received: from localhost (IDENT:naomi@localhost [127.0.0.1]) by naomi.aoi.pst.fujitsu.com (8.9.3/8.9.3) with ESMTP id QAA18666; Thu, 6 Sep 2001 16:08:31 +0900 To: yakker@alacritech.com Cc: lkcd@oss.sgi.com Subject: Re: lcrash sub-commands line completion In-Reply-To: Your message of "Tue, 04 Sep 2001 01:14:07 -0700" <3B948D4F.9D7257B1@alacritech.com> References: <3B948D4F.9D7257B1@alacritech.com> X-Mailer: Mew version 1.92.4 on Emacs 19.34 / Mule 2.3 (SUETSUMUHANA) Mime-Version: 1.0 Content-Type: Text/Plain; charset=us-ascii Content-Transfer-Encoding: 7bit Message-Id: <20010906160831D.naomi@pst.fujitsu.com> Date: Thu, 06 Sep 2001 16:08:31 +0900 X-Dispatcher: imput version 980905(IM100) Lines: 31 Sender: owner-lkcd@oss.sgi.com Precedence: bulk Hi, Matt-san. Since I have not finished testing this yet, I think that it is difficult to roll it into 4.0. I'd appreciate it if you would roll it into the next release (4.1 or more?). Naomi.Haseo From: "Matt D. Robinson" Subject: Re: lcrash sub-commands line completion Date: Tue, 04 Sep 2001 01:14:07 -0700 > This sounds like a great thing to add. I have no problems with it. 
> Note that we used to have a readline capability, but we removed it > due to some of the GPL/LGPL licensing conflicts. > > Please let me know if you complete this in the future. I'm still > planning to roll a 4.0 release as soon as I talk to the IBM folks > about the last code drop I gave them. > > For those who are working directly in the tree, you'll note we're > now moving from 'vmdump' to 'dump' conventions, and hopefully all > the future scripts will use this as well. > > Also, I spoke to someone at MCL, and we'll see how we can roll in > mcore into the LKCD project in some capacity. > > Have at it, Naomi-san. :) > > --Matt From owner-lkcd@oss.sgi.com Thu Sep 6 00:23:55 2001 Received: (from majordomo@localhost) by oss.sgi.com (8.11.2/8.11.3) id f867Ntq10276 for lkcd-outgoing; Thu, 6 Sep 2001 00:23:55 -0700 Received: from nakedeye.aparity.com (w032.z064001165.sjc-ca.dsl.cnc.net [64.1.165.32]) by oss.sgi.com (8.11.2/8.11.3) with SMTP id f867Npd10273 for ; Thu, 6 Sep 2001 00:23:51 -0700 Received: from alacritech.com (w032.z064001165.sjc-ca.dsl.cnc.net [64.1.165.32]) by nakedeye.aparity.com (8.11.2/8.11.2) with ESMTP id f867SQJ16651; Thu, 6 Sep 2001 00:28:26 -0700 Message-ID: <3B9723FA.C8F4ADB0@alacritech.com> Date: Thu, 06 Sep 2001 00:21:30 -0700 From: "Matt D. Robinson" X-Mailer: Mozilla 4.75 [en] (Win98; U) X-Accept-Language: en MIME-Version: 1.0 To: naomi@pst.fujitsu.com CC: lkcd@oss.sgi.com Subject: Re: lcrash sub-commands line completion References: <3B948D4F.9D7257B1@alacritech.com> <20010906160831D.naomi@pst.fujitsu.com> Content-Type: text/plain; charset=us-ascii Content-Transfer-Encoding: 7bit Sender: owner-lkcd@oss.sgi.com Precedence: bulk 4.1 (or 4.0.1) is fine -- I don't suspect it'll be that long between releases, as there's a number of people working on LKCD right now, and everyone wants to get their stuff rolled in and turned over more quickly (which I agree with). --Matt naomi@pst.fujitsu.com wrote: > > Hi, Matt-san. 
> > Since I have not finished testing this yet, I think that it is difficult > to roll it into 4.0. > I'd appreciate it if you would roll it into the next release (4.1 or more?). > > Naomi.Haseo > > From: "Matt D. Robinson" > Subject: Re: lcrash sub-commands line completion > Date: Tue, 04 Sep 2001 01:14:07 -0700 > > > This sounds like a great thing to add. I have no problems with it. > > Note that we used to have a readline capability, but we removed it > > due to some of the GPL/LGPL licensing conflicts. > > > > Please let me know if you complete this in the future. I'm still > > planning to roll a 4.0 release as soon as I talk to the IBM folks > > about the last code drop I gave them. > > > > For those who are working directly in the tree, you'll note we're > > now moving from 'vmdump' to 'dump' conventions, and hopefully all > > the future scripts will use this as well. > > > > Also, I spoke to someone at MCL, and we'll see how we can roll in > > mcore into the LKCD project in some capacity. > > > > Have at it, Naomi-san. :) > > > > --Matt From owner-lkcd@oss.sgi.com Thu Sep 6 00:25:33 2001 Received: (from majordomo@localhost) by oss.sgi.com (8.11.2/8.11.3) id f867PXo10331 for lkcd-outgoing; Thu, 6 Sep 2001 00:25:33 -0700 Received: from nakedeye.aparity.com (w032.z064001165.sjc-ca.dsl.cnc.net [64.1.165.32]) by oss.sgi.com (8.11.2/8.11.3) with SMTP id f867PWd10328 for ; Thu, 6 Sep 2001 00:25:32 -0700 Received: from alacritech.com (w032.z064001165.sjc-ca.dsl.cnc.net [64.1.165.32]) by nakedeye.aparity.com (8.11.2/8.11.2) with ESMTP id f867UCJ16661 for ; Thu, 6 Sep 2001 00:30:12 -0700 Message-ID: <3B972464.609F31D8@alacritech.com> Date: Thu, 06 Sep 2001 00:23:16 -0700 From: "Matt D. Robinson" X-Mailer: Mozilla 4.75 [en] (Win98; U) X-Accept-Language: en MIME-Version: 1.0 To: lkcd@oss.sgi.com Subject: Query regarding LKCD kernel patch and RPM/tar.gz ... 
Content-Type: text/plain; charset=us-ascii Content-Transfer-Encoding: 7bit Sender: owner-lkcd@oss.sgi.com Precedence: bulk I'd like to start releasing the LKCD kernel patch as part of the base RPM and/or include it alongside the release. The whole point behind moving to number synchronization between the kernel patch and the RPM/tar.gz is to make sure things are in line. If the kernel patch is released in the lkcdutils RPM/tar.gz, this becomes much easier. Is this a problem for anyone, especially those rolling their own distributions? If it is, let me know, as I'm flexible. --Matt From owner-lkcd@oss.sgi.com Thu Sep 6 02:26:36 2001 Received: (from majordomo@localhost) by oss.sgi.com (8.11.2/8.11.3) id f869Qa912382 for lkcd-outgoing; Thu, 6 Sep 2001 02:26:36 -0700 Received: from fgwmail7.fujitsu.co.jp (fgwmail7.fujitsu.co.jp [192.51.44.37]) by oss.sgi.com (8.11.2/8.11.3) with SMTP id f869QJd12379 for ; Thu, 6 Sep 2001 02:26:19 -0700 Received: from m4.gw.fujitsu.co.jp by fgwmail7.fujitsu.co.jp (8.9.3/3.7W-MX0108-Fujitsu Gateway) id SAA15521; Thu, 6 Sep 2001 18:26:07 +0900 (JST) (envelope-from m-kotani@pst.fujitsu.com) Received: from classic.aoi.pst.fujitsu.com by m4.gw.fujitsu.co.jp (8.9.3/3.7W-0108-Fujitsu Domain Master) id SAA07839; Thu, 6 Sep 2001 18:25:58 +0900 (JST) (envelope-from m-kotani@pst.fujitsu.com) Received: from doll (doll.aoi.pst.fujitsu.com [172.23.72.214]) by classic.aoi.pst.fujitsu.com (8.9.3/8.9.3) with SMTP id SAA05841; Thu, 6 Sep 2001 18:25:48 +0900 Message-ID: <008401c136b6$05e00600$d64817ac@aoi.pst.fujitsu.com> From: "Masashige Kotani" To: "Matt D. 
Robinson" Cc: , "Howell, David P" References: <10C8636AE359D4119118009027AE99870CE2F95B@FMSMSX34> <3B9563E8.9A432B7B@alacritech.com> Subject: Re: multiple dump devices Date: Thu, 6 Sep 2001 18:26:31 +0900 MIME-Version: 1.0 Content-Type: text/plain; charset="iso-8859-1" Content-Transfer-Encoding: 7bit X-Priority: 3 X-MSMail-Priority: Normal X-Mailer: Microsoft Outlook Express 5.00.2615.200 X-MimeOLE: Produced By Microsoft MimeOLE V5.00.2615.200 Sender: owner-lkcd@oss.sgi.com Precedence: bulk > "Howell, David P" wrote: > > > > We are working on a proposal for redundant dump device support that I plan > > to > > share in the next few weeks; I've got a prototype mostly working that can be > > > > contributed. Let me know how you are approaching this, I'll send details of > > what we are doing here later this week. Sounds like a good opportunity for > > collaboration on this. > > > > Regards, > > Dave Howell > > I'm really curious as to the proposal. Sounds like a good idea, the > real question becomes, do you want to chain multiple dump devices with > multiple dump mechanisms? > > Here's where I'm going with this. I just finished the code to allow > people to install their own dump compression mechanisms (right now, it'll > be RLE, I have to check in the GZIP compression module, and people can > put in whatever one they want). Do you want to take the next step and > let people have chains of dump mechanisms based on the dump condition? > I realize multiple dump devices is good, but what if you could plug in > your own dump method with it? Then that dump method could query the > available dump devices configured. > > So you'd have: > > dump methods (one standard, but plug-and-play) > dump devices (requires at least one, multiples allowed, maybe > access lists for methods?) > dump compressions (configurable, usable by some methods) Do you mean as follows, Matt? 
Does "dump methods" mean how the devices configured as dump devices are
used to save the memory dump, with each method being pluggable?
(single device as standard, concatenating devices as a single dump device,
mirroring devices for redundancy ...)

And should each "dump device" be independently configurable in its type of
compression and dump method?

--Masashige

> Would this be the eventual goal? That way, everything is tunable to
> their own liking. I figured I'd ask, since if you're going to add in
> multiple dump devices, and we've gone to multiple compression types,
> you might as well go all the way and add dump methods as well. I
> don't know what the rest of the group thinks, but this could be
> very useful.
>
> I'd definitely like to get some feedback ... this is all doable,
> as long as the dump compression code is in 'lcrash', and the pages
> are dumped in a way that we can find the location in memory, this
> can work pretty sweet for everyone here.
>
> --Matt

From owner-lkcd@oss.sgi.com Thu Sep 6 02:26:43 2001
Received: (from majordomo@localhost)
    by oss.sgi.com (8.11.2/8.11.3) id f869Qh512396
    for lkcd-outgoing; Thu, 6 Sep 2001 02:26:43 -0700
Received: from nakedeye.aparity.com (w032.z064001165.sjc-ca.dsl.cnc.net [64.1.165.32])
    by oss.sgi.com (8.11.2/8.11.3) with SMTP id f868hQd11581
    for ; Thu, 6 Sep 2001 01:43:27 -0700
Received: from localhost (yakker@localhost)
    by nakedeye.aparity.com (8.11.2/8.11.2) with ESMTP id f868ih016827;
    Thu, 6 Sep 2001 01:44:43 -0700
Date: Thu, 6 Sep 2001 01:44:43 -0700 (PDT)
From: "Matt D. Robinson"
To: Kapish K
cc:
Subject: Re: Re: lcrash and vmdump
In-Reply-To: <200109041939.PAA18654@www23.ureach.com>
Message-ID:
MIME-Version: 1.0
Content-Type: TEXT/PLAIN; charset=US-ASCII
Sender: owner-lkcd@oss.sgi.com
Precedence: bulk

I'm copying 'lkcd@oss.sgi.com', as someone may find this useful.

On Tue, 4 Sep 2001, Kapish K wrote:
|>Hello,
|> Yes, I understand that - but what I am looking for is to be
|>able to know where to look for what..
and the syntax of the
|>various commands isn't very well explained or illustrated in the
|>online help under lcrash - for example, I need to walk through a
|>list of struct tasks - how do I know the start or for say, one
|>particular process, how do I get to the address for the start of
|>the task_struct and once given that, how can I use the walk
|>command. tried using it, but could not quite understand it..
|>same with mmap - how to know the mmap_list .. lcrash does not
|>say anything about that... also, how to know which pages have
|>been mapped or used by which process at that point in time..
|>questions like these are what I am looking for answers to... any
|>place where some doc exists or do I necessarily have to look at
|>code only??

All good questions ...

To get the tasks, run 'task'. You'll see the addresses of the
structures in the first column. You can then run 'task <addr>' to
see the task. 'task -f <addr>' shows a little bit more data, and if
you want to show _everything_, run 'px *(struct task_struct *)<addr>',
and then you'll see all the fields. 'px' or 'print' shows you
everything in the structure.

There's also 'walk', such as 'walk task_struct next_task <addr>',
which will walk through the next_task pointers for you. Try
'prev_task' instead of 'next_task' if you're curious. Or read the
code in lkcdutils/lcrash/cmds/cmd_walk.c.

With 'mmap', you'll need to know where in memory an mm_struct is.
For example, look at the 'active_mm' field in the task struct when
you run the 'px' command previously listed. If it isn't NULL, you can
run 'mmap -f <addr>', where <addr> is the address listed as the
task_struct.active_mm.

All this really involves looking at the code. There are some things
you can do as far as debugging is concerned to get a quicker answer,
but crash dump analysis is really the science of reading kernel code
to figure out why the memory information is wrong. It's not that
simple to do, especially on more complex kernels.
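
As a rough illustration of what 'walk' does with the next_task and
prev_task fields, here is a small Python sketch. It is not lcrash code:
the dictionary standing in for dump memory, the helper name walk(), and
the three-task ring are all assumptions made up for this example.

```python
# Toy model of following a pointer field from one struct to the next,
# in the spirit of lcrash's 'walk' command. "memory" is a plain dict that
# maps an address to the fields found there; it is NOT how lcrash reads
# a dump, just a stand-in for illustration.

def walk(memory, start, field, limit=1000):
    """Follow `field` from `start` until the ring wraps back to `start`.

    `limit` guards against a corrupted list that never wraps around.
    """
    visited = []
    addr = start
    while addr is not None and len(visited) < limit:
        visited.append(addr)
        addr = memory[addr][field]
        if addr == start:
            break
    return visited

# Three made-up task structs linked in a ring (illustrative values only).
memory = {
    0xdee32000: {"comm": "httpd",
                 "next_task": 0xdeb2c000, "prev_task": 0xdfb7c000},
    0xdeb2c000: {"comm": "sshd",
                 "next_task": 0xdfb7c000, "prev_task": 0xdee32000},
    0xdfb7c000: {"comm": "mingetty",
                 "next_task": 0xdee32000, "prev_task": 0xdeb2c000},
}

if __name__ == "__main__":
    for addr in walk(memory, 0xdeb2c000, "next_task"):
        print(hex(addr), memory[addr]["comm"])
```

Passing "prev_task" instead of "next_task" visits the same structs in
the opposite order, which is the same experiment suggested for 'walk'.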
Linux in many ways is far easier than most OSes; there isn't
that much complexity in the SMP code compared to, say, IRIX. But give
it some time ...

Kernel crash dump analysis really works as follows:

    1) figure out what was running (straightforward enough);
    2) figure out what those tasks were doing while running;
    3) figure out what caused the crash to occur;
    4) figure out where in the code the crash occurred;
    5) figure out why the crash occurred (hardest to do)

Steps 1 - 4 simply involve looking at the crash dump header, seeing
what 'stat' says, what the last running task was, and dumping out the
stack trace of the task (with 'trace'). Then, look at where in the
stack trace the crash occurred by lining it up with the kernel code.
Finally (step 5), figure out what would have caused the problem to
occur in the first place, and walk back to the condition that caused
the trigger (such as a task setting a pointer to NULL while you were
accessing it in another task).

This last step is tough because you don't always know how a
structure/field is set. Memory can be clobbered by other tasks, race
conditions can exist, etc., etc., etc. It requires some intrinsic
knowledge of reading kernel code to decipher the failing condition.

The five most important commands: "stat", "task", "trace", "dis" and
"dump". These will be your primary commands when trying to figure out
a crash dump.

I've taught a number of classes for years all over the world on this
very topic, and I've walked users through dozens of example dumps.
Perhaps half the class gets it; of that half, maybe one, rarely two
take it to the next level and can actually debug a kernel dump from
scratch, and those people have normally been doing support for quite
some time. This isn't something you can just pick up one day and
expect it to be simple ...

BUT ...

_Everyone_ can return a crash dump report, which is the first step
towards solving a kernel crash.
That by itself can help a company create a support database, so if
they see the same type of system crash over and over and ... show up
in their customer support cases, they can quickly review the solution
from the first crash dump report (in the call) and fire off a patch,
tunable, fix, or other information.

The biggest reason for having crash reports is supportability. I hate
seeing customers have a crash, after a crash, after another crash,
etc., until someone finally figures out the issue, where in the
meantime, the customer has lost data, time, resources, etc. This is
why the LKCD tries to create a crash dump report upon rebooting.

Here's an example of commands to outline your questions. Look for the
>> prompt where I type the commands. I hope this helps somewhat.

--Matt

----------------------------------------------------------------------------
[root@watereye /root]# lcrash
map = /boot/System.map, vmdump = /dev/mem, outfile = stdout, kerntypes = /boot/Kerntypes

Please wait...
Loading system map ............................... Done.
Loading type info (Kerntypes) ... Done.
Loading ksyms from dump ....... Done.
>> task
ACTIVE TASKS:

      ADDR   UID    PID   PPID  STATE  FLAGS  NAME
===============================================================================
0xc02dc000     0      0      0      0      0  swapper
0xdfffc000     0      1      0      1  0x100  init
0xdfff2000     0      2      1      1   0x40  keventd
0xdffee000     0      3      0      1   0x40  ksoftirqd_CPU0
0xdffe4000     0      4      0      1  0x840  kswapd
0xdffe2000     0      5      0      1  0x840  kreclaimd
0xdffe0000     0      6      0      1   0x40  bdflush
0xdffde000     0      7      0      1   0x40  kupdated
0xdffaa000     0      8      1      1   0x40  khubd
0xdfc8c000     0    360      1      1  0x140  syslogd
0xdf7fa000     0    369      1      1  0x140  klogd
0xdf702000    99    383      1      1  0x140  identd
0xdf9e0000    99    387    383      1   0x40  identd
0xdf89c000    99    391    387      1   0x40  identd
0xdf7f4000    99    392    387      1   0x40  identd
0xdf7f0000    99    393    387      1   0x40  identd
0xdf6b8000     0    401      1      1   0x40  atd
0xdf634000     0    415      1      1   0x40  crond
0xdf616000     0    429      1      1   0x40  inetd
0xdf0f2000     0    443      1      1  0x140  httpd
0xdef2a000    99    450    443      1  0x140  httpd
0xdef20000    99    451    443      1  0x140  httpd
0xdeecc000    99    452    443      1  0x140  httpd
0xdeebe000    99    453    443      1  0x140  httpd
0xdeeb2000    99    454    443      1  0x140  httpd
0xdee4c000    99    455    443      1  0x140  httpd
0xdee3e000    99    456    443      1  0x140  httpd
0xdee32000    99    457    443      1  0x140  httpd
0xdeb2c000     0    501      1      1  0x140  sshd
0xdfb7c000     0    505      1      1  0x100  mingetty
0xdf30a000     0    506      1      1  0x100  mingetty
0xdf3dc000     0    507      1      1  0x100  mingetty
0xdf48a000     0    508      1      1  0x100  mingetty
0xdf382000     0    509      1      1  0x100  mingetty
0xdeb26000     0    510      1      1  0x100  mingetty
0xdeb24000     0    629      1      1  0x100  getty
0xdeb20000     0   1896    501      1  0x140  sshd
0xdd806000     0   1898   1896      1  0x100  bash
0xdd7de000     0   1929   1898      0  0x100  lcrash
===============================================================================
39 active task structs found

>> task -f 0xdeb2c000
      ADDR   UID    PID   PPID  STATE  FLAGS  NAME
===============================================================================
0xdeb2c000     0    501      1      1  0x140  sshd

MM:0xdff82260

THREAD:
ESP0:0xdeb2e000, ESP:0xdeb2dea8, EIP:0xc0110c42
FS:0, GS:0
===============================================================================
1 active task struct found

>> px *(struct task_struct *)0xdeb2c000
struct
task_struct { state = 0x1 flags = 0x140 sigpending = 0x0 addr_limit = mm_segment_t { seg = 0xc0000000 } exec_domain = 0xc02c3ce0 need_resched = 0x0 ptrace = 0x0 lock_depth = 0xffffffff counter = 0xb nice = 0x0 policy = 0x0 mm = 0xdff82260 has_cpu = 0x0 processor = 0x0 cpus_allowed = 0xffffffff run_list = struct list_head { next = (nil) prev = 0xdeb2003c } sleep_time = 0x911c92 next_task = 0xdfb7c000 prev_task = 0xdee32000 active_mm = 0xdff82260 binfmt = 0xc02c5ddc exit_code = 0x0 exit_signal = 0x11 pdeath_signal = 0x0 personality = 0x0 did_exec = 0x0 pid = 0x1f5 pgrp = 0x1f5 tty_old_pgrp = 0x0 session = 0x1f5 tgid = 0x1f5 leader = 0x1 p_opptr = 0xdfffc000 p_pptr = 0xdfffc000 p_cptr = 0xdeb20000 p_ysptr = 0xdfb7c000 p_osptr = 0xdf0f2000 thread_group = struct list_head { next = 0xdeb2c098 prev = 0xdeb2c098 } pidhash_next = (nil) pidhash_pprev = 0xc0327a50 wait_chldexit = wait_queue_head_t { lock = (null){ lock = 0x1 } task_list = (null){ next = 0xdeb2c0ac prev = 0xdeb2c0ac } } vfork_done = (nil) rt_priority = 0x0 it_real_value = 0x57e40 it_prof_value = 0x0 it_virt_value = 0x0 it_real_incr = 0x0 it_prof_incr = 0x0 it_virt_incr = 0x0 real_timer = struct timer_list { list = struct list_head { next = 0xc032e18c prev = 0xdf6b9f7c } expires = 0x9458aa data = 0xdeb2c000 function = 0xc0116b84 } times = struct tms { tms_utime = 0x151 tms_stime = 0x2 tms_cutime = 0xb9b tms_cstime = 0x256 } start_time = 0x693 per_cpu_utime = { [0] 0x151 [1] 0x0 [2] 0x0 [3] 0x0 [4] 0x0 [5] 0x0 [6] 0x0 [7] 0x0 [8] 0x0 [9] 0x0 [10] 0x0 [11] 0x0 [12] 0x0 [13] 0x0 [14] 0x0 [15] 0x0 [16] 0x0 [17] 0x0 [18] 0x0 [19] 0x0 [20] 0x0 [21] 0x0 [22] 0x0 [23] 0x0 [24] 0x0 [25] 0x0 [26] 0x0 [27] 0x0 [28] 0x0 [29] 0x0 [30] 0x0 [31] 0x0 } per_cpu_stime = { [0] 0x2 [1] 0x0 [2] 0x0 [3] 0x0 [4] 0x0 [5] 0x0 [6] 0x0 [7] 0x0 [8] 0x0 [9] 0x0 [10] 0x0 [11] 0x0 [12] 0x0 [13] 0x0 [14] 0x0 [15] 0x0 [16] 0x0 [17] 0x0 [18] 0x0 [19] 0x0 [20] 0x0 [21] 0x0 [22] 0x0 [23] 0x0 [24] 0x0 [25] 0x0 [26] 0x0 [27] 0x0 [28] 0x0 [29] 0x0 
[30] 0x0 [31] 0x0 } min_flt = 0xb6 maj_flt = 0x12 nswap = 0x0 cmin_flt = 0xa3b7 cmaj_flt = 0x1798e cnswap = 0x0 swappable = 0x0 uid = 0x0 euid = 0x0 suid = 0x0 fsuid = 0x0 gid = 0x0 egid = 0x0 sgid = 0x0 fsgid = 0x0 ngroups = 0x0 groups = { [0] 0x0 [1] 0x0 [2] 0x0 [3] 0x0 [4] 0x0 [5] 0x0 [6] 0x0 [7] 0x0 [8] 0x0 [9] 0x0 [10] 0x0 [11] 0x0 [12] 0x0 [13] 0x0 [14] 0x0 [15] 0x0 [16] 0x0 [17] 0x0 [18] 0x0 [19] 0x0 [20] 0x0 [21] 0x0 [22] 0x0 [23] 0x0 [24] 0x0 [25] 0x0 [26] 0x0 [27] 0x0 [28] 0x0 [29] 0x0 [30] 0x0 [31] 0x0 } cap_effective = 0xfffffeff cap_inheritable = 0x0 cap_permitted = 0xfffffeff keep_capabilities = 0x0 user = 0xc02c492c rlim = { [0] struct rlimit { rlim_cur = 0xffffffff rlim_max = 0xffffffff } [1] struct rlimit { rlim_cur = 0xffffffff rlim_max = 0xffffffff } [2] struct rlimit { rlim_cur = 0xffffffff rlim_max = 0xffffffff } [3] struct rlimit { rlim_cur = 0x800000 rlim_max = 0xffffffff } [4] struct rlimit { rlim_cur = 0x0 rlim_max = 0x7fffffff } [5] struct rlimit { rlim_cur = 0xffffffff rlim_max = 0xffffffff } [6] struct rlimit { rlim_cur = 0x4000 rlim_max = 0x4000 } [7] struct rlimit { rlim_cur = 0x400 rlim_max = 0x400 } [8] struct rlimit { rlim_cur = 0xffffffff rlim_max = 0xffffffff } [9] struct rlimit { rlim_cur = 0xffffffff rlim_max = 0xffffffff } [10] struct rlimit { rlim_cur = 0xffffffff rlim_max = 0xffffffff } } used_math = 0x1 comm = "sshd" link_count = 0x0 tty = (nil) locks = 0x0 semundo = (nil) semsleeping = (nil) thread = struct thread_struct { esp0 = 0xdeb2e000 eip = 0xc0110c42 esp = 0xdeb2dea8 fs = 0x0 gs = 0x0 debugreg = { [0] 0x0 [1] 0x0 [2] 0x0 [3] 0x0 [4] 0x0 [5] 0x0 [6] 0x0 [7] 0x0 } cr2 = 0x0 trap_no = 0x0 error_code = 0x0 i387 = union i387_union { fsave = struct i387_fsave_struct { cwd = 0x37f swd = 0x0 twd = 0x0 fip = 0x0 fcs = 0x402d5408 foo = 0x0 fos = 0x0 st_space = { [0] 0x0 [1] 0x0 [2] 0x0 [3] 0x0 [4] 0x0 [5] 0x0 [6] 0x0 [7] 0x0 [8] 0x0 [9] 0x0 [10] 0x0 [11] 0x0 [12] 0x0 [13] 0x0 [14] 0x0 [15] 0x0 [16] 0x0 [17] 0x0 [18] 0x80000000 
[19] 0x3fff } status = 0x0 } fxsave = struct i387_fxsave_struct { cwd = 0x37f swd = 0x0 twd = 0x0 fop = 0x0 fip = 0x0 fcs = 0x0 foo = 0x402d5408 fos = 0x0 mxcsr = 0x0 reserved = 0x0 st_space = { [0] 0x0 [1] 0x0 [2] 0x0 [3] 0x0 [4] 0x0 [5] 0x0 [6] 0x0 [7] 0x0 [8] 0x0 [9] 0x0 [10] 0x0 [11] 0x0 [12] 0x0 [13] 0x0 [14] 0x0 [15] 0x0 [16] 0x0 [17] 0x80000000 [18] 0x3fff [19] 0x0 [20] 0x0 [21] 0x80000000 [22] 0x3fff [23] 0x0 [24] 0x7ae14800 [25] 0xa147ae14 [26] 0x3fff [27] 0x0 [28] 0x7ae14800 [29] 0xa147ae14 [30] 0x3fff [31] 0x0 } xmm_space = { [0] 0x0 [1] 0x0 [2] 0x0 [3] 0x0 [4] 0x0 [5] 0x0 [6] 0x0 [7] 0x0 [8] 0x0 [9] 0x0 [10] 0x0 [11] 0x0 [12] 0x0 [13] 0x0 [14] 0x0 [15] 0x0 [16] 0x0 [17] 0x0 [18] 0x0 [19] 0x0 [20] 0x0 [21] 0x0 [22] 0x0 [23] 0x0 [24] 0x0 [25] 0x0 [26] 0x0 [27] 0x0 [28] 0x0 [29] 0x0 [30] 0x0 [31] 0x0 } padding = { [0] 0x0 [1] 0x0 [2] 0x0 [3] 0x0 [4] 0x0 [5] 0x0 [6] 0x0 [7] 0x0 [8] 0x0 [9] 0x0 [10] 0x0 [11] 0x0 [12] 0x0 [13] 0x0 [14] 0x0 [15] 0x0 [16] 0x0 [17] 0x0 [18] 0x0 [19] 0x0 [20] 0x0 [21] 0x0 [22] 0x0 [23] 0x0 [24] 0x0 [25] 0x0 [26] 0x0 [27] 0x0 [28] 0x0 [29] 0x0 [30] 0x0 [31] 0x0 [32] 0x0 [33] 0x0 [34] 0x0 [35] 0x0 [36] 0x0 [37] 0x0 [38] 0x0 [39] 0x0 [40] 0x0 [41] 0x0 [42] 0x0 [43] 0x0 [44] 0x0 [45] 0x0 [46] 0x0 [47] 0x0 [48] 0x0 [49] 0x0 [50] 0x0 [51] 0x0 [52] 0x0 [53] 0x0 [54] 0x0 [55] 0x0 } } soft = struct i387_soft_struct { cwd = 0x37f swd = 0x0 twd = 0x0 fip = 0x0 fcs = 0x402d5408 foo = 0x0 fos = 0x0 st_space = { [0] 0x0 [1] 0x0 [2] 0x0 [3] 0x0 [4] 0x0 [5] 0x0 [6] 0x0 [7] 0x0 [8] 0x0 [9] 0x0 [10] 0x0 [11] 0x0 [12] 0x0 [13] 0x0 [14] 0x0 [15] 0x0 [16] 0x0 [17] 0x0 [18] 0x80000000 [19] 0x3fff } ftop = 0x0 changed = 0x0 lookahead = 0x0 no_update = 0x0 rm = 0x0 alimit = 0x0 info = 0x80000000 entry_eip = 0x3fff } } vm86_info = (nil) screen_bitmap = 0x0 v86flags = 0x0 v86mask = 0x0 v86mode = 0x0 saved_esp0 = 0x0 ioperm = 0x0 io_bitmap = { [0] 0xffffffff [1] 0x0 [2] 0x0 [3] 0x0 [4] 0x0 [5] 0x0 [6] 0x0 [7] 0x0 [8] 0x0 [9] 0x0 [10] 0x0 [11] 0x0 [12] 0x0 
[13] 0x0 [14] 0x0 [15] 0x0 [16] 0x0 [17] 0x0 [18] 0x0 [19] 0x0 [20] 0x0 [21] 0x0 [22] 0x0 [23] 0x0 [24] 0x0 [25] 0x0 [26] 0x0 [27] 0x0 [28] 0x0 [29] 0x0 [30] 0x0 [31] 0x0 [32] 0x0 } } fs = 0xdfd04ae0 files = 0xdee34be0 sigmask_lock = spinlock_t { lock = 0x1 } sig = 0xdefa3aa0 blocked = sigset_t { sig = { [0] 0x0 [1] 0x0 } } pending = struct sigpending { head = (nil) tail = 0xdeb2c648 signal = sigset_t { sig = { [0] 0x0 [1] 0x0 } } } sas_ss_sp = 0x0 sas_ss_size = 0x0 notifier = 0x0 notifier_data = (nil) notifier_mask = (nil) parent_exec_id = 0x6 self_exec_id = 0x7 alloc_lock = spinlock_t { lock = 0x1 } } >> px (*(struct task_struct *)0xdeb2c000)->active_mm 0xdff82260 >> mmap -f 0xdff82260 ADDR MM_COUNT MAP_COUNT MMAP =========================================== 0xdff82260 1 18 0xded50760 START_CODE:0x8048000, END_CODE:0x80760ce START_DATA:0x80770e0, END_DATA:0x80790f8 START_BRK:0x807f62c, START_STACK:0xbffffe10 ARG_START:0xbffffee7, ARG_END:0xbffffeec TOTAL_VM:0x191 =========================================== 1 active mm_struct struct found >> whatis mm_struct struct mm_struct { struct vm_area_struct *mmap; struct vm_area_struct *mmap_avl; struct vm_area_struct *mmap_cache; pgd_t *pgd; atomic_t mm_users; atomic_t mm_count; int map_count; struct rw_semaphore { long int count; spinlock_t wait_lock; struct list_head { struct list_head *next; struct list_head *prev; } wait_list; } mmap_sem; spinlock_t page_table_lock; struct list_head { struct list_head *next; struct list_head *prev; } mmlist; long unsigned int start_code; long unsigned int end_code; long unsigned int start_data; long unsigned int end_data; long unsigned int start_brk; long unsigned int brk; long unsigned int start_stack; long unsigned int arg_start; long unsigned int arg_end; long unsigned int env_start; long unsigned int env_end; long unsigned int rss; long unsigned int total_vm; long unsigned int locked_vm; long unsigned int def_flags; long unsigned int cpu_vm_mask; long unsigned int swap_address; 
    unsigned int dumpable :1;
    mm_context_t context;
};
>> trace 0xdeb2c000
================================================================
STACK TRACE FOR TASK: 0xdeb2c000(sshd)

 0 schedule+1142 [0xc0110c42]
 1 schedule_timeout+18 [0xc01106b2]
 2 do_select+179 [0xc01413c3]
 3 sys_select+1073 [0xc014199d]
 4 system_call+44 [0xc0106d84]
================================================================
----------------------------------------------------------------------------

|>TIA

From owner-lkcd@oss.sgi.com Thu Sep 6 08:07:49 2001
Received: (from majordomo@localhost) by oss.sgi.com (8.11.2/8.11.3) id f86F7nB18916 for lkcd-outgoing; Thu, 6 Sep 2001 08:07:49 -0700
Received: from ausmtp01.au.ibm.com (ausmtp01.au.ibm.COM [202.135.136.97]) by oss.sgi.com (8.11.2/8.11.3) with SMTP id f86F7ed18907 for ; Thu, 6 Sep 2001 08:07:41 -0700
Received: from f02n16e.au.ibm.com by ausmtp01.au.ibm.com (IBM AP 2.0) with ESMTP id f86F3xp102082 for ; Fri, 7 Sep 2001 01:03:59 +1000
Received: from d73mta01.au.ibm.com (f06n01s [9.185.166.65]) by f02n16e.au.ibm.com (8.11.1m3/NCO v4.97.1) with SMTP id f86F7QL89016 for ; Fri, 7 Sep 2001 01:07:26 +1000
Received: by d73mta01.au.ibm.com(Lotus SMTP MTA v4.6.5 (863.2 5-20-1999)) id CA256ABF.00531250 ; Fri, 7 Sep 2001 01:07:21 +1000
X-Lotus-FromDomain: IBMIN@IBMAU
From: bsuparna@in.ibm.com
To: "Matt D. Robinson"
cc: lkcd@oss.sgi.com, ssubodh@in.ibm.com
Message-ID:
Date: Thu, 6 Sep 2001 20:15:42 +0500
Subject: Re: Latest lkcd code and planned changes
Mime-Version: 1.0
Content-type: text/plain; charset=us-ascii
Content-Disposition: inline
Sender: owner-lkcd@oss.sgi.com
Precedence: bulk

Sorry, I got disconnected from lkcd and couldn't connect back in ...

>The dump configuration utility is checked in.
>It's in lkcdutils/lkcd_config.
>All the appropriate scripts/spec files have changed to use it.
>There's even a manual page, if you can believe that.

OK. Have pulled it over. Subodh is trying it out ...

>>Before moving to SMP, we decided to first merge in our changes to enable
>>system continuation after a dump, by making the other CPUs spin for the
>>duration of the dump and then release them, rather than making them stop.
>>(We are now using dprobes to trigger the dump from a probe point to test
>>our changes.)
>
>How's this working? I'd like to get this into the tree if at all possible
>so we can get rid of the current "stop" method and get rid of the SMP bugs.

Hmm.. I guess what you are saying here is that if we have something that
works correctly for our case, we could simply use that for the panic dump
situation as well, and fall into the actual stop cpu + restart code after
the dump is through. Is that correct ?

As I mentioned, we aren't quite there yet in terms of perfect behaviour on
SMP in all cases. We have it a little easier for non-disruptive dumps in
some respects, because we can assume that the system is operational, yet
there are still a few things to think about.

The changes that we've made and tried out so far simply involved issuing
smp_call_function with dump_spin() rather than with stop_this_cpu() in the
non-disruptive case, where dump_spin() just spins in a busy loop as long as
a flag is enabled (that is, for the duration of the dump), and turning off
this flag after the dump is through. Well, we also make sure that the NMI
watchdog doesn't complain about this spin on those CPUs. I'm not sure yet
whether we need to modify the cpu_online_map, if we are stopping all
scheduling - have to check that out.

That's probably not enough. When we tried this earlier (with your older
patch) things seemed to work (irrespective of which CPU triggered the
dump). But then, we just may not have hit some conditions. It seems like we
should change the irq affinities to make sure disk interrupts don't go to
the spinning CPUs.
(i.e., of course, if we keep interrupts disabled on those CPUs while they
spin; we need to think about what would happen if we keep interrupts
enabled.)
[If we do change irq affinities, then for a non-disruptive dump we would
have to save the old values and restore them after we are through; we can't
assume which irq this would be, so perhaps we could do this for all.]
If the smp_call_function_interrupt (or its equivalent) didn't ack the irq
right away, there is a possibility that the arbitration priority for the
apic of the spinning cpus would be higher than that of the dumping cpu, so
interrupts would always go to the dumping cpu, but that's kind of a fragile
approach, I guess, and may lead to other problems. In any case the solution
needs to be more generic than that.

In the panic/forced dump case, could there be problems with
smp_call_function waiting for the other CPUs to enter the IPI interrupt
handler ? Using an NMI interrupt like kdb does was an option we'd explored
earlier, but this could abruptly interrupt a CPU while it holds some locks
or state which could be required by the dump i/o path, so spinning inside
the NMI interrupt doesn't seem suitable without added caution.

Part of getting this right for all cases is ensuring that a dump can be
triggered from any context, i.e. whatever each of the CPUs may be doing at
that instant, and that's an aspect that we are looking into next.
How much of that solution would be acceptable/usable in a panic type dump
may be a question, but for the moment, we could at least get some more
cases working before we have a complete solution there (with your block
device dump interface plus mcore).

>I believe so. I've just started communicating with Mike Keefe. He's
>sent me a patch (among other things), and I'm in the process of review
>and seeing how we can integrate it, and then mcore.

Good ! Hope we can start discussing this more soon.

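The spin flag and the "save all irq affinities, restore them afterwards" idea discussed above can be pictured with a small user-space model. This is only a sketch under stated assumptions: it is not the LKCD code, and every name in it (irq_affinity_model, dump_begin, dump_end, dump_spin_should_wait, NR_IRQS_MODEL) is hypothetical. Affinities are modelled as plain bitmasks in an array rather than real APIC or /proc state.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define NR_IRQS_MODEL 16   /* hypothetical model size, not the kernel's NR_IRQS */

/* Model state: one affinity bitmask per irq (bit n set = CPU n may take it). */
static uint32_t irq_affinity_model[NR_IRQS_MODEL];
static uint32_t saved_affinity[NR_IRQS_MODEL];

/* The flag a dump_spin()-style busy loop would poll on the other CPUs. */
static volatile bool dump_in_progress;

/* What a spinning CPU would test on each pass through its busy loop. */
static bool dump_spin_should_wait(void)
{
    return dump_in_progress;
}

/* Before the dump: save every irq's mask, then route all irqs to the
 * dumping CPU, since we cannot know in advance which irq the dump
 * device will use. */
static void dump_begin(unsigned int dumping_cpu)
{
    assert(dumping_cpu < 32);
    for (int irq = 0; irq < NR_IRQS_MODEL; irq++) {
        saved_affinity[irq] = irq_affinity_model[irq];
        irq_affinity_model[irq] = UINT32_C(1) << dumping_cpu;
    }
    dump_in_progress = true;
}

/* After the dump: restore the saved masks, then release the spinners. */
static void dump_end(void)
{
    for (int irq = 0; irq < NR_IRQS_MODEL; irq++)
        irq_affinity_model[irq] = saved_affinity[irq];
    dump_in_progress = false;
}
```

Restoring the masks before clearing the flag means the released CPUs see the old interrupt routing as soon as they resume; whether that ordering matters on real hardware is exactly the kind of question raised in this message.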
>I've checked in almost everything you can imagine now: > > - all 2.4 code is checked in, all header mods done Did you change anything in the 2.4 tree that isn't in the patch you'd sent me earlier ? Suparna Bhattacharya IBM Software Lab, India E-mail : bsuparna@in.ibm.com Phone : 91-80-5267117, Extn : 3961 From owner-lkcd@oss.sgi.com Thu Sep 6 12:28:15 2001 Received: (from majordomo@localhost) by oss.sgi.com (8.11.2/8.11.3) id f86JSFV23789 for lkcd-outgoing; Thu, 6 Sep 2001 12:28:15 -0700 Received: from smtp.alacritech.com (smtp.alacritech.com [209.10.208.82]) by oss.sgi.com (8.11.2/8.11.3) with SMTP id f86JS6d23778 for ; Thu, 6 Sep 2001 12:28:06 -0700 Received: from alacritech.com (lambda.alacritech.com [10.1.1.32]) by smtp.alacritech.com (8.11.0/8.11.0) with ESMTP id f86JNAO01536; Thu, 6 Sep 2001 12:23:14 -0700 Message-ID: <3B97CEE5.ED870202@alacritech.com> Date: Thu, 06 Sep 2001 12:30:45 -0700 From: "Matt D. Robinson" Organization: Alacritech, Inc. X-Mailer: Mozilla 4.76 [en] (X11; U; Linux 2.4.2-2 i686) X-Accept-Language: en MIME-Version: 1.0 To: Masashige Kotani CC: lkcd@oss.sgi.com, "Howell, David P" Subject: Re: multiple dump devices References: <10C8636AE359D4119118009027AE99870CE2F95B@FMSMSX34> <3B9563E8.9A432B7B@alacritech.com> <008401c136b6$05e00600$d64817ac@aoi.pst.fujitsu.com> Content-Type: text/plain; charset=us-ascii Content-Transfer-Encoding: 7bit Sender: owner-lkcd@oss.sgi.com Precedence: bulk Masashige Kotani wrote: > > > "Howell, David P" wrote: > > > > > > We are working on a proposal for redundant dump device support that I > plan > > > to > > > share in the next few weeks; I've got a prototype mostly working that > can be > > > > > > contributed. Let me know how you are approaching this, I'll send details > of > > > what we are doing here later this week. Sounds like a good opportunity > for > > > collaboration on this. > > > > > > Regards, > > > Dave Howell > > > > I'm really curious as to the proposal. 
Sounds like a good idea, the > > real question becomes, do you want to chain multiple dump devices with > > multiple dump mechanisms? > > > > Here's where I'm going with this. I just finished the code to allow > > people to install their own dump compression mechanisms (right now, it'll > > be RLE, I have to check in the GZIP compression module, and people can > > put in whatever one they want). Do you want to take the next step and > > let people have chains of dump mechanisms based on the dump condition? > > I realize multiple dump devices is good, but what if you could plug in > > your own dump method with it? Then that dump method could query the > > available dump devices configured. > > > > So you'd have: > > > > dump methods (one standard, but plug-and-play) > > dump devices (requires at least one, multiples allowed, maybe > > access lists for methods?) > > dump compressions (configurable, usable by some methods) > > Do you mean as follows, Matt? > > "Dump methods" means how to use devices configured for dump device to > save memory dump, and each of them should be pluggable? > (single device as standard, concatenating devices as single dump device, > mirroring devices for redundancy ...) I guess what I mean is more like the following: Assume there are multiple dump devices in /dev: /dev/dump/dump0 (major 227, minor 0) /dev/dump/dump1 (major 227, minor 1) /dev/dump/dumpN (major 227, minor N) All of these are configurable via open() and ioctl(), and each can have their own individual dumping strategies based on configuration. For example, let's say /dev/dump/dump0 is configured with dump method A, RLE compression, and is non-disruptive. /dev/dump/dump1 is configured with dump method B (let's say it's Mission Critical Linux's MCLX crash utility), no compression, and is disruptive. Etc., etc., etc. Each can be triggered via dprobes, or a system crash, SysRQ, or any other mechanism. 
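To make the per-device idea concrete, here is a rough user-space model of the configuration each /dev/dump/dumpN minor might carry. It is a sketch only: the struct, enum, and function names (dump_dev_config, dump_dev_configure, dump_dev_lookup, DUMP_MAX_DEVS) are made up for illustration, and no real ioctl numbers or driver interfaces are implied.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define DUMP_MAX_DEVS 4   /* hypothetical limit for this model */

enum dump_compress { DUMP_COMPRESS_NONE, DUMP_COMPRESS_RLE, DUMP_COMPRESS_GZIP };

/* One entry per dump device minor; what an ioctl() might fill in. */
struct dump_dev_config {
    bool configured;
    const char *method;           /* name of the dump method to use */
    enum dump_compress compress;  /* compression applied to pages */
    bool disruptive;              /* true if the system stops after the dump */
};

static struct dump_dev_config dump_devs[DUMP_MAX_DEVS];

/* Record the strategy for one minor; returns 0 on success, -1 on bad input. */
static int dump_dev_configure(unsigned int minor, const char *method,
                              enum dump_compress compress, bool disruptive)
{
    if (minor >= DUMP_MAX_DEVS || method == NULL)
        return -1;
    dump_devs[minor] = (struct dump_dev_config){
        .configured = true,
        .method = method,
        .compress = compress,
        .disruptive = disruptive,
    };
    return 0;
}

/* Look up a minor's configuration; NULL if it was never configured. */
static const struct dump_dev_config *dump_dev_lookup(unsigned int minor)
{
    if (minor >= DUMP_MAX_DEVS || !dump_devs[minor].configured)
        return NULL;
    return &dump_devs[minor];
}
```

With this model, the example above would be dump_dev_configure(0, "A", DUMP_COMPRESS_RLE, false) for dump0 and dump_dev_configure(1, "B", DUMP_COMPRESS_NONE, true) for dump1.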
As far as dump methods are concerned, we'd take dump_execute() and turn it into a dump method launch. So, if it gets called, it walks through the dump methods and determines which one to execute based on what is configured. The final step is to make each dump method a module rather than something statically built into the kernel. This eliminates the need for massive dump overhead in the kernel code -- each module can do its own thing with respect to method, compression, etc. It can even determine whether it wants the kernel to go silent or not (like what IBM's trying to do). So, 'lsmod' might show: [root@watereye /root]# lsmod Module Size Used by dump_rle 1104 0 (unused) dump_gzip 8906 0 (unused) dump_method_lkcd 89712 0 (unused) dump_method_mclx 22319 0 (unused) dump 17248 0 [dump_rle] So a dump method module, when loaded, can then be used by a dump device for crashing. The only complexity is writing a nice user utility that configures how crashing is performed, and loads all the right modules for you (or sets them up to load). Does this make any sense at all? :) What do you think? Anyone have any thoughts on this? --Matt > Each "dump devices" should be independently configurable about type of > compression and dump method ? > > --Masashige > > > Would this be the eventual goal? That way, everything is tunable to > > their own liking. I figured I'd ask, since if you're going to add in > > multiple dump devices, and we've gone to multiple compression types, > > you might as well go all the way and add dump methods as well. I > > don't know what the rest of the group thinks, but this could be > > very useful. > > > > I'd definitely like to get some feedback ... this is all doable, > > as long as the dump compression code is in 'lcrash', and the pages > > are dumped in a way that we can find the location in memory, this > > can work pretty sweet for everyone here. 
> > > > --Matt From owner-lkcd@oss.sgi.com Fri Sep 7 04:21:47 2001 Received: (from majordomo@localhost) by oss.sgi.com (8.11.2/8.11.3) id f87BLl710151 for lkcd-outgoing; Fri, 7 Sep 2001 04:21:47 -0700 Received: from fgwmail5.fujitsu.co.jp (fgwmail5.fujitsu.co.jp [192.51.44.35]) by oss.sgi.com (8.11.2/8.11.3) with SMTP id f87BLgd10145 for ; Fri, 7 Sep 2001 04:21:42 -0700 Received: from m4.gw.fujitsu.co.jp by fgwmail5.fujitsu.co.jp (8.9.3/3.7W-MX0108-Fujitsu Gateway) id UAA15737; Fri, 7 Sep 2001 20:21:20 +0900 (JST) (envelope-from m-kotani@pst.fujitsu.com) Received: from classic.aoi.pst.fujitsu.com by m4.gw.fujitsu.co.jp (8.9.3/3.7W-0108-Fujitsu Domain Master) id UAA11116; Fri, 7 Sep 2001 20:21:18 +0900 (JST) (envelope-from m-kotani@pst.fujitsu.com) Received: from doll (doll.aoi.pst.fujitsu.com [172.23.72.214]) by classic.aoi.pst.fujitsu.com (8.9.3/8.9.3) with SMTP id UAA27822; Fri, 7 Sep 2001 20:21:17 +0900 Message-ID: <007501c1378f$515a2dc0$d64817ac@aoi.pst.fujitsu.com> From: "Masashige Kotani" To: "Matt D. Robinson" Cc: "Howell, David P" , References: <10C8636AE359D4119118009027AE99870CE2F95B@FMSMSX34> <3B9563E8.9A432B7B@alacritech.com> <008401c136b6$05e00600$d64817ac@aoi.pst.fujitsu.com> <3B97CEE5.ED870202@alacritech.com> Subject: Re: multiple dump devices Date: Fri, 7 Sep 2001 20:22:02 +0900 MIME-Version: 1.0 Content-Type: text/plain; charset="iso-8859-1" Content-Transfer-Encoding: 7bit X-Priority: 3 X-MSMail-Priority: Normal X-Mailer: Microsoft Outlook Express 5.00.2615.200 X-MimeOLE: Produced By Microsoft MimeOLE V5.00.2615.200 Sender: owner-lkcd@oss.sgi.com Precedence: bulk > I guess what I mean is more like the following: > > Assume there are multiple dump devices in /dev: > > /dev/dump/dump0 (major 227, minor 0) > /dev/dump/dump1 (major 227, minor 1) > /dev/dump/dumpN (major 227, minor N) > > All of these are configurable via open() and ioctl(), and each can have > their own individual dumping strategies based on configuration. 
For
> example, let's say /dev/dump/dump0 is configured with dump method A,
> RLE compression, and is non-disruptive. /dev/dump/dump1 is configured
> with dump method B (let's say it's Mission Critical Linux's MCLX crash
> utility), no compression, and is disruptive. Etc., etc., etc.
>
> Each can be triggered via dprobes, or a system crash, SysRQ, or
> any other mechanism.
>
> As far as dump methods are concerned, we'd take dump_execute() and
> turn it into a dump method launch. So, if it gets called, it walks
> through the dump methods and determines which one to execute based
> on what is configured. The final step is to make each dump method
> a module rather than something statically built into the kernel.
> This eliminates the need for massive dump overhead in the kernel
> code -- each module can do its own thing with respect to method,
> compression, etc. It can even determine whether it wants the kernel
> to go silent or not (like what IBM's trying to do).
>
> So, 'lsmod' might show:
>
> [root@watereye /root]# lsmod
> Module                  Size  Used by
> dump_rle                1104   0  (unused)
> dump_gzip               8906   0  (unused)
> dump_method_lkcd       89712   0  (unused)
> dump_method_mclx       22319   0  (unused)
> dump                   17248   0  [dump_rle]

Are two or more of the method modules and the compression modules
used in a single dump?

>
> So a dump method module, when loaded, can then be used by a dump
> device for crashing. The only complexity is writing a nice user
> utility that configures how crashing is performed, and loads all
> the right modules for you (or sets them up to load).
>
> Does this make any sense at all? :) What do you think?
> Anyone have any thoughts on this?

I hadn't imagined such a design. Since the setup can simply be changed
to suit the environment without rebuilding the kernel,
I think it will make LKCD easier for users.

Thank you for the detailed explanation.
> > --Matt > --Masashige From owner-lkcd@oss.sgi.com Fri Sep 7 10:46:58 2001 Received: (from majordomo@localhost) by oss.sgi.com (8.11.2/8.11.3) id f87Hkwu15954 for lkcd-outgoing; Fri, 7 Sep 2001 10:46:58 -0700 Received: from smtp.alacritech.com (smtp.alacritech.com [209.10.208.82]) by oss.sgi.com (8.11.2/8.11.3) with SMTP id f87Hkkd15951 for ; Fri, 7 Sep 2001 10:46:46 -0700 Received: from alacritech.com (lambda.alacritech.com [10.1.1.32]) by smtp.alacritech.com (8.11.0/8.11.0) with ESMTP id f87HfsO02231; Fri, 7 Sep 2001 10:41:54 -0700 Message-ID: <3B9908AB.99C5DF81@alacritech.com> Date: Fri, 07 Sep 2001 10:49:31 -0700 From: "Matt D. Robinson" Organization: Alacritech, Inc. X-Mailer: Mozilla 4.76 [en] (X11; U; Linux 2.4.2-2 i686) X-Accept-Language: en MIME-Version: 1.0 To: Masashige Kotani CC: "Howell, David P" , lkcd@oss.sgi.com Subject: Re: multiple dump devices References: <10C8636AE359D4119118009027AE99870CE2F95B@FMSMSX34> <3B9563E8.9A432B7B@alacritech.com> <008401c136b6$05e00600$d64817ac@aoi.pst.fujitsu.com> <3B97CEE5.ED870202@alacritech.com> <007501c1378f$515a2dc0$d64817ac@aoi.pst.fujitsu.com> Content-Type: text/plain; charset=us-ascii Content-Transfer-Encoding: 7bit Sender: owner-lkcd@oss.sgi.com Precedence: bulk Masashige Kotani wrote: > > > I guess what I mean is more like the following: > > > > Assume there are multiple dump devices in /dev: > > > > /dev/dump/dump0 (major 227, minor 0) > > /dev/dump/dump1 (major 227, minor 1) > > /dev/dump/dumpN (major 227, minor N) > > > > All of these are configurable via open() and ioctl(), and each can have > > their own individual dumping strategies based on configuration. For > > example, let's say /dev/dump/dump0 is configured with dump method A, > > RLE compression, and is non-disruptive. /dev/dump/dump1 is configured > > with dump method B (let's say it's Mission Critical Linux's MCLX crash > > utility), no compression, and is disruptive. Etc., etc., etc. 
> > > > Each can be triggered via dprobes, or a system crash, SysRQ, or > > any other mechanism. > > > > As far as dump methods are concerned, we'd take dump_execute() and > > turn it into a dump method launch. So, if it gets called, it walks > > through the dump methods and determines which one to execute based > > on what is configured. The final step is to make each dump method > > a module rather than something statically built into the kernel. > > This eliminates the need for massive dump overhead in the kernel > > code -- each module can do its own thing with respect to method, > > compression, etc. It can even determine whether it wants the kernel > > to go silent or not (like what IBM's trying to do). > > > > So, 'lsmod' might show: > > > > [root@watereye /root]# lsmod > > Module Size Used by > > dump_rle 1104 0 (unused) > > dump_gzip 8906 0 (unused) > > dump_method_lkcd 89712 0 (unused) > > dump_method_mclx 22319 0 (unused) > > dump 17248 0 [dump_rle] > > Are the method modules and the compress modules used > two or more in once dumping? Yes -- as they can be launched under different circumstances. Vamsi at IBM has come up with what I think to be a pretty cool way to get us to dump methods (using a kernel thread). I hope he can send off some comments on this. > > So a dump method module, when loaded, can then be used by a dump > > device for crashing. The only complexity is writing a nice user > > utility that configures how crashing is performed, and loads all > > the right modules for you (or sets them up to load). > > > > Does this make any sense at all? :) What do you think? > > Anyone have any thoughts on this? > > I didn't imagine such construction. Since it comes to be simply changed > according to environment even if without rebuilding a kernel, > I think that it becomes easy to use LKCD for user. Definitely. Then it becomes a matter of coming up with user level software to make dump configuration easy and clear. 
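The "dump method launch" quoted above, where dump_execute() walks the registered methods and runs whichever one is configured, could look roughly like the following user-space model. It is a sketch under assumptions, not the LKCD implementation: dump_register_method, dump_unregister_method, dump_execute_model, and the two stand-in methods are hypothetical names, and registration here only imitates what loading or unloading a method module might do.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define MAX_METHODS 8   /* hypothetical table size */

/* What each dump method module would hand to the core dump driver. */
struct dump_method {
    const char *name;
    int (*execute)(const char *panic_str);
};

static const struct dump_method *methods[MAX_METHODS];

/* Imitates module load: put the method in the first free slot. */
static int dump_register_method(const struct dump_method *m)
{
    assert(m != NULL && m->name != NULL && m->execute != NULL);
    for (size_t i = 0; i < MAX_METHODS; i++) {
        if (methods[i] == NULL) {
            methods[i] = m;
            return 0;
        }
    }
    return -1;  /* table full */
}

/* Imitates module unload: clear the slot holding this method. */
static void dump_unregister_method(const struct dump_method *m)
{
    for (size_t i = 0; i < MAX_METHODS; i++)
        if (methods[i] == m)
            methods[i] = NULL;
}

/* Walk the registered methods and launch the one named in the
 * configuration; -1 means no such method is currently loaded. */
static int dump_execute_model(const char *configured, const char *panic_str)
{
    for (size_t i = 0; i < MAX_METHODS; i++)
        if (methods[i] != NULL && strcmp(methods[i]->name, configured) == 0)
            return methods[i]->execute(panic_str);
    return -1;
}

/* Two stand-in methods; the return values just identify which one ran. */
static int lkcd_execute(const char *s) { (void)s; return 1; }
static int mclx_execute(const char *s) { (void)s; return 2; }
static const struct dump_method dump_method_lkcd = { "lkcd", lkcd_execute };
static const struct dump_method dump_method_mclx = { "mclx", mclx_execute };
```

Keeping the core table small and letting each method own its behaviour is what allows the configuration to change without rebuilding the kernel, which is the point made in this exchange.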
> Thank you for detailed explanation. Thanks, Masashige-san. If anyone else has comments, feel free to let me know if this is way off base or not. --Matt > > > > --Matt > > > > --Masashige From owner-lkcd@oss.sgi.com Tue Sep 11 01:32:38 2001 Received: (from majordomo@localhost) by oss.sgi.com (8.11.2/8.11.3) id f8B8Wcj11399 for lkcd-outgoing; Tue, 11 Sep 2001 01:32:38 -0700 Received: from ausmtp01.au.ibm.com (ausmtp01.au.ibm.COM [202.135.136.97]) by oss.sgi.com (8.11.2/8.11.3) with SMTP id f8B8Vwd11393 for ; Tue, 11 Sep 2001 01:32:34 -0700 Received: from f02n16e.au.ibm.com by ausmtp01.au.ibm.com (IBM AP 2.0) with ESMTP id f8B8RcK288526; Tue, 11 Sep 2001 18:27:39 +1000 Received: from d73mta01.au.ibm.com (f06n01s [9.185.166.65]) by f02n16e.au.ibm.com (8.11.1m3/NCO v4.97.1) with SMTP id f8B8V4d97786; Tue, 11 Sep 2001 18:31:05 +1000 Received: by d73mta01.au.ibm.com(Lotus SMTP MTA v4.6.5 (863.2 5-20-1999)) id CA256AC4.002ECE1C ; Tue, 11 Sep 2001 18:31:14 +1000 X-Lotus-FromDomain: IBMIN@IBMAU From: vamsi_krishna@in.ibm.com To: yakker@alacritech.com cc: lkcd@oss.sgi.com Message-ID: Date: Tue, 11 Sep 2001 14:14:19 +0530 Subject: issues with dump compiled as a module Mime-Version: 1.0 Content-type: text/plain; charset=us-ascii Content-Disposition: inline Sender: owner-lkcd@oss.sgi.com Precedence: bulk Matt, Even when we compile LKCD as a module, arch/i386/kernel/dump.c gets built into the kernel, which does not seem to be right. Problems: - arch/i386/kernel/dump.c requires global variables in drivers/block/dump.c when __dump_silence_system etc are implemented. However, this cannot compile when LKCD is a module. - it should not be linked into the kernel, wastes kernel memory when dump.o is not loaded. We should probably move arch/i386/kernel/dump.c into a different location and link it to drivers/block/dump.o. Comments? Regards.. Vamsi. Vamsi Krishna S. Linux Technology Center, IBM Software Lab, Bangalore. 
Ph: +91 80 5262355 Extn: 3959
Internet: vamsi_krishna@in.ibm.com

From owner-lkcd@oss.sgi.com Tue Sep 11 10:29:16 2001
Received: (from majordomo@localhost) by oss.sgi.com (8.11.2/8.11.3) id f8BHTGM20918 for lkcd-outgoing; Tue, 11 Sep 2001 10:29:16 -0700
Received: from nixpbe.pdb.sbs.de (energy.pdb.sbs.de [192.109.2.19]) by oss.sgi.com (8.11.2/8.11.3) with SMTP id f8BHTDd20914 for ; Tue, 11 Sep 2001 10:29:14 -0700
Received: from trulli.pdb.fsc.net (ThisAddressDoesNotExist [172.25.96.20] (may be forged)) by nixpbe.pdb.sbs.de (8.11.2/8.11.2) with ESMTP id f8BHT7E07342 for ; Tue, 11 Sep 2001 19:29:07 +0200
Received: from biker.pdb.fsc.net (biker.pdb.fsc.net [172.25.187.106]) by trulli.pdb.fsc.net (8.9.3/8.9.3) with ESMTP id TAA19384 for ; Tue, 11 Sep 2001 19:29:07 +0200
Received: from localhost (martin@localhost) by biker.pdb.fsc.net (8.11.0/8.11.0) with ESMTP id f8BHQwQ24276 for ; Tue, 11 Sep 2001 19:26:58 +0200
Date: Tue, 11 Sep 2001 19:26:57 +0200 (CEST)
From: Martin Wilck
To: LKCD mailing list
Subject: LKCD newbie questions
Message-ID:
MIME-Version: 1.0
Content-Type: TEXT/PLAIN; charset=US-ASCII
Sender: owner-lkcd@oss.sgi.com
Precedence: bulk

Hi,

I have just started testing LKCD.

Currently there are patches available only for a limited number of
kernel versions. Are there patches for other kernel
releases (Linus) somewhere? Is there any chance of applying the existing
patches to vendor (SuSE, RedHat) kernels without running into big
trouble?

I understand from a glance over the mailing list archives that a major new
release is due soon - would you recommend that I wait for it before putting
a lot of work into patching different kernels?

Regards, Martin -- Martin Wilck Phone: +49 5251 8 15113 Fujitsu Siemens Computers Fax: +49 5251 8 20409 Heinz-Nixdorf-Ring 1 mailto:Martin.Wilck@Fujitsu-Siemens.com D-33106 Paderborn http://www.fujitsu-siemens.com/primergy From owner-lkcd@oss.sgi.com Wed Sep 12 00:08:04 2001 Received: (from majordomo@localhost) by oss.sgi.com (8.11.2/8.11.3) id f8C784o01189 for lkcd-outgoing; Wed, 12 Sep 2001 00:08:04 -0700 Received: from nakedeye.aparity.com (w032.z064001165.sjc-ca.dsl.cnc.net [64.1.165.32]) by oss.sgi.com (8.11.2/8.11.3) with SMTP id f8C781e01186 for ; Wed, 12 Sep 2001 00:08:01 -0700 Received: from alacritech.com (w032.z064001165.sjc-ca.dsl.cnc.net [64.1.165.32]) by nakedeye.aparity.com (8.11.2/8.11.2) with ESMTP id f8C7C1J05515; Wed, 12 Sep 2001 00:12:01 -0700 Message-ID: <3B9F0911.742959D5@alacritech.com> Date: Wed, 12 Sep 2001 00:04:49 -0700 From: "Matt D. Robinson" X-Mailer: Mozilla 4.75 [en] (Win98; U) X-Accept-Language: en MIME-Version: 1.0 To: vamsi_krishna@in.ibm.com CC: lkcd@oss.sgi.com Subject: Re: issues with dump compiled as a module References: Content-Type: text/plain; charset=us-ascii Content-Transfer-Encoding: 7bit Sender: owner-lkcd@oss.sgi.com Precedence: bulk vamsi_krishna@in.ibm.com wrote: > > Matt, > > Even when we compile LKCD as a module, arch/i386/kernel/dump.c gets built > into the kernel, which does not seem to be right. This is actually correct (for now) -- we can change it, but if we do, I'd recommend something like drivers/dump/ instead of arch/i386/kernel. We were in arch//kernel for a few reasons: - built with architecture-specific flags; - normally statically built into the kernel; - uses fields in arch/i386/kernel/*.[ch] > Problems: > - arch/i386/kernel/dump.c requires global variables in drivers/block/dump.c > when __dump_silence_system etc are implemented. However, this cannot > compile when LKCD is a module. I didn't get a compile error, and I've been building strictly as a module for now. Are you CONFIG_SMP? 
> - it should not be linked into the kernel, wastes kernel memory when dump.o > is not loaded. How much memory? > We should probably move arch/i386/kernel/dump.c into a different location > and link it to drivers/block/dump.o. I don't mind moving it to drivers/dump/ and moving drivers/block/dump.c to a new location. Thoughts? > Comments? Let me know what you think of the above, and if you want, go ahead and move them. Note, you'll need to change the Makefiles as well. Thanks, Vamsi. --Matt > Regards.. Vamsi. > > Vamsi Krishna S. > Linux Technology Center, > IBM Software Lab, Bangalore. > Ph: +91 80 5262355 Extn: 3959 > Internet: vamsi_krishna@in.ibm.com From owner-lkcd@oss.sgi.com Wed Sep 12 00:10:00 2001 Received: (from majordomo@localhost) by oss.sgi.com (8.11.2/8.11.3) id f8C7A0Q01259 for lkcd-outgoing; Wed, 12 Sep 2001 00:10:00 -0700 Received: from nakedeye.aparity.com (w032.z064001165.sjc-ca.dsl.cnc.net [64.1.165.32]) by oss.sgi.com (8.11.2/8.11.3) with SMTP id f8C79ve01244 for ; Wed, 12 Sep 2001 00:09:57 -0700 Received: from localhost (yakker@localhost) by nakedeye.aparity.com (8.11.2/8.11.2) with ESMTP id f8C7ETp05520; Wed, 12 Sep 2001 00:14:30 -0700 Date: Wed, 12 Sep 2001 00:14:29 -0700 (PDT) From: "Matt D. Robinson" To: Martin Wilck cc: LKCD mailing list Subject: Re: LKCD newbie questions In-Reply-To: Message-ID: MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII Sender: owner-lkcd@oss.sgi.com Precedence: bulk Hi, Martin. Most of the patches apply somewhat cleanly. There are always offset issues if you're trying to use a previously patched kernel. Our development tree is off of the base kernels sans patches. As far as a major release, 4.0 is expected at the end of this week. I would say wait until that release comes out, as any patches after that release will be based on 4.0. --Matt P.S. For those of you in NY, I hope all is well. On Tue, 11 Sep 2001, Martin Wilck wrote: |>Hi, |> |>I have just started testing LKCD. 
|> |>Currently there are patches only for a limited number of |>kernel versions available. Are there patches for other kernel |>releases (Linus) somewhere? Is there any chance to apply the existing |>patches to vendor (SuSE, RedHat) kernels without running into big |>trouble? |> |>I understand from a glance over the mailing list archives that a major new |>release is due soon - do you recommend that I wait for it before putting |>a lot of work into patching different kernels? |> |>Regards, |>Martin |> |>-- |>Martin Wilck Phone: +49 5251 8 15113 |>Fujitsu Siemens Computers Fax: +49 5251 8 20409 |>Heinz-Nixdorf-Ring 1 mailto:Martin.Wilck@Fujitsu-Siemens.com |>D-33106 Paderborn http://www.fujitsu-siemens.com/primergy From owner-lkcd@oss.sgi.com Wed Sep 12 06:07:17 2001 Received: (from majordomo@localhost) by oss.sgi.com (8.11.2/8.11.3) id f8CD7HK08647 for lkcd-outgoing; Wed, 12 Sep 2001 06:07:17 -0700 Received: from ausmtp02.au.ibm.com (ausmtp02.au.ibm.COM [202.135.136.105]) by oss.sgi.com (8.11.2/8.11.3) with SMTP id f8CD7Be08638 for ; Wed, 12 Sep 2001 06:07:11 -0700 Received: from f02n16e.au.ibm.com by ausmtp02.au.ibm.com (IBM AP 2.0) with ESMTP id f8CD4Xc372666; Wed, 12 Sep 2001 23:04:33 +1000 Received: from d73mta01.au.ibm.com (f06n01s [9.185.166.65]) by f02n16e.au.ibm.com (8.11.1m3/NCO v4.97.1) with SMTP id f8CD6mM88666; Wed, 12 Sep 2001 23:06:48 +1000 Received: by d73mta01.au.ibm.com(Lotus SMTP MTA v4.6.5 (863.2 5-20-1999)) id CA256AC5.00480C14 ; Wed, 12 Sep 2001 23:06:56 +1000 X-Lotus-FromDomain: IBMIN@IBMAU From: vamsi_krishna@in.ibm.com To: yakker@alacritech.com cc: lkcd@oss.sgi.com Message-ID: Date: Wed, 12 Sep 2001 19:10:19 +0530 Subject: dump_configure_header called twice ?! 
Mime-Version: 1.0 Content-type: text/plain; charset=us-ascii Content-Disposition: inline Sender: owner-lkcd@oss.sgi.com Precedence: bulk Matt, I am trying to figure out why dump_configure_header is called twice, once from dump_execute_memdump and then from dump_execute (after the call to dump_execute_memdump). It is not clear to me what could have changed in the meantime. Another question is regarding the x86-specific code in dump_execute to capture esp and eip one more time at the end. AFAIU, if we don't do this, the backtrace should show only until the last time dump_configure_header is called, as the comment there says. I would think calling dump_configure_header once, at the top of dump_execute should do. Do you see any problem with this? Thanks, Vamsi. Vamsi Krishna S. Linux Technology Center, IBM Software Lab, Bangalore. Ph: +91 80 5262355 Extn: 3959 Internet: vamsi_krishna@in.ibm.com From owner-lkcd@oss.sgi.com Wed Sep 12 09:55:34 2001 Received: (from majordomo@localhost) by oss.sgi.com (8.11.2/8.11.3) id f8CGtY513627 for lkcd-outgoing; Wed, 12 Sep 2001 09:55:34 -0700 Received: from ganymede.or.intel.com (jffdns01.or.intel.com [134.134.248.3]) by oss.sgi.com (8.11.2/8.11.3) with SMTP id f8CGtQe13622 for ; Wed, 12 Sep 2001 09:55:31 -0700 Received: from SMTP (orsmsxvs02-1.jf.intel.com [192.168.65.201]) by ganymede.or.intel.com (8.9.1a+p1/8.9.1/d: relay.m4,v 1.42 2001/09/04 16:24:19 root Exp $) with SMTP id QAA03908; Wed, 12 Sep 2001 16:55:14 GMT Received: from orsmsx26.jf.intel.com ([192.168.70.26]) by 192.168.70.201 (Norton AntiVirus for Internet Email Gateways 1.0) ; Wed, 12 Sep 2001 16:55:13 0000 (GMT) Received: by orsmsx26.jf.intel.com with Internet Mail Service (5.5.2653.19) id ; Wed, 12 Sep 2001 09:55:12 -0700 Message-ID: <68843F808BE5D311AC6100A0C9C57866484882@fmsmsx50.fm.intel.com> From: "Schaal, Richard" To: "'Matt D. Robinson'" Cc: "'lkcd@oss.sgi.com'" Subject: RE: LKCD + KDB ? 
Date: Wed, 12 Sep 2001 09:55:06 -0700 MIME-Version: 1.0 X-Mailer: Internet Mail Service (5.5.2653.19) Content-Type: text/plain; charset="iso-8859-1" Sender: owner-lkcd@oss.sgi.com Precedence: bulk Hi Matt, I picked up the current CVS tree from Sourceforge. In the 2.4 tree, I find several files that would appear to come from the linux kernel that have modifications in them. If I wanted to derive a kernel patch that I could use to apply to an arbitrary kernel, which kernel version would I use as a base to diff against? I'm having some issues with MP system dumping - I'm getting some Oops type issues in the ext2 file system in a stress test - the system tries to dump, but then multiple processors get watchdog timeouts and hose the dump - I'm hoping that the later code from the CVS tree will begin to address this issue so I can get to work on the "real" problem. Kudos for the nifty "roll your own" functions for analysis of the dump file! I've managed to hack together a short function to work on the structures I build up with my debug code. Very slick capability! Thanks, Richard -- Richard.Schaal@intel.com Intel Corporation Ph: (408)765-1579 Richard Schaal Mail Stop SC12-308 3600 Juliette Lane "I can type faster than I think!" 
Santa Clara, CA 95052 From owner-lkcd@oss.sgi.com Wed Sep 12 10:10:17 2001 Received: (from majordomo@localhost) by oss.sgi.com (8.11.2/8.11.3) id f8CHAHg14166 for lkcd-outgoing; Wed, 12 Sep 2001 10:10:17 -0700 Received: from zok.sgi.com (zok.sgi.com [204.94.215.101]) by oss.sgi.com (8.11.2/8.11.3) with SMTP id f8CHAEe14163 for ; Wed, 12 Sep 2001 10:10:14 -0700 Received: from rock.csd.sgi.com (fddi-rock.csd.sgi.com [130.62.69.10]) by zok.sgi.com (8.11.4/8.11.4/linux-outbound_gateway-1.0) with ESMTP id f8CHA9J03807 for ; Wed, 12 Sep 2001 10:10:09 -0700 Received: from ist.csd.sgi.com (ist.csd.sgi.com [130.62.150.28]) by rock.csd.sgi.com (SGI-8.9.3/8.9.3) with ESMTP id KAA57978; Wed, 12 Sep 2001 10:09:59 -0700 (PDT) Received: from sgi.com by ist.csd.sgi.com via ESMTP (980427.SGI.8.8.8/911001.SGI) id KAA15981; Wed, 12 Sep 2001 10:09:41 -0700 (PDT) Message-ID: <3B9F9949.DF6A6FFA@sgi.com> Date: Wed, 12 Sep 2001 13:20:09 -0400 From: Luc Chouinard Organization: SGI X-Mailer: Mozilla 4.77C-SGI [en] (X11; I; IRIX 6.5-ALPHA-1287133520 IP32) X-Accept-Language: en MIME-Version: 1.0 To: "Schaal, Richard" CC: "'Matt D. Robinson'" , "'lkcd@oss.sgi.com'" Subject: Re: LKCD + KDB ? References: <68843F808BE5D311AC6100A0C9C57866484882@fmsmsx50.fm.intel.com> Content-Type: text/plain; charset=us-ascii Content-Transfer-Encoding: 7bit Sender: owner-lkcd@oss.sgi.com Precedence: bulk "Schaal, Richard" wrote: > > Kudos for the nifty "roll your own" functions for analysis of the dump file! > I've managed to > hack together a short function to work on the structures I build up with my > debug code. Very slick capability! If you're talking about the scripting capability (sial), then thanks! I've been keeping a low profile because I didn't have the bandwidth to handle potential bug reports from the people on the list :) I'm moving to a new company shortly and plan to put more time into this than before as I will probably get involved in porting lcrash to their environment. 
Please feel free to report any problems or shortfalls of the interpreter. > > Thanks, > Richard > > -- > Richard.Schaal@intel.com Intel Corporation > Ph: (408)765-1579 Richard Schaal > Mail Stop SC12-308 > 3600 Juliette Lane > "I can type faster than I think!" Santa Clara, CA 95052 -- Luc From owner-lkcd@oss.sgi.com Wed Sep 12 10:36:51 2001 Received: (from majordomo@localhost) by oss.sgi.com (8.11.2/8.11.3) id f8CHapw14566 for lkcd-outgoing; Wed, 12 Sep 2001 10:36:51 -0700 Received: from smtp.alacritech.com (smtp.alacritech.com [209.10.208.82]) by oss.sgi.com (8.11.2/8.11.3) with SMTP id f8CHaje14562 for ; Wed, 12 Sep 2001 10:36:45 -0700 Received: from alacritech.com (lambda.alacritech.com [10.1.1.32]) by smtp.alacritech.com (8.11.0/8.11.0) with ESMTP id f8CHVpO03459; Wed, 12 Sep 2001 10:31:51 -0700 Message-ID: <3B9F9DDB.6D3C9048@alacritech.com> Date: Wed, 12 Sep 2001 10:39:39 -0700 From: "Matt D. Robinson" Organization: Alacritech, Inc. X-Mailer: Mozilla 4.76 [en] (X11; U; Linux 2.4.2-2 i686) X-Accept-Language: en MIME-Version: 1.0 To: vamsi_krishna@in.ibm.com CC: lkcd@oss.sgi.com Subject: Re: issues with dump compiled as a module References: Content-Type: text/plain; charset=us-ascii Content-Transfer-Encoding: 7bit Sender: owner-lkcd@oss.sgi.com Precedence: bulk vamsi_krishna@in.ibm.com wrote: > > Admittedly, not a significant amount of memory is wasted in the current > scheme of always linking arch-specific code into the kernel. and there is > no compilation problem as of now. However, the code that we will eventually > add to __dump_silence{resume}_system will need access to dump_flags and > possibly some other global variables of the dump driver, which won't be > available in the kernel if dump.o is built as a module. > > I agree with drivers/dump/dump.c, drivers/dump/i386/dump.c etc. I have done > something very similar recently when modifying our dprobes utility to work > as a module. I will look at doing this if there are no other comments. 
None at this time. Go ahead and move it. BTW, if there are variables that will remain static at all times (such as dump_function_ptr), make sure they are declared in one file. If this means moving dump_function_ptr to init/main.c and declaring everything there, or moving them all to kernel/panic.c, that's fine. This means I shouldn't touch the tree until you're done. Let me know when that is so I can update the files in their new location(s). --Matt > Regards.. Vamsi. > > Vamsi Krishna S. > Linux Technology Center, > IBM Software Lab, Bangalore. > Ph: +91 80 5262355 Extn: 3959 > Internet: vamsi_krishna@in.ibm.com > > "Matt D. Robinson" on 09/12/2001 12:34:49 PM > > Please respond to "Matt D. Robinson" > > To: S Vamsikrishna/India/IBM@IBMIN > cc: lkcd@oss.sgi.com > Subject: Re: issues with dump compiled as a module > > vamsi_krishna@in.ibm.com wrote: > > > > Matt, > > > > Even when we compile LKCD as a module, arch/i386/kernel/dump.c gets built > > into the kernel, which does not seem to be right. > > This is actually correct (for now) -- we can change it, but if we do, > I'd recommend something like drivers/dump/ instead of > arch/i386/kernel. > We were in arch//kernel for a few reasons: > > - built with architecture-specific flags; > - normally statically built into the kernel; > - uses fields in arch/i386/kernel/*.[ch] > > > Problems: > > - arch/i386/kernel/dump.c requires global variables in > drivers/block/dump.c > > when __dump_silence_system etc are implemented. However, this cannot > > compile when LKCD is a module. > > I didn't get a compile error, and I've been building strictly as a > module for now. Are you CONFIG_SMP? > > > - it should not be linked into the kernel, wastes kernel memory when > dump.o > > is not loaded. > > How much memory? > > > We should probably move arch/i386/kernel/dump.c into a different location > > and link it to drivers/block/dump.o. 
> > I don't mind moving it to drivers/dump/ and moving > drivers/block/dump.c > to a new location. Thoughts? > > > Comments? > > Let me know what you think of the above, and if you want, go ahead and > move them. Note, you'll need to change the Makefiles as well. > > Thanks, Vamsi. > > --Matt > > > Regards.. Vamsi. > > > > Vamsi Krishna S. > > Linux Technology Center, > > IBM Software Lab, Bangalore. > > Ph: +91 80 5262355 Extn: 3959 > > Internet: vamsi_krishna@in.ibm.com From owner-lkcd@oss.sgi.com Wed Sep 12 10:42:59 2001 Received: (from majordomo@localhost) by oss.sgi.com (8.11.2/8.11.3) id f8CHgxb14684 for lkcd-outgoing; Wed, 12 Sep 2001 10:42:59 -0700 Received: from smtp.alacritech.com (smtp.alacritech.com [209.10.208.82]) by oss.sgi.com (8.11.2/8.11.3) with SMTP id f8CHgte14681 for ; Wed, 12 Sep 2001 10:42:56 -0700 Received: from alacritech.com (lambda.alacritech.com [10.1.1.32]) by smtp.alacritech.com (8.11.0/8.11.0) with ESMTP id f8CHcCO03663; Wed, 12 Sep 2001 10:38:12 -0700 Message-ID: <3B9F9F57.B28820B8@alacritech.com> Date: Wed, 12 Sep 2001 10:45:59 -0700 From: "Matt D. Robinson" Organization: Alacritech, Inc. X-Mailer: Mozilla 4.76 [en] (X11; U; Linux 2.4.2-2 i686) X-Accept-Language: en MIME-Version: 1.0 To: vamsi_krishna@in.ibm.com CC: lkcd@oss.sgi.com Subject: Re: dump_configure_header called twice ?! References: Content-Type: text/plain; charset=us-ascii Content-Transfer-Encoding: 7bit Sender: owner-lkcd@oss.sgi.com Precedence: bulk vamsi_krishna@in.ibm.com wrote: > > Matt, > > I am trying to figure out why dump_configure_header is called twice, once > from dump_execute_memdump and then from dump_execute (after the call to > dump_execute_memdump). It is not clear to me what could have changed in the > meantime. > > Another question is regarding the x86 specific code in dump_execute to > capture esp and eip one more time at the end. 
AFAIU, if we don't do this, > the backtrace should show only until the last time dump_configure_header > is called, as the comment there says. I would think calling > dump_configure_header once, at the top of dump_execute should do. Do you > see any problem with this? Both _very_ good questions. As it turns out, you _have_ to do exactly what's indicated in the comments. The reason for the first call to dump_configure_header() was to save those eip/esp values, and then we call dump_write_header() to write them out. Unfortunately, though, calling dump_write_header() in some cases blows away the ability to perform a backtrace (stack trace dump) of the failing process, because dump_write_header() overwrites part of the stack that dump_configure_header() was once on. So, you snapshot the registers and then write out the dump header. We ran into this problem way back in 2.X, and had to fix it. Some people were saying that their dumps weren't complete. When looking at the dump with 'lcrash', we saw that the stack trace was "corrupt" due to this problem, specifically with the failing task. Here's a better way around this (I wonder why I didn't do it this way to begin with). Pass an integer argument into dump_configure_header(), which passes the value down to the lower __dump_configure_header() calls. This in turn will look at the flag (integer) and determine if the lower call should call back up into dump_write_header() from below. That way you don't have to maintain the register save in that code. Sound about right? If so, I'll check it in later today. --Matt > Thanks, Vamsi. > > Vamsi Krishna S. > Linux Technology Center, > IBM Software Lab, Bangalore. 
> Ph: +91 80 5262355 Extn: 3959 > Internet: vamsi_krishna@in.ibm.com From owner-lkcd@oss.sgi.com Wed Sep 12 13:34:45 2001 Received: (from majordomo@localhost) by oss.sgi.com (8.11.2/8.11.3) id f8CKYjL18658 for lkcd-outgoing; Wed, 12 Sep 2001 13:34:45 -0700 Received: from d12lmsgate.de.ibm.com (d12lmsgate.de.ibm.com [195.212.91.199]) by oss.sgi.com (8.11.2/8.11.3) with SMTP id f8CKYae18646 for ; Wed, 12 Sep 2001 13:34:36 -0700 Received: from d12relay01.de.ibm.com (d12relay01.de.ibm.com [9.165.215.22]) by d12lmsgate.de.ibm.com (1.0.0) with ESMTP id WAA82960; Wed, 12 Sep 2001 22:34:22 +0200 Received: from d12ml033.de.ibm.com (d12ml033_cs0 [9.165.223.11]) by d12relay01.de.ibm.com (8.11.1m3/NCO v4.97.1) with ESMTP id f8CKYKe159858; Wed, 12 Sep 2001 22:34:20 +0200 Importance: Normal Subject: lkcd checkin, s390x support To: lkcd@oss.sgi.com Cc: "Matt D. Robinson" , "Luc Chouinard" , "Tom Morano" , "Michael Holzheu" X-Mailer: Lotus Notes Release 5.0.4a July 24, 2000 Message-ID: From: "Andreas Herrmann" Date: Wed, 12 Sep 2001 22:31:41 +0200 X-MIMETrack: Serialize by Router on D12ML033/12/M/IBM(Release 5.0.8 |June 18, 2001) at 12/09/2001 22:31:44 MIME-Version: 1.0 Content-type: text/plain; charset=us-ascii Sender: owner-lkcd@oss.sgi.com Precedence: bulk Hi, today I have checked in several files. Most importantly, we added lcrash support for a new platform. It is s390x, which is 64 bit. From our side it is ok now to create the new lkcd release. Besides that we did some further changes and bug fixes. Here is an overview of what we have done: Consolidated header files. 
libklib/include/klib.h libklib/include/asm-alpha/kl_error.h (removed) libklib/include/asm-ia64/kl_error.h (removed) libklib/include/asm-i386/kl_error.h (removed) libklib/include/asm-s390/kl_error.h (removed) libklib/include/kl_error.h (added) libklib/include/asm-alpha/kl_mem.h libklib/include/asm-ia64/kl_mem.h libklib/include/asm-i386/kl_mem.h libklib/include/asm-s390/kl_mem.h libklib/include/kl_mem.h (added for common stuff) libklib/include/asm-alpha/kl_stabs.h (removed) libklib/include/asm-ia64/kl_stabs.h (removed) libklib/include/asm-i386/kl_stabs.h (removed) libklib/include/asm-s390/kl_stabs.h (removed) libklib/include/kl_stabs.h (added) Included support for type information of so-called register variables (N_RSYM in stabs format). libklib/kl_stabs.c Included support for Intel's PSE (Page Size Extension) in memory mapping: libklib/arch/alpha/kl_page.c libklib/arch/ia64/kl_page.c libklib/arch/i386/kl_page.c libklib/arch/s390/kl_page.c libklib/kl_mem.c libklib/kl_memory.c Changed vtop behaviour: switched on mem mapping for all virtual addresses lcrash/cmds/cmd_vtop.c s390 specific changes: lcrash/arch/s390/lib/s390-report.c lcrash/arch/s390/lib/s390-util.c lcrash/arch/s390/lib/trace.c lcrash/include/arch-s390/trace.h libklib/arch/s390/kl_s390_util.c Removed unnecessary include directives to kernel header files. Especially the handling of struct utsname was changed. lcrash/arch/s390/cmds/cmd_s390dbf.c lcrash/cmds/cmd_stat.c lcrash/include/lcrash.h lcrash/util.c libklib/include/dump.h BUGFIXES: - Inserted setjmp before first call to longjmp. lcrash/main.c - "dis -F -w file", redirection of output did not work correctly. lcrash/arch/alpha/lib/dis.c lcrash/arch/ia64/lib/dis.c lcrash/arch/s390/lib/dis.c - Setup type info immediately after reading namelist. libklib/kl_nmlist.c - Avoid printing of control characters. libklib/kl_print.c - Kernel without module support can be analyzed again. 
lcrash/cmds/cmd_symtab.c lcrash/cmds/cmd_module.c libklib/kl_util.c libklib/klib.c - Fixed the computation of memory size in livedump. lcrash/vmdump.c - s390x port: (a whole bunch of files) Regards, Andreas -- Linux for eServer Development Tel : +49-7031-16-4640 Notes mail : Andreas Herrmann/GERMANY/IBM@IBMDE email : aherrman@de.ibm.com From owner-lkcd@oss.sgi.com Wed Sep 12 15:16:42 2001 Received: (from majordomo@localhost) by oss.sgi.com (8.11.2/8.11.3) id f8CMGgE21215 for lkcd-outgoing; Wed, 12 Sep 2001 15:16:42 -0700 Received: from mail.ocs.com.au (ppp0.ocs.com.au [203.34.97.3]) by oss.sgi.com (8.11.2/8.11.3) with SMTP id f8CMGde21212 for ; Wed, 12 Sep 2001 15:16:39 -0700 Received: (qmail 29069 invoked from network); 12 Sep 2001 22:16:36 -0000 Received: from ocs3.intra.ocs.com.au (192.168.255.3) by mail.ocs.com.au with SMTP; 12 Sep 2001 22:16:36 -0000 Received: by ocs3.intra.ocs.com.au (Postfix, from userid 16331) id A1C80300095; Thu, 13 Sep 2001 08:15:45 +1000 (EST) Received: from ocs3.intra.ocs.com.au (localhost [127.0.0.1]) by ocs3.intra.ocs.com.au (Postfix) with ESMTP id 94D62AB; Thu, 13 Sep 2001 08:15:45 +1000 (EST) X-Mailer: exmh version 2.2 06/23/2000 with nmh-1.0.4 From: Keith Owens To: "Matt D. Robinson" Cc: vamsi_krishna@in.ibm.com, lkcd@oss.sgi.com Subject: Re: issues with dump compiled as a module In-reply-to: Your message of "Wed, 12 Sep 2001 10:39:39 MST." <3B9F9DDB.6D3C9048@alacritech.com> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Date: Thu, 13 Sep 2001 08:15:40 +1000 Message-ID: <4843.1000332940@ocs3.intra.ocs.com.au> Sender: owner-lkcd@oss.sgi.com Precedence: bulk On Wed, 12 Sep 2001 10:39:39 -0700, "Matt D. Robinson" wrote: >None at this time. Go ahead and move it. BTW, if there are variables >that will remain static at all times (such as dump_function_ptr), make >sure they are declared in one file. 
If this means moving dump_function_ptr >to init/main.c and declaring everything there, or moving them all to >kernel/panic.c, that's fine. It is better to put dump related static variables in their own file with a globally unique name and export the variables from that file. Patching an existing file is a bad idea, it results in overlapping patches for different add on components. Separate files makes for a cleaner patch and a cleaner build. From owner-lkcd@oss.sgi.com Wed Sep 12 15:43:51 2001 Received: (from majordomo@localhost) by oss.sgi.com (8.11.2/8.11.3) id f8CMhpM21894 for lkcd-outgoing; Wed, 12 Sep 2001 15:43:51 -0700 Received: from smtp.alacritech.com (smtp.alacritech.com [209.10.208.82]) by oss.sgi.com (8.11.2/8.11.3) with SMTP id f8CMhne21891 for ; Wed, 12 Sep 2001 15:43:49 -0700 Received: from alacritech.com (lambda.alacritech.com [10.1.1.32]) by smtp.alacritech.com (8.11.0/8.11.0) with ESMTP id f8CMcmO13149; Wed, 12 Sep 2001 15:38:48 -0700 Message-ID: <3B9FE5CC.D226E8E1@alacritech.com> Date: Wed, 12 Sep 2001 15:46:36 -0700 From: "Matt D. Robinson" Organization: Alacritech, Inc. X-Mailer: Mozilla 4.76 [en] (X11; U; Linux 2.4.2-2 i686) X-Accept-Language: en MIME-Version: 1.0 To: Keith Owens CC: vamsi_krishna@in.ibm.com, lkcd@oss.sgi.com Subject: Re: issues with dump compiled as a module References: <4843.1000332940@ocs3.intra.ocs.com.au> Content-Type: text/plain; charset=us-ascii Content-Transfer-Encoding: 7bit Sender: owner-lkcd@oss.sgi.com Precedence: bulk Keith Owens wrote: > > On Wed, 12 Sep 2001 10:39:39 -0700, > "Matt D. Robinson" wrote: > >None at this time. Go ahead and move it. BTW, if there are variables > >that will remain static at all times (such as dump_function_ptr), make > >sure they are declared in one file. If this means moving dump_function_ptr > >to init/main.c and declaring everything there, or moving them all to > >kernel/panic.c, that's fine. 
> > It is better to put dump related static variables in their own file > with a globally unique name and export the variables from that file. > Patching an existing file is a bad idea, it results in overlapping > patches for different add on components. Separate files makes for a > cleaner patch and a cleaner build. I'm not sure this is easily accomplished, Keith. We currently have hooks into panic(), die_if_kernel(), setup_kernel(), and inevitably in other places as well. Creating a new file which contains just dump variables, do you think that would be accepted into 2.5? I would think they'd want the file removed and the variables placed somewhere else. For example, our dump_function_ptr gets assigned in setup_kernel() if the dump capability is built into the kernel, or assigned if an insmod takes place. There's also dump_in_progress, and eventually a bunch of dump device tables (4.1). So, you're saying create kernel/dump.c, put the stuff in there, and just patch kernel/Makefile? You still have to reference all of those externs in the other files (or at the very least, #include ). --Matt From owner-lkcd@oss.sgi.com Wed Sep 12 15:45:26 2001 Received: (from majordomo@localhost) by oss.sgi.com (8.11.2/8.11.3) id f8CMjQB21965 for lkcd-outgoing; Wed, 12 Sep 2001 15:45:26 -0700 Received: from ausmtp01.au.ibm.com (ausmtp01.au.ibm.COM [202.135.136.97]) by oss.sgi.com (8.11.2/8.11.3) with SMTP id f8CMjJe21947 for ; Wed, 12 Sep 2001 15:45:19 -0700 Received: from f02n15e.au.ibm.com by ausmtp01.au.ibm.com (IBM AP 2.0) with ESMTP id f8CMfQh10234 for ; Thu, 13 Sep 2001 08:41:26 +1000 Received: from d73mta01.au.ibm.com (f06n01s [9.185.166.65]) by f02n15e.au.ibm.com (8.11.1m3/NCO v4.97.1) with SMTP id f8CMhGk140150 for ; Thu, 13 Sep 2001 08:43:16 +1000 Received: by d73mta01.au.ibm.com(Lotus SMTP MTA v4.6.5 (863.2 5-20-1999)) id CA256AC5.007CD5C1 ; Thu, 13 Sep 2001 08:43:31 +1000 X-Lotus-FromDomain: IBMIN@IBMAU From: vamsi_krishna@in.ibm.com To: "Matt D. 
Robinson" cc: lkcd@oss.sgi.com Message-ID: Date: Wed, 12 Sep 2001 18:55:11 +0530 Subject: Re: issues with dump compiled as a module Mime-Version: 1.0 Content-type: text/plain; charset=us-ascii Content-Disposition: inline Sender: owner-lkcd@oss.sgi.com Precedence: bulk Admittedly, not a significant amount of memory is wasted in the current scheme of always linking arch-specific code into the kernel. and there is no compilation problem as of now. However, the code that we will eventually add to __dump_silence{resume}_system will need access to dump_flags and possibly some other global variables of the dump driver, which won't be available in the kernel if dump.o is built as a module. I agree with drivers/dump/dump.c, drivers/dump/i386/dump.c etc. I have done something very similar recently when modifying our dprobes utility to work as a module. I will look at doing this if there are no other comments. Regards.. Vamsi. Vamsi Krishna S. Linux Technology Center, IBM Software Lab, Bangalore. Ph: +91 80 5262355 Extn: 3959 Internet: vamsi_krishna@in.ibm.com "Matt D. Robinson" on 09/12/2001 12:34:49 PM Please respond to "Matt D. Robinson" To: S Vamsikrishna/India/IBM@IBMIN cc: lkcd@oss.sgi.com Subject: Re: issues with dump compiled as a module vamsi_krishna@in.ibm.com wrote: > > Matt, > > Even when we compile LKCD as a module, arch/i386/kernel/dump.c gets built > into the kernel, which does not seem to be right. This is actually correct (for now) -- we can change it, but if we do, I'd recommend something like drivers/dump/ instead of arch/i386/kernel. We were in arch//kernel for a few reasons: - built with architecture-specific flags; - normally statically built into the kernel; - uses fields in arch/i386/kernel/*.[ch] > Problems: > - arch/i386/kernel/dump.c requires global variables in drivers/block/dump.c > when __dump_silence_system etc are implemented. However, this cannot > compile when LKCD is a module. 
I didn't get a compile error, and I've been building strictly as a module for now. Are you CONFIG_SMP? > - it should not be linked into the kernel, wastes kernel memory when dump.o > is not loaded. How much memory? > We should probably move arch/i386/kernel/dump.c into a different location > and link it to drivers/block/dump.o. I don't mind moving it to drivers/dump/ and moving drivers/block/dump.c to a new location. Thoughts? > Comments? Let me know what you think of the above, and if you want, go ahead and move them. Note, you'll need to change the Makefiles as well. Thanks, Vamsi. --Matt > Regards.. Vamsi. > > Vamsi Krishna S. > Linux Technology Center, > IBM Software Lab, Bangalore. > Ph: +91 80 5262355 Extn: 3959 > Internet: vamsi_krishna@in.ibm.com From owner-lkcd@oss.sgi.com Wed Sep 12 15:59:13 2001 Received: (from majordomo@localhost) by oss.sgi.com (8.11.2/8.11.3) id f8CMxDT22192 for lkcd-outgoing; Wed, 12 Sep 2001 15:59:13 -0700 Received: from mail.ocs.com.au (ppp0.ocs.com.au [203.34.97.3]) by oss.sgi.com (8.11.2/8.11.3) with SMTP id f8CMxAe22189 for ; Wed, 12 Sep 2001 15:59:10 -0700 Received: (qmail 29357 invoked from network); 12 Sep 2001 22:59:08 -0000 Received: from ocs3.intra.ocs.com.au (192.168.255.3) by mail.ocs.com.au with SMTP; 12 Sep 2001 22:59:08 -0000 Received: by ocs3.intra.ocs.com.au (Postfix, from userid 16331) id DE903300095; Thu, 13 Sep 2001 08:58:18 +1000 (EST) Received: from ocs3.intra.ocs.com.au (localhost [127.0.0.1]) by ocs3.intra.ocs.com.au (Postfix) with ESMTP id D1DE0AB; Thu, 13 Sep 2001 08:58:18 +1000 (EST) X-Mailer: exmh version 2.2 06/23/2000 with nmh-1.0.4 From: Keith Owens To: "Matt D. Robinson" Cc: vamsi_krishna@in.ibm.com, lkcd@oss.sgi.com Subject: Re: issues with dump compiled as a module In-reply-to: Your message of "Wed, 12 Sep 2001 15:46:36 MST." 
<3B9FE5CC.D226E8E1@alacritech.com> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Date: Thu, 13 Sep 2001 08:58:13 +1000 Message-ID: <5269.1000335493@ocs3.intra.ocs.com.au> Sender: owner-lkcd@oss.sgi.com Precedence: bulk On Wed, 12 Sep 2001 15:46:36 -0700, "Matt D. Robinson" wrote: >I'm not sure this is easily accomplished, Keith. We currently have >hooks into panic(), die_if_kernel(), setup_kernel(), and inevitably in >other places as well. If you are already patching panic.c then put the variables there, I misread the earlier mail as adding new patches. From owner-lkcd@oss.sgi.com Wed Sep 12 16:28:58 2001 Received: (from majordomo@localhost) by oss.sgi.com (8.11.2/8.11.3) id f8CNSwN22747 for lkcd-outgoing; Wed, 12 Sep 2001 16:28:58 -0700 Received: from web20405.mail.yahoo.com (web20405.mail.yahoo.com [216.136.226.124]) by oss.sgi.com (8.11.2/8.11.3) with SMTP id f8CNSue22744 for ; Wed, 12 Sep 2001 16:28:56 -0700 Message-ID: <20010912232856.74911.qmail@web20405.mail.yahoo.com> Received: from [63.121.140.244] by web20405.mail.yahoo.com via HTTP; Wed, 12 Sep 2001 16:28:56 PDT Date: Wed, 12 Sep 2001 16:28:56 -0700 (PDT) From: Venkat Raghu Subject: newbie... To: lkcd@oss.sgi.com MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Sender: owner-lkcd@oss.sgi.com Precedence: bulk Hi For dumping kernel image "brw_kiovec" is used. But I think that "brw_kiovec" is only for block devices. So what about character devices? Which interface should I use? Please mail me venkatraghu2002@yahoo.com. I have not subscribed. Thank You Raghu __________________________________________________ Do You Yahoo!? Get email alerts & NEW webcam video instant messaging with Yahoo! 
Messenger http://im.yahoo.com From owner-lkcd@oss.sgi.com Wed Sep 12 17:28:54 2001 Received: (from majordomo@localhost) by oss.sgi.com (8.11.2/8.11.3) id f8D0Ssj24142 for lkcd-outgoing; Wed, 12 Sep 2001 17:28:54 -0700 Received: from guzzi.amazon.com (guzzi.amazon.com [209.191.164.151]) by oss.sgi.com (8.11.2/8.11.3) with SMTP id f8D0Spe24139 for ; Wed, 12 Sep 2001 17:28:51 -0700 Received: from kawasaki.amazon.com (kawasaki.amazon.com [10.16.42.209]) by guzzi.amazon.com (Postfix) with ESMTP id B28BCAC6 for ; Wed, 12 Sep 2001 17:28:50 -0700 (PDT) Received: from AMZN097255X (us1-dhcp-134-56.amazon.com [10.21.134.56]) by kawasaki.amazon.com (Postfix) with SMTP id 9BB054805F for ; Wed, 12 Sep 2001 17:28:50 -0700 (PDT) From: "Monty Vanderbilt" To: Subject: lkcd_config.c errors? Date: Wed, 12 Sep 2001 17:28:50 -0700 Message-ID: MIME-Version: 1.0 Content-Type: text/plain; charset="us-ascii" Content-Transfer-Encoding: 7bit X-Priority: 3 (Normal) X-MSMail-Priority: Normal X-Mailer: Microsoft Outlook IMO, Build 9.0.2416 (9.0.2911.0) Importance: Normal X-Mimeole: Produced By Microsoft MimeOLE V5.50.4133.2400 In-Reply-To: <20010912232856.74911.qmail@web20405.mail.yahoo.com> Sender: owner-lkcd@oss.sgi.com Precedence: bulk Let me know if I'm off base here, but I'm having trouble getting lkcd_config to work. I'm just learning about linux device drivers so I apologize in advance for dumb questions. 1) The symptom is that the open works, but all ioctl calls (query and set) return I/O error. My suspicion is that with a dump driver in the system the /dev/dump device can no longer be a symbolic link to the swap device. But the \lkcdutils\README files still describe the setup like this. 2) 3 of the 4 ioctls in the query portion of lkcd_config are passing the "set" code rather than the "get" code. 
3) I'm confused about the file access: dfd = open(DUMP_DEVICE, O_RDONLY) // opens hard-coded device name /dev/dump fd = open(device_name, O_RDONLY) // opens the device name specified on the command line - dfd is used for the device operations, but fd is not, except to get a dnum. Why is the device name hard-coded? - the define is opened read-only but "set" ioctls are performed. Is this OK? 4) When I get the latest set of sources I'm still getting the old "vmdump" files. Is there a way around this? Monty VanderBilt Amazon.com From owner-lkcd@oss.sgi.com Wed Sep 12 17:42:49 2001 Received: (from majordomo@localhost) by oss.sgi.com (8.11.2/8.11.3) id f8D0gnj24337 for lkcd-outgoing; Wed, 12 Sep 2001 17:42:49 -0700 Received: from nakedeye.aparity.com (w032.z064001165.sjc-ca.dsl.cnc.net [64.1.165.32]) by oss.sgi.com (8.11.2/8.11.3) with SMTP id f8D0gie24332 for ; Wed, 12 Sep 2001 17:42:45 -0700 Received: from localhost (yakker@localhost) by nakedeye.aparity.com (8.11.2/8.11.2) with ESMTP id f8D0lLF06854; Wed, 12 Sep 2001 17:47:21 -0700 Date: Wed, 12 Sep 2001 17:47:21 -0700 (PDT) From: "Matt D. Robinson" To: Monty Vanderbilt cc: Subject: Re: lkcd_config.c errors? In-Reply-To: Message-ID: MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII Sender: owner-lkcd@oss.sgi.com Precedence: bulk On Wed, 12 Sep 2001, Monty Vanderbilt wrote: |>Let me know if I'm off base here, but I'm having trouble getting lkcd_config |>to work. I'm just learning about linux device drivers so I apologize in |>advance for dumb questions. |> |>1) The symptom is that the open works, but all ioctl calls (query and set) |>return I/O error. My suspicion is that with a dump driver in the system the |>/dev/dump device can no longer be a symbolic link to the swap device. But |>the \lkcdutils\README files still describe the setup like this. |> |>2) 3 of the 4 ioctls in the query portion of lkcd_config are passing the |>"set" code rather than the "get" code. Fixed. Check out the new version. Thanks. 
|>3) I'm confused about the file access: |> |> dfd = open(DUMP_DEVICE, O_RDONLY) // opens hard-coded device name |>/dev/dump |> |> fd = open(device_name, O_RDONLY) // opens the device name specified on |>the command line |> |> - dfd is used for the device operations, but fd is not, except to get a |>dnum. Why is the device name hard-coded? /dev/dump is now the representative "dump device driver". There will be the extension in the future of having multiple dump methods beyond /dev/dump. What this means is, you tell /dev/dump what device number or device name to dump to. For example, /dev/dump will use /dev/hda4 as its representative dump device if that's what you specify. In fact ... I'm going to move this to /dev/dump/dumpN tonight. So assume that the code is going to change one more time. This also means /sbin/lkcd also changes. |> - the define is opened read-only but "set" ioctls are performed. Is this |>OK? This is now fixed, but I also need to fix the open/ioctl. Good point. |>4) When I get the latest set of sources I'm still getting the old "vmdump" |>files. Is there a way around this? I left them in there until we reach a point where we're ready to remove them and move entirely to the new method. First, the old files will be moved into a 2.4-old, and the new files will be in the 2.4 directory. Nice to see some people using the CVS tree. :) Let me know if you find anything else. 
--Matt |>Monty VanderBilt |>Amazon.com |> From owner-lkcd@oss.sgi.com Wed Sep 12 17:55:57 2001 Received: (from majordomo@localhost) by oss.sgi.com (8.11.2/8.11.3) id f8D0tvO24533 for lkcd-outgoing; Wed, 12 Sep 2001 17:55:57 -0700 Received: from smtp.alacritech.com (smtp.alacritech.com [209.10.208.82]) by oss.sgi.com (8.11.2/8.11.3) with SMTP id f8D0tre24530 for ; Wed, 12 Sep 2001 17:55:53 -0700 Received: from alacritech.com (lambda.alacritech.com [10.1.1.32]) by smtp.alacritech.com (8.11.0/8.11.0) with ESMTP id f8D0pAO16903; Wed, 12 Sep 2001 17:51:10 -0700 Message-ID: <3BA004D2.41CD1BF2@alacritech.com> Date: Wed, 12 Sep 2001 17:58:58 -0700 From: "Matt D. Robinson" Organization: Alacritech, Inc. X-Mailer: Mozilla 4.76 [en] (X11; U; Linux 2.4.2-2 i686) X-Accept-Language: en MIME-Version: 1.0 To: "Schaal, Richard" CC: "'Matt D. Robinson'" , "'lkcd@oss.sgi.com'" Subject: Re: LKCD + KDB ? References: <68843F808BE5D311AC6100A0C9C57866484882@fmsmsx50.fm.intel.com> Content-Type: text/plain; charset=us-ascii Content-Transfer-Encoding: 7bit Sender: owner-lkcd@oss.sgi.com Precedence: bulk "Schaal, Richard" wrote: > > Hi Matt, > > I picked up the current CVS tree from Sourceforge. > > In the 2.4 tree, I find several files that would appear to come from the > linux kernel > that have modifications in them. If I wanted to derive a kernel patch that > I could use to apply to > an arbitrary kernel, which kernel version would I use as a base to diff > against? 2.4.8 for now. The best thing is to 'cvs update', and then extract two versions of 2.4.8 into some directory. Then do something like:

    xpath="/my/patched/2.4.8/tree"
    cd /my/lkcd/cvs/tree/2.4
    for file in `find ./[D-k]* -type f -print | grep -v CVS` ; do
        path=`echo $file | sed 's/\(.*\)\/\(.*\)$/\1/'`;
        cp $file $xpath/$path ;
    done

This does a copy of all the files in the 2.4 tree on top of the files in your new 2.4.8 tree. Then a simple 'diff -Naur orig248 new248' gives you an LKCD diff.
> I'm having some issues with MP system dumping - I'm getting some Oops type > issues in the > ext2 file system in a stress test - the system tries to dump, but then > multiple processors get watchdog timeouts > and hose the dump - I'm hoping that the later code from the CVS tree will > begin to address this issue so I can > get to work on the "real" problem. I'm fixing an issue with SMP right now. Suparna hasn't said much except "hmmm", but I should be able to move the sti() call in dump_silence_system() in front of the __dump_silence_system() call to re-enable the interrupts. I'm not sure the watchdog timeouts are the same. Can you mail me the stress test/configuration you're using? > Kudos for the nifty "roll your own" functions for analysis of the dump file! > I've managed to > hack together a short function to work on the structures I build up with my > debug code. Very slick capability! Thanks to Luc, he worked pretty hard on libsial. Thanks to everyone working on LKCD, for that matter. :) > Thanks, > Richard --Matt From owner-lkcd@oss.sgi.com Wed Sep 12 17:57:52 2001 Received: (from majordomo@localhost) by oss.sgi.com (8.11.2/8.11.3) id f8D0vqu24573 for lkcd-outgoing; Wed, 12 Sep 2001 17:57:52 -0700 Received: from smtp.alacritech.com (smtp.alacritech.com [209.10.208.82]) by oss.sgi.com (8.11.2/8.11.3) with SMTP id f8D0vhe24566 for ; Wed, 12 Sep 2001 17:57:43 -0700 Received: from alacritech.com (lambda.alacritech.com [10.1.1.32]) by smtp.alacritech.com (8.11.0/8.11.0) with ESMTP id f8D0qxO16992; Wed, 12 Sep 2001 17:52:59 -0700 Message-ID: <3BA0053F.58D1138F@alacritech.com> Date: Wed, 12 Sep 2001 18:00:47 -0700 From: "Matt D. Robinson" Organization: Alacritech, Inc. 
X-Mailer: Mozilla 4.76 [en] (X11; U; Linux 2.4.2-2 i686) X-Accept-Language: en MIME-Version: 1.0 To: Andreas Herrmann CC: lkcd@oss.sgi.com, Luc Chouinard , Tom Morano , Michael Holzheu , yakker@alacritech.com Subject: Re: lkcd checkin, s390x support References: Content-Type: text/plain; charset=us-ascii Content-Transfer-Encoding: 7bit Sender: owner-lkcd@oss.sgi.com Precedence: bulk Andreas Herrmann wrote: > Hi, > > today I have checked in several files. > Most important is, we added lcrash support for a new platform. > It is s390x, which is 64 bit. > > >From our side it is ok now to create the new lkcd release. Wow, when all of you make a check-in, you don't mess around. :) Okay, got all the changes. Are you going to roll an RPM around the same time we do, or are you going to do your own? I'd need an S390 to roll the RPM against, which I just don't have hanging around the house. :) Thanks for all the bug fixes, additions, and clean-ups along the way. I'll go through these changes, although it will take me a while. Needless to say, I won't hold up 4.0 just to do the review -- it'll still go out Friday. --Matt > Besides that we did some further changes and bug fixes. > Following an overview about what we have done: > > Consolidated header files. 
> > libklib/include/klib.h > > libklib/include/asm-alpha/kl_error.h (removed) > libklib/include/asm-ia64/kl_error.h (removed) > libklib/include/asm-i386/kl_error.h (removed) > libklib/include/asm-s390/kl_error.h (removed) > libklib/include/kl_error.h (added) > > libklib/include/asm-alpha/kl_mem.h > libklib/include/asm-ia64/kl_mem.h > libklib/include/asm-i386/kl_mem.h > libklib/include/asm-s390/kl_mem.h > libklib/include/kl_mem.h (added for common stuff) > > libklib/include/asm-alpha/kl_stabs.h (removed) > libklib/include/asm-ia64/kl_stabs.h (removed) > libklib/include/asm-i386/kl_stabs.h (removed) > libklib/include/asm-s390/kl_stabs.h (removed) > libklib/include/kl_stabs.h (added) > > Included support for type information of so called register variables > (N_RSYM in stabs format). > > libklib/kl_stabs.c > > Included support for intels PSE (Page Size Extension) in memory mapping: > > libklib/arch/alpha/kl_page.c > libklib/arch/ia64/kl_page.c > libklib/arch/i386/kl_page.c > libklib/arch/s390/kl_page.c > libklib/kl_mem.c > libklib/kl_memory.c > > Changed vtop behaviour: switched on mem mapping for all virtual addresses > > lcrash/cmds/cmd_vtop.c > > s390 specific changes: > > lcrash/arch/s390/lib/s390-report.c > lcrash/arch/s390/lib/s390-util.c > lcrash/arch/s390/lib/trace.c > lcrash/include/arch-s390/trace.h > libklib/arch/s390/kl_s390_util.c > > Removed unnecessary include directives to kernel header files. Especially > the > handling of struct utsname was changed. > > lcrash/arch/s390/cmds/cmd_s390dbf.c > lcrash/cmds/cmd_stat.c > lcrash/include/lcrash.h > lcrash/util.c > libklib/include/dump.h > > BUGFIXES: > > - Inserted setjmp before first call to longjump. > > lcrash/main.c > > - "dis -F -w file", rediretion of output did not work correctly. > > lcrash/arch/alpha/lib/dis.c > lcrash/arch/ia64/lib/dis.c > lcrash/arch/s390/lib/dis.c > > - Setup type info immediately after reading namelist. > > libklib/kl_nmlist.c > > - Avoid printing of control characters. 
> > libklib/kl_print.c > > - Kernel without module support can be analyzed again. > > lcrash/cmds/cmd_symtab.c > lcrash/cmds/cmd_module.c > libklib/kl_util.c > libklib/klib.c > > - Fixed the computation of memory size in livedump. > > lcrash/vmdump.c > > - s390x port: > > (a whole bunch of files) > > Regards, > > Andreas > > -- > Linux for eServer Development > Tel : +49-7031-16-4640 > Notes mail : Andreas Herrmann/GERMANY/IBM@IBMDE > email : aherrman@de.ibm.com From owner-lkcd@oss.sgi.com Wed Sep 12 17:59:23 2001 Received: (from majordomo@localhost) by oss.sgi.com (8.11.2/8.11.3) id f8D0xNS24614 for lkcd-outgoing; Wed, 12 Sep 2001 17:59:23 -0700 Received: from smtp.alacritech.com (smtp.alacritech.com [209.10.208.82]) by oss.sgi.com (8.11.2/8.11.3) with SMTP id f8D0xLe24611 for ; Wed, 12 Sep 2001 17:59:21 -0700 Received: from alacritech.com (lambda.alacritech.com [10.1.1.32]) by smtp.alacritech.com (8.11.0/8.11.0) with ESMTP id f8D0sXO17027; Wed, 12 Sep 2001 17:54:34 -0700 Message-ID: <3BA0059A.158433F5@alacritech.com> Date: Wed, 12 Sep 2001 18:02:18 -0700 From: "Matt D. Robinson" Organization: Alacritech, Inc. X-Mailer: Mozilla 4.76 [en] (X11; U; Linux 2.4.2-2 i686) X-Accept-Language: en MIME-Version: 1.0 To: Venkat Raghu CC: lkcd@oss.sgi.com Subject: Re: newbie... References: <20010912232856.74911.qmail@web20405.mail.yahoo.com> Content-Type: text/plain; charset=us-ascii Content-Transfer-Encoding: 7bit Sender: owner-lkcd@oss.sgi.com Precedence: bulk Venkat Raghu wrote: > > Hi > For dumping kernel image "brw_kiovec" is > used. But I think that "brw_kiovec" is only for block > devices. So what about chararcter devices? Which > interface I should use?? > > Please mail me venkatraghu2002@yahoo.com. > I have not subscribed. > > Thank You > Raghu > > __________________________________________________ > Do You Yahoo!? > Get email alerts & NEW webcam video instant messaging with Yahoo! Messenger > http://im.yahoo.com What kind of character devices? 
Right now, we don't have anything specified, and won't until post 4.0. Then we'll add in a dump() function pointer to block_device_operations. You're saying you want one for character devices as well? Seems to make sense to do, I don't see a problem with it right away. --Matt From owner-lkcd@oss.sgi.com Wed Sep 12 22:48:07 2001 Received: (from majordomo@localhost) by oss.sgi.com (8.11.2/8.11.3) id f8D5m7C29435 for lkcd-outgoing; Wed, 12 Sep 2001 22:48:07 -0700 Received: from ausmtp02.au.ibm.com (ausmtp02.au.ibm.COM [202.135.136.105]) by oss.sgi.com (8.11.2/8.11.3) with SMTP id f8D5lRe29425 for ; Wed, 12 Sep 2001 22:47:28 -0700 Received: from f02n16e.au.ibm.com by ausmtp02.au.ibm.com (IBM AP 2.0) with ESMTP id f8D5iDc424812; Thu, 13 Sep 2001 15:44:18 +1000 Received: from d73mta01.au.ibm.com (f06n01s [9.185.166.65]) by f02n16e.au.ibm.com (8.11.1m3/NCO v4.97.1) with SMTP id f8D5kPp132720; Thu, 13 Sep 2001 15:46:26 +1000 Received: by d73mta01.au.ibm.com(Lotus SMTP MTA v4.6.5 (863.2 5-20-1999)) id CA256AC6.001FBAFF ; Thu, 13 Sep 2001 15:46:34 +1000 X-Lotus-FromDomain: IBMIN@IBMAU From: bsuparna@in.ibm.com To: "Matt D. Robinson" cc: "Schaal, Richard" , "'Matt D. Robinson'" , "'lkcd@oss.sgi.com'" Message-ID: Date: Thu, 13 Sep 2001 10:41:35 +0500 Subject: Re: LKCD + KDB ? Mime-Version: 1.0 Content-type: text/plain; charset=us-ascii Content-Disposition: inline Sender: owner-lkcd@oss.sgi.com Precedence: bulk >> I'm having some issues with MP system dumping - I'm getting some Oops type >> issues in the >> ext2 file system in a stress test - the system tries to dump, but then >> multiple processors get watchdog timeouts >> and hose the dump - I'm hoping that the later code from the CVS tree will >> begin to address this issue so I can >> get to work on the "real" problem. > >I'm fixing an issue with SMP right now. 
Suparna hasn't said much >except "hmmm", but I should be able to move the sti() call in >dump_silence_system() in front of the __dump_silence_system() call >to re-enable the interrupts. > >I'm not sure the watchdog timeouts are the same. Can you mail me >the stress test/configuration you're using? Well, Matt, I think I said much more than that in my last mail on the SMP issues ! But perhaps it wasn't expressed too well ? I'll go into more detail again, then in a while. But before that, I had thought that your suggestion of moving "sti" up only had to do with triggering dump from an interrupt handler on an SMP system, to avoid calling smp_call_function with interrupts disabled, didn't it ? To us, it appears that making sure that dump works in general from interrupt context in itself (keeping in mind all kinds of drivers) involves more thought. What Vamsi is experimenting with for kdb (dumping from a different thread context), may be part of the solution (i.e. always ensure that we have a legal context when we perform the dump i/o). Another approach could turn out to be the dump driver interface that you've been working on. It is also possible that an intermediate solution emerges. In any case, we plan to share our observations and trial patches along the way. Coming back to the MP issues. I really believe that it is important to have a reliable solution which we are confident will work under most situations, _and_ that we are at least in a position to say for sure which situations it may not work under. We already have made some of the changes that we think are needed for the SMP issues (for non-disruptive dumping) and got them to run (yes this is with your latest code, with a modified version of lkcd_config with a few fixes :)). Bharata could pass on the patch to you, if you'd like to take a look at it. But this is a tricky area - its great to see it work when we try it out :), but we know that this doesn't necessarily mean it will work under all conditions. 
That's why we've been spending so much time trying to analyse and understand the issues, and trying to close logical loopholes as far as possible. And there are few things we still need to work on before we have our code ready for release. There are also a few tradeoffs that we end up making in terms of dump accuracy as we try to fix some of the problems. So, what are some of the pending concerns with MP system dumping ? For now, let's leave out the case of dumping from an interrupt handler, to simplify the scope. If we make the other CPUs spin inside the IPI (CALL_FUNCTION_VECTOR) while dump is in progress, we need to take care of the following things: 1. If we spin the other CPUs with interrupts disabled, then we need to make sure that the NMI watchdog timer doesn't report lockups (given that dumping would take some time) on the CPUs which are intentionally made to spin (we might want lockups to be detected on the dumping cpu just in case dump itself runs into problems). We had this check in our patch. 2. It is possible that a disk interrupt generated in the dump i/o process, gets delivered to one of the spinning CPUs rather than the dumping CPU. This seems possible because the IRQ affinity for this interrupt indicates that any CPU could receive the interrupt. If the other CPUs are spinning with interrupts disabled, then they won't service such interrupts -- resulting in a potential deadlock as the dumping thread waits for i/o to complete. We spent some time delving through APIC arbitration logic section of the Intel manual :) to see if there is something that could be used to avoid this. (If you look at my last posting in this context, you might notice where some of the observations there came from ...). Somehow, when we tested our code on a 2 way machine, we never seemed to be hitting this case ... Bharata tried it on a 4 way yesterday and in one of the trials he did run into a deadlock (this doesn't happen consistently, though).
There are 2 ways to avoid this: (a) Simply keep interrupts enabled on the other CPUs as they spin. This is likely to cause a little more drift in the dump snapshot, but then right now since we already have interrupts enabled on the dumping CPU, its probably not too bad. We've already tried this out. (b) Change the IRQ affinities (i.e. program the APIC) so that interrupts get delivered only to the dumping CPU during the dump process, and then revert back to the original affinities, after dumping is through. This is what we are experimenting with now. Now, besides the above there is another issue to think about (not so much for non-disruptive case). Given that smp_call_function waits for the other CPUs to receive the IPI, do we have a failsafe method to get a dump in case another CPU is caught in a tight loop with interrupts disabled ? If we used an NMI IPI, this could cause potential deadlocks in the dump path. Another question is if there any possibility of our interrupting another CPU in the midst of some i/o operations while it holds a lock needed in the dump path. We have been studying the i/o code to check if such a possibility could arise. I haven't noticed anything so far, but would like to check this out further just to be absolutely sure. For spin_lock_irqsave, this shouldn't be a problem., so obviously, this can happen for locks taken only in the pre-requestfn stage of the block i/o logic, if at all. Anything else that I missed ? 
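To make option (b) above concrete, here is a small user-space model of the save, pin and restore sequence for IRQ affinities. It is only a sketch under stated assumptions: each IRQ's affinity is modelled as a plain CPU bitmask in an array, all names (model_irq_affinity, model_pin_irqs_to_cpu and so on) are hypothetical, and nothing here programs a real APIC or reflects the patch being experimented with.

```c
#include <assert.h>

#define MODEL_NR_IRQS 16	/* hypothetical table size for this model */

/* One CPU bitmask per IRQ: bit n set means CPU n may receive the IRQ. */
unsigned long model_irq_affinity[MODEL_NR_IRQS];
unsigned long model_saved_affinity[MODEL_NR_IRQS];

/* Give every IRQ the same affinity mask (normal running state). */
void model_set_all_affinity(unsigned long mask)
{
	int irq;

	for (irq = 0; irq < MODEL_NR_IRQS; irq++)
		model_irq_affinity[irq] = mask;
}

/* Route every IRQ to dump_cpu only, remembering the previous masks. */
void model_pin_irqs_to_cpu(int dump_cpu)
{
	int irq;

	assert(dump_cpu >= 0 && dump_cpu < (int)(8 * sizeof(unsigned long)));

	for (irq = 0; irq < MODEL_NR_IRQS; irq++) {
		model_saved_affinity[irq] = model_irq_affinity[irq];
		model_irq_affinity[irq] = 1UL << dump_cpu;
	}
}

/* Put back the masks saved by model_pin_irqs_to_cpu(). */
void model_restore_irqs(void)
{
	int irq;

	for (irq = 0; irq < MODEL_NR_IRQS; irq++)
		model_irq_affinity[irq] = model_saved_affinity[irq];
}

/* Returns 1 if the given IRQ is allowed to be delivered to cpu, else 0. */
int model_irq_may_hit(int irq, int cpu)
{
	return (int)((model_irq_affinity[irq] >> cpu) & 1UL);
}
```

In this model, while the IRQs are pinned, a disk interrupt can no longer be routed to a CPU that is spinning with interrupts disabled, which is the deadlock described in point 2.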
Regards Suparna Suparna Bhattacharya IBM Software Lab, India E-mail : bsuparna@in.ibm.com Phone : 91-80-5267117, Extn : 3961 From owner-lkcd@oss.sgi.com Wed Sep 12 23:29:31 2001 Received: (from majordomo@localhost) by oss.sgi.com (8.11.2/8.11.3) id f8D6TVm30149 for lkcd-outgoing; Wed, 12 Sep 2001 23:29:31 -0700 Received: from nakedeye.aparity.com (w032.z064001165.sjc-ca.dsl.cnc.net [64.1.165.32]) by oss.sgi.com (8.11.2/8.11.3) with SMTP id f8D6TKe30144 for ; Wed, 12 Sep 2001 23:29:20 -0700 Received: from alacritech.com (w032.z064001165.sjc-ca.dsl.cnc.net [64.1.165.32]) by nakedeye.aparity.com (8.11.2/8.11.2) with ESMTP id f8D6WfJ08496; Wed, 12 Sep 2001 23:32:41 -0700 Message-ID: <3BA0514F.C18C3BBC@alacritech.com> Date: Wed, 12 Sep 2001 23:25:19 -0700 From: "Matt D. Robinson" X-Mailer: Mozilla 4.75 [en] (Win98; U) X-Accept-Language: en MIME-Version: 1.0 To: bsuparna@in.ibm.com CC: "Schaal, Richard" , "'Matt D. Robinson'" , "'lkcd@oss.sgi.com'" Subject: Re: LKCD + KDB ? References: Content-Type: text/plain; charset=us-ascii Content-Transfer-Encoding: 7bit Sender: owner-lkcd@oss.sgi.com Precedence: bulk bsuparna@in.ibm.com wrote: > >> I'm having some issues with MP system dumping - I'm getting some Oops > type > >> issues in the > >> ext2 file system in a stress test - the system tries to dump, but then > >> multiple processors get watchdog timeouts > >> and hose the dump - I'm hoping that the later code from the CVS tree > will > >> begin to address this issue so I can > >> get to work on the "real" problem. > > > >I'm fixing an issue with SMP right now. Suparna hasn't said much > >except "hmmm", but I should be able to move the sti() call in > >dump_silence_system() in front of the __dump_silence_system() call > >to re-enable the interrupts. > > > >I'm not sure the watchdog timeouts are the same. Can you mail me > >the stress test/configuration you're using? > > Well, Matt, I think I said much more than that in my last mail on the SMP > issues ! 
But perhaps it wasn't expressed too well ? I'll go into more > detail again, then in a while. Actually, you did, my apologies. I was referring to the other night in our discussions based on that recommendation I made (which again, is not the inevitable fix, as you indicate below). :) > But before that, I had thought that your suggestion of moving "sti" up only > had to do with triggering dump from an interrupt handler on an SMP system, > to avoid calling smp_call_function with interrupts disabled, didn't it ? To > us, it appears that making sure that dump works in general from interrupt > context in itself (keeping in mind all kinds of drivers) involves more > thought. What Vamsi is experimenting with for kdb (dumping from a > different thread context), may be part of the solution (i.e. always ensure > that we have a legal context when we perform the dump i/o). Another > approach could turn out to be the dump driver interface that you've been > working on. It is also possible that an intermediate solution emerges. In > any case, we plan to share our observations and trial patches along the > way. Great, I'm ready to test anything you may have, and of course, I'm always sharing what I test out here. > Coming back to the MP issues. > I really believe that it is important to have a reliable solution which we > are confident will work under most situations, _and_ that we are at least > in a position to say for sure which situations it may not work under. > > We already have made some of the changes that we think are needed for the > SMP issues (for non-disruptive dumping) and got them to run (yes this is > with your latest code, with a modified version of lkcd_config with a few > fixes :)). Bharata could pass on the patch to you, if you'd like to take a > look at it. Sure, although I fixed a few things in lkcd_config already per an earlier E-mail. 
I'm not sure how many cross over, but I didn't test -q at all, and Marty showed me (in so many words) that it wasn't ready for prime time. Again, hopefully this is all corrected. I was mostly concerned with the DIOS stuff. > But this is a tricky area - its great to see it work when we try it out :), > but we know that this doesn't necessarily mean it will work under all > conditions. That's why we've been spending so much time trying to analyse > and understand the issues, and trying to close logical loopholes as far as > possible. And there are few things we still need to work on before we have > our code ready for release. There are also a few tradeoffs that we end up > making in terms of dump accuracy as we try to fix some of the problems. I'd like to know a bit more of what those tradeoffs are. I assume you're on #lkcd? > So, what are the some of the pending concerns with MP system dumping ? > For now, lets leave out the case of dumping from an interrupt handler, to > simply the scope. > > If we make the other CPUs spin inside the IPI (CALL_FUNCTION_VECTOR) while > dump is in progress, we need to take care of the following things: > > 1. If we spin the other cpu's with interrupts disabled, then we need to > make sure that the NMI watchdog timer doesn't report lockups (given that > dumping would take some time) on the CPUs which are intentionally made to > spin (we might want lockups to be detected on the dumping cpu just in case > dump itself runs into problems). > We had this check in our patch. Can this be as simple as dump_in_progress, or something more complex? > 2. It is possible that a disk interrupt generated in the dump i/o process, > gets delivered to one of the spinning CPUs rather than the dumping CPU. > This seems possible because the IRQ affinity for this interrupt indicates > that any CPU could receive the interrupt. 
If the other CPUs are spinning > with interrupts disabled, then they won't service such interrupts -- > resulting in a potential deadlock as the dumping thread waits for i/o to > complete. Eww. This means we're going to have to re-program the APIC. > We spent some time delving through APIC arbitration logic section of the > Intel manual :) to see if there is something that could be used to avoid > this. (If you look at my last posting in this context, you might notice > where some of the observations there came from ...). > Somehow, when we tested our code on a 2 way machine, we never seemed to be > hitting this case ... Bharata tried it on a 4 way yesterday and in one of > the trials he did run into a deadlock (this doesn't happen consistently, > though). > There are 2 ways to avoid this: > (a) Simply keep interrupts enabled on the other CPUs as they spin. This is > likely to cause a little more drift in the dump snapshot, but then right > now since we already have interrupts enabled on the dumping CPU, its > probably not too bad. > We've already tried this out. Okay. This one is probably faster to implement (meaning less complex) as well. > (b) Change the IRQ affinities (i.e. program the APIC) so that interrupts > get delivered only to the dumping CPU during the dump process, and then > revert back to the original affinities, after dumping is through. > This is what we are experimenting with now. If this is possible, great, but given what you have to change while going silent, how possible is this? Note, I'm just now going to look at the code path. > Now, besides the above there is another issue to think about (not so much > for non-disruptive case). Given that smp_call_function waits for the other > CPUs to receive the IPI, do we have a failsafe method to get a dump in case > another CPU is caught in a tight loop with interrupts disabled ? > If we used an NMI IPI, this could cause potential deadlocks in the dump > path. This will always be the case, though.
Unless you add in a special NMI dump handler to handle interrupts and system state at the point of the NMI, you won't have much luck, and even then, an NMI dump after a new dump type is risky (double panics are almost always useless). > Another question is if there any possibility of our interrupting another > CPU in the midst of some i/o operations while it holds a lock needed in the > dump path. We have been studying the i/o code to check if such a > possibility could arise. I haven't noticed anything so far, but would like > to check this out further just to be absolutely sure. For > spin_lock_irqsave, this shouldn't be a problem., so obviously, this can > happen for locks taken only in the pre-requestfn stage of the block i/o > logic, if at all. > > Anything else that I missed ? You've covered practically everything. I'll login now to chat with you about this some more. I'm curious about your APIC thoughts. Thanks, Suparna. --Matt > Regards > Suparna > > Suparna Bhattacharya > IBM Software Lab, India > E-mail : bsuparna@in.ibm.com > Phone : 91-80-5267117, Extn : 3961 From owner-lkcd@oss.sgi.com Wed Sep 12 23:52:04 2001 Received: (from majordomo@localhost) by oss.sgi.com (8.11.2/8.11.3) id f8D6q4Q30538 for lkcd-outgoing; Wed, 12 Sep 2001 23:52:04 -0700 Received: from sgi.com (sgi.SGI.COM [192.48.153.1]) by oss.sgi.com (8.11.2/8.11.3) with SMTP id f8D6q1e30535 for ; Wed, 12 Sep 2001 23:52:02 -0700 Received: from nodin.corp.sgi.com (nodin.corp.sgi.com [192.26.51.193]) by sgi.com (980327.SGI.8.8.8-aspam/980304.SGI-aspam: SGI does not authorize the use of its proprietary systems or networks for unsolicited or bulk email from the Internet.) 
via ESMTP id XAA01346 for ; Wed, 12 Sep 2001 23:51:53 -0700 (PDT) mail_from (kaos@ocs.com.au) Received: from kao2.melbourne.sgi.com (kao2.melbourne.sgi.com [134.14.55.180]) by nodin.corp.sgi.com (8.11.4/8.11.2/nodin-1.0) with ESMTP id f8D6ol538852974 for ; Wed, 12 Sep 2001 23:50:48 -0700 (PDT) Received: by kao2.melbourne.sgi.com (Postfix, from userid 16331) id BE8D4300095; Thu, 13 Sep 2001 16:49:57 +1000 (EST) Received: from kao2.melbourne.sgi.com (localhost [127.0.0.1]) by kao2.melbourne.sgi.com (Postfix) with ESMTP id 8CBDFAB for ; Thu, 13 Sep 2001 16:49:57 +1000 (EST) X-Mailer: exmh version 2.2 06/23/2000 with nmh-1.0.4 From: Keith Owens To: "'lkcd@oss.sgi.com'" Subject: Re: LKCD + KDB ? In-reply-to: Your message of "Wed, 12 Sep 2001 23:25:19 MST." <3BA0514F.C18C3BBC@alacritech.com> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Date: Thu, 13 Sep 2001 16:49:52 +1000 Message-ID: <7973.1000363792@kao2.melbourne.sgi.com> Sender: owner-lkcd@oss.sgi.com Precedence: bulk On Wed, 12 Sep 2001 23:25:19 -0700, "Matt D. Robinson" wrote: >bsuparna@in.ibm.com wrote: >> 1. If we spin the other cpu's with interrupts disabled, then we need to >> make sure that the NMI watchdog timer doesn't report lockups (given that > >Can this be as simple as dump_in_progress, or something more complex? Andrew Morton has code in the -AC tree which is a generic fix for the problem of the NMI watchdog tripping on long events. kdb uses it in the -AC tree.

    if (*f == NULL) {
            /* Reset NMI watchdog once per poll loop */
            touch_nmi_watchdog();
            f = &poll_funcs[0];
    }

There is no equivalent in Linus's tree, you have to hack the NMI handler yourself :(. Time to push Andrew Morton and AC to get the NMI changes into Linus's tree.
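For readers without the -AC tree at hand, the following user-space model sketches why touching the watchdog once per loop keeps a deliberately spinning CPU from being reported as locked up. It is an assumption-laden simplification, not kernel code: the counters are ordinary arrays, the threshold of 5 ticks is arbitrary, and every name prefixed with model_ is hypothetical.

```c
#include <assert.h>

#define MODEL_NR_CPUS	4
#define MODEL_THRESHOLD	5	/* ticks without progress before reporting a lockup */

/* Per-CPU state, loosely modelled on the counters discussed in the thread. */
unsigned int model_irq_count[MODEL_NR_CPUS];	/* timer interrupts serviced */
unsigned int model_last_irq_sum[MODEL_NR_CPUS];	/* count seen at the last tick */
unsigned int model_alert_counter[MODEL_NR_CPUS];	/* ticks with no progress */

/* A timer interrupt was serviced on this CPU. */
void model_timer_irq(int cpu)
{
	model_irq_count[cpu]++;
}

/* One watchdog tick for cpu. Returns 1 if the CPU looks locked up. */
int model_watchdog_tick(int cpu)
{
	assert(cpu >= 0 && cpu < MODEL_NR_CPUS);

	if (model_last_irq_sum[cpu] == model_irq_count[cpu]) {
		/* No interrupts since the last tick: count towards a lockup. */
		if (++model_alert_counter[cpu] >= MODEL_THRESHOLD)
			return 1;
	} else {
		/* Progress was made: remember it and start counting again. */
		model_last_irq_sum[cpu] = model_irq_count[cpu];
		model_alert_counter[cpu] = 0;
	}
	return 0;
}

/* Reset the alert counters of all CPUs, as a long-running loop would. */
void model_touch_watchdog(void)
{
	int i;

	for (i = 0; i < MODEL_NR_CPUS; i++)
		model_alert_counter[i] = 0;
}
```

A dump path that calls the touch function once per iteration never lets the alert counter reach the threshold, while a CPU that stops being touched is reported after MODEL_THRESHOLD idle ticks.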
From owner-lkcd@oss.sgi.com Thu Sep 13 00:36:37 2001 Received: (from majordomo@localhost) by oss.sgi.com (8.11.2/8.11.3) id f8D7abo31063 for lkcd-outgoing; Thu, 13 Sep 2001 00:36:37 -0700 Received: from nakedeye.aparity.com (w032.z064001165.sjc-ca.dsl.cnc.net [64.1.165.32]) by oss.sgi.com (8.11.2/8.11.3) with SMTP id f8D7aXe31059 for ; Thu, 13 Sep 2001 00:36:33 -0700 Received: from alacritech.com (w032.z064001165.sjc-ca.dsl.cnc.net [64.1.165.32]) by nakedeye.aparity.com (8.11.2/8.11.2) with ESMTP id f8D7evJ08663; Thu, 13 Sep 2001 00:40:57 -0700 Message-ID: <3BA0614F.B19082B7@alacritech.com> Date: Thu, 13 Sep 2001 00:33:35 -0700 From: "Matt D. Robinson" X-Mailer: Mozilla 4.75 [en] (Win98; U) X-Accept-Language: en MIME-Version: 1.0 To: Keith Owens CC: "'lkcd@oss.sgi.com'" Subject: Re: LKCD + KDB ? References: <7973.1000363792@kao2.melbourne.sgi.com> Content-Type: text/plain; charset=us-ascii Content-Transfer-Encoding: 7bit Sender: owner-lkcd@oss.sgi.com Precedence: bulk Keith Owens wrote: > On Wed, 12 Sep 2001 23:25:19 -0700, > "Matt D. Robinson" wrote: > >bsuparna@in.ibm.com wrote: > >> 1. If we spin the other cpu's with interrupts disabled, then we need to > >> make sure that the NMI watchdog timer doesn't report lockups (given that > > > >Can this be as simple as dump_in_progress, or something more complex? > > Andrew Morton has code in the -AC tree which is a generic fix for the > problem of the NMI watchdog tripping on long events. kdb uses it in > the -AC tree. > > if (*f == NULL) { > /* Reset NMI watchdog once per poll loop */ > touch_nmi_watchdog(); > f = &poll_funcs[0]; > } > > There is no equivalent in Linus's tree, you have to hack the NMI > handler yourself :(. Time to push Andrew Morton and AC to get the NMI > changes into Linus's tree. Thanks, Keith ... This looks like a reasonable patch to use, although shouldn't touch_nmi_watchdog() reset both the last_irq_sums[] and the alert_counter[] for all CPUs? 
Otherwise, won't you be dropping back into this loop function over and over and over again? Then again, you probably re-enter it anyway. Looks like touch_nmi_watchdog() is needed in combination with dump_in_progress. In case you don't have the tree handy, Suparna (I didn't), touch_nmi_watchdog() does:

    void touch_nmi_watchdog (void)
    {
            int i;

            /*
             * Just reset the alert counters, (other CPUs might be
             * spinning on locks we hold):
             */
            for (i = 0; i < smp_num_cpus; i++)
                    alert_counter[i] = 0;
    }

alert_counter is moved out of nmi_watchdog_tick() along with last_irq_sums (global in AC's patch) so you can modify them outside of nmi_watchdog_tick(). --Matt From owner-lkcd@oss.sgi.com Thu Sep 13 01:42:56 2001 Received: (from majordomo@localhost) by oss.sgi.com (8.11.2/8.11.3) id f8D8guC32453 for lkcd-outgoing; Thu, 13 Sep 2001 01:42:56 -0700 Received: from d12lmsgate-2.de.ibm.com (d12lmsgate-2.de.ibm.com [195.212.91.200]) by oss.sgi.com (8.11.2/8.11.3) with SMTP id f8D8gie32447 for ; Thu, 13 Sep 2001 01:42:44 -0700 Received: from d12relay01.de.ibm.com (d12relay01.de.ibm.com [9.165.215.22]) by d12lmsgate-2.de.ibm.com (1.0.0) with ESMTP id KAA66058; Thu, 13 Sep 2001 10:42:33 +0200 Received: from d12ml004.de.ibm.com (d12ml004_cs0 [9.165.223.50]) by d12relay01.de.ibm.com (8.11.1m3/NCO v4.97.1) with ESMTP id f8D8gWV253476; Thu, 13 Sep 2001 10:42:33 +0200 Importance: Normal Subject: Re: lkcd checkin, s390x support To: "Matt D. Robinson" Cc: "Andreas Herrmann" , lkcd@oss.sgi.com, Luc Chouinard , Tom Morano , yakker@alacritech.com X-Mailer: Lotus Notes Release 5.0.3 March 21, 2000 Message-ID: From: "Michael Holzheu" Date: Thu, 13 Sep 2001 10:40:16 +0200 X-MIMETrack: Serialize by Router on D12ML004/12/M/IBM(Release 5.0.8 |June 18, 2001) at 13/09/2001 10:38:03 MIME-Version: 1.0 Content-type: text/plain; charset=us-ascii Sender: owner-lkcd@oss.sgi.com Precedence: bulk Hi Matt, Don't be afraid! We tested our changes carefully also on i386/Linux. Everything should work fine.
Hopefully even better than before :-)

Regarding the question about an s390/s390x rpm, it would be good
if you could send me the source rpm when you have it ready.
Then I will ensure that it compiles on s390 and s390x.

At the moment we are not allowed to ship binary rpms for
legal reasons.

But IBM has started a program called "Linux Community
Development System" which provides access to a Linux/390 system.

If you are interested in a Linux/390 system, check the following link:

http://www.ibm.com/servers/eserver/zseries/os/linux/lcds

So for now, it would be sufficient to have a source rpm on your
web site which can be compiled on Linux/390(x) without problems.

Best Regards

Michael

------------------------------------------------------------------------
Linux/390 Development
Phone: +49-7031-16-2360, Bld 71032-06-109
Email: holzheu@de.ibm.com

"Matt D. Robinson" @oss.sgi.com on 09/13/2001 03:00:47 AM

Please respond to "Matt D. Robinson"

Sent by: owner-lkcd@oss.sgi.com

To: Andreas Herrmann/Germany/IBM@IBMDE
cc: lkcd@oss.sgi.com, Luc Chouinard , Tom Morano , Michael Holzheu/Germany/IBM@IBMDE, yakker@alacritech.com
Subject: Re: lkcd checkin, s390x support

Andreas Herrmann wrote:
> Hi,
>
> today I have checked in several files.
> Most important is, we added lcrash support for a new platform.
> It is s390x, which is 64 bit.
>
> >From our side it is ok now to create the new lkcd release.

Wow, when all of you make a check-in, you don't mess around. :)
Okay, got all the changes. Are you going to roll an RPM around
the same time we do, or are you going to do your own? I'd need
an S390 to roll the RPM against, which I just don't have hanging
around the house. :)

Thanks for all the bug fixes, additions, and clean-ups along
the way. I'll go through these changes, although it will take
me a while. Needless to say, I won't hold up 4.0 just to do
the review -- it'll still go out Friday.

--Matt

> Besides that we did some further changes and bug fixes.
> Following is an overview of what we have done:
>
> Consolidated header files.
>
>      libklib/include/klib.h
>
>      libklib/include/asm-alpha/kl_error.h (removed)
>      libklib/include/asm-ia64/kl_error.h (removed)
>      libklib/include/asm-i386/kl_error.h (removed)
>      libklib/include/asm-s390/kl_error.h (removed)
>      libklib/include/kl_error.h (added)
>
>      libklib/include/asm-alpha/kl_mem.h
>      libklib/include/asm-ia64/kl_mem.h
>      libklib/include/asm-i386/kl_mem.h
>      libklib/include/asm-s390/kl_mem.h
>      libklib/include/kl_mem.h (added for common stuff)
>
>      libklib/include/asm-alpha/kl_stabs.h (removed)
>      libklib/include/asm-ia64/kl_stabs.h (removed)
>      libklib/include/asm-i386/kl_stabs.h (removed)
>      libklib/include/asm-s390/kl_stabs.h (removed)
>      libklib/include/kl_stabs.h (added)
>
> Included support for type information of so-called register variables
> (N_RSYM in stabs format).
>
>      libklib/kl_stabs.c
>
> Included support for Intel's PSE (Page Size Extension) in memory mapping:
>
>      libklib/arch/alpha/kl_page.c
>      libklib/arch/ia64/kl_page.c
>      libklib/arch/i386/kl_page.c
>      libklib/arch/s390/kl_page.c
>      libklib/kl_mem.c
>      libklib/kl_memory.c
>
> Changed vtop behaviour: switched on mem mapping for all virtual addresses
>
>      lcrash/cmds/cmd_vtop.c
>
> s390 specific changes:
>
>      lcrash/arch/s390/lib/s390-report.c
>      lcrash/arch/s390/lib/s390-util.c
>      lcrash/arch/s390/lib/trace.c
>      lcrash/include/arch-s390/trace.h
>      libklib/arch/s390/kl_s390_util.c
>
> Removed unnecessary include directives to kernel header files. In
> particular, the handling of struct utsname was changed.
>
>      lcrash/arch/s390/cmds/cmd_s390dbf.c
>      lcrash/cmds/cmd_stat.c
>      lcrash/include/lcrash.h
>      lcrash/util.c
>      libklib/include/dump.h
>
> BUGFIXES:
>
> - Inserted setjmp before first call to longjmp.
>
>      lcrash/main.c
>
> - "dis -F -w file", redirection of output did not work correctly.
> > lcrash/arch/alpha/lib/dis.c > lcrash/arch/ia64/lib/dis.c > lcrash/arch/s390/lib/dis.c > > - Setup type info immediately after reading namelist. > > libklib/kl_nmlist.c > > - Avoid printing of control characters. > > libklib/kl_print.c > > - Kernel without module support can be analyzed again. > > lcrash/cmds/cmd_symtab.c > lcrash/cmds/cmd_module.c > libklib/kl_util.c > libklib/klib.c > > - Fixed the computation of memory size in livedump. > > lcrash/vmdump.c > > - s390x port: > > (a whole bunch of files) > > Regards, > > Andreas > > -- > Linux for eServer Development > Tel : +49-7031-16-4640 > Notes mail : Andreas Herrmann/GERMANY/IBM@IBMDE > email : aherrman@de.ibm.com From owner-lkcd@oss.sgi.com Thu Sep 13 01:53:56 2001 Received: (from majordomo@localhost) by oss.sgi.com (8.11.2/8.11.3) id f8D8ruo00456 for lkcd-outgoing; Thu, 13 Sep 2001 01:53:56 -0700 Received: from nakedeye.aparity.com (w032.z064001165.sjc-ca.dsl.cnc.net [64.1.165.32]) by oss.sgi.com (8.11.2/8.11.3) with SMTP id f8D8rhe00446 for ; Thu, 13 Sep 2001 01:53:43 -0700 Received: from alacritech.com (w032.z064001165.sjc-ca.dsl.cnc.net [64.1.165.32]) by nakedeye.aparity.com (8.11.2/8.11.2) with ESMTP id f8D8wGJ08727; Thu, 13 Sep 2001 01:58:16 -0700 Message-ID: <3BA0736D.CC5AD230@alacritech.com> Date: Thu, 13 Sep 2001 01:50:53 -0700 From: "Matt D. Robinson" X-Mailer: Mozilla 4.75 [en] (Win98; U) X-Accept-Language: en MIME-Version: 1.0 To: Michael Holzheu CC: Andreas Herrmann , lkcd@oss.sgi.com, Luc Chouinard , Tom Morano , yakker@alacritech.com Subject: Re: lkcd checkin, s390x support References: Content-Type: text/plain; charset=us-ascii Content-Transfer-Encoding: 7bit Sender: owner-lkcd@oss.sgi.com Precedence: bulk Michael Holzheu wrote: > > Hi Matt, > > Don't be afraid! We tested our changes carefully also on i386/Linux. > Everything should work fine. Hopefully even better than before :-) Great. 
My only issue was how it related to the dump.h changes over from vmdump.h, and whether you had any soak time with that. But since your method invokes off your own crash dump, it shouldn't be a big deal. > Regarding the question about an s390/s390x rpm, it would be good, > if you could send me the source rpm, when you have it ready. > Then I will ensure that it compiles on s390 and s390x. Got it. I'll send you an E-mail and a download location when it is ready. > At the moment we are not allowed to ship binary rpms due to > legal reasons. > > But IBM has a started a program called "Linux Community > Development System" which provides access to a Linux/390. > > If you are interested in a Linxu/390 System, check the following link: > > http://www.ibm.com/servers/eserver/zseries/os/linux/lcds > > So for now, it would be sufficient to have a source rpm on your > web site which can be compiled on Linux/390(x) without problems. I'll read through the S/390 stuff in the morning ... :) Thanks, Michael! --Matt > Best Regards > > Michael > > ------------------------------------------------------------------------ > Linux/390 Development > Phone: +49-7031-16-2360, Bld 71032-06-109 > Email: holzheu@de.ibm.com > > "Matt D. Robinson" @oss.sgi.com on 09/13/2001 > 03:00:47 AM > > Please respond to "Matt D. Robinson" > > Sent by: owner-lkcd@oss.sgi.com > > To: Andreas Herrmann/Germany/IBM@IBMDE > cc: lkcd@oss.sgi.com, Luc Chouinard , Tom Morano > , Michael Holzheu/Germany/IBM@IBMDE, > yakker@alacritech.com > Subject: Re: lkcd checkin, s390x support > > Andreas Herrmann wrote: > > Hi, > > > > today I have checked in several files. > > Most important is, we added lcrash support for a new platform. > > It is s390x, which is 64 bit. > > > > >From our side it is ok now to create the new lkcd release. > > Wow, when all of you make a check-in, you don't mess around. :) > Okay, got all the changes. Are you going to roll an RPM around > the same time we do, or are you going to do your own? 
I'd need > an S390 to roll the RPM against, which I just don't have hanging > around the house. :) > > Thanks for all the bug fixes, additions, and clean-ups along > the way. I'll go through these changes, although it will take > me a while. Needless to say, I won't hold up 4.0 just to do > the review -- it'll still go out Friday. > > --Matt > > > Besides that we did some further changes and bug fixes. > > Following an overview about what we have done: > > > > Consolidated header files. > > > > libklib/include/klib.h > > > > libklib/include/asm-alpha/kl_error.h (removed) > > libklib/include/asm-ia64/kl_error.h (removed) > > libklib/include/asm-i386/kl_error.h (removed) > > libklib/include/asm-s390/kl_error.h (removed) > > libklib/include/kl_error.h (added) > > > > libklib/include/asm-alpha/kl_mem.h > > libklib/include/asm-ia64/kl_mem.h > > libklib/include/asm-i386/kl_mem.h > > libklib/include/asm-s390/kl_mem.h > > libklib/include/kl_mem.h (added for common stuff) > > > > libklib/include/asm-alpha/kl_stabs.h (removed) > > libklib/include/asm-ia64/kl_stabs.h (removed) > > libklib/include/asm-i386/kl_stabs.h (removed) > > libklib/include/asm-s390/kl_stabs.h (removed) > > libklib/include/kl_stabs.h (added) > > > > Included support for type information of so called register variables > > (N_RSYM in stabs format). 
> > > > libklib/kl_stabs.c > > > > Included support for intels PSE (Page Size Extension) in memory mapping: > > > > libklib/arch/alpha/kl_page.c > > libklib/arch/ia64/kl_page.c > > libklib/arch/i386/kl_page.c > > libklib/arch/s390/kl_page.c > > libklib/kl_mem.c > > libklib/kl_memory.c > > > > Changed vtop behaviour: switched on mem mapping for all virtual addresses > > > > lcrash/cmds/cmd_vtop.c > > > > s390 specific changes: > > > > lcrash/arch/s390/lib/s390-report.c > > lcrash/arch/s390/lib/s390-util.c > > lcrash/arch/s390/lib/trace.c > > lcrash/include/arch-s390/trace.h > > libklib/arch/s390/kl_s390_util.c > > > > Removed unnecessary include directives to kernel header files. Especially > > the > > handling of struct utsname was changed. > > > > lcrash/arch/s390/cmds/cmd_s390dbf.c > > lcrash/cmds/cmd_stat.c > > lcrash/include/lcrash.h > > lcrash/util.c > > libklib/include/dump.h > > > > BUGFIXES: > > > > - Inserted setjmp before first call to longjump. > > > > lcrash/main.c > > > > - "dis -F -w file", rediretion of output did not work > correctly. > > > > lcrash/arch/alpha/lib/dis.c > > lcrash/arch/ia64/lib/dis.c > > lcrash/arch/s390/lib/dis.c > > > > - Setup type info immediately after reading namelist. > > > > libklib/kl_nmlist.c > > > > - Avoid printing of control characters. > > > > libklib/kl_print.c > > > > - Kernel without module support can be analyzed again. > > > > lcrash/cmds/cmd_symtab.c > > lcrash/cmds/cmd_module.c > > libklib/kl_util.c > > libklib/klib.c > > > > - Fixed the computation of memory size in livedump. 
> > > > lcrash/vmdump.c > > > > - s390x port: > > > > (a whole bunch of files) > > > > Regards, > > > > Andreas > > > > -- > > Linux for eServer Development > > Tel : +49-7031-16-4640 > > Notes mail : Andreas Herrmann/GERMANY/IBM@IBMDE > > email : aherrman@de.ibm.com From owner-lkcd@oss.sgi.com Fri Sep 14 05:13:57 2001 Received: (from majordomo@localhost) by oss.sgi.com (8.11.2/8.11.3) id f8ECDvc31086 for lkcd-outgoing; Fri, 14 Sep 2001 05:13:57 -0700 Received: from e1.ny.us.ibm.com (e1.ny.us.ibm.com [32.97.182.101]) by oss.sgi.com (8.11.2/8.11.3) with SMTP id f8ECDse31083 for ; Fri, 14 Sep 2001 05:13:54 -0700 Received: from northrelay03.pok.ibm.com (northrelay03.pok.ibm.com [9.117.200.23]) by e1.ny.us.ibm.com (8.9.3/8.9.3) with ESMTP id IAA485026 for ; Fri, 14 Sep 2001 08:11:30 -0400 Received: from bharata.in.ibm.com ([9.186.133.24]) by northrelay03.pok.ibm.com (8.11.1m3/NCO v4.97.1) with ESMTP id f8EC7kG23562 for ; Fri, 14 Sep 2001 08:07:47 -0400 Received: (from bharata@localhost) by bharata.in.ibm.com (8.11.2/8.11.2) id f8ECE3C01616 for lkcd@oss.sgi.com; Fri, 14 Sep 2001 17:44:03 +0530 Date: Fri, 14 Sep 2001 17:44:03 +0530 From: Bharata B Rao To: lkcd@oss.sgi.com Subject: Non disruptive dumps -- current work. Message-ID: <20010914174403.A1601@in.ibm.com> Reply-To: bharata@in.ibm.com Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline User-Agent: Mutt/1.2.5i Sender: owner-lkcd@oss.sgi.com Precedence: bulk Hi Matt, We have integrated our changes for non-disruptive dumps into the dump_silence/dump_resume framework. We have a kind of working version, but we still need to sort out some issues. In our current approach, the dumping cpu sends call function vector ipi to other cpus to put them to spin. When the dump is complete, other cpus are released from spin and made to continue. We saw that spinning with interrupts disabled will not work always as sometimes the disk interrupts go to the spinning cpus and get lost. 
This causes the dumping process to hang. As a workaround, we now enable
interrupts and set local_irq_count to zero (to make sure that disk
interrupts are not missed and softirqs are not prevented from running),
then restore local_irq_count at the end of the spin. This approach works
in most cases, but the system state can drift during the dump to the
extent that other cpus can handle interrupts and softirqs.

We are not sure if this is the right approach. As an alternative,
we are also thinking of changing the irq affinity of disk interrupts
(or all interrupts or some interrupts) to the dumping cpu. Currently we are
trying this approach.

Comments?

Regards,
Bharata.

From owner-lkcd@oss.sgi.com Fri Sep 14 18:38:34 2001
Received: (from majordomo@localhost) by oss.sgi.com (8.11.2/8.11.3) id f8F1cY809116 for lkcd-outgoing; Fri, 14 Sep 2001 18:38:34 -0700
Received: from calliope1.fm.intel.com (fmfdns01.fm.intel.com [132.233.247.10]) by oss.sgi.com (8.11.2/8.11.3) with SMTP id f8F1cQe09098 for ; Fri, 14 Sep 2001 18:38:31 -0700
Received: from fmsmsxvs042.fm.intel.com (fmsmsxv042-1.fm.intel.com [132.233.48.110]) by calliope1.fm.intel.com (8.9.1a+p1/8.9.1/d: relay.m4,v 1.42 2001/09/04 16:24:19 root Exp $) with SMTP id BAA06860 for ; Sat, 15 Sep 2001 01:38:25 GMT
Received: from fmsmsx27.FM.INTEL.COM ([132.233.42.27]) by fmsmsxvs042.fm.intel.com (NAVGW 2.5.1.6) with SMTP id M2001091418374606541 ; Fri, 14 Sep 2001 18:37:46 -0700
Received: by fmsmsx27.fm.intel.com with Internet Mail Service (5.5.2653.19) id ; Fri, 14 Sep 2001 18:38:40 -0700
Message-ID: <68843F808BE5D311AC6100A0C9C5786648488C@fmsmsx50.fm.intel.com>
From: "Schaal, Richard"
To: "'Matt D.
Robinson'" Cc: "'lkcd@oss.sgi.com'" Subject: RE: Current snapshot Date: Fri, 14 Sep 2001 18:38:21 -0700 MIME-Version: 1.0 X-Mailer: Internet Mail Service (5.5.2653.19) Content-Type: text/plain; charset="iso-8859-1" Sender: owner-lkcd@oss.sgi.com Precedence: bulk Hi Matt, I have retrieved the current CVS image and as you had previously suggested, applied the files in the 2.4 tree to a linux-2.4.8 kernel. I hate to be a whiner, but it didn't build... mostly in the dump_gzip.c file - a few include files weren't as well as a missing "=" sign in the initialization of the dump_compress_t structure. I did get the code to compile eventually, but am now wondering whether you can make my life easier by obsoleting some files. 2.4/driver/block/vmdump.c appears to be obsolete. lkcdutils/scripts/sbin.vmdump ditto lkcdutils/scripts/sysconfig.vmdump ditto There seems to be some sort of disconnect in the new startup scripts as well. In the previous version, the dump parameters were squirted into the dump driver through the /proc interface. Now, lkcd_config is failing an IOCTL on the dump device. Please set me on the true path to enlightenment, happiness and dumping. Many thanks, Richard -- Richard.Schaal@intel.com Intel Corporation Ph: (408)765-1579 Richard Schaal Mail Stop SC12-308 3600 Juliette Lane "I can type faster than I think!" Santa Clara, CA 95052 From owner-lkcd@oss.sgi.com Fri Sep 14 18:58:59 2001 Received: (from majordomo@localhost) by oss.sgi.com (8.11.2/8.11.3) id f8F1wxo09494 for lkcd-outgoing; Fri, 14 Sep 2001 18:58:59 -0700 Received: from smtp.alacritech.com (smtp.alacritech.com [209.10.208.82]) by oss.sgi.com (8.11.2/8.11.3) with SMTP id f8F1wte09491 for ; Fri, 14 Sep 2001 18:58:55 -0700 Received: from alacritech.com (lambda.alacritech.com [10.1.1.32]) by smtp.alacritech.com (8.11.0/8.11.0) with ESMTP id f8F1sGO20377; Fri, 14 Sep 2001 18:54:16 -0700 Message-ID: <3BA2B6A2.D30B5F1D@alacritech.com> Date: Fri, 14 Sep 2001 19:02:10 -0700 From: "Matt D. 
Robinson" Organization: Alacritech, Inc. X-Mailer: Mozilla 4.76 [en] (X11; U; Linux 2.4.2-2smp i686) X-Accept-Language: en MIME-Version: 1.0 To: "Schaal, Richard" CC: "'lkcd@oss.sgi.com'" Subject: Re: Current snapshot References: <68843F808BE5D311AC6100A0C9C5786648488C@fmsmsx50.fm.intel.com> Content-Type: text/plain; charset=us-ascii Content-Transfer-Encoding: 7bit Sender: owner-lkcd@oss.sgi.com Precedence: bulk "Schaal, Richard" wrote: > > Hi Matt, > I have retrieved the current CVS image and as you had previously suggested, > applied the files in the > 2.4 tree to a linux-2.4.8 kernel. I hate to be a whiner, but it didn't > build... mostly in the dump_gzip.c > file - a few include files weren't as well as a missing "=" sign in the > initialization of the dump_compress_t structure. I know that dump_gzip.c doesn't work yet, but it wasn't intended to be part of the main patch as of yet. I have to convert over a new zlib.h, which isn't done yet. All the rest of the files work, right? I know that you don't have to turn on DUMP_COMPRESS_GZIP for now. > I did get the code to compile eventually, but am now wondering whether you > can make my life easier by obsoleting some files. > > 2.4/driver/block/vmdump.c appears to be obsolete. > lkcdutils/scripts/sbin.vmdump ditto > lkcdutils/scripts/sysconfig.vmdump ditto Sure, I can do that this evening. Should be easy enough. It's about time that all the vmdump.c files get moved. > There seems to be some sort of disconnect in the new startup scripts as > well. In the previous version, the > dump parameters were squirted into the dump driver through the /proc > interface. Now, lkcd_config is failing > an IOCTL on the dump device. What's the failure? The new /sbin/lkcd (no longer /sbin/vmdump) should use /dev/dump (which /sbin/lkcd creates) to run the ioctl. Again, the modules have to be installed, etc. Do you have dump built directly into the kernel, or is it a module? 
> Please set me on the true path to enlightenment, happiness and dumping. Let me know if this works for you. I'll be around all night. I'm working on the RPM build for 'lkcdutils'. --Matt > Many thanks, > > Richard > > -- > Richard.Schaal@intel.com Intel Corporation > Ph: (408)765-1579 Richard Schaal > Mail Stop SC12-308 > 3600 Juliette Lane > "I can type faster than I think!" Santa Clara, CA 95052 From owner-lkcd@oss.sgi.com Sat Sep 15 01:46:39 2001 Received: (from majordomo@localhost) by oss.sgi.com (8.11.2/8.11.3) id f8F8kdw14686 for lkcd-outgoing; Sat, 15 Sep 2001 01:46:39 -0700 Received: from nakedeye.aparity.com (w032.z064001165.sjc-ca.dsl.cnc.net [64.1.165.32]) by oss.sgi.com (8.11.2/8.11.3) with SMTP id f8F8kYe14682 for ; Sat, 15 Sep 2001 01:46:34 -0700 Received: from alacritech.com (w032.z064001165.sjc-ca.dsl.cnc.net [64.1.165.32]) by nakedeye.aparity.com (8.11.2/8.11.2) with ESMTP id f8F8p5J12242; Sat, 15 Sep 2001 01:51:05 -0700 Message-ID: <3BA314C1.ADCEFAED@alacritech.com> Date: Sat, 15 Sep 2001 01:43:45 -0700 From: "Matt D. Robinson" X-Mailer: Mozilla 4.75 [en] (Win98; U) X-Accept-Language: en MIME-Version: 1.0 To: bharata@in.ibm.com CC: lkcd@oss.sgi.com Subject: Re: Non disruptive dumps -- current work. References: <20010914174403.A1601@in.ibm.com> Content-Type: text/plain; charset=us-ascii Content-Transfer-Encoding: 7bit Sender: owner-lkcd@oss.sgi.com Precedence: bulk Bharata B Rao wrote: > > Hi Matt, > > We have integrated our changes for non-disruptive dumps into the > dump_silence/dump_resume framework. We have a kind of working version, > but we still need to sort out some issues. > > In our current approach, the dumping cpu sends call function vector ipi > to other cpus to put them to spin. When the dump is complete, other cpus > are released from spin and made to continue. We saw that spinning > with interrupts disabled will not work always as sometimes the disk interrupts > go to the spinning cpus and get lost. 
> This causes the dumping process to hang. As a workaround, we now enable
> interrupts and set local_irq_count to zero (to make sure that disk
> interrupts are not missed and softirqs are not prevented from running),
> then restore local_irq_count at the end of the spin. This approach works
> in most cases, but the system state can drift during the dump to the
> extent that other cpus can handle interrupts and softirqs.
>
> We are not sure if this is the right approach. As an alternative,
> we are also thinking of changing the irq affinity of disk interrupts
> (or all interrupts or some interrupts) to the dumping cpu. Currently we are
> trying this approach.
>
> Comments?

Trying to separate disk interrupts to the dumping CPU might be a
pain. Here's a thought, I haven't tried it yet, and I'm not going
to get to it tonight, so it's a weekend project now:

If we can temporarily set the cpu_online_map to smp_processor_id()
which sets only the dumping CPU, and then call setup_IO_APIC_irqs(),
this may do the "right thing" in terms of redirecting interrupts to
only our CPU. In theory, this sets up all IRQs to point to the
dumping CPU based on cpu_online_map.

In arch/i386/kernel/io_apic.c, TARGET_CPUS gets defined as
cpu_online_map. So ... again, I haven't tried this yet, and won't
until tomorrow at the earliest, but:

	cli();
	saved_cpu_online_map = cpu_online_map;
	cpu_online_map = smp_processor_id();
	setup_IO_APIC_irqs();
	cpu_online_map = saved_cpu_online_map;
	sti();

... just re-call setup_IO_APIC_irqs() to put things back after
dumping.

If someone's got a hankering to try it tonight, go for it. Or,
of course, if I'm off my rocker here, feel free to mention that
as well. :)

Thanks, Bharata.

--Matt

> Regards,
> Bharata.
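
The save/narrow/restore bookkeeping in that proposal can be tried out in
user space. What follows is a minimal sketch under stated assumptions, not
kernel code: cpu_online_map here is an ordinary variable assumed to hold one
bit per CPU, and sim_reroute_irqs(), irq_target_mask and
route_irqs_to_dumping_cpu() are hypothetical names invented for the
illustration. Under the bitmask assumption, narrowing the map to a single
CPU means storing 1UL << cpu rather than the bare CPU number. Whether the
real IO-APIC setup routine can be re-run after boot is a separate question
that this sketch does not address.

```c
#include <assert.h>

#define NR_CPUS 4	/* hypothetical: small enough to read the masks */

/* Hypothetical stand-in for the kernel variable: one bit per online CPU. */
unsigned long cpu_online_map = 0xFUL;	/* CPUs 0-3 online */

/* Records which CPUs the last (simulated) reroute was allowed to target. */
unsigned long irq_target_mask;

/* Hypothetical stand-in for re-programming interrupt routing from the map. */
void sim_reroute_irqs(void)
{
	irq_target_mask = cpu_online_map;
}

/*
 * Narrow the map to the dumping CPU, reroute, then restore the map so
 * the rest of the system still sees every CPU as online.
 */
void route_irqs_to_dumping_cpu(unsigned int dumping_cpu)
{
	unsigned long saved = cpu_online_map;

	assert(dumping_cpu < NR_CPUS);
	cpu_online_map = 1UL << dumping_cpu;	/* a mask, not the CPU number */
	sim_reroute_irqs();
	cpu_online_map = saved;
}
```

Calling route_irqs_to_dumping_cpu(2) with all four CPUs online leaves
irq_target_mask at 0x4 and cpu_online_map back at 0xF.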
From owner-lkcd@oss.sgi.com Sat Sep 15 02:02:33 2001
Received: (from majordomo@localhost) by oss.sgi.com (8.11.2/8.11.3) id f8F92XZ14979 for lkcd-outgoing; Sat, 15 Sep 2001 02:02:33 -0700
Received: from mail.ocs.com.au (ppp0.ocs.com.au [203.34.97.3]) by oss.sgi.com (8.11.2/8.11.3) with SMTP id f8F92Ue14976 for ; Sat, 15 Sep 2001 02:02:30 -0700
Received: (qmail 16504 invoked from network); 15 Sep 2001 09:02:27 -0000
Received: from ocs3.intra.ocs.com.au (192.168.255.3) by mail.ocs.com.au with SMTP; 15 Sep 2001 09:02:27 -0000
Received: by ocs3.intra.ocs.com.au (Postfix, from userid 16331) id 82AB3300095; Sat, 15 Sep 2001 19:01:34 +1000 (EST)
Received: from ocs3.intra.ocs.com.au (localhost [127.0.0.1]) by ocs3.intra.ocs.com.au (Postfix) with ESMTP id 170FA94 for ; Sat, 15 Sep 2001 19:01:33 +1000 (EST)
X-Mailer: exmh version 2.2 06/23/2000 with nmh-1.0.4
From: Keith Owens
To: lkcd@oss.sgi.com
Subject: Re: Non disruptive dumps -- current work.
In-reply-to: Your message of "Sat, 15 Sep 2001 01:43:45 MST." <3BA314C1.ADCEFAED@alacritech.com>
Mime-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Date: Sat, 15 Sep 2001 19:01:28 +1000
Message-ID: <20778.1000544488@ocs3.intra.ocs.com.au>
Sender: owner-lkcd@oss.sgi.com
Precedence: bulk

On Sat, 15 Sep 2001 01:43:45 -0700,
"Matt D. Robinson" wrote:
>	cli();
>	saved_cpu_online_map = cpu_online_map;
>	cpu_online_map = smp_processor_id();
>	setup_IO_APIC_irqs();
>	cpu_online_map = saved_cpu_online_map;
>	sti();

setup_IO_APIC_irqs() is defined __init, the code does not exist after
boot. Sounds like you need Rusty Russell's hot swap cpu code,
http://sourceforge.net/projects/lhcs/.

From owner-lkcd@oss.sgi.com Sat Sep 15 15:55:18 2001
Received: (from majordomo@localhost) by oss.sgi.com (8.11.2/8.11.3) id f8FMtIG26771 for lkcd-outgoing; Sat, 15 Sep 2001 15:55:18 -0700
Received: from nakedeye.aparity.com (w032.z064001165.sjc-ca.dsl.cnc.net [64.1.165.32]) by oss.sgi.com (8.11.2/8.11.3) with SMTP id f8FMtEe26767 for ; Sat, 15 Sep 2001 15:55:15 -0700
Received: from alacritech.com (w032.z064001165.sjc-ca.dsl.cnc.net [64.1.165.32]) by nakedeye.aparity.com (8.11.2/8.11.2) with ESMTP id f8FMwuJ13192; Sat, 15 Sep 2001 15:58:56 -0700
Message-ID: <3BA3DB73.F731D08F@alacritech.com>
Date: Sat, 15 Sep 2001 15:51:31 -0700
From: "Matt D. Robinson"
X-Mailer: Mozilla 4.75 [en] (Win98; U)
X-Accept-Language: en
MIME-Version: 1.0
To: Keith Owens
CC: lkcd@oss.sgi.com
Subject: Re: Non disruptive dumps -- current work.
References: <20778.1000544488@ocs3.intra.ocs.com.au>
Content-Type: text/plain; charset=us-ascii
Content-Transfer-Encoding: 7bit
Sender: owner-lkcd@oss.sgi.com
Precedence: bulk

Keith Owens wrote:
>
> On Sat, 15 Sep 2001 01:43:45 -0700,
> "Matt D. Robinson" wrote:
> >	cli();
> >	saved_cpu_online_map = cpu_online_map;
> >	cpu_online_map = smp_processor_id();
> >	setup_IO_APIC_irqs();
> >	cpu_online_map = saved_cpu_online_map;
> >	sti();
>
> setup_IO_APIC_irqs() is defined __init, the code does not exist after
> boot. Sounds like you need Rusty Russell's hot swap cpu code,
> http://sourceforge.net/projects/lhcs/.

Easy enough to remove the __init ... :)

Rusty's code is close to what we want, but doesn't do enough with the
IO-APIC (directly). It's a combination of both, I think.

Time to read more of Rusty's code. It looks like most of the
__cpu_disable() function can be used, but changed to only go to
the dumping CPU (not any active CPU).
--Matt From owner-lkcd@oss.sgi.com Mon Sep 17 06:53:49 2001 Received: (from majordomo@localhost) by oss.sgi.com (8.11.2/8.11.3) id f8HDrng04819 for lkcd-outgoing; Mon, 17 Sep 2001 06:53:49 -0700 Received: from eamail1-out.unisys.com (eamail1-out.unisys.com [192.61.61.99]) by oss.sgi.com (8.11.2/8.11.3) with SMTP id f8HDrle04816 for ; Mon, 17 Sep 2001 06:53:47 -0700 Received: from us-ea-gtwy-6.ea.unisys.com (us-ea-gtwy-6.ea.unisys.com [192.61.146.102]) by eamail1-out.unisys.com (8.9.3/8.9.3) with ESMTP id NAA28660 for ; Mon, 17 Sep 2001 13:52:17 GMT Received: by us-ea-gtwy-6.ea.unisys.com with Internet Mail Service (5.5.2653.19) id ; Mon, 17 Sep 2001 08:53:45 -0500 Message-ID: <245F259ABD41D511A07000D0B71C4CBA1E1287@us-slc-exch-3.slc.unisys.com> From: "Carr, Valerie" To: "'lkcd@oss.sgi.com'" Subject: lkcd kernel_magic error Date: Mon, 17 Sep 2001 08:53:40 -0500 MIME-Version: 1.0 X-Mailer: Internet Mail Service (5.5.2653.19) Content-Type: text/plain; charset="iso-8859-1" Sender: owner-lkcd@oss.sgi.com Precedence: bulk I am new at Linux and I'm trying to install the lkcd patch to be able to save off dumps and look at them with lcrash. However, after applying the patch to the kernel and saving off the dump I get the error "kernel_magic mismatch of map and memory image". I'm sure it is something I'm doing, but I do not know what. I also tried lcrash on the running kernel and I still get the same error. I have looked through the mailing list a bit and I cannot find the same problem. I did look at the code and I see where it compares the kernel_magic to _end, and prints out this error when these are not the same. How can I make the two the same? - Do I need to go in and manually modify the numbers? I am applying it to a Caldera 2.4.8 kernel using arch ia64. Any help would be appreciated. 
Valerie Carr

From owner-lkcd@oss.sgi.com Mon Sep 17 08:46:41 2001
Received: (from majordomo@localhost) by oss.sgi.com (8.11.2/8.11.3) id f8HFkfY07024 for lkcd-outgoing; Mon, 17 Sep 2001 08:46:41 -0700
Received: from calliope1.fm.intel.com (fmfdns01.fm.intel.com [132.233.247.10]) by oss.sgi.com (8.11.2/8.11.3) with SMTP id f8HFkUe07021 for ; Mon, 17 Sep 2001 08:46:35 -0700
Received: from fmsmsxvs040.fm.intel.com (fmsmsxv040-1.fm.intel.com [132.233.48.108]) by calliope1.fm.intel.com (8.9.1a+p1/8.9.1/d: relay.m4,v 1.42 2001/09/04 16:24:19 root Exp $) with SMTP id PAA12351 for ; Mon, 17 Sep 2001 15:46:30 GMT
Received: from fmsmsx28.fm.intel.com ([132.233.42.28]) by fmsmsxvs040.fm.intel.com (NAVGW 2.5.1.6) with SMTP id M2001091708460810320 ; Mon, 17 Sep 2001 08:46:08 -0700
Received: by fmsmsx28.fm.intel.com with Internet Mail Service (5.5.2653.19) id ; Mon, 17 Sep 2001 08:46:35 -0700
Message-ID: <68843F808BE5D311AC6100A0C9C5786648488E@fmsmsx50.fm.intel.com>
From: "Schaal, Richard"
To: "'Matt D. Robinson'" , "Schaal, Richard"
Cc: "'lkcd@oss.sgi.com'"
Subject: RE: Current snapshot
Date: Mon, 17 Sep 2001 08:46:24 -0700
MIME-Version: 1.0
X-Mailer: Internet Mail Service (5.5.2653.19)
Content-Type: text/plain; charset="iso-8859-1"
Sender: owner-lkcd@oss.sgi.com
Precedence: bulk

My mistake. I had trouble realizing that the configuration is now split
between the /sbin/lkcd script and the /etc/sysconfig/dump script, and
having the script refer to /dev/vmdump confused me further.

It appears to be working ok now that I have the configuration corrected.
I think my biggest mistake was pointing /dev/dump at the target drive for
the dump rather than treating it as the access method to the dump driver.

Thanks!

Richard

-----Original Message-----
From: Matt D.
Robinson [mailto:yakker@alacritech.com] Sent: Friday, September 14, 2001 7:02 PM To: Schaal, Richard Cc: 'lkcd@oss.sgi.com' Subject: Re: Current snapshot "Schaal, Richard" wrote: > > Hi Matt, > I have retrieved the current CVS image and as you had previously suggested, > applied the files in the > 2.4 tree to a linux-2.4.8 kernel. I hate to be a whiner, but it didn't > build... mostly in the dump_gzip.c > file - a few include files weren't as well as a missing "=" sign in the > initialization of the dump_compress_t structure. I know that dump_gzip.c doesn't work yet, but it wasn't intended to be part of the main patch as of yet. I have to convert over a new zlib.h, which isn't done yet. All the rest of the files work, right? I know that you don't have to turn on DUMP_COMPRESS_GZIP for now. > I did get the code to compile eventually, but am now wondering whether you > can make my life easier by obsoleting some files. > > 2.4/driver/block/vmdump.c appears to be obsolete. > lkcdutils/scripts/sbin.vmdump ditto > lkcdutils/scripts/sysconfig.vmdump ditto Sure, I can do that this evening. Should be easy enough. It's about time that all the vmdump.c files get moved. > There seems to be some sort of disconnect in the new startup scripts as > well. In the previous version, the > dump parameters were squirted into the dump driver through the /proc > interface. Now, lkcd_config is failing > an IOCTL on the dump device. What's the failure? The new /sbin/lkcd (no longer /sbin/vmdump) should use /dev/dump (which /sbin/lkcd creates) to run the ioctl. Again, the modules have to be installed, etc. Do you have dump built directly into the kernel, or is it a module? > Please set me on the true path to enlightenment, happiness and dumping. Let me know if this works for you. I'll be around all night. I'm working on the RPM build for 'lkcdutils'. 
--Matt > Many thanks, > > Richard > > -- > Richard.Schaal@intel.com Intel Corporation > Ph: (408)765-1579 Richard Schaal > Mail Stop SC12-308 > 3600 Juliette Lane > "I can type faster than I think!" Santa Clara, CA 95052 From owner-lkcd@oss.sgi.com Mon Sep 17 14:37:29 2001 Received: (from majordomo@localhost) by oss.sgi.com (8.11.2/8.11.3) id f8HLbTk13766 for lkcd-outgoing; Mon, 17 Sep 2001 14:37:29 -0700 Received: from smtp.alacritech.com (smtp.alacritech.com [209.10.208.82]) by oss.sgi.com (8.11.2/8.11.3) with SMTP id f8HLbQe13763 for ; Mon, 17 Sep 2001 14:37:26 -0700 Received: from alacritech.com (lambda.alacritech.com [10.1.1.32]) by smtp.alacritech.com (8.11.0/8.11.0) with ESMTP id f8HLWnO08978; Mon, 17 Sep 2001 14:32:49 -0700 Message-ID: <3BA66DE0.E14ADCCA@alacritech.com> Date: Mon, 17 Sep 2001 14:40:48 -0700 From: "Matt D. Robinson" Organization: Alacritech, Inc. X-Mailer: Mozilla 4.76 [en] (X11; U; Linux 2.4.2-2smp i686) X-Accept-Language: en MIME-Version: 1.0 To: "Carr, Valerie" CC: "'lkcd@oss.sgi.com'" Subject: Re: lkcd kernel_magic error References: <245F259ABD41D511A07000D0B71C4CBA1E1287@us-slc-exch-3.slc.unisys.com> Content-Type: text/plain; charset=us-ascii Content-Transfer-Encoding: 7bit Sender: owner-lkcd@oss.sgi.com Precedence: bulk "Carr, Valerie" wrote: > > I am new at Linux and I'm trying to install the lkcd patch to be able to > save off dumps and look at them with lcrash. However, after applying the > patch to the kernel and saving off the dump I get the error "kernel_magic > mismatch of map and memory image". I'm sure it is something I'm doing, but > I do not know what. I also tried lcrash on the running kernel and I still > get the same error. I have looked through the mailing list a bit and I > cannot find the same problem. This is almost always a mismatch between your System.map file and the vmdump image you are using. If they don't match, this error occurs. 
The kernel_magic and _end are supposed to match, because the value is saved in the memory image to match what's in the System.map. That way there's a direct correlation between the two files. > I did look at the code and I see where it compares the kernel_magic to _end, > and prints out this error when these are not the same. How can I make the > two the same? - Do I need to go in and manually modify the numbers? Nope. If they don't match, the map file wasn't generated against that crash dump. You need to make sure both match -- copy the System.map over by hand if you need to. If you're building your own kernel, make sure both get copied to your boot directory, and that the name of the map file is System.map. > I am applying it to a Caldera 2.4.8 kernel using arch ia64. > > Any help would be appreciated. > > Valerie Carr Hmmm, ia64 ... that should still work. If it doesn't, let me know. When I was at SGI a few years ago, I put in that code and tried it out, and it worked fine. Nothing _should_ have changed, but then again, something may have messed up along the way. Anyway, verify that and let me know. --Matt From owner-lkcd@oss.sgi.com Mon Sep 17 16:41:52 2001 Received: (from majordomo@localhost) by oss.sgi.com (8.11.2/8.11.3) id f8HNfqT16737 for lkcd-outgoing; Mon, 17 Sep 2001 16:41:52 -0700 Received: from smtp.alacritech.com (smtp.alacritech.com [209.10.208.82]) by oss.sgi.com (8.11.2/8.11.3) with SMTP id f8HNfke16734 for ; Mon, 17 Sep 2001 16:41:46 -0700 Received: from alacritech.com (lambda.alacritech.com [10.1.1.32]) by smtp.alacritech.com (8.11.0/8.11.0) with ESMTP id f8HNavO13118; Mon, 17 Sep 2001 16:36:58 -0700 Message-ID: <3BA68AF8.1CC7259A@alacritech.com> Date: Mon, 17 Sep 2001 16:44:56 -0700 From: "Matt D. Robinson" Organization: Alacritech, Inc. X-Mailer: Mozilla 4.76 [en] (X11; U; Linux 2.4.2-2smp i686) X-Accept-Language: en MIME-Version: 1.0 To: Keith Owens , lkcd@oss.sgi.com Subject: Re: Non disruptive dumps -- current work. 
References: <20778.1000544488@ocs3.intra.ocs.com.au> <3BA3DB73.F731D08F@alacritech.com> Content-Type: text/plain; charset=us-ascii Content-Transfer-Encoding: 7bit Sender: owner-lkcd@oss.sgi.com Precedence: bulk

"Matt D. Robinson" wrote:
>
> Keith Owens wrote:
> >
> > On Sat, 15 Sep 2001 01:43:45 -0700,
> > "Matt D. Robinson" wrote:
> > >         cli();
> > >         saved_cpu_online_map = cpu_online_map;
> > >         cpu_online_map = smp_processor_id();
> > >         setup_IO_APIC_irqs();
> > >         cpu_online_map = saved_cpu_online_map;
> > >         sti();
> >
> > setup_IO_APIC_irqs() is defined __init, the code does not exist after
> > boot. Sounds like you need Rusty Russell's hot swap cpu code,
> > http://sourceforge.net/projects/lhcs/.
>
> Easy enough to remove the __init ... :)
>
> Rusty's code is close to what I want, but doesn't do enough with the
> IO-APIC (directly). It's a combination of both, I think.
>
> Time to read more of Rusty's code. It looks like most of the
> __cpu_disable() function can be used, but changed to only go to
> the dumping CPU (not any active CPU).
>
> --Matt

Just an FYI for folks following this, the setup_IO_APIC_irqs() mechanism doesn't work (it fails miserably, actually). I'm going to try something similar to the following:

int __dump_cpu_disable(unsigned int dumping_cpu)
{
        int i;
        unsigned long val, cpu = smp_processor_id();

        if ((cpu == 0) || (cpu == dumping_cpu))
                return -EINVAL;

        clear_bit(cpu, &cpu_online_map);
        mb();

        /* first, move everyone to the dumping CPU */
        for (i = 0; i < NR_IRQS; i++) {
                if (irq_desc[i].handler == NULL)
                        continue;

                val = irq_affinity[i];
                if (val & (1 << cpu)) {
                        if (!(val & cpu_online_map))
                                val = (1 << dumping_cpu);
                        else
                                val = val & ~(1 << cpu);
                        irq_affinity[i] = val;
                        if (irq_desc[i].handler->set_affinity != NULL)
                                irq_desc[i].handler->set_affinity(i, val);
                }
        }

        cli();
        sti();
        return 0;
}

That last weird cli()/sti() is supposed to catch a case where IRQs may be running with interrupts disabled, and they haven't quite flushed out yet.
Note that this doesn't take care of spinning the CPUs. Something else has to be added for that case. Most of this comes from Rusty's __cpu_disable(), but changed so that it doesn't randomly pass irqs to any CPU, and takes a dumping CPU argument. --Matt From owner-lkcd@oss.sgi.com Mon Sep 17 17:26:46 2001 Received: (from majordomo@localhost) by oss.sgi.com (8.11.2/8.11.3) id f8I0Qkl17527 for lkcd-outgoing; Mon, 17 Sep 2001 17:26:46 -0700 Received: from calliope1.fm.intel.com (fmfdns01.fm.intel.com [132.233.247.10]) by oss.sgi.com (8.11.2/8.11.3) with SMTP id f8I0QZe17521 for ; Mon, 17 Sep 2001 17:26:40 -0700 Received: from fmsmsxvs042.fm.intel.com (fmsmsxv042-1.fm.intel.com [132.233.48.110]) by calliope1.fm.intel.com (8.9.1a+p1/8.9.1/d: relay.m4,v 1.42 2001/09/04 16:24:19 root Exp $) with SMTP id AAA01295 for ; Tue, 18 Sep 2001 00:26:35 GMT Received: from fmsmsx19.fm.intel.com ([132.233.222.210]) by fmsmsxvs042.fm.intel.com (NAVGW 2.5.1.6) with SMTP id M2001091717260015381 ; Mon, 17 Sep 2001 17:26:00 -0700 Received: by fmsmsx19.fm.intel.com with Internet Mail Service (5.5.2653.19) id ; Mon, 17 Sep 2001 17:26:34 -0700 Message-ID: <68843F808BE5D311AC6100A0C9C5786648488F@fmsmsx50.fm.intel.com> From: "Schaal, Richard" To: "'Matt D. Robinson'" , "Schaal, Richard" Cc: "'lkcd@oss.sgi.com'" Subject: Suggestion Box + SMP blues Date: Mon, 17 Sep 2001 17:26:28 -0700 MIME-Version: 1.0 X-Mailer: Internet Mail Service (5.5.2653.19) Content-Type: text/plain; charset="iso-8859-1" Sender: owner-lkcd@oss.sgi.com Precedence: bulk Hi Matt, In doing my development and testing, now that the dump recovery seems to be working, I find my disk is filling pretty rapidly because the same dump is recovered more than once. - I dump to a separate device and not to the swap area. I wonder if the dump save step shouldn't set some sort of flag in the dump header on the dump device that would say this is a stale dump, which might need some --force flag in order to "save" it again. 
I seem to be dumping ok on my SMP system when I have a relatively simple "oops" to cause a dump for testing, but with increased activity and possible multiple processor panics, the dump is still failing with the console messages pretty well scrambled - apparently messages being intermixed from multiple processors. Here's a sample... Red Hat Linux release 7.1 (Seawolf) Kernel 2.4.8 on an 8-processor i686 dopey login: (scsi0:A:1:0): Locking max tag count at 64 U1b nUnab11U>nalb1e1U bln> l>UethneoaaUU tnb tnhblaalbnoe de l ltaeehtobaloaon n edkdlth e hhrataonod enad llneee lnl kkhhed a el aereN krknnkndeederlUelnlrn neerLNLe e lUnl k Lp ekeNLoleUiernL lnLpNrtNUUoneee iLlNl LL UL nr tpNLN pdpUeLoroLLi ni tpopniao dneio tntreaLt te ir ndtrtev re e prirdvdo iredee tefnedrriuarertteenfeelcerf ur eaarel fre eefddrdeeeraerrernene aenftnecsccd cereseevdire nre at tca uetvsaa0l it sa0 vr ia00tdvrdt0ua0rut0i0 l0rvt00 ia0ea0rdlu0tdsua s0ard l0 e s0dal s0 pi p0r0:pp 0ri0ni0ent0:s0ts0i0ie s0nn0s g0 p 0g0e0i0e 0s00s0p e000:0000 0ir 00 i pp0n0tr:000ii n0n0 c820 npg pr ieren i sipen pttrs:iii pnnr0tg0i0i n pg:e0ini nd00*ep0 =00c 1001090040619e06010c8 0 0 041908810040g :i p00c i910118:p1 144:81 *=dppe9c8990*1=eee0 0> *p = ==0dp<*ed0p d10e0 e 0==0 =0>0O20c 00 000001101 104100dppd0*d0ep0e ede = = =* =0000P : C 0*=0 *00p1p00d0e 0 0>*0=p0d 0d0e0e : Oddly enough, if you take every third or fourth character, you can assemble some of the common error messages. :-) I'll take a look at the panic and dump path to see if there's a window of opportunity for the processors to wander about after a panic. 
Regards, Richard From owner-lkcd@oss.sgi.com Mon Sep 17 22:43:01 2001 Received: (from majordomo@localhost) by oss.sgi.com (8.11.2/8.11.3) id f8I5h1122400 for lkcd-outgoing; Mon, 17 Sep 2001 22:43:01 -0700 Received: from e1.ny.us.ibm.com (e1.ny.us.ibm.com [32.97.182.101]) by oss.sgi.com (8.11.2/8.11.3) with SMTP id f8I5gue22397 for ; Mon, 17 Sep 2001 22:42:56 -0700 Received: from northrelay03.pok.ibm.com (northrelay03.pok.ibm.com [9.117.200.23]) by e1.ny.us.ibm.com (8.9.3/8.9.3) with ESMTP id BAA336986; Tue, 18 Sep 2001 01:40:24 -0400 Received: from bharata.in.ibm.com (bharata.in.ibm.com [9.186.133.24]) by northrelay03.pok.ibm.com (8.11.1m3/NCO v4.98) with ESMTP id f8I5adA83140; Tue, 18 Sep 2001 01:36:39 -0400 Received: (from bharata@localhost) by bharata.in.ibm.com (8.11.2/8.11.2) id f8I5hjU01532; Tue, 18 Sep 2001 11:13:45 +0530 Date: Tue, 18 Sep 2001 11:13:45 +0530 From: Bharata B Rao To: "Matt D. Robinson" Cc: lkcd@oss.sgi.com Subject: Re: Non disruptive dumps -- current work. Message-ID: <20010918111345.A1247@in.ibm.com> Reply-To: bharata@in.ibm.com References: <20778.1000544488@ocs3.intra.ocs.com.au> <3BA3DB73.F731D08F@alacritech.com> <3BA68AF8.1CC7259A@alacritech.com> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline User-Agent: Mutt/1.2.5i In-Reply-To: <3BA68AF8.1CC7259A@alacritech.com>; from yakker@alacritech.com on Mon, Sep 17, 2001 at 04:44:56PM -0700 Sender: owner-lkcd@oss.sgi.com Precedence: bulk On Mon, Sep 17, 2001 at 04:44:56PM -0700, Matt D. Robinson wrote: > Just an FYI for folks following this, the setup_IO_APIC_irqs() > mechanism doesn't work (it fails miserably, actually). 
I'm going
> to try something similar to the following:
>
> int __dump_cpu_disable(unsigned int dumping_cpu)
> {
>         int i;
>         unsigned long val, cpu = smp_processor_id();
>         if ((cpu == 0) || (cpu == dumping_cpu))
>                 return -EINVAL;
>         clear_bit(cpu, &cpu_online_map);
>         mb();
>         /* first, move everyone to the dumping CPU */
>         for (i = 0; i < NR_IRQS; i++) {
>                 if (irq_desc[i].handler == NULL) continue;
>                 val = irq_affinity[i];
>                 if (val & (1 << cpu)) {
>                         if (!(val & cpu_online_map))
>                                 val = (1 << dumping_cpu);
>                         else
>                                 val = val & ~(1 << cpu);
>                         irq_affinity[i] = val;
>                         if (irq_desc[i].handler->set_affinity != NULL)
>                                 irq_desc[i].handler->set_affinity(i, val);
>                 }
>         }
>
>         cli();
>         sti();
>         return 0;
> }

Will this function be called for each CPU? If so, won't you be going through the loop of NR_IRQS for all CPUs? Can't the affinities of all IRQs be changed to that of the dumping CPU in one go? Something like this:

{
        .
        .
        int cpu = smp_processor_id();

        for (i = 0; i < NR_IRQS; i++) {
                if (irq_desc[i].handler == NULL)
                        continue;
                irq_affinity[i] = 1UL << cpu;
                if (irq_desc[i].handler->set_affinity != NULL)
                        irq_desc[i].handler->set_affinity(i, irq_affinity[i]);
        }
}

>
> That last weird cli()/sti() is supposed to catch a case where
> IRQs may be running with interrupts disabled, and they haven't
> quite flushed out yet.
>
> Note that this doesn't take care of spinning the CPUs.
> Something else has to be added for that case. Most of this
> comes from Rusty's __cpu_disable(), but changed so that it
> doesn't randomly pass irqs to any CPU, and takes a dumping
> CPU argument.
>
> --Matt

Regards,
Bharata.
--
Bharata B Rao, IBM Linux Technology Center, IBM Software Lab, Bangalore.
Ph: 91-80-5262355 Ex: 3962 Mail: bharata@in.ibm.com

From owner-lkcd@oss.sgi.com Mon Sep 17 23:05:46 2001 Received: (from majordomo@localhost) by oss.sgi.com (8.11.2/8.11.3) id f8I65kT22610 for lkcd-outgoing; Mon, 17 Sep 2001 23:05:46 -0700 Received: from nakedeye.aparity.com (w032.z064001165.sjc-ca.dsl.cnc.net [64.1.165.32]) by oss.sgi.com (8.11.2/8.11.3) with SMTP id f8I65ie22607 for ; Mon, 17 Sep 2001 23:05:44 -0700 Received: from localhost (yakker@localhost) by nakedeye.aparity.com (8.11.2/8.11.2) with ESMTP id f8I65cm25892; Mon, 17 Sep 2001 23:05:39 -0700 Date: Mon, 17 Sep 2001 23:05:38 -0700 (PDT) From: "Matt D. Robinson" To: Bharata B Rao cc: "Matt D. Robinson" , Subject: Re: Non disruptive dumps -- current work. In-Reply-To: <20010918111345.A1247@in.ibm.com> Message-ID: MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII Sender: owner-lkcd@oss.sgi.com Precedence: bulk

On Tue, 18 Sep 2001, Bharata B Rao wrote:
|>On Mon, Sep 17, 2001 at 04:44:56PM -0700, Matt D. Robinson wrote:
|>Will this function be called for each cpu ? If so won't you be going through
|>the loop of NR_IRQS for all cpus ? Can't the affinities of all irqs be changed
|>to that of dumping cpu at one go ? Something like this,
|>
|>{
|>        .
|>        .
|>        int cpu = smp_processor_id();
|>        for (i = 0; i < NR_IRQS; i++) {
|>                if (irq_desc[i].handler == NULL)
|>                        continue;
|>                irq_affinity[i] = 1UL << cpu;
|>                if (irq_desc[i].handler->set_affinity != NULL)
|>                        irq_desc[i].handler->set_affinity(i, irq_affinity[i]);
|>        }
|>}

This is fine ... it makes it a single call rather than an smp_call_function() mechanism.

|>Regards,
|>Bharata.

Thanks for the heads-up, as always.
--Matt From owner-lkcd@oss.sgi.com Mon Sep 17 23:07:34 2001 Received: (from majordomo@localhost) by oss.sgi.com (8.11.2/8.11.3) id f8I67Yi22637 for lkcd-outgoing; Mon, 17 Sep 2001 23:07:34 -0700 Received: from nakedeye.aparity.com (w032.z064001165.sjc-ca.dsl.cnc.net [64.1.165.32]) by oss.sgi.com (8.11.2/8.11.3) with SMTP id f8I67Re22633 for ; Mon, 17 Sep 2001 23:07:27 -0700 Received: from localhost (yakker@localhost) by nakedeye.aparity.com (8.11.2/8.11.2) with ESMTP id f8I6BwI25909; Mon, 17 Sep 2001 23:11:58 -0700 Date: Mon, 17 Sep 2001 23:11:58 -0700 (PDT) From: "Matt D. Robinson" To: "Schaal, Richard" cc: "'Matt D. Robinson'" , "'lkcd@oss.sgi.com'" Subject: Re: Suggestion Box + SMP blues In-Reply-To: <68843F808BE5D311AC6100A0C9C5786648488F@fmsmsx50.fm.intel.com> Message-ID: MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII Sender: owner-lkcd@oss.sgi.com Precedence: bulk On Mon, 17 Sep 2001, Schaal, Richard wrote: |>Hi Matt, |> |>In doing my development and testing, now that the dump recovery seems to be |>working, I find my disk is filling |>pretty rapidly because the same dump is recovered more than once. - I dump |>to a separate device and not to the |>swap area. I wonder if the dump save step shouldn't set some sort of flag |>in the dump header on the dump device |>that would say this is a stale dump, which might need some --force flag in |>order to "save" it again. This should always be done. A special flag overwrites something in the dump header to prevent re-saves from taking place (well, let me re-state that ... re-saves can take place, but they won't go too far, they'll stop as soon as they see the dump header's magic number overwritten). 
|>I seem to be dumping ok on my SMP system when I have a relatively simple |>"oops" to cause a dump for testing, but with increased activity and possible |>multiple processor panics, the dump is still failing with the console |>messages pretty well scrambled - apparently messages being intermixed from |>multiple processors. Here's a sample... I've never seen anything scrambled like this before. This is absolutely bizarre. Are you re-directing your console output or klogd/syslogd to this console? Are you doing anything special in your kernel builds related to serial consoles? Is the crash taking place in any console code? This is very wierd to me. Of course, I don't have an 8P to test it on, but if you have a spare one available, I'm more than willing to help. :) Seriously, let me know what you're doing to crash the system so I can help provide more details. I've seriously never seen a console this jumbled before. It's like you've got something redirected in character/raw mode to the console. 
|>Red Hat Linux release 7.1 (Seawolf) |>Kernel 2.4.8 on an 8-processor i686 |> |>dopey login: (scsi0:A:1:0): Locking max tag count at 64 |>U1b nUnab11U>nalb1e1U bln> l>UethneoaaUU tnb |>tnhblaalbnoe |>de l ltaeehtobaloaon n edkdlth e hhrataonod enad llneee lnl kkhhed a el |>aereN |>krknnkndeederlUelnlrn neerLNLe e lUnl k Lp ekeNLoleUiernL lnLpNrtNUUoneee |>iLlNl |>LL UL nr tpNLN pdpUeLoroLLi ni tpopniao |>dneio tntreaLt te ir ndtrtev re e prirdvdo iredee |>tefnedrriuarertteenfeelcerf ur |>eaarel fre eefddrdeeeraerrernene aenftnecsccd cereseevdire nre at tca |>uetvsaa0l |>it sa0 vr ia00tdvrdt0ua0rut0i0 l0rvt00 ia0ea0rdlu0tdsua s0ard |>l0 e |> s0dal s0 pi p0r0:pp |>0ri0ni0ent0:s0ts0i0ie |> |>s0nn0s g0 p 0g0e0i0e 0s00s0p e000:0000 |>0ir |>00 i |>pp0n0tr:000ii |>n0n0 |> |>c820 |>npg pr ieren i sipen pttrs:iii pnnr0tg0i0i n |>pg:e0ini nd00*ep0 =00c |>1001090040619e06010c8 |>0 |>0 |>041908810040g :i p00c |>i910118:p1 |>144:81 |>*=dppe9c8990*1=eee0 0> *p = |> ==0dp<*ed0p d10e0 e 0==0 =0>0O20c |>00 |>000001101 |>104100dppd0*d0ep0e |>ede = = =* =0000P : C 0*=0 *00p1p00d0e |>0 |>0>*0=p0d 0d0e0e |>: |> |>Oddly enough, if you take every third or fourth character, you can assemble |>some of the common |>error messages. :-) |> |>I'll take a look at the panic and dump path to see if there's a window of |>opportunity for the processors to |>wander about after a panic. There is the possibility, but the printk()s shouldn't criss-cross like this. |>Regards, |>Richard Thanks, Richard. Let me know. 
--Matt

From owner-lkcd@oss.sgi.com Tue Sep 18 00:56:00 2001 Received: (from majordomo@localhost) by oss.sgi.com (8.11.2/8.11.3) id f8I7u0d24640 for lkcd-outgoing; Tue, 18 Sep 2001 00:56:00 -0700 Received: from babel.spoiled.org (babel.spoiled.org [212.84.234.227]) by oss.sgi.com (8.11.2/8.11.3) with SMTP id f8I7tve24637 for ; Tue, 18 Sep 2001 00:55:57 -0700 Received: (qmail 18222 invoked by uid 8); 18 Sep 2001 07:55:56 -0000 From: thomas graichen Reply-To: thomas graichen X-Newsgroups: spoiled.linux.sgi.lkcd Subject: can't enlarge buffer because scanner uses REJECT Date: Tue, 18 Sep 2001 09:54:47 +0200 Organization: spoiled dot org Lines: 25 Distribution: local Message-ID: Reply-To: thomas graichen X-Complaints-To: newsmaster@spoiled.org User-Agent: tin/1.4.4-20000803 ("Vet for the Insane") (UNIX) (Linux/2.4.9-xfs (i686)) To: lkcd@oss.sgi.com Sender: owner-lkcd@oss.sgi.com Precedence: bulk

i get the following error when trying to run lcrash on the crashdump saved after an initiated panic on a 1gb memory machine - kernel is 2.2.19ext3 (i.e. with the ext3 patch applied - have to stay at 2.2 for now btw. - so 2.4 is no option at the moment):

  one:/var/log/vmdump # lcrash map.0 vmdump.0 kerntypes.0
  map = map.0, vmdump = vmdump.0, outfile = stdout, kerntypes =
  kerntypes.0

  Please wait.............
  input buffer overflow, can't enlarge buffer because scanner uses REJECT
  one:/var/log/vmdump #

looks like something is too big - any chance to work around this or any idea of a fix for this problem? is this a known problem?

a lot of thanks in advance

t

--
thomas graichen ... perfection is reached, not when there is no longer anything to add, but when there is no longer anything to take away.
--- antoine de saint-exupery From owner-lkcd@oss.sgi.com Tue Sep 18 02:32:13 2001 Received: (from majordomo@localhost) by oss.sgi.com (8.11.2/8.11.3) id f8I9WDZ26431 for lkcd-outgoing; Tue, 18 Sep 2001 02:32:13 -0700 Received: from nakedeye.aparity.com (w032.z064001165.sjc-ca.dsl.cnc.net [64.1.165.32]) by oss.sgi.com (8.11.2/8.11.3) with SMTP id f8I9W9e26428 for ; Tue, 18 Sep 2001 02:32:10 -0700 Received: from localhost (yakker@localhost) by nakedeye.aparity.com (8.11.2/8.11.2) with ESMTP id f8I9ah926251; Tue, 18 Sep 2001 02:36:43 -0700 Date: Tue, 18 Sep 2001 02:36:43 -0700 (PDT) From: "Matt D. Robinson" To: thomas graichen cc: Subject: Re: can't enlarge buffer because scanner uses REJECT In-Reply-To: Message-ID: MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII Sender: owner-lkcd@oss.sgi.com Precedence: bulk We just had a post or two on this topic. Make sure your flex is really flex and doesn't get called with -l. Check the archives on oss.sgi.com at: http://oss.sgi.com/cgi-bin/archive/lkcd Look in the August 01 archive. --Matt On Tue, 18 Sep 2001, thomas graichen wrote: |>i get the following error then trying to run lcrash after an initated |>panic including crashdump-saving on the resulting crashdump on an |>1gb memory machine - kernel is 2.2.19ext3 (i.e. with the ext3 |>patch applied - have to stay at 2.2 for now btw. - so 2.4 is |>no option at the moment): |> |> one:/var/log/vmdump # lcrash map.0 vmdump.0 kerntypes.0 |> map = map.0, vmdump = vmdump.0, outfile = stdout, kerntypes = |> kerntypes.0 |> |> Please wait............. |> input buffer overflow, can't enlarge buffer because scanner uses REJECT |> one:/var/log/vmdump # |> |>looks like something is too big - any chance to work around this |>or any idea of a fix for this problem? is this a known problem? |> |>a lot of thanks in advance |> |>t |> |>-- |>thomas graichen ... perfection is reached, not |>when there is no longer anything to add, but when there is no |>longer anything to take away. 
--- antoine de saint-exupery |> From owner-lkcd@oss.sgi.com Tue Sep 18 05:09:56 2001 Received: (from majordomo@localhost) by oss.sgi.com (8.11.2/8.11.3) id f8IC9uI29474 for lkcd-outgoing; Tue, 18 Sep 2001 05:09:56 -0700 Received: from dc-mx08.cluster1.charter.net (dc-mx08.cluster0.hsacorp.net [209.225.8.18]) by oss.sgi.com (8.11.2/8.11.3) with SMTP id f8IC9qe29470 for ; Tue, 18 Sep 2001 05:09:52 -0700 Received: from [24.240.184.228] (HELO charter.net) by dc-mx08.cluster1.charter.net (CommuniGate Pro SMTP 3.4.6) with ESMTP id 28985789; Tue, 18 Sep 2001 08:16:45 -0400 Message-ID: <3BA738CF.4010100@charter.net> Date: Tue, 18 Sep 2001 08:06:39 -0400 From: Luc Chouinard User-Agent: Mozilla/5.0 (Windows; U; Win98; en-US; rv:0.9.2) Gecko/20010726 Netscape6/6.1 X-Accept-Language: en-us MIME-Version: 1.0 To: "Matt D. Robinson" CC: thomas graichen , lkcd@oss.sgi.com Subject: Re: can't enlarge buffer because scanner uses REJECT References: Content-Type: text/plain; charset=us-ascii; format=flowed Content-Transfer-Encoding: 7bit Sender: owner-lkcd@oss.sgi.com Precedence: bulk The digest of it is... With the latest bits this problem should go away. The Makefile in libsial now calls flex directly instead of lex, which on some linux distributions called flex -l. Make sure you have the latest bits from libsial including the Makefile. Matt D. Robinson wrote: >We just had a post or two on this topic. Make sure your >flex is really flex and doesn't get called with -l. Check >the archives on oss.sgi.com at: > > http://oss.sgi.com/cgi-bin/archive/lkcd > >Look in the August 01 archive. > >--Matt > >On Tue, 18 Sep 2001, thomas graichen wrote: >|>i get the following error then trying to run lcrash after an initated >|>panic including crashdump-saving on the resulting crashdump on an >|>1gb memory machine - kernel is 2.2.19ext3 (i.e. with the ext3 >|>patch applied - have to stay at 2.2 for now btw. 
- so 2.4 is >|>no option at the moment): >|> >|> one:/var/log/vmdump # lcrash map.0 vmdump.0 kerntypes.0 >|> map = map.0, vmdump = vmdump.0, outfile = stdout, kerntypes = >|> kerntypes.0 >|> >|> Please wait............. >|> input buffer overflow, can't enlarge buffer because scanner uses REJECT >|> one:/var/log/vmdump # >|> >|>looks like something is too big - any chance to work around this >|>or any idea of a fix for this problem? is this a known problem? >|> >|>a lot of thanks in advance >|> >|>t >|> >|>-- >|>thomas graichen ... perfection is reached, not >|>when there is no longer anything to add, but when there is no >|>longer anything to take away. --- antoine de saint-exupery >|> > > From owner-lkcd@oss.sgi.com Tue Sep 18 06:09:54 2001 Received: (from majordomo@localhost) by oss.sgi.com (8.11.2/8.11.3) id f8ID9sB30576 for lkcd-outgoing; Tue, 18 Sep 2001 06:09:54 -0700 Received: from babel.spoiled.org (babel.spoiled.org [212.84.234.227]) by oss.sgi.com (8.11.2/8.11.3) with SMTP id f8ID9pe30573 for ; Tue, 18 Sep 2001 06:09:51 -0700 Received: (qmail 20421 invoked by uid 8); 18 Sep 2001 13:09:50 -0000 From: thomas graichen Reply-To: thomas graichen X-Newsgroups: spoiled.linux.sgi.lkcd Subject: Re: can't enlarge buffer because scanner uses REJECT Date: Tue, 18 Sep 2001 14:51:20 +0200 Organization: spoiled dot org Lines: 19 Distribution: local Message-ID: References: Reply-To: thomas graichen X-Complaints-To: newsmaster@spoiled.org User-Agent: tin/1.4.4-20000803 ("Vet for the Insane") (UNIX) (Linux/2.4.9-xfs (i686)) To: lkcd@oss.sgi.com Sender: owner-lkcd@oss.sgi.com Precedence: bulk "Matt D. Robinson" wrote: > We just had a post or two on this topic. Make sure your > flex is really flex and doesn't get called with -l. Check > the archives on oss.sgi.com at: > http://oss.sgi.com/cgi-bin/archive/lkcd > Look in the August 01 archive. i vaguely remembered something like this but did not find it on my first quick look ... 
yes and indeed on this system lex is a script calling flex -l ... thanks - that should be it t -- thomas graichen ... perfection is reached, not when there is no longer anything to add, but when there is no longer anything to take away. --- antoine de saint-exupery From owner-lkcd@oss.sgi.com Tue Sep 18 15:05:18 2001 Received: (from majordomo@localhost) by oss.sgi.com (8.11.2/8.11.3) id f8IM5Ie09069 for lkcd-outgoing; Tue, 18 Sep 2001 15:05:18 -0700 Received: from calliope1.fm.intel.com (fmfdns01.fm.intel.com [132.233.247.10]) by oss.sgi.com (8.11.2/8.11.3) with SMTP id f8IM51e09063 for ; Tue, 18 Sep 2001 15:05:06 -0700 Received: from fmsmsxvs042.fm.intel.com (fmsmsxv042-1.fm.intel.com [132.233.48.110]) by calliope1.fm.intel.com (8.9.1a+p1/8.9.1/d: relay.m4,v 1.42 2001/09/04 16:24:19 root Exp $) with SMTP id WAA24001 for ; Tue, 18 Sep 2001 22:05:00 GMT Received: from fmsmsx27.FM.INTEL.COM ([132.233.42.27]) by fmsmsxvs042.fm.intel.com (NAVGW 2.5.1.6) with SMTP id M2001091815042523886 ; Tue, 18 Sep 2001 15:04:25 -0700 Received: by fmsmsx27.fm.intel.com with Internet Mail Service (5.5.2653.19) id ; Tue, 18 Sep 2001 15:05:21 -0700 Message-ID: <68843F808BE5D311AC6100A0C9C57866484891@fmsmsx50.fm.intel.com> From: "Schaal, Richard" To: "'Matt D. Robinson'" Cc: "'lkcd@oss.sgi.com'" Subject: RE: Suggestion Box + SMP blues Date: Tue, 18 Sep 2001 15:04:52 -0700 MIME-Version: 1.0 X-Mailer: Internet Mail Service (5.5.2653.19) Content-Type: text/plain; charset="iso-8859-1" Sender: owner-lkcd@oss.sgi.com Precedence: bulk To respond to your questions, I'm capturing the console output through a serial port. The serial console is set up in the ordinary manner through lilo and boot time arguments. The panic is occurring in the file system, not in the console code. I'm checking whether I can send the test script out so you can see the activity generated. I've been looking at your code in dump_silence_system() . Let's say that CPU 3 wants to panic and take a dump. 
It calls dump_silence_system which calls __dump_silence_system which calls smp_call_function(dump_stop_cpu...) this causes cpu 1 and 2 to execute the HLT loop, cpu 0 comes in and then leaves, and cpu 3 then goes off to try to do the dump while cpu 0 is busy doing whatever it was doing - possibly blurring the dump image. I don't think that was what you had in mind. Did you want CPU 0 to do the dump? If so, we'll have to send him to the dump routine directly after getting his attention with the smp_call_function() call. I've hooked up a call to do the dump from the unknown NMI routine - I have an NMI switch on the front panel of my box. I'm figuring if we can dump from the NMI, then dumping after the smp_call_function() call should also work ok. So far, I managed to park the other CPUs and start dumping the header - according to printk, but then I hit a null pointer in schedule(). Regards, Richard -----Original Message----- From: Matt D. Robinson [mailto:yakker@aparity.com] Sent: Monday, September 17, 2001 11:12 PM To: Schaal, Richard Cc: 'Matt D. Robinson'; 'lkcd@oss.sgi.com' Subject: Re: Suggestion Box + SMP blues On Mon, 17 Sep 2001, Schaal, Richard wrote: |>Hi Matt, |> |>In doing my development and testing, now that the dump recovery seems to be |>working, I find my disk is filling |>pretty rapidly because the same dump is recovered more than once. - I dump |>to a separate device and not to the |>swap area. I wonder if the dump save step shouldn't set some sort of flag |>in the dump header on the dump device |>that would say this is a stale dump, which might need some --force flag in |>order to "save" it again. This should always be done. A special flag overwrites something in the dump header to prevent re-saves from taking place (well, let me re-state that ... re-saves can take place, but they won't go too far, they'll stop as soon as they see the dump header's magic number overwritten). 
|>I seem to be dumping ok on my SMP system when I have a relatively simple |>"oops" to cause a dump for testing, but with increased activity and possible |>multiple processor panics, the dump is still failing with the console |>messages pretty well scrambled - apparently messages being intermixed from |>multiple processors. Here's a sample... I've never seen anything scrambled like this before. This is absolutely bizarre. Are you re-directing your console output or klogd/syslogd to this console? Are you doing anything special in your kernel builds related to serial consoles? Is the crash taking place in any console code? This is very wierd to me. Of course, I don't have an 8P to test it on, but if you have a spare one available, I'm more than willing to help. :) Seriously, let me know what you're doing to crash the system so I can help provide more details. I've seriously never seen a console this jumbled before. It's like you've got something redirected in character/raw mode to the console. 
|>Red Hat Linux release 7.1 (Seawolf) |>Kernel 2.4.8 on an 8-processor i686 |> |>dopey login: (scsi0:A:1:0): Locking max tag count at 64 |>U1b nUnab11U>nalb1e1U bln> l>UethneoaaUU tnb |>tnhblaalbnoe |>de l ltaeehtobaloaon n edkdlth e hhrataonod enad llneee lnl kkhhed a el |>aereN |>krknnkndeederlUelnlrn neerLNLe e lUnl k Lp ekeNLoleUiernL lnLpNrtNUUoneee |>iLlNl |>LL UL nr tpNLN pdpUeLoroLLi ni tpopniao |>dneio tntreaLt te ir ndtrtev re e prirdvdo iredee |>tefnedrriuarertteenfeelcerf ur |>eaarel fre eefddrdeeeraerrernene aenftnecsccd cereseevdire nre at tca |>uetvsaa0l |>it sa0 vr ia00tdvrdt0ua0rut0i0 l0rvt00 ia0ea0rdlu0tdsua s0ard |>l0 e |> s0dal s0 pi p0r0:pp |>0ri0ni0ent0:s0ts0i0ie |> |>s0nn0s g0 p 0g0e0i0e 0s00s0p e000:0000 |>0ir |>00 i |>pp0n0tr:000ii |>n0n0 |> |>c820 |>npg pr ieren i sipen pttrs:iii pnnr0tg0i0i n |>pg:e0ini nd00*ep0 =00c |>1001090040619e06010c8 |>0 |>0 |>041908810040g :i p00c |>i910118:p1 |>144:81 |>*=dppe9c8990*1=eee0 0> *p = |> ==0dp<*ed0p d10e0 e 0==0 =0>0O20c |>00 |>000001101 |>104100dppd0*d0ep0e |>ede = = =* =0000P : C 0*=0 *00p1p00d0e |>0 |>0>*0=p0d 0d0e0e |>: |> |>Oddly enough, if you take every third or fourth character, you can assemble |>some of the common |>error messages. :-) |> |>I'll take a look at the panic and dump path to see if there's a window of |>opportunity for the processors to |>wander about after a panic. There is the possibility, but the printk()s shouldn't criss-cross like this. |>Regards, |>Richard Thanks, Richard. Let me know. 
--Matt From owner-lkcd@oss.sgi.com Tue Sep 18 23:05:01 2001 Received: (from majordomo@localhost) by oss.sgi.com (8.11.2/8.11.3) id f8J651o18354 for lkcd-outgoing; Tue, 18 Sep 2001 23:05:01 -0700 Received: from thalia.fm.intel.com (fmfdns02.fm.intel.com [132.233.247.11]) by oss.sgi.com (8.11.2/8.11.3) with SMTP id f8J64re18351 for ; Tue, 18 Sep 2001 23:04:58 -0700 Received: from fmsmsxvs041.fm.intel.com (fmsmsxv041-1.fm.intel.com [132.233.48.109]) by thalia.fm.intel.com (8.9.1a+p1/8.9.1/d: relay.m4,v 1.42 2001/09/04 16:24:19 root Exp $) with SMTP id AAA04189 for ; Wed, 19 Sep 2001 00:55:08 GMT Received: from fmsmsx17.intel.com ([132.233.58.209]) by fmsmsxvs041.fm.intel.com (NAVGW 2.5.1.6) with SMTP id M2001091817564428682 for ; Tue, 18 Sep 2001 17:56:44 -0700 Received: by fmsmsx17.fm.intel.com with Internet Mail Service (5.5.2653.19) id ; Tue, 18 Sep 2001 17:54:39 -0700 Message-ID: <68843F808BE5D311AC6100A0C9C57866484893@fmsmsx50.fm.intel.com> From: "Schaal, Richard" To: "'lkcd@oss.sgi.com'" Subject: Non Disruptive Dumps - Question Date: Tue, 18 Sep 2001 17:52:37 -0700 MIME-Version: 1.0 X-Mailer: Internet Mail Service (5.5.2653.19) Content-Type: text/plain; charset="iso-8859-1" Sender: owner-lkcd@oss.sgi.com Precedence: bulk I'm curious as to the external behaviour one would expect to see when taking a non-disruptive dump. Would you be able to start the dump and then continue working on your application while the dump continues? - Don't laugh - we could do that at Stratus because of the mirrored memory. In the first cut on a general purpose system, I would expect to be able to start a dump - the system would freeze during the dump, and then when complete, the system would be responsive once more. - and not require a reboot. The reason I ask this, is that I see you folks poking about the IO_APIC area , and I think you might be thinking about directing interrupts from all sources to the one CPU that we want running in order to take the dump. 
I'm coming from the other direction thinking that I don't want any interrupts at all during the whole dump process. Which is easier? Would one technique produce a better dump than the other? Is freezing the system for the duration of the dump going to cause dropped connections? - is that why you want to be servicing interrupts? If you service interrupts for I/O chances are that you will blur the dump. Bottom line - I wonder if it is easier to status drive the disk controller or redirect and then restore interrupt routing on the fly. I look forward to your views - Richard From owner-lkcd@oss.sgi.com Wed Sep 19 06:13:36 2001 Received: (from majordomo@localhost) by oss.sgi.com (8.11.2/8.11.3) id f8JDDaw26838 for lkcd-outgoing; Wed, 19 Sep 2001 06:13:36 -0700 Received: from d06lmsgate-2.uk.ibm.com (d06lmsgate-2.uk.ibm.com [195.212.29.2]) by oss.sgi.com (8.11.2/8.11.3) with SMTP id f8JDDXe26835 for ; Wed, 19 Sep 2001 06:13:34 -0700 Received: from d06relay01.portsmouth.uk.ibm.com (d06relay01.portsmouth.uk.ibm.com [9.166.84.147]) by d06lmsgate-2.uk.ibm.com (1.0.0) with ESMTP id NAA41710; Wed, 19 Sep 2001 13:55:51 +0100 Received: from d06ml023.portsmouth.uk.ibm.com (d06ml023_cs0 [9.180.35.10]) by d06relay01.portsmouth.uk.ibm.com (8.11.1m3/NCO v4.98) with ESMTP id f8JDD5335432; Wed, 19 Sep 2001 14:13:05 +0100 Subject: Re: Non Disruptive Dumps - Question To: "Schaal, Richard" Cc: "'lkcd@oss.sgi.com'" X-Mailer: Lotus Notes Release 5.0.5 September 22, 2000 Message-ID: From: "Richard J Moore" Date: Wed, 19 Sep 2001 11:29:40 +0100 X-MIMETrack: Serialize by Router on D06ML023/06/M/IBM(Release 5.0.8 |June 18, 2001) at 19/09/2001 14:13:06 MIME-Version: 1.0 Content-type: text/plain; charset=us-ascii Sender: owner-lkcd@oss.sgi.com Precedence: bulk Richard, what we would like to have eventually is an option to select the level of system quiescing. 
This will become very pertinent when we move on to allowing selected memory objects to be snapshot - in both one or more user spaces and system space. Whether or not the level of quiescing can be automated according to context needs thinking about. From experience of system dumping technologies on other operating systems, it does seem a practical option to freeze by process in the case of user space. Given the non-preemptable nature of the kernel it's not so easy to do this for kernel space, but not impossible; it all depends on whether there is a user context associated with a particular object and whether there is any process level serialisation for such objects. A fully preemptable kernel would make this much easier because serialisation would be much more granular. Another option, which we used on other operating systems, is to lock the kernel for a particular component, then yield before locking and dumping the next component. Yes, you can get inconsistencies between the data for each component, but that's not necessarily a bad thing. Richard Moore - RAS Project Lead - Linux Technology Centre (ATS-PIC). 
http://oss.software.ibm.com/developerworks/opensource/linux Office: (+44) (0)1962-817072, Mobile: (+44) (0)7768-298183 IBM UK Ltd, MP135 Galileo Centre, Hursley Park, Winchester, SO21 2JN, UK

From owner-lkcd@oss.sgi.com Wed Sep 19 09:58:10 2001 Received: (from majordomo@localhost) by oss.sgi.com (8.11.2/8.11.3) id f8JGwAb32376 for lkcd-outgoing; Wed, 19 Sep 2001 09:58:10 -0700 Received: from ausmtp02.au.ibm.com (ausmtp02.au.ibm.COM [202.135.136.105]) by oss.sgi.com (8.11.2/8.11.3) with SMTP id f8JGvpe32370 for ; Wed, 19 Sep 2001 09:57:52 -0700 Received: from f02n16e.au.ibm.com by ausmtp02.au.ibm.com (IBM AP 2.0) with ESMTP id f8JGtDw465400 for ; Thu, 20 Sep 2001 02:55:13 +1000 Received: from d73mta01.au.ibm.com (f06n01s [9.185.166.65]) by f02n16e.au.ibm.com (8.11.1m3/NCO v4.98) with SMTP id f8JGvJj118602 for ; Thu, 20 Sep 2001 02:57:20 +1000 Received: by d73mta01.au.ibm.com(Lotus SMTP MTA v4.6.5 (863.2 5-20-1999)) id CA256ACC.005D27C2 ; Thu, 20 Sep 2001 02:57:30 +1000 X-Lotus-FromDomain: IBMIN@IBMAU From: bsuparna@in.ibm.com To: "Schaal, Richard" cc: "'lkcd@oss.sgi.com'" Message-ID: Date: Wed, 19 Sep 2001 20:54:46 +0500 Subject: Re: Non Disruptive Dumps - Question Mime-Version: 1.0 Content-type: text/plain; charset=us-ascii Content-Disposition: inline Sender: owner-lkcd@oss.sgi.com Precedence: bulk

Valid observations! As Richard Moore mentions, the direction we are looking at would be to eventually have the degree of system quiescing configurable, because of the tradeoffs inherent in the various choices that we make.

If we try to freeze everything, we do affect normal system operation (at least timing-sensitive operations) as you've observed, and that may be an issue depending on the environment. If we don't freeze things, then we have a drift in the dump image. The more active the system is, the greater could be the drift.
If we freeze things exactly as is at the instant of dumping we get an accurate snapshot, but this snapshot may contain data structures that are in an inconsistent state, which could make interpretation a little difficult. In some situations an experienced debugger may even use the drift to advantage to interpret some things about the state of the system. Thus, depending on the kind of problem determination that is required in a given situation, one may prefer to choose a certain level of quiescing.

If we wish to utilize the existing code paths in the system for dumping, and allow various kinds of dump devices / drivers, then there is also the consideration of the level of system capabilities that need to be available for dumping (interrupts, softirqs, relevant locks, even kernel worker threads?), as well as the changes in state that the dump i/o path execution causes.

The livedump capability that exists in lcrash today is perhaps an example where we get a dump while the system continues to operate without any disruption, but could have inconsistent state with drift as the system state changes while it is being dumped. What Mission Critical's dump facility attempts to provide could in some ways (sans some details) be thought to constitute the opposite end (though it doesn't support dump of the entire physical memory - e.g. user memory areas - because it needs some extra space for the second kernel boot), i.e. a close to accurate memory state snapshot under most conditions, with complete disruption of system operation.

It is interesting that you mention mirrored memory, because that kind of thing indeed could allow one to continue operation without losing accuracy.
In fact one of the ideas that we had been toying with a while back (when we just started looking into crash dump) was to see if some kind of copy-on-write memory snapshotting implementation could be used to achieve the dual goals of accurate system snapshot and normal system operation and dumping through normal system interfaces. The idea was to make use of either page table or segment protection support to implement such a scheme. One could think of multiple ways to keep track of the modified portions. (I'm not familiar with the Stratus configuration, so I don't know if this mirroring happens in hardware or in software.)

There is, however, a tradeoff even in this case in terms of resource consumption and complexity/intrusiveness of our solution. And that is also affected by the extent of activity that we decide to allow on the system. It is easy enough to notice that the amount of additional memory needed to maintain the snapshot depends on the extent/spread of changes in system memory state all the while from the instant the dump was triggered to the time it completes. It would vary depending on how exactly we maintain the snapshot, but the extra space and complexity as well as performance implications do rise, the more the activity level that we decide to support. For example, if we were to allow normal scheduling and have all applications running without interruption, then we need a more complicated scheme and more available memory to maintain the parallel states. So we need to weigh the practical benefits before attempting this.

I guess what we would first try to achieve would be a reasonably working solution - attempt to keep the drift low, but may not really freeze everything (e.g. may allow interrupts, and perhaps some critical kernel code); also attempt to keep the disruption of the system low, but again not ideal unjittered operation (e.g. applications would be suspended to start with).
It would be an approximate solution, not an exact one, and the degree of drift may also depend on the state/context from which dump is triggered (because of the kind of context that is needed in order to use the existing i/o path for dumping), but may be of some practical utility. Then we could look into improving this further towards the ideal solution for configurable quiesce levels, as we do further work on selective dumping.

Regarding your specific question about servicing/routing interrupts, if we knew exactly what interrupts are necessary for the dump driver, then we could perhaps keep only those active, but again trying to support all kinds of devices involves some extra considerations in deciding what exactly needs to be active... And as you observe, totally shutting off other interrupts could have some side effects on system operation continuation.

Yes, we did also think of redirecting the concerned interrupts to take a special execution path that wouldn't tie in with kernel resources/locks, or switching to a poll based approach (which is probably what you mean by "status drive the controller"), but then again this needs some specific knowledge of the IRQs that the device needs. And on Linux we have a wide range of devices/controllers that we'd ideally like to support for dump.

One approach that is also under consideration is having devices export a dump interface that could be poll based and avoid locks/resources used in the normal i/o path (you may have seen some discussions about that on this list earlier - AIX has a ddump interface, for example), and do the right things to ensure normal i/o path works after dump. Matt is probably already exploring this possibility for IDE to start with. Something like this could be used for network dumps too, as you might already know. Dave Howell might have some further thoughts on that.
So some of the efforts would include:

- First attempting an approximate solution with practical tradeoffs between drift and system continuation+minimal dump environment context setup & activity requirements -- kind of best effort start and try refining as we get better ideas
- Provide configurable levels of quiescing together with more granular selective dumping feature introduction
- Device Driver dump interface evolution (towards more accurate snapshot and minimizing system dependence)

The following applies to panic dump type situations only (not non-disruptive dump, that is):

- Integration with mission critical's 2 kernel approach for standalone dumping situations - for cases where the driver doesn't have a dump interface or if it is detected that the interface can't be used for some reason (based on some verification scheme) and the normal i/o path can't / shouldn't be used.

Regards
Suparna

Suparna Bhattacharya
IBM Software Lab, India
E-mail : bsuparna@in.ibm.com
Phone : 91-80-5267117, Extn : 3961

From owner-lkcd@oss.sgi.com Wed Sep 19 12:26:47 2001 Received: (from majordomo@localhost) by oss.sgi.com (8.11.2/8.11.3) id f8JJQlG02818 for lkcd-outgoing; Wed, 19 Sep 2001 12:26:47 -0700 Received: from calliope1.fm.intel.com (fmfdns01.fm.intel.com [132.233.247.10]) by oss.sgi.com (8.11.2/8.11.3) with SMTP id f8JJQSe02806 for ; Wed, 19 Sep 2001 12:26:33 -0700 Received: from fmsmsxvs042.fm.intel.com (fmsmsxv042-1.fm.intel.com [132.233.48.110]) by calliope1.fm.intel.com (8.9.1a+p1/8.9.1/d: relay.m4,v 1.42 2001/09/04 16:24:19 root Exp $) with SMTP id TAA18781 for ; Wed, 19 Sep 2001 19:26:23 GMT Received: from fmsmsx17.intel.com ([132.233.58.209]) by fmsmsxvs042.fm.intel.com (NAVGW 2.5.1.6) with SMTP id M2001091912255111597 ; Wed, 19 Sep 2001 12:25:51 -0700 Received: by fmsmsx17.fm.intel.com with Internet Mail Service (5.5.2653.19) id ; Wed, 19 Sep 2001 12:28:22 -0700 Message-ID: <10C8636AE359D4119118009027AE99870CE2F9EC@FMSMSX34> From: "Howell, David P" To: "'bsuparna@in.ibm.com'" , "Schaal, Richard" Cc: "'lkcd@oss.sgi.com'" Subject: RE: Non Disruptive Dumps - Question Date: Wed, 19 Sep 2001 12:26:14 -0700 MIME-Version: 1.0 X-Mailer: Internet Mail Service (5.5.2653.19) Content-Type: text/plain; charset="iso-8859-1" Sender: owner-lkcd@oss.sgi.com Precedence: bulk

Suparna wrote: The following applies to panic dump type situations only (not non-disruptive dump, that is): - Integration with Mission
Critical's 2 kernel approach for standalone dumping situations - for cases where the driver doesn't have a dump interface or if it is detected that the interface can't be used for some reason (based on some verification scheme) and the normal i/o path can't / shouldn't be used. I gathered some insight from some of the folks who were familiar with a previous similar kernel dumping effort and this was the route that they had chosen for the implementation, very similar to what MCL is doing. The reasons had more to do with accuracy, dump reliability, and a cleaner overall implementation; prior to this they were doing panic side dumping and had investigated several alternatives to make it more reliable, we've considered all of these here. But it is disruptive. So, a MCL approach for disruptive that is more accurate with non-disruptive working within the kernel makes sense, especially since we should have more confidence that the system facilities are ok for a non-disruptive. And, the levels of intrusiveness is the right way to go. Regards, Dave Howell
From owner-lkcd@oss.sgi.com Thu Sep 20 06:01:40 2001 Received: (from majordomo@localhost) by oss.sgi.com (8.11.2/8.11.3) id f8KD1eJ22330 for lkcd-outgoing; Thu, 20 Sep 2001 06:01:40 -0700 Received: from d06lmsgate.uk.ibm.COM (d06lmsgate.uk.ibm.com [195.212.29.1]) by oss.sgi.com (8.11.2/8.11.3) with SMTP id f8KD1ae22327 for ; Thu, 20 Sep 2001 06:01:37 -0700 Received: from d12relay02.de.ibm.com (d12relay02.de.ibm.com [9.165.215.23]) by d06lmsgate.uk.ibm.COM (1.0.0) with ESMTP id NAA26252 for ; Thu, 20 Sep 2001 13:41:26 +0100 Received: from d12ml004.de.ibm.com (d12ml004_cs0 [9.165.223.50]) by d12relay02.de.ibm.com (8.11.1m3/NCO v4.98) with ESMTP id f8KD0Ea14912 for ; Thu, 20 Sep 2001 15:00:14 +0200 Importance: Normal Subject: new field in task command To: lkcd@oss.sgi.com Cc: "Michael Geselbracht" X-Mailer: Lotus Notes Release 5.0.3 March 21, 2000 Message-ID: From: "Michael Holzheu" Date: Thu, 20 Sep 2001 14:59:09 +0200 X-MIMETrack: Serialize by Router on D12ML004/12/M/IBM(Release 5.0.8 |June 18, 2001) at 20/09/2001 15:00:13 MIME-Version: 1.0 Content-type: text/plain; charset=us-ascii Sender: owner-lkcd@oss.sgi.com Precedence: bulk Hi, Would have anybody here a problem, if we add a new field "CPU" to the output of the "task" command of lcrash.
This field should show the CPU number of the task that is actually running.

When developing the GUI (QLCRASH) for lcrash we found that it would be very useful to have this information. The state field alone is not sufficient to identify running tasks, since a "0" only means "runnable".

This could be a possible output. The CPU column should be obtained from task_struct.processor:

ADDR       UID  PID  PPID  STATE  CPU  FLAGS  NAME
===============================================================================
0x1ac000     0    0     0      0    0      0  swapper
0x5b4000     0    1     0      1    -  0x100  init
0x59e000     0    2     1      1    -   0x40  kmcheck
0x59c000     0    3     1      1    -   0x40  keventd
0x59a000     0    4     0      0    1   0x40  ksoftirqd_CPU0
0x598000     0    5     0      1    -   0x40  ksoftirqd_CPU1
0x596000     0    6     0      1    -   0x40  ksoftirqd_CPU2
0x7ffa000    0    7     0      0    -  0x840  kswapd

Regards
Michael

------------------------------------------------------------------------
Linux/390 Development
Phone: +49-7031-16-2360, Bld 71032-06-109
Email: holzheu@de.ibm.com

From owner-lkcd@oss.sgi.com Thu Sep 20 16:50:35 2001 Received: (from majordomo@localhost) by oss.sgi.com (8.11.2/8.11.3) id f8KNoZW06749 for lkcd-outgoing; Thu, 20 Sep 2001 16:50:35 -0700 Received: from smtp.alacritech.com (smtp.alacritech.com [209.10.208.82]) by oss.sgi.com (8.11.2/8.11.3) with SMTP id f8KNoXe06746 for ; Thu, 20 Sep 2001 16:50:34 -0700 Received: from alacritech.com (lambda.alacritech.com [10.1.1.32]) by smtp.alacritech.com (8.11.0/8.11.0) with ESMTP id f8KNiYL02791; Thu, 20 Sep 2001 16:44:34 -0700 Message-ID: <3BAA819A.C472D192@alacritech.com> Date: Thu, 20 Sep 2001 16:54:02 -0700 From: "Matt D. Robinson" Organization: Alacritech, Inc.
X-Mailer: Mozilla 4.76 [en] (X11; U; Linux 2.4.2-2smp i686) X-Accept-Language: en MIME-Version: 1.0 To: Michael Holzheu CC: lkcd@oss.sgi.com, Michael Geselbracht Subject: Re: new field in task command References: Content-Type: text/plain; charset=us-ascii Content-Transfer-Encoding: 7bit Sender: owner-lkcd@oss.sgi.com Precedence: bulk Michael Holzheu wrote: > > Hi, > > Would have anybody here a problem, if we add a new field "CPU" > to the output of the "task" command of lcrash. This field should > show the cpu number of the actual running task > > When developing the GUI (QLCRASH) for lcrash we found that it > would be very usefull to have this information. The state field alone is > not sufficient to identify running tasks since a "0" does only mean > "runnable" This looks like a great addition, Michael. No problems here. --Matt From owner-lkcd@oss.sgi.com Thu Sep 20 22:43:29 2001 Received: (from majordomo@localhost) by oss.sgi.com (8.11.2/8.11.3) id f8L5hTA12755 for lkcd-outgoing; Thu, 20 Sep 2001 22:43:29 -0700 Received: from e32.bld.us.ibm.com (e32.co.us.ibm.com [32.97.110.130]) by oss.sgi.com (8.11.2/8.11.3) with SMTP id f8L5h4e12688 for ; Thu, 20 Sep 2001 22:43:04 -0700 Received: from westrelay03.boulder.ibm.com (westrelay03.boulder.ibm.com [9.99.140.24]) by e32.bld.us.ibm.com (8.9.3/8.9.3) with ESMTP id BAA115618 for ; Fri, 21 Sep 2001 01:39:52 -0400 Received: from bharata.in.ibm.com (bharata.in.ibm.com [9.186.133.24]) by westrelay03.boulder.ibm.com (8.11.1m3/NCO v4.98) with ESMTP id f8L5eeq247970 for ; Thu, 20 Sep 2001 23:40:41 -0600 Received: (from bharata@localhost) by bharata.in.ibm.com (8.11.2/8.11.2) id f8L5fb304645 for lkcd@oss.sgi.com; Fri, 21 Sep 2001 11:11:37 +0530 Date: Fri, 21 Sep 2001 11:11:37 +0530 From: Bharata B Rao To: lkcd@oss.sgi.com Subject: [PATCH] Non-disruptive dump/SMP fixes Message-ID: <20010921111137.B4554@in.ibm.com> Reply-To: bharata@in.ibm.com Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: 
inline User-Agent: Mutt/1.2.5i Sender: owner-lkcd@oss.sgi.com Precedence: bulk

Given below are patches for the files we modified to provide non-disruptive dump support and some SMP fixes. These patches are against the files taken from the SourceForge CVS of lkcd.

Here is the description of the changes.

Instead of stopping the other CPUs, send a CALL_FUNCTION_VECTOR IPI to make them spin while the dump is in progress. Before freezing the other CPUs, the affinity of every IRQ in the IRQ descriptor table is changed to ensure that all subsequent IOAPIC interrupts are sent to the dumping CPU. This logic is now common for both panic dumps and non-disruptive dumps. Once the dump completes, the CPUs are released and IRQ affinities are restored to their original settings. In the panic dump case machine_restart takes care of true shutdown of the CPUs.

Special case: If another CPU is handling an IRQ when the IPI comes in, avoid spinning inside the IPI handler. Instead, it's safer to make it spin inside schedule on the way up. This avoids possibilities of deadlock where a softirq executing on the dumping CPU attempts a global cli.

Because the other CPUs could be spinning inside the IPI handler with interrupts disabled, NMI watchdog checks are ignored on the non-dumping CPUs.

Tried this on a 4-way SMP for non-disruptive dumping as well as panic dumps from task-time context.

TODO

1. Dumping from interrupt context doesn't work as yet. One of the approaches we are experimenting with is to use a separate kernel thread for the dump i/o.
2. We currently divert all interrupts to the dumping CPU. May consider merits of disabling all but the IRQs necessary for functioning of the dump i/o path.
3. Explore ways to avoid the scheduler hook overheads when dump is not active.
4. Since we stop scheduling altogether we could face problems with drivers that depend on worker threads in order to service i/o. Need to investigate this further.
5. For disruptive dumps, avoid deadlocks while dumping if another CPU doesn't respond to the IPI (Refer to Tony Dziedzic's earlier note)

PATCHES

drivers/block/dump.c

--- /home/bharata/lkcd_cvs/2.4/drivers/block/dump.c	Thu Sep 13 12:19:10 2001
+++ dump.c	Fri Sep 21 09:38:03 2001
@@ -145,7 +145,7 @@
  * are lots of possibilities. This is a BITMASK value, not an index.
  *
  * 1: Try to keep the system running _after_ we are done
- *    dumping -- for non-disruptive dumps. (DUMP_NONDISRUPT)
+ *    dumping -- for non-disruptive dumps. (DUMP_FLAGS_NONDISRUPT)
  *
  * -----------------------------------------------------------------------
  */
@@ -212,7 +212,8 @@
 void *dump_page_buf; /* dump page buffer for memcpy()! */
 int dump_sector_size; /* sector size for dump_device */
 int dump_sector_bits; /* sector bits for dump_device */
-int dump_in_progress = FALSE; /* when we're dumping, we're dumping */
+volatile int dump_in_progress = FALSE; /* when we're dumping, we're dumping */
+volatile int dumping_cpu = 0; /* cpu on which dump is progressing */
 dump_header_t dump_header; /* the primary dump header */
 dump_header_asm_t dump_header_asm; /* the arch-specific dump header */
 struct kiobuf *dump_iobuf; /* kiobuf for raw I/O to disk */
@@ -558,15 +559,17 @@
 void
 dump_silence_system(void)
 {
-	/* do what each architecture wants first */
-	__dump_silence_system();
+	int cpu = smp_processor_id();
 	/* we set this to FALSE so we don't ever re-enter this code! */
 	dump_okay = FALSE;
+	dumping_cpu = cpu;
 	dump_in_progress = TRUE;
-	save_flags(dump_save_flags);
-	sti(); /* enable interrupts just in case ... */
+
+	/* do what each architecture wants first */
+	__dump_silence_system();
+
 	return;
 }
@@ -586,12 +589,11 @@
 	dump_okay = TRUE;
 	/* reboot the system if this isn't a disrupted dump */
-	if ((panic_timeout > 0) && (!(dump_level & DUMP_FLAGS_NONDISRUPT))) {
+	if((panic_timeout > 0) && (!(dump_flags & DUMP_FLAGS_NONDISRUPT))) {
 		DUMP_PRINT("Rebooting in %d seconds ...", panic_timeout);
 		mdelay(panic_timeout * 1000);
 		machine_restart(NULL);
 	}
-
 	return;
 }

------------------------------------------------------------------

arch/i386/kernel/dump.c

--- /home/bharata/lkcd_cvs/2.4/arch/i386/kernel/dump.c	Mon Aug 27 12:01:30 2001
+++ dump_asm.c	Fri Sep 21 10:28:59 2001
@@ -24,6 +24,13 @@
 #include
 #include
+#include
+#include
+
+extern volatile int dump_in_progress;
+extern unsigned long irq_affinity[NR_IRQS];
+static int saved_affinity[NR_IRQS];
+
 /*
  * Name: __dump_save_panic_regs()
  * Func: Save the EIP (really the RA). We may pass an argument later.
@@ -99,52 +106,66 @@
 }
 /*
- * Name: dump_stop_this_cpu()
- * Func: Just like stop_this_cpu(), but we don't disable the APIC
- *       and we do the same operations for every CPU but 0. Now
- *       this _might_ change ...
- *
- *       Also, the "hlt" instruction is going to be changed to be
- *       some sort of spinning kernel function so we can resume if
- *       this is a non-disruptive dump.
- */
-static void
-dump_stop_this_cpu(void * dummy)
-{
-#ifdef CONFIG_SMP
-	int cpu;
-
-	/*
-	 * Remove this CPU:
-	 */
-	cpu = smp_processor_id();
-
-	/* don't stop CPU 0, per se, for now ... */
-	if (cpu) {
-		clear_bit(cpu, &cpu_online_map);
-		__cli();
-#if 0
-		/* don't do this for now -- interrupts are scattered */
-		disable_local_APIC();
-#endif
-		if (cpu_data[smp_processor_id()].hlt_works_ok)
-			for(;;) __asm__("hlt");
-		for (;;);
+ * Non dumping cpus will spin here. If a cpu is handling an irq when ipi is
+ * received, we let go of it here while making sure that it hits schedule
+ * on the way up and make it spin there instead.
+ */ +static void dump_spin(void) +{ + if (in_irq()) { + current->need_resched = 1; + } else { + while (dump_in_progress) ; + } + return; +} + +/* + * Routine to save the old irq affinities and change affinities of all irqs to + * the dumping cpu. + */ +static void set_irq_affinity(void) +{ + int i; + int cpu = smp_processor_id(); + for (i = 0; i < NR_IRQS; i++) { + if (irq_desc[i].handler == NULL) + continue; + saved_affinity[i] = irq_affinity[i]; + irq_affinity[i] = 1UL << cpu; + if (irq_desc[i].handler->set_affinity != NULL) + irq_desc[i].handler->set_affinity(i, irq_affinity[i]); + } +} + +/* + * Restore old irq affinities. + */ +static void reset_irq_affinity(void) +{ + int i; + for (i = 0; i < NR_IRQS; i++) { + if (irq_desc[i].handler == NULL) + continue; + irq_affinity[i] = saved_affinity[i]; + if (irq_desc[i].handler->set_affinity != NULL) + irq_desc[i].handler->set_affinity(i, saved_affinity[i]); } -#endif } /* * Name: __dump_silence_system() * Func: Do an architecture-specific silencing of the system. 
+ * - Change irq affinities + * - Wait for other cpus to come out of irq handling + * - Send CALL_FUNCTION_VECTOR ipi to other cpus to put them to spin */ void __dump_silence_system(void) { -#if CONFIG_SMP - smp_call_function(dump_stop_this_cpu, (void *)NULL, 1, 0); - smp_num_cpus = 1; -#endif + set_irq_affinity(); + synchronize_irq(); + smp_call_function(dump_spin, NULL, 0, 0); /* return */ return; @@ -157,6 +178,7 @@ void __dump_resume_system(void) { + reset_irq_affinity(); /* return */ return; } --------------------------------------------------------------- kernel/sched.c --- /home/bharata/lkcd_cvs/2.4/kernel/sched.c Mon Aug 27 12:01:31 2001 +++ sched.c Fri Sep 21 10:51:38 2001 @@ -529,8 +529,9 @@ */ asmlinkage void schedule(void) { -#if defined(CONFIG_DUMP) - extern int dump_in_progress; +#if defined(CONFIG_DUMP) || defined(CONFIG_DUMP_MODULE) + extern int volatile dump_in_progress; + extern int volatile dumping_cpu; #endif struct schedule_data * sched_data; struct task_struct *prev, *next, *p; @@ -542,7 +543,7 @@ prev = current; this_cpu = prev->processor; -#if defined(CONFIG_DUMP) +#if defined(CONFIG_DUMP) || defined(CONFIG_DUMP_MODULE) if (dump_in_progress) { goto dump_scheduling_disabled; } @@ -713,8 +714,15 @@ scheduling_in_interrupt: printk("Scheduling in interrupt\n"); BUG(); -#if defined(CONFIG_DUMP) +#if defined(CONFIG_DUMP) || defined(CONFIG_DUMP_MODULE) dump_scheduling_disabled: + /* + * If this is not the dumping cpu, then spin right here + * till the dump is complete + */ + if (this_cpu != dumping_cpu) { + while (dump_in_progress); + } #endif return; } ------------------------------------------------------------------ arch/i386/kernel/traps.c --- traps.c.orig Thu Sep 20 18:55:53 2001 +++ traps.c Thu Sep 20 18:45:55 2001 @@ -19,6 +19,7 @@ #include #include #include +#include #include #include #include @@ -56,6 +57,10 @@ struct desc_struct default_ldt[] = { { 0, 0 }, { 0, 0 }, { 0, 0 }, { 0, 0 }, { 0, 0 } }; +#if defined(CONFIG_DUMP) || 
defined(CONFIG_DUMP_MODULE) +extern void (*dump_function_ptr)(char *, struct pt_regs *); +#endif + /* * The IDT has to be page-aligned to simplify the Pentium * F0 0F bug workaround.. We have a special link segment @@ -220,7 +225,11 @@ spin_lock_irq(&die_lock); printk("%s: %04lx\n", str, err & 0xffff); show_registers(regs); - +#if defined(CONFIG_DUMP) || defined(CONFIG_DUMP_MODULE) + if (dump_function_ptr) { + dump_function_ptr((char *)str, regs); + } +#endif spin_unlock_irq(&die_lock); do_exit(SIGSEGV); } @@ -406,6 +415,11 @@ static spinlock_t nmi_print_lock = SPIN_LOCK_UNLOCKED; +#ifdef CONFIG_DUMP +extern volatile int dump_in_progress; +extern volatile int dumping_cpu; +#endif + inline void nmi_watchdog_tick(struct pt_regs * regs) { /* @@ -432,6 +446,13 @@ */ int sum, cpu = smp_processor_id(); +#ifdef CONFIG_DUMP + /* + * Ignore watchdog when dumping is in progress. + * Todo: consider using the touch_nmi_watchdog() approach instead + */ + if (dump_in_progress && cpu != dumping_cpu) return; +#endif sum = apic_timer_irqs[cpu]; if (last_irq_sums[cpu] == sum) { @@ -450,6 +471,12 @@ printk("NMI Watchdog detected LOCKUP on CPU%d, registers:\n", cpu); show_registers(regs); printk("console shuts up ...\n"); +#if defined(CONFIG_DUMP) || defined(CONFIG_DUMP_MODULE) + if (dump_function_ptr) { + dump_function_ptr("NMI Watchdog Detected", + regs); + } +#endif console_silent(); spin_unlock(&nmi_print_lock); do_exit(SIGSEGV); ------------------------------------------------------------ Regards, Crashdump Team, India. -- Bharata B Rao, IBM Linux Technology Center, IBM Software Lab, Bangalore. 
Ph: 91-80-5262355 Ex: 3962
Mail: bharata@in.ibm.com

From owner-lkcd@oss.sgi.com Fri Sep 21 00:05:30 2001 Received: (from majordomo@localhost) by oss.sgi.com (8.11.2/8.11.3) id f8L75UX14489 for lkcd-outgoing; Fri, 21 Sep 2001 00:05:30 -0700 Received: from nakedeye.aparity.com (w032.z064001165.sjc-ca.dsl.cnc.net [64.1.165.32]) by oss.sgi.com (8.11.2/8.11.3) with SMTP id f8L753e14476 for ; Fri, 21 Sep 2001 00:05:03 -0700 Received: from alacritech.com (w032.z064001165.sjc-ca.dsl.cnc.net [64.1.165.32]) by nakedeye.aparity.com (8.11.2/8.11.2) with ESMTP id f8L79Ot00941; Fri, 21 Sep 2001 00:09:24 -0700 Message-ID: <3BAAE5E3.D122DD35@alacritech.com> Date: Fri, 21 Sep 2001 00:01:55 -0700 From: "Matt D. Robinson" Organization: Alacritech, Inc. X-Mailer: Mozilla 4.78 [en] (Windows NT 5.0; U) X-Accept-Language: en MIME-Version: 1.0 To: bharata@in.ibm.com CC: lkcd@oss.sgi.com Subject: Re: [PATCH] Non-disruptive dump/SMP fixes References: <20010921111137.B4554@in.ibm.com> Content-Type: text/plain; charset=us-ascii Content-Transfer-Encoding: 7bit Sender: owner-lkcd@oss.sgi.com Precedence: bulk

Hi, Bharata.

The patch looks good. Thanks first and foremost to your team for all the hard work. I noticed the patch also fixes a few bugs which shouldn't have been in the code, but were.

The only thing I would change is based on our conversation the other night. After thinking about the __cpu_disable() option, would it be faster to just memcpy() the irq_affinity[] into saved_affinity[], then just walk the loop? I don't think speed is a concern, but if you're saving 'em all anyway ...

Thanks again, this looks great. If you haven't already (I didn't see it in the tree) go ahead and check this in directly to the 2.4 branch.

--Matt

Bharata B Rao wrote:
>
> Given below are patches for different files which are modified by us to
> provide non-disruptive dump support and some SMP fixes. These patches are
> against the files taken from sourceforge cvs of lkcd.
>
> Ph: 91-80-5262355 Ex: 3962
> Mail: bharata@in.ibm.com

From owner-lkcd@oss.sgi.com Fri Sep 21 11:48:07 2001 Received: (from majordomo@localhost) by oss.sgi.com (8.11.2/8.11.3) id f8LIm7r30814 for lkcd-outgoing; Fri, 21 Sep 2001 11:48:07 -0700 Received: from aprilia.amazon.com (aprilia.amazon.com [209.191.164.156]) by oss.sgi.com (8.11.2/8.11.3) with SMTP id f8LIlxe30810 for ; Fri, 21 Sep 2001 11:47:59 -0700 Received: from kawasaki.amazon.com (kawasaki.amazon.com [10.16.42.209]) by aprilia.amazon.com (Postfix) with ESMTP id 8E9A053C for ; Fri, 21 Sep 2001 11:47:59 -0700 (PDT) Received: from AMZN097255X (us1-dhcp-134-56.amazon.com [10.21.134.56]) by kawasaki.amazon.com (Postfix) with SMTP id 6840748064 for ; Fri, 21 Sep 2001 11:47:58 -0700 (PDT) From: "Monty Vanderbilt" To: Subject: lkcd_config typos & warnings Date: Fri, 21 Sep 2001 11:47:57 -0700 Message-ID: MIME-Version: 1.0 Content-Type: text/plain; charset="iso-8859-1" Content-Transfer-Encoding: 7bit X-Priority: 3 (Normal) X-MSMail-Priority: Normal X-Mailer: Microsoft Outlook IMO, Build 9.0.2416 (9.0.2911.0) Importance: Normal X-Mimeole: Produced By Microsoft MimeOLE V5.50.4133.2400 Sender: owner-lkcd@oss.sgi.com Precedence: bulk

The patch below fixes some trivial comment typos and warnings from the lack of an ioctl prototype. Do others routinely ignore these warnings or is my compiler setup just pickier than normal?

--- lkcdutils/lkcd_config/lkcd_config.c Wed Sep 12 23:30:28 2001
+++ lkcdutils.fix/lkcd_config/lkcd_config.c Mon Sep 17 12:26:15 2001
@@ -20,8 +20,10 @@
 #define DUMP_DEVICE "/dev/dump"
 #define DUMP_TRUE 1
 #define DUMP_FALSE 0

+int ioctl(int dev, unsigned int cmd, caddr_t arg);
+
 /*
  * Name: main()
  * Func: A quick-and-dirty dump configuration tool. Should suffice
  * for the time being.
@@ -116,37 +118,37 @@
 		perror("open of dump device");
 		return (dfd);
 	}

-	/* set dump compression */
-	if ((err = ioctl(dfd, DIOGDUMPCOMPRESS, &compress)) < 0) {
+	/* get dump compression */
+	if ((err = ioctl(dfd, DIOGDUMPCOMPRESS, (caddr_t)&compress)) < 0) {
 		perror("ioctl() query for dump compression failed");
 		close(dfd);
 		return (err);
 	}

-	/* set dump flags */
-	if ((err = ioctl(dfd, DIOGDUMPFLAGS, &flags)) < 0) {
+	/* get dump flags */
+	if ((err = ioctl(dfd, DIOGDUMPFLAGS, (caddr_t)&flags)) < 0) {
 		perror("ioctl() query for dump flags failed");
 		close(dfd);
 		return (err);
 	}

-	/* set dump level */
-	if ((err = ioctl(dfd, DIOGDUMPLEVEL, &level)) < 0) {
+	/* get dump level */
+	if ((err = ioctl(dfd, DIOGDUMPLEVEL, (caddr_t)&level)) < 0) {
 		perror("ioctl() query for dump level failed");
 		close(dfd);
 		return (err);
 	}

-	/* set device to dump to (if specified) */
-	if ((err = ioctl(dfd, DIOGDUMPDEV, &dnum)) < 0) {
+	/* get device to dump to (if specified) */
+	if ((err = ioctl(dfd, DIOGDUMPDEV, (caddr_t)&dnum)) < 0) {
 		perror("ioctl() for dump device failed");
 		close(dfd);
 		return (err);
 	}

-	printf(" Configured dump device: 0x%x\n", dnum);
+	printf(" Configured dump device: 0x%x\n", (int)dnum);

 	memset(tbuf, 0, 1024);
 	if (flags == DUMP_FLAGS_NONE) {
 		strcat(tbuf, "DUMP_FLAGS_NONE|");
@@ -239,36 +241,36 @@
 	}

 	/* set dump compression */
 	if (compress_set == DUMP_TRUE) {
-		if ((err = ioctl(dfd, DIOSDUMPCOMPRESS, compress)) < 0) {
+		if ((err = ioctl(dfd, DIOSDUMPCOMPRESS, (caddr_t)compress)) < 0) {
 			perror("ioctl() for dump compression failed");
 			close(dfd);
 			return (err);
 		}
 	}

 	/* set dump flags */
 	if (flags_set == DUMP_TRUE) {
-		if ((err = ioctl(dfd, DIOSDUMPFLAGS, flags)) < 0) {
+		if ((err = ioctl(dfd, DIOSDUMPFLAGS, (caddr_t)flags)) < 0) {
 			perror("ioctl() for dump flags failed");
 			close(dfd);
 			return (err);
 		}
 	}

 	/* set dump level */
 	if (level_set == DUMP_TRUE) {
-		if ((err = ioctl(dfd, DIOSDUMPLEVEL, level)) < 0) {
+		if ((err = ioctl(dfd, DIOSDUMPLEVEL, (caddr_t)level)) < 0) {
 			perror("ioctl() for dump level failed");
 			close(dfd);
 			return (err);
 		}
 	}

 	/* set device to dump to (if specified) */
 	if (dnum != (dev_t)0) {
-		if ((err = ioctl(dfd, DIOSDUMPDEV, dnum)) < 0) {
+		if ((err = ioctl(dfd, DIOSDUMPDEV, (caddr_t)dnum)) < 0) {
 			perror("ioctl() for dump device failed");
 			close(dfd);
 			return (err);
 		}

From owner-lkcd@oss.sgi.com Fri Sep 21 11:49:50 2001 Received: (from majordomo@localhost) by oss.sgi.com (8.11.2/8.11.3) id f8LInoR30877 for lkcd-outgoing; Fri, 21 Sep 2001 11:49:50 -0700 Received: from ducati.amazon.com (ducati.amazon.com [209.191.164.152]) by oss.sgi.com (8.11.2/8.11.3) with SMTP id f8LInhe30871 for ; Fri, 21 Sep 2001 11:49:43 -0700 Received: from kawasaki.amazon.com (kawasaki.amazon.com [10.16.42.209]) by ducati.amazon.com (Postfix) with ESMTP id 5EAB0345 for ; Fri, 21 Sep 2001 11:49:43 -0700 (PDT) Received: from AMZN097255X (us1-dhcp-134-56.amazon.com [10.21.134.56]) by kawasaki.amazon.com (Postfix) with SMTP id 2BE2A48063 for ; Fri, 21 Sep 2001 11:49:43 -0700 (PDT) From: "Monty Vanderbilt" To: Subject: FW: drivers/block/dump.c fixes Date: Fri, 21 Sep 2001 11:49:43 -0700 Message-ID: MIME-Version: 1.0 Content-Type: text/plain; charset="iso-8859-1" Content-Transfer-Encoding: 7bit X-Priority: 3 (Normal) X-MSMail-Priority: Normal X-Mailer: Microsoft Outlook IMO, Build 9.0.2416 (9.0.2911.0) Importance: Normal X-Mimeole: Produced By Microsoft MimeOLE V5.50.4133.2400 Sender: owner-lkcd@oss.sgi.com Precedence: bulk

resending - mistakenly used the owner-lkcd address in the previous attempt ...

I ran into a few problems with drivers/block/dump.c when compiling it into the kernel.

1) dump_in_progress is initialized both in drivers/block/dump.c and kernel/panic.c. I think the reference in /block/dump.c should be declared as an extern.
2) RDWR permission check is not required for query ioctls.
3) The calls to dump_compress_rle_init (and dump_compress_gzip_init, if it was usable) in dump.c result in double registration and a corrupted dump_compress_list. This happens because module_init in dump_rle.c becomes either an __initcall() or an init_module() and is automatically called by the kernel whether it's configured as a module or not.

Here's a patch with fixes for the above problems

--- 2.4/drivers/block/dump.c Mon Sep 17 12:00:14 2001
+++ 2.4fix/linux/drivers/block/dump.c Fri Sep 21 11:07:59 2001
@@ -211,9 +211,9 @@
 char dumpdev_name[PATH_MAX]; /* the name of the dump device */
 void *dump_page_buf; /* dump page buffer for memcpy()! */
 int dump_sector_size; /* sector size for dump_device */
 int dump_sector_bits; /* sector bits for dump_device */
-int dump_in_progress = FALSE; /* when we're dumping, we're dumping */
+extern int dump_in_progress; /* when we're dumping, we're dumping */
 dump_header_t dump_header; /* the primary dump header */
 dump_header_asm_t dump_header_asm; /* the arch-specific dump header */
 struct kiobuf *dump_iobuf; /* kiobuf for raw I/O to disk */
 loff_t dump_fpos; /* the offset in the output device */
@@ -1132,11 +1132,17 @@
 	if (!capable(CAP_SYS_ADMIN)) {
 		return (-EPERM);
 	}

-	/* check flags */
-	if (!(f->f_flags & O_RDWR)) {
-		return (-EPERM);
+	switch (cmd) {
+	case DIOSDUMPDEV:
+	case DIOSDUMPLEVEL:
+	case DIOSDUMPFLAGS:
+	case DIOSDUMPCOMPRESS:
+		/* check flags */
+		if (!(f->f_flags & O_RDWR)) {
+			return (-EPERM);
+		}
 	}

 	/*
 	 * This is the main mechanism for controlling get/set data
@@ -1325,16 +1331,8 @@
 	/* set the dump_compression_list structure up */
 	dump_compress = DUMP_COMPRESS_NONE;
 	dump_compress_func = dump_compress_none;
 	dump_register_compression(&dump_none_compression);
-
-#if CONFIG_DUMP_COMPRESS_RLE
-	(void)dump_compress_rle_init();
-#endif
-
-#if CONFIG_DUMP_COMPRESS_GZIP
-	(void)dump_compress_gzip_init();
-#endif

 	/* initialize the dump flags, dump level and dump_compress fields */
 	dump_flags = DUMP_FLAGS_NONE;
 	dump_level = DUMP_LEVEL_ALL;

Monty VanderBilt
mvb@amazon.com

From owner-lkcd@oss.sgi.com Sat Sep 22 22:54:08 2001 Received: (from majordomo@localhost) by oss.sgi.com (8.11.2/8.11.3) id f8N5s8G25736 for lkcd-outgoing; Sat, 22 Sep 2001 22:54:08 -0700 Received: from nakedeye.aparity.com (w032.z064001165.sjc-ca.dsl.cnc.net [64.1.165.32]) by oss.sgi.com (8.11.2/8.11.3) with SMTP id f8N5s5e25733 for ; Sat, 22 Sep 2001 22:54:05 -0700 Received: from localhost (yakker@localhost) by nakedeye.aparity.com (8.11.2/8.11.2) with ESMTP id f8N5wZi04638; Sat, 22 Sep 2001 22:58:35 -0700 Date: Sat, 22 Sep 2001 22:58:35 -0700 (PDT) From: "Matt D. Robinson" To: Monty Vanderbilt cc: Subject: Re: lkcd_config typos & warnings In-Reply-To: Message-ID: MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII Sender: owner-lkcd@oss.sgi.com Precedence: bulk

My compiler didn't complain, but it's good to clean it up regardless. I've also applied the dump.c fixes, as this is similar (as far as dump_in_progress is concerned) to what Bharata saw. Anyway, both are checked in now:

Checking in fixes to dump.c and lkcd_config.c based on Monty Vanderbilt's review of the code. There were a few good problems to fix, both in comments, ioctl() variables, and in dump.c where compression module(s) were being initialized twice.
CVS: ----------------------------------------------------------------------
CVS: Enter Log. Lines beginning with `CVS:' are removed automatically
CVS:
CVS: Committing in .
CVS:
CVS: Modified Files:
CVS: 2.4/drivers/block/dump.c lkcdutils/lkcd_config/lkcd_config.c
CVS: ----------------------------------------------------------------------

--Matt

On Fri, 21 Sep 2001, Monty Vanderbilt wrote:
|>The patch below fixes some trivial comment typos and warnings from the lack
|>of an ioctl prototype. Do others routinely ignore these warnings or is my
|>compiler setup just pickier than normal?
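
On the missing-prototype warning discussed in this exchange: the following is a minimal sketch, not taken from lkcd_config.c, of the more usual remedy, which is to include <sys/ioctl.h> rather than declare ioctl() by hand. Because /dev/dump and the DIOGDUMP* commands exist only on an LKCD-patched kernel, the sketch assumes a Linux system and uses FIONREAD on a pipe as a stand-in "query" ioctl; pipe_bytes_pending() is a hypothetical helper name. Note that glibc declares ioctl() as int ioctl(int, unsigned long, ...), which is not the same signature as the hand-written prototype in the patch.

```c
#include <sys/ioctl.h>  /* declares ioctl(); on Linux also provides FIONREAD */
#include <unistd.h>     /* pipe(), write(), close(), size_t, ssize_t */

/*
 * Hypothetical helper: write len bytes of msg into a fresh pipe and ask
 * the kernel, via a query ioctl, how many bytes are waiting to be read.
 * Returns the pending byte count, or -1 on any failure.
 */
int pipe_bytes_pending(const char *msg, size_t len)
{
	int fds[2];
	int pending = -1;

	if (pipe(fds) < 0)
		return -1;

	if (write(fds[1], msg, len) == (ssize_t)len) {
		/* query-style ioctl: the third argument is a pointer to the result */
		if (ioctl(fds[0], FIONREAD, &pending) < 0)
			pending = -1;
	}

	close(fds[0]);
	close(fds[1]);
	return pending;
}
```

With the header included, the call compiles without an implicit-declaration warning, and the result pointer can be passed without a caddr_t cast because the prototype is variadic.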
From owner-lkcd@oss.sgi.com Sat Sep 22 23:42:36 2001 Received: (from majordomo@localhost) by oss.sgi.com (8.11.2/8.11.3) id f8N6gap26432 for lkcd-outgoing; Sat, 22 Sep 2001 23:42:36 -0700 Received: from e32.bld.us.ibm.com (e32.co.us.ibm.com [32.97.110.130]) by oss.sgi.com (8.11.2/8.11.3) with SMTP id f8N6gWe26429 for ; Sat, 22 Sep 2001 23:42:32 -0700 Received: from westrelay03.boulder.ibm.com (westrelay03.boulder.ibm.com [9.99.140.24]) by e32.bld.us.ibm.com (8.9.3/8.9.3) with ESMTP id CAA137260; Sun, 23 Sep 2001 02:40:10 -0400 Received: from bharata.in.ibm.com (bharata.in.ibm.com [9.186.133.24]) by westrelay03.boulder.ibm.com (8.11.1m3/NCO v4.98) with ESMTP id f8N6fAt106576; Sun, 23 Sep 2001 00:41:11 -0600 Received: (from bharata@localhost) by bharata.in.ibm.com (8.11.2/8.11.2) id f8N6feu16521; Sun, 23 Sep 2001 12:11:40 +0530 Date: Sun, 23 Sep 2001 12:11:40 +0530 From: Bharata B Rao To: Monty Vanderbilt Cc: lkcd@oss.sgi.com Subject: Re: FW: drivers/block/dump.c fixes Message-ID: <20010923121140.A16355@in.ibm.com> Reply-To: bharata@in.ibm.com References: Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline User-Agent: Mutt/1.2.5i In-Reply-To: ; from mvb@amazon.com on Fri, Sep 21, 2001 at 11:49:43AM -0700 Sender: owner-lkcd@oss.sgi.com Precedence: bulk On Fri, Sep 21, 2001 at 11:49:43AM -0700, Monty Vanderbilt wrote: > > 2) RDWR permission check is not required for query ioctls. > > --- 2.4/drivers/block/dump.c Mon Sep 17 12:00:14 2001 > +++ 2.4fix/linux/drivers/block/dump.c Fri Sep 21 11:07:59 2001 > @@ -211,9 +211,9 @@ > > - /* check flags */ > - if (!(f->f_flags & O_RDWR)) { > - return (-EPERM); > + switch (cmd) { > + case DIOSDUMPDEV: > + case DIOSDUMPLEVEL: > + case DIOSDUMPFLAGS: > + case DIOSDUMPCOMPRESS: > + /* check flags */ > + if (!(f->f_flags & O_RDWR)) { > + return (-EPERM); > + } > } > Just checked out this file from cvs. 
This is causing "duplicate case value" error for DIOSDUMPDEV, DIOSDUMPLEVEL, DIOSDUMPFLAGS and DIOSDUMPCOMPRESS. Did you not get this error ? > Monty VanderBilt > mvb@amazon.com Regards, Bharata. -- Bharata B Rao, IBM Linux Technology Center, IBM Software Lab, Bangalore. Ph: 91-80-5262355 Ex: 3962 Mail: bharata@in.ibm.com From owner-lkcd@oss.sgi.com Sat Sep 22 23:57:32 2001 Received: (from majordomo@localhost) by oss.sgi.com (8.11.2/8.11.3) id f8N6vWl26601 for lkcd-outgoing; Sat, 22 Sep 2001 23:57:32 -0700 Received: from nakedeye.aparity.com (w032.z064001165.sjc-ca.dsl.cnc.net [64.1.165.32]) by oss.sgi.com (8.11.2/8.11.3) with SMTP id f8N6vSe26597 for ; Sat, 22 Sep 2001 23:57:28 -0700 Received: from localhost (yakker@localhost) by nakedeye.aparity.com (8.11.2/8.11.2) with ESMTP id f8N6tZh04763; Sat, 22 Sep 2001 23:55:35 -0700 Date: Sat, 22 Sep 2001 23:55:35 -0700 (PDT) From: "Matt D. Robinson" To: Bharata B Rao cc: Monty Vanderbilt , Subject: Re: FW: drivers/block/dump.c fixes In-Reply-To: <20010923121140.A16355@in.ibm.com> Message-ID: MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII Sender: owner-lkcd@oss.sgi.com Precedence: bulk On Sun, 23 Sep 2001, Bharata B Rao wrote: |>On Fri, Sep 21, 2001 at 11:49:43AM -0700, Monty Vanderbilt wrote: |>Just checked out this file from cvs. This is causing "duplicate case value" |>error for DIOSDUMPDEV, DIOSDUMPLEVEL, DIOSDUMPFLAGS and DIOSDUMPCOMPRESS. |>Did you not get this error ? Which compiler are you using, Bharata? I think we have an issue of compiler differences. 
--Matt From owner-lkcd@oss.sgi.com Sun Sep 23 00:07:01 2001 Received: (from majordomo@localhost) by oss.sgi.com (8.11.2/8.11.3) id f8N771q26761 for lkcd-outgoing; Sun, 23 Sep 2001 00:07:01 -0700 Received: from e3.ny.us.ibm.com (e3.ny.us.ibm.com [32.97.182.103]) by oss.sgi.com (8.11.2/8.11.3) with SMTP id f8N76xe26757 for ; Sun, 23 Sep 2001 00:06:59 -0700 Received: from northrelay03.pok.ibm.com (northrelay03.pok.ibm.com [9.117.200.23]) by e3.ny.us.ibm.com (8.9.3/8.9.3) with ESMTP id DAA23956; Sun, 23 Sep 2001 03:03:10 -0400 Received: from bharata.in.ibm.com (bharata.in.ibm.com [9.186.133.24]) by northrelay03.pok.ibm.com (8.11.1m3/NCO v4.98) with ESMTP id f8N6w5e80638; Sun, 23 Sep 2001 02:58:06 -0400 Received: (from bharata@localhost) by bharata.in.ibm.com (8.11.2/8.11.2) id f8N74db16585; Sun, 23 Sep 2001 12:34:39 +0530 Date: Sun, 23 Sep 2001 12:34:39 +0530 From: Bharata B Rao To: "Matt D. Robinson" Cc: mvb@amazon.com, lkcd@oss.sgi.com Subject: Re: FW: drivers/block/dump.c fixes Message-ID: <20010923123439.A16535@in.ibm.com> Reply-To: bharata@in.ibm.com References: <20010923121140.A16355@in.ibm.com> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline User-Agent: Mutt/1.2.5i In-Reply-To: ; from yakker@aparity.com on Sat, Sep 22, 2001 at 11:55:35PM -0700 Sender: owner-lkcd@oss.sgi.com Precedence: bulk On Sat, Sep 22, 2001 at 11:55:35PM -0700, Matt D. Robinson wrote: > On Sun, 23 Sep 2001, Bharata B Rao wrote: > |>On Fri, Sep 21, 2001 at 11:49:43AM -0700, Monty Vanderbilt wrote: > |>Just checked out this file from cvs. This is causing "duplicate case value" > |>error for DIOSDUMPDEV, DIOSDUMPLEVEL, DIOSDUMPFLAGS and DIOSDUMPCOMPRESS. > |>Did you not get this error ? > > Which compiler are you using, Bharata? I think we have an issue > of compiler differences. > Matt, I am using gcc version egcs-2.91.66, looks a bit old. Time to upgrade. But any way, 2 cases for the same switch value... isn't this an error ? 
I am finding this as an error even in egcs-2.96 (RH 7.1) also(just tried a sample program, not the dump code). > --Matt Regards, Bharata From owner-lkcd@oss.sgi.com Sun Sep 23 03:06:38 2001 Received: (from majordomo@localhost) by oss.sgi.com (8.11.2/8.11.3) id f8NA6ck29551 for lkcd-outgoing; Sun, 23 Sep 2001 03:06:38 -0700 Received: from nakedeye.aparity.com (w032.z064001165.sjc-ca.dsl.cnc.net [64.1.165.32]) by oss.sgi.com (8.11.2/8.11.3) with SMTP id f8NA6Xe29548 for ; Sun, 23 Sep 2001 03:06:33 -0700 Received: from localhost (yakker@localhost) by nakedeye.aparity.com (8.11.2/8.11.2) with ESMTP id f8N88v605035; Sun, 23 Sep 2001 01:09:02 -0700 Date: Sun, 23 Sep 2001 01:08:57 -0700 (PDT) From: "Matt D. Robinson" To: Bharata B Rao cc: , Subject: Re: FW: drivers/block/dump.c fixes In-Reply-To: <20010923123439.A16535@in.ibm.com> Message-ID: MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII Sender: owner-lkcd@oss.sgi.com Precedence: bulk On Sun, 23 Sep 2001, Bharata B Rao wrote: |>On Sat, Sep 22, 2001 at 11:55:35PM -0700, Matt D. Robinson wrote: |>> On Sun, 23 Sep 2001, Bharata B Rao wrote: |>> |>On Fri, Sep 21, 2001 at 11:49:43AM -0700, Monty Vanderbilt wrote: |>> |>Just checked out this file from cvs. This is causing "duplicate case value" |>> |>error for DIOSDUMPDEV, DIOSDUMPLEVEL, DIOSDUMPFLAGS and DIOSDUMPCOMPRESS. |>> |>Did you not get this error ? |>> |>> Which compiler are you using, Bharata? I think we have an issue |>> of compiler differences. |>> |>Matt, |> |>I am using gcc version egcs-2.91.66, looks a bit old. Time to upgrade. |>But any way, 2 cases for the same switch value... isn't this an error ? |>I am finding this as an error even in egcs-2.96 (RH 7.1) also(just |>tried a sample program, not the dump code). No, it should fall through -- if you don't specify the break, it should fall through to the next case. Either way, I just fixed it in the CVS tree so that we do it under each case, so no compiler complains about it. 
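
[Editorial sketch, not part of the original message.] Stacked case labels that share one body are ordinary fall-through and are legal C. A case value that appears twice inside the same switch is a constraint violation, which is what "duplicate case value" reports; that would explain the diagnostic if the new labels ended up in the same switch as the existing ones (an assumption, since the checked-in file is not shown in this thread). The sketch below keeps the permission check from the patch in its own switch, so each value appears once. The command numbers and EXAMPLE_QUERY_CMD are placeholders, not the real values from include/linux/dump.h.

```c
#include <errno.h>
#include <fcntl.h>

/* Placeholder command numbers: the real DIOS* values are defined in
 * include/linux/dump.h and are not shown in this thread. */
enum {
    DIOSDUMPDEV = 1,
    DIOSDUMPLEVEL,
    DIOSDUMPFLAGS,
    DIOSDUMPCOMPRESS,
    EXAMPLE_QUERY_CMD   /* hypothetical read-only query command */
};

/* Return 0 if cmd may proceed with the given open flags, -EPERM otherwise.
 * Only the "set" commands require the file to be open O_RDWR. */
static int dump_check_perm(unsigned int cmd, int f_flags)
{
    switch (cmd) {
    case DIOSDUMPDEV:       /* stacked labels share the body below */
    case DIOSDUMPLEVEL:
    case DIOSDUMPFLAGS:
    case DIOSDUMPCOMPRESS:
        /* same test as the patch: !(f->f_flags & O_RDWR) */
        if (!(f_flags & O_RDWR))
            return -EPERM;
        break;
    default:                /* query commands need no write access */
        break;
    }
    return 0;
}
```

Because the check lives in a helper with its own switch, the main ioctl switch can list each command once and no compiler has grounds to complain.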
It's easy enough to avoid the compilation case. Bharata, I am checking in all of your fixes, with some changes. You might want to CVS update and get all the latest stuff and let me know if I missed something (note, I'll compile it after checking it in, for this one exception). This includes moving volatile int dumping_cpu and dump_in_progress to kernel/panic.c, and I've also now set it up to pass a "stage" unsigned int into both silence and resume (in case any architecture wants to do global + arch specific silencing or resuming in various stages.) |>Bharata --Matt Checking in a ton of modifications. This may not build as of yet, I'm still walking through and adding Bharata's stuff, modifying Monty's stuff, and doing some general clean-up. CVS: ---------------------------------------------------------------------- CVS: Enter Log. Lines beginning with `CVS:' are removed automatically CVS: CVS: Committing in . CVS: CVS: Modified Files: CVS: 2.4/arch/i386/kernel/dump.c 2.4/arch/i386/kernel/traps.c CVS: 2.4/drivers/block/dump.c 2.4/drivers/block/dump_gzip.c CVS: 2.4/drivers/block/dump_rle.c 2.4/include/asm-alpha/dump.h CVS: 2.4/include/asm-i386/dump.h 2.4/include/asm-ia64/dump.h CVS: 2.4/include/linux/dump.h 2.4/init/kerntypes.c CVS: 2.4/kernel/panic.c 2.4/kernel/sched.c CVS: Added Files: CVS: 2.4/arch/alpha/kernel/dump.c 2.4/arch/ia64/kernel/dump.c CVS: Removed Files: CVS: 2.4/arch/alpha/kernel/vmdump.c 2.4/arch/i386/kernel/vmdump.c CVS: 2.4/arch/ia64/kernel/vmdump.c 2.4/drivers/block/vmdump.c CVS: 2.4/include/asm-alpha/vmdump.h 2.4/include/asm-i386/vmdump.h CVS: 2.4/include/asm-ia64/vmdump.h 2.4/include/linux/vmdump.h CVS: ---------------------------------------------------------------------- From owner-lkcd@oss.sgi.com Mon Sep 24 00:15:49 2001 Received: (from majordomo@localhost) by oss.sgi.com (8.11.2/8.11.3) id f8O7Fni23777 for lkcd-outgoing; Mon, 24 Sep 2001 00:15:49 -0700 Received: from e21.nc.us.ibm.com (e21.nc.us.ibm.com [32.97.136.227]) by oss.sgi.com
(8.11.2/8.11.3) with SMTP id f8O7Fke23774 for ; Mon, 24 Sep 2001 00:15:47 -0700 Received: from southrelay03.raleigh.ibm.com (southrelay03.raleigh.ibm.com [9.37.3.210]) by e21.nc.us.ibm.com (8.9.3/8.9.3) with ESMTP id CAA21964; Mon, 24 Sep 2001 02:13:13 -0500 Received: from bharata.in.ibm.com (bharata.in.ibm.com [9.186.133.24]) by southrelay03.raleigh.ibm.com (8.11.1m3/NCO v4.98) with ESMTP id f8O7FSQ39554; Mon, 24 Sep 2001 03:15:28 -0400 Received: (from bharata@localhost) by bharata.in.ibm.com (8.11.2/8.11.2) id f8O7F4s23943; Mon, 24 Sep 2001 12:45:04 +0530 Date: Mon, 24 Sep 2001 12:45:03 +0530 From: Bharata B Rao To: "Matt D. Robinson" Cc: mvb@amazon.com, lkcd@oss.sgi.com Subject: Re: FW: drivers/block/dump.c fixes Message-ID: <20010924124503.A23889@in.ibm.com> Reply-To: bharata@in.ibm.com References: <20010923123439.A16535@in.ibm.com> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline User-Agent: Mutt/1.2.5i In-Reply-To: ; from yakker@aparity.com on Sun, Sep 23, 2001 at 01:08:57AM -0700 Sender: owner-lkcd@oss.sgi.com Precedence: bulk On Sun, Sep 23, 2001 at 01:08:57AM -0700, Matt D. Robinson wrote: > > Bharata, I am checking in all of your fixes, with some changes. > You might want to CVS update and get all the latest stuff and let > me know if I missed something (note, I'll compile it after checking > it in, for this one exception). This includes moving volatile int > dumping_cpu and dump_in_progress to kernel/panic.c, and I've also > now set it up to pass a "stage" unsigned into into both silence > and resume (in case any architecture wants to do global + arch > specific silencing or resuming in various stages.) > > |>Bharata > Committed drivers/block/dump.c for a few missed things. 1. Removed an extra call to __dump_silence_system(). 2. Temporarily passing 0 as the argument to __dump_silence_system(), leaving it to Matt to fill in his "stage" variable logic. Regards, Bharata.
> --Matt > From owner-lkcd@oss.sgi.com Mon Sep 24 00:16:57 2001 Received: (from majordomo@localhost) by oss.sgi.com (8.11.2/8.11.3) id f8O7GvP23820 for lkcd-outgoing; Mon, 24 Sep 2001 00:16:57 -0700 Received: from nakedeye.aparity.com (w032.z064001165.sjc-ca.dsl.cnc.net [64.1.165.32]) by oss.sgi.com (8.11.2/8.11.3) with SMTP id f8O7Gte23817 for ; Mon, 24 Sep 2001 00:16:55 -0700 Received: from localhost (yakker@localhost) by nakedeye.aparity.com (8.11.2/8.11.2) with ESMTP id f8O7LKP15094; Mon, 24 Sep 2001 00:21:20 -0700 Date: Mon, 24 Sep 2001 00:21:19 -0700 (PDT) From: "Matt D. Robinson" To: Bharata B Rao cc: , Subject: Re: FW: drivers/block/dump.c fixes In-Reply-To: <20010924124503.A23889@in.ibm.com> Message-ID: MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII Sender: owner-lkcd@oss.sgi.com Precedence: bulk On Mon, 24 Sep 2001, Bharata B Rao wrote: |>On Sun, Sep 23, 2001 at 01:08:57AM -0700, Matt D. Robinson wrote: |>Commited drivers/block/dump.c for a few missed things. |>1. An extra call to __dump_silence_system() removed. |>2. Temporarily passing 0 arg to __dump_silence_system(). Leaving this to |>Matt to fill up his "stage" variable logic. Hmmm, hopefully we didn't step on each other, I just checked fixes to both of these 5 minutes ago. :) I'll look at the logs. |>Regards, |>Bharata. |>> --Matt From owner-lkcd@oss.sgi.com Mon Sep 24 10:44:46 2001 Received: (from majordomo@localhost) by oss.sgi.com (8.11.2/8.11.3) id f8OHik104755 for lkcd-outgoing; Mon, 24 Sep 2001 10:44:46 -0700 Received: from smtp.alacritech.com (smtp.alacritech.com [209.10.208.82]) by oss.sgi.com (8.11.2/8.11.3) with SMTP id f8OHiPe04745 for ; Mon, 24 Sep 2001 10:44:25 -0700 Received: from alacritech.com (lambda.alacritech.com [10.1.1.32]) by smtp.alacritech.com (8.11.0/8.11.0) with ESMTP id f8OHcVL29099; Mon, 24 Sep 2001 10:38:31 -0700 Message-ID: <3BAF71D6.E4CD82E@alacritech.com> Date: Mon, 24 Sep 2001 10:48:06 -0700 From: "Matt D. Robinson" Organization: Alacritech, Inc. 
X-Mailer: Mozilla 4.76 [en] (X11; U; Linux 2.4.2-2smp i686) X-Accept-Language: en MIME-Version: 1.0 To: "Matt D. Robinson" CC: Bharata B Rao , mvb@amazon.com, lkcd@oss.sgi.com Subject: Re: FW: drivers/block/dump.c fixes References: Content-Type: text/plain; charset=us-ascii Content-Transfer-Encoding: 7bit Sender: owner-lkcd@oss.sgi.com Precedence: bulk "Matt D. Robinson" wrote: > > On Mon, 24 Sep 2001, Bharata B Rao wrote: > |>On Sun, Sep 23, 2001 at 01:08:57AM -0700, Matt D. Robinson wrote: > |>Commited drivers/block/dump.c for a few missed things. > |>1. An extra call to __dump_silence_system() removed. > |>2. Temporarily passing 0 arg to __dump_silence_system(). Leaving this to > |>Matt to fill up his "stage" variable logic. > > Hmmm, hopefully we didn't step on each other, I just checked > fixes to both of these 5 minutes ago. :) > > I'll look at the logs. > > |>Regards, > |>Bharata. > |>> --Matt Looks like most of the tree moves/updates are done. For those of you working directly out of the CVS tree, you're going to need to back-out any changes related to the previous tree (that includes all modifications in arch/*/kernel, etc.) and then update with all the new files. There's now a drivers/dump directory (drivers/block/dump.c is no longer pertinent, everything is being built in its own directory) where everything now resides. 4.0 is pending now a couple more crash tests. 
--Matt From owner-lkcd@oss.sgi.com Mon Sep 24 23:01:15 2001 Received: (from majordomo@localhost) by oss.sgi.com (8.11.2/8.11.3) id f8P61Fu11870 for lkcd-outgoing; Mon, 24 Sep 2001 23:01:15 -0700 Received: from e21.nc.us.ibm.com (e21.nc.us.ibm.com [32.97.136.227]) by oss.sgi.com (8.11.2/8.11.3) with SMTP id f8P61BD11866 for ; Mon, 24 Sep 2001 23:01:12 -0700 Received: from southrelay03.raleigh.ibm.com (southrelay03.raleigh.ibm.com [9.37.3.210]) by e21.nc.us.ibm.com (8.9.3/8.9.3) with ESMTP id AAA151706; Tue, 25 Sep 2001 00:53:32 -0500 Received: from bharata.in.ibm.com (bharata.in.ibm.com [9.186.133.24]) by southrelay03.raleigh.ibm.com (8.11.1m3/NCO v4.97.1) with ESMTP id f8P5tg1138208; Tue, 25 Sep 2001 01:55:43 -0400 Received: (from bharata@localhost) by bharata.in.ibm.com (8.11.2/8.11.2) id f8P5tCB25962; Tue, 25 Sep 2001 11:25:12 +0530 Date: Tue, 25 Sep 2001 11:25:11 +0530 From: Bharata B Rao To: "Matt D. Robinson" Cc: lkcd@oss.sgi.com Subject: Re: FW: drivers/block/dump.c fixes Message-ID: <20010925112511.A25747@in.ibm.com> Reply-To: bharata@in.ibm.com References: <3BAF71D6.E4CD82E@alacritech.com> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline User-Agent: Mutt/1.2.5i In-Reply-To: <3BAF71D6.E4CD82E@alacritech.com>; from yakker@alacritech.com on Mon, Sep 24, 2001 at 10:48:06AM -0700 Sender: owner-lkcd@oss.sgi.com Precedence: bulk On Mon, Sep 24, 2001 at 10:48:06AM -0700, Matt D. Robinson wrote: > > Hmmm, hopefully we didn't step on each other, I just checked > > fixes to both of these 5 minutes ago. :) > > > > I'll look at the logs. > > > > |>Regards, > > |>Bharata. > > |>> --Matt > Corrected the overlaps in checkins of driver/dump/dump_base.c (originally drivers/block/dump.c). Also updated drivers/dump/dump_i386.c to conform __dump_silence_system() to the new logic of silencing/resuming in stages. Regards, Bharata. > Looks like most of the tree moves/updates are done. 
For those > of you working directly out of the CVS tree, you're going to need > to back-out any changes related to the previous tree (that includes > all modifications in arch/*/kernel, etc.) and then update with all > the new files. > > There's now a drivers/dump directory (drivers/block/dump.c is > no longer pertinent, everything is being built in its own > directory) where everything now resides. > > 4.0 is pending now a couple more crash tests. > > --Matt From owner-lkcd@oss.sgi.com Mon Sep 24 23:56:10 2001 Received: (from majordomo@localhost) by oss.sgi.com (8.11.2/8.11.3) id f8P6uAj13147 for lkcd-outgoing; Mon, 24 Sep 2001 23:56:10 -0700 Received: from nakedeye.aparity.com (w032.z064001165.sjc-ca.dsl.cnc.net [64.1.165.32]) by oss.sgi.com (8.11.2/8.11.3) with SMTP id f8P6u7D13144 for ; Mon, 24 Sep 2001 23:56:07 -0700 Received: from alacritech.com (w032.z064001165.sjc-ca.dsl.cnc.net [64.1.165.32]) by nakedeye.aparity.com (8.11.2/8.11.2) with ESMTP id f8P6xKI08445; Mon, 24 Sep 2001 23:59:20 -0700 Message-ID: <3BB02979.D3C82033@alacritech.com> Date: Mon, 24 Sep 2001 23:51:37 -0700 From: "Matt D. Robinson" Organization: Alacritech, Inc. X-Mailer: Mozilla 4.78 [en] (Windows NT 5.0; U) X-Accept-Language: en MIME-Version: 1.0 To: bharata@in.ibm.com CC: lkcd@oss.sgi.com Subject: Re: FW: drivers/block/dump.c fixes References: <3BAF71D6.E4CD82E@alacritech.com> <20010925112511.A25747@in.ibm.com> Content-Type: text/plain; charset=us-ascii Content-Transfer-Encoding: 7bit Sender: owner-lkcd@oss.sgi.com Precedence: bulk Bharata B Rao wrote: > > On Mon, Sep 24, 2001 at 10:48:06AM -0700, Matt D. Robinson wrote: > > > Hmmm, hopefully we didn't step on each other, I just checked > > > fixes to both of these 5 minutes ago. :) > > > > > > I'll look at the logs. > > > > > > |>Regards, > > > |>Bharata. > > > |>> --Matt > > > > Corrected the overlaps in checkins of driver/dump/dump_base.c (originally > drivers/block/dump.c). 
> Also updated drivers/dump/dump_i386.c to conform __dump_silence_system() to > the new logic of silencing/resuming in stages. Any problems with the new driver layout/builds so far? I haven't encountered any (yet). The tree looks somewhat sane for once. :) --Matt > Regards, > Bharata. > > > Looks like most of the tree moves/updates are done. For those > > of you working directly out of the CVS tree, you're going to need > > to back-out any changes related to the previous tree (that includes > > all modifications in arch/*/kernel, etc.) and then update with all > > the new files. > > > > There's now a drivers/dump directory (drivers/block/dump.c is > > no longer pertinent, everything is being built in its own > > directory) where everything now resides. > > > > 4.0 is pending now a couple more crash tests. > > > > --Matt From owner-lkcd@oss.sgi.com Tue Sep 25 00:07:03 2001 Received: (from majordomo@localhost) by oss.sgi.com (8.11.2/8.11.3) id f8P773w13513 for lkcd-outgoing; Tue, 25 Sep 2001 00:07:03 -0700 Received: from e31.bld.us.ibm.com (e31.co.us.ibm.com [32.97.110.129]) by oss.sgi.com (8.11.2/8.11.3) with SMTP id f8P772D13510 for ; Tue, 25 Sep 2001 00:07:02 -0700 Received: from westrelay03.boulder.ibm.com (westrelay03.boulder.ibm.com [9.99.140.24]) by e31.bld.us.ibm.com (8.9.3/8.9.3) with ESMTP id DAA23274; Tue, 25 Sep 2001 03:04:39 -0400 Received: from bharata.in.ibm.com (bharata.in.ibm.com [9.186.133.24]) by westrelay03.boulder.ibm.com (8.11.1m3/NCO v4.97.1) with ESMTP id f8P76t661400; Tue, 25 Sep 2001 01:06:56 -0600 Received: (from bharata@localhost) by bharata.in.ibm.com (8.11.2/8.11.2) id f8P76wK26041; Tue, 25 Sep 2001 12:36:58 +0530 Date: Tue, 25 Sep 2001 12:36:58 +0530 From: Bharata B Rao To: "Matt D. 
Robinson" Cc: lkcd@oss.sgi.com Subject: Re: FW: drivers/block/dump.c fixes Message-ID: <20010925123657.A26037@in.ibm.com> Reply-To: bharata@in.ibm.com References: <3BAF71D6.E4CD82E@alacritech.com> <20010925112511.A25747@in.ibm.com> <3BB02979.D3C82033@alacritech.com> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline User-Agent: Mutt/1.2.5i In-Reply-To: <3BB02979.D3C82033@alacritech.com>; from yakker@alacritech.com on Mon, Sep 24, 2001 at 11:51:37PM -0700 Sender: owner-lkcd@oss.sgi.com Precedence: bulk On Mon, Sep 24, 2001 at 11:51:37PM -0700, Matt D. Robinson wrote: > > Any problems with the new driver layout/builds so far? I haven't > encountered any (yet). > > The tree looks somewhat sane for once. :) > Yes, it is compiling, building and working without any problems so far. Regards, Bharata. > --Matt > From owner-lkcd@oss.sgi.com Tue Sep 25 00:19:19 2001 Received: (from majordomo@localhost) by oss.sgi.com (8.11.2/8.11.3) id f8P7JJF13802 for lkcd-outgoing; Tue, 25 Sep 2001 00:19:19 -0700 Received: from e21.nc.us.ibm.com (e21.nc.us.ibm.com [32.97.136.227]) by oss.sgi.com (8.11.2/8.11.3) with SMTP id f8P7IlD13797 for ; Tue, 25 Sep 2001 00:18:47 -0700 Received: from southrelay03.raleigh.ibm.com (southrelay03.raleigh.ibm.com [9.37.3.210]) by e21.nc.us.ibm.com (8.9.3/8.9.3) with ESMTP id CAA177866; Tue, 25 Sep 2001 02:11:15 -0500 Received: from maze.in.ibm.com (maze.in.ibm.com [9.186.135.29]) by southrelay03.raleigh.ibm.com (8.11.1m3/NCO v4.97.1) with ESMTP id f8P7DT1248126; Tue, 25 Sep 2001 03:13:30 -0400 Received: (from vamsi@localhost) by maze.in.ibm.com (8.11.2/8.11.2) id f8P7JrQ22550; Tue, 25 Sep 2001 12:49:53 +0530 Date: Tue, 25 Sep 2001 12:49:52 +0530 From: "Vamsi Krishna S ."
To: yakker@alacritech.com Cc: lkcd@oss.sgi.com Subject: [patch] further cleanups Message-ID: <20010925124952.A22542@in.ibm.com> Reply-To: vamsi_krishna@in.ibm.com Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline User-Agent: Mutt/1.2.5i Sender: owner-lkcd@oss.sgi.com Precedence: bulk Hello Matt, I would like you to consider applying the following patch to the latest tree. I could checkin these changes if you are okay with them. What we are trying to do here is to: - reduce #ifdef code by coding a static inline dump function - keep all our extern function definitions in include/linux/dump.h - remove the remaining VMDUMP from alpha/ia64 Note: you still need to remove arch/alpha/kernel/dump.c from cvs. -- Regards, Vamsi Krishna S. Linux Technology Center, IBM Software Labs, Bangalore. diff -urN /home/vamsi/lkcd_cvs/2.4/arch/alpha/config.in ./arch/alpha/config.in --- /home/vamsi/lkcd_cvs/2.4/arch/alpha/config.in Fri Jul 6 05:13:04 2001 +++ ./arch/alpha/config.in Tue Sep 25 12:16:01 2001 @@ -361,7 +361,7 @@ fi bool 'Magic SysRq key' CONFIG_MAGIC_SYSRQ -bool 'Support kernel crash dump capabilities' CONFIG_VMDUMP +bool 'Support kernel crash dump capabilities' CONFIG_DUMP bool 'Legacy kernel start address' CONFIG_ALPHA_LEGACY_START_ADDRESS diff -urN /home/vamsi/lkcd_cvs/2.4/arch/alpha/kernel/dump.c ./arch/alpha/kernel/dump.c --- /home/vamsi/lkcd_cvs/2.4/arch/alpha/kernel/dump.c Sun Sep 23 13:51:22 2001 +++ ./arch/alpha/kernel/dump.c Tue Sep 25 12:18:30 2001 @@ -16,7 +16,7 @@ */ #include #include -#include +#include #include /* static variables */ diff -urN /home/vamsi/lkcd_cvs/2.4/arch/alpha/kernel/setup.c ./arch/alpha/kernel/setup.c --- /home/vamsi/lkcd_cvs/2.4/arch/alpha/kernel/setup.c Fri Jul 6 05:13:04 2001 +++ ./arch/alpha/kernel/setup.c Tue Sep 25 12:15:44 2001 @@ -385,7 +385,7 @@ } int -#ifndef CONFIG_VMDUMP +#ifndef CONFIG_DUMP __init #endif page_is_ram(unsigned long pfn) diff -urN 
/home/vamsi/lkcd_cvs/2.4/arch/alpha/kernel/traps.c ./arch/alpha/kernel/traps.c --- /home/vamsi/lkcd_cvs/2.4/arch/alpha/kernel/traps.c Fri Jul 6 05:13:04 2001 +++ ./arch/alpha/kernel/traps.c Tue Sep 25 12:14:26 2001 @@ -14,7 +14,7 @@ #include #include #include -#include +#include #include #include @@ -23,10 +23,6 @@ #include "proto.h" -#ifdef CONFIG_VMDUMP -extern void (*dump_function_ptr)(char *, struct pt_regs *); -#endif - void dik_show_regs(struct pt_regs *regs, unsigned long *r9_15) { @@ -323,15 +319,7 @@ while (1); } current->thread.flags |= (1UL << 63); -#ifdef CONFIG_VMDUMP - dump_execute((char *)str, regs); -#else -#ifdef CONFIG_VMDUMP_MODULE - if (dump_function_ptr) { - dump_function_ptr((char *)str, regs); - } -#endif -#endif + dump((char *)str, regs); do_exit(SIGSEGV); } diff -urN /home/vamsi/lkcd_cvs/2.4/arch/i386/kernel/traps.c ./arch/i386/kernel/traps.c --- /home/vamsi/lkcd_cvs/2.4/arch/i386/kernel/traps.c Sun Sep 23 13:20:01 2001 +++ ./arch/i386/kernel/traps.c Tue Sep 25 12:11:35 2001 @@ -57,12 +57,6 @@ struct desc_struct default_ldt[] = { { 0, 0 }, { 0, 0 }, { 0, 0 }, { 0, 0 }, { 0, 0 } }; -#if defined(CONFIG_DUMP) || defined(CONFIG_DUMP_MODULE) -extern void (*dump_function_ptr)(char *, struct pt_regs *); -extern volatile int dump_in_progress; -extern volatile int dumping_cpu; -#endif - /* * The IDT has to be page-aligned to simplify the Pentium * F0 0F bug workaround.. 
We have a special link segment @@ -227,11 +221,7 @@ spin_lock_irq(&die_lock); printk("%s: %04lx\n", str, err & 0xffff); show_registers(regs); -#if defined(CONFIG_DUMP) || defined(CONFIG_DUMP_MODULE) - if (dump_function_ptr) { - dump_function_ptr((char *)str, regs); - } -#endif + dump(str, regs); spin_unlock_irq(&die_lock); do_exit(SIGSEGV); } @@ -469,11 +459,7 @@ printk("NMI Watchdog detected LOCKUP on CPU%d, registers:\n", cpu); show_registers(regs); printk("console shuts up ...\n"); -#if defined(CONFIG_DUMP) || defined(CONFIG_DUMP_MODULE) - if (dump_function_ptr) { - dump_function_ptr("NMI Watchdog Detected", regs); - } -#endif + dump("NMI Watchdog Detected", regs); console_silent(); spin_unlock(&nmi_print_lock); do_exit(SIGSEGV); diff -urN /home/vamsi/lkcd_cvs/2.4/arch/ia64/config.in ./arch/ia64/config.in --- /home/vamsi/lkcd_cvs/2.4/arch/ia64/config.in Fri Jul 6 05:13:04 2001 +++ ./arch/ia64/config.in Tue Sep 25 12:18:00 2001 @@ -274,7 +274,7 @@ fi bool 'Magic SysRq key' CONFIG_MAGIC_SYSRQ -bool 'Support kernel crash dump capabilities' CONFIG_VMDUMP +bool 'Support kernel crash dump capabilities' CONFIG_DUMP bool 'Early printk support (requires VGA!)' CONFIG_IA64_EARLY_PRINTK bool 'Turn on compare-and-exchange bug checking (slow!)' CONFIG_IA64_DEBUG_CMPXCHG bool 'Turn on irq debug checks (slow!)' CONFIG_IA64_DEBUG_IRQ diff -urN /home/vamsi/lkcd_cvs/2.4/arch/ia64/kernel/smp.c ./arch/ia64/kernel/smp.c --- /home/vamsi/lkcd_cvs/2.4/arch/ia64/kernel/smp.c Fri Jul 6 05:13:04 2001 +++ ./arch/ia64/kernel/smp.c Tue Sep 25 12:16:56 2001 @@ -287,7 +287,7 @@ int i; for (i = 0; i < smp_num_cpus; i++) { -#ifdef CONFIG_VMDUMP +#ifdef CONFIG_DUMP /* avoid shutting down CPU 0 for now ... 
*/ if ((!i) && (op == IPI_CPU_STOP)) continue; #endif diff -urN /home/vamsi/lkcd_cvs/2.4/arch/ia64/kernel/traps.c ./arch/ia64/kernel/traps.c --- /home/vamsi/lkcd_cvs/2.4/arch/ia64/kernel/traps.c Fri Jul 6 05:13:04 2001 +++ ./arch/ia64/kernel/traps.c Tue Sep 25 12:17:37 2001 @@ -37,16 +37,12 @@ #include #include #include -#include +#include #include static fpswa_interface_t *fpswa_interface; -#ifdef CONFIG_VMDUMP -extern void (*dump_function_ptr)(char *, struct pt_regs *); -#endif - void __init trap_init (void) { @@ -72,15 +68,7 @@ printk("%s[%d]: %s %ld\n", current->comm, current->pid, str, err); show_regs(regs); -#ifdef CONFIG_VMDUMP - dump_execute((char *)str, regs); -#else -#ifdef CONFIG_VMDUMP_MODULE - if (dump_function_ptr) { - dump_function_ptr((char *)str, regs); - } -#endif -#endif + dump((char *)str, regs); if (current->thread.flags & IA64_KERNEL_DEATH) { printk("die_if_kernel recursion detected.\n"); diff -urN /home/vamsi/lkcd_cvs/2.4/drivers/dump/dump_alpha.c ./drivers/dump/dump_alpha.c --- /home/vamsi/lkcd_cvs/2.4/drivers/dump/dump_alpha.c Mon Sep 24 15:09:01 2001 +++ ./drivers/dump/dump_alpha.c Tue Sep 25 12:18:43 2001 @@ -16,7 +16,7 @@ */ #include #include -#include +#include #include /* static variables */ diff -urN /home/vamsi/lkcd_cvs/2.4/drivers/dump/dump_base.c ./drivers/dump/dump_base.c --- /home/vamsi/lkcd_cvs/2.4/drivers/dump/dump_base.c Tue Sep 25 11:35:32 2001 +++ ./drivers/dump/dump_base.c Tue Sep 25 12:20:10 2001 @@ -273,9 +273,6 @@ extern void __dump_save_panic_regs(dump_header_asm_t *); #endif -/* dump function pointer used for modules (not yet supported fully) */ -extern void (*dump_function_ptr)(char *, struct pt_regs *); - /* external functions */ extern void si_meminfo(struct sysinfo *); extern void *kmalloc(size_t, int); diff -urN /home/vamsi/lkcd_cvs/2.4/include/asm-alpha/dump.h ./include/asm-alpha/dump.h --- /home/vamsi/lkcd_cvs/2.4/include/asm-alpha/dump.h Mon Sep 24 16:23:11 2001 +++ ./include/asm-alpha/dump.h Tue Sep 25 
12:16:29 2001 @@ -9,8 +9,8 @@ */ /* This header file holds the architecture specific crash dump header */ -#ifndef _ASM_VMDUMP_H -#define _ASM_VMDUMP_H +#ifndef _ASM_DUMP_H +#define _ASM_DUMP_H /* necessary header files */ #include /* for pt_regs */ @@ -56,4 +56,4 @@ }) #endif -#endif /* _ASM_VMDUMP_H */ +#endif /* _ASM_DUMP_H */ diff -urN /home/vamsi/lkcd_cvs/2.4/include/asm-i386/dump.h ./include/asm-i386/dump.h --- /home/vamsi/lkcd_cvs/2.4/include/asm-i386/dump.h Mon Sep 24 16:23:11 2001 +++ ./include/asm-i386/dump.h Tue Sep 25 12:19:05 2001 @@ -9,8 +9,8 @@ */ /* This header file holds the architecture specific crash dump header */ -#ifndef _ASM_VMDUMP_H -#define _ASM_VMDUMP_H +#ifndef _ASM_DUMP_H +#define _ASM_DUMP_H /* necessary header files */ #include /* for pt_regs */ @@ -47,4 +47,4 @@ } dump_header_asm_t; -#endif /* _ASM_VMDUMP_H */ +#endif /* _ASM_DUMP_H */ diff -urN /home/vamsi/lkcd_cvs/2.4/include/asm-ia64/dump.h ./include/asm-ia64/dump.h --- /home/vamsi/lkcd_cvs/2.4/include/asm-ia64/dump.h Mon Sep 24 16:23:11 2001 +++ ./include/asm-ia64/dump.h Tue Sep 25 12:19:20 2001 @@ -9,8 +9,8 @@ */ /* This header file holds the architecture specific crash dump header */ -#ifndef _ASM_VMDUMP_H -#define _ASM_VMDUMP_H +#ifndef _ASM_DUMP_H +#define _ASM_DUMP_H /* necessary header files */ #include /* for pt_regs */ @@ -53,4 +53,4 @@ } dump_header_asm_t; -#endif /* _ASM_VMDUMP_H */ +#endif /* _ASM_DUMP_H */ diff -urN /home/vamsi/lkcd_cvs/2.4/include/linux/dump.h ./include/linux/dump.h --- /home/vamsi/lkcd_cvs/2.4/include/linux/dump.h Mon Sep 24 14:41:34 2001 +++ ./include/linux/dump.h Tue Sep 25 13:00:13 2001 @@ -224,7 +224,21 @@ extern void dump_execute(char *, struct pt_regs *); extern volatile int dump_in_progress; extern volatile int dumping_cpu; +extern void (*dump_function_ptr)(char *, struct pt_regs *); +extern int page_is_ram(unsigned long); +#if defined(CONFIG_DUMP) || defined(CONFIG_DUMP_MODULE) +static inline void dump(char * str, struct pt_regs * regs) +{ + 
if (dump_function_ptr) { + dump_function_ptr((char *)str, regs); + } +} +#else +static inline void dump(char * str, struct pt_regs * regs) +{ +} +#endif /* CONFIG_DUMP */ #endif /* __KERNEL__ */ #endif /* _DUMP_H */ diff -urN /home/vamsi/lkcd_cvs/2.4/kernel/ksyms.c ./kernel/ksyms.c --- /home/vamsi/lkcd_cvs/2.4/kernel/ksyms.c Mon Sep 24 16:23:11 2001 +++ ./kernel/ksyms.c Tue Sep 25 13:00:20 2001 @@ -66,11 +66,6 @@ extern spinlock_t dma_spin_lock; extern int panic_timeout; -#if defined(CONFIG_DUMP) || defined(CONFIG_DUMP_MODULE) -extern void (*dump_function_ptr)(char *, struct pt_regs *); -extern int page_is_ram(unsigned long); -#endif - #ifdef CONFIG_MODVERSIONS const struct module_symbol __export_Using_Versions __attribute__((section("__ksymtab"))) = { @@ -360,11 +355,11 @@ /* dump (system crash dump) functions and needed parameters */ EXPORT_SYMBOL(get_blkfops); -EXPORT_SYMBOL(dump_in_progress); -EXPORT_SYMBOL(dumping_cpu); #if defined(CONFIG_DUMP) || defined(CONFIG_DUMP_MODULE) EXPORT_SYMBOL(dump_function_ptr); EXPORT_SYMBOL(page_is_ram); +EXPORT_SYMBOL(dump_in_progress); +EXPORT_SYMBOL(dumping_cpu); #endif EXPORT_SYMBOL(panic_timeout); diff -urN /home/vamsi/lkcd_cvs/2.4/kernel/panic.c ./kernel/panic.c --- /home/vamsi/lkcd_cvs/2.4/kernel/panic.c Mon Sep 24 13:37:30 2001 +++ ./kernel/panic.c Tue Sep 25 12:12:47 2001 @@ -75,11 +75,7 @@ notifier_call_chain(&panic_notifier_list, 0, NULL); -#if defined(CONFIG_DUMP) || defined(CONFIG_DUMP_MODULE) - if (dump_function_ptr) { - dump_function_ptr(buf, (struct pt_regs *)0); - } -#endif + dump(buf, NULL); if (panic_timeout > 0) { From owner-lkcd@oss.sgi.com Tue Sep 25 13:27:42 2001 Received: (from majordomo@localhost) by oss.sgi.com (8.11.2/8.11.3) id f8PKRgd01894 for lkcd-outgoing; Tue, 25 Sep 2001 13:27:42 -0700 Received: from smtp.alacritech.com (smtp.alacritech.com [209.10.208.82]) by oss.sgi.com (8.11.2/8.11.3) with SMTP id f8PKRHD01879 for ; Tue, 25 Sep 2001 13:27:17 -0700 Received: from alacritech.com 
(lambda.alacritech.com [10.1.1.32]) by smtp.alacritech.com (8.11.0/8.11.0) with ESMTP id f8PKKCL10268; Tue, 25 Sep 2001 13:20:12 -0700 Message-ID: <3BB0E93D.DAFCD571@alacritech.com> Date: Tue, 25 Sep 2001 13:29:49 -0700 From: "Matt D. Robinson" Organization: Alacritech, Inc. X-Mailer: Mozilla 4.76 [en] (X11; U; Linux 2.4.2-2smp i686) X-Accept-Language: en MIME-Version: 1.0 To: vamsi_krishna@in.ibm.com CC: lkcd@oss.sgi.com Subject: Re: [patch] further cleanups References: <20010925124952.A22542@in.ibm.com> Content-Type: text/plain; charset=us-ascii Content-Transfer-Encoding: 7bit Sender: owner-lkcd@oss.sgi.com Precedence: bulk "Vamsi Krishna S ." wrote: > > Hello Matt, > > I would like you to consider applying the following patch to the > latest tree. I could checkin these changes if you are okay with them. > > What we are trying to do here is to: > - reduce #ifdef code by coding a static inline dump function > - keep all our extern function definitions in include/linux/dump.h > - remove the remaining VMDUMP from alpha/ia64 > > Note: you still need to remove arch/alpha/kernel/dump.c from cvs. Feel free to check all this in, Vamsi. Also, I've removed the arch/alpha/kernel/dump.c file, since the new file is in drivers/dump/dump_alpha.c. --Matt > -- > Regards, > > Vamsi Krishna S. > Linux Technology Center, > IBM Software Labs, Bangalore. 
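
[Editorial sketch, not part of the original message.] The first cleanup item above, replacing repeated #ifdef blocks with one static inline dump() wrapper, can be tried outside the kernel roughly as follows. This is a minimal user-space approximation under stated assumptions: struct pt_regs is a stand-in, dump_function_ptr is defined locally instead of being declared extern in include/linux/dump.h, CONFIG_DUMP is set by hand, and example_dump_handler is a hypothetical hook added only to show the call path.

```c
#include <stddef.h>

/* Stand-in for the kernel's register snapshot; illustration only. */
struct pt_regs { unsigned long ip; };

/* Hook the dump driver would set when it loads; NULL means "no driver".
 * In the patch this is an extern declared once in include/linux/dump.h. */
static void (*dump_function_ptr)(char *, struct pt_regs *) = NULL;

#define CONFIG_DUMP 1   /* remove to compile dump() down to a no-op */

#if defined(CONFIG_DUMP) || defined(CONFIG_DUMP_MODULE)
/* Call sites just write dump(str, regs); the NULL check lives here. */
static inline void dump(char *str, struct pt_regs *regs)
{
    if (dump_function_ptr)
        dump_function_ptr(str, regs);
}
#else
/* Dump support not configured: callers still compile, nothing happens. */
static inline void dump(char *str, struct pt_regs *regs)
{
    (void)str;
    (void)regs;
}
#endif

/* Hypothetical handler used only to observe that the hook fires. */
static int example_dump_calls;

static void example_dump_handler(char *str, struct pt_regs *regs)
{
    (void)str;
    (void)regs;
    example_dump_calls++;
}
```

With this shape, callers such as die() or panic() contain a single unconditional dump(...) line, and the configuration test appears once in the header instead of at every call site.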
> > diff -urN /home/vamsi/lkcd_cvs/2.4/arch/alpha/config.in ./arch/alpha/config.in > --- /home/vamsi/lkcd_cvs/2.4/arch/alpha/config.in Fri Jul 6 05:13:04 2001 > +++ ./arch/alpha/config.in Tue Sep 25 12:16:01 2001 > @@ -361,7 +361,7 @@ > fi > > bool 'Magic SysRq key' CONFIG_MAGIC_SYSRQ > -bool 'Support kernel crash dump capabilities' CONFIG_VMDUMP > +bool 'Support kernel crash dump capabilities' CONFIG_DUMP > > bool 'Legacy kernel start address' CONFIG_ALPHA_LEGACY_START_ADDRESS > > diff -urN /home/vamsi/lkcd_cvs/2.4/arch/alpha/kernel/dump.c ./arch/alpha/kernel/dump.c > --- /home/vamsi/lkcd_cvs/2.4/arch/alpha/kernel/dump.c Sun Sep 23 13:51:22 2001 > +++ ./arch/alpha/kernel/dump.c Tue Sep 25 12:18:30 2001 > @@ -16,7 +16,7 @@ > */ > #include > #include > -#include > +#include > #include > > /* static variables */ > diff -urN /home/vamsi/lkcd_cvs/2.4/arch/alpha/kernel/setup.c ./arch/alpha/kernel/setup.c > --- /home/vamsi/lkcd_cvs/2.4/arch/alpha/kernel/setup.c Fri Jul 6 05:13:04 2001 > +++ ./arch/alpha/kernel/setup.c Tue Sep 25 12:15:44 2001 > @@ -385,7 +385,7 @@ > } > > int > -#ifndef CONFIG_VMDUMP > +#ifndef CONFIG_DUMP > __init > #endif > page_is_ram(unsigned long pfn) > diff -urN /home/vamsi/lkcd_cvs/2.4/arch/alpha/kernel/traps.c ./arch/alpha/kernel/traps.c > --- /home/vamsi/lkcd_cvs/2.4/arch/alpha/kernel/traps.c Fri Jul 6 05:13:04 2001 > +++ ./arch/alpha/kernel/traps.c Tue Sep 25 12:14:26 2001 > @@ -14,7 +14,7 @@ > #include > #include > #include > -#include > +#include > > #include > #include > @@ -23,10 +23,6 @@ > > #include "proto.h" > > -#ifdef CONFIG_VMDUMP > -extern void (*dump_function_ptr)(char *, struct pt_regs *); > -#endif > - > void > dik_show_regs(struct pt_regs *regs, unsigned long *r9_15) > { > @@ -323,15 +319,7 @@ > while (1); > } > current->thread.flags |= (1UL << 63); > -#ifdef CONFIG_VMDUMP > - dump_execute((char *)str, regs); > -#else > -#ifdef CONFIG_VMDUMP_MODULE > - if (dump_function_ptr) { > - dump_function_ptr((char *)str, regs); > - } > 
-#endif > -#endif > + dump((char *)str, regs); > do_exit(SIGSEGV); > } > > diff -urN /home/vamsi/lkcd_cvs/2.4/arch/i386/kernel/traps.c ./arch/i386/kernel/traps.c > --- /home/vamsi/lkcd_cvs/2.4/arch/i386/kernel/traps.c Sun Sep 23 13:20:01 2001 > +++ ./arch/i386/kernel/traps.c Tue Sep 25 12:11:35 2001 > @@ -57,12 +57,6 @@ > struct desc_struct default_ldt[] = { { 0, 0 }, { 0, 0 }, { 0, 0 }, > { 0, 0 }, { 0, 0 } }; > > -#if defined(CONFIG_DUMP) || defined(CONFIG_DUMP_MODULE) > -extern void (*dump_function_ptr)(char *, struct pt_regs *); > -extern volatile int dump_in_progress; > -extern volatile int dumping_cpu; > -#endif > - > /* > * The IDT has to be page-aligned to simplify the Pentium > * F0 0F bug workaround.. We have a special link segment > @@ -227,11 +221,7 @@ > spin_lock_irq(&die_lock); > printk("%s: %04lx\n", str, err & 0xffff); > show_registers(regs); > -#if defined(CONFIG_DUMP) || defined(CONFIG_DUMP_MODULE) > - if (dump_function_ptr) { > - dump_function_ptr((char *)str, regs); > - } > -#endif > + dump(str, regs); > spin_unlock_irq(&die_lock); > do_exit(SIGSEGV); > } > @@ -469,11 +459,7 @@ > printk("NMI Watchdog detected LOCKUP on CPU%d, registers:\n", cpu); > show_registers(regs); > printk("console shuts up ...\n"); > -#if defined(CONFIG_DUMP) || defined(CONFIG_DUMP_MODULE) > - if (dump_function_ptr) { > - dump_function_ptr("NMI Watchdog Detected", regs); > - } > -#endif > + dump("NMI Watchdog Detected", regs); > console_silent(); > spin_unlock(&nmi_print_lock); > do_exit(SIGSEGV); > diff -urN /home/vamsi/lkcd_cvs/2.4/arch/ia64/config.in ./arch/ia64/config.in > --- /home/vamsi/lkcd_cvs/2.4/arch/ia64/config.in Fri Jul 6 05:13:04 2001 > +++ ./arch/ia64/config.in Tue Sep 25 12:18:00 2001 > @@ -274,7 +274,7 @@ > fi > > bool 'Magic SysRq key' CONFIG_MAGIC_SYSRQ > -bool 'Support kernel crash dump capabilities' CONFIG_VMDUMP > +bool 'Support kernel crash dump capabilities' CONFIG_DUMP > bool 'Early printk support (requires VGA!)' CONFIG_IA64_EARLY_PRINTK > bool 
'Turn on compare-and-exchange bug checking (slow!)' CONFIG_IA64_DEBUG_CMPXCHG > bool 'Turn on irq debug checks (slow!)' CONFIG_IA64_DEBUG_IRQ > diff -urN /home/vamsi/lkcd_cvs/2.4/arch/ia64/kernel/smp.c ./arch/ia64/kernel/smp.c > --- /home/vamsi/lkcd_cvs/2.4/arch/ia64/kernel/smp.c Fri Jul 6 05:13:04 2001 > +++ ./arch/ia64/kernel/smp.c Tue Sep 25 12:16:56 2001 > @@ -287,7 +287,7 @@ > int i; > > for (i = 0; i < smp_num_cpus; i++) { > -#ifdef CONFIG_VMDUMP > +#ifdef CONFIG_DUMP > /* avoid shutting down CPU 0 for now ... */ > if ((!i) && (op == IPI_CPU_STOP)) continue; > #endif > diff -urN /home/vamsi/lkcd_cvs/2.4/arch/ia64/kernel/traps.c ./arch/ia64/kernel/traps.c > --- /home/vamsi/lkcd_cvs/2.4/arch/ia64/kernel/traps.c Fri Jul 6 05:13:04 2001 > +++ ./arch/ia64/kernel/traps.c Tue Sep 25 12:17:37 2001 > @@ -37,16 +37,12 @@ > #include > #include > #include > -#include > +#include > > #include > > static fpswa_interface_t *fpswa_interface; > > -#ifdef CONFIG_VMDUMP > -extern void (*dump_function_ptr)(char *, struct pt_regs *); > -#endif > - > void __init > trap_init (void) > { > @@ -72,15 +68,7 @@ > printk("%s[%d]: %s %ld\n", current->comm, current->pid, str, err); > > show_regs(regs); > -#ifdef CONFIG_VMDUMP > - dump_execute((char *)str, regs); > -#else > -#ifdef CONFIG_VMDUMP_MODULE > - if (dump_function_ptr) { > - dump_function_ptr((char *)str, regs); > - } > -#endif > -#endif > + dump((char *)str, regs); > > if (current->thread.flags & IA64_KERNEL_DEATH) { > printk("die_if_kernel recursion detected.\n"); > diff -urN /home/vamsi/lkcd_cvs/2.4/drivers/dump/dump_alpha.c ./drivers/dump/dump_alpha.c > --- /home/vamsi/lkcd_cvs/2.4/drivers/dump/dump_alpha.c Mon Sep 24 15:09:01 2001 > +++ ./drivers/dump/dump_alpha.c Tue Sep 25 12:18:43 2001 > @@ -16,7 +16,7 @@ > */ > #include > #include > -#include > +#include > #include > > /* static variables */ > diff -urN /home/vamsi/lkcd_cvs/2.4/drivers/dump/dump_base.c ./drivers/dump/dump_base.c > --- 
/home/vamsi/lkcd_cvs/2.4/drivers/dump/dump_base.c Tue Sep 25 11:35:32 2001 > +++ ./drivers/dump/dump_base.c Tue Sep 25 12:20:10 2001 > @@ -273,9 +273,6 @@ > extern void __dump_save_panic_regs(dump_header_asm_t *); > #endif > > -/* dump function pointer used for modules (not yet supported fully) */ > -extern void (*dump_function_ptr)(char *, struct pt_regs *); > - > /* external functions */ > extern void si_meminfo(struct sysinfo *); > extern void *kmalloc(size_t, int); > diff -urN /home/vamsi/lkcd_cvs/2.4/include/asm-alpha/dump.h ./include/asm-alpha/dump.h > --- /home/vamsi/lkcd_cvs/2.4/include/asm-alpha/dump.h Mon Sep 24 16:23:11 2001 > +++ ./include/asm-alpha/dump.h Tue Sep 25 12:16:29 2001 > @@ -9,8 +9,8 @@ > */ > > /* This header file holds the architecture specific crash dump header */ > -#ifndef _ASM_VMDUMP_H > -#define _ASM_VMDUMP_H > +#ifndef _ASM_DUMP_H > +#define _ASM_DUMP_H > > /* necessary header files */ > #include /* for pt_regs */ > @@ -56,4 +56,4 @@ > }) > #endif > > -#endif /* _ASM_VMDUMP_H */ > +#endif /* _ASM_DUMP_H */ > diff -urN /home/vamsi/lkcd_cvs/2.4/include/asm-i386/dump.h ./include/asm-i386/dump.h > --- /home/vamsi/lkcd_cvs/2.4/include/asm-i386/dump.h Mon Sep 24 16:23:11 2001 > +++ ./include/asm-i386/dump.h Tue Sep 25 12:19:05 2001 > @@ -9,8 +9,8 @@ > */ > > /* This header file holds the architecture specific crash dump header */ > -#ifndef _ASM_VMDUMP_H > -#define _ASM_VMDUMP_H > +#ifndef _ASM_DUMP_H > +#define _ASM_DUMP_H > > /* necessary header files */ > #include /* for pt_regs */ > @@ -47,4 +47,4 @@ > > } dump_header_asm_t; > > -#endif /* _ASM_VMDUMP_H */ > +#endif /* _ASM_DUMP_H */ > diff -urN /home/vamsi/lkcd_cvs/2.4/include/asm-ia64/dump.h ./include/asm-ia64/dump.h > --- /home/vamsi/lkcd_cvs/2.4/include/asm-ia64/dump.h Mon Sep 24 16:23:11 2001 > +++ ./include/asm-ia64/dump.h Tue Sep 25 12:19:20 2001 > @@ -9,8 +9,8 @@ > */ > > /* This header file holds the architecture specific crash dump header */ > -#ifndef _ASM_VMDUMP_H > 
-#define _ASM_VMDUMP_H > +#ifndef _ASM_DUMP_H > +#define _ASM_DUMP_H > > /* necessary header files */ > #include /* for pt_regs */ > @@ -53,4 +53,4 @@ > > } dump_header_asm_t; > > -#endif /* _ASM_VMDUMP_H */ > +#endif /* _ASM_DUMP_H */ > diff -urN /home/vamsi/lkcd_cvs/2.4/include/linux/dump.h ./include/linux/dump.h > --- /home/vamsi/lkcd_cvs/2.4/include/linux/dump.h Mon Sep 24 14:41:34 2001 > +++ ./include/linux/dump.h Tue Sep 25 13:00:13 2001 > @@ -224,7 +224,21 @@ > extern void dump_execute(char *, struct pt_regs *); > extern volatile int dump_in_progress; > extern volatile int dumping_cpu; > +extern void (*dump_function_ptr)(char *, struct pt_regs *); > +extern int page_is_ram(unsigned long); > > +#if defined(CONFIG_DUMP) || defined(CONFIG_DUMP_MODULE) > +static inline void dump(char * str, struct pt_regs * regs) > +{ > + if (dump_function_ptr) { > + dump_function_ptr((char *)str, regs); > + } > +} > +#else > +static inline void dump(char * str, struct pt_regs * regs) > +{ > +} > +#endif /* CONFIG_DUMP */ > #endif /* __KERNEL__ */ > > #endif /* _DUMP_H */ > diff -urN /home/vamsi/lkcd_cvs/2.4/kernel/ksyms.c ./kernel/ksyms.c > --- /home/vamsi/lkcd_cvs/2.4/kernel/ksyms.c Mon Sep 24 16:23:11 2001 > +++ ./kernel/ksyms.c Tue Sep 25 13:00:20 2001 > @@ -66,11 +66,6 @@ > extern spinlock_t dma_spin_lock; > extern int panic_timeout; > > -#if defined(CONFIG_DUMP) || defined(CONFIG_DUMP_MODULE) > -extern void (*dump_function_ptr)(char *, struct pt_regs *); > -extern int page_is_ram(unsigned long); > -#endif > - > #ifdef CONFIG_MODVERSIONS > const struct module_symbol __export_Using_Versions > __attribute__((section("__ksymtab"))) = { > @@ -360,11 +355,11 @@ > > /* dump (system crash dump) functions and needed parameters */ > EXPORT_SYMBOL(get_blkfops); > -EXPORT_SYMBOL(dump_in_progress); > -EXPORT_SYMBOL(dumping_cpu); > #if defined(CONFIG_DUMP) || defined(CONFIG_DUMP_MODULE) > EXPORT_SYMBOL(dump_function_ptr); > EXPORT_SYMBOL(page_is_ram); > +EXPORT_SYMBOL(dump_in_progress); 
> +EXPORT_SYMBOL(dumping_cpu); > #endif > EXPORT_SYMBOL(panic_timeout); > > diff -urN /home/vamsi/lkcd_cvs/2.4/kernel/panic.c ./kernel/panic.c > --- /home/vamsi/lkcd_cvs/2.4/kernel/panic.c Mon Sep 24 13:37:30 2001 > +++ ./kernel/panic.c Tue Sep 25 12:12:47 2001 > @@ -75,11 +75,7 @@ > > notifier_call_chain(&panic_notifier_list, 0, NULL); > > -#if defined(CONFIG_DUMP) || defined(CONFIG_DUMP_MODULE) > - if (dump_function_ptr) { > - dump_function_ptr(buf, (struct pt_regs *)0); > - } > -#endif > + dump(buf, NULL); > > if (panic_timeout > 0) > { From owner-lkcd@oss.sgi.com Wed Sep 26 03:48:37 2001 Received: (from majordomo@localhost) by oss.sgi.com (8.11.2/8.11.3) id f8QAmb118120 for lkcd-outgoing; Wed, 26 Sep 2001 03:48:37 -0700 Received: from ausmtp01.au.ibm.com (ausmtp01.au.ibm.COM [202.135.136.97]) by oss.sgi.com (8.11.2/8.11.3) with SMTP id f8QAlpD18109 for ; Wed, 26 Sep 2001 03:47:53 -0700 Received: from f02n15e.au.ibm.com by ausmtp01.au.ibm.com (IBM AP 2.0) with ESMTP id f8QAhUB30450; Wed, 26 Sep 2001 20:43:31 +1000 Received: from d73mta01.au.ibm.com (f06n01s [9.185.166.65]) by f02n15e.au.ibm.com (8.11.1m3/NCO v4.97.1) with SMTP id f8QAkn346916; Wed, 26 Sep 2001 20:46:50 +1000 Received: by d73mta01.au.ibm.com(Lotus SMTP MTA v4.6.5 (863.2 5-20-1999)) id CA256AD3.003B4762 ; Wed, 26 Sep 2001 20:47:28 +1000 X-Lotus-FromDomain: IBMIN@IBMAU From: vamsi_krishna@in.ibm.com To: "Matt D. Robinson" cc: lkcd@oss.sgi.com Message-ID: Date: Wed, 26 Sep 2001 16:06:33 +0530 Subject: Re: [patch] further cleanups - done Mime-Version: 1.0 Content-type: text/plain; charset=us-ascii Content-Disposition: inline Sender: owner-lkcd@oss.sgi.com Precedence: bulk Matt, I have checked in the previously mentioned cleanups along with a few more _minor_ things: - define dump_in_progress and dumping_cpu only if CONFIG_DUMP(_MODULE) is defined. - move EXPORT_SYMBOL(panic_timeout) and EXPORT_SYMBOL(get_blkfops) inside of #ifdef CONFIG_DUMP. They need not be exported when lkcd is not compiled in. 
- include dump.h in sched.c and remove one more #if defined(CONFIG_DUMP) code block (extern definitions) Regards.. Vamsi. Vamsi Krishna S. Linux Technology Center, IBM Software Lab, Bangalore. Ph: +91 80 5262355 Extn: 3959 Internet: vamsi_krishna@in.ibm.com "Matt D. Robinson" on 09/26/2001 01:59:49 AM Please respond to "Matt D. Robinson" To: S Vamsikrishna/India/IBM@IBMIN cc: lkcd@oss.sgi.com Subject: Re: [patch] further cleanups "Vamsi Krishna S ." wrote: > > Hello Matt, > > I would like you to consider applying the following patch to the > latest tree. I could checkin these changes if you are okay with them. > > What we are trying to do here is to: > - reduce #ifdef code by coding a static inline dump function > - keep all our extern function definitions in include/linux/dump.h > - remove the remaining VMDUMP from alpha/ia64 > > Note: you still need to remove arch/alpha/kernel/dump.c from cvs. Feel free to check all this in, Vamsi. Also, I've removed the arch/alpha/kernel/dump.c file, since the new file is in drivers/dump/dump_alpha.c. --Matt > -- > Regards, > > Vamsi Krishna S. > Linux Technology Center, > IBM Software Labs, Bangalore. > From owner-lkcd@oss.sgi.com Wed Sep 26 11:34:13 2001 Received: (from majordomo@localhost) by oss.sgi.com (8.11.2/8.11.3) id f8QIYDi27438 for lkcd-outgoing; Wed, 26 Sep 2001 11:34:13 -0700 Received: from smtp.alacritech.com (smtp.alacritech.com [209.10.208.82]) by oss.sgi.com (8.11.2/8.11.3) with SMTP id f8QIY8D27435 for ; Wed, 26 Sep 2001 11:34:08 -0700 Received: from alacritech.com (lambda.alacritech.com [10.1.1.32]) by smtp.alacritech.com (8.11.0/8.11.0) with ESMTP id f8QISIL31076; Wed, 26 Sep 2001 11:28:18 -0700 Message-ID: <3BB22085.3FFE218F@alacritech.com> Date: Wed, 26 Sep 2001 11:37:57 -0700 From: "Matt D. Robinson" Organization: Alacritech, Inc. 
X-Mailer: Mozilla 4.76 [en] (X11; U; Linux 2.4.2-2smp i686) X-Accept-Language: en MIME-Version: 1.0 To: vamsi_krishna@in.ibm.com CC: lkcd@oss.sgi.com Subject: Re: [patch] further cleanups - done References: Content-Type: text/plain; charset=us-ascii Content-Transfer-Encoding: 7bit Sender: owner-lkcd@oss.sgi.com Precedence: bulk Great, thanks, Vamsi. I think this closes 4.0. The new dump directory for builds is a perfect launching pad. The only thing that won't get in is gzip compression. I had hoped to have that done, but haven't finished it yet -- it's now a 4.1 thing, which will be in less time than 3.1.4 -> 4.0. --Matt vamsi_krishna@in.ibm.com wrote: > > Matt, > > I have checked in the previously mentioned cleanups along with a few more > _minor_ things: > - define dump_in_progress and dumping_cpu only if CONFIG_DUMP(_MODULE) is > defined. > - move EXPORT_SYMBOL(panic_timeout) and EXPORT_SYMBOL(get_blkfops) inside > of #ifdef CONFIG_DUMP. They need not be exported when lkcd is not compiled > in. > - include dump.h in sched.c and remove one more #if defined(CONFIG_DUMP) > code block (extern definitions) > > Regards.. Vamsi. > > Vamsi Krishna S. > Linux Technology Center, > IBM Software Lab, Bangalore. > Ph: +91 80 5262355 Extn: 3959 > Internet: vamsi_krishna@in.ibm.com > > "Matt D. Robinson" on 09/26/2001 01:59:49 AM > > Please respond to "Matt D. Robinson" > > To: S Vamsikrishna/India/IBM@IBMIN > cc: lkcd@oss.sgi.com > Subject: Re: [patch] further cleanups > > "Vamsi Krishna S ." wrote: > > > > Hello Matt, > > > > I would like you to consider applying the following patch to the > > latest tree. I could checkin these changes if you are okay with them. > > > > What we are trying to do here is to: > > - reduce #ifdef code by coding a static inline dump function > > - keep all our extern function definitions in include/linux/dump.h > > - remove the remaining VMDUMP from alpha/ia64 > > > > Note: you still need to remove arch/alpha/kernel/dump.c from cvs. 
> > Feel free to check all this in, Vamsi. Also, I've removed the > arch/alpha/kernel/dump.c file, since the new file is in > drivers/dump/dump_alpha.c. > > --Matt > > > -- > > Regards, > > > > Vamsi Krishna S. > > Linux Technology Center, > > IBM Software Labs, Bangalore. > > From owner-lkcd@oss.sgi.com Wed Sep 26 15:52:29 2001 Received: (from majordomo@localhost) by oss.sgi.com (8.11.2/8.11.3) id f8QMqT500602 for lkcd-outgoing; Wed, 26 Sep 2001 15:52:29 -0700 Received: from mx.webfountain.com (mx.digitalfountain.com [209.219.233.39]) by oss.sgi.com (8.11.2/8.11.3) with SMTP id f8QMqRD00599 for ; Wed, 26 Sep 2001 15:52:27 -0700 Received: (qmail 24288 invoked from network); 26 Sep 2001 22:52:22 -0000 Received: from mail.intranet (10.1.1.37) by mx.digitalfountain.com with SMTP; 26 Sep 2001 22:52:22 -0000 Received: from there (yoel@ricci.intranet [10.1.3.25] (may be forged)) by mail.intranet (8.9.3/8.9.3) with SMTP id PAA25422 for ; Wed, 26 Sep 2001 15:51:56 -0700 Message-Id: <200109262251.PAA25422@mail.intranet> X-Authentication-Warning: mail.intranet: Host yoel@ricci.intranet [10.1.3.25] (may be forged) claimed to be there Content-Type: text/plain; charset="iso-8859-1" From: Yoel Inbar Organization: Digital Fountain To: lkcd@oss.sgi.com Subject: problem compiling lkcd 3.1.3 Date: Wed, 26 Sep 2001 15:50:58 -0700 X-Mailer: KMail [version 1.3.1] References: In-Reply-To: MIME-Version: 1.0 Content-Transfer-Encoding: 8bit Sender: owner-lkcd@oss.sgi.com Precedence: bulk Hi All, making the lkcdutils from the 3.1.3 tarball fails for me with: make[1]: Entering directory `/root/orig/lkcdutils-1.0.orig/libsial' yacc -psial -v -t -d sial.y sial.y:76: type clash (`' `t') on default action sial.y:125: invalid input: ; make[1]: *** [sial.tab.h] Error 1 Has anyone seen this before? 
Yoel From owner-lkcd@oss.sgi.com Wed Sep 26 15:59:23 2001 Received: (from majordomo@localhost) by oss.sgi.com (8.11.2/8.11.3) id f8QMxN300727 for lkcd-outgoing; Wed, 26 Sep 2001 15:59:23 -0700 Received: from mx.webfountain.com (mx.digitalfountain.com [209.219.233.39]) by oss.sgi.com (8.11.2/8.11.3) with SMTP id f8QMxMD00723 for ; Wed, 26 Sep 2001 15:59:22 -0700 Received: (qmail 24341 invoked from network); 26 Sep 2001 22:59:17 -0000 Received: from mail.intranet (10.1.1.37) by mx.digitalfountain.com with SMTP; 26 Sep 2001 22:59:17 -0000 Received: from there (yoel@ricci.intranet [10.1.3.25] (may be forged)) by mail.intranet (8.9.3/8.9.3) with SMTP id PAA25898 for ; Wed, 26 Sep 2001 15:58:51 -0700 Message-Id: <200109262258.PAA25898@mail.intranet> X-Authentication-Warning: mail.intranet: Host yoel@ricci.intranet [10.1.3.25] (may be forged) claimed to be there Content-Type: text/plain; charset="iso-8859-1" From: Yoel Inbar Organization: Digital Fountain To: lkcd@oss.sgi.com Subject: Re: problem compiling lkcd 3.1.3 Date: Wed, 26 Sep 2001 15:57:53 -0700 X-Mailer: KMail [version 1.3.1] References: <200109262251.PAA25422@mail.intranet> In-Reply-To: <200109262251.PAA25422@mail.intranet> MIME-Version: 1.0 Content-Transfer-Encoding: 8bit Sender: owner-lkcd@oss.sgi.com Precedence: bulk Replying to myself... My bad for not reading my old lkcd mail before I wrote this. Short story: bison does not work. byacc does. Now I know. Yoel > making the lkcdutils from the 3.1.3 tarball fails for me with: > > make[1]: Entering directory `/root/orig/lkcdutils-1.0.orig/libsial' > yacc -psial -v -t -d sial.y > sial.y:76: type clash (`' `t') on default action > sial.y:125: invalid input: ; > make[1]: *** [sial.tab.h] Error 1 > > Has anyone seen this before? 
> > Yoel From owner-lkcd@oss.sgi.com Fri Sep 28 10:32:52 2001 Received: (from majordomo@localhost) by oss.sgi.com (8.11.2/8.11.3) id f8SHWqM17381 for lkcd-outgoing; Fri, 28 Sep 2001 10:32:52 -0700 Received: from aprilia.amazon.com (aprilia.amazon.com [209.191.164.156]) by oss.sgi.com (8.11.2/8.11.3) with SMTP id f8SHWnD17378 for ; Fri, 28 Sep 2001 10:32:49 -0700 Received: from kawasaki.amazon.com (kawasaki.amazon.com [10.16.42.209]) by aprilia.amazon.com (Postfix) with ESMTP id E3E3D70 for ; Fri, 28 Sep 2001 10:32:46 -0700 (PDT) Received: from AMZN097255X (us1-dhcp-134-56.amazon.com [10.21.134.56]) by kawasaki.amazon.com (Postfix) with SMTP id ACF144805F for ; Fri, 28 Sep 2001 10:32:46 -0700 (PDT) From: "Monty Vanderbilt" To: Subject: Base version for LKCD CVS tree Date: Fri, 28 Sep 2001 10:32:46 -0700 Message-ID: MIME-Version: 1.0 Content-Type: text/plain; charset="iso-8859-1" Content-Transfer-Encoding: 7bit X-Priority: 3 (Normal) X-MSMail-Priority: Normal X-Mailer: Microsoft Outlook IMO, Build 9.0.2416 (9.0.2911.0) Importance: Normal X-MimeOLE: Produced By Microsoft MimeOLE V6.00.2600.0000 Sender: owner-lkcd@oss.sgi.com Precedence: bulk It's hard to create LKCD patches from the CVS source tree because it's not clear which base kernel version these files are derived from. The Makefile appears to come from version 2.4.8. However, if I download ftp.kernel.org/pub/linux/kernel/v2.4/linux-2.4.8.tar.bz2, there are a number of differences unrelated to crash dumps. For example, compare Documentation/Configure.help and kernel/smp.c. Creating a patch for 2.4.8 or other kernel versions requires an initial manual merge of unrelated changes before creating the LKCD patch. This leads to ambiguity about whether a particular change is related to LKCD or not. It would be very helpful if the base Linux source from which the CVS tree is derived were listed in an LKCD README file. This would provide a reference to determine what changes were made for LKCD. Monty VanderBilt Amazon
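The comparison described in the message above (checking files such as Documentation/Configure.help and kernel/smp.c against a pristine kernel tree) could be automated roughly as follows. This is only a sketch under stated assumptions: both trees are already unpacked on local disk, and the helper names (list_files, differing_files, closest_base) are hypothetical and not part of LKCD or lkcdutils. It reports every file that differs between a candidate base tree and the CVS checkout, and picks the candidate with the fewest differences as the most likely base version.

```python
import filecmp
import os


def list_files(tree):
    """Return the set of file paths under `tree`, relative to `tree`."""
    paths = set()
    for dirpath, dirnames, filenames in os.walk(tree):
        # Prune CVS metadata directories so they are not reported as changes.
        dirnames[:] = [d for d in dirnames if d != "CVS"]
        for name in filenames:
            full = os.path.join(dirpath, name)
            paths.add(os.path.relpath(full, tree))
    return paths


def differing_files(base_tree, cvs_tree):
    """Return sorted relative paths that differ between the two trees.

    A path is reported if its contents differ, or if it exists in only
    one of the trees.
    """
    base_files = list_files(base_tree)
    cvs_files = list_files(cvs_tree)
    changed = set()
    for rel in base_files & cvs_files:
        try:
            # shallow=False forces a content comparison instead of
            # trusting file size and modification time alone.
            same = filecmp.cmp(
                os.path.join(base_tree, rel),
                os.path.join(cvs_tree, rel),
                shallow=False,
            )
        except OSError:
            # Unreadable entries (e.g. dangling symlinks) count as changed.
            same = False
        if not same:
            changed.add(rel)
    return sorted(changed | (base_files ^ cvs_files))


def closest_base(candidates, cvs_tree):
    """Return the candidate base tree with the fewest differing files."""
    return min(candidates, key=lambda tree: len(differing_files(tree, cvs_tree)))
```

For example, calling differing_files("linux-2.4.8", "lkcd_cvs/2.4") with two locally unpacked directories (both names are placeholders) would list the paths to inspect. Any path in that list that the LKCD patch itself does not touch points to a base-version mismatch rather than to an LKCD change.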