From owner-kdb@oss.sgi.com Tue Mar 12 21:28:37 2002
Received: (from majordomo@localhost) by oss.sgi.com (8.11.2/8.11.3) id g2D5Sbw31402 for kdb-outgoing; Tue, 12 Mar 2002 21:28:37 -0800
Received: from rj.sgi.com (rj.SGI.COM [204.94.215.100]) by oss.sgi.com (8.11.2/8.11.3) with SMTP id g2D5SZ931398 for ; Tue, 12 Mar 2002 21:28:35 -0800
Received: from nodin.corp.sgi.com (nodin.corp.sgi.com [192.26.51.193]) by rj.sgi.com (8.12.2/8.12.2/linux-outbound_gateway-1.2) with ESMTP id g2D4SU6G005811 for ; Tue, 12 Mar 2002 20:28:30 -0800
Received: from kao2.melbourne.sgi.com (kao2.melbourne.sgi.com [134.14.55.180]) by nodin.corp.sgi.com (8.11.4/8.11.2/nodin-1.0) with ESMTP id g2D4RTA35904965; Tue, 12 Mar 2002 20:27:29 -0800 (PST)
Received: by kao2.melbourne.sgi.com (Postfix, from userid 16331) id 73B413000B8; Wed, 13 Mar 2002 15:27:26 +1100 (EST)
Received: from kao2.melbourne.sgi.com (localhost [127.0.0.1]) by kao2.melbourne.sgi.com (Postfix) with ESMTP id EFB80BA; Wed, 13 Mar 2002 15:27:26 +1100 (EST)
X-Mailer: exmh version 2.2 06/23/2000 with nmh-1.0.4
From: Keith Owens
To: Thomas Duffy
Cc: kdb@oss.sgi.com, sparclinux@vger.kernel.org
Subject: kdb-v2.1-2.4.18-sparc64-2
Mime-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Date: Wed, 13 Mar 2002 15:27:21 +1100
Message-ID: <20140.1015993641@kao2.melbourne.sgi.com>
Sender: owner-kdb@oss.sgi.com
Precedence: bulk

extract of changelog:

2002-03-11 Tom Duffy

	* backtracing will work right when we call kdb() directly.
	Ethan Solomita
	* kdb v2.1-2.4.18-sparc64-2

ftp://oss.sgi.com/projects/kdb/download/v2.1/kdb-v2.1-2.4.18-sparc64-2.bz2

From owner-kdb@oss.sgi.com Tue Mar 19 06:08:15 2002
Received: (from majordomo@localhost) by oss.sgi.com (8.11.2/8.11.3) id g2JE8FH29515 for kdb-outgoing; Tue, 19 Mar 2002 06:08:15 -0800
Received: from nixpbe.pdb.sbs.de (nixpbe.pdb.siemens.de [192.109.2.33]) by oss.sgi.com (8.11.2/8.11.3) with SMTP id g2JE87929509 for ; Tue, 19 Mar 2002 06:08:08 -0800
Received: from trulli.pdb.fsc.net (ThisAddressDoesNotExist [172.25.96.20] (may be forged)) by nixpbe.pdb.sbs.de (8.11.2/8.11.2) with ESMTP id g2JE9S231930 for ; Tue, 19 Mar 2002 15:09:29 +0100
Received: from biker.pdb.fsc.net (biker.pdb.fsc.net [172.25.187.106]) by trulli.pdb.fsc.net (8.9.3/8.9.3) with ESMTP id PAA12383 for ; Tue, 19 Mar 2002 15:09:27 +0100
Received: from localhost (martin@localhost) by biker.pdb.fsc.net (8.11.6/8.11.6) with ESMTP id g2JECkD05089 for ; Tue, 19 Mar 2002 15:12:47 +0100
Date: Tue, 19 Mar 2002 15:12:46 +0100 (CET)
From: Martin Wilck
To:
Subject: Problems with debugging I/O port access in kdb on i386
Message-ID:
MIME-Version: 1.0
Content-Type: TEXT/PLAIN; charset=US-ASCII
Sender: owner-kdb@oss.sgi.com
Precedence: bulk

Hi,

[This message was already sent to Keith and Linux-Kernel]

I have encountered the following problems with kdb 2.1 (2.4.18) when
trying to catch I/O port accesses in kdb (both can probably be fixed
easily):

1. This code in kdb/kdb_bp.c:

	if (kdba_verify_rw(addr, sizeof(kdb_machinst_t))) {
		kdb_printf("Invalid address for breakpoint, ignoring bp command\n");
		return(0);
	}

   prevents setting I/O breakpoints on low ports (e.g. 0x20), because the
   address check done by kdba_verify_rw is valid for memory addresses
   only. AFAICS, no check whatsoever is necessary for I/O port addresses.

   I would submit a patch for this, but the address check must be
   postponed until after the architecture-dependent parsing, and the
   information whether this is an I/O port breakpoint must be passed to
   the checking code. I don't know what implications that may have for
   the other architectures.

2. The DE flag in the CR4 register must be set (for CPUs that have it) in
   order to use I/O breakpoints at all. Otherwise they will simply be
   ignored by the CPU. Thus, a line like

	if (cpu_has_de)
		set_in_cr4 (X86_CR4_DE);

   must be put in kdba_init(). That may not suffice because cpu_init()
   (kernel/setup.c) clears the DE bit for each CPU; I don't know which
   one is called first. Again, I cannot foresee all possible
   implications, so I do not submit a patch.

   As a hack, I inserted the above line in kdba_installdbreg() after
   the line

	dr7 |= DR7_GE;

   This works fine, I can now trap the I/O accesses I want.

Cheers,
Martin

-- 
Martin Wilck                Phone: +49 5251 8 15113
Fujitsu Siemens Computers   Fax:   +49 5251 8 20409
Heinz-Nixdorf-Ring 1        mailto:Martin.Wilck@Fujitsu-Siemens.com
D-33106 Paderborn           http://www.fujitsu-siemens.com/primergy

From owner-kdb@oss.sgi.com Wed Mar 20 12:38:09 2002
Received: (from majordomo@localhost) by oss.sgi.com (8.11.2/8.11.3) id g2KKc9Q09891 for kdb-outgoing; Wed, 20 Mar 2002 12:38:09 -0800
Received: from mail.somanetworks.com ([63.204.6.12]) by oss.sgi.com (8.11.2/8.11.3) with SMTP id g2KKc4909888 for ; Wed, 20 Mar 2002 12:38:04 -0800
Received: from somanetworks.com ([10.11.100.45]) by mail.somanetworks.com (Netscape Messaging Server 4.15) with ESMTP id GTAHD300.SE0 for ; Wed, 20 Mar 2002 12:39:03 -0800
Received: (from georgn@localhost) by somanetworks.com (8.11.6/8.11.2) id g2KKd2131736; Wed, 20 Mar 2002 15:39:02 -0500
Subject: hardware breakpoints
From: "Georg Nikodym"
To: kdb@oss.sgi.com
Content-Type: text/plain
Content-Transfer-Encoding: 7bit
X-Mailer: Ximian Evolution 1.0.2.99 Preview Release
Date: 20 Mar 2002 15:39:02 -0500
Message-Id: <1016656742.23750.116.camel@keller>
Mime-Version: 1.0
Sender: owner-kdb@oss.sgi.com
Precedence: bulk

Stupid(?) question...

I'm trying to catch a bug and am thinking that the bph command is just
the ticket. So I set one and get a response:

kdb> bph 0xcf33c554 DATAR 4
Forced Data Access BP #0 at 0xcf33c554 is enabled in dr0 for 4 bytes on cpu 0
kdb>

but the breakpoint never gets hit :-(

The address above is the file member of a task struct, so I'd think I
could induce a look...

I realize that the processor has to support these breakpoints and I'm
wondering now whether my processor simply won't do what I want or I'm
just a dumbass that hasn't configured the kernel correctly...

My cpuinfo:

[root@scum /mnt]# cat /proc/cpuinfo
processor	: 0
vendor_id	: GenuineIntel
cpu family	: 6
model		: 8
model name	: Pentium III (Coppermine)
stepping	: 3
cpu MHz		: 498.674
cache size	: 256 KB
fdiv_bug	: no
hlt_bug		: no
f00f_bug	: no
coma_bug	: no
fpu		: yes
fpu_exception	: yes
cpuid level	: 2
wp		: yes
flags		: fpu vme de pse tsc msr pae mce cx8 sep mtrr pge mca cmov pat pse36 mmx fxsr sse
bogomips	: 996.14

Thanks for any hints,
Georg

From owner-kdb@oss.sgi.com Wed Mar 20 12:59:39 2002
Received: (from majordomo@localhost) by oss.sgi.com (8.11.2/8.11.3) id g2KKxd110404 for kdb-outgoing; Wed, 20 Mar 2002 12:59:39 -0800
Received: from mail.somanetworks.com ([63.204.6.12]) by oss.sgi.com (8.11.2/8.11.3) with SMTP id g2KKxb910401 for ; Wed, 20 Mar 2002 12:59:37 -0800
Received: from somanetworks.com ([10.11.100.45]) by mail.somanetworks.com (Netscape Messaging Server 4.15) with ESMTP id GTAID200.LDU for ; Wed, 20 Mar 2002 13:00:38 -0800
Received: (from georgn@localhost) by somanetworks.com (8.11.6/8.11.2) id g2KL0bx31810; Wed, 20 Mar 2002 16:00:37 -0500
Subject: Re: hardware breakpoints
From: "Georg Nikodym"
To: kdb@oss.sgi.com
In-Reply-To: <1016656742.23750.116.camel@keller>
References: <1016656742.23750.116.camel@keller>
Content-Type: text/plain
Content-Transfer-Encoding: 7bit
X-Mailer: Ximian Evolution 1.0.2.99 Preview Release
Date: 20 Mar 2002 16:00:37 -0500
Message-Id: <1016658037.23750.119.camel@keller>
Mime-Version: 1.0
Sender: owner-kdb@oss.sgi.com
Precedence: bulk

On Wed, 2002-03-20 at 15:39, Georg Nikodym wrote:
> Stupid(?) question...

Never mind. Setting a breakpoint on the address pointed to by %esp
triggers immediately, so there's some other flaw in my thinking.

-g

From owner-kdb@oss.sgi.com Tue Mar 26 14:53:20 2002
Received: (from majordomo@localhost) by oss.sgi.com (8.11.2/8.11.3) id g2QMrKo11643 for kdb-outgoing; Tue, 26 Mar 2002 14:53:20 -0800
Received: from zok.sgi.com (zok.SGI.COM [204.94.215.101]) by oss.sgi.com (8.11.2/8.11.3) with SMTP id g2QMrDq11637 for ; Tue, 26 Mar 2002 14:53:14 -0800
Received: from rock.csd.sgi.com (fddi-rock.csd.sgi.com [130.62.69.10]) by zok.sgi.com (8.12.2/8.12.2/linux-outbound_gateway-1.2) with ESMTP id g2QNtWBA026900 for ; Tue, 26 Mar 2002 15:55:32 -0800
Received: from piet1.csd.sgi.com (piet1.csd.sgi.com [130.62.70.121]) by rock.csd.sgi.com (SGI-8.9.3/8.9.3) with ESMTP id OAA64712; Tue, 26 Mar 2002 14:55:19 -0800 (PST)
Received: (from piet@localhost) by piet1.csd.sgi.com (980427.SGI.8.8.8/970903.SGI.AUTOCF) id OAA25993; Tue, 26 Mar 2002 14:55:17 -0800 (PST)
Date: Tue, 26 Mar 2002 14:55:17 -0800
From: Piet/Pete Delaney
To: Ashok Raj
Cc: Piet Delaney, lkcd-general@lists.sourceforge.net, tjm@sgi.com, kdb@oss.sgi.com, Keith Owens, Jack Steiner, Jesse Barnes
Subject: Re: [lkcd-general] kdb and lcrash
Message-ID: <20020326145517.B25870@sgi.com>
References:
Mime-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
In-Reply-To:
User-Agent: Mutt/1.3.23i
Sender: owner-kdb@oss.sgi.com
Precedence: bulk

On Tue, Mar 26, 2002 at 12:44:49PM -0800, Ashok Raj wrote:
> Hello.
>
> Has anyone tried applying the kdb patch and the lkcd patch? i would like to
> be able to use the kernel debugger, and the crash at the same time. i.e on
> panic i would like kdb to get control and have a command support to dump
> crash image if necessary.

I think we are experiencing a spinlock problem where kdb is stopping
the other CPUs and leaving a spinlock hung. I'll cc: you on the issue;
if anyone else is interested in the details let me know and I'll expand
the Cc: list.

I suspect that the kdb processing of:

	kdb(KDB_REASON_PANIC, 0, NULL)

may need something added to make sure spinlocks aren't being held.
Perhaps we need to change kdb to behave more like:

	kdb(KDB_REASON_KEYBOARD, 0, (kdb_eframe_t)regs);

when we are using dump and kdb together. It could also be a bug in lkcd,
but currently it's only hanging on systems with kdb installed.

-piet

From owner-kdb@oss.sgi.com Tue Mar 26 14:54:10 2002
Received: (from majordomo@localhost) by oss.sgi.com (8.11.2/8.11.3) id g2QMsAY11696 for kdb-outgoing; Tue, 26 Mar 2002 14:54:10 -0800
Received: from zok.sgi.com (zok.SGI.COM [204.94.215.101]) by oss.sgi.com (8.11.2/8.11.3) with SMTP id g2QMrhq11669 for ; Tue, 26 Mar 2002 14:53:43 -0800
Received: from rock.csd.sgi.com (fddi-rock.csd.sgi.com [130.62.69.10]) by zok.sgi.com (8.12.2/8.12.2/linux-outbound_gateway-1.2) with ESMTP id g2QNu2BA026921 for ; Tue, 26 Mar 2002 15:56:02 -0800
Received: from piet1.csd.sgi.com (piet1.csd.sgi.com [130.62.70.121]) by rock.csd.sgi.com (SGI-8.9.3/8.9.3) with ESMTP id OAA66379; Tue, 26 Mar 2002 14:55:50 -0800 (PST)
Received: (from piet@localhost) by piet1.csd.sgi.com (980427.SGI.8.8.8/970903.SGI.AUTOCF) id OAA26003; Tue, 26 Mar 2002 14:55:49 -0800 (PST)
Date: Tue, 26 Mar 2002 14:55:48 -0800
From: Piet/Pete Delaney
To: Jack Steiner
Cc: Ashok Raj, Piet Delaney, tjm@sgi.com, kdb@oss.sgi.com, Keith Owens, Jesse Barnes, David Mosberger, gjertsen@us.ibm.com, j-nomura@ce.jp.nec.com
Subject: NMI - Did I miss anything?
Message-ID: <20020326145548.C25489@sgi.com> References: <200203190700.g2J704Z28221@betty.americas.sgi.com> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <200203190700.g2J704Z28221@betty.americas.sgi.com> User-Agent: Mutt/1.3.23i Sender: owner-kdb@oss.sgi.com Precedence: bulk Jack: Here's the NMI output I'm getting when I very frequently now hang dumping the 1st block of the dump: ----------------------------------------------------------------------------- 258 monica 22:50 ~> sync 259 monica 22:50 ~> panic Kernel panic: sys_setpriority Entering kdb (current=0xe000000004578000, pid 860) on processor 1 due to panic [1]kdb> dump: Dumping to device 0x802 [sd(8,2)] on CPU 1 ... dump: Compression value is 0x0, Writing dump header escaping to L2 system controller monica-001-L2>nmi 001i19: INFO: command not support on this brick type re-entering system console mode (001c28 console), to escape to L2 INIT - NASID 1, cpu 2 (monarch) MinState area at 0x8000000200009600 Control registers IIP 0x00000000ffe62760 IPSR 0x0000100001002018 IFS: 0x8000000000000409 IIP: FIRMWARE XIP 0x00000000ffe5fa10 XPSR 0x0000100001002018 XFS: 0x0000000000000000 B0 0x00000000ffe625f0 PRED 0x0000000000008495 RSC: 0x0000000000000000 ISR 0x0000020000000004 IIPA 0x00000000ffe62750 ITIR: 0x0000000000000030 IFA 0x80000000ffffffe0 NaT:0x0000000000000000 BSR:0x00000000001e0e00 LIBC 0x00000000ffe9c6a0 ELSC 0x0000000200005600 PAL:0x0000000000048010 General registers GR0 .. 
GR31 (bank 1) GR0 0x0000000000000000 0x00000000ffed28e0 0x00000000001e0e00 0x0000000000000000 GR4 0x0000000000000012 0x0000000000000040 0x0000000000000040 0x00000000000000c0 GR8 0x0000000000000000 0x0000000000040000 0x0000000000000000 0x0000000000000000 GR12 0x0000000203fdfe20 0x0000000000000333 0x80000a0001608018 0x80000a0001790010 GR16 0x000000000000002e 0x0000000203fdfd31 0x0000000203fdfdf0 0x0000000000005038 GR20 0x0000000000000000 0x80000000ffd28020 0x0000000001002018 0x0000000000000000 GR24 0x0000001008020000 0x0000000000000000 0x0000000000000000 0x0000000000000000 GR28 0x0000000000000100 0x0000000000000000 0x0000000000000000 0x0000000000000000 General registers GR16 .. GR31 (bank 0) GR16 0x0000000000000060 0x0000000000000000 0x00000000ffd3ac80 0x0000000000000004 GR20 0x0000000000000000 0x80000000ffffff80 0x0000140000002030 0x0000000000000003 GR24 0x18e002017ffff801 0x08012b0e2029837c 0x0000100300002038 0x0000000000195631 GR28 0x0000000000195871 0x0000000000195872 0x0000000000000010 0xfffffffffffe8aa1 Rotating Registers GR32 .. 
GR40 GR32 0x00000000ffe6c938 0x00000000ffe6a668 0x0000000000000010 0x80000a0001400150 GR36 0x00000000ffdb1a80 0x000000000000058e 0x00000000ffe236c0 0x0000000000000306 GR40 0x0000000203fdfdf8INFO: partition 0 system console changed: 001c28 CPU0 HARDWARE ERROR STATE: (Forced error dump) INIT - NASID 0, cpu 0 END Hardware Error State (Forced error dump) MinState area at 0x8000000000007600 Dump Spool for PI Errors - nasid 1, err stack A Control registers Entry 0: (0x130d34f01000002) IIP 0xe0020000004b2bd0 IPSR 0x0000121008022018 IFS: 0x800000000000048c IIP: schedule+0xf0 Cmd 0x02(Request:READ), RRB stat: --------E0 XIP 0xe0020000007552f0 XPSR 0x0000141008026018 XFS: 0x0000000000000812 XIP: rt_check_expire__thr+0x410 CRB #0, T5 req #0, supp 0 B0 0xe00200000040ab40 PRED 0x0000000000016069 RSC: 0x0000000000000003 Error 2 Directory Error, Cache line address 0x130d34f00 ISR 0x0000040000000000 IIPA 0xe0020000004b2bd0 ITIR: 0x0000000000000538 Dump Spool complete. IFA 0xbfffff0000000028 NaT:0x0000000000000000 BSR:0x00000000001e0e00 LIBC 0x00000000ffe9c6a0 ELSC 0x0000000000005600 PAL:0x0000000000048010 General registers GR0 .. GR31 (bank 1) GR0 0x0000000000000000 0xe002100000bbc000 0xe00000000277ff00 0xe000000002778000 GR4 0x0000000000000000 0x0000000000000000 0x0000000000000000 0x0000000000000000 GR8 0x0000000000000001 0x0000000000000309 0x0000000000000000 0x0000000000000206 GR12 0xe00000000277fe50 0xe000000002778000 0x0000000000000001 0xe002100000b552d0 GR16 0xe002100000b552d0 0x0000000000000001 0x0000000000000000 0x0000000000000000 GR20 0xe00000004f36b480 0x0000000000000002 0xffffffffffffffff 0xe000000002778000 GR24 0x0000000000000000 0xe000000001218058 0xe000000001218040 0xe000000001218050 GR28 0xe000000001218168 0xe000000001218060 0x0000000000000001 0x0000000000000000 General registers GR16 .. 
GR31 (bank 0) GR16 0xe000000002779438 0x0000000000000308 0x0000000000000000 0x0000000000000000 GR20 0xe002100000bbc000 0xe002000000754ee0 0xbfffff0000000028 0x0000000000000000 GR24 0x0000000000000000 0x0000000000000000 0x000000000000048c 0x0000000000000003 GR28 0xe0020000007552f0 0x0000141008026018 0x8000000000000812 0x00000000000161a9 Rotating Registers GR32 .. GR41 GR32 0xe002100000a94d00 0x0000001008022018 0xe0020000004c7e50 0x0000000000000388 GR36 0xe000000001218048 0xe00200000040ab40 0x000000000000050a 0x0000000000000000 GR40 0x0000000000000000 0xe002100000a94e00 Dump Spool for PI Errors - nasid 1, err stack A Entry 0: (0x130d34f01000002) Cmd 0x02(Request:READ), RRB stat: --------E0 CRB #0, T5 req #0, supp 0 Error 2 Directory Error, Cache line address 0x130d34f00 Dump Spool complete. INFO: partition 0 system console changed: 001c28 CPU2 INIT - NASID 0, cpu 2 MinState area at 0x8000000000009600 Control registers IIP 0xe0020000005521a0 IPSR 0x0000101008026018 IFS: 0x8000000000000287 IIP: kiobuf_wait_for_io+0xc0 [MIB] cmp4.eq p6,p7=0,r14 XIP 0xe0020000005521a0 XPSR 0x0000101008026018 XFS: 0x0000000000000287 B0 0xe0020000005521f0 PRED 0x0000000000061599 RSC: 0x0000000000000003 ISR 0x0000000000000000 IIPA 0xe002000000552190 ITIR: 0x0000000000000538 IFA 0xbfffff0000000038 NaT:0x0000000000000000 BSR:0x00000000001e0e00 LIBC 0x00000000ffe9c6a0 ELSC 0x0000000000005600 PAL:0x0000000000048010 General registers GR0 .. 
GR31 (bank 1) GR0 0x0000000000000000 0xe002100000bbc000 0x0000000000000000 0xe000000004578000 GR4 0x0000000000000000 0x0000000000000000 0x0000000000000000 0x0000000000000000 GR8 0x0000000000000000 0x0000000000000c1c 0x0000000000000000 0x0000000000061499 GR12 0xe00000000457fde0 0xe000000004578000 0x0000000000000004 0x0000000000000001 GR16 0xe002100000c0d400 0xe0000000017980aa 0xc000000000000000 0xe002100000b57000 GR20 0xe0000000017386d0 0xe0000000017386c9 0xe0000000016c8578 0xe0000000016c8548 GR24 0x0000000000000001 0xe0000000016c8584 0xe0000000016c8560 0xe0000000016c8510 GR28 0xe0000000016c8507 0x0000000000000001 0xe0000000016c8558 0xe0000000016c8506 General registers GR16 .. GR31 (bank 0) GR16 0xe000000004579540 0x0000000000000308 0x0000000000000000 0x0000000000000000 GR20 0xe002100000bbc000 0xe0020000006e2da0 0xbfffff0000000038 0xe0020000006e2da0 GR24 0x0000000000061559 0x0000000000000000 0x0000000000000287 0x0000000000000003 GR28 0xe0020000005521a0 0x0000101008026018 0x8000000000000287 0x0000000000061599 Rotating Registers GR32 .. GR38 GR32 0xe000000003ed41b8 0xe000000003ed41a8 0xe000000004578000 0xe00200000051f650 GR36 0x0000000000000f22INIT - NASID 1, cpu 0 0xe002100000b7ca20MinState area at 0x8000000200007600 0xe002100000bbc000 Control registers IIP 0x00000000ffe62750 IPSR 0x0000100001002018 IFS: 0x8000000000000409 Dump Spool for PI Errors - nasid 1, err stack A XIP 0x00000000ffe5fa10 XPSR 0x0000100001002018 XFS: 0x0000000000000000 Entry 0: (0x130d34f01000002) B0 0x00000000ffe625f0 PRED 0x0000000000018495 RSC: 0x0000000000000000 Cmd 0x02(Request:READ), RRB stat: --------E0 ISR 0x0000020000000004 IIPA 0x00000000ffe62740 ITIR: 0x0000000000000030 CRB #0, T5 req #0, supp 0 IFA 0x80000000ffffffe0 NaT:0x0000000000000000 Error 2 Directory Error, Cache line address 0x130d34f00 BSR:0x00000000001e0e00 LIBC 0x00000000ffe9c6a0 ELSC 0x0000000200005600 Dump Spool complete. PAL:0x0000000000048010 General registers GR0 .. 
GR31 (bank 1) GR0 0x0000000000000000 0x00000000ffed28e0 0x00000000001e0e00 0x0000000000000000 GR4 0x0000000000000012 0x0000000000000040 0x0000000000000040 0x00000000000000c0 GR8 0x0000000000000000 0x0000000000000001 0x0000000000000000 0x0000000000000000 GR12 0x0000000203fffe20 0x0000000000000333 0x80000a0001608018 0x80000a0001790010 GR16 0x0000000000000030 0x0000000203fffd32 0x0000000000000000 0x0000000000000010 GR20 0x0000000009fff800 0xffffffff00000000 0x00000000ffffffff 0xffff0000ffff0000 GR24 0x80000b01c0000408 0x000000fe00000000 0x0000000000ffffff 0xffffffff01000000 GR28 0x0000000000000100 0x0000000000000000 0x0000000000000000 0x0000000000000000 General registers GR16 .. GR31 (bank 0) GR16 0x0000000000000060 0x0000000000000000 0x00000000ffd3ac80 0x0000000000000004 GR20 0x0000000000000000 0x80000000ffffff80 0x0000140000002030 0x0000000000000003 GR24 0x18e002017ffff801 0x08012b0e2029837c 0x0000100300002038 0x0000000000195631 GR28 0x0000000000195871 0x0000000000195872 0x0000000000000010 0xfffffffffffe8aa1 Rotating Registers GR32 .. GR40 GR32 0x00000000ffe6c938 0x00000000ffe6a660 0x0000000000000010 0x80000a0001400150 GR36 0x00000000ffdb1a80 0x000000000000058e 0x00000000ffe236c0 0x0000000000000306 GR40 0x0000000203fffdf8 Dump Spool for PI Errors - nasid 1, err stack A Entry 0: (0x130d34f01000002) Cmd 0x02(Request:READ), RRB stat: --------E0 CRB #0, T5 req #0, supp 0 Dump Spool complete. C 001 001c31: C 001 001c31: *** NTLB Interruption on node 1 C 001 001c31: *** EPC: 0x0 ([Symbol Table not available]) C 001 001c31: *** IIP: 0xffd9ce80, IPSR: 0x1000000 C 001 001c31: *** Press ENTER to continue. 
-------------------------------------------------------------------------------------------------------

I think we are most likely waiting in the inlined add_wait_queue() code,
since kiobuf_wait_for_io+0xc0 is so close to the kiobuf_wait_for_io()
entry point:

--------------------------------------------------------------------------
/**
 * kiobuf_wait_for_io - wait for completion of a kiobuf request
 * @kiobuf: kiobuf request to wait for
 *
 * Adds a completion event for the kiobuf in question and wakes up
 * when the I/O has completed.
 */

void kiobuf_wait_for_io(struct kiobuf *kiobuf)
{
	struct task_struct *tsk = current;
	DECLARE_WAITQUEUE(wait, tsk);

	if (atomic_read(&kiobuf->io_count) == 0)
		return;

	add_wait_queue(&kiobuf->wait_queue, &wait);	<<---- Hanging Here
repeat:
--------------------------------------------------------------------------

So I suppose it's more likely that I'm in the inline code for add_wait_queue():

--------------------------------------------------------------------------
extern inline void add_wait_queue(struct wait_queue ** p, struct wait_queue * wait)
{
	unsigned long flags;

	write_lock_irqsave(&waitqueue_lock, flags);
	__add_wait_queue(p, wait);
	write_unlock_irqrestore(&waitqueue_lock, flags);
}
--------------------------------------------------------------------------
#define write_lock_irqsave(lock, flags) do { local_irq_save(flags); write_lock(lock); } while (0)
--------------------------------------------------------------------------
# define local_irq_save(x) \
do { \
	unsigned long ip, psr; \
 \
	__asm__ __volatile__ ("mov %0=psr;; rsm psr.i;;" : "=r" (psr) :: "memory"); \
	if (psr & (1UL << 14)) { \
		__asm__ ("mov %0=ip" : "=r"(ip)); \
		last_cli_ip = ip; \
	} \
	(x) = psr; \
} while (0)
--------------------------------------------------------------------------
#define write_lock(rw) \
do { \
	__asm__ __volatile__ ( \
		"mov ar.ccv = r0\n" \
		"dep r29 = -1, r0, 31, 1\n" \
		";;\n" \
		"1:\n" \
		"ld4 r2 = [%0]\n" \
		";;\n" \
		"cmp4.eq p0,p7 = r0,r2\n" \
		"(p7) br.cond.spnt.few 1b \n" \
		"cmpxchg4.acq r2 = [%0], r29, ar.ccv\n" \
		";;\n" \
		"cmp4.eq p0,p7 = r0, r2\n" \
		"(p7) br.cond.spnt.few 1b\n" \
		";;\n" \
		:: "r"(rw) : "ar.ccv", "p7", "r2", "r29", "memory"); \
} while(0)
--------------------------------------------------------------------------

On most systems where I've debugged spinlock and mutex hangs, I've added
or used audit information in the locks recording who owned them (the CPU,
and the PCs on the stack of the caller of write_lock()). IA64 Linux
doesn't seem to have any audit information, even for DEBUG or BRINGUP
kernels.

I think there is likely something very wrong with kdb's handling of IPIs
to processors. Perhaps in older 2.4.16 kernels it wasn't checking the
state of the processors it was stopping to make sure they weren't
processing interrupt code.

On the Sequent our SPLs to block interrupts were right in the spinlock
code. I don't see it in our write_lock() #define. Is that being done by
local_irq_save()? I don't see it, but I'm still rusty on ia64 asm....

When I go through panic() to dump and involve kdb, I hit this bug between
20% and 90% of the time. When I go directly to dump(), we successfully
dumped the ia64 SN1 system 250 times before hanging. On a uniprocessor PC
without kdb we never hang; it ran all night (2500 dumps).
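A minimal user-space sketch of the kind of lock audit information described above, under loud assumptions: the names audited_lock, audited_lock_take(), audited_unlock() and audited_lock_report() are hypothetical and do not exist in the ia64 Linux tree, the lock word uses C11 atomics instead of the cmpxchg4.acq sequence, and __FILE__/__LINE__ stand in for the caller PCs. It only illustrates recording the owner inside the lock so that a debugger could report it.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdio.h>

/* Hypothetical audited spinlock; every name here is illustrative only. */
struct audited_lock {
	atomic_flag held;        /* the lock word itself */
	int owner_cpu;           /* CPU holding the lock, -1 when free */
	const char *owner_file;  /* call site that took the lock */
	int owner_line;
};

#define AUDITED_LOCK_INIT { ATOMIC_FLAG_INIT, -1, NULL, 0 }

void audited_lock_at(struct audited_lock *l, int cpu,
		     const char *file, int line)
{
	/* Spin until the flag was previously clear (acquire ordering). */
	while (atomic_flag_test_and_set_explicit(&l->held, memory_order_acquire))
		;
	/* Owner fields are written only while the lock is held. */
	l->owner_cpu = cpu;
	l->owner_file = file;
	l->owner_line = line;
}

/* Records the call site automatically. */
#define audited_lock_take(l, cpu) \
	audited_lock_at((l), (cpu), __FILE__, __LINE__)

void audited_unlock(struct audited_lock *l, int cpu)
{
	assert(l->owner_cpu == cpu);  /* catch an unlock by a non-owner */
	l->owner_cpu = -1;
	l->owner_file = NULL;
	l->owner_line = 0;
	atomic_flag_clear_explicit(&l->held, memory_order_release);
}

/*
 * What a debugger command could print for a lock that looks stuck.
 * Assumes the other CPUs are stopped, so the unsynchronized read is safe.
 */
void audited_lock_report(const struct audited_lock *l, const char *name)
{
	if (l->owner_cpu < 0)
		printf("%s: free\n", name);
	else
		printf("%s: held by cpu %d, taken at %s:%d\n",
		       name, l->owner_cpu, l->owner_file, l->owner_line);
}
```

With something like this, a hang in write_lock() could be followed by a report on the lock in question to see which CPU took it and where, instead of inferring the owner from register dumps.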

-piet

From owner-kdb@oss.sgi.com Sun Mar 31 18:20:14 2002
Received: (from majordomo@localhost) by oss.sgi.com (8.11.2/8.11.3) id g312KEG21255 for kdb-outgoing; Sun, 31 Mar 2002 18:20:14 -0800
Received: from mail.pangeatech.com (pxofc151-phx1.pangeatech.com [63.110.32.151]) by oss.sgi.com (8.11.2/8.11.3) with SMTP id g312K5v21252 for ; Sun, 31 Mar 2002 18:20:08 -0800
Received: from [65.192.22.133] by mail.pangeatech.com (NTMail 7.00.0018/NU8172.00.e2123c13) with ESMTP id khhhmaaa for kdb@oss.sgi.com; Sun, 31 Mar 2002 19:18:42 -0700
Received: from randolph by gandalf.tausq.org with local (Exim 3.34 #1 (Debian)) id 16rrPb-0005Wb-00; Sun, 31 Mar 2002 18:19:11 -0800
Date: Sun, 31 Mar 2002 18:19:11 -0800
From: Randolph Chung
To: parisc-linux@parisc-linux.org, kdb@oss.sgi.com
Subject: Initial port of kdb to parisc-linux
Message-ID: <20020401021911.GG26353@tausq.org>
Reply-To: Randolph Chung
Mime-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
User-Agent: Mutt/1.3.27i
X-PGP: for PGP key, see http://www.tausq.org/pgp.txt
X-GPG: for GPG key, see http://www.tausq.org/gpg.txt
Sender: owner-kdb@oss.sgi.com
Precedence: bulk

Greetings,

It's amazing how much you can get done when you are procrastinating
from doing what you are supposed to be doing. The result: a first cut
of a port of SGI's kdb to parisc-linux.

There is a patch against kdb version 2.1 at
ftp://ftp.parisc-linux.org/patches/kdb-v2.1-2.4.18-pa11-parisc-1.bz2

It is meant to be applied over the kdb-v2.1-2.4.18-common-2.bz2 patch.
(I would give a URL, but SGI's ftp site seems to be down as I write this.)

The basic commands like disassembly, memory/register dumps, setting
breakpoints, backtracing, etc. work. Single stepping does not yet work,
though.

Also, right now this only works on 32-bit kernels. There are a few
things that need to be fixed for 64-bit to work --

1. modutils doesn't handle building 64-bit kdb-enabled kernels when
   running a 32-bit kernel (kallsyms will fail).
2. Once you hack around #1, you run into problems because the parisc64
   compiler for some reason emits .dynsym/.dynamic sections into the
   kernel. This confuses modutils to no end. There's a small patch in
   the pa patch above that discards the dynamic sections. I'm dubious
   this is the right thing to do, though it does build a kernel that
   will boot and start kdb.
3. After #1/#2, we run into a problem because hppa64 binutils doesn't
   seem to sort unwind sections entirely correctly, so the backtracing
   logic will fail.

I've only tested this on serial console. Virtual console probably
doesn't work yet (the pause key mapping code is missing). Also, as
explained on the kdb mailing list, the code only works with PS/2-type
keyboards; someone will need to write a polling USB driver to use this
with a USB keyboard.

There is still much work to do... the unwind code in particular is
quite crude and could be much enhanced.

Please try out the patch and feel free to send bug reports, patches,
etc. :-)

randolph
-- 
   @..@                                         http://www.TauSq.org/
  (----)
 ( >__< )
 ^^ ~~ ^^