From owner-lkcd@oss.sgi.com Mon May 1 11:38:38 2000 Received: by oss.sgi.com id ; Mon, 1 May 2000 11:38:18 -0700 Received: from zmamail05.zma.compaq.com ([161.114.64.105]:12038 "HELO zmamail05.zma.compaq.com") by oss.sgi.com with SMTP id ; Mon, 1 May 2000 11:38:12 -0700 Received: by zmamail05.zma.compaq.com (Postfix, from userid 12345) id 78FFA1A75; Mon, 1 May 2000 14:38:06 -0400 (EDT) Received: from cxo3ns.cxo.dec.com (cxo3ns.cxo.dec.com [16.63.0.10]) by zmamail05.zma.compaq.com (Postfix) with SMTP id 7533C1843; Mon, 1 May 2000 14:38:05 -0400 (EDT) Received: from brownfur.cxo.dec.com by cxo3ns.cxo.dec.com; (5.65v4.0/1.1.8.2/11Apr96-1001AM) id AA27766; Mon, 1 May 2000 12:38:04 -0600 Received: from dhcp32-218.cxo.dec.com by brownfur.cxo.dec.com (5.65v4.0/1.1.10.5/17Feb98-0753AM) id AA06604; Mon, 1 May 2000 12:38:03 -0600 Received: by compaq.com (sSMTP sendmail emulation); Mon, 1 May 2000 12:37:43 -0600 Content-Length: 4177 Message-Id: X-Mailer: XFMail 1.4.4 on Linux X-Priority: 3 (Normal) Content-Type: text/plain; charset=us-ascii Content-Transfer-Encoding: 8bit Mime-Version: 1.0 In-Reply-To: <3908B22A.B11911EB@sgi.com> Date: Mon, 01 May 2000 12:37:43 -0600 (MDT) Reply-To: Brian Hall From: Brian Hall To: Tom Morano Subject: Re: Alpha lcrash initialization problem - can't access memory Cc: "Matt D.Robinson" , lkcd@oss.sgi.com Sender: owner-lkcd@oss.sgi.com Precedence: bulk Return-Path: X-Orcpt: rfc822;lkcd-outgoing OK, after wasting some time with memory debugging libraries (dmalloc & electric fence), I realized kl_init_kern_info() is failing. I'm tracking down the reason(s) why now. At least this makes sense- I probably haven't yet made all the necessary modifications for it to work on Alpha. 
Hmm, looks like a problem while in cmpreadmem (set cmp_debug=1):

====================================================
(gdb) run
Starting program: /CDR_UPLOAD/hallb/linux-2.2.13-1.0.3/cmd/lcrash/./lcrash ./map.0 ./vmdump.0
map = ./map.0, vmdump = ./vmdump.0, outfile = stdout

Please wait...__cmploadindex(): cannot open() index file [2]: No such file or directory!
cmppindexcreate(): Number of pages in dump: 12288
cmppindexcreate(): Page size in dump: 8192
..
Attempting to save index "index.10" ... complete.
...............cmpreadmem(): 8 bytes, 0x584c81 (just a page)
__cmppread(): initiating search for 0x584c81
__cmppindex(): hash = 16472, addr = 0x584c81
__cmppindex(): addr = 0x584c81, tmpptr->addr = 0x584000
__cmppread(): page not found! (0x584c81)

Program exited with code 01.
====================================================

Now, I have set the system to compress the dump on the swap partition (DUMP_COMPRESS_PAGES=1); however, when I look at the generated vmdump.0 file, it does not look like it has been RLE encoded; there are long strings of 0x00 repeating throughout the file. The vmdump.0 is ~69MB, while the real memory size of the machine is 96MB (#pages*pagesize is also 96MB).

Looks to me like either the dump is getting truncated, or the dump compression routine has a problem? (DUMP_LEVEL=4, swap partition is 512MB) I am guessing that sp->s_addr is correct (this is the 0x584c81 being searched for in cmpreadmem).

Is the vmdump.0 file supposed to be compressed on the disk if DUMP_COMPRESS_PAGES=1 ?

What is the deal with the "kernel_magic" symbol? I can't find it in the symbol map for i386 or Alpha. Trying to run lcrash against the running system now gives:

====================================================
(gdb) run
Starting program: /CDR_UPLOAD/hallb/linux-2.2.13-1.0.3/cmd/lcrash/./lcrash
map = /boot/System.map, vmdump = /dev/mem, outfile = stdout

Please wait................
kernel_magic mismatch of map and memory image

Program exited with code 01.
==================================================== On 27-Apr-2000 Tom Morano wrote: > rBrian Hall wrote: >> >> I haven't changed anything in main(). After the command options are parsed > out, >> around main.c:198: (dies in register_cmds() ) >> >> init_liballoc(0, 0, 0); >> kl_init_kern_info(); >> register_cmds(cmdset); >> arch_init(ofp); >> >> Are you saying that init_liballoc() needs different arguments now? I >> followed >> the call sequence down for init_liballoc, and it appeared that values other >> than zero were assigned along the way. Changing to >> init_liballoc(100,100,100) >> had no effect (same traceback on the segfault). Upping that to 1000 didn't > help. > > The parameters to init_liballoc() are OK. Based on this, I would guess that > some memory is getting stomped on in or below the kl_init_kern_info() > function > call. You might check the block of memory causing the SEGV after returning > from the init_liballoc() call and before the kl_init_kern_info() call. See if > it > looks OK at that point (I would guess the contents of this memory is change > by > the time you get to register_cmds()). If that's the case, then walk through > the > kl_init_kern_info() function and see where the memory contents changes. From > looking at the kl_init_kern_info() function, I can't see where the problem > might > occur (it basically just does symbol lookups and reads in the contents of > memory > into some local variables). Since the Alpha is 64 bit, I assume that the > amount > of > memory being read in for these values is 8 bytes instead of 4 (and that the > local > variables, NUM_PHYSPAGES and MEM_MAP have been changed also). Little things > like > that might be a factor. Anyway, that's how I would approach narrowing it > down. 
> > Tom -- http://www.bigfoot.com/~brihall Linux Consultant From owner-lkcd@oss.sgi.com Mon May 1 11:55:08 2000 Received: by oss.sgi.com id ; Mon, 1 May 2000 11:54:49 -0700 Received: from mail.turbolinux.com ([38.170.88.25]:60680 "EHLO mail.turbolinux.com") by oss.sgi.com with ESMTP id ; Mon, 1 May 2000 11:54:21 -0700 Received: from localhost (yakker@localhost) by mail.turbolinux.com (8.9.3/8.9.3) with ESMTP id LAA02948; Mon, 1 May 2000 11:53:14 -0700 Date: Mon, 1 May 2000 11:53:10 -0700 (PDT) From: "Matt D. Robinson" To: Brian Hall cc: Tom Morano , lkcd@oss.sgi.com Subject: Re: Alpha lcrash initialization problem - can't access memory In-Reply-To: Message-ID: MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII Sender: owner-lkcd@oss.sgi.com Precedence: bulk Return-Path: X-Orcpt: rfc822;lkcd-outgoing On Mon, 1 May 2000, Brian Hall wrote: |>OK, after wasting some time with memory debugging libraries (dmalloc & electric |>fence), I realized kl_init_kern_info() is failing. I'm tracking down the |>reason(s) why now. At least this makes sense- I probably haven't yet made all |>the necessary modifications for it to work on Alpha. |> |>Hmm, looks like a problem while in cmpreadmem (set cmp_debug=1): |> |>==================================================== |>(gdb) run |>Starting program: /CDR_UPLOAD/hallb/linux-2.2.13-1.0.3/cmd/lcrash/./lcrash |>./map.0 ./vmdump.0 |>map = ./map.0, vmdump = ./vmdump.0, outfile = stdout |> |>Please wait...__cmploadindex(): cannot open() index file [2]: No such file or |>directory! |>cmppindexcreate(): Number of pages in dump: 12288 |>cmppindexcreate(): Page size in dump: 8192 |>.. |>Attempting to save index "index.10" ... complete. |>...............cmpreadmem(): 8 bytes, 0x584c81 (just a page) |>__cmppread(): initiating search for 0x584c81 |>__cmppindex(): hash = 16472, addr = 0x584c81 |>__cmppindex(): addr = 0x584c81, tmpptr->addr = 0x584000 |>__cmppread(): page not found! (0x584c81) |> |>Program exited with code 01. 
|>==================================================== |> |>Now, I have set the system to compress the dump on the swap partition |>(DUMP_COMPRESS_PAGES=1); however when I look at the generated vmdump.0 file is |>does not look like it has been RLE encoded- there are long strings of 0x00 |>repeating throughout the file. The vmdump.0 is ~69MB, while the real memory |>size of the machine is 96MB (#pages*pagesize is also 96MB). Looks like it's probably getting compressed based on your data. |>Looks to me like either the dump is getting truncated, or the dump compression |>routine has a problem? (DUMP_LEVEL=4, swap partition is 512MB) I am guessing |>that sp->s_addr is correct (this is the 0x584c81 being searched for in |>cmpreadmem). The sp->s_addr is probably correct, but the real question is whether or not the page is being read in properly. You'll need to find out if that page is actually in memory or not (0x584000). You can turn on some additional debugging in kl_cmp.c to print out page information from the page header as the information is dumped out. Set cmp_debug to 2 and tell me what it says. If you see all the pages being read, then you've got what you need from the dump. If not, then there's a problem with compression. I'd typically say it isn't compression as I've used that code for 32 and 64 bit systems in the past (for a few years). Still, there could be a problem. |>Is the vmdump.0 file supposed to be compressed on the disk if |>DUMP_COMPRESS_PAGES=1 ? Yes ... /sbin/vmdump config sets /proc/sys/vmdump/dump_compress_pages to the value based on DUMP_COMPRESS_PAGES. |>What is the deal with the "kernel_magic" symbol? I can't find it in the symbol |>map for i386 or Alpha. Trying to run lcrash against the running system now |>gives: That should be in the 1.0.4 version ... I don't think it's in the 1.0.3 version. Look at init/main.c for the value. 
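The size comparison in this exchange can be checked with a little arithmetic. The following is a minimal sketch in C that uses the page count (12288) and page size (8192) reported by cmppindexcreate() in the debug output. The helper names are hypothetical and are not part of lcrash.

```c
#include <stdint.h>

/* Uncompressed size implied by the dump header: pages * page size. */
uint64_t sketch_raw_dump_bytes(uint64_t num_pages, uint64_t page_size)
{
    return num_pages * page_size;
}

/* Dump file size as a whole-number percentage of the raw size
 * (rounded down). Returns 0 if the raw size is unknown. */
unsigned sketch_dump_percent(uint64_t file_bytes, uint64_t raw_bytes)
{
    if (raw_bytes == 0)
        return 0;
    return (unsigned)((file_bytes * 100u) / raw_bytes);
}
```

With 12288 pages of 8192 bytes the raw size is 100663296 bytes (96MB), so a file of about 69MB is roughly 71 percent of the raw size. That figure alone cannot distinguish a compressed dump from a truncated one; the per-page headers have to be inspected for that.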
|>==================================================== |>(gdb) run |>Starting program: /CDR_UPLOAD/hallb/linux-2.2.13-1.0.3/cmd/lcrash/./lcrash |>map = /boot/System.map, vmdump = /dev/mem, outfile = stdout |> |>Please wait................ |>kernel_magic mismatch of map and memory image |> |>Program exited with code 01. |>==================================================== |> |> |>On 27-Apr-2000 Tom Morano wrote: |>> rBrian Hall wrote: |>>> |>>> I haven't changed anything in main(). After the command options are parsed |>> out, |>>> around main.c:198: (dies in register_cmds() ) |>>> |>>> init_liballoc(0, 0, 0); |>>> kl_init_kern_info(); |>>> register_cmds(cmdset); |>>> arch_init(ofp); |>>> |>>> Are you saying that init_liballoc() needs different arguments now? I |>>> followed |>>> the call sequence down for init_liballoc, and it appeared that values other |>>> than zero were assigned along the way. Changing to |>>> init_liballoc(100,100,100) |>>> had no effect (same traceback on the segfault). Upping that to 1000 didn't |>> help. |>> |>> The parameters to init_liballoc() are OK. Based on this, I would guess that |>> some memory is getting stomped on in or below the kl_init_kern_info() |>> function |>> call. You might check the block of memory causing the SEGV after returning |>> from the init_liballoc() call and before the kl_init_kern_info() call. See if |>> it |>> looks OK at that point (I would guess the contents of this memory is change |>> by |>> the time you get to register_cmds()). If that's the case, then walk through |>> the |>> kl_init_kern_info() function and see where the memory contents changes. From |>> looking at the kl_init_kern_info() function, I can't see where the problem |>> might |>> occur (it basically just does symbol lookups and reads in the contents of |>> memory |>> into some local variables). 
Since the Alpha is 64 bit, I assume that the |>> amount |>> of |>> memory being read in for these values is 8 bytes instead of 4 (and that the |>> local |>> variables, NUM_PHYSPAGES and MEM_MAP have been changed also). Little things |>> like |>> that might be a factor. Anyway, that's how I would approach narrowing it |>> down. |>> |>> Tom |> |>-- |>http://www.bigfoot.com/~brihall |>Linux Consultant |> From owner-lkcd@oss.sgi.com Mon May 1 12:01:19 2000 Received: by oss.sgi.com id ; Mon, 1 May 2000 12:01:09 -0700 Received: from ztxmail03.ztx.compaq.com ([161.114.1.207]:23314 "HELO ztxmail03.ztx.compaq.com") by oss.sgi.com with SMTP id ; Mon, 1 May 2000 12:00:50 -0700 Received: by ztxmail03.ztx.compaq.com (Postfix, from userid 12345) id 40FEA3ED; Mon, 1 May 2000 14:00:44 -0500 (CDT) Received: from cxo3ns.cxo.dec.com (cxo3ns.cxo.dec.com [16.63.0.10]) by ztxmail03.ztx.compaq.com (Postfix) with SMTP id 6E17333C; Mon, 1 May 2000 14:00:43 -0500 (CDT) Received: from brownfur.cxo.dec.com by cxo3ns.cxo.dec.com; (5.65v4.0/1.1.8.2/11Apr96-1001AM) id AA28066; Mon, 1 May 2000 13:00:42 -0600 Received: from dhcp32-218.cxo.dec.com by brownfur.cxo.dec.com (5.65v4.0/1.1.10.5/17Feb98-0753AM) id AA06995; Mon, 1 May 2000 13:00:41 -0600 Received: by compaq.com (sSMTP sendmail emulation); Mon, 1 May 2000 13:00:21 -0600 Content-Length: 372 Message-Id: X-Mailer: XFMail 1.4.4 on Linux X-Priority: 3 (Normal) Content-Type: text/plain; charset=us-ascii Content-Transfer-Encoding: 8bit Mime-Version: 1.0 Date: Mon, 01 May 2000 13:00:21 -0600 (MDT) Reply-To: Brian Hall From: Brian Hall To: lkcd@oss.sgi.com, axp-list@redhat.com, comp.os.linux.alpha@list.deja.com, linux-kernel@vger.rutgers.edu Subject: __locore_lmabegin ? Sender: owner-lkcd@oss.sgi.com Precedence: bulk Return-Path: X-Orcpt: rfc822;lkcd-outgoing __locore_lmabegin What does this kernel symbol refer to for i386? I need to figure out what the closest Alpha equivalent is. 
The LKCD lcrash code in question (lookup for the address of this symbol): if (!(sp = kl_lkup_symname("__locore_lmabegin"))) { return(3); } KL_LOCORE_ADDR = sp->s_addr & KL_PAGE_MASK; -- http://www.bigfoot.com/~brihall Linux Consultant From owner-lkcd@oss.sgi.com Mon May 1 12:25:29 2000 Received: by oss.sgi.com id ; Mon, 1 May 2000 12:25:20 -0700 Received: from mail.turbolinux.com ([38.170.88.25]:10250 "EHLO mail.turbolinux.com") by oss.sgi.com with ESMTP id ; Mon, 1 May 2000 12:24:54 -0700 Received: from localhost (yakker@localhost) by mail.turbolinux.com (8.9.3/8.9.3) with ESMTP id MAA04832; Mon, 1 May 2000 12:24:41 -0700 Date: Mon, 1 May 2000 12:24:40 -0700 (PDT) From: "Matt D. Robinson" To: Brian Hall cc: lkcd@oss.sgi.com Subject: Re: __locore_lmabegin ? In-Reply-To: Message-ID: MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII Sender: owner-lkcd@oss.sgi.com Precedence: bulk Return-Path: X-Orcpt: rfc822;lkcd-outgoing Tom, isn't that an SGI specific variable for SGI's BIGMEM patch? --Matt On Mon, 1 May 2000, Brian Hall wrote: |>__locore_lmabegin |> |>What does this kernel symbol refer to for i386? I need to figure out what the |>closest Alpha equivalent is. 
|> |>The LKCD lcrash code in question (lookup for the address of this symbol): |> |> |>if (!(sp = kl_lkup_symname("__locore_lmabegin"))) { |> return(3); |>} |>KL_LOCORE_ADDR = sp->s_addr & KL_PAGE_MASK; From owner-lkcd@oss.sgi.com Mon May 1 12:34:59 2000 Received: by oss.sgi.com id ; Mon, 1 May 2000 12:34:39 -0700 Received: from zmamail03.zma.compaq.com ([161.114.64.103]:11794 "HELO zmamail03.zma.compaq.com") by oss.sgi.com with SMTP id ; Mon, 1 May 2000 12:34:29 -0700 Received: by zmamail03.zma.compaq.com (Postfix, from userid 12345) id 1BFDF24AE; Mon, 1 May 2000 15:34:23 -0400 (EDT) Received: from cxo3ns.cxo.dec.com (cxo3ns.cxo.dec.com [16.63.0.10]) by zmamail03.zma.compaq.com (Postfix) with SMTP id 1F3FE26E6; Mon, 1 May 2000 15:34:22 -0400 (EDT) Received: from brownfur.cxo.dec.com by cxo3ns.cxo.dec.com; (5.65v4.0/1.1.8.2/11Apr96-1001AM) id AA28200; Mon, 1 May 2000 13:34:21 -0600 Received: from dhcp32-218.cxo.dec.com by brownfur.cxo.dec.com (5.65v4.0/1.1.10.5/17Feb98-0753AM) id AA17635; Mon, 1 May 2000 13:34:20 -0600 Received: by compaq.com (sSMTP sendmail emulation); Mon, 1 May 2000 13:34:00 -0600 Content-Length: 3627 Message-Id: X-Mailer: XFMail 1.4.4 on Linux X-Priority: 3 (Normal) Content-Type: text/plain; charset=us-ascii Content-Transfer-Encoding: 8bit Mime-Version: 1.0 In-Reply-To: Date: Mon, 01 May 2000 13:34:00 -0600 (MDT) Reply-To: Brian Hall From: Brian Hall To: "Matt D. Robinson" Subject: Re: Alpha lcrash initialization problem - can't access memory Cc: lkcd@oss.sgi.com, Tom Morano Sender: owner-lkcd@oss.sgi.com Precedence: bulk Return-Path: X-Orcpt: rfc822;lkcd-outgoing On 01-May-2000 Matt D. Robinson wrote: > On Mon, 1 May 2000, Brian Hall wrote: >|>OK, after wasting some time with memory debugging libraries (dmalloc & >|>electric >|>fence), I realized kl_init_kern_info() is failing. I'm tracking down the >|>reason(s) why now. At least this makes sense- I probably haven't yet made >|>all >|>the necessary modifications for it to work on Alpha. 
>|> >|>Hmm, looks like a problem while in cmpreadmem (set cmp_debug=1): >|> >|>==================================================== >|>(gdb) run >|>Starting program: /CDR_UPLOAD/hallb/linux-2.2.13-1.0.3/cmd/lcrash/./lcrash >|>./map.0 ./vmdump.0 >|>map = ./map.0, vmdump = ./vmdump.0, outfile = stdout >|> >|>Please wait...__cmploadindex(): cannot open() index file [2]: No such file >|>or >|>directory! >|>cmppindexcreate(): Number of pages in dump: 12288 >|>cmppindexcreate(): Page size in dump: 8192 >|>.. >|>Attempting to save index "index.10" ... complete. >|>...............cmpreadmem(): 8 bytes, 0x584c81 (just a page) >|>__cmppread(): initiating search for 0x584c81 >|>__cmppindex(): hash = 16472, addr = 0x584c81 >|>__cmppindex(): addr = 0x584c81, tmpptr->addr = 0x584000 >|>__cmppread(): page not found! (0x584c81) >|> >|>Program exited with code 01. >|>==================================================== >|> >|>Now, I have set the system to compress the dump on the swap partition >|>(DUMP_COMPRESS_PAGES=1); however when I look at the generated vmdump.0 file >|>is >|>does not look like it has been RLE encoded- there are long strings of 0x00 >|>repeating throughout the file. The vmdump.0 is ~69MB, while the real memory >|>size of the machine is 96MB (#pages*pagesize is also 96MB). > > Looks like it's probably getting compressed based on your data. > >|>Looks to me like either the dump is getting truncated, or the dump >|>compression >|>routine has a problem? (DUMP_LEVEL=4, swap partition is 512MB) I am guessing >|>that sp->s_addr is correct (this is the 0x584c81 being searched for in >|>cmpreadmem). > > The sp->s_addr is probably correct, but the real question is whether > or not the page is being read in properly. You'll need to find out if > that page is actually in memory or not (0x584000). You can turn on > some additional debugging in kl_cmp.c to print out page information from > the page header as the information is dumped out. 
Set cmp_debug to 2
> and tell me what it says. If you see all the pages being read, then
> you've got what you need from the dump. If not, then there's a problem
> with compression.

With cmp_debug=2, here are the lines of interest:

__cmppindexcreate(): addr = 0x584000, hash = 16472 , counter = 706 , cur_addr = 0x393f5a
__cmppindexcreate(): addr = 0x586000, hash = 24664 , counter = 707 , cur_addr = 0x3951df

The "counter" value ranges from 0 to 12287, so that is all the pages. Looks like the page is there, and the code fails to fetch it? Bad dog! grrr...

> I'd typically say it isn't compression as I've used that code for 32 and
> 64 bit systems in the past (for a few years). Still, there could be a
> problem.
>
>|>Is the vmdump.0 file supposed to be compressed on the disk if
>|>DUMP_COMPRESS_PAGES=1 ?
>
> Yes ... /sbin/vmdump config sets /proc/sys/vmdump/dump_compress_pages
> to the value based on DUMP_COMPRESS_PAGES.

# more /proc/sys/vmdump/dump_compress_pages
1

Viewing vmdump.0 with mc's hex text viewer, it sure doesn't look compressed to me. Could you send me a vmdump file that is compressed, so I can see what it looks like?
(maybe just the first 100KB or so) -- http://www.bigfoot.com/~brihall Linux Consultant From owner-lkcd@oss.sgi.com Tue May 2 10:33:41 2000 Received: by oss.sgi.com id ; Tue, 2 May 2000 10:33:32 -0700 Received: from deliverator.sgi.com ([204.94.214.10]:51817 "EHLO deliverator.sgi.com") by oss.sgi.com with ESMTP id ; Tue, 2 May 2000 10:33:17 -0700 Received: from loco.csd.sgi.com (loco.csd.sgi.com [150.166.1.62]) by deliverator.sgi.com (980309.SGI.8.8.8-aspam-6.2/980310.SGI-aspam) via ESMTP id KAA13023 for ; Tue, 2 May 2000 10:28:29 -0700 (PDT) mail_from (tjm@sgi.com) Received: from sgi.com (localhost.csd.sgi.com [127.0.0.1]) by loco.csd.sgi.com (980427.SGI.8.8.8/970903.SGI.AUTOCF) via ESMTP id KAA76781; Tue, 2 May 2000 10:31:39 -0700 (PDT) Message-ID: <390F10FA.B444FD2F@sgi.com> Date: Tue, 02 May 2000 10:31:38 -0700 From: Tom Morano X-Mailer: Mozilla 4.61C-SGI [en] (X11; I; IRIX 6.5 IP22) X-Accept-Language: en MIME-Version: 1.0 To: "Matt D. Robinson" CC: Brian Hall , lkcd@oss.sgi.com, tjm@sgi.com Subject: Re: __locore_lmabegin ? References: Content-Type: text/plain; charset=us-ascii Content-Transfer-Encoding: 7bit Sender: owner-lkcd@oss.sgi.com Precedence: bulk Return-Path: X-Orcpt: rfc822;lkcd-outgoing Content-Length: 1278 Lines: 35 "Matt D. Robinson" wrote: > > Tom, isn't that an SGI specific variable for SGI's BIGMEM patch? Yes it is. What it's there for is that some low level code gets stuffed in locore memory and we have to determine the start of locore in order to make the virtual to physical memory translation (so that we can disassemble the code). This had to be done to fix some stack trace problems with certain kernels. Brian, from your point of view, you need to make sure that you can translate (in the kl_virtop() function) any kernel address (and possibly user address) into a physical address. 
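As an illustration of the virtual-to-physical translation described above, here is a minimal sketch in C for the simplest case, a directly mapped kernel address. It assumes that Alpha Linux maps physical memory linearly starting at a PAGE_OFFSET of 0xfffffc0000000000; that constant, the type name, and the function name are assumptions for this sketch and are not taken from the lcrash sources. Addresses below that base (for example user addresses) would need a page table walk, which is not shown, and the sketch does not distinguish vmalloc addresses.

```c
#include <stdint.h>

/* Assumed base of the direct-mapped kernel segment on Alpha Linux. */
#define SKETCH_ALPHA_PAGE_OFFSET 0xfffffc0000000000ULL

typedef uint64_t sketch_kaddr_t;   /* 64-bit kernel address (hypothetical type) */

/*
 * Translate a directly mapped kernel virtual address to a physical address.
 * Returns 0 and stores the result in *paddr on success, or -1 if the
 * address lies below the assumed direct-mapped base.
 */
int sketch_virtop(sketch_kaddr_t vaddr, sketch_kaddr_t *paddr)
{
    if (paddr == 0 || vaddr < SKETCH_ALPHA_PAGE_OFFSET)
        return -1;
    *paddr = vaddr - SKETCH_ALPHA_PAGE_OFFSET;
    return 0;
}
```

Under these assumptions, a hypothetical symbol address of 0xfffffc0000584c80 would translate to the physical address 0x584c80. Returning a status instead of the bare address lets the caller notice an address the sketch cannot handle, rather than silently reading the wrong page.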
As for the reference to __locore_lmabegin, you can remove it, replace it with something that makes sense from an Alpha perspective, or just ignore it (failure is OK), since we don't currently check the return value from kl_init_kern_info() (perhaps we should?). Tom > > --Matt > > On Mon, 1 May 2000, Brian Hall wrote: > |>__locore_lmabegin > |> > |>What does this kernel symbol refer to for i386? I need to figure out what the > |>closest Alpha equivalent is. > |> > |>The LKCD lcrash code in question (lookup for the address of this symbol): > |> > |> > |>if (!(sp = kl_lkup_symname("__locore_lmabegin"))) { > |> return(3); > |>} > |>KL_LOCORE_ADDR = sp->s_addr & KL_PAGE_MASK; From owner-lkcd@oss.sgi.com Tue May 2 11:14:02 2000 Received: by oss.sgi.com id ; Tue, 2 May 2000 11:13:52 -0700 Received: from zmamail03.zma.compaq.com ([161.114.64.103]:59399 "HELO zmamail03.zma.compaq.com") by oss.sgi.com with SMTP id ; Tue, 2 May 2000 11:13:31 -0700 Received: by zmamail03.zma.compaq.com (Postfix, from userid 12345) id AD16425E0; Tue, 2 May 2000 14:13:25 -0400 (EDT) Received: from cxo3ns.cxo.dec.com (cxo3ns.cxo.dec.com [16.63.0.10]) by zmamail03.zma.compaq.com (Postfix) with SMTP id CF85A263D; Tue, 2 May 2000 14:13:24 -0400 (EDT) Received: from brownfur.cxo.dec.com by cxo3ns.cxo.dec.com; (5.65v4.0/1.1.8.2/11Apr96-1001AM) id AA32292; Tue, 2 May 2000 12:13:24 -0600 Received: from dhcp32-218.cxo.dec.com by brownfur.cxo.dec.com (5.65v4.0/1.1.10.5/17Feb98-0753AM) id AA31648; Tue, 2 May 2000 12:13:23 -0600 Received: by compaq.com (sSMTP sendmail emulation); Tue, 2 May 2000 12:12:56 -0600 Message-Id: X-Mailer: XFMail 1.4.4 on Linux X-Priority: 3 (Normal) Content-Type: text/plain; charset=us-ascii Content-Transfer-Encoding: 8bit Mime-Version: 1.0 In-Reply-To: Date: Tue, 02 May 2000 12:12:56 -0600 (MDT) Reply-To: Brian Hall From: Brian Hall To: "Matt D. 
Robinson" Subject: Re: Alpha lcrash initialization problem - can't access memory Cc: lkcd@oss.sgi.com, Tom Morano Sender: owner-lkcd@oss.sgi.com Precedence: bulk Return-Path: X-Orcpt: rfc822;lkcd-outgoing Content-Length: 334 Lines: 8 Okay, tracing things back, I see that page_index[]->next is always NULL. I assume this is incorrect, since it is used in __cmppindex. Any idea what could cause this? The index file appears to be read in correctly, possibly it is being generated with incorrect values or bad data? -- http://www.bigfoot.com/~brihall Linux Consultant From owner-lkcd@oss.sgi.com Tue May 2 11:17:31 2000 Received: by oss.sgi.com id ; Tue, 2 May 2000 11:17:11 -0700 Received: from mail.turbolinux.com ([38.170.88.25]:62983 "EHLO mail.turbolinux.com") by oss.sgi.com with ESMTP id ; Tue, 2 May 2000 11:16:55 -0700 Received: from localhost (yakker@localhost) by mail.turbolinux.com (8.9.3/8.9.3) with ESMTP id LAA18815; Tue, 2 May 2000 11:16:39 -0700 Date: Tue, 2 May 2000 11:16:39 -0700 (PDT) From: "Matt D. Robinson" To: Brian Hall cc: lkcd@oss.sgi.com, Tom Morano Subject: Re: Alpha lcrash initialization problem - can't access memory In-Reply-To: Message-ID: MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII Sender: owner-lkcd@oss.sgi.com Precedence: bulk Return-Path: X-Orcpt: rfc822;lkcd-outgoing Content-Length: 647 Lines: 15 Off-hand, I can't tell you without looking at the dump and the code. Is this stuff on a system we can login to and look at? BTW, if you feel there's something wrong with the index, just make sure you remove the call to __cmpploadindex() in kl_cmp.c, so it isn't loaded. That shouldn't be the problem, though. --Matt On Tue, 2 May 2000, Brian Hall wrote: |>Okay, tracing things back, I see that page_index[]->next is always NULL. I |>assume this is incorrect, since it is used in __cmppindex. Any idea what could |>cause this? The index file appears to be read in correctly, possibly it is |>being generated with incorrect values or bad data? 
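On the observation that page_index[]->next is always NULL: in a chained hash table, a NULL next pointer only means that no second entry landed in the same bucket, so by itself it may not indicate a bad index. The following is a minimal sketch in C of such a page index, keyed by the page-aligned address. It is not the kl_cmp.c implementation; the struct layout, hash function, table size, and the reading of cur_addr as an offset into the dump file are all assumptions made for illustration.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

#define SKETCH_PAGE_SIZE 8192u    /* page size reported in the dump */
#define SKETCH_NBUCKETS  32768u   /* hypothetical table size */

typedef struct sketch_entry {
    uint64_t addr;                /* page-aligned address of the page */
    uint64_t dump_off;            /* assumed: offset of the page in the dump file */
    struct sketch_entry *next;    /* next entry in the same bucket, or NULL */
} sketch_entry_t;

static sketch_entry_t *sketch_index[SKETCH_NBUCKETS];

/* Round an address down to the start of its page. */
static uint64_t sketch_page_base(uint64_t addr)
{
    return addr & ~((uint64_t)SKETCH_PAGE_SIZE - 1);
}

/* Hypothetical hash: page frame number modulo the table size. */
static unsigned sketch_hash(uint64_t base)
{
    return (unsigned)((base / SKETCH_PAGE_SIZE) % SKETCH_NBUCKETS);
}

/* Add a page to the index. Returns 0 on success, -1 if malloc fails. */
int sketch_insert(uint64_t addr, uint64_t dump_off)
{
    uint64_t base = sketch_page_base(addr);
    unsigned h = sketch_hash(base);
    sketch_entry_t *e = malloc(sizeof(*e));

    if (e == NULL)
        return -1;
    e->addr = base;
    e->dump_off = dump_off;
    e->next = sketch_index[h];    /* push onto the front of the bucket chain */
    sketch_index[h] = e;
    return 0;
}

/* Find the entry for the page containing addr, or NULL if not indexed. */
const sketch_entry_t *sketch_lookup(uint64_t addr)
{
    uint64_t base = sketch_page_base(addr);
    const sketch_entry_t *e;

    for (e = sketch_index[sketch_hash(base)]; e != NULL; e = e->next) {
        if (e->addr == base)      /* compare page bases, not raw addresses */
            return e;
    }
    return NULL;
}
```

In this sketch, inserting pages 0x584000 and 0x586000 and then looking up 0x584c81 returns the entry for 0x584000 with a NULL next pointer, because the two pages hash to different buckets. The lookup masks the requested address down to its page base before comparing, so an unaligned address still matches its page.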
From owner-lkcd@oss.sgi.com Tue May 2 14:47:45 2000 Received: by oss.sgi.com id ; Tue, 2 May 2000 14:47:35 -0700 Received: from zmamail04.zma.compaq.com ([161.114.64.104]:26118 "HELO zmamail04.zma.compaq.com") by oss.sgi.com with SMTP id ; Tue, 2 May 2000 14:47:25 -0700 Received: by zmamail04.zma.compaq.com (Postfix, from userid 12345) id 991B33FE; Tue, 2 May 2000 17:47:19 -0400 (EDT) Received: from cxo3ns.cxo.dec.com (cxo3ns.cxo.dec.com [16.63.0.10]) by zmamail04.zma.compaq.com (Postfix) with SMTP id B8D657C9; Tue, 2 May 2000 17:47:18 -0400 (EDT) Received: from brownfur.cxo.dec.com by cxo3ns.cxo.dec.com; (5.65v4.0/1.1.8.2/11Apr96-1001AM) id AA00534; Tue, 2 May 2000 15:47:17 -0600 Received: from dhcp32-218.cxo.dec.com by brownfur.cxo.dec.com (5.65v4.0/1.1.10.5/17Feb98-0753AM) id AA18125; Tue, 2 May 2000 15:47:17 -0600 Received: by compaq.com (sSMTP sendmail emulation); Tue, 2 May 2000 15:46:49 -0600 Message-Id: X-Mailer: XFMail 1.4.4 on Linux X-Priority: 3 (Normal) Content-Type: text/plain; charset=us-ascii Content-Transfer-Encoding: 8bit Mime-Version: 1.0 In-Reply-To: Date: Tue, 02 May 2000 15:46:49 -0600 (MDT) Reply-To: Brian Hall From: Brian Hall To: "Matt D. 
Robinson" Subject: Re: Alpha lcrash initialization problem - can't access memory Cc: Tom Morano , lkcd@oss.sgi.com Sender: owner-lkcd@oss.sgi.com Precedence: bulk Return-Path: X-Orcpt: rfc822;lkcd-outgoing Content-Length: 5239 Lines: 136 I've commented out the "__locore_lmabegin" lookup in kl_kern.c, and I replaced the code in kl_virtop with this (not sure if this is correct, but it seems to work better in the second case): #include kaddr_t kl_virtop(kaddr_t vaddr, void *m) { /* DEBUG */ return virt_to_phys (vaddr); } Running lcrash against the current system gives: =============================================================== (gdb) run Starting program: /CDR_UPLOAD/hallb/linux-2.2.13-1.0.3/cmd/lcrash/./lcrash map = /boot/System.map, vmdump = /dev/mem, outfile = stdout Please wait................Registering command record #0 b = 539066368 p->blklist = 20213fe8 Program received signal SIGSEGV, Segmentation fault. 0x12001b150 in enqueue (list=0x120213fe8, new=0x20218000) at alloc.c:57 57 alloc.c: No such file or directory. 
(gdb) where full #0 0x12001b150 in enqueue (list=0x120213fe8, new=0x20218000) at alloc.c:57 head = (element_t *) 0x0 #1 0x12001c194 in get_page (index=4) at alloc.c:438 i = 0 b = (block_t *) 0x20218000 page = (void *) 0x12001c664 p = (page_t *) 0x120213fd0 #2 0x12001cb0c in alloc_block (size=80, flag=2, ra=0x1e) at alloc.c:695 i = 4 j = 1 blk = (void *) 0xfffffffff7f7ffdb p = (page_t *) 0x11ffffa80 b = (block_t *) 0x0 #3 0x120003908 in kl_block_alloc_func (size=80, flag=2, ra=0x1e) at util.c:279 b = (void *) 0x12002bbd4 #4 0x12002bc2c in _kl_alloc_block (size=80, flags=2, ra=0x1e) at kl_alloc.c:22 blk = (void *) 0x120003d8c #5 0x120003dac in register_cmds (cmds=0x120144888) at command.c:17 i = 0 ret = 1 max_depth = 538955536 cmd_rec = (cmd_rec_t *) 0x0 #6 0x120002b68 in main (argc=1, argv=0x11ffffb68) at main.c:208 i = 0 c = 512 errflg = 0 =============================================================== And running lcrash against map.0 and vmdump.0: =============================================================== (gdb) run map.0 vmdump.0 Starting program: /CDR_UPLOAD/hallb/linux-2.2.13-1.0.3/cmd/lcrash/./lcrash map.0 vmdump.0 map = map.0, vmdump = vmdump.0, outfile = stdout Please wait...Attempting to load previous index "index.10" ... complete. ...............cmpreadmem(): 8 bytes, 0x584c80 (just a page) __cmppread(): initiating search for 0x584c80 __cmppindex(): hash = 16472, addr = 0x584c80 __cmppindex(): addr = 0x584c80, tmpptr->addr = 0x584000 __cmppread(): found the page in the page index! 0x584000: 4725 -> 8192 COMPRESSED, writing 8192 bytes __cmppinsert(): Malloc occurred! [0] __cmppinsert(): Inserting page into cache! (0x584c80) [0]... __cmppget(): copying page of data (nbytes = 8, offset = 3200, in_addr = 0x584c80) __cmppread(): found the item in the hash table second time! 
cmpreadmem(): 8 bytes, 0x584c90 (just a page) __cmppread(): initiating search for 0x584c90 __cmppget(): copying page of data (nbytes = 8, offset = 3216, in_addr = 0x584c90) __cmppread(): found the item in the hash table! Registering command record #0 b = 540016640 p->blklist = 202fbf88 Program received signal SIGSEGV, Segmentation fault. 0x12001b150 in enqueue (list=0x1202fbf88, new=0x20300000) at alloc.c:57 57 alloc.c: No such file or directory. (gdb) where full #0 0x12001b150 in enqueue (list=0x1202fbf88, new=0x20300000) at alloc.c:57 head = (element_t *) 0x0 #1 0x12001c194 in get_page (index=4) at alloc.c:438 i = 0 b = (block_t *) 0x20300000 page = (void *) 0x12001c664 p = (page_t *) 0x1202fbf70 #2 0x12001cb0c in alloc_block (size=80, flag=2, ra=0x1e) at alloc.c:695 i = 4 j = 1 blk = (void *) 0xfffffffff7f7ffdb p = (page_t *) 0x11ffffb10 b = (block_t *) 0x0 #3 0x120003908 in kl_block_alloc_func (size=80, flag=2, ra=0x1e) at util.c:279 b = (void *) 0x12002bbd4 #4 0x12002bc2c in _kl_alloc_block (size=80, flags=2, ra=0x1e) at kl_alloc.c:22 blk = (void *) 0x120003d8c #5 0x120003dac in register_cmds (cmds=0x120144888) at command.c:17 i = 0 ret = 1 max_depth = 539896016 cmd_rec = (cmd_rec_t *) 0x0 #6 0x120002b68 in main (argc=3, argv=0x11ffffbf8) at main.c:208 i = 0 c = 512 errflg = 0 =============================================================== At least I'm finding a page now... On 02-May-2000 Matt D. Robinson wrote: > Off-hand, I can't tell you without looking at the dump and the code. > Is this stuff on a system we can login to and look at? BTW, if you > feel there's something wrong with the index, just make sure you > remove the call to __cmpploadindex() in kl_cmp.c, so it isn't loaded. > > That shouldn't be the problem, though. > > --Matt > > On Tue, 2 May 2000, Brian Hall wrote: >|>Okay, tracing things back, I see that page_index[]->next is always NULL. I >|>assume this is incorrect, since it is used in __cmppindex. Any idea what >|>could >|>cause this? 
The index file appears to be read in correctly, possibly it is >|>being generated with incorrect values or bad data? -- http://www.bigfoot.com/~brihall Linux Consultant From owner-lkcd@oss.sgi.com Tue May 2 14:55:14 2000 Received: by oss.sgi.com id ; Tue, 2 May 2000 14:55:06 -0700 Received: from zmamail05.zma.compaq.com ([161.114.64.105]:10256 "HELO zmamail05.zma.compaq.com") by oss.sgi.com with SMTP id ; Tue, 2 May 2000 14:54:54 -0700 Received: by zmamail05.zma.compaq.com (Postfix, from userid 12345) id 731641A89; Tue, 2 May 2000 17:54:48 -0400 (EDT) Received: from cxo3ns.cxo.dec.com (cxo3ns.cxo.dec.com [16.63.0.10]) by zmamail05.zma.compaq.com (Postfix) with SMTP id 7FFB11A7F; Tue, 2 May 2000 17:54:46 -0400 (EDT) Received: from brownfur.cxo.dec.com by cxo3ns.cxo.dec.com; (5.65v4.0/1.1.8.2/11Apr96-1001AM) id AA00577; Tue, 2 May 2000 15:54:43 -0600 Received: from dhcp32-218.cxo.dec.com by brownfur.cxo.dec.com (5.65v4.0/1.1.10.5/17Feb98-0753AM) id AA26963; Tue, 2 May 2000 15:54:43 -0600 Received: by compaq.com (sSMTP sendmail emulation); Tue, 2 May 2000 15:54:15 -0600 Message-Id: X-Mailer: XFMail 1.4.4 on Linux X-Priority: 3 (Normal) Content-Type: text/plain; charset=us-ascii Content-Transfer-Encoding: 8bit Mime-Version: 1.0 In-Reply-To: <3908B22A.B11911EB@sgi.com> Date: Tue, 02 May 2000 15:54:15 -0600 (MDT) Reply-To: Brian Hall From: Brian Hall To: Tom Morano Subject: Re: Alpha lcrash initialization problem - can't access memory Cc: "Matt D.Robinson" , lkcd@oss.sgi.com Sender: owner-lkcd@oss.sgi.com Precedence: bulk Return-Path: X-Orcpt: rfc822;lkcd-outgoing Content-Length: 1274 Lines: 32 So basically everywhere I see a GET_BLOCK(whatever, 4, &something), I should change the 4 to an 8? On 27-Apr-2000 Tom Morano wrote: > The parameters to init_liballoc() are OK. Based on this, I would guess that > some memory is getting stomped on in or below the kl_init_kern_info() > function > call. 
You might check the block of memory causing the SEGV after returning > from the init_liballoc() call and before the kl_init_kern_info() call. See if > it > looks OK at that point (I would guess the contents of this memory is change > by > the time you get to register_cmds()). If that's the case, then walk through > the > kl_init_kern_info() function and see where the memory contents changes. From > looking at the kl_init_kern_info() function, I can't see where the problem > might > occur (it basically just does symbol lookups and reads in the contents of > memory > into some local variables). Since the Alpha is 64 bit, I assume that the > amount > of > memory being read in for these values is 8 bytes instead of 4 (and that the > local > variables, NUM_PHYSPAGES and MEM_MAP have been changed also). Little things > like > that might be a factor. Anyway, that's how I would approach narrowing it > down. -- http://www.bigfoot.com/~brihall Linux Consultant From owner-lkcd@oss.sgi.com Tue May 2 15:35:06 2000 Received: by oss.sgi.com id ; Tue, 2 May 2000 15:34:56 -0700 Received: from deliverator.sgi.com ([204.94.214.10]:9046 "EHLO deliverator.sgi.com") by oss.sgi.com with ESMTP id ; Tue, 2 May 2000 15:34:38 -0700 Received: from nodin.corp.sgi.com (fddi-nodin.corp.sgi.com [198.29.75.193]) by deliverator.sgi.com (980309.SGI.8.8.8-aspam-6.2/980310.SGI-aspam) via ESMTP id PAA29993 for ; Tue, 2 May 2000 15:29:50 -0700 (PDT) mail_from (tjm@sgi.com) Received: from loco.csd.sgi.com (loco.csd.sgi.com [150.166.1.62]) by nodin.corp.sgi.com (980427.SGI.8.8.8/980728.SGI.AUTOCF) via ESMTP id PAA02902 for ; Tue, 2 May 2000 15:32:52 -0700 (PDT) Received: from sgi.com (localhost.csd.sgi.com [127.0.0.1]) by loco.csd.sgi.com (980427.SGI.8.8.8/970903.SGI.AUTOCF) via ESMTP id PAA81747; Tue, 2 May 2000 15:30:05 -0700 (PDT) Message-ID: <390F56EB.EF6E51A7@sgi.com> Date: Tue, 02 May 2000 15:30:03 -0700 From: Tom Morano X-Mailer: Mozilla 4.61C-SGI [en] (X11; I; IRIX 6.5 IP22) X-Accept-Language: en 
MIME-Version: 1.0 To: Brian Hall CC: "Matt D.Robinson" , lkcd@oss.sgi.com Subject: Re: Alpha lcrash initialization problem - can't access memory References: Content-Type: text/plain; charset=us-ascii Content-Transfer-Encoding: 7bit Sender: owner-lkcd@oss.sgi.com Precedence: bulk Return-Path: X-Orcpt: rfc822;lkcd-outgoing Content-Length: 1919 Lines: 50 Brian Hall wrote: > > So basically everywhere I see a GET_BLOCK(whatever, 4, &something), I should > change the 4 to an 8? If you are loading an 8 byte value from kernel memory (say a pointer value) AND you are loading it into an eight byte local variable (buffer), then yes. But it might be possible that you will have cases where the size will stay the same (say if you are loading an int value that is 4 bytes in both 32 and 64-bit world). BTW, all of these kinds of calls (reading in pointer values via GET_BLOCK()) should be located in arch specific modules. If you find one that isn't (quite possible), please let me know and I'll make sure it gets taken care of. Thanks, Tom > > On 27-Apr-2000 Tom Morano wrote: > > The parameters to init_liballoc() are OK. Based on this, I would guess that > > some memory is getting stomped on in or below the kl_init_kern_info() > > function > > call. You might check the block of memory causing the SEGV after returning > > from the init_liballoc() call and before the kl_init_kern_info() call. See if > > it > > looks OK at that point (I would guess the contents of this memory is change > > by > > the time you get to register_cmds()). If that's the case, then walk through > > the > > kl_init_kern_info() function and see where the memory contents changes. From > > looking at the kl_init_kern_info() function, I can't see where the problem > > might > > occur (it basically just does symbol lookups and reads in the contents of > > memory > > into some local variables). 
Since the Alpha is 64 bit, I assume that the > > amount > > of > > memory being read in for these values is 8 bytes instead of 4 (and that the > > local > > variables, NUM_PHYSPAGES and MEM_MAP have been changed also). Little things > > like > > that might be a factor. Anyway, that's how I would approach narrowing it > > down. > > -- > http://www.bigfoot.com/~brihall > Linux Consultant From owner-lkcd@oss.sgi.com Wed May 3 12:28:28 2000 Received: by oss.sgi.com id ; Wed, 3 May 2000 12:28:19 -0700 Received: from ztxmail03.ztx.compaq.com ([161.114.1.207]:43783 "HELO ztxmail03.ztx.compaq.com") by oss.sgi.com with SMTP id ; Wed, 3 May 2000 12:28:05 -0700 Received: by ztxmail03.ztx.compaq.com (Postfix, from userid 12345) id C8B8D216; Wed, 3 May 2000 14:27:53 -0500 (CDT) Received: from cxo3ns.cxo.dec.com (cxo3ns.cxo.dec.com [16.63.0.10]) by ztxmail03.ztx.compaq.com (Postfix) with SMTP id 4585916D for ; Wed, 3 May 2000 14:27:53 -0500 (CDT) Received: from brownfur.cxo.dec.com by cxo3ns.cxo.dec.com; (5.65v4.0/1.1.8.2/11Apr96-1001AM) id AA02098; Wed, 3 May 2000 13:27:52 -0600 Received: from dhcp32-218.cxo.dec.com by brownfur.cxo.dec.com (5.65v4.0/1.1.10.5/17Feb98-0753AM) id AA24103; Wed, 3 May 2000 13:27:51 -0600 Received: by compaq.com (sSMTP sendmail emulation); Wed, 3 May 2000 13:27:18 -0600 Message-Id: X-Mailer: XFMail 1.4.4 on Linux X-Priority: 3 (Normal) Content-Type: text/plain; charset=us-ascii Content-Transfer-Encoding: 8bit Mime-Version: 1.0 Date: Wed, 03 May 2000 13:27:17 -0600 (MDT) Reply-To: Brian Hall From: Brian Hall To: lkcd@oss.sgi.com Subject: Alpha lcrash - enqueue/alloc Sender: owner-lkcd@oss.sgi.com Precedence: bulk Return-Path: X-Orcpt: rfc822;lkcd-outgoing Content-Length: 3569 Lines: 106 Well, I haven't been able to figure how the free block list is supposed to be filled; tracing the code I don't see how it would work. 
It certainly doesn't as-is (alloc.c), surely because I have something set up wrong (with the blklist initialized to NULL in the code, enqueue tries to *blklist=element, which segfaults of course). Anyway, to make some progress, I modified alloc_block to always allocate memory as if size > CHUNKSZ all the time. This gave me a lcrash executable that would execute without segfaulting, although it is very very slow, and can't seem to access any useful information. Also, I'm having a magic number mismatch; I had to comment out the checks for that in do_report (don't know why this occurs yet, maybe my crash dump is bad; MCL's analysis program crash2.3 also doesn't work on my crash dump).

Here is what I'm currently getting for lcrash -r map.0 vmdump.0 (it "hangs" after printing this):

Attempting to load previous index "index.10" ... complete.
cmpreadmem(): 8 bytes, 0x584c80 (just a page)
__cmppread(): initiating search for 0x584c80
__cmppindex(): hash = 16472, addr = 0x584c80
__cmppindex(): addr = 0x584c80, tmpptr->addr = 0x584000
__cmppread(): found the page in the page index!
0x584000: 4725 -> 8192 COMPRESSED, writing 8192 bytes
__cmppinsert(): Malloc occurred! [0]
__cmppinsert(): Inserting page into cache! (0x584c80) [0]...
__cmppget(): copying page of data (nbytes = 8, offset = 3200, in_addr = 0x584c80)
__cmppread(): found the item in the hash table second time!
cmpreadmem(): 8 bytes, 0x584c90 (just a page)
__cmppread(): initiating search for 0x584c90
__cmppget(): copying page of data (nbytes = 8, offset = 3216, in_addr = 0x584c90)
__cmppread(): found the item in the hash table!
=======================
LCRASH CORE FILE REPORT
=======================

GENERATED ON:
    Wed May 3 13:24:59 2000

TIME OF CRASH:
    Tue Apr 18 15:08:24 2000

PANIC STRING:
    User created crash dump

MAP:
    map.0

VMDUMP:
    vmdump.0

================
COREFILE SUMMARY
================

    The system died due to a software failure.

===================
UTSNAME INFORMATION
===================

   sysname : Linux
  nodename : dhcp96-180.cxo.dec.com
   release : 2.2.13
   version : #10 Tue Feb 8 16:06:15 MST 2000
   machine : alpha
domainname : gldulab

===============
LOG BUFFER DUMP
===============

cmpreadmem(): reading 8192 bytes, 0x587030 (a new page)
__cmppread(): initiating search for 0x587030
__cmppindex(): hash = 28760, addr = 0x587030
__cmppread(): page not found! (0x587030)
cmpreadmem(): reading 8192 bytes, 0x589030 (leftovers)
__cmppread(): initiating search for 0x589030
__cmppindex(): hash = 36952, addr = 0x589030
__cmppread(): page not found! (0x589030)

====================
CURRENT SYSTEM TASKS
====================

ADDR UID PID PPID STATE PRI FLAGS MM NAME
==============================================================================
cmpreadmem(): 1064 bytes, 0x4e8000 (just a page)
__cmppread(): initiating search for 0x4e8000
__cmppindex(): hash = 32846, addr = 0x4e8000
__cmppindex(): addr = 0x4e8000, tmpptr->addr = 0x4e8000
__cmppread(): found the page in the page index!
0x4e8000: 96 -> 8192 COMPRESSED, writing 8192 bytes
__cmppinsert(): Malloc occurred! [1]
__cmppinsert(): Inserting page into cache! (0x4e8000) [1]...
__cmppget(): copying page of data (nbytes = 1064, offset = 0, in_addr = 0x4e8000)
__cmppread(): found the item in the hash table second time!

--
http://www.bigfoot.com/~brihall
Linux Consultant

From owner-lkcd@oss.sgi.com Wed May 3 12:32:27 2000
Received: by oss.sgi.com id ; Wed, 3 May 2000 12:32:09 -0700
Received: from mail.turbolinux.com ([38.170.88.25]:57610 "EHLO mail.turbolinux.com") by oss.sgi.com with ESMTP id ; Wed, 3 May 2000 12:31:57 -0700
Received: from localhost (yakker@localhost) by mail.turbolinux.com (8.9.3/8.9.3) with ESMTP id MAA04869; Wed, 3 May 2000 12:31:44 -0700
Date: Wed, 3 May 2000 12:31:40 -0700 (PDT)
From: "Matt D.
Robinson" To: Brian Hall cc: lkcd@oss.sgi.com Subject: Re: Alpha lcrash - enqueue/alloc In-Reply-To: Message-ID: MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII Sender: owner-lkcd@oss.sgi.com Precedence: bulk Return-Path: X-Orcpt: rfc822;lkcd-outgoing Content-Length: 4101 Lines: 116 Hey, Brian. Send me your latest tarball (source), and let me know where I can ftp your core image from. I've got an alpha system here I can test things on ... I'll try to have an answer for you by tomorrow, assuming the world doesn't blow up between now and then. :) --Matt On Wed, 3 May 2000, Brian Hall wrote: |>Well, I haven't been able to figure how the free block list is supposed to be |>filled; tracing the code I don't see how it would work. It certainly doesn't |>as-is (alloc.c), surely because I have something setup wrong (with the blklist |>initialized to NULL in the code, enqueue tries to *blklist=element, which |>segfaults of course). Anyway, to make some progress, I modified alloc_block to |>always allocate memory as if size > CHUNKSZ all the time. This gave me a lcrash |>executable that would execute without segfaulting, although it is very very |>slow, and can't seem to access any useful information. Also, I'm having a magic |>number mismatch, I had to comment out the checks for that in do_report (don't |>know why this occurs yet, maybe my crash dump is bad- MCL's analyis program |>crash2.3 also doesn't work on my crash dump). |> |>Here is what I'm currently getting for lcrash -r map.0 vmdump.0 (it "hangs" |>after printing this): |> |>Attempting to load previous index "index.10" ... complete. |>cmpreadmem(): 8 bytes, 0x584c80 (just a page) |>__cmppread(): initiating search for 0x584c80 |>__cmppindex(): hash = 16472, addr = 0x584c80 |>__cmppindex(): addr = 0x584c80, tmpptr->addr = 0x584000 |>__cmppread(): found the page in the page index! |>0x584000: 4725 -> 8192 COMPRESSED, writing 8192 bytes |>__cmppinsert(): Malloc occurred! 
[0] |>__cmppinsert(): Inserting page into cache! (0x584c80) [0]... |>__cmppget(): copying page of data (nbytes = 8, offset = 3200, in_addr = |>0x584c80) |>__cmppread(): found the item in the hash table second time! |>cmpreadmem(): 8 bytes, 0x584c90 (just a page) |>__cmppread(): initiating search for 0x584c90 |>__cmppget(): copying page of data (nbytes = 8, offset = 3216, in_addr = |>0x584c90) |>__cmppread(): found the item in the hash table! |>======================= |>LCRASH CORE FILE REPORT |>======================= |> |>GENERATED ON: |> Wed May 3 13:24:59 2000 |> |> |>TIME OF CRASH: |> Tue Apr 18 15:08:24 2000 |> |> |>PANIC STRING: |> User created crash dump |> |>MAP: |> map.0 |> |>VMDUMP: |> vmdump.0 |> |>================ |>COREFILE SUMMARY |>================ |> |> The system died due to a software failure. |> |>=================== |>UTSNAME INFORMATION |>=================== |> |> sysname : Linux |> nodename : dhcp96-180.cxo.dec.com |> release : 2.2.13 |> version : #10 Tue Feb 8 16:06:15 MST 2000 |> machine : alpha |>domainname : gldulab |> |>=============== |>LOG BUFFER DUMP |>=============== |> |>cmpreadmem(): reading 8192 bytes, 0x587030 (a new page) |>__cmppread(): initiating search for 0x587030 |>__cmppindex(): hash = 28760, addr = 0x587030 |>__cmppread(): page not found! (0x587030) |>cmpreadmem(): reading 8192 bytes, 0x589030 (leftovers) |>__cmppread(): initiating search for 0x589030 |>__cmppindex(): hash = 36952, addr = 0x589030 |>__cmppread(): page not found! (0x589030) |> |> |>==================== |>CURRENT SYSTEM TASKS |>==================== |> |> ADDR UID PID PPID STATE PRI FLAGS MM |>NAME |>============================================================================== |>cmpreadmem(): 1064 bytes, 0x4e8000 (just a page) |>__cmppread(): initiating search for 0x4e8000 |>__cmppindex(): hash = 32846, addr = 0x4e8000 |>__cmppindex(): addr = 0x4e8000, tmpptr->addr = 0x4e8000 |>__cmppread(): found the page in the page index! 
|>0x4e8000: 96 -> 8192 COMPRESSED, writing 8192 bytes |>__cmppinsert(): Malloc occurred! [1] |>__cmppinsert(): Inserting page into cache! (0x4e8000) [1]... |>__cmppget(): copying page of data (nbytes = 1064, offset = 0, in_addr = |>0x4e8000) |>__cmppread(): found the item in the hash table second time! |> |>-- |>http://www.bigfoot.com/~brihall |>Linux Consultant |> From owner-lkcd@oss.sgi.com Wed May 3 13:27:08 2000 Received: by oss.sgi.com id ; Wed, 3 May 2000 13:26:58 -0700 Received: from ztxmail05.ztx.compaq.com ([161.114.1.209]:59402 "HELO ztxmail05.ztx.compaq.com") by oss.sgi.com with SMTP id ; Wed, 3 May 2000 13:26:45 -0700 Received: by ztxmail05.ztx.compaq.com (Postfix, from userid 12345) id B28A6B9C; Wed, 3 May 2000 15:26:39 -0500 (CDT) Received: from cxo3ns.cxo.dec.com (cxo3ns.cxo.dec.com [16.63.0.10]) by ztxmail05.ztx.compaq.com (Postfix) with SMTP id 2152EBDB for ; Wed, 3 May 2000 15:26:39 -0500 (CDT) Received: from brownfur.cxo.dec.com by cxo3ns.cxo.dec.com; (5.65v4.0/1.1.8.2/11Apr96-1001AM) id AA02869; Wed, 3 May 2000 14:26:38 -0600 Received: from dhcp32-218.cxo.dec.com by brownfur.cxo.dec.com (5.65v4.0/1.1.10.5/17Feb98-0753AM) id AA25346; Wed, 3 May 2000 14:26:37 -0600 Received: by compaq.com (sSMTP sendmail emulation); Wed, 3 May 2000 14:26:03 -0600 Message-Id: X-Mailer: XFMail 1.4.4 on Linux X-Priority: 3 (Normal) Content-Type: text/plain; charset=us-ascii Content-Transfer-Encoding: 8bit Mime-Version: 1.0 Date: Wed, 03 May 2000 14:26:03 -0600 (MDT) Reply-To: Brian Hall From: Brian Hall To: lkcd@oss.sgi.com Subject: Alpha LKCD goes on vacation Sender: owner-lkcd@oss.sgi.com Precedence: bulk Return-Path: X-Orcpt: rfc822;lkcd-outgoing Content-Length: 360 Lines: 8 Just FYI for those interested, I am leaving for my vacation about 10AM Mountain Time next Friday, 5/12 (I'll be in briefly that morning), and I'll be back in the office on 5/30. 
You can reach me at brihall@bigfoot.com during that time, although I won't have access to my source code (I'll be in Virginia). -- http://www.bigfoot.com/~brihall Linux Consultant From owner-lkcd@oss.sgi.com Sat May 20 08:42:02 2000 Received: (from majordomo@localhost) by oss.sgi.com (8.10.1/8.10.1) id e4KFg2110530 for lkcd-outgoing; Sat, 20 May 2000 08:42:02 -0700 X-Authentication-Warning: oss.sgi.com: majordomo set sender to owner-lkcd@oss.sgi.com using -f Received: from localhost (mail@localhost) by oss.sgi.com (8.10.1/8.10.1) with SMTP id e4KFeNb10469; Sat, 20 May 2000 08:40:23 -0700 X-Authentication-Warning: oss.sgi.com: mail owned process doing -bs Received: by oss.sgi.com (bulk_mailer v1.12); Sat, 20 May 2000 08:40:23 -0700 Received: (from majordomo@localhost) by oss.sgi.com (8.10.1/8.10.1) id e4KFeNI10463 for oss-projects-outgoing; Sat, 20 May 2000 08:40:23 -0700 X-Authentication-Warning: oss.sgi.com: majordomo set sender to owner-oss-projects@oss.sgi.com using -f Received: (from cattelan@localhost) by oss.sgi.com (8.10.1/8.10.1) id e4KFeNB10461 for oss-projects@oss.sgi.com; Sat, 20 May 2000 08:40:23 -0700 Date: Sat, 20 May 2000 08:40:23 -0700 From: Russell Cattelan Message-Id: <200005201540.e4KFeNB10461@oss.sgi.com> To: oss-projects@oss.sgi.com Sender: owner-lkcd@oss.sgi.com Precedence: bulk Content-Length: 19 Lines: 1 test please ignore From owner-lkcd@oss.sgi.com Wed May 31 14:11:06 2000 Received: by oss.sgi.com id ; Wed, 31 May 2000 14:10:56 -0700 Received: from jerry.pcisys.net ([207.76.102.251]:3003 "EHLO jerry.pcisys.net") by oss.sgi.com with ESMTP id ; Wed, 31 May 2000 14:10:34 -0700 Received: from localhost.localdomain (cosdsl184.pci2.cos.pcisys.net [216.229.43.184]) by jerry.pcisys.net (8.9.3/8.9.3) with ESMTP id QAA08487; Wed, 31 May 2000 16:10:10 -0600 (MDT) Message-ID: X-Mailer: XFMail 1.4.4 on Linux X-Priority: 3 (Normal) Content-Type: text/plain; charset=us-ascii Content-Transfer-Encoding: 8bit MIME-Version: 1.0 In-Reply-To: Date: Wed, 31 May 
2000 16:10:14 -0600 (MDT) Reply-To: Brian Hall Organization: Compaq From: Brian Hall To: Tom Morano , yakker@turbolinux.com, lkcd@oss.sgi.com Subject: RE: FW: Re: Home from vacation Sender: owner-lkcd@oss.sgi.com Precedence: bulk Return-Path: X-Orcpt: rfc822;lkcd-outgoing Content-Length: 4761 Lines: 99 Local consensus is to stick with 2.2 for now. I think I have done most of the changes necessary in my 2.2 tree to work with 64 bit already, since I can successfully (though possibly not correctly, still unverified) create a dump. My problem has been getting lcrash to work. OK, I've gotten the 2.2 and 2.3 trees from SourceForge. I've updated my tree to match as closely as possible, and brought the relevant arch changes over to the Alpha tree also. However, I'm having a new problem with the build in libklib. The new signal handler stuff, kl_signal.c, can't find a definition for "greg_t", and neither can I. Doesn't seem to be defined in the 2.2 or 2.3 cvs trees. [root@localhost libklib]# mm /bin/rm -f include/asm (cd include ; /bin/ln -s asm-alpha asm; cd ..) cc -gstabs -D__KERNEL__ -I/usr/src/linux/include -I. -Iinclude -c -o kl_signal.o kl_signal.c kl_signal.c: In function `klib_sig_handler': kl_signal.c:70: `greg_t' undeclared (first use in this function) kl_signal.c:70: (Each undeclared identifier is reported only once kl_signal.c:70: for each function it appears in.) kl_signal.c:70: `gregs' undeclared (first use in this function) kl_signal.c:70: structure has no member named `gregs' kl_signal.c:71: parse error before `esi' kl_signal.c:73: `esi' undeclared (first use in this function) kl_signal.c:73: `ESI' undeclared (first use in this function) kl_signal.c:74: `esp' undeclared (first use in this function) kl_signal.c:74: `ESP' undeclared (first use in this function) kl_signal.c:76: `badaddr' undeclared (first use in this function) kl_signal.c:76: parse error before `sip' make: *** [kl_signal.o] Error 1 Also, I am having some troubles with my ISP. 
Yesterday, I discovered that I could no longer get a second DHCP IP from them. I sent an email about this, and hopefully I hear why soon. In the meantime, I switched the AS200 from the hub to the network switch, placing it inside my firewall so I can work on it. On 30-May-2000 Brian Hall wrote: > > -----FW: <392DA260.ABBA8D31@sgi.com>----- > > Date: Thu, 25 May 2000 15:00:00 -0700 > Sender: tjm@sgi.com > From: Tom Morano > To: Brian Hall > Subject: Re: Home from vacation > Cc: "Gilford, Kevin" , > Matt D. Robinson , tjm@sgi.com > > Brian Hall wrote: >> >> Got back home today from Virginia. Found the AS200 offline (figures); >> restarted >> the network script. It is back online at 208.202.104.224. Any breakthroughs >> while I was gone? > > Hi Brian, > > I've been working on getting the 2.3 version of lcrash up and running > on ia64 (one of my top priorities). Although this does not directly > help you with your Alpha port, both systems have 64-bit architectures > and many of the issues you've faced are similar to ones that I've had > to deal with. I have a version of lcrash that is able to start up on a > live ia64 system (using /boot/System.map and /dev/mem), dump memory, > generate system status output, list active tasks, display kernel data > type information, etc. In other words, I have most of the basic > functionality working. In order to accomplish this, I had to move some > code around and address numerous conflicts and data type casting > issues. I've checked all these changes into our LKCD source repository > on SourceForge (project lkcd). You are welcome to pick them up and > give them a try (after converting the arch/ia64 stuff to arch/Alpha). > > If you are not able to go to 2.3 then I would suggest bringing your > lcrash source base more up-to-date with the 2.2 tree on SourceForge. > You started your porting effort early on and there have been a number > of new features added and bugs fixed since then. 
I haven't made any
> 32-bit/64-bit changes to the 2.2 tree, so if you must stay at that
> level, then some additional work will be necessary. I guess it depends
> on which version of the kernel you'll be running on the Alpha systems.
> In either case, having you work off a more up-to-date source base will
> allow me to merge your changes in more easily.
>
> As far as the kernel portion of the LKCD project goes, that is
> something that Matt has been looking into. For the most part, lcrash
> is independent from the kernel. There are a couple of kernel changes
> between 2.2 and 2.3 (names of kernel variables, pointer vs. pointer to
> pointer, etc.) that had to be addressed in lcrash. The key is that
> lcrash has to be compiled using the same header and configuration
> files as the kernel. This is to ensure that the contents of memory
> line up properly with kernel data type definitions.
>
> Let me know how you plan to proceed (2.2 or 2.3) and I'll see what
> I can do to help.
>
> Thanks,
>
> Tom

--

From owner-lkcd@oss.sgi.com Wed May 31 15:10:26 2000
Received: by oss.sgi.com id ; Wed, 31 May 2000 15:10:06 -0700
Received: from pneumatic-tube.sgi.com ([204.94.214.22]:58171 "EHLO pneumatic-tube.sgi.com") by oss.sgi.com with ESMTP id ; Wed, 31 May 2000 15:09:52 -0700
Received: from loco.csd.sgi.com (loco.csd.sgi.com [150.166.1.62]) by pneumatic-tube.sgi.com (980327.SGI.8.8.8-aspam/980310.SGI-aspam) via ESMTP id QAA07571 for ; Wed, 31 May 2000 16:14:39 -0700 (PDT) mail_from (tjm@sgi.com)
Received: from sgi.com (localhost.csd.sgi.com [127.0.0.1]) by loco.csd.sgi.com (980427.SGI.8.8.8/970903.SGI.AUTOCF) via ESMTP id QAA03240; Wed, 31 May 2000 16:08:20 -0700 (PDT)
Message-ID: <39359B62.F543E00B@sgi.com>
Date: Wed, 31 May 2000 16:08:18 -0700
From: Tom Morano
X-Mailer: Mozilla 4.61C-SGI [en] (X11; I; IRIX 6.5 IP22)
X-Accept-Language: en
MIME-Version: 1.0
To: Brian Hall
CC: yakker@turbolinux.com, lkcd@oss.sgi.com
Subject: Re: FW: Re: Home from vacation
References:
Content-Type: text/plain; charset=us-ascii Content-Transfer-Encoding: 7bit Sender: owner-lkcd@oss.sgi.com Precedence: bulk Return-Path: X-Orcpt: rfc822;lkcd-outgoing Content-Length: 2920 Lines: 55 Brian Hall wrote: > > Local consensus is to stick with 2.2 for now. I think I have done most of the > changes necessary in my 2.2 tree to work with 64 bit already, since I can > successfully (though possibly not correctly, still unverified) create a dump. > My problem has been getting lcrash to work. > > OK, I've gotten the 2.2 and 2.3 trees from SourceForge. I've updated my tree to > match as closely as possible, and brought the relevant arch changes over to > the Alpha tree also. However, I'm having a new problem with the build in > libklib. The new signal handler stuff, kl_signal.c, can't find a definition for > "greg_t", and neither can I. Doesn't seem to be defined in the 2.2 or 2.3 cvs > trees. > > [root@localhost libklib]# mm > /bin/rm -f include/asm > (cd include ; /bin/ln -s asm-alpha asm; cd ..) > cc -gstabs -D__KERNEL__ -I/usr/src/linux/include -I. -Iinclude -c -o > kl_signal.o kl_signal.c > kl_signal.c: In function `klib_sig_handler': > kl_signal.c:70: `greg_t' undeclared (first use in this function) > kl_signal.c:70: (Each undeclared identifier is reported only once > kl_signal.c:70: for each function it appears in.) > kl_signal.c:70: `gregs' undeclared (first use in this function) > kl_signal.c:70: structure has no member named `gregs' > kl_signal.c:71: parse error before `esi' > kl_signal.c:73: `esi' undeclared (first use in this function) > kl_signal.c:73: `ESI' undeclared (first use in this function) > kl_signal.c:74: `esp' undeclared (first use in this function) > kl_signal.c:74: `ESP' undeclared (first use in this function) > kl_signal.c:76: `badaddr' undeclared (first use in this function) > kl_signal.c:76: parse error before `sip' > make: *** [kl_signal.o] Error 1 > Brian, I had this same problem when working on ia64 stuff in the 2.3 tree. 
It's because the signal handler has i386-specific stuff in it. I got around this by moving the kl_signal.c module into the arch-specific portion of the tree. I haven't rearranged the 2.2 tree to be 64-bit friendly yet (or non-i386 architecture friendly for that matter), since I've been doing all my recent work in the 2.3 tree. I guess I'm going to have to do this soon. In the meantime, I would just comment out the portions of the code which are breaking your build (the references to esi, esp, etc.).

Also, I found a problem with the alloc.c module that was really a pain to track down (you might have been bitten by this one already). The minimum size bucket is hard-coded at 8 bytes. Since the buckets get strung on a doubly linked list, the prev pointer overshoots the end of the bucket (two 64-bit pointers are > 8 bytes). Just make the smallest bucket in the bucket_size[] array be 16 bytes.

I'll let you know when I get my 2.3 arch-related changes backported to the 2.2 tree so you can stay in sync with what I have. Sorry, but it's sometimes a pain keeping two moving targets lined up with each other. :]

Tom
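
The bucket-size problem described above can be illustrated with a minimal sketch. This is not the actual alloc.c code: the layout of element_t (two link pointers), the contents of bucket_size[], and the helper names bucket_holds_links() and check_bucket_table() are all assumptions made for illustration. The point is only that a free bucket reused as a doubly linked list node must be at least as large as the two pointers stored in it, which is 8 bytes on a 32-bit machine but 16 bytes on a 64-bit one such as Alpha or ia64.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical free-list node: the real element_t in alloc.c may differ.
 * Free buckets are assumed to be reused in place as list nodes. */
typedef struct element_s {
    struct element_s *next;
    struct element_s *prev;
} element_t;

/* Illustrative bucket table, not the real one. The smallest entry is
 * 16 bytes, as suggested in the message above, instead of 8. */
static const size_t bucket_size[] = { 16, 32, 64, 128, 256 };

#define NBUCKETS (sizeof(bucket_size) / sizeof(bucket_size[0]))

/* A bucket is safe to string on the list only if both links fit in it. */
static int bucket_holds_links(size_t size)
{
    return size >= sizeof(element_t);
}

/* Startup sanity check: abort if any bucket is too small for the links. */
static void check_bucket_table(void)
{
    size_t i;

    for (i = 0; i < NBUCKETS; i++)
        assert(bucket_holds_links(bucket_size[i]));
}
```

Calling a check like check_bucket_table() once during allocator initialization would turn this kind of silent overwrite into an immediate, easy-to-locate failure on any architecture where pointers are wider than expected.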