Ashish,
You can check out our web site at www.mclinux.com for information.
Also, I believe there is a press release coming out today that will give some
details.
On the crash front, we have a version 1 available from the web site. The
crash saving is not as slick as SGI's, but the crash analysis might be better.
I am working on a version 2, which is memory based. This may have been
discussed on this list, and it was discussed on the linux-kernel list. A copy
of the mail is attached.
regards,
Dave
--- Begin Message ---
Hello,
I have been working on a kernel crash dump that does not rely on the
disk subsystems during the crash. Instead, the crash is saved in memory at
crash time and then saved to a file on the subsequent boot.
The save at crash time is accomplished by selecting pages that are
not [free, user anon, user shared, file page cache] and compressing them into
pages that are above a certain address, a certain distance from the end of
memory, not locked, and members of [free, user anon, user shared, file page
cache].
A reboot is then requested with the option to preserve memory.
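As a rough illustration of the selection step described above (a user-space
model, not the actual kernel code), the two page predicates can be sketched in
Python. Every name here (Kind, Page, must_dump, can_hold_dump, save_crash,
min_pfn, tail_reserve) is hypothetical, zlib stands in for whatever compressor
the real code uses, and "a certain distance from the end of memory" is assumed
to mean "below a reserved tail region".

```python
import zlib
from dataclasses import dataclass
from enum import Enum, auto

PAGE_SIZE = 4096  # assumed page size for this model


class Kind(Enum):
    FREE = auto()
    USER_ANON = auto()
    USER_SHARED = auto()
    PAGE_CACHE = auto()
    KERNEL = auto()  # stands for "everything else"


# Page kinds that are left out of the dump and may be overwritten by it.
REUSABLE = {Kind.FREE, Kind.USER_ANON, Kind.USER_SHARED, Kind.PAGE_CACHE}


@dataclass
class Page:
    pfn: int
    kind: Kind
    locked: bool = False
    data: bytes = b""


def must_dump(page):
    """A page goes into the dump if it is not in one of the reusable kinds."""
    return page.kind not in REUSABLE


def can_hold_dump(page, min_pfn, limit_pfn):
    """A page may receive dump data if it is reusable, unlocked, above
    min_pfn, and below limit_pfn (the assumed start of the tail region)."""
    return (page.kind in REUSABLE and not page.locked
            and min_pfn <= page.pfn < limit_pfn)


def save_crash(pages, min_pfn, tail_reserve):
    """Compress the pages to dump into reusable pages; return the pfns used.
    Assumes pages[i].pfn == i."""
    limit_pfn = len(pages) - tail_reserve
    sources = [p for p in pages if must_dump(p)]
    targets = [p for p in pages if can_hold_dump(p, min_pfn, limit_pfn)]
    blob = zlib.compress(b"".join(p.data for p in sources))
    chunks = [blob[i:i + PAGE_SIZE] for i in range(0, len(blob), PAGE_SIZE)]
    if len(chunks) > len(targets):
        raise MemoryError("not enough reusable pages to hold the dump")
    used = []
    for page, chunk in zip(targets, chunks):
        page.data = chunk  # the old contents of the reusable page are lost
        used.append(page.pfn)
    return used
```

The list of pfns returned by save_crash is what the next boot would need in
order to find the scattered dump pages again.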
Early in the boot process, the non-contiguous pages containing the dump are
copied to contiguous pages at the end of memory. Later in the boot process,
they are written to a file and freed. On a 96M machine the size of the
compressed dump was 4M.
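The two boot-time steps can be modelled in the same spirit. This is only a
sketch under stated assumptions: memory is a Python list of page-sized byte
strings indexed by page frame number, and gather_dump and write_dump are
hypothetical names, not functions from the real code.

```python
def gather_dump(memory, dump_pfns):
    """Early boot: copy scattered dump pages into a contiguous run at the
    end of memory and return the first pfn of that run."""
    base = len(memory) - len(dump_pfns)
    # The dump pages were chosen away from the end of memory, so none of
    # them should sit inside the destination run.
    if any(pfn >= base for pfn in dump_pfns):
        raise ValueError("dump page overlaps the destination region")
    for offset, pfn in enumerate(dump_pfns):
        memory[base + offset] = memory[pfn]
    return base


def write_dump(memory, base, count, path):
    """Later in boot: write the contiguous run to a file, then free it."""
    with open(path, "wb") as out:
        for pfn in range(base, base + count):
            out.write(memory[pfn])
            memory[pfn] = b""  # model of handing the page back to the system
```

Keeping the dump pages out of the tail region is what lets the copy run
without overwriting a source page before it has been read.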
Scratch memory is reserved at boot time for crash dump use. I use about 2M
for this, though it can be tuned to a smaller amount. This ensures that a dump
can be taken even under very low free memory conditions.
For example, here is a stack trace of a crash in interrupt context,
a case that can be difficult for disk based solutions:
crash> bt
PID: 286 TASK: c0b3a000 CPU: 0 COMMAND: "in.rlogind"
#0 [c0b3be90] crash_save_current_state at c011aed0
(c0b3a000,c08e4190,4000001,c0b3bee8,tulip_interrupt+0x2c)
#1 [c0b3bea4] panic+0xac at c011367c
(media_cap+0x1446,c08e4190,4000001,9,5a8)
#2 [c0b3bee8] tulip_interrupt+0x2c at c01bc820
(9,eth0_dev,c0b3bf44,irq_desc+0x90,9)
#3 [c0b3bf08] handle_IRQ_event+0x2d at c010a551
(9,c0b3bf44,c08e4190)
#4 [c0b3bf2c] do_8259A_IRQ+0x75 at c010a319
(9,c0b3bf44,c0b3bfbc,ret_from_intr,c0e68280)
#5 [c0b3bf3c] do_IRQ+0x23 at c010a653
(c0e68280,0,4,4,c0e68284)
#6 [c0b3bfbc] ret_from_intr at c0109634
(4,bfffc9a0,0,bfffc8a0,0)
#7 [bfffd224] system_call+0x34 at c0109598
For this test crash, I set a flag with a system call, which instructed the
tulip interrupt handler to call panic().
Now the request for help. Some BIOSes (Dell, NEC) clear memory on reboot
even when the flags to not test or to preserve memory are set. Others (HP) do
not clear memory. Can someone point me to BIOS developers at Dell or Phoenix
or other manufacturers so that I can lobby for a flag that I can pass to the
BIOS so that it will preserve the contents of memory?
If anyone is interested in trying my code, I'd be glad to make it available
today or tomorrow.
thanks
Dave
--- End Message ---