I've been testing LKCD release 4 on large memory systems and have run
into a few problematic areas. I thought I'd throw them to the winds to
stimulate some discussion.
Issue 1: It takes a LONG time to dump a system with 4 GB of memory
Even when using fairly fast disks and assuming something like 20 MB/sec
as an optimal transfer rate, you're going to be looking at several
minutes to write a crash dump file (4 GB at 20 MB/sec is about three and
a half minutes). In reality the throughput is much lower, since the data
is not being streamed to the drive; my average time is hovering around
ten minutes.
This is with RLE compression. If you turn on GZIP compression, multiply
that time by three or four.
In this type of configuration, writing an uncompressed (or
RLE-compressed) dump makes a lot more sense, although the total elapsed
time may still be an issue.
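To make the best-case figure concrete, here is a small sketch of the arithmetic. The function name is made up for illustration, and it assumes the full 4 GB is written with no compression savings and that "20 MB/sec" means 20 * 2^20 bytes per second.

```c
#include <stdint.h>

/*
 * Best-case time, in whole seconds (rounded up), to stream mem_bytes
 * to disk at a sustained rate of rate_bytes_per_sec.
 * Hypothetical helper, not part of LKCD.
 */
uint64_t min_dump_seconds(uint64_t mem_bytes, uint64_t rate_bytes_per_sec)
{
    if (rate_bytes_per_sec == 0)
        return UINT64_MAX;  /* no throughput: the dump never finishes */
    return (mem_bytes + rate_bytes_per_sec - 1) / rate_bytes_per_sec;
}
```

For 4 GB at 20 MB/sec this gives 205 seconds, so the ten minutes I am seeing is roughly three times the streaming lower bound.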
Issue 2: It takes a LONG time to copy a crash dump for 4 GB of memory
Copying the crash dump can take significantly longer than it took to
write the crash dump in the first place. I've been seeing copy times
that approach (or exceed!) an hour.
Part of this is likely due to the relatively simple I/O loop in
kl_cmp.c. It may be advisable to consider using a mechanism similar to
the LKCD kernel code that writes the dump; i.e., use a large staging
buffer to collect the individual pages and write the buffer in one swell
foop.
An x86 system with 4 GB of memory and 4096-byte pages has 1,048,576
pages, so the dump copy requires somewhere around one million pairs of
write operations (a short write of about 26 bytes for the page header
and up to 4096 bytes for the page data). Reducing the number of discrete
I/O operations should help quite a bit.
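As a rough sketch of what I mean by a staging buffer (this is not the existing kl_cmp.c code; the struct, the function names and the 256 KB buffer size are all assumptions made for illustration):

```c
#include <errno.h>
#include <stddef.h>
#include <string.h>
#include <unistd.h>

#define STAGE_SIZE (256 * 1024) /* hypothetical staging buffer size */

struct stage {
    int fd;                 /* destination file descriptor */
    size_t used;            /* bytes currently buffered */
    unsigned long flushes;  /* number of times the buffer was written out */
    unsigned char buf[STAGE_SIZE];
};

/* Write exactly len bytes, retrying on short writes and EINTR. */
static int write_all(int fd, const unsigned char *p, size_t len)
{
    while (len > 0) {
        ssize_t n = write(fd, p, len);
        if (n < 0) {
            if (errno == EINTR)
                continue;
            return -1;
        }
        p += n;
        len -= (size_t)n;
    }
    return 0;
}

/* Push any buffered bytes to the file. */
int stage_flush(struct stage *s)
{
    if (s->used == 0)
        return 0;
    if (write_all(s->fd, s->buf, s->used) < 0)
        return -1;
    s->used = 0;
    s->flushes++;
    return 0;
}

/* Append len bytes (a page header or page data) to the staging buffer. */
int stage_put(struct stage *s, const void *data, size_t len)
{
    /* Not enough room left: write out what we have first. */
    if (len > STAGE_SIZE - s->used && stage_flush(s) < 0)
        return -1;
    if (len > STAGE_SIZE)   /* too big to stage: write it directly */
        return write_all(s->fd, data, len);
    memcpy(s->buf + s->used, data, len);
    s->used += len;
    return 0;
}
```

The copy loop would call stage_put() once for each page header and once for each page's data, then stage_flush() at the end. If every page were stored at full size, about 63 header/data pairs would fit in each 256 KB flush, so the two million or so small writes would drop to somewhere under twenty thousand large ones.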
Here are a couple of thoughts I had:
1) Add another configurable parameter (e.g., DUMP_SAVE_COMPRESS) that
tells lcrash which format to save the crash dump in, independent of the
format the kernel used to write it. This would allow deferring the
expensive GZIP algorithm until after the system has rebooted.
2) Allow the user to configure the system such that the dump copy would
proceed in parallel with system restart. Today the execution of
rc.sysinit stalls until lcrash finishes copying the crash dump. If the
crash dump was written to the swap partition, copying in parallel could
be problematic, since the swapper could reuse pages that lcrash has not
copied yet.
One thought I had in this area would be to teach lcrash (and the
swapper) to co-operatively use the swap pages. Initially all pages
would be marked as unavailable, and as lcrash progresses through the
swap partition it would release those pages to the swapper. Since the
system load, and thus the need for swap space, should in theory increase
over time, this staged release of swap space should help avoid cases
where swap space is needed but lcrash is still busy copying a crash
dump.
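A toy model of the staged release might look like the following. Everything here is hypothetical (there is no such interface in lcrash or the kernel today), and it assumes lcrash copies the swap partition in ascending page order, so a single high-water mark is enough to describe what has been released.

```c
#include <stddef.h>

/* Hypothetical shared state between the dump copier and the swapper. */
struct swap_gate {
    size_t total_pages;     /* pages in the swap partition */
    size_t released_pages;  /* pages [0, released_pages) are usable */
};

/* Start with every swap page marked unavailable. */
void gate_init(struct swap_gate *g, size_t total_pages)
{
    g->total_pages = total_pages;
    g->released_pages = 0;
}

/*
 * Called by the copier once pages [0, copied_pages) are safely saved.
 * Returns how many pages were newly released. The mark never moves
 * backwards and never passes the end of the partition.
 */
size_t gate_release(struct swap_gate *g, size_t copied_pages)
{
    size_t before = g->released_pages;

    if (copied_pages > g->total_pages)
        copied_pages = g->total_pages;
    if (copied_pages > before)
        g->released_pages = copied_pages;
    return g->released_pages - before;
}

/* Called by the swapper before using a page. */
int gate_page_available(const struct swap_gate *g, size_t page)
{
    return page < g->released_pages;
}
```

A real implementation would need locking and some way to hand the state from lcrash to the kernel; this only shows the bookkeeping.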
Comments?
Tony Dziedzic
Storigen Systems, Inc.