hi
this mail covers two problems: the data loss after sync and reboot, and
an xfs_fsr kernel crash with fs corruption.
i don't separate them, because they happened in the same test session.
my system is a k6-500 with an ide drive and a via chipset (udma enabled).
/dev/hda6 is the xfs root.
/dev/hda4 is an ext2 root for running xfs_check.
i ran my sync/reboot data-loss test again with the fix
(TAKE - fix delalloc data not getting flushed to disk (page_buf.c - 1.53,
page_buf_io.c - 1.51))
the test was:
cp -av /usr/src/linux/drivers/ drivers
diff -r -u /usr/src/linux/drivers/ drivers/ (no differences)
sync
sync
sync
hit the reset button.
after the reboot, run the diff again.
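
the cycle above can be sketched as two shell functions, one for each side of the reset. this is only a rough sketch, not the exact commands from the test: the function names prepare_copy and list_lost_data are made up for illustration, and the second one only detects the symptom (same size, different content), it does not inspect extents.

```shell
#!/bin/sh
# sketch of the copy/sync/verify cycle. function names are hypothetical.

# phase 1 (before hitting reset): copy the tree, confirm the copy is
# identical to the source, then flush with sync.
# note: dst must not exist yet, otherwise cp -a nests the copy inside it.
prepare_copy() {
    src=$1; dst=$2
    cp -a "$src" "$dst" || return 1
    diff -r -q "$src" "$dst" || return 1
    sync
}

# phase 2 (after the reboot): print every file whose size still matches
# the source but whose content differs.
list_lost_data() {
    src=$1; dst=$2
    (cd "$src" && find . -type f) | while IFS= read -r f; do
        [ -f "$dst/$f" ] || continue
        s1=$(wc -c < "$src/$f")
        s2=$(wc -c < "$dst/$f")
        if [ "$s1" -eq "$s2" ] && ! cmp -s "$src/$f" "$dst/$f"; then
            printf '%s\n' "$f"
        fi
    done
}
```

usage would be something like prepare_copy /usr/src/linux/drivers drivers before the reset and list_lost_data /usr/src/linux/drivers drivers after the reboot. an empty list means no file shows the symptom.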
the first 6-8 tests succeeded.
i cycled the tests without clean shutdowns in between (hmmm, maybe one or two).
then i did only one sync. after the reboot some files differed. same problem:
the size is ok, but there are no extents. the number of differing files is small.
i played with the number of syncs and the time between sync and hitting
reset.
when reset is hit just after the sync has finished, i get data loss.
when i wait about 10-20s after the sync has finished, everything is ok, regardless
of the number of syncs.
after the first sync there is a little disk activity for about 10s.
twice i checked the fs with xfs_check (from the ext2 root): no errors.
then i got the idea to run xfs_fsr. the result was a kernel crash and fs
corruption (this is the first time i got problems with fsr):
kernel BUG at dcache.c:356!
Entering kdb (current=0xc7fbc000, pid 3) Panic: invalid operand
due to panic @ 0xc0141ec2
eax = 0x0000001c ebx = 0xc7c870e0 ecx = 0x00000000 edx = 0x00000000
esi = 0xc7c870c0 edi = 0xc61c3840 esp = 0xc7fbdf98 eip = 0xc0141ec2
ebp = 0xffffff3b xss = 0x00000018 xcs = 0x00000010 eflags = 0x00010292
xds = 0x00000018 xes = 0x00000018 origeax = 0xffffffff &regs = 0xc7fbdf64
kdb> bt
EBP EIP Function(args)
0xffffff3b 0xc0141ec2 prune_dcache+0x76 (0x2a)
kernel .text 0xc0100000 0xc0141e4c 0xc0141f98
0xc0142201 shrink_dcache_memory+0x21 (0x6, 0x4)
kernel .text 0xc0100000 0xc01421e0 0xc0142210
0xc012ad3b do_try_to_free_pages+0x5f (0x4, 0x0)
kernel .text 0xc0100000 0xc012acdc 0xc012ad58
0xc012adcb kswapd+0x73
kernel .text 0xc0100000 0xc012ad58 0xc012ae68
0xc0107457 kernel_thread+0x23
kernel .text 0xc0100000 0xc0107434 0xc0107464
kdb> reboot
i made some more tests with xfs_fsr. i will mail the console capture only to
Steve Lord <lord@xxxxxxx> because it is very, very long (11238 lines).
after that (xfs_repair eliminated the corruption) i ran the
sync/reboot data-loss tests again, with the same results as above.
an xfs_check at the end of it reported no errors.
btw: after this torture the system is running well. i noticed no corruption
of old files (ok, not tested very well). i cannot imagine what would have
happened with ext2 or reiserfs. the fs was made about half a year ago.
and now i will test the change from Rajagopal Ananthanarayanan.
utz