Bugzilla – Bug 207
Filesystem corruption at boot, with data loss
Last modified: 2003-01-12 14:39:56 CST
Kernel 2.4.20 with XFS patch 2002-11-29, and kernel 2.4.19 with XFS patch 1.2pre4.
Debian 3.0r0 with security patches, XFree 4.2.1, Gnome 2 and KDE 3.
Both patches are from oss.sgi.com.

At reboot, I randomly get filesystem corruption: the first time the system wasn't able to mount /; the other times it wasn't able to read getty, so I couldn't read the other messages.

Repairing with xfs_repair from xfsutils 2.3.6 wasn't possible because of an internal error. With the older version shipped with Debian 3.0r0 I was able to repair it. After the repairs, there was random data loss (entire files with 0 length).

The problem never happened before, with kernel 2.4.19 with XFS patch 2002-09-27 and previous versions.

Hardware configuration:
AMD K6 1.3GHz
ASUS A7V motherboard (VIA KT133)
NVidia GeForce 2 MX 400
Creative SB Live!
Adaptec SCSI controller 2940U
DEC 10/100 ethernet controller (tulip driver)
IBM 40GB hard disk (checked, it works OK)

Additional kernel modules:
ALSA 0.9pre6
NVidia driver 1.0-3123
Packet writing 2.4.19-2 (only with 2.4.19)

All software compiled with gcc 2.95.4 (the version shipped with Debian 3.0).
Subject: Re: New: Filesystem corruption at boot, with data loss

On Tue, Jan 07, 2003 at 08:22:45AM -0800, bugzilla-daemon@oss.sgi.com wrote:
> At reboot, I randomly get a filesystem corruption: the first time
> the system wasn't able to mount /, the other times it wasn't able to
> read getty, so I couldn't read the other messages.

Is the HD write-caching disabled?

> After the repairs, there was random data loss (entire files with 0
> length).

Getting this for files that were open for writes is normal; in unrelated files it is not.

--cw
1) My CPU is an AMD Athlon, not a K6; I wrote it wrong.
2) I get this corruption at boot time: the system was shut down correctly beforehand, so data loss isn't normal!

I don't know if the HD write cache is enabled: I kept the factory setup, so I think it's not. I will look, but I don't think it could be related, because with the 2.4.19 kernel and XFS patch 2002-09-27 everything works well.
Subject: Re: Filesystem corruption at boot, with data loss

On Wed, Jan 08, 2003 at 12:42:12AM -0800, bugzilla-daemon@oss.sgi.com wrote:
> I don't know if HD write cache is enabled: I kept factory set up, so
> I think it's not. I will look, but I don't think it could be
> related, because with 2.4.19 kernel and XFS patch 2002-09-27 all
> works well.

*Many* IDE drives default to write-caching on. The disk may then reorder writes, potentially breaking journalling, or worse still, have unflushed data in the cache when it reboots...

Please check if this is the case. "hdparm -W0 /dev/sda" or whatever should turn this off... (the hdparm man page calls this dangerous; I've never actually had a problem with it myself).

--cw
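For anyone following the suggestion above, here is a minimal sketch of checking and then disabling the drive's write cache. The device name /dev/hda is only an assumption (the comment itself says "/dev/sda or whatever"), and the DRY_RUN switch and run helper are hypothetical conveniences added for illustration, not part of hdparm. It also assumes an hdparm version where -W without a value reports the current setting.

```shell
#!/bin/sh
# Assumed device name; substitute the disk that holds the XFS filesystem.
DEV="${DEV:-/dev/hda}"
# Hypothetical safety switch: with DRY_RUN=1 (the default here) the commands
# are only printed. Set DRY_RUN=0 and run as root to actually execute them.
DRY_RUN="${DRY_RUN:-1}"

# Print the command in dry-run mode, otherwise execute it.
run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "would run: $*"
    else
        "$@"
    fi
}

run hdparm -W "$DEV"      # query the current write-caching setting
run hdparm -W0 "$DEV"     # turn write-caching off
run hdparm -W "$DEV"      # query again to confirm the change
```

The setting may not survive a power cycle on every drive, so it would probably need to be reapplied from a boot script.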
I will check ASAP and I will try disabling it, but I'd like to know why this problem appeared only with these recent patches (and frequently!) and NEVER before: I have been using XFS on the same hard disk for about a year without a problem. You say that the write cache could cause data reordering, but then this should always happen. I don't agree about the flushing: the hard disk should flush its cache if the system shuts it down correctly. Again, if it's a problem with this HD, why does it appear only now? Excuse my bad English.
*** This bug has been marked as a duplicate of 197 ***
Some final (?) comments, even if it is marked duplicate... I disabled the HD write cache (yes, it was enabled) and for some boots everything was OK. Then X hung: I waited for the cache to be written (the HD LED blinked), then I hard reset the system, but syslogd didn't start. After a soft reset (Ctrl+Alt+Del), I got a corrupted file system again! I think a lot of system files are corrupted now, so I don't know if this last report is of any interest: I will reinstall everything ASAP. Kernel 2.4.20, XFS patch 2002-12-17.