Bug 207 - Filesystem corruption at boot, with data loss
Status: RESOLVED DUPLICATE of bug 197
Product: XFS
Classification: Unclassified
Component: XFS kernel code
Version: 1.2.x
OS: Linux
Severity: critical
Assigned To: XFS power people
Reported: 2003-01-07 08:22 CST by dgborin
Modified: 2003-01-12 14:39 CST

Description dgborin 2003-01-07 08:22:45 CST
Kernel 2.4.20 with XFS patch 2002-11-29
 and
Kernel 2.4.19 with XFS patch 1.2pre4

Debian 3.0r0 with security patches, XFree86 4.2.1, GNOME 2 and KDE 3.

Both patches are from oss.sgi.com.

At reboot, I randomly get filesystem corruption: the first time the system
wasn't able to mount /, the other times it wasn't able to read getty, so I
couldn't read the other messages.

Repairing it with xfs_repair from xfsutils 2.3.6 wasn't possible because of an
internal error. With the older version shipped with Debian 3.0r0 I was able to
repair it.

After the repairs, there was random data loss (entire files with 0 length).
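
To take stock of this kind of loss after a repair, one option is to list every regular file whose size is zero. The following is only a minimal Python sketch, assuming the repaired filesystem is mounted and readable; the function name `find_empty_files` is illustrative and not part of any XFS tool. Some files are legitimately empty, so the result is a list of candidates, not a list of confirmed losses.

```python
import os
import stat


def find_empty_files(root):
    """Return the sorted paths of regular files under root that are 0 bytes long."""
    empty = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                # lstat() so that symlinks are not followed
                st = os.lstat(path)
            except OSError:
                # Unreadable or vanished entry: skip it
                continue
            if stat.S_ISREG(st.st_mode) and st.st_size == 0:
                empty.append(path)
    return sorted(empty)
```

Calling `find_empty_files("/etc")`, for example, would return the zero-length candidates under that directory; comparing the list against package checksums or a backup is left to the reader.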

The problem never happened before, with kernel 2.4.19 with XFS patch 2002-09-27
and previous versions.

Hardware configuration:
AMD K6 1.3GHz
ASUS A7V motherboard (VIA KT133)
NVidia GeForce 2 MX 400
Creative SB Live!
Adaptec SCSI controller 2940U
DEC 10/100 ethernet controller (tulip driver)
IBM 40GB hard disk (checked, it works OK)

Additional kernel modules:
ALSA 0.9pre6
NVidia driver 1.0-3123
Packet writing 2.4.19-2 (only with 2.4.19)

All software compiled with gcc 2.95.4 (the version shipped with Debian 3.0).
Comment 1 Chris Wedgwood 2003-01-07 13:28:46 CST
Subject: Re:  New: Filesystem corruption at boot, with data loss

On Tue, Jan 07, 2003 at 08:22:45AM -0800, bugzilla-daemon@oss.sgi.com wrote:

> At reboot, I randomly get filesystem corruption: the first time
> the system wasn't able to mount /, the other times it wasn't able to
> read getty, so I couldn't read the other messages.

is the hd write-caching disabled?

> After the repairs, there was random data loss (entire files with 0
> length).

getting this for files that were open for writes is normal, in
unrelated files it is not



  --cw
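
The point about files open for writing can be illustrated from the application side: data that has only reached the page cache may be lost if the machine goes down before it is written out. A minimal Python sketch, assuming an application that wants its data on the device before continuing; `write_durably` is a hypothetical helper name. It does not address the drive's own write cache, which is a separate layer.

```python
import os


def write_durably(path, data):
    """Write bytes to path and ask the kernel to flush them to the device."""
    with open(path, "wb") as f:
        f.write(data)
        f.flush()             # move Python's userspace buffer into the kernel
        os.fsync(f.fileno())  # ask the kernel to write the file out to the device
```

This only narrows the window in which a crash leaves a short or empty file; it is not a statement about what any particular XFS release does internally.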

Comment 2 dgborin 2003-01-08 00:42:11 CST
1) My CPU is an AMD Athlon, not a K6; I wrote it wrong.

2) I get this corruption at boot time: the system was correctly shut down
previously, so data loss isn't normal!

I don't know if the HD write cache is enabled: I kept the factory settings, so I
think it's not. I will look, but I don't think it could be related, because with
the 2.4.19 kernel and XFS patch 2002-09-27 everything works well.
Comment 3 Chris Wedgwood 2003-01-08 12:44:03 CST
Subject: Re:  Filesystem corruption at boot, with data loss

On Wed, Jan 08, 2003 at 12:42:12AM -0800, bugzilla-daemon@oss.sgi.com wrote:

> I don't know if the HD write cache is enabled: I kept the factory
> settings, so I think it's not. I will look, but I don't think it
> could be related, because with the 2.4.19 kernel and XFS patch
> 2002-09-27 everything works well.

*Many* IDE drives default to write-caching on.  The disk may then
reorder writes, potentially breaking journalling, or worse still, have
unflushed data in the cache when it reboots...

Please check if this is the case.  "hdparm -W0 /dev/sda" or whatever
should turn this off... (the hdparm man page calls this dangerous, but
I've never actually had a problem with it myself).



  --cw
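
The check suggested above could be scripted roughly as follows. This is a hedged Python sketch, not a tested procedure: it assumes an hdparm version that reports the current setting when `-W` is given without a value, it needs root to run, and the helper names and the `/dev/hda` device path are illustrative only.

```python
import subprocess


def hdparm_write_cache_args(device, enable=None):
    """Build the hdparm argument list for the drive's write-caching feature.

    enable=None queries the setting (assumes hdparm supports a bare -W),
    enable=False builds -W0 (off), enable=True builds -W1 (on).
    """
    if enable is None:
        return ["hdparm", "-W", device]
    return ["hdparm", "-W1" if enable else "-W0", device]


def run_hdparm(args):
    """Run hdparm with the given arguments (requires root) and return its output."""
    result = subprocess.run(args, capture_output=True, text=True, check=True)
    return result.stdout
```

For instance, `run_hdparm(hdparm_write_cache_args("/dev/hda", enable=False))` would issue the same `-W0` command quoted above against a hypothetical first IDE disk. The setting may not survive a power cycle, so it would likely need to be reapplied at boot.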

Comment 4 dgborin 2003-01-09 05:38:58 CST
I will check ASAP and try disabling it, but I'd like to know why this
problem appeared only with these recent patches (and frequently!) and NEVER
before: I have used XFS with the same hard disk for about a year without a problem.

You say that the write cache could cause data reordering, but then this should
always happen. I don't agree about the flushing: the hard disk should do it if the
system shuts it down correctly; again, if it's a problem with this HD, why does the
problem appear only now?

Excuse my bad English.
Comment 5 Russell Cattelan 2003-01-09 10:25:45 CST

*** This bug has been marked as a duplicate of 197 ***
Comment 6 dgborin 2003-01-12 12:39:56 CST
Some final (?) comments, even if it is marked duplicate...

I disabled the HD write cache (yes, it was enabled) and for some boots everything
was OK. Then X hung: I waited for the cache to be written (the HD LED blinked),
then I hard-reset the system, but syslogd didn't start. After a soft reset
(Ctrl+Alt+Del), the filesystem was corrupted again!

I think that a lot of system files are corrupted now, so I don't know whether this
last report is of any interest: I will reinstall everything ASAP.

Kernel 2.4.20, XFS patch 2002-12-17.