
[xfs-masters] [Bug 6380] New: XFS corruption with Linux 2.6.16

To: xfs-masters@xxxxxxxxxxx
Subject: [xfs-masters] [Bug 6380] New: XFS corruption with Linux 2.6.16
From: bugme-daemon@xxxxxxxxxxxxxxxxxxx
Date: Wed, 12 Apr 2006 07:26:29 -0700
Reply-to: xfs-masters@xxxxxxxxxxx
Sender: xfs-masters-bounce@xxxxxxxxxxx
http://bugzilla.kernel.org/show_bug.cgi?id=6380

           Summary: XFS corruption with Linux 2.6.16
    Kernel Version: 2.6.16.4
            Status: NEW
          Severity: blocking
             Owner: xfs-masters@xxxxxxxxxxx
         Submitter: Martin@xxxxxxxxxxxx


Most recent kernel where this bug did not occur: no later kernel version tested
Distribution: Debian Linux Etch/Sid
Hardware Environment: IBM ThinkPad T23, P3 with 1.13 GHz, 384 MB RAM
Software Environment: Kernel 2.6.16.1, kernel 2.6.16.4 with sws2 2.2.4
Problem Description:

I get severe XFS corruption at random. It happens on my Debian Linux root 
partition. /home, on a different XFS partition, has not been affected yet 
(lucky me). OpenSUSE 10, on yet another partition, was not affected either, 
though I only used OpenSUSE to recover my broken Debian partition.

I have no reproducible pattern that triggers it; it just happens. I got XFS 
corruption three times within one week:

1) I don't know when it happened, but I noticed it when dpkg complained about 
several errors in /var/lib/dpkg/available. I first suspected that the file had 
been corrupted by some bug in Debian package management, but then found that it 
just contained lots of garbage characters at the end. In the middle of the file, 
some text was missing or duplicated.

I booted into OpenSUSE 10, and xfs_check reported errors beyond the usual 
messages about old deleted files (agi unlinked node or something like that) 
that can easily be fixed. There were just a few errors, and I restored the 
"available" file via "apt-cache dumpavail".

2) The next time, I wanted to start a mind map in kdissert, a mind-mapping tool 
for KDE, which I used with KDE 3.5.2. I just clicked around a bit and added an 
item or two; then the machine became unresponsive, and finally X.org (modular 
X.org 7 from Debian experimental) died. After that the machine seemed to be 
locked up completely, so I finally switched it off.

The machine didn't boot again, but GRUB found its menu.lst and I managed to boot 
into OpenSUSE 10. OpenSUSE (kernel 2.6.13.5 or something like that) didn't 
manage to mount my root partition: error 990. I do not remember what happened 
with xfs_check... I think it reported tons of errors, or I started with 
xfs_repair straight away. I had to use xfs_repair -L to force log zeroing. It 
reported tons of stuff. Unfortunately, I did not log it to a file.

Debian Linux then booted up to KDE 3.5 nicely again. I tried to repair it, but 
finally gave up because of about 200 MB of stuff in lost+found. I restored a 
backup from my external USB hard disk via rsync.

This was yesterday. I then updated my system from 2.6.15.6 to 2.6.16.4 (before 
that I had been using 2.6.16.1), and the third crash happened.

martin@deepdance:~ -> dpkg -l | grep kdissert
ii  kdissert                1.0.5.debian-3          mindmapping tool
(I doubt it's related to kdissert.)

3) Today XFS got corrupted again. I had an extensive apt-get update running to 
catch up on the three weeks missing from the backup I had restored, and it also 
installed a new KOffice version (release 1.5). I wanted to try out KWord; it 
crashed straight away. I tried it from the console: bash told me "error while 
starting the executable". I ran apt-get --reinstall install kword, and then it 
worked.

OK, once again OpenSUSE 10 and xfs_check. Errors again, quite a few. This time 
I made a log file. Then xfs_repair, also with a log file. I have attached both 
to this bug report.

One thing I found was that, at least in cases 2) and 3), there was an empty 
file /core in the corrupted XFS filesystem. I considered the possibility of a 
kernel crash that overwrote XFS in-memory data structures, but I learned that 
the Linux kernel itself usually does not dump core to the filesystem.

On occasion 3, when I compiled 2.6.16.4, I made sure to disable core dumping 
for ELF files. I still got that empty /core in the corrupted Debian root 
filesystem.


Steps to reproduce:

I am not really interested in reproducing this ;-). Well, I have no idea. 
Probably use a similar kernel on similar hardware and try to use that system 
productively for a while.

I did not have any XFS corruption with the various 2.6.15 kernels I used.

This bug report is probably related to:
  #6180


I will revert to 2.6.15.6 for now, or even compile 2.6.15.7, as I cannot afford 
the time to restore my Debian system from scratch once again. I will, however, 
restore it from the backup once more to make absolutely sure that it is 
consistent.

I know I probably won't be of much help debugging this, but I just don't have 
the resources to do filesystem debugging on my laptop, which is in heavy 
productive use, and at least at home I have no spare system either.

I may try again with 2.6.17 as soon as I am convinced that it's stable enough.

------- You are receiving this mail because: -------
You are the assignee for the bug, or are watching the assignee.

