xfs

Re: xfs crash with linux 2.6.15.7 and disabled write caches (long)

To: linux-xfs@xxxxxxxxxxx
Subject: Re: xfs crash with linux 2.6.15.7 and disabled write caches (long)
From: Martin Steigerwald <Martin@xxxxxxxxxxxx>
Date: Fri, 23 Jun 2006 22:01:29 +0200
In-reply-to: <200606230912.35640.Martin@xxxxxxxxxxxx>
References: <200606230156.04907.Martin@xxxxxxxxxxxx> <200606230912.35640.Martin@xxxxxxxxxxxx>
Sender: xfs-bounce@xxxxxxxxxxx
User-agent: KMail/1.9.1
On Friday, 23 June 2006 09:12, you wrote:

> Hello,
>
> Sorry for posting my last mail twice. I thought I had entered the wrong
> mailing list address (xfs@xxxxxxxxxxx) the first time, because I then
> read a list mail addressed to linux-xfs@xxxxxxxxxxx instead.

Hello again,

not that I am too keen on replying to myself, but the story has a new 
chapter:

I had another XFS corruption. I was running 2.6.17.1 and rebooted into 
2.6.16.11 once to check whether the "dma_intr" message I mentioned in my 
last mail also shows up in the log with that kernel (see 
http://bugzilla.kernel.org/show_bug.cgi?id=6737). All of this was with 
write caches disabled. That kernel bug report also contains all the 
relevant kernel configurations.

I had no kernel crash this time, no suspend failure, nothing at all, except:

After I had rebooted into 2.6.17.1, KMail and KWallet crashed while KDE 
was shutting down. Since I had seen that before together with a 
filesystem crash, I booted into SUSE 10.1 and checked the Debian root 
partition:

It had errors again:

deepdance:~ # xfs_check /dev/hda5
bad free block nvalid/nused 4/-1 for dir ino 1747843 block 16777216
missing free index for data block 0 in dir ino 1747843
missing free index for data block 1 in dir ino 1747843
missing free index for data block 2 in dir ino 1747843
missing free index for data block 3 in dir ino 1747843
bad free block nvalid/nused 7/-1 for dir ino 5012689 block 16777216
missing free index for data block 0 in dir ino 5012689
missing free index for data block 1 in dir ino 5012689
missing free index for data block 2 in dir ino 5012689
missing free index for data block 3 in dir ino 5012689
missing free index for data block 4 in dir ino 5012689
missing free index for data block 5 in dir ino 5012689
missing free index for data block 6 in dir ino 5012689
bad free block nvalid/nused 8/-1 for dir ino 30448144 block 16777216
missing free index for data block 0 in dir ino 30448144
missing free index for data block 1 in dir ino 30448144
missing free index for data block 2 in dir ino 30448144
missing free index for data block 3 in dir ino 30448144
missing free index for data block 4 in dir ino 30448144
missing free index for data block 5 in dir ino 30448144
missing free index for data block 6 in dir ino 30448144
missing free index for data block 7 in dir ino 30448144
bad free block nvalid/nused 21/-1 for dir ino 33641428 block 16777216
missing free index for data block 0 in dir ino 33641428
missing free index for data block 1 in dir ino 33641428
missing free index for data block 2 in dir ino 33641428
missing free index for data block 3 in dir ino 33641428
missing free index for data block 4 in dir ino 33641428
missing free index for data block 5 in dir ino 33641428
missing free index for data block 6 in dir ino 33641428
missing free index for data block 7 in dir ino 33641428
missing free index for data block 8 in dir ino 33641428
missing free index for data block 9 in dir ino 33641428
missing free index for data block 10 in dir ino 33641428
missing free index for data block 11 in dir ino 33641428
missing free index for data block 12 in dir ino 33641428
missing free index for data block 13 in dir ino 33641428
missing free index for data block 14 in dir ino 33641428
missing free index for data block 15 in dir ino 33641428
missing free index for data block 16 in dir ino 33641428
missing free index for data block 17 in dir ino 33641428
missing free index for data block 18 in dir ino 33641428
missing free index for data block 19 in dir ino 33641428
missing free index for data block 20 in dir ino 33641428
bad free block nvalid/nused 26/-1 for dir ino 42681258 block 16777216
missing free index for data block 0 in dir ino 42681258
missing free index for data block 1 in dir ino 42681258
missing free index for data block 2 in dir ino 42681258
missing free index for data block 3 in dir ino 42681258
missing free index for data block 4 in dir ino 42681258
missing free index for data block 5 in dir ino 42681258
missing free index for data block 6 in dir ino 42681258
missing free index for data block 7 in dir ino 42681258
missing free index for data block 8 in dir ino 42681258
missing free index for data block 9 in dir ino 42681258
missing free index for data block 10 in dir ino 42681258
missing free index for data block 11 in dir ino 42681258
missing free index for data block 12 in dir ino 42681258
missing free index for data block 13 in dir ino 42681258
missing free index for data block 14 in dir ino 42681258
missing free index for data block 15 in dir ino 42681258
missing free index for data block 19 in dir ino 42681258
missing free index for data block 22 in dir ino 42681258
missing free index for data block 23 in dir ino 42681258
missing free index for data block 24 in dir ino 42681258
missing free index for data block 25 in dir ino 42681258
bad free block nvalid/nused 25/-1 for dir ino 46142796 block 16777216
missing free index for data block 0 in dir ino 46142796
missing free index for data block 1 in dir ino 46142796
missing free index for data block 2 in dir ino 46142796
missing free index for data block 3 in dir ino 46142796
missing free index for data block 4 in dir ino 46142796
missing free index for data block 5 in dir ino 46142796
missing free index for data block 6 in dir ino 46142796
missing free index for data block 7 in dir ino 46142796
missing free index for data block 8 in dir ino 46142796
missing free index for data block 9 in dir ino 46142796
missing free index for data block 10 in dir ino 46142796
missing free index for data block 11 in dir ino 46142796
missing free index for data block 12 in dir ino 46142796
missing free index for data block 13 in dir ino 46142796
missing free index for data block 14 in dir ino 46142796
missing free index for data block 15 in dir ino 46142796
missing free index for data block 16 in dir ino 46142796
missing free index for data block 17 in dir ino 46142796
missing free index for data block 18 in dir ino 46142796
missing free index for data block 19 in dir ino 46142796
missing free index for data block 20 in dir ino 46142796
missing free index for data block 21 in dir ino 46142796
missing free index for data block 22 in dir ino 46142796
missing free index for data block 23 in dir ino 46142796
missing free index for data block 24 in dir ino 46142796
bad free block nvalid/nused 65/-1 for dir ino 55176185 block 16777216
missing free index for data block 0 in dir ino 55176185
missing free index for data block 1 in dir ino 55176185
missing free index for data block 2 in dir ino 55176185
missing free index for data block 3 in dir ino 55176185
missing free index for data block 4 in dir ino 55176185
missing free index for data block 5 in dir ino 55176185
missing free index for data block 6 in dir ino 55176185
missing free index for data block 7 in dir ino 55176185
missing free index for data block 8 in dir ino 55176185
missing free index for data block 9 in dir ino 55176185
missing free index for data block 10 in dir ino 55176185
missing free index for data block 11 in dir ino 55176185
missing free index for data block 12 in dir ino 55176185
missing free index for data block 13 in dir ino 55176185
missing free index for data block 14 in dir ino 55176185
missing free index for data block 15 in dir ino 55176185
missing free index for data block 16 in dir ino 55176185
missing free index for data block 17 in dir ino 55176185
missing free index for data block 18 in dir ino 55176185
missing free index for data block 19 in dir ino 55176185
missing free index for data block 20 in dir ino 55176185
missing free index for data block 21 in dir ino 55176185
missing free index for data block 22 in dir ino 55176185
missing free index for data block 23 in dir ino 55176185
missing free index for data block 24 in dir ino 55176185
missing free index for data block 25 in dir ino 55176185
missing free index for data block 26 in dir ino 55176185
missing free index for data block 27 in dir ino 55176185
missing free index for data block 28 in dir ino 55176185
missing free index for data block 29 in dir ino 55176185
missing free index for data block 30 in dir ino 55176185
missing free index for data block 31 in dir ino 55176185
missing free index for data block 32 in dir ino 55176185
missing free index for data block 33 in dir ino 55176185
missing free index for data block 34 in dir ino 55176185
missing free index for data block 35 in dir ino 55176185
missing free index for data block 36 in dir ino 55176185
missing free index for data block 37 in dir ino 55176185
missing free index for data block 38 in dir ino 55176185
missing free index for data block 39 in dir ino 55176185
missing free index for data block 40 in dir ino 55176185
missing free index for data block 41 in dir ino 55176185
missing free index for data block 42 in dir ino 55176185
missing free index for data block 43 in dir ino 55176185
missing free index for data block 44 in dir ino 55176185
missing free index for data block 45 in dir ino 55176185
missing free index for data block 46 in dir ino 55176185
missing free index for data block 47 in dir ino 55176185
missing free index for data block 48 in dir ino 55176185
missing free index for data block 49 in dir ino 55176185
missing free index for data block 50 in dir ino 55176185
missing free index for data block 51 in dir ino 55176185
missing free index for data block 52 in dir ino 55176185
missing free index for data block 53 in dir ino 55176185
missing free index for data block 54 in dir ino 55176185
missing free index for data block 56 in dir ino 55176185
missing free index for data block 58 in dir ino 55176185
missing free index for data block 60 in dir ino 55176185
missing free index for data block 63 in dir ino 55176185
missing free index for data block 64 in dir ino 55176185
bad free block nvalid/nused 5/-1 for dir ino 59806790 block 16777216
missing free index for data block 0 in dir ino 59806790
missing free index for data block 1 in dir ino 59806790
missing free index for data block 2 in dir ino 59806790
missing free index for data block 3 in dir ino 59806790
missing free index for data block 4 in dir ino 59806790
bad free block nvalid/nused 21/-1 for dir ino 62915542 block 16777216
missing free index for data block 0 in dir ino 62915542
missing free index for data block 1 in dir ino 62915542
missing free index for data block 2 in dir ino 62915542
missing free index for data block 3 in dir ino 62915542
missing free index for data block 4 in dir ino 62915542
missing free index for data block 5 in dir ino 62915542
missing free index for data block 6 in dir ino 62915542
missing free index for data block 7 in dir ino 62915542
missing free index for data block 8 in dir ino 62915542
missing free index for data block 9 in dir ino 62915542
missing free index for data block 10 in dir ino 62915542
missing free index for data block 11 in dir ino 62915542
missing free index for data block 12 in dir ino 62915542
missing free index for data block 13 in dir ino 62915542
missing free index for data block 14 in dir ino 62915542
missing free index for data block 15 in dir ino 62915542
missing free index for data block 16 in dir ino 62915542
missing free index for data block 17 in dir ino 62915542
missing free index for data block 18 in dir ino 62915542
missing free index for data block 19 in dir ino 62915542
missing free index for data block 20 in dir ino 62915542
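
Every damaged directory is reported at the same block, 16777216 (2^24). A minimal sketch of why that number is plausible, assuming 4 KiB filesystem and directory blocks (the post does not state the block size) and assuming the XFS v2 directory layout splits a directory's logical address space into three 32 GiB regions (data, leaf, free index): under those assumptions 16777216 is exactly the first free-index block, which fits the "bad free block" wording of the messages. The constant names below are illustrative, not taken from the XFS sources.

```python
# Sketch: where the first free-index block of an XFS v2 directory lives.
# Assumptions (not from the original mail): 4096-byte directory blocks and
# a logical directory address space split into 32 GiB regions:
#   region 0 = data blocks, region 1 = leaf blocks, region 2 = free index.

DIR2_DATA_ALIGN_LOG = 3                               # entries are 8-byte aligned
DIR2_SPACE_SIZE = 1 << (32 + DIR2_DATA_ALIGN_LOG)     # 32 GiB per region
DIR2_FREE_SPACE = 2                                   # index of the free-index region
DIR2_FREE_OFFSET = DIR2_FREE_SPACE * DIR2_SPACE_SIZE  # byte offset of that region

BLOCK_SIZE = 4096  # assumed filesystem/directory block size


def first_free_index_block(block_size: int = BLOCK_SIZE) -> int:
    """Logical block number at which the free-index region starts."""
    return DIR2_FREE_OFFSET // block_size


if __name__ == "__main__":
    blk = first_free_index_block()
    print(blk, blk == 1 << 24)  # 16777216 True
```

With a different block size the number would change (8 KiB blocks would give 8388608), so a matching 16777216 in every message is at least consistent with the 4 KiB assumption.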

It seems xfs_repair was able to repair it without loss; lost+found was 
empty after the repair:

deepdance:/mnt # xfs_repair /dev/hda5
Phase 1 - find and verify superblock...
Phase 2 - using internal log
        - zero log...
        - scan filesystem freespace and inode maps...
        - found root inode chunk
Phase 3 - for each AG...
        - scan and clear agi unlinked lists...
        - process known inodes and perform inode discovery...
        - agno = 0
        - agno = 1
        - agno = 2
        - agno = 3
        - agno = 4
        - agno = 5
        - agno = 6
        - agno = 7
        - agno = 8
        - agno = 9
        - agno = 10
        - agno = 11
        - agno = 12
        - agno = 13
        - agno = 14
        - agno = 15
        - process newly discovered inodes...
Phase 4 - check for duplicate blocks...
        - setting up duplicate extent list...
        - clear lost+found (if it exists) ...
        - check for inodes claiming duplicate blocks...
        - agno = 0
        - agno = 1
        - agno = 2
        - agno = 3
        - agno = 4
        - agno = 5
        - agno = 6
        - agno = 7
        - agno = 8
        - agno = 9
        - agno = 10
        - agno = 11
        - agno = 12
        - agno = 13
        - agno = 14
        - agno = 15
Phase 5 - rebuild AG headers and trees...
        - reset superblock...
Phase 6 - check inode connectivity...
        - resetting contents of realtime bitmap and summary inodes
        - ensuring existence of lost+found directory
        - traversing filesystem starting at / ...
free block 16777216 for directory inode 1747843 bad nused
rebuilding directory inode 1747843
free block 16777216 for directory inode 30448144 bad nused
rebuilding directory inode 30448144
free block 16777216 for directory inode 59806790 bad nused
rebuilding directory inode 59806790
free block 16777216 for directory inode 55176185 bad nused
rebuilding directory inode 55176185
free block 16777216 for directory inode 5012689 bad nused
rebuilding directory inode 5012689
free block 16777216 for directory inode 42681258 bad nused
rebuilding directory inode 42681258
free block 16777216 for directory inode 46142796 bad nused
rebuilding directory inode 46142796
free block 16777216 for directory inode 33641428 bad nused
rebuilding directory inode 33641428
free block 16777216 for directory inode 62915542 bad nused
rebuilding directory inode 62915542
        - traversal finished ...
        - traversing all unattached subtrees ...
        - traversals finished ...
        - moving disconnected inodes to lost+found ...
Phase 7 - verify and correct link counts...
done


I reconsidered my conclusion that a hardware failure is unlikely and 
tested the partition:

-------------------------------------------------------
deepdance:~ # badblocks -s -v -n -o /home/martin/XFS-Probleme/badblocks.txt /dev/hda5
Checking for bad blocks in non-destructive read-write mode
From block 0 to 9767488
Checking for bad blocks (non-destructive read-write test)
Testing with random pattern: done
Pass completed, 0 bad blocks found.
-------------------------------------------------------

I forgot to change the locale, so the output was in German; it is 
translated above. It reports 0 bad blocks found. The badblocks output 
file is zero bytes as well:

-------------------------------------------------------
martin@deepdance:~/XFS-Probleme> ls -l badblocks.txt
-rw-r--r-- 1 root root 0 2006-06-23 18:31 badblocks.txt
-------------------------------------------------------
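
The upper bound badblocks printed should correspond to the size of /dev/hda5, i.e. the test covered the whole partition. A minimal sketch of that check, assuming badblocks used its default 1024-byte block size (no -b option was given) and taking the partition boundaries in 512-byte sectors from the fdisk -lu listing later in this mail:

```python
# Sketch: does badblocks' "From block 0 to 9767488" span all of /dev/hda5?
# Assumption: badblocks ran with its default block size of 1024 bytes.
# Partition boundaries come from "fdisk -lu /dev/hda" (512-byte sectors).

SECTOR_SIZE = 512
BADBLOCKS_BLOCK_SIZE = 1024  # assumed default, -b was not given

HDA5_START = 19535103  # first sector of /dev/hda5
HDA5_END = 39070079    # last sector of /dev/hda5 (inclusive)


def partition_blocks(start: int, end: int, block_size: int) -> int:
    """Whole blocks of block_size bytes in an inclusive sector range."""
    sectors = end - start + 1
    return sectors * SECTOR_SIZE // block_size


if __name__ == "__main__":
    print(partition_blocks(HDA5_START, HDA5_END, BADBLOCKS_BLOCK_SIZE))  # 9767488
```

The partition has an odd number of sectors (19534977), which is also why fdisk prints its size as "9767488+" blocks.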


I ran a long SMART self-test using "smartctl -t long /dev/hda". It 
completed without errors:

-------------------------------------------------------
deepdance:~ # smartctl -l selftest /dev/hda
smartctl version 5.33 [i686-pc-linux-gnu] Copyright (C) 2002-4 Bruce Allen
Home page is http://smartmontools.sourceforge.net/

=== START OF READ SMART DATA SECTION ===
SMART Self-test log structure revision number 1
Num  Test_Description    Status                  Remaining  LifeTime(hours)  LBA_of_first_error
# 1  Extended offline    Completed without error       00%      5868         -
# 2  Short offline       Completed without error       00%      2950         -
# 3  Extended offline    Completed without error       00%      2944         -
# 4  Short offline       Completed without error       00%      2913         -
[... all further tests completed without error ...]
-------------------------------------------------------

No tests had run for a long time because of a mistake 
in /etc/smartd.conf, which I have hopefully corrected today.


So it seems the hard disk is okay. The only oddity is five errors like the 
following in the error log (all at disk power-on lifetime 393 hours):

-------------------------------------------------------
deepdance:~ # smartctl -l error /dev/hda
[...]
Error 200 occurred at disk power-on lifetime: 393 hours (16 days + 9 hours)
  When the command that caused the error occurred, the device was active or idle.

  After command completion occurred, registers were:
  ER ST SC SN CL CH DH
  -- -- -- -- -- -- --
  10 59 01 73 65 6c ee  Error: IDNF at LBA = 0x0e6c6573 = 241984883

  Commands leading to the command that caused the error were:
  CR FR SC SN CL CH DH DC   Powered_Up_Time  Command/Feature_Name
  -- -- -- -- -- -- -- --  ----------------  --------------------
  20 03 01 73 65 6c ee 00      00:00:50.900  READ SECTOR(S)
  c8 03 01 73 65 6c ee 00      00:00:50.900  READ DMA
  20 03 01 73 65 6c ee 00      00:00:50.800  READ SECTOR(S)
  c8 03 01 73 65 6c ee 00      00:00:50.800  READ DMA
  20 03 01 73 65 6c ee 00      00:00:50.700  READ SECTOR(S)
-------------------------------------------------------


They are strange, because the device does not have that many sectors:

-------------------------------------------------------
deepdance:~ # LANG=C fdisk -lu /dev/hda

Disk /dev/hda: 60.0 GB, 60011642880 bytes
255 heads, 63 sectors/track, 7296 cylinders, total 117210240 sectors
Units = sectors of 1 * 512 = 512 bytes

   Device Boot      Start         End      Blocks   Id  System
/dev/hda1   *          63     9767519     4883728+   7  HPFS/NTFS
/dev/hda2        10233405    19535039     4650817+   c  W95 FAT32 (LBA)
/dev/hda3         9767520    10233404      232942+   6  FAT16
/dev/hda4        19535040   117210239    48837600    5  Extended
/dev/hda5        19535103    39070079     9767488+  83  Linux
/dev/hda6        39070143    58605119     9767488+  83  Linux
/dev/hda7        58605183    60565049      979933+  82  Linux swap / Solaris
/dev/hda8        60565113   117210239    28322563+  83  Linux

Partition table entries are not in disk order
[I know, I added the extended partition before I resized the FAT32 
partition to add a FAT16 one for FreeDOS;)]
-------------------------------------------------------
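
The LBA smartctl prints can be reproduced from the register dump, which makes the mismatch concrete: the decoded address is roughly twice the disk's sector count, so an IDNF ("ID not found") error for it is at least not surprising. A minimal sketch, assuming 28-bit LBA addressing (DH = 0xee has the LBA bit 0x40 set, and its low nibble supplies LBA bits 24-27); the function name is illustrative:

```python
# Sketch: rebuild the LBA from the ATA registers of the smartctl error entry
# and compare it with the disk size from "fdisk -lu /dev/hda".
# Assumption: 28-bit LBA addressing, low nibble of DH = LBA bits 24-27.

TOTAL_SECTORS = 117210240  # from fdisk: total sectors on /dev/hda


def lba28(sn: int, cl: int, ch: int, dh: int) -> int:
    """Assemble a 28-bit LBA from the SN, CL, CH and DH register values."""
    return ((dh & 0x0F) << 24) | (ch << 16) | (cl << 8) | sn


if __name__ == "__main__":
    # Register values from the error entry: SN=73 CL=65 CH=6c DH=ee (hex)
    lba = lba28(0x73, 0x65, 0x6C, 0xEE)
    print(hex(lba), lba)         # 0xe6c6573 241984883
    print(lba >= TOTAL_SECTORS)  # True: beyond the last sector of the disk
```

This only confirms smartctl's arithmetic; it does not explain why the drive was asked to read that address.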


I intend to ask about those on the smartmontools mailing list.

I will also run memtest86 overnight.

Any other tips for diagnosing a hardware problem? Or do the XFS errors 
above hint at a software bug?

I have not filed this as a bug report yet, because I am not sure that it 
is not a hardware failure. I will do so if you want me to.

I really want to track this all down. I do not feel that my data is safe 
at the moment (well, I do have a backup as of today on an external USB 
drive).

If XFS gets corrupted again, I may switch that partition to ext3. If it 
then crashes with ext3 as well, I may be better off replacing the hard 
disk, even though I could not diagnose an error with it.

But first, the memtest tonight... maybe it reveals something.

Regards,
Martin
-- 
Martin 'Helios' Steigerwald - http://www.Lichtvoll.de
GPG: 03B0 0D6C 0040 0710 4AFA  B82F 991B EAAC A599 84C7

