
silent corruption after kernel panic?

To: "xfs@xxxxxxxxxxx" <xfs@xxxxxxxxxxx>
Subject: silent corruption after kernel panic?
From: "Assarsson, Emil" <Emil.Assarsson@xxxxxxxxxxxxxxxx>
Date: Mon, 19 Sep 2011 14:28:23 +0200
Accept-language: en-US, sv-SE
Thread-index: Acx2x59zCZScR9F5Ta+T943Q0sW4Gw==
Thread-topic: silent corruption after kernel panic?

Hi,

We are running a 20TB XFS filesystem on top of LVM2 and SAN storage (HP
Open-V) with multipathd. Ubuntu Lucid. The disk write cache is enabled
and we use mount options rw.

This is a log of events reconstructed from memory, so I may have missed
some things :-P

The system panicked and automatically restarted after 30 seconds.

It seemed to be ok, but after a while users started getting files with
zero length. We ran xfs_check on the filesystem, but it couldn't find
any problems. After that we restarted the system, and the files (even
the ones that had been zero length) seemed ok again. But then we got
messages like these (short version):
-----
Sep 16 06:40:34 seldlnx034 kernel: [54607.977261] XFS internal error
XFS_WANT_CORRUPTED_RETURN at line 381 of
file /build/buildd/linux-2.6.32/fs/xfs/xfs_alloc.c.  Caller
0xffffffffa01eed36
Sep 16 06:40:34 seldlnx034 kernel: [54607.996676]  [<ffffffffa0215383>]
xfs_error_report+0x43/0x50 [xfs]
Sep 16 06:40:34 seldlnx034 kernel: [54607.996689]
-----

... and files written during this period became corrupt (zero length).
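
For reference, messages like the one above could be picked out of the
kernel log automatically. This is only a minimal sketch, not something
we had in place: the function name find_xfs_errors and the log path
/var/log/kern.log are assumptions, and the pattern is derived from the
single sample line quoted above, so the layout may differ on other
systems.

```python
import re

# Pattern for the syslog line quoted above. The field layout is assumed
# from that one sample and is not a documented format.
XFS_ERROR_RE = re.compile(
    r"kernel: \[\s*(?P<uptime>\d+\.\d+)\] XFS internal error\s*(?P<detail>.*)"
)


def find_xfs_errors(lines):
    """Return a list of (uptime_seconds, detail) for XFS internal error lines."""
    hits = []
    for line in lines:
        match = XFS_ERROR_RE.search(line)
        if match:
            uptime = float(match.group("uptime"))
            detail = match.group("detail").strip()
            hits.append((uptime, detail))
    return hits


# Hypothetical usage (the path is an assumption, adjust for your syslog setup):
#
#     with open("/var/log/kern.log", errors="replace") as fh:
#         for uptime, detail in find_xfs_errors(fh):
#             print(f"XFS internal error at uptime {uptime}s: {detail}")
```

Run from cron, something along these lines could mail the matches to
an admin, but it only reports errors the kernel has already logged and
would not catch corruption that never produces a message.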

We ran xfs_repair on the filesystem (short version):
-----
entry "fw-radmp_all.deb" at block 0 offset 944 in directory inode
157891962 references free inode 195983876
        clearing inode number in entry at offset 944...
Phase 5 - rebuild AG headers and trees...
        - reset superblock...
Phase 6 - check inode connectivity...
        - resetting contents of realtime bitmap and summary inodes
        - traversing filesystem ...
bad hash table for directory inode 13786 (no data entry): rebuilding
rebuilding directory inode 13786
bad hash table for directory inode 2130829772 (no data entry):
rebuilding
rebuilding directory inode 2130829772
        - traversal finished ...
        - moving disconnected inodes to lost+found ...
Phase 7 - verify and correct link counts...
done
-----

We have now verified the files and don't have any known problems with
the filesystem, but the files created while the filesystem was broken
needed to be recreated.



How can I avoid this in the future, and how can I make sure that I get
notified when a problem like this occurs? Is there anything wrong with
the setup that you can see?

--
Emil Assarsson