Re: XFS corruption on ubuntu 2.6.27-9-server

To: George Barnett <george@xxxxxxxxxxx>
Subject: Re: XFS corruption on ubuntu 2.6.27-9-server
From: Eric Sandeen <sandeen@xxxxxxxxxxx>
Date: Tue, 03 Feb 2009 20:05:27 -0600
Cc: xfs@xxxxxxxxxxx
In-reply-to: <7B2E904E-498E-4EEC-A09F-4DE823E4FAB0@xxxxxxxxxxx>
References: <2653B83E-85DA-4949-BCED-AF2BA3D324E1@xxxxxxxxxxx> <4988EF37.7020306@xxxxxxxxxxx> <5CCF20F5-33D5-409E-BB27-5E1C5CB4D9E5@xxxxxxxxxxx> <4988F363.1070708@xxxxxxxxxxx> <7B2E904E-498E-4EEC-A09F-4DE823E4FAB0@xxxxxxxxxxx>
User-agent: Thunderbird (Macintosh/20081209)
George Barnett wrote:
> On 04/02/2009, at 12:46 PM, Eric Sandeen wrote:
>>> bad version number 0x0 on inode 18046
>>> bad magic number 0x0 on inode 18047
>>> bad version number 0x0 on inode 18047
>>> bad directory block magic # 0 in block 0 for directory inode 18000
>> Interesting that all the bad magic numbers were 0... not sure what to
>> make of that, offhand, I'm afraid...
> Oh dear.
> I'm going to try moving the filesystem to ext3 to see if this  
> continues.  If it does, it would suggest a bug in the underlying  
> raid10 implementation or a problem with the disks, although they're  
> not reporting any errors [1].

One thing to note is that XFS is very good at detecting on-disk
corruption; I'm not sure ext3 will be as good.  So ext3 may appear to run
fine for longer, even if there is an underlying problem, simply because
it doesn't notice it.
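The following is a rough illustration of the kind of magic-number check behind the repair errors quoted above. It also shows why a fully zeroed block reports both "bad magic number 0x0" and "bad version number 0x0". The field layout is an assumption based on the XFS on-disk inode format: a big-endian 16-bit magic 0x494e ("IN"), then a 16-bit mode, then a one-byte version, with versions 1 and 2 valid on 2.6.27-era kernels. `classify_inode` is a hypothetical helper for illustration, not part of xfsprogs.

```python
import struct

# Assumed XFS on-disk inode layout (not taken from this thread):
# offset 0: __be16 magic ("IN" = 0x494e), offset 2: __be16 mode,
# offset 4: u8 version. Versions 1 and 2 assumed valid for 2.6.27.
XFS_DINODE_MAGIC = 0x494E
VALID_VERSIONS = {1, 2}


def classify_inode(buf: bytes) -> str:
    """Hypothetical helper: classify one on-disk inode buffer."""
    if len(buf) < 5:
        return "short read"
    if not any(buf):
        # Every byte is zero, so magic and version both read back as 0x0.
        return "zeroed"
    magic, _mode, version = struct.unpack_from(">HHB", buf, 0)
    if magic != XFS_DINODE_MAGIC:
        return f"bad magic number {magic:#x}"
    if version not in VALID_VERSIONS:
        return f"bad version number {version:#x}"
    return "ok"
```

An all-zero buffer such as `bytes(256)` is classified as "zeroed". That matches the pattern where every reported magic number was 0, which suggests whole blocks were zeroed rather than randomly corrupted.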

> Is there any further debugging I can do before I start fresh?

Well, it'd be great to have an isolated test case, if you can reproduce
it succinctly.

Also, I don't know exactly which kernel Ubuntu uses or what patches are
in it; you might try a stock upstream 2.6.27.$LATEST kernel with the same
config and see whether the problems continue.


> George
> 1.  The hardware ECC recovered smartctl metric is /very/ high,
> although I'm told this may be normal for Samsung drives.  I can't think
> of any way to confirm a disk problem without a CRC-checking fs, though.
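Without a CRC-checking filesystem, one userspace option is to write blocks that carry their own index and checksum, then read them back later and compare. This is suggested here as an assumption, not something proposed in this thread. The sketch below does that. The block size, the header layout, and the `write_pattern`/`verify_pattern` names are all hypothetical choices for illustration. Reads may be served from the page cache, so for a meaningful test you would need to drop caches or remount the filesystem between writing and verifying.

```python
import hashlib
import os
import struct

BLOCK_SIZE = 4096                # hypothetical test block size
HEADER = struct.Struct(">Q32s")  # block index + SHA-256 digest of payload


def write_pattern(path: str, nblocks: int) -> None:
    """Write nblocks self-describing blocks and fsync them to disk."""
    payload_len = BLOCK_SIZE - HEADER.size
    with open(path, "wb") as f:
        for i in range(nblocks):
            payload = os.urandom(payload_len)
            digest = hashlib.sha256(payload).digest()
            f.write(HEADER.pack(i, digest) + payload)
        f.flush()
        os.fsync(f.fileno())


def verify_pattern(path: str) -> list:
    """Return indices of blocks whose index or checksum no longer match."""
    bad = []
    with open(path, "rb") as f:
        i = 0
        while True:
            block = f.read(BLOCK_SIZE)
            if not block:
                break
            if len(block) < BLOCK_SIZE:
                bad.append(i)  # truncated trailing block
                break
            idx, digest = HEADER.unpack_from(block, 0)
            payload = block[HEADER.size:]
            if idx != i or hashlib.sha256(payload).digest() != digest:
                bad.append(i)
            i += 1
    return bad
```

Because each block records its own position, a block that has been zeroed or misdirected is caught even when the data around it is intact.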

