
RE: XFS corruption on 2.4.28

To: "'Eric Sandeen'" <sandeen@xxxxxxx>
Subject: RE: XFS corruption on 2.4.28
From: "Renaat Dumon" <renaat.dumon@xxxxxxxxxx>
Date: Tue, 1 Nov 2005 00:07:50 +0100
Cc: <linux-xfs@xxxxxxxxxxx>
In-reply-to: <436688CF.9050204@xxxxxxx>
Sender: linux-xfs-bounce@xxxxxxxxxxx
Thread-index: AcXeX9sn5JvZ5dHhQSuTWLaNKLBCFAAD1lBQ
Well, about trying a stock kernel, I'm afraid this is going to be hard. I
get these systems installed - we're talking about a backup appliance here -
and the software is set up around Gentoo...

About the lost+found, that happens when I leave the system accumulating
errors for a week or so. If I don't wait that long and remount sooner, I'm
good for another day... That's why xfs_repair isn't spewing stuff...

I will mount the filesystem without the geometry options and see how that
goes. Since this is a backup appliance and it's now 00:06 here, you can
imagine I can't do that right now with all these backups going on.

Kind regards,


-----Original Message-----
From: Eric Sandeen [mailto:sandeen@xxxxxxx] 
Sent: 31 October 2005 22:13
To: Renaat Dumon
Cc: linux-xfs@xxxxxxxxxxx
Subject: Re: XFS corruption on 2.4.28

Renaat Dumon wrote:
> I don't think I'm using extended attribs, I just do mkfs.xfs and mount 
> :) And I don't know what xfs_fsr is :s I'm running a Gentoo kernel 
> 2.4.28, but I'm not sure if it's a stock one or not.

any chance you could try a stock kernel from 2.4.28, untouched by our
friends at gentoo.... ?

> I unmount/remounted the filesystem, and after a while the problem 
> re-appears, albeit for other files. While the one previously mentioned 
> is still good (for now anyway)
> One file:

> Another file:

Ok, no attributes.

And the "wrong" du output is always 0x7FFFFF8C in both cases, odd.

> I haven't tried mounting the filesystem without the geometry options, 
> my solution vendor insists that these parameters are "performance
> and that the application involved is not guaranteed to work as well 
> without these...

it will probably slow things down, yes... but that should be the worst of it.
Depending on how easily/quickly you can normally reproduce, it might be a
good data point.

Alternatively, perhaps a test that replicates what your application is doing
could be devised to reproduce the problem... can you say which application
this is, or what its IO pattern looks like?

> The output of an xfs_repair is squeaky clean, but when I wait really 
> long before doing an unmount/Remount, I get the move to lost+found again.

Well, xfs_repair should say -something- if it's going to move everything to 
lost+found/ ....

> The hardware is just fine, I tried dd over & over again to check the
> I have 2 disks mirrored using mdadm.
> I just realized something you talking about small files,
> I have a dir structure [0-9a-f]/[0-9a-f]/[0-9a-f] where all these 
> files reside. I did the following to easily detect which files were 
> currently
> affected:
> bacardi 0 # du -sk * |sort -n
> < .. Cut .. >
> 2147483532      00005d697a5a05795f53cb7b081f242d.65536.db

> All these files should be 28 bytes !!!  
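
One way to find the affected files without eyeballing the du output is to compare each file's allocated block count with its logical size. The following is only a sketch: the names `is_suspect`, `find_suspect_files` and the 1 MiB `SLACK_BYTES` tolerance are made up for illustration, and it assumes `st_blocks` is counted in 512-byte units as on Linux.

```python
import os
import stat

BLOCK_UNIT = 512        # st_blocks is counted in 512-byte units on Linux
SLACK_BYTES = 1 << 20   # hypothetical tolerance: 1 MiB beyond the logical size


def is_suspect(size_bytes, blocks, slack=SLACK_BYTES):
    """True when the allocated space far exceeds the file's logical size."""
    return blocks * BLOCK_UNIT > size_bytes + slack


def find_suspect_files(root):
    """Walk root and return (path, size, blocks) for each suspect regular file."""
    hits = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                st = os.lstat(path)
            except OSError:
                continue  # file vanished or is unreadable; skip it
            if stat.S_ISREG(st.st_mode) and is_suspect(st.st_size, st.st_blocks):
                hits.append((path, st.st_size, st.st_blocks))
    return hits
```

Calling `find_suspect_files(".")` from the top of the [0-9a-f]/[0-9a-f]/[0-9a-f] tree would list the candidates. Files that legitimately have a lot of preallocated space would also show up, so the tolerance may need tuning.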

And they are always reported as 2147483532 / 0x7FFFFF8C
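
A quick arithmetic check on that number, as a sketch only. It assumes du -k derives its figure from the inode's block count in 512-byte units (two blocks per KB), which is not confirmed anywhere in this thread. Under that assumption the implied block count sits just below 2^32, which is what a small negative value looks like when read as an unsigned 32-bit quantity. This is an observation about the symptom, not a diagnosis.

```python
DU_KB = 2147483532  # value reported by `du -sk` in the thread


def as_signed32(value):
    """Reinterpret the low 32 bits of value as a signed 32-bit integer."""
    value &= 0xFFFFFFFF
    return value - (1 << 32) if value & 0x80000000 else value


# Assumption: du -k is roughly st_blocks / 2, so double the KB figure.
implied_blocks = DU_KB * 2

print(hex(DU_KB))                   # 0x7fffff8c
print(hex(implied_blocks))          # 0xffffff18
print(as_signed32(implied_blocks))  # -232
```

If the assumption holds, the block count behind the du figure would be -232 as a signed 32-bit number, and it is the same for every affected file.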

Hmmmm... thinking.

