RE: XFS corruption on 2.4.28

To: "'Eric Sandeen'" <sandeen@xxxxxxx>
Subject: RE: XFS corruption on 2.4.28
From: "Renaat Dumon" <renaat.dumon@xxxxxxxxxx>
Date: Mon, 31 Oct 2005 20:33:59 +0100
Cc: <linux-xfs@xxxxxxxxxxx>
In-reply-to: <43665311.60904@xxxxxxx>
Sender: linux-xfs-bounce@xxxxxxxxxxx
Thread-index: AcXeP9NYr9O58zN4TvOoHK4vHLyiWAADmDEQ

I don't think I'm using extended attributes; I just run mkfs.xfs and mount :)
And I don't know what xfs_fsr is :s
I'm running a Gentoo 2.4.28 kernel, but I'm not sure if it's a stock one or
not.

For what it's worth:


bacardi 1 # emerge -s xfs
Searching...   
[ Results for search key : xfs ]
[ Applications found : 1 ]
 
*  sys-apps/xfsprogs
      Latest version available: 2.3.9
      Latest version installed: 2.3.9
      Size of downloaded files: 750 kB
      Homepage:    http://oss.sgi.com/projects/xfs
      Description: xfs filesystem utilities


I unmounted and remounted the filesystem, and after a while the problem
reappeared, albeit for other files. The file I mentioned previously is
still good (for now, anyway).

One file:

bacardi 0 # ls -al 0006825d9eea315d1db6193f7095f895.353.db 
-rw-------    1 root     root           44 Oct 31 18:33 0006825d9eea315d1db6193f7095f895.353.db

bacardi 0 # du -sk 0006825d9eea315d1db6193f7095f895.353.db 
2147483532      0006825d9eea315d1db6193f7095f895.353.db

bacardi 0 # xfs_bmap -v 0006825d9eea315d1db6193f7095f895.353.db 
0006825d9eea315d1db6193f7095f895.353.db:
 EXT: FILE-OFFSET      BLOCK-RANGE          AG AG-OFFSET          TOTAL
   0: [0..7]:          145927304..145927311 17 (3320968..3320975)     8

bacardi 0 # xfs_bmap -a -v 0006825d9eea315d1db6193f7095f895.353.db 
0006825d9eea315d1db6193f7095f895.353.db: no extents


Another file:

bacardi 0 # du -sk 000686957ffc98b67b53cb9d67d5b37e.269.db 
2147483532      000686957ffc98b67b53cb9d67d5b37e.269.db
bacardi 0 # xfs_bmap -v 000686957ffc98b67b53cb9d67d5b37e.269.db 
000686957ffc98b67b53cb9d67d5b37e.269.db:
 EXT: FILE-OFFSET      BLOCK-RANGE          AG AG-OFFSET          TOTAL
   0: [0..7]:          145994376..145994383 17 (3388040..3388047)     8
bacardi 0 # xfs_bmap -a -v 000686957ffc98b67b53cb9d67d5b37e.269.db 
000686957ffc98b67b53cb9d67d5b37e.269.db: no extents
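
To make the mismatch explicit, here is a minimal Python sketch that sums the
TOTAL column of xfs_bmap -v output and converts it to KiB for comparison with
du -sk. It assumes the row layout shown in the output above and that
xfs_bmap reports in 512-byte blocks; the names EXTENT_ROW and mapped_kib are
made up for this illustration.

```python
import re

# Matches extent rows such as:
#    0: [0..7]:          145927304..145927311 17 (3320968..3320975)     8
# The file-name line, the column header and "no extents" lines do not match.
EXTENT_ROW = re.compile(
    r"^\s*\d+:\s+\[(\d+)\.\.(\d+)\]:\s+(\d+)\.\.(\d+)"
    r"\s+\d+\s+\(\d+\.\.\d+\)\s+(\d+)\s*$"
)


def mapped_kib(bmap_output: str) -> int:
    """Sum the TOTAL column (assumed 512-byte blocks) and return whole KiB."""
    sectors = 0
    for line in bmap_output.splitlines():
        match = EXTENT_ROW.match(line)
        if match:
            sectors += int(match.group(5))
    return sectors * 512 // 1024


sample = """\
0006825d9eea315d1db6193f7095f895.353.db:
 EXT: FILE-OFFSET      BLOCK-RANGE          AG AG-OFFSET          TOTAL
   0: [0..7]:          145927304..145927311 17 (3320968..3320975)     8
"""

du_kib = 2147483532  # what du -sk printed for the same file
print(mapped_kib(sample), "KiB mapped vs", du_kib, "KiB reported by du")
```

For both files above the extent map accounts for a single 8-sector extent,
so the huge du figure is not backed by anything xfs_bmap can see.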


I haven't tried mounting the filesystem without the geometry options; my
solution vendor insists that these parameters are "performance enhancing"
and that the application involved is not guaranteed to work as well without
them...


The xfs_repair output is squeaky clean, but when I wait a really long time
before unmounting and remounting, I get files moved to lost+found again.
The hardware is just fine; I ran dd over the disks again and again to check
them. I have 2 disks mirrored using mdadm.

Your mention of small files made me realize something.

I have a directory structure [0-9a-f]/[0-9a-f]/[0-9a-f] where all these
files reside. I did the following to easily detect which files are currently
affected:

bacardi 0 # du -sk * |sort -n
< .. Cut .. >
2147483532      00005d697a5a05795f53cb7b081f242d.65536.db
2147483532      00007ed327608e3263372b751dd3bdf3.65536.db
2147483532      0001129845438970777067e99b440123.65536.db
2147483532      0002511297400034364acb5e62c3b04c.65536.db
2147483532      0002525115de4daa012fd9453846c0ab.353.db
2147483532      0002b490e8d2b3ed1d8a7c2bf5c622dd.65536.db
2147483532      0002cf7afa259d56df746769000df7ed.2290.db
2147483532      00030b50559133bdd4189b2114dd56fc.65536.db
2147483532      000311cac771d6745d836d0cb13c5f38.65536.db
2147483532      0003131c2f484c01315a2d6f77b8d036.65536.db
2147483532      0003161725e7f00fcbac96536c09c76c.65536.db
2147483532      0004fd9823a98abeb9b4836432f16085.65536.db
2147483532      000585cd18eeeebd2313681f77cac815.65536.db
2147483532      00064742bc5ba0fe41c3ef38cc5bf250.65536.db
2147483532      0006825d9eea315d1db6193f7095f895.353.db
2147483532      000686957ffc98b67b53cb9d67d5b37e.269.db
2147483532      00079f69432f3aaf071c8305186ac508.65536.db
2147483532      00093637366cbd6da3a99d3e0d9f7afc.65536.db
2147483532      00096d5b955d0d71d0f6ed5823cfeeb4.65536.db
2147483532      0009bbac9ce5bfef3a80bdcd8d1b00e8.65536.db
2147483532      000aa5dcab569b899bcc19d66ece6e51.65536.db
2147483532      000c9447411e4fc7f931fbc79104a8eb.65536.db
2147483532      000cd9b389a17bafa77bd8bf5d045118.65536.db
2147483532      000d54e31742ef5fd847fa715218b418.65536.db
2147483532      000d84c59cb65e30c4dd079fbc5dfa4b.65536.db
2147483532      000dc9c7eb627821dc52b4d82f8fe3bb.65536.db
2147483532      000e02d2078fab322f13972337e2d844.65536.db
2147483532      000f5f2304e482dbc4af0e80331da36f.353.db
2147483532      000f9b244ecdf330dafcb1dc94d146d3.269.db
2147483532      000fcf428f92a1cba92ae11d8ca7d6c2.65536.db
bacardi 0 # 

All these files should be 28 bytes!!! Maybe there's a problem with writing
small files?! Note that not EVERY .db file shows up bad; lots of them are
OK.
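
A scripted version of the du check could look like the Python sketch below.
It walks a directory tree and flags regular files whose allocated space
(st_blocks, counted in 512-byte units on Linux) is far larger than their
length. The function names and the 1 MiB slack threshold are arbitrary
choices for this illustration, and files that are preallocated on purpose
would also be flagged.

```python
import os
import stat


def allocated_bytes(st: os.stat_result) -> int:
    # st_blocks is in 512-byte units on Linux, whatever the fs block size is.
    return st.st_blocks * 512


def looks_inflated(size: int, allocated: int, slack: int = 1 << 20) -> bool:
    """True when the allocation exceeds the file length by more than slack bytes."""
    return allocated > size + slack


def find_inflated(root: str):
    """Yield (path, size, allocated) for suspicious regular files under root."""
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            st = os.lstat(path)
            if not stat.S_ISREG(st.st_mode):
                continue  # skip symlinks, sockets, device nodes
            allocated = allocated_bytes(st)
            if looks_inflated(st.st_size, allocated):
                yield path, st.st_size, allocated


# A 28-byte file that du reports as 2147483532 KiB is flagged:
print(looks_inflated(28, 2147483532 * 1024))
```

Iterating over find_inflated("0") from the top of the tree would cover all
three directory levels at once instead of one leaf directory at a time.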

For every .db file there is a corresponding "data" file; the two are always
created together (1 "data" file, and 1 .db file containing the metadata).
The "data" files all show up correctly.

 
Thanks for your comments!


Renaat


-----Original Message-----
From: Eric Sandeen [mailto:sandeen@xxxxxxx] 
Sent: 31 October 2005 18:23
To: Renaat Dumon
Cc: linux-xfs@xxxxxxxxxxx
Subject: Re: XFS corruption on 2.4.28

Renaat Dumon wrote:

A couple questions early on - is this stock 2.4.28 from kernel.org?

Are you using extended attributes?  Have you run xfs_fsr on this filesystem?

I doubt that fsr is the culprit here, because your files are only 28 bytes
long, so fsr would not touch them.

> When I then cd into 0/0/0 and I do a 'du -sk *' :

> 2147483532      000fe1c2b17a7b4b4d2c4eea341cfb08.65536.db

0x7FFFFF8C - hm, a lot of binary 1's in there...
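
The hex value can be taken one step further. The Python sketch below
assumes du -k derives its figure from the 512-byte block count (st_blocks)
halved, and then reinterprets that count as a 32-bit two's-complement
number. The 32-bit wrap is only a hypothesis, nothing in this thread
confirms it, and to_signed32 is a helper invented for the example.

```python
def to_signed32(value: int) -> int:
    """Reinterpret the low 32 bits of value as a two's-complement integer."""
    value &= 0xFFFFFFFF
    return value - (1 << 32) if value & 0x80000000 else value


du_kib = 2147483532           # figure printed by du -sk
sectors = du_kib * 2          # assumed: du -k is the 512-byte block count / 2

print(hex(du_kib))            # 0x7fffff8c
print(sectors, hex(sectors))  # 4294967064 0xffffff18
print(to_signed32(sectors))   # -232
```

Read that way, the block count sits 232 sectors below 2^32, which would fit
a small negative count wrapping around rather than a genuinely huge
allocation. Again, that reading is an assumption.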


> bacardi 0 # ls -al 000fe1c2b17a7b4b4d2c4eea341cfb08.65536.db
> 
> -rw-------    1 root     root           28 Oct 30 18:53
> 000fe1c2b17a7b4b4d2c4eea341cfb08.65536.db

0x1C

can you try an xfs_bmap -v, and xfs_bmap -a -v of this file?  Just out of
curiosity.

> 
>  
> 
> The correct filesize is indeed 28 bytes! The file mentioned here is 
> just an example, but there are quite some files like that actually :(
> 

It might be interesting to gather the reported/correct values for several
files, so we can possibly identify a pattern.

> 
> Unmounting/remounting the filesystem makes the issue go away 
> temporarily, it is back after a couple of hours of operation.
> 

so a file which shows up as bad is ok after a remount?  So at least this
problem is not on-disk, but...

> 
> I did a xfs_check / xfs_repair before ; but that just dumped (ALMOST) 
> EVERYTHING in lost+found , so I'm losing data :(
> 

... something -is- bad on disk.  What is the output of xfs_repair?

> 
>  
> 
> The fact that I'm having this on multiple systems is what worries me, 
> the filesystems are created with default options, but are mounted with
> su=256,sw=128

what volume manager are you using?

Can you verify whether using the stripe geometry contributes to the problem?

Without su,sw does the problem go away?


> 
> Does this sound familiar to any of you ? Thanks a bunch!

not quite, the last du reporting problem was only on files with extended
attributes, and only after an xfs_fsr run - but in your case the files are
small enough that fsr probably ignores them.

Thanks,

-Eric

