
Re: xfs filesystem corruption with kernel 2.6.37

To: xfs@xxxxxxxxxxx
Subject: Re: xfs filesystem corruption with kernel 2.6.37
From: Kamal Dasu <kdasu.kdev@xxxxxxxxx>
Date: Fri, 9 Nov 2012 13:18:13 -0800 (PST)
In-reply-to: <20121103222518.GH29378@dastard>
References: <CAC=U0a2T_J9Y6WzvWyFfbBSDy__Pr7f4gfQBie2o0VhAm2jCaQ@xxxxxxxxxxxxxx> <20121025224713.GF29378@dastard> <34630253.post@xxxxxxxxxxxxxxx> <20121102012728.GT29378@dastard> <34633803.post@xxxxxxxxxxxxxxx> <20121102225509.GZ29378@dastard> <37CBCD87-5B71-4B9C-8FAE-5BBF85804983@xxxxxxxxx> <20121103222518.GH29378@dastard>
Dave,

On further analysis, the corrupt disks seem to exhibit a problem with
duplicate rt inode extents.

On mount, this is the assert we hit:
 (prev.br_state == XFS_EXT_NORM assert with Anita disk)
Starting XFS recovery on filesystem: sda2 (logdev: internal)
Assertion failed: del.br_startblock == prev.br_startblock +
prev.br_blockcount, file: fs/xfs/xfs_bmap.c, line: 5201

# xfs_repair -n /dev/sda2 -r /dev/sda3
..
..
Phase 2 - using internal log
        - scan filesystem freespace and inode maps...
agi unlinked bucket 38 is 138790 in ag 3 (inode=50470438)
..
..
Phase 4 - check for duplicate blocks...
        - setting up duplicate extent list...
        - check for inodes claiming duplicate blocks...
        - agno = 0
        - agno = 1
        - agno = 2
        - agno = 3
data fork in rt ino 50470438 claims dup rt extent,off - 1024, start -
12963072, count 160
bad data fork in inode 50470438
would have cleared inode 50470438

sh-3.1# xfs_db  /dev/sda2
xfs_db> inode 50470438
xfs_db> bmap
data offset 0 startblock 11024384 (10/538624) count 256 flag 0
data offset 1024 startblock 12963072 (12/380160) count 160 flag 1
data offset 1184 startblock 12963232 (12/380320) count 1120 flag 0
data offset 2304 startblock 12951808 (12/368896) count 1024 flag 0
data offset 3328 startblock 12824064 (12/241152) count 768 flag 0
data offset 6356034 startblock 2392537314989568 (2281701388/366080) count
928 flag 0
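
To make the extent arithmetic in the bmap output easier to check, here is a
minimal Python sketch. It assumes the xfs_db output is pasted in as text (the
wrapped last record included) and that "flag 1" marks an unwritten extent. The
helper names parse_bmap and adjacent_pairs are made up for illustration. It
reports consecutive records where both the file offset and the start block
continue directly from the previous record, which is the same relation the
mount-time assert tests (startblock == prev startblock + prev blockcount).

```python
import re

# bmap output for inode 50470438, copied from the xfs_db session above.
BMAP_OUTPUT = """\
data offset 0 startblock 11024384 (10/538624) count 256 flag 0
data offset 1024 startblock 12963072 (12/380160) count 160 flag 1
data offset 1184 startblock 12963232 (12/380320) count 1120 flag 0
data offset 2304 startblock 12951808 (12/368896) count 1024 flag 0
data offset 3328 startblock 12824064 (12/241152) count 768 flag 0
data offset 6356034 startblock 2392537314989568 (2281701388/366080) count
928 flag 0
"""

# \s+ between tokens lets a record that was wrapped by the mailer still match.
RECORD = re.compile(
    r"data\s+offset\s+(\d+)\s+startblock\s+(\d+)\s+\(\d+/\d+\)"
    r"\s+count\s+(\d+)\s+flag\s+(\d+)"
)


def parse_bmap(text):
    """Return a list of (offset, startblock, count, flag) tuples."""
    return [tuple(int(x) for x in m.groups()) for m in RECORD.finditer(text)]


def adjacent_pairs(records):
    """Return consecutive records that are contiguous in file and on disk."""
    pairs = []
    for prev, cur in zip(records, records[1:]):
        p_off, p_start, p_count, _ = prev
        c_off, c_start, _, _ = cur
        if p_off + p_count == c_off and p_start + p_count == c_start:
            pairs.append((prev, cur))
    return pairs


records = parse_bmap(BMAP_OUTPUT)
for prev, cur in adjacent_pairs(records):
    print("contiguous:", prev, "->", cur)
```

On this data the only contiguous pair is the 160-block extent at offset 1024
(flag 1) followed by the 1120-block extent at offset 1184 (flag 0). The last
record does not continue from the one before it in either file offset or start
block.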

A similar problem was reported in the context of xfs_repair with respect to
realtime extents on 2.6.37. It seems that an unaligned extent size can cause
the unwritten free part of a written extent to be shared with the next
allocated, adjacent unwritten extent. At runtime (with XFS_DEBUG enabled) we
do notice the same extent being unlinked multiple times, followed by this
assert:

Assertion failed: fsb != NULLFSBLOCK, file: fs/xfs/xfs_rtalloc.c, line: 875
 
I am wondering whether this can cause corruption when it happens at runtime
on a build without XFS_DEBUG.

I was looking at a similar thread where you commented on a possible kernel
fix for the bug in which duplicate extents overlap because of an unaligned
rt extent size calculation.

http://oss.sgi.com/archives/xfs/2012-09/msg00292.html
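
The sharing idea described above can be sketched as follows. This is only an
illustration under stated assumptions, not the kernel or xfs_repair code: it
assumes usage is tracked per whole rt extent, and REXTSIZE = 256 blocks is a
purely hypothetical value (the real sb_rextsize of this filesystem is not given
in this message). The function name rt_extents_touched is also made up. The two
block ranges are the adjacent flag 1 / flag 0 records from the bmap output.

```python
def rt_extents_touched(start, count, rextsize):
    """Return the set of rt extent numbers a block range falls into."""
    first = start // rextsize
    last = (start + count - 1) // rextsize
    return set(range(first, last + 1))


# Hypothetical rt extent size in blocks; NOT taken from the real superblock.
REXTSIZE = 256

# (startblock, count) of the two adjacent records from the bmap output.
unwritten = (12963072, 160)   # flag 1
written = (12963232, 1120)    # flag 0

shared = rt_extents_touched(*unwritten, REXTSIZE) & rt_extents_touched(
    *written, REXTSIZE
)
print("rt extents claimed by both ranges:", sorted(shared))
```

With the assumed size of 256, the 160-block range ends partway through an rt
extent and the next range starts in that same rt extent, so both claim it. With
an rt extent size that divides 160 evenly (for example 32), the two ranges
would not share any rt extent.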

Thanks
Kamal