Read corruption on ARM
Eric Sandeen
sandeen at sandeen.net
Wed Feb 27 15:10:17 CST 2013
On 2/27/13 12:15 PM, Jason Detring wrote:
> On 2/27/13, Eric Sandeen <sandeen at sandeen.net> wrote:
>> On 2/27/13 10:28 AM, Jason Detring wrote:
>>> find-502 [000] 207.983594: xfs_da_btree_corrupt: dev 7:0
>>> bno 0x5a4f8 nblks 0x8 hold 1 pincount 0 lock 0 flags DONE|PAGES caller
>>> xfs_dir2_leaf_readbuf
>>
>> Was this on the same image as you sent earlier?
>
> Yes, sorry, I should have said that. I'm now using the demo image
> with the RasPi exclusively for testing.
>
>
>> Ok, so this tells us that it was trying to read sector nr. 0x5a4f8 (369912),
>> or fsblock 46239
>>
>> What's really on disk there?
>>
>> $ xfs_db problemimage.xfs
>> xfs_db> blockget -n
>> xfs_db> daddr 369912
>> xfs_db> blockuse
>> block 49152 (3/0) type sb
>> xfs_db> type text
>> xfs_db> p
>> 000: 58 46 53 42 00 00 10 00 00 00 00 00 00 00 f0 d3 XFSB............
>> ...
>>
>> So it really did have a superblock location that it was reading
>> at that point - the backup SB in the 3rd allocation group, to be exact.
>> But it shouldn't have been trying to read a superblock at this point
>> in the code...
>>
>> Hm, maybe I should have had you enable all xfs tracepoints to get
>> more info about where we thought we were on disk when we were doing this.
>> If you used trace-cmd you can do "trace-cmd record -e xfs*" IIRC.
>> You can do similar echo 1 > /<blah>/xfs*/enable I think for the sysfs
>> route.
>>
>> Can you identify which directory it was that tripped the above error?
>
> # modprobe xfs-O1-g
> # mount -o loop,ro /xfsdebug/problemimage.xfs /loop
> # find /loop -type d -print0 > list.txt
> # umount /loop
> # rmmod xfs
> # modprobe xfs-O2-g
> # mount -o loop,ro /xfsdebug/problemimage.xfs /loop
> # cat list.txt | xargs -0 -P1 -n1 -I{} sh -c '(dir="{}" ; ls "${dir}"
>> /dev/null ; sleep 0.1 ; dmesg | tail -n1 | grep Corruption && echo
> "${dir} is causing problems")'
> ls: reading directory /loop/ruby/1.9.1: Structure needs cleaning
> [35689.975822] XFS (loop0): Corruption detected. Unmount and run xfs_repair
> /loop/ruby/1.9.1 is causing problems
> ...
>
> OK, I now have a name. Rebooting to get a clean slate.
Ok, and an inode number:
134 test/ruby/1.9.1
xfs_db> inode 134
xfs_db> p
core.format = 2 (extents)
...
core.aformat = 2 (extents)
...
u.bmx[0-1] = [startoff,startblock,blockcount,extentflag] 0:[0,53675,1,0] 1:[8388608,60304,1,0]
so those are the blocks it should live in.
Or, if you prefer:
# xfs_bmap -vv test/ruby/1.9.1
test/ruby/1.9.1:
EXT: FILE-OFFSET BLOCK-RANGE AG AG-OFFSET TOTAL
0: [0..7]: 406096..406103 3 (36184..36191) 8
Here's the relevant part of the trace, from the readdir of that inode:
ls-520 xfs_readdir: ino 0x86
ls-520 xfs_perag_get: agno 3 refcount 2 caller _xfs_buf_find
ls-520 xfs_perag_put: agno 3 refcount 1 caller _xfs_buf_find
ls-520 xfs_buf_init: bno 0x5a4f8 nblks 0x8 hold 1 pincount 0 lock 0 flags READ caller xfs_buf_get_map
by here we're already looking for the block which isn't related to the dir.
ls-520 xfs_perag_get: agno 3 refcount 2 caller _xfs_buf_find
ls-520 xfs_buf_get: bno 0x5a4f8 len 0x1000 hold 1 pincount 0 lock 0 flags READ caller xfs_buf_read_map
ls-520 xfs_buf_read: bno 0x5a4f8 len 0x1000 hold 1 pincount 0 lock 0 flags READ caller xfs_trans_read_buf_map
ls-520 xfs_buf_iorequest: bno 0x5a4f8 nblks 0x8 hold 1 pincount 0 lock 0 flags READ|PAGES caller _xfs_buf_read
ls-520 xfs_buf_hold: bno 0x5a4f8 nblks 0x8 hold 1 pincount 0 lock 0 flags READ|PAGES caller xfs_buf_iorequest
ls-520 xfs_buf_rele: bno 0x5a4f8 nblks 0x8 hold 2 pincount 0 lock 0 flags READ|PAGES caller xfs_buf_iorequest
ls-520 xfs_buf_iowait: bno 0x5a4f8 nblks 0x8 hold 1 pincount 0 lock 0 flags READ|PAGES caller _xfs_buf_read
loop0-514 xfs_buf_ioerror: bno 0x5a4f8 len 0x1000 hold 1 pincount 0 lock 0 error 0 flags READ|PAGES caller xfs_buf_bio_end_io
loop0-514 xfs_buf_iodone: bno 0x5a4f8 nblks 0x8 hold 1 pincount 0 lock 0 flags READ|PAGES caller _xfs_buf_ioend
ls-520 xfs_buf_iowait_done: bno 0x5a4f8 nblks 0x8 hold 1 pincount 0 lock 0 flags DONE|PAGES caller _xfs_buf_read
ls-520 xfs_da_btree_corrupt: bno 0x5a4f8 nblks 0x8 hold 1 pincount 0 lock 0 flags DONE|PAGES caller xfs_dir2_leaf_readbuf
and here's where we notice that fact I think.
ls-520 xfs_buf_unlock: bno 0x5a4f8 nblks 0x8 hold 1 pincount 0 lock 1 flags DONE|PAGES caller xfs_trans_brelse
ls-520 xfs_buf_rele: bno 0x5a4f8 nblks 0x8 hold 1 pincount 0 lock 1 flags DONE|PAGES caller xfs_trans_brelse
Not yet sure what's up here. I'd probably need to get a cross-compiled xfs.ko going on my rpi to do more debugging...
-Eric
More information about the xfs
mailing list