
Re: PROBLEM: XFS on ARM corruption 'Structure needs cleaning'

To: Brian Foster <bfoster@xxxxxxxxxx>
Subject: Re: PROBLEM: XFS on ARM corruption 'Structure needs cleaning'
From: Török Edwin <edwin@xxxxxxxxxxxx>
Date: Thu, 11 Jun 2015 18:28:04 +0300
Cc: Dave Chinner <david@xxxxxxxxxxxxx>, Christopher Squires <christopher.squires@xxxxxxxx>, Wayne Burri <wayne.burri@xxxxxxxx>, Luca Gibelli <luca@xxxxxxxxxxxx>, xfs@xxxxxxxxxxx
Delivered-to: xfs@xxxxxxxxxxx
In-reply-to: <20150611151620.GB59168@xxxxxxxxxxxxxxx>
Organization: Skylable Ltd.
References: <5579296A.8010208@xxxxxxxxxxxx> <20150611151620.GB59168@xxxxxxxxxxxxxxx>
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:31.0) Gecko/20100101 Icedove/31.7.0
On 06/11/2015 06:16 PM, Brian Foster wrote:
> On Thu, Jun 11, 2015 at 09:23:38AM +0300, Török Edwin wrote:
>> [1.] XFS on ARM corruption 'Structure needs cleaning'
>> [2.] Full description of the problem/report:
>>
>> I have been running XFS successfully on x86-64 for years, however I'm having
>> trouble running it on ARM.
>>
>> Running the testcase below [7.] reliably reproduces the filesystem 
>> corruption starting from a freshly
>> created XFS filesystem: running ls after 'sxadm node --new --batch 
>> /export/dfs/a/b' shows a 'Structure needs cleaning' error,
>> and dmesg shows a corruption error [6.].
>> xfs_repair 3.1.9 is not able to repair the corruption: after mounting the
>> repaired filesystem
>> I still get the 'Structure needs cleaning' error.
>>
>> Note: using /export/dfs/a/b is important for reproducing the problem: if I 
>> only use one level of directories in /export/dfs then the problem
>> doesn't reproduce. Also if I use a tuned version of sxadm that creates fewer 
>> database files then the problem doesn't reproduce either.
>>
>> [3.] Keywords: filesystems, XFS corruption, ARM
>> [4.] Kernel information
>> [4.1.] Kernel version (from /proc/version):
>> Linux hornet34 3.14.3-00088-g7651c68 #24 Thu Apr 9 16:13:46 MDT 2015 armv7l GNU/Linux
>>
> ...
>> [5.] Most recent kernel version which did not have the bug: Unknown, this is
>> the first kernel I have tried on ARM
>>
>> [6.] dmesg stacktrace
>>
>> [4627578.440000] XFS (sda4): Mounting Filesystem
>> [4627578.510000] XFS (sda4): Ending clean mount
>> [4627621.470000] dd6ee000: 58 46 53 42 00 00 10 00 00 00 00 00 37 40 21 00  XFSB........7@!.
>> [4627621.480000] dd6ee010: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
>> [4627621.490000] dd6ee020: 5b 08 7f 79 0e 3a 46 3d 9b ea 26 ad 9d 62 17 8d  [..y.:F=..&..b..
>> [4627621.490000] dd6ee030: 00 00 00 00 20 00 00 04 00 00 00 00 00 00 00 80  .... ...........
> 
> Just a data point... the magic number here looks like a superblock magic
> (XFSB) rather than one of the directory magic numbers. I'm wondering if
> a buffer disk address has gone bad somehow or another.
> 
> Does this happen to be a large block device? I don't see any partition
> or xfs_info data below. If so, it would be interesting to see if this
> reproduces on a smaller device. It does appear that the large block
> device option is enabled in the kernel config above, however, so maybe
> that's unrelated.
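
To make the observation about the magic number concrete, here is a minimal sketch that decodes the first 16 bytes of the dumped buffer. It assumes the standard XFS superblock layout (big-endian 4-byte magic, 32-bit block size, 64-bit data block count); the variable names are only illustrative.

```python
import struct

# First 16 bytes of the buffer at dd6ee000, copied from the dmesg dump.
dump = bytes.fromhex("58 46 53 42 00 00 10 00 00 00 00 00 37 40 21 00")

# Assumed layout: XFS on-disk metadata is big-endian, and the superblock
# starts with a 4-byte magic, a 32-bit block size and a 64-bit block count.
magic, blocksize, dblocks = struct.unpack(">4sIQ", dump)

print(magic, blocksize, dblocks)  # b'XFSB' 4096 926949632
```

The decoded block size and block count agree with bsize=4096 and blocks=926949632 in the mkfs.xfs output for /dev/sda4 in this message, so the buffer appears to hold a superblock of this same filesystem rather than directory data.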

This is mkfs.xfs /dev/sda4:
meta-data=/dev/sda4              isize=256    agcount=4, agsize=231737408 blks
         =                       sectsz=512   attr=2, projid32bit=0
data     =                       bsize=4096   blocks=926949632, imaxpct=5
         =                       sunit=0      swidth=0 blks
naming   =version 2              bsize=4096   ascii-ci=0
log      =internal log           bsize=4096   blocks=452612, version=2
         =                       sectsz=512   sunit=0 blks, lazy-count=1
realtime =none                   extsz=4096   blocks=0, rtextents=0

But it also reproduces with this small loopback file:
meta-data=/tmp/xfs.test          isize=256    agcount=2, agsize=5120 blks
         =                       sectsz=512   attr=2, projid32bit=0
data     =                       bsize=4096   blocks=10240, imaxpct=25
         =                       sunit=0      swidth=0 blks
naming   =version 2              bsize=4096   ascii-ci=0
log      =internal log           bsize=4096   blocks=1200, version=2
         =                       sectsz=512   sunit=0 blks, lazy-count=1
realtime =none                   extsz=4096   blocks=0, rtextents=0

You can have a look at xfs.test here: 
http://vol-public.s3.indian.skylable.com:8008/armel/testcase/xfs.test.gz

If I loopback mount that on an x86-64 box it doesn't show the corruption 
message though ...
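
Before loopback-mounting the image, a quick sanity check is to confirm that the decompressed file starts with the XFS superblock magic. The sketch below is only an illustration: the helper name is hypothetical, and the path passed to it is assumed to be the already decompressed xfs.test.

```python
XFS_SB_MAGIC = b"XFSB"  # superblock magic, as seen in the dmesg dump


def has_xfs_superblock_magic(path):
    """Return True if the file at `path` starts with the XFS superblock magic."""
    with open(path, "rb") as image:
        return image.read(4) == XFS_SB_MAGIC
```

For example, has_xfs_superblock_magic("xfs.test") should be True for an intact image. This only checks the first four bytes and says nothing about the directory corruption itself.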

Best regards,
--Edwin
