
Re: PROBLEM: XFS on ARM corruption 'Structure needs cleaning'

To: Török Edwin <edwin@xxxxxxxxxxxx>
Subject: Re: PROBLEM: XFS on ARM corruption 'Structure needs cleaning'
From: Brian Foster <bfoster@xxxxxxxxxx>
Date: Thu, 11 Jun 2015 11:16:20 -0400
Cc: Dave Chinner <david@xxxxxxxxxxxxx>, Christopher Squires <christopher.squires@xxxxxxxx>, Wayne Burri <wayne.burri@xxxxxxxx>, Luca Gibelli <luca@xxxxxxxxxxxx>, xfs@xxxxxxxxxxx
Delivered-to: xfs@xxxxxxxxxxx
In-reply-to: <5579296A.8010208@xxxxxxxxxxxx>
References: <5579296A.8010208@xxxxxxxxxxxx>
User-agent: Mutt/1.5.23 (2014-03-12)
On Thu, Jun 11, 2015 at 09:23:38AM +0300, Török Edwin wrote:
> [1.] XFS on ARM corruption 'Structure needs cleaning'
> [2.] Full description of the problem/report:
> 
> I have been running XFS successfully on x86-64 for years, however I'm
> having trouble running it on ARM.
> 
> Running the testcase below [7.] reliably reproduces the filesystem
> corruption starting from a freshly created XFS filesystem: running ls
> after 'sxadm node --new --batch /export/dfs/a/b' shows a 'Structure
> needs cleaning' error, and dmesg shows a corruption error [6.].
> xfs_repair 3.1.9 is not able to repair the corruption: after repairing
> and remounting the filesystem I still get the 'Structure needs
> cleaning' error.
> 
> Note: using /export/dfs/a/b is important for reproducing the problem: if I 
> only use one level of directories in /export/dfs then the problem
> doesn't reproduce. Also if I use a tuned version of sxadm that creates fewer 
> database files then the problem doesn't reproduce either.
> 
> [3.] Keywords: filesystems, XFS corruption, ARM
> [4.] Kernel information
> [4.1.] Kernel version (from /proc/version):
> Linux hornet34 3.14.3-00088-g7651c68 #24 Thu Apr 9 16:13:46 MDT 2015 armv7l 
> GNU/Linux
> 
...
> [5.] Most recent kernel version which did not have the bug: Unknown, this is 
> the first kernel I have tried on ARM
> 
> [6.] dmesg stacktrace
> 
> [4627578.440000] XFS (sda4): Mounting Filesystem
> [4627578.510000] XFS (sda4): Ending clean mount
> [4627621.470000] dd6ee000: 58 46 53 42 00 00 10 00 00 00 00 00 37 40 21 00  
> XFSB........7@!.
> [4627621.480000] dd6ee010: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  
> ................
> [4627621.490000] dd6ee020: 5b 08 7f 79 0e 3a 46 3d 9b ea 26 ad 9d 62 17 8d  
> [..y.:F=..&..b..
> [4627621.490000] dd6ee030: 00 00 00 00 20 00 00 04 00 00 00 00 00 00 00 80  
> .... ...........

Just a data point... the magic number here looks like a superblock magic
(XFSB) rather than one of the directory magic numbers. I'm wondering if
a buffer disk address has gone bad somehow or another.
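(To double-check that reading of the dump: the first four bytes decode to the
superblock magic and match none of the directory block magics. A minimal
sketch — the comparison values are the v2 directory magics from the on-disk
format, which is what I'd expect here; a v5/CRC filesystem would use
XDB3/XDD3/XDF3 instead:)

```python
# First four bytes of the dumped buffer above: 58 46 53 42 -> ASCII "XFSB",
# i.e. the superblock magic, not any directory block magic.
buf = bytes([0x58, 0x46, 0x53, 0x42])

SB_MAGIC = b"XFSB"                        # superblock
DIR_MAGICS = (b"XD2B", b"XD2D", b"XD2F")  # v2 dir block/data/free magics

print(buf.decode("ascii"), buf == SB_MAGIC, buf in DIR_MAGICS)
# XFSB True False
```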

Does this happen to be a large block device? I don't see any partition
or xfs_info data below. If so, it would be interesting to see if this
reproduces on a smaller device. It does appear that the large block
device option is enabled in the kernel config above, however, so maybe
that's unrelated.
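(FWIW, the superblock dump above may already answer the size question:
decoding blocksize and dblocks out of the first 16 bytes — a rough sketch,
assuming the usual big-endian on-disk superblock layout — gives roughly
3.45 TiB, which lines up with the 4TB HGST drive listed in [8.6.], so this
does look like a large block device:)

```python
import struct

# First 16 bytes of the dump in [6.]: magicnum (4s), blocksize (u32),
# dblocks (u64), all big-endian as XFS stores metadata on disk.
dump = bytes.fromhex("58465342000010000000000037402100")
magic, blocksize, dblocks = struct.unpack(">4sIQ", dump)

print(magic.decode())                         # XFSB
print(blocksize)                              # 4096
print(round(dblocks * blocksize / 2**40, 2))  # ~3.45 (TiB)
```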

Brian

> [4627621.500000] XFS (sda4): Internal error xfs_dir3_data_read_verify at line 
> 274 of file fs/xfs/xfs_dir2_data.c.  Caller 0xc01c1528
> [4627621.510000] CPU: 0 PID: 37 Comm: kworker/0:1H Not tainted 
> 3.14.3-00088-g7651c68 #24
> [4627621.510000] Workqueue: xfslogd xfs_buf_iodone_work
> [4627621.510000] [<c0013948>] (unwind_backtrace) from [<c0011058>] 
> (show_stack+0x10/0x14)
> [4627621.510000] [<c0011058>] (show_stack) from [<c01c3dc4>] 
> (xfs_corruption_error+0x54/0x70)
> [4627621.510000] [<c01c3dc4>] (xfs_corruption_error) from [<c01f7854>] 
> (xfs_dir3_data_read_verify+0x60/0xd0)
> [4627621.510000] [<c01f7854>] (xfs_dir3_data_read_verify) from [<c01c1528>] 
> (xfs_buf_iodone_work+0x7c/0x94)
> [4627621.510000] [<c01c1528>] (xfs_buf_iodone_work) from [<c00309f0>] 
> (process_one_work+0xf4/0x32c)
> [4627621.510000] [<c00309f0>] (process_one_work) from [<c0030fb4>] 
> (worker_thread+0x10c/0x388)
> [4627621.510000] [<c0030fb4>] (worker_thread) from [<c0035e10>] 
> (kthread+0xbc/0xd8)
> [4627621.510000] [<c0035e10>] (kthread) from [<c000e8f8>] 
> (ret_from_fork+0x14/0x3c)
> [4627621.510000] XFS (sda4): Corruption detected. Unmount and run xfs_repair
> [4627621.520000] XFS (sda4): metadata I/O error: block 0x6e804200 
> ("xfs_trans_read_buf_map") error 117 numblks 8
> 
> [7.] Testcase:
> 
> $ curl -O 
> http://vol-public.s3.indian.skylable.com:8008/armel/testcase/libsx2_1.1-1_armel.deb
> $ curl -O 
> http://vol-public.s3.indian.skylable.com:8008/armel/testcase/sx_1.1-1_armel.deb
> $ sudo dpkg -i libsx2_*.deb sx_*.deb
> $ sudo umount /export/dfs
> $ sudo mkfs.xfs -f /dev/sda4
> $ sudo mount /dev/sda4 /export/dfs
> $ sudo mkdir /export/dfs/a
> $ sudo sxadm node --new --batch /export/dfs/a/b
> $ sudo ls /export/dfs/a/b
> ls: reading directory /export/dfs/a/b: Structure needs cleaning
> $ dmesg
> $ sudo umount /export/dfs
> $ sudo xfs_repair /dev/sda4
> $ sudo mount /dev/sda4 /export/dfs
> $ sudo ls /export/dfs/a/b
> ls: reading directory /export/dfs/a/b: Structure needs cleaning
> 
> 'sxadm node --new' uses SQLite3 to create a set of new databases and 
> reproduces the problem reliably.
> However I was not able to reproduce this by just using the command-line 
> sqlite tools.
> 
> The source code of sxadm can be found here if you want to build manually 
> instead of using a package:
> http://gitweb.skylable.com/gitweb/?p=sx.git;a=summary
> 
> [8.] Environment
> [8.1.] Software (add the output of the ver_linux script here)
> Linux hornet34 3.14.3-00088-g7651c68 #24 Thu Apr 9 16:13:46 MDT 2015 armv7l 
> GNU/Linux
>  
> Gnu C                  
> binutils               
> util-linux             2.20.1
> mount                  support
> module-init-tools      16
> e2fsprogs              1.42.9
> xfsprogs               3.1.9
> Linux C Library        2.17
> Dynamic linker (ldd)   2.17
> Procps                 3.3.9
> Net-tools              1.60
> Kbd                    
> Sh-utils               8.21
> Modules Loaded 
> 
> [8.2.] Processor information (from /proc/cpuinfo):
> processor       : 0
> model name      : ARMv7 Processor rev 0 (v7l)
> Features        : swp half thumb fastmult edsp tls 
> CPU implementer : 0x41
> CPU architecture: 7
> CPU variant     : 0x4
> CPU part        : 0xc09
> CPU revision    : 0
> 
> Hardware        : hornet
> Revision        : 0000
> Serial          : 0000000000000000
> 
> [8.4.] Loaded driver and hardware information (/proc/ioports, /proc/iomem)
> 14872000-14872fff : sn_dev
> 14a00000-14a3ffff : sn_dev
> 14050000-14050fff : uart
>   14050000-14050fff : uart-pl011
> 30000000-9fffffff : System RAM
>   30008000-304f616f : Kernel code
>   30518000-305ae327 : Kernel data
> 
> [8.6.] /proc/scsi/scsi:
> Attached devices:
> Host: scsi0 Channel: 00 Id: 00 Lun: 00
>   Vendor: HGST     Model: HUS724040ALS640  Rev: aH1F
>   Type:   Direct-Access                    ANSI  SCSI revision: 06
> 
> [8.7.] /proc/fs/xfs/stat:
> extent_alloc 199 5043 79 570
> abt 0 0 0 0
> blk_map 16325 7059 666 192 363 24055 0
> bmbt 0 0 0 0
> dir 461 594 476 18
> trans 9 4656 252
> ig 0 215 0 383 0 383 954
> log 30 546 0 74041 11
> push_ail 4971 0 1821 59 0 270 0 250 0 3
> xstrat 184 0
> rw 8005 8827
> attr 412 0 0 0
> icluster 22 17 250
> vnodes 4294966698 0 0 0 598 598 598 0
> buf 6693 101 6661 5 0 32 0 45 15
> abtb2 287 399 11 10 0 0 0 0 0 0 0 0 0 0 24
> abtc2 557 766 279 278 0 0 0 0 0 0 0 0 0 0 560
> bmbt2 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
> ibt2 1896 3700 7 5 0 0 0 0 0 0 0 0 0 0 2
> qm 0 0 0 0 0 0 0 0
> xpc 20508672 20922056 263794352
> debug 0
> 
> Best regards,
> --Edwin
> 
> 
> _______________________________________________
> xfs mailing list
> xfs@xxxxxxxxxxx
> http://oss.sgi.com/mailman/listinfo/xfs
