XFS File system in trouble

Leslie Rhorer lrhorer at mygrande.net
Thu Aug 13 01:21:16 CDT 2015


	The compressed tarball containing the dump file and the image is on my 
web site.

http://fletchergeek.com/images/metadump.tar.gz

	It's 22G in size.

On 8/9/2015 8:37 PM, Leslie Rhorer wrote:
> Well, nice try, but it doesn't wash for several reasons:
>
> 1. Power supply issues would be highly unlikely to be the cause of such
> a highly specific failure, always at the same point in a process.
> Problems would crop up all over the place, not just with one very
> specific failure.  While I am thinking of it, I also ran memtest86+
> again on the new memory.  It passed all tests with flying colors.
>
> 2. The system has not been under a heavy load when this happens.  In
> fact, it's piddling.  Rsync and tar are single threaded, eating up at
> most 1 CPU core at a time.  I have processes that can regularly bang all
> 8 cores right to the wall with no errors.  The I/O stream is even more
> piddling.  Rsync is transferring nearly 120 MBps (it's a 1G link) during
> the process, and some portions of the tar process can bang out well over
> 2Gbps.  Creating a directory is nothing.
>
> 3.  All the power supply rails are nominal - I checked.
>
> 4. Most damning of all, I am able to reproduce the issue, now, on
> another machine.  I'm not entirely sure why creating the image on one
> partition and then copying it to the root or across the LAN stopped it
> from failing, but I took the 1.5T drive and moved it to the backup
> machine, which as I related earlier is nearly identical in hardware and
> highly similar in software to the primary system.  It's failing there
> repeatedly and consistently:
>
> RR274x/Driver/Freebsd/rr274x_3x-bsd-8.0-v1.0.10.0712.tgz
> RR274x/Driver/Linux/
> RR274x/Driver/Linux/Debian/
> tar: RR274x/Driver/Linux/Debian: Cannot mkdir: Structure needs cleaning
> RR274x/Driver/Linux/Debian/rr274x_3x-debian-5.0.1-i386/
> tar: RR274x/Driver/Linux/Debian: Cannot mkdir: Input/output error
> tar: RR274x/Driver/Linux/Debian/rr274x_3x-debian-5.0.1-i386: Cannot
> mkdir: No such file or directory
> RR274x/Driver/Linux/Debian/rr274x_3x-debian-5.0.1-i386/boot/
> tar: RR274x/Driver/Linux/Debian: Cannot mkdir: Input/output error
> tar: RR274x/Driver/Linux/Debian/rr274x_3x-debian-5.0.1-i386/boot: Cannot
> mkdir: No such file or directory
> RR274x/Driver/Linux/Debian/rr274x_3x-debian-5.0.1-i386/boot/rr274x_3x2.6.26-2-486i386.ko.gz
>
> tar: RR274x/Driver/Linux/Debian: Cannot mkdir: Input/output error
>
> gzip: stdin: Input/output error
> tar: Unexpected EOF in archive
> tar: RR274x/Driver/Linux: Cannot utime: Input/output error
> tar: RR274x/Driver/Linux: Cannot change ownership to uid 0, gid 1000:
> Input/output error
> tar: RR274x/Driver/Linux: Cannot change mode to rwxr-xr-x: Input/output
> error
> tar: RR274x/Driver: Cannot utime: Input/output error
> tar: RR274x/Driver: Cannot change ownership to uid 0, gid 1000:
> Input/output error
> tar: RR274x/Driver: Cannot change mode to rwxr-xr-x: Input/output error
> tar: RR274x: Cannot utime: Input/output error
> tar: RR274x: Cannot change ownership to uid 0, gid 1000: Input/output error
> tar: RR274x: Cannot change mode to rwxr-xr-x: Input/output error
> tar: Error is not recoverable: exiting now
>
>
> dmesg:
> [26743.775522] XFS (sdk): Mounting V4 Filesystem
> [26743.904281] XFS (sdk): Ending clean mount
> [26743.912614] Loading kernel module for a network device with
> CAP_SYS_MODULE (deprecated).  Use CAP_NET_ADMIN and alias netdev- instead.
>
> <repeats>
>
> [26772.528827] loop: module loaded
> [26772.601043] XFS (loop0): Mounting V4 Filesystem
> [26772.764360] XFS (loop0): Ending clean mount
> [26772.770627] Loading kernel module for a network device with
> CAP_SYS_MODULE (deprecated).  Use CAP_NET_ADMIN and alias netdev- instead.
>
> <repeats>
>
> [26899.019942] XFS (loop0): xfs_iread: validation failed for inode
> 124656869424 failed
> [26899.019952] ffff8800b473e000: 49 4e 00 00 03 02 00 00 00 30 00 70 00
> 00 03 e8  IN.......0.p....
> [26899.019957] ffff8800b473e010: 00 00 00 00 06 20 b0 6f 01 2e 00 00 00
> 00 00 16  ..... .o........
> [26899.019960] ffff8800b473e020: 01 57 37 fd 2b 5d 22 9e 1e 0a 61 8c 00
> 00 00 20  .W7.+]"...a....
> [26899.019964] ffff8800b473e030: ff ff 00 d2 1b f6 27 90 00 00 00 00 00
> 00 00 00  ......'.........
> [26899.019993] XFS (loop0): Internal error xfs_iread at line 392 of file
> /build/linux-u5KAtC/linux-3.16.7-ckt11/fs/xfs/xfs_inode_buf.c.  Caller
> xfs_iget+0x24b/0x690 [xfs]
> [26899.020000] CPU: 6 PID: 3756 Comm: tar Not tainted 3.16.0-4-amd64 #1
> Debian 3.16.7-ckt11-1+deb8u2
> [26899.020004] Hardware name: To be filled by O.E.M. To be filled by
> O.E.M./SABERTOOTH 990FX R2.0, BIOS 0803 08/15/2012
> [26899.020007]  0000000000000001 ffffffff8150b3d5 ffff8800065b9800
> ffffffffa06bd5cb
> [26899.020014]  0000018800000010 ffffffffa06c2f6b ffff88000a680400
> ffff8800065b9800
> [26899.020019]  0000000000000075 ffff88000527f140 ffffffffa0708b3a
> ffffffffa06c2f6b
> [26899.020024] Call Trace:
> [26899.020034]  [<ffffffff8150b3d5>] ? dump_stack+0x41/0x51
> [26899.020052]  [<ffffffffa06bd5cb>] ? xfs_corruption_error+0x5b/0x80 [xfs]
> [26899.020069]  [<ffffffffa06c2f6b>] ? xfs_iget+0x24b/0x690 [xfs]
> [26899.020090]  [<ffffffffa0708b3a>] ? xfs_iread+0xea/0x400 [xfs]
> [26899.020106]  [<ffffffffa06c2f6b>] ? xfs_iget+0x24b/0x690 [xfs]
> [26899.020124]  [<ffffffffa06c2f6b>] ? xfs_iget+0x24b/0x690 [xfs]
> [26899.020146]  [<ffffffffa0702de6>] ? xfs_ialloc+0xa6/0x500 [xfs]
> [26899.020192]  [<ffffffffa06d258e>] ? kmem_zone_alloc+0x6e/0xe0 [xfs]
> [26899.020215]  [<ffffffffa07032a2>] ? xfs_dir_ialloc+0x62/0x2a0 [xfs]
> [26899.020237]  [<ffffffffa06d11e5>] ? xfs_trans_reserve+0x1f5/0x200 [xfs]
> [26899.020261]  [<ffffffffa07039a9>] ? xfs_create+0x489/0x700 [xfs]
> [26899.020267]  [<ffffffff811b40ea>] ? kern_path_create+0xaa/0x190
> [26899.020286]  [<ffffffffa06c85ea>] ? xfs_generic_create+0xca/0x250 [xfs]
> [26899.020292]  [<ffffffff811b7ad0>] ? vfs_mkdir+0xb0/0x160
> [26899.020296]  [<ffffffff811b868b>] ? SyS_mkdirat+0xab/0xe0
> [26899.020303]  [<ffffffff8151158d>] ?
> system_call_fast_compare_end+0x10/0x15
> [26899.020307] XFS (loop0): Corruption detected. Unmount and run xfs_repair
> [26899.020337] XFS (loop0): Internal error xfs_trans_cancel at line 959
> of file /build/linux-u5KAtC/linux-3.16.7-ckt11/fs/xfs/xfs_trans.c.
> Caller xfs_create+0x2b2/0x700 [xfs]
> [26899.020342] CPU: 6 PID: 3756 Comm: tar Not tainted 3.16.0-4-amd64 #1
> Debian 3.16.7-ckt11-1+deb8u2
> [26899.020345] Hardware name: To be filled by O.E.M. To be filled by
> O.E.M./SABERTOOTH 990FX R2.0, BIOS 0803 08/15/2012
> [26899.020347]  000000000000000c ffffffff8150b3d5 ffff88000527f140
> ffffffffa06d1e07
> [26899.020354]  ffff88000a729800 ffff8800066e3ec8 ffff8800065b9800
> ffffffffa07037d2
> [26899.020359]  0000000000000001 ffff8800066e3e20 ffff8800066e3e1c
> ffff8800066e3eb0
> [26899.020364] Call Trace:
> [26899.020370]  [<ffffffff8150b3d5>] ? dump_stack+0x41/0x51
> [26899.020388]  [<ffffffffa06d1e07>] ? xfs_trans_cancel+0xc7/0xf0 [xfs]
> [26899.020409]  [<ffffffffa07037d2>] ? xfs_create+0x2b2/0x700 [xfs]
> [26899.020414]  [<ffffffff811b40ea>] ? kern_path_create+0xaa/0x190
> [26899.020432]  [<ffffffffa06c85ea>] ? xfs_generic_create+0xca/0x250 [xfs]
> [26899.020437]  [<ffffffff811b7ad0>] ? vfs_mkdir+0xb0/0x160
> [26899.020442]  [<ffffffff811b868b>] ? SyS_mkdirat+0xab/0xe0
> [26899.020447]  [<ffffffff8151158d>] ?
> system_call_fast_compare_end+0x10/0x15
> [26899.020454] XFS (loop0): xfs_do_force_shutdown(0x8) called from line
> 960 of file /build/linux-u5KAtC/linux-3.16.7-ckt11/fs/xfs/xfs_trans.c.
> Return address = 0xffffffffa06d1e20
> [26899.407181] XFS (loop0): Corruption of in-memory data detected.
> Shutting down filesystem
> [26899.407190] XFS (loop0): Please umount the filesystem and rectify the
> problem(s)
> [26923.319559] XFS (loop0): xfs_log_force: error 5 returned.
>
> <repeats>
>
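For reference, the messages tar prints and the "error 5" in the kernel log are the same errno values seen from two sides. Here is a minimal sketch, assuming a Linux system where EUCLEAN is defined (XFS reports on-disk corruption to user space as EUCLEAN, which libc renders as "Structure needs cleaning"). The describe() helper is only an illustrative name, and the fallback value 117 is the Linux number for EUCLEAN.

```python
import errno
import os

def describe(code):
    """Return (symbolic name, libc message) for an errno value."""
    return errno.errorcode.get(code, "?"), os.strerror(code)

# EIO is what "xfs_log_force: error 5 returned" refers to.
# EUCLEAN may be missing from the errno module on non-Linux platforms,
# so fall back to the Linux value (an assumption).
for code in (errno.EIO, getattr(errno, "EUCLEAN", 117)):
    name, message = describe(code)
    print(code, name, message)
```

The first mkdir failure comes back as EUCLEAN when the bad inode is detected. Everything after the forced shutdown comes back as EIO.
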
> xfs_repair still reports no faults.  I'm compressing the dump file and
> image file right now to be posted on http://fletchergeek.com/images when
> it is done, but it is taking a very long time.  I'll also try
> decompressing the image to the other array to see if it still fails
> before I upload the file.  No point in uploading if putting it through
> the compression process results in an image that does not fail.
>
> On 8/4/2015 5:42 PM, Dave Chinner wrote:
>> On Tue, Aug 04, 2015 at 02:52:33AM -0500, Leslie Rhorer wrote:
>>>     It's failing, again.  The rsync job failed and when I attempt to
>>> untar the file in the image mount, it fails there, as well.  See
>>> below.  I formatted a 1.5T drive as xfs and mounted it under /media.
>>> I then dumped the failing FS to a file on /media using xfs_metadump
>>> and used xfs_mdrestore to create an image of the FS.  I then mounted
>>> the image, copied over the tarball to its location, and ran tar to
>>> extract the files:
>>>
>>> [131874.545344] loop: module loaded
>>> [131874.549914] XFS (loop0): Mounting V4 Filesystem
>>                    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
>>
>>> [131874.555540] XFS (loop0): Ending clean mount
>>> [132020.964431] XFS (loop0): xfs_iread: validation failed for inode
>>> 124656869424 failed
>>> [132020.964435] ffff88028b078000: 49 4e 00 00 03 02 00 00 00 30 00 70
>>> 00 00 03 e8  IN.......0.p....
>>> [132020.964437] ffff88028b078010: 00 00 00 00 06 20 b0 6f 01 2e 00 00
>>> 00 00 00 16  ..... .o........
>>> [132020.964438] ffff88028b078020: 01 57 37 fd 2b 5d 22 9e 1e 0a 61 8c
>>> 00 00 00 20  .W7.+]"...a....
>>> [132020.964440] ffff88028b078030: ff ff 00 d2 1b f6 27 90 00 00 00 00
>>> 00 00 00 00  ......'.........
>>> [132020.964454] XFS (loop0): Internal error xfs_iread at line 392 of
>>> file /build/linux-QZaPpC/linux-3.16.7-ckt11/fs/xfs/xfs_inode_buf.c.
>>> Caller xfs_iget+0x24b/0x690 [xfs]
>>
>> That's a different error to all the ones you've previously posted.
>> This is an inode allocation that has found a bad inode on disk.
>>
>> Decoding the 64 bytes above:
>>
>>     di_magic = 0x494e
>>     di_mode = 0
>>     di_version = 3            <<< That's *wrong*
>>     di_format = 2
>>     di_onlink = 0
>>     di_uid = 0x300070        <<< Looks unlikely
>>     di_gid = 0x3e8
>> ----
>>     di_nlink = 0
>>     di_projlo = 0x620        <<< should be zero
>>     di_projhi = 0xb06f        <<< should be zero
>>     di_pad[6] = 0x1 0x2e 0 0 0 0    <<< should be zero
>>     di_flushiter = 0x16        <<< should be zero for v3 inode
>> ---
>>     di_atime    <random>
>>     di_mtime    <random, should be similar to atime>
>>     di_ctime    <random, should be similar/same as mtime>
>>     di_size = 0x20ffff00d2        <<< should be zero
>> ----
>>     di_nblocks = 0x1bf6279000000000 <<< should be zero
>>     di_extsize = 0
>> ----
>>
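The first 32 bytes of that decode can be re-checked mechanically. What follows is a rough sketch, not kernel code. It assumes the big-endian field order and widths implied by the list above (16-bit magic and mode, 8-bit version and format, 32-bit uid, gid and nlink, and so on). The names FIELDS and decode_inode_head are made up for illustration. It stops at di_flushiter and does not attempt the timestamp and size fields.

```python
import struct

# First two rows (32 bytes) of the hex dump from the dmesg output.
RAW = bytes.fromhex(
    "49 4e 00 00 03 02 00 00 00 30 00 70 00 00 03 e8 "
    "00 00 00 00 06 20 b0 6f 01 2e 00 00 00 00 00 16"
)

# Assumed layout, big-endian, no padding: field name paired with struct code.
FIELDS = [
    ("di_magic", "H"), ("di_mode", "H"),
    ("di_version", "b"), ("di_format", "b"),
    ("di_onlink", "H"), ("di_uid", "I"), ("di_gid", "I"),
    ("di_nlink", "I"), ("di_projlo", "H"), ("di_projhi", "H"),
    ("di_pad", "6s"), ("di_flushiter", "H"),
]

def decode_inode_head(buf):
    """Unpack the leading 32 bytes of an inode buffer into a dict."""
    fmt = ">" + "".join(code for _, code in FIELDS)
    values = struct.unpack(fmt, buf[:struct.calcsize(fmt)])
    return {name: value for (name, _), value in zip(FIELDS, values)}

head = decode_inode_head(RAW)
for name, value in head.items():
    shown = value.hex() if isinstance(value, bytes) else hex(value)
    print(f"{name:13} = {shown}")
```

On a v4 filesystem the version byte should read 2, so a di_version of 3 here is the mismatch flagged in the decode.
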
>> You've just created and mounted a v4 filesystem, which means it is
>> using v2 inodes. This inode read back as a v3 inode, with lots of
>> crap in places where there should be zeros for either v2 or v3 inodes.
>>
>> This does not look like a filesystem problem - it's clear that what
>> has come from disk (or a cached memory buffer) is full of garbage
>> and contains invalid configuration, and the filesystem has quite
>> correctly detected the corruption and shut down. The filesystem
>> would give the same errors if it tried to *write* such a corrupt
>> block, so we know what has just been detected has not come from the
>> filesystem code...
>>
>> FWIW, I've occasionally seen this sort of thing happen when a power
>> supply had gone bad - it wasn't bad enough to make things fail, it
>> just caused transient issues under load that manifest as corruptions
>> and crashes. Given that you've already found one set of hardware
>> problems and the corruption patterns are unlike any
>> filesystem/storage problem I've ever seen, I'd suggest that you
>> still have some kind of hardware issue...
>>
>> Cheers,
>>
>> Dave.
>>
>


