XFS File system in trouble

Leslie Rhorer lrhorer at mygrande.net
Tue Aug 4 02:52:33 CDT 2015


	It's failing, again.  The rsync job failed and when I attempt to untar 
the file in the image mount, it fails there, as well.  See below.  I 
formatted a 1.5T drive as xfs and mounted it under /media.  I then 
dumped the failing FS to a file on /media using xfs_metadump and used 
xfs_mdrestore to create an image of the FS.  I then mounted the image, 
copied over the tarball to its location, and ran tar to extract the files:

RAID-Server:/# mount -o nouuid /media/md0.img /TEST

RAID-Server:/# cd "/TEST/Server-Main/Equipment/Drive 
Controllers/HighPoint Adapters/Rocket 2722/Driver"/

RAID-Server:/TEST/Server-Main/Equipment/Drive Controllers/HighPoint 
Adapters/Rocket 2722/Driver# cp "/RAID/Server-Main/Equipment/Drive 
Controllers/HighPoint Adapters/Rocket 2722/Driver/RR_27xx.tar.gz" ./

RAID-Server:/TEST/Server-Main/Equipment/Drive Controllers/HighPoint 
Adapters/Rocket 2722/Driver# tar -xzvf RR_27xx.tar.gz
DC7280/
DC7280/Linux/
DC7280/Linux/Opensource/
DC7280/Linux/Opensource/DC7280-linux-src-v1.0-110621-1313.tar.gz
DC7280/Windows/
DC7280/Windows/Vista-Win2008-Win7/
DC7280/Windows/Vista-Win2008-Win7/x32/
DC7280/Windows/Vista-Win2008-Win7/x32/dc7280.cat
DC7280/Windows/Vista-Win2008-Win7/x32/dc7280.inf
DC7280/Windows/Vista-Win2008-Win7/x32/dc7280.sys
DC7280/Windows/Vista-Win2008-Win7/x64/
DC7280/Windows/Vista-Win2008-Win7/x64/dc7280.cat
DC7280/Windows/Vista-Win2008-Win7/x64/dc7280.inf
DC7280/Windows/Vista-Win2008-Win7/x64/dc7280.sys
DC7280/Windows/Vista-Win2008-Win7/Readme.txt
DC7280/.ddinfo
R272x/
R272x/Linux/
R272x/Linux/Opensource/
R272x/Linux/Opensource/partial/
R272x/Linux/Opensource/partial/include/

...

RR274x/Driver/Linux/RHEL_CentOS/rr274x_3x-rhel_centos-4u8-i386/pcitable
RR274x/Driver/Linux/RHEL_CentOS/rr274x_3x-rhel_centos-4u8-i386/readme.txt
RR274x/Driver/Linux/RHEL_CentOS/rr274x_3x-rhel_centos-4u8-i386/rhdd
RR274x/Driver/Linux/RHEL_CentOS/rr274x_3x-rhel_centos-4u8-i386/rhel-install-step1.sh
RR274x/Driver/Linux/RHEL_CentOS/rr274x_3x-rhel_centos-4u8-i386/rhel-install-step2.sh
RR274x/Driver/Linux/RHEL_CentOS/rr274x_3x-rhel_centos-4u8-x86_64/
tar: RR274x/Driver/Linux/RHEL_CentOS/rr274x_3x-rhel_centos-4u8-x86_64: 
Cannot mkdir: Structure needs cleaning
RR274x/Driver/Linux/RHEL_CentOS/rr274x_3x-rhel_centos-4u8-x86_64/install.sh
tar: RR274x/Driver/Linux/RHEL_CentOS/rr274x_3x-rhel_centos-4u8-x86_64: 
Cannot mkdir: Input/output error
tar: 
RR274x/Driver/Linux/RHEL_CentOS/rr274x_3x-rhel_centos-4u8-x86_64/install.sh: 
Cannot open: No such file or directory
RR274x/Driver/Linux/RHEL_CentOS/rr274x_3x-rhel_centos-4u8-x86_64/installmethod.py
tar: RR274x/Driver/Linux/RHEL_CentOS/rr274x_3x-rhel_centos-4u8-x86_64: 
Cannot mkdir: Input/output error
tar: 
RR274x/Driver/Linux/RHEL_CentOS/rr274x_3x-rhel_centos-4u8-x86_64/installmethod.py: 
Cannot open: No such file or directory
RR274x/Driver/Linux/RHEL_CentOS/rr274x_3x-rhel_centos-4u8-x86_64/modinfo
tar: RR274x/Driver/Linux/RHEL_CentOS/rr274x_3x-rhel_centos-4u8-x86_64: 
Cannot mkdir: Input/output error
tar: 
RR274x/Driver/Linux/RHEL_CentOS/rr274x_3x-rhel_centos-4u8-x86_64/modinfo: Cannot 
open: No such file or directory
RR274x/Driver/Linux/RHEL_CentOS/rr274x_3x-rhel_centos-4u8-x86_64/modules.alias
tar: RR274x/Driver/Linux/RHEL_CentOS/rr274x_3x-rhel_centos-4u8-x86_64: 
Cannot mkdir: Input/output error
tar: 
RR274x/Driver/Linux/RHEL_CentOS/rr274x_3x-rhel_centos-4u8-x86_64/modules.alias: 
Cannot open: No such file or directory
RR274x/Driver/Linux/RHEL_CentOS/rr274x_3x-rhel_centos-4u8-x86_64/modules.cgz

gzip: tar: 
RR274x/Driver/Linux/RHEL_CentOS/rr274x_3x-rhel_centos-4u8-x86_64: Cannot 
mkdir: Input/output errorstdin: Input/output error

tar: Unexpected EOF in archive
tar: RR274x/Driver/Linux/RHEL_CentOS: Cannot utime: Input/output error
tar: RR274x/Driver/Linux/RHEL_CentOS: Cannot change ownership to uid 0, 
gid 1000: Input/output error
tar: RR274x/Driver/Linux/RHEL_CentOS: Cannot change mode to rwxr-xr-x: 
Input/output error
tar: RR274x/Driver/Linux: Cannot utime: Input/output error
tar: RR274x/Driver/Linux: Cannot change ownership to uid 0, gid 1000: 
Input/output error
tar: RR274x/Driver/Linux: Cannot change mode to rwxr-xr-x: Input/output 
error
tar: RR274x/Driver: Cannot utime: Input/output error
tar: RR274x/Driver: Cannot change ownership to uid 0, gid 1000: 
Input/output error
tar: RR274x/Driver: Cannot change mode to rwxr-xr-x: Input/output error
tar: RR274x: Cannot utime: Input/output error
tar: RR274x: Cannot change ownership to uid 0, gid 1000: Input/output error
tar: RR274x: Cannot change mode to rwxr-xr-x: Input/output error
tar: Error is not recoverable: exiting now


dmesg:
[131329.013475] XFS (md0): Mounting V4 Filesystem
[131329.918438] XFS (md0): Ending clean mount
[131499.357099] XFS (md0): Mounting V4 Filesystem
[131499.709248] XFS (md0): Ending clean mount
[131874.545344] loop: module loaded
[131874.549914] XFS (loop0): Mounting V4 Filesystem
[131874.555540] XFS (loop0): Ending clean mount
[132020.964431] XFS (loop0): xfs_iread: validation failed for inode 
124656869424 failed
[132020.964435] ffff88028b078000: 49 4e 00 00 03 02 00 00 00 30 00 70 00 
00 03 e8  IN.......0.p....
[132020.964437] ffff88028b078010: 00 00 00 00 06 20 b0 6f 01 2e 00 00 00 
00 00 16  ..... .o........
[132020.964438] ffff88028b078020: 01 57 37 fd 2b 5d 22 9e 1e 0a 61 8c 00 
00 00 20  .W7.+]"...a....
[132020.964440] ffff88028b078030: ff ff 00 d2 1b f6 27 90 00 00 00 00 00 
00 00 00  ......'.........
[132020.964454] XFS (loop0): Internal error xfs_iread at line 392 of 
file /build/linux-QZaPpC/linux-3.16.7-ckt11/fs/xfs/xfs_inode_buf.c. 
Caller xfs_iget+0x24b/0x690 [xfs]
[132020.964457] CPU: 2 PID: 21474 Comm: tar Not tainted 3.16.0-4-amd64 
#1 Debian 3.16.7-ckt11-1
[132020.964459] Hardware name: To be filled by O.E.M. To be filled by 
O.E.M./SABERTOOTH 990FX R2.0, BIOS 1503 01/11/2013
[132020.964460]  0000000000000001 ffffffff8150b405 ffff880424059800 
ffffffffa09115cb
[132020.964463]  0000018800000010 ffffffffa0916f6b ffff88030f5c6c00 
ffff880424059800
[132020.964465]  0000000000000075 ffff8800ad1afe98 ffffffffa095cb3a 
ffffffffa0916f6b
[132020.964467] Call Trace:
[132020.964471]  [<ffffffff8150b405>] ? dump_stack+0x41/0x51
[132020.964478]  [<ffffffffa09115cb>] ? xfs_corruption_error+0x5b/0x80 [xfs]
[132020.964483]  [<ffffffffa0916f6b>] ? xfs_iget+0x24b/0x690 [xfs]
[132020.964492]  [<ffffffffa095cb3a>] ? xfs_iread+0xea/0x400 [xfs]
[132020.964497]  [<ffffffffa0916f6b>] ? xfs_iget+0x24b/0x690 [xfs]
[132020.964503]  [<ffffffffa0916f6b>] ? xfs_iget+0x24b/0x690 [xfs]
[132020.964511]  [<ffffffffa0956de6>] ? xfs_ialloc+0xa6/0x500 [xfs]
[132020.964517]  [<ffffffffa092658e>] ? kmem_zone_alloc+0x6e/0xe0 [xfs]
[132020.964525]  [<ffffffffa09572a2>] ? xfs_dir_ialloc+0x62/0x2a0 [xfs]
[132020.964531]  [<ffffffffa09251e5>] ? xfs_trans_reserve+0x1f5/0x200 [xfs]
[132020.964538]  [<ffffffffa09579a9>] ? xfs_create+0x489/0x700 [xfs]
[132020.964541]  [<ffffffff811b40ea>] ? kern_path_create+0xaa/0x190
[132020.964548]  [<ffffffffa091c5ea>] ? xfs_generic_create+0xca/0x250 [xfs]
[132020.964550]  [<ffffffff811b7ad0>] ? vfs_mkdir+0xb0/0x160
[132020.964551]  [<ffffffff811b868b>] ? SyS_mkdirat+0xab/0xe0
[132020.964554]  [<ffffffff815115cd>] ? 
system_call_fast_compare_end+0x10/0x15
[132020.964555] XFS (loop0): Corruption detected. Unmount and run xfs_repair
[132020.964564] XFS (loop0): Internal error xfs_trans_cancel at line 959 
of file /build/linux-QZaPpC/linux-3.16.7-ckt11/fs/xfs/xfs_trans.c. 
Caller xfs_create+0x2b2/0x700 [xfs]
[132020.964566] CPU: 2 PID: 21474 Comm: tar Not tainted 3.16.0-4-amd64 
#1 Debian 3.16.7-ckt11-1
[132020.964567] Hardware name: To be filled by O.E.M. To be filled by 
O.E.M./SABERTOOTH 990FX R2.0, BIOS 1503 01/11/2013
[132020.964568]  000000000000000c ffffffff8150b405 ffff8800ad1afe98 
ffffffffa0925e07
[132020.964570]  ffff880002530800 ffff880079e03ec8 ffff880424059800 
ffffffffa09577d2
[132020.964571]  0000000000000001 ffff880079e03e20 ffff880079e03e1c 
ffff880079e03eb0
[132020.964573] Call Trace:
[132020.964575]  [<ffffffff8150b405>] ? dump_stack+0x41/0x51
[132020.964581]  [<ffffffffa0925e07>] ? xfs_trans_cancel+0xc7/0xf0 [xfs]
[132020.964588]  [<ffffffffa09577d2>] ? xfs_create+0x2b2/0x700 [xfs]
[132020.964590]  [<ffffffff811b40ea>] ? kern_path_create+0xaa/0x190
[132020.964596]  [<ffffffffa091c5ea>] ? xfs_generic_create+0xca/0x250 [xfs]
[132020.964598]  [<ffffffff811b7ad0>] ? vfs_mkdir+0xb0/0x160
[132020.964600]  [<ffffffff811b868b>] ? SyS_mkdirat+0xab/0xe0
[132020.964602]  [<ffffffff815115cd>] ? 
system_call_fast_compare_end+0x10/0x15
[132020.964604] XFS (loop0): xfs_do_force_shutdown(0x8) called from line 
960 of file /build/linux-QZaPpC/linux-3.16.7-ckt11/fs/xfs/xfs_trans.c. 
Return address = 0xffffffffa0925e20
[132021.196487] XFS (loop0): Corruption of in-memory data detected. 
Shutting down filesystem
[132021.196491] XFS (loop0): Please umount the filesystem and rectify 
the problem(s)
[132024.791456] XFS (loop0): xfs_log_force: error 5 returned.
[132054.854625] XFS (loop0): xfs_log_force: error 5 returned.
[132084.917775] XFS (loop0): xfs_log_force: error 5 returned.
[132114.980927] XFS (loop0): xfs_log_force: error 5 returned.
[132145.044086] XFS (loop0): xfs_log_force: error 5 returned.
[132175.107307] XFS (loop0): xfs_log_force: error 5 returned.
[132205.170404] XFS (loop0): xfs_log_force: error 5 returned.
[132235.233587] XFS (loop0): xfs_log_force: error 5 returned.
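
For reading the inode dump in the dmesg output above, here is a minimal Python sketch.  It assumes the first 16 bytes of the dump are the start of the XFS on-disk inode core (magic, mode, version, data fork format, onlink, uid, gid, all big-endian).  The dictionary keys are labels chosen for this sketch, not output from any XFS tool.  The bytes are copied from the dump line at ffff88028b078000.

```python
import struct

# First 16 bytes of the inode buffer, copied from the dmesg dump line
# at ffff88028b078000.
RAW = bytes.fromhex("49 4e 00 00 03 02 00 00 00 30 00 70 00 00 03 e8")


def decode_inode_head(raw):
    """Decode the leading fields of an XFS on-disk inode (big-endian).

    Assumed layout: 2-byte magic, 2-byte mode, 1-byte version,
    1-byte data fork format, 2-byte onlink, 4-byte uid, 4-byte gid.
    """
    magic, mode, version, fmt, onlink, uid, gid = struct.unpack(
        ">2sHBBHII", raw[:16]
    )
    return {
        "magic": magic,
        "mode": mode,
        "version": version,
        "format": fmt,
        "onlink": onlink,
        "uid": uid,
        "gid": gid,
    }


if __name__ == "__main__":
    for name, value in decode_inode_head(RAW).items():
        print(f"{name:8} {value!r}")
```

Under those assumptions the dump decodes to magic "IN", mode 0, version 3, and gid 1000.  A version 3 inode core on a filesystem that dmesg reports as V4 looks inconsistent and may be what the validation in xfs_iread is objecting to, but that is a guess, not a diagnosis.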


On 8/2/2015 3:24 PM, Leslie Rhorer wrote:
>
>      OK, this is goofy.  It seems to be working, now.  As usual, I've
> been doing some work on the server this weekend, but I can't think of
> anything I have done that would fix the issue.  I did replace the
> remaining good 4G RAM module with a pair of 8G RAM modules, but memtest
> reported the remaining 4G module as good, and I verified the removed
> module really was bad.  I also replaced the removable drive carrier and
> cables that were feeding the two SSDs, one of which was reporting
> failures as noted in the syslog.  It's hard for me to believe either of
> those things could have been causing the issue, though.
>
>      I attached a 1.5T external drive to the server and formatted it as
> XFS in preparation to continue troubleshooting.  To make sure of things,
> I tried decompressing the tarball, again, and this time it worked all
> the way to the end.  I then deleted the entire directory structure
> created by the tarball and decompressed the file again twice.  I'll see
> if the rsync process works.  That will take a couple of days.
>
> On 7/28/2015 5:11 PM, Brian Foster wrote:
>> On Tue, Jul 28, 2015 at 10:13:01AM -0500, Leslie Rhorer wrote:
>>> On 7/28/2015 7:33 AM, Brian Foster wrote:
>>>> On Tue, Jul 28, 2015 at 02:46:45AM -0500, Leslie Rhorer wrote:
>>>>> On 7/20/2015 6:17 AM, Brian Foster wrote:
>>>>>> On Sat, Jul 18, 2015 at 08:02:50PM -0500, Leslie Rhorer wrote:
>>>>>>>
>> ...
>>>>
>>>>>     I then copied both the tarball and the image over to the root, and
>>>>> while the system would not let me create the image on the root, it did
>>>>> let me copy the image to the root.  I then umounted the RAID array,
>>>>> mounted the image, and attempted to cd to the original directory in the
>>>>> image mount where the tarball was saved.  That failed with an I/O error:
>>>>>
>>>>
>>>> It sounds a bit strange for the mdrestore to fail on root but a cp of
>>>> the resulting image to work. Do the resulting images have the same file
>>>> size or is the rootfs copy truncated? If the latter, you could be
>>>> missing part of the fs and thus any of the following tests are probably
>>>> moot.
>>>
>>>     Well, it can't be as large as it is reported, let's put it that way,
>>> although the reported file size is the same.  Ls claims it to be 16T in
>>> size, which cannot be the case on a 100G partition.  I forgot to mention
>>> cp does complain:
>>>
>>> RAID-Server:/# cp /RAID/TEST/RAIDfile.img ./
>>> cp: cannot lseek ‘./RAIDfile.img’: Invalid argument
>>>
>>>     But it does the same thing on the backup server, and it works there.
>>> I tried a cmp, and it seems to be hung.  It just may be taking a long
>>> time, however.
>>>
>>
>> Yeah, you can't really trust the resulting image. It doesn't take much
>> space to create a very large sparse file, but different filesystems have
>> different maximum file size limits. The problem here is that some
>> metadata near the beginning of the file might reference or depend on
>> something near the end, and I/Os beyond the end of the file will
>> probably result in errors.
>>
>> I'd probably try the nouuid approach since the hardware is similar as
>> well as some of the other interesting suggestions that have been made to
>> try and get the image on the rootfs and see what happens there too.
>>
>> Brian
>>
>>>> Brian
>>>>
>>>>> RAID-Server:/# cd "/media/Server-Main/Equipment/Drive
>>>>> Controllers/HighPoint
>>>>> Adapters/Rocket 2722/Driver/"
>>>>> bash: cd: /media/Server-Main/Equipment/Drive Controllers/HighPoint
>>>>> Adapters/Rocket 2722/Driver/: Input/output error
>>>>>
>>>>>     I changed directories to a point two directories above the previous
>>>>> attempt and did a long listing:
>>>>>
>>>>> RAID-Server:/# cd "/media/Server-Main/Equipment/Drive
>>>>> Controllers/HighPoint
>>>>> Adapters"
>>>>> RAID-Server:/media/Server-Main/Equipment/Drive Controllers/HighPoint
>>>>> Adapters# ll
>>>>> ls: cannot access RocketRAID 2722: Input/output error
>>>>> total 4
>>>>> drwxr-xr-x 6 root lrhorer 4096 Jul 18 19:26 Rocket 2722
>>>>> ?????????? ? ?    ?          ?            ? RocketRAID 2722
>>>>>
>>>>>     As you can see, Rocket 2722 is still there, but RocketRAID 2722 is
>>>>> very sick.  Rocket 2722 is the parent of where the tarball was, however,
>>>>> so I did a cd and an ll again:
>>>>>
>>>>> RAID-Server:/media/Server-Main/Equipment/Drive Controllers/HighPoint
>>>>> Adapters# cd "Rocket 2722"/
>>>>> RAID-Server:/media/Server-Main/Equipment/Drive Controllers/HighPoint
>>>>> Adapters/Rocket 2722# ll
>>>>> ls: cannot access BIOS: Input/output error
>>>>> ls: cannot access Driver: Input/output error
>>>>> ls: cannot access HighPoint RAID Management Software: Input/output
>>>>> error
>>>>> ls: cannot access Manual: Input/output error
>>>>> total 248
>>>>> -rwxr--r-- 1 root lrhorer 245760 Nov 20  2008 autorun.exe
>>>>> -rwxr--r-- 1 root lrhorer     51 Mar 21  2001 autorun.inf
>>>>> ?????????? ? ?    ?            ?            ? BIOS
>>>>> ?????????? ? ?    ?            ?            ? Driver
>>>>> ?????????? ? ?    ?            ?            ? HighPoint RAID
>>>>> Management
>>>>> Software
>>>>> ?????????? ? ?    ?            ?            ? Manual
>>>>> -rwxr--r-- 1 root lrhorer   1134 Feb  5  2012 readme.txt
>>>>>
>>>>>     So now, what?
>>>>>
>>>>> _______________________________________________
>>>>> xfs mailing list
>>>>> xfs at oss.sgi.com
>>>>> http://oss.sgi.com/mailman/listinfo/xfs
>>>>
>>>
>>
>
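
On the 16T image that ls reports on a 100G partition in the quoted thread: a sparse file's apparent size and the space actually allocated to it can differ widely.  A minimal Python sketch, assuming a POSIX system where os.stat reports st_blocks in 512-byte units and a filesystem that supports sparse files.  The file name and the 1 GiB size are arbitrary choices for illustration:

```python
import os
import tempfile


def sparse_sizes(path, apparent_size):
    """Create a sparse file and return (apparent bytes, allocated bytes)."""
    with open(path, "wb") as f:
        # Extend the file to the requested length without writing any data.
        f.truncate(apparent_size)
    st = os.stat(path)
    # st_blocks counts 512-byte units actually allocated on disk.
    return st.st_size, st.st_blocks * 512


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        apparent, allocated = sparse_sizes(os.path.join(d, "sparse.img"), 1 << 30)
        print(f"apparent:  {apparent} bytes")
        print(f"allocated: {allocated} bytes")
```

On a real image the same comparison is ls -l (apparent size) against du (allocated space).  It does not show whether a copy was truncated by the destination filesystem's maximum file size, which is the concern raised in the quoted reply.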


