
Re: xfs_repair fails after trying to format log cycle?

To: Brian Foster <bfoster@xxxxxxxxxx>
Subject: Re: xfs_repair fails after trying to format log cycle?
From: Andrew Ryder <tireman@xxxxxxx>
Date: Tue, 12 Apr 2016 02:54:27 -0400
Cc: xfs@xxxxxxxxxxx
Delivered-to: xfs@xxxxxxxxxxx
In-reply-to: <20160328085541.GA27040@xxxxxxxxxxxxxxx>
References: <56F6DE67.60403@xxxxxxx> <20160328085541.GA27040@xxxxxxxxxxxxxxx>
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:38.0) Gecko/20100101 Thunderbird/38.6.0
Attached are the strace outputs for the three xfs_repair runs required to get the fs to mount again.
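
(For reference, the traces were captured roughly along these lines; the exact
flags and the mapping of runs to the attached filenames are illustrative, not
an exact record of the invocations:)

  # one trace file per xfs_repair run, following child processes,
  # with timestamps on each syscall
  strace -f -tt -o strace_2.1.txt xfs_repair /dev/md2
  strace -f -tt -o strace_2.2.txt xfs_repair -L /dev/md2
  strace -f -tt -o strace_2.3.txt xfs_repair /dev/md2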



On 03/28/2016 04:55 AM, Brian Foster wrote:
On Sat, Mar 26, 2016 at 03:09:27PM -0400, Andrew Ryder wrote:
Hello,

I have an mdadm array with an XFS v5 filesystem on it, and it has begun to
give me issues both when trying to mount it and when running xfs_repair.
Could anyone shed some light on what is going on and how to correct the
issue?

When I try to mount the fs, it complains with:

[  388.479847] XFS (md2): Mounting V5 Filesystem
[  388.494686] XFS (md2): metadata I/O error: block 0x15d6d39c0
("xlog_bread_noalign") error 5 numblks 8192
[  388.495013] XFS (md2): failed to find log head
[  388.495018] XFS (md2): log mount/recovery failed: error -5
[  388.495090] XFS (md2): log mount failed


So a read I/O error from the kernel...
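
(One way to cross-check whether that range is readable at the block layer,
assuming the reported block number is in 512-byte units, is a direct read of
the same region -- an illustrative, read-only sketch:)

  # read the 8192 sectors starting at daddr 0x15d6d39c0 straight off the
  # array, bypassing the page cache; an error here points below the filesystem
  dd if=/dev/md2 of=/dev/null bs=512 skip=$((0x15d6d39c0)) count=8192 iflag=direct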


This is where it stops making sense to me. If I try to run "xfs_repair
/dev/md2", it fails with:

Phase 1 - find and verify superblock...
         - reporting progress in intervals of 15 minutes
Phase 2 - using internal log
         - zero log...
xfs_repair: read failed: Input/output error
failed to find log head
zero_log: cannot find log head/tail (xlog_find_tail=-5)

fatal error -- ERROR: The log head and/or tail cannot be discovered.
Attempt to mount the filesystem to replay the log or use the -L option
to destroy the log and attempt a repair.


... similar read error from xfsprogs...


But if I run "xfs_repair -L /dev/md2", which gives:

Phase 1 - find and verify superblock...
         - reporting progress in intervals of 15 minutes
Phase 2 - using internal log
         - zero log...
xfs_repair: read failed: Input/output error
failed to find log head
zero_log: cannot find log head/tail (xlog_find_tail=-5)
xfs_repair: libxfs_device_zero write failed: Input/output error


... and it looks like it fails to write as well when trying to zero the
log...
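
(A non-destructive way to poke at the same region before reaching for -L
might be xfs_logprint, which reads the on-disk log directly from the device;
if the device-level read error is real, it should presumably fail the same
way -- a sketch:)

  # print a summarized, transactional view of the on-disk log
  xfs_logprint -t /dev/md2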

and then re-run "xfs_repair /dev/md2", it traverses the filesystem all the
way to "Phase 7" before erroring with:

Phase 7 - verify and correct link counts...
         - 14:36:55: verify and correct link counts - 33 of 33 allocation groups done
Maximum metadata LSN (64:2230592) is ahead of log (0:0).
Format log to cycle 67.
xfs_repair: libxfs_device_zero write failed: Input/output error


Yet at this point I can mount the filesystem.


... and this is effectively a repeat of the write error as we try to
format the log with a correct LSN based on the metadata LSN tracked by
the repair process. Your kernel is old enough that runtime probably
won't complain either way (note that 3.19 might be considered a fairly
early kernel for using CRC support). Perhaps the first write attempt
zeroed enough of the log before it failed that log recovery wasn't
required, and thus these problematic I/Os were avoided.

What's the history of this fs? Has it been working for some time, or was it
just recently formatted? What led to the need for log recovery? What's the
mdadm --detail info, member device size, total array size, xfs_info of
the filesystem, etc.?
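
(Roughly, something like the following would cover that; the member device
path is illustrative, and xfs_db is only a fallback for when the fs can't be
mounted for xfs_info:)

  mdadm --detail /dev/md2               # array layout, state, member devices
  blockdev --getsize64 /dev/md2         # total array size in bytes
  blockdev --getsize64 /dev/sdX1        # member device size (repeat per member)
  xfs_info /mount/point                 # geometry of the mounted filesystem
  xfs_db -c 'sb 0' -c print /dev/md2    # read-only superblock dump if it won't mount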

Does xfs_repair run clean at this point? If so, does 'xfs_repair -L'
still reproduce the write error? (Note that I'm assuming you have a clean
log such that this command will not cause data loss.) If so, an strace
of the repair process might be interesting...
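
(For the "runs clean" check, a no-modify pass avoids touching the device at
all -- a minimal sketch:)

  # -n: no-modify mode; report problems without writing anything
  xfs_repair -n /dev/md2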

Brian


Checking the drives with smartctl shows no errors, nor does 'dmesg' show any
hardware I/O or controller-related errors...

I've also tried scrubbing the array, and no bad sectors were found.
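
(For what it's worth, the scrub was along the lines of the standard md check
via sysfs -- illustrative, assuming the array is md2:)

  echo check > /sys/block/md2/md/sync_action   # kick off a scrub
  cat /proc/mdstat                             # watch progress
  cat /sys/block/md2/md/mismatch_cnt           # mismatches found by the scrub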

I'm running kernel 3.19.8 with xfsprogs 4.5.

Thanks,
Andrew

_______________________________________________
xfs mailing list
xfs@xxxxxxxxxxx
http://oss.sgi.com/mailman/listinfo/xfs

Attachment: strace_2.1.txt
Description: Text document

Attachment: strace_2.2.txt
Description: Text document

Attachment: strace_2.3.txt
Description: Text document
