<div dir="ltr">Hi Eric,<div><br></div><div>Thanks for the prompt response.</div><div>Sorry for the missing parts; I was wrongly assuming that everybody knows our environment :-)</div><div><br></div><div>More information:<br></div><div><div>uname -a: Linux vsa-00000142 3.8.13-030813-generic #201305111843 SMP Sat May 11 22:44:40 UTC 2013 x86_64 x86_64 x86_64 GNU/Linux</div></div><div>xfs_repair version 3.1.7<br></div><div><div><br></div><div>We are using a modified XFS. Mainly, we added some reporting features and changed the discard operation to be aligned with the chunk sizes used in our systems.</div><div>The modified code resides at <a href="https://github.com/zadarastorage/zadara-xfs-pushback" rel="noreferrer" target="_blank">https://github.com/zadarastorage/zadara-xfs-pushback</a>.</div></div><div><br></div><div>We were in a hurry at the time we ran xfs_repair with -L. That was not so smart...</div><div>Anyway, the metadump was taken before running xfs_repair.</div><div>We will restore the original XFS metadata, attempt a mount, then run xfs_repair, and get back to you with the results.</div><div><br></div><div>Regards,</div><div>Danny</div><div><br></div></div><div class="gmail_extra"><br><div class="gmail_quote">On Thu, Sep 3, 2015 at 4:22 PM, Eric Sandeen <span dir="ltr"><<a href="mailto:sandeen@sandeen.net" target="_blank">sandeen@sandeen.net</a>></span> wrote:<br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><span class="">On 9/3/15 6:09 AM, Danny Shavit wrote:<br>
> Hi Dave,<br>
><br>
> We have a couple more XFS corruption reports that we would like to share:<br>
<br>
</span>On the same box as the one that seemed to be experiencing some<br>
bit-flips in your earlier email?<br>
<br>
As a general note: You are not providing enough information for<br>
us to effectively help you.<br>
<br>
<a href="http://xfs.org/index.php/XFS_FAQ#Q:_What_information_should_I_include_when_reporting_a_problem.3F" rel="noreferrer" target="_blank">http://xfs.org/index.php/XFS_FAQ#Q:_What_information_should_I_include_when_reporting_a_problem.3F</a><br>
<br>
Kernel version? xfsprogs version? At a bare minimum...<br>
<br>
Your dmesg snippets are edited. You've provided what you feel is<br>
important, omitting the parts that may actually be important or<br>
informational.<br>
<br>
You haven't described the sequence of events that led to these issues.<br>
<br>
You haven't made clear what these attachments are; which repair log goes<br>
with which kernel event?<br>
<br>
Etc...<br>
<span class=""><br>
> 1. This is an interesting one, since XFS reported corruption but<br>
> xfs_repair found no errors. Attached is the kernel log<br>
> section regarding the corruption (6458). Does xfs_repair explicitly<br>
> read data from the disk? In such a case it might be memory<br>
> corruption. Are you familiar with such cases?<br>
<br>
</span>Yes, xfs_repair opens the block device O_DIRECT.<br>
<br>
Your 6485-kernel.log shows a failure in xfs_allocbt_verify(), right<br>
after the allocation btree is read from disk. i.e. this is an in-kernel<br>
metadata consistency check that is failing.<br>
<br>
It also shows:<br>
<br>
kworker/0:1H Tainted: GF W<br>
<br>
So it's tainted:<br>
<br>
2: 'F' if any module was force loaded by "insmod -f", ' ' if all<br>
modules were loaded normally.<br>
<br>
10: 'W' if a warning has previously been issued by the kernel.<br>
(Though some warnings may set more specific taint flags.)<br>
<br>
You force-loaded a module? And previous warnings were emitted (though we<br>
can't see them in your edited dmesg).<br>
All bets are off. If you had included the full dmesg, we might know<br>
more about what's going on, at least.<br>
<span class=""><br>
> 2. xfs corruption occurred suddenly with no apparent external event.<br>
> Attached are the xfs_repair and kernel logs. The XFS metadump can be found<br>
> in: <a href="https://zadarastorage-public.s3.amazonaws.com/xfs/82.metadump.gz" rel="noreferrer" target="_blank">https://zadarastorage-public.s3.amazonaws.com/xfs/82.metadump.gz</a><br>
<br>
</span>Your 6442-82-xfs_repair.log is from an xfs_repair -L, so of course it<br>
is finding corruption, and the output is more or less meaningless<br>
from a triage POV. Repair said:<br>
<br>
> Note that destroying the log may cause corruption -- please attempt a mount<br>
> of the filesystem before doing this.<br>
<br>
Why did you run it with -L? Did mount fail? If so how?<br>
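Since the metadump was taken before the -L run, the triage can be redone non-destructively against a restored image. A rough sketch of the usual sequence (image path and mountpoint are examples, not taken from this thread; these commands need root and xfsprogs):

```sh
# Restore the metadata dump (metadata only, no file data) to a sparse image
gunzip 82.metadump.gz
xfs_mdrestore 82.metadump 82.img

# First attempt a mount, as repair itself suggested; log recovery may
# resolve the inconsistency without any repair at all
mount -o loop 82.img /mnt/test

# If the mount fails, do a read-only dry run that reports problems
# without modifying anything
xfs_repair -n 82.img

# Zeroing the log with -L is a last resort: it discards unreplayed
# transactions and can itself introduce the corruption repair then finds
xfs_repair -L 82.img
```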
<br>
dm-82-kernel.log also shows a failing verifier, this time xfs_bmbt_verify,<br>
when reading metadata from disk.<br>
<br>
You've truncated other parts, though:<br>
<br>
Aug 22 23:24:48 vsa-00000110-vc-0 kernel: [4194599.685353] ffff88010ec36000: ea bb 12 3a 5f 44 01 a8 b9 2a 80 10 b3 a7 d5 af ...:_D...*<br>
......<br>
<br>
so there's not a ton to go on, just hints that there is more information<br>
that's not provided.<br>
<span class="HOEnZb"><font color="#888888"><br>
<br>
-Eric<br>
</font></span></blockquote></div><br><br clear="all"><div><br></div>-- <br><div class="gmail_signature"><div dir="ltr"><div>Regards,<br></div>Danny<br></div></div>
</div>