
Re: Read corruption on ARM

To: Eric Sandeen <sandeen@xxxxxxxxxxx>
Subject: Re: Read corruption on ARM
From: Jason Detring <detringj@xxxxxxxxx>
Date: Thu, 28 Feb 2013 15:38:51 -0600
Cc: xfs-oss <xfs@xxxxxxxxxxx>
In-reply-to: <512EEAB8.4070306@xxxxxxxxxxx>
References: <CA+AKrqBQ=VG0oVsai+agywDKRgO9cG9AvT6mCTSZxKO3Si5Aiw@xxxxxxxxxxxxxx> <512D3856.5050305@xxxxxxxxxxx> <CA+AKrqC+6nXuCxdY08MBLsjv1fOPJ6=1ruTHsfGqxosQmCi_jQ@xxxxxxxxxxxxxx> <512D49E2.40003@xxxxxxxxxxx> <CA+AKrqCrphO-eKy0n=70O9hmB3mXttOsKmTdfRnPxgJM3_PAkQ@xxxxxxxxxxxxxx> <512E3BB2.6060407@xxxxxxxxxxx> <CA+AKrqDq5xCNQo1X=MeRBq54ka0FGJEV5Rn6OzwY7eBfJ+8Wkw@xxxxxxxxxxxxxx> <512E7639.20205@xxxxxxxxxxx> <512E89C2.9000302@xxxxxxxxxxx> <CA+AKrqDaY4cgP+EPLepzUOU2jAOygTuj-0xDtOaGf+O0aRZV_g@xxxxxxxxxxxxxx> <512E903A.2020405@xxxxxxxxxxx> <CA+AKrqAv7-5gGj_cNBNj=-nChKPzi+_HZmH=z2UABG9pDOmpBg@xxxxxxxxxxxxxx> <512EDF37.4050802@xxxxxxxxxxx> <512EE20A.7010103@xxxxxxxxxxx> <512EEAB8.4070306@xxxxxxxxxxx>
On 2/27/13, Eric Sandeen <sandeen@xxxxxxxxxxx> wrote:
> On 2/27/13 10:50 PM, Eric Sandeen wrote:
>> On 2/27/13 10:38 PM, Eric Sandeen wrote:
>>
>> ...
>>
>>> re-cc'ing xfs list
>>>
>>> So I used pahole to look at all structs, objdump -d to disassemble,
>>> and md5sum'd the results to see what's different.
>>>
>>> pi@raspberrypi ~ $ md5sum cross/*.dis cross/*.pahole native/*.dis
>>> native/*.pahole
>>>
>>> <manual sort>
>>>
>>> c0abd80c3bf049db5e1909fd851261cc  cross/xfs-O1-g.ko.pahole
>>> c0abd80c3bf049db5e1909fd851261cc  cross/xfs-O2-g.ko.pahole
>>> c0abd80c3bf049db5e1909fd851261cc  cross/xfs-Os-g.ko.pahole
>>> c0abd80c3bf049db5e1909fd851261cc  native/xfs-O1-g.ko.pahole
>>> c0abd80c3bf049db5e1909fd851261cc  native/xfs-O2-g.ko.pahole
>>> c0abd80c3bf049db5e1909fd851261cc  native/xfs-Os-g.ko.pahole
>>>
>>> so all structures look identical, good - but:
>>>
>>> while disassembly of these two modules match:
>>>
>>> d76f6ebf4d8a1b9f786facefbcf16f69  cross/xfs-O1-g.ko.dis
>>> d76f6ebf4d8a1b9f786facefbcf16f69  native/xfs-O1-g.ko.dis
>>>
>>> do you see the problem w/ the cross-compiled xfs-O1-g.ko as well?

No, I didn't.  The problem has only shown itself on the -O2 builds,
both native and cross-compiled.  Lower optimization levels don't show
any of the symptoms.
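
(Side note: I'm assuming the .pahole and .dis files above came from plain
runs of

  pahole xfs-O1-g.ko > xfs-O1-g.ko.pahole
  objdump -d xfs-O1-g.ko > xfs-O1-g.ko.dis

on each module; I'll generate mine the same way so the md5sums stay
comparable.)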

Perhaps a better comparison would be -O2 builds among working and
non-working compilers?   You'd asked for these before, but I just
finished them today.  The modules, build logs, and fs/xfs/ build trees
are up at
  <http://www.splack.org/~jason/projects/xfs-arm-corruption/3.6.11-g89caf39/>
A quick rundown:
  -cross-gcc4.4:  OK
  -cross-gcc4.5:  OK
  -cross-gcc4.6:  BAD
  -cross-gcc4.7:  BAD
  -cross-gcc4.8:  OK
Some of these don't seem to want to rmmod after they've been inserted.
Argh, reboots.
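
(For anyone wanting to reproduce the matrix: the exact invocations are in
the build logs at the URL above.  Roughly, though, each entry is just the
3.6.11 tree built with a different toolchain, optimization tweaks aside,
something like

  cd linux-3.6.11
  make ARCH=arm CROSS_COMPILE=arm-linux-gnueabihf- oldconfig
  make ARCH=arm CROSS_COMPILE=arm-linux-gnueabihf- fs/xfs/xfs.ko

with the CROSS_COMPILE prefix, a placeholder here, swapped per compiler
and dropped entirely for the native builds.)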


>>> the others differ:
>>>
>>> 349f3490a49f2ce539c2b058914f64f0  native/xfs-Os-g.ko.dis
>>> 91c8e8230774808b538c21a83106a5d7  cross/xfs-Os-g.ko.dis
>>>
>>> 649338e1b8eeed6a294504fc76a39cb0  native/xfs-O2-g.ko.dis
>>> e52c2a48277326c313bba76aa0b33ab7  cross/xfs-O2-g.ko.dis
>>>
>>> The diff of the disassembly of the others is huge, hard to
>>> know where to start just yet.  Need an objdump mode that only
>>> shows function-relative addresses or something to cut down
>>> on the noise.
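
(One thought on cutting that noise, untested sketch with GNU sed: strip
the leading addresses and instruction encodings, and drop the absolute
part of branch targets so only objdump's <symbol+offset> annotation,
which is already function-relative, is left:

  objdump -d xfs-O2-g.ko \
    | sed -e 's/^ *[0-9a-f]*:\t[0-9a-f ]*\t//' \
          -e 's/[0-9a-f]\+ <\([^>]*\)>/<\1>/g' \
    > xfs-O2-g.ko.dis-stripped

Doing the same to both modules before diffing should quiet things down,
though functions whose code really did change will still light up.)
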
>>
>> Could you try the same, to isolate the differences: objdump -d
>> all of the *.o files for, say, the -O2 build, md5sum & compare,
>> and see which ones differ?

Er, uh...  oops! :-)   I'd scrubbed the objects between each test, so
each module had to be regenerated, which means the intermediate objects
won't match the various xfs-O2-g.ko's you've already downloaded.  Look in
the -cross-gcc4.7 and -native-gcc4.7 subdirectories for new copies.


# pwd
/xfsdebug/tracetest/3.6.11-g89caf39/xfs-modules-native-gcc4.7/xfs-O2-g-obj
# for obj in *.o; do
    if [ "$(objdump -d $obj | md5sum)" != \
         "$(cd ../../xfs-modules-cross-gcc4.7/xfs-O2-g-obj/ && objdump -d $obj | md5sum)" ]; then
      echo "obj $obj is different"
    fi
  done
obj xfs.o is different
obj xfs_attr_leaf.o is different
obj xfs_bmap.o is different
obj xfs_dir2_block.o is different
obj xfs_itable.o is different
obj xfs_log.o is different
obj xfs_log_recover.o is different
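
Next step, I suppose, is to diff one of those pairs directly and see where
the code generation diverges.  Untested sketch, run from the native obj
directory, picking xfs_bmap.o as one example from the list above:

  objdump -dr xfs_bmap.o > /tmp/native-xfs_bmap.dis
  objdump -dr ../../xfs-modules-cross-gcc4.7/xfs-O2-g-obj/xfs_bmap.o \
      > /tmp/cross-xfs_bmap.dis
  diff -u /tmp/native-xfs_bmap.dis /tmp/cross-xfs_bmap.dis | less

If one object turns out to be the culprit, rebuilding just that file at a
lower optimization level (e.g. CFLAGS_xfs_bmap.o += -O1 in fs/xfs/Makefile)
might be a way to narrow it down further.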



> And one more test.  Every time you hit the error, it causes
> a log replay on the next mount since the fs has shut down.
>
> Can you try
>
> # mount; umount; mount; test
>
> so that you start the test from a clean mount, and see if you still hit it?
>
> Maybe save that image off before you do that test just in case it changes
> the state.

I'm not sure on that.  Even in read-write mode, the notice in my
kernel log has always been "Corruption detected.  Unmount and run
xfs_repair".  It's never been a forced filesystem shutdown, just a
stern warning and half-accessible files.  The next mount always seems
to be clean.

[89574.079876] XFS (loop0): Corruption detected. Unmount and run xfs_repair
[89587.269316] XFS (loop0): Mounting Filesystem
[89587.444629] XFS (loop0): Ending clean mount

I usually mount read-only, and the image's md5sum doesn't seem to change
between runs.  I made a copy, then mounted it read-write a time or two;
the md5sum changed between those mounts.  However, I am still seeing the
error when attempting to read the directory.  The
mounted-rw-checked image is up at
  <http://www.splack.org/~jason/projects/xfs-arm-corruption/journalreplaytest/>
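
For reference, the check behind that paragraph looks roughly like this
(image name and mount point are placeholders, not my exact paths):

  cp test.img test.img.pristine              # keep an untouched copy
  mount -o loop test.img /mnt/xfstest
  umount /mnt/xfstest                        # clean mount/umount cycle first
  mount -o loop test.img /mnt/xfstest
  ls -lR /mnt/xfstest > /dev/null            # stand-in for reading the problem directory
  umount /mnt/xfstest
  md5sum test.img test.img.pristine          # did the rw mounts change the image?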


Jason
