
Re: Xfs_repair segfaults.

To: Eric Sandeen <sandeen@xxxxxxxxxxx>
Subject: Re: Xfs_repair segfaults.
From: Filippo Stenico <filippo.stenico@xxxxxxxxx>
Date: Wed, 8 May 2013 19:30:05 +0200
Cc: xfs@xxxxxxxxxxx
The -m option does not seem to contain the excessive memory consumption I ran into.
I actually ran xfs_repair -vv -m1750, and according to kern.log xfs_repair invoked the OOM killer and was then killed itself (!!)

This is my last try at reproducing the segfault:
xfs_repair -vv -P -m1750
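
For anyone who wants to repeat the kern.log check mentioned above, here is a minimal sketch in Python. It assumes the kernel log contains lines of the form "<name> invoked oom-killer" and "Killed process <pid> (<name>)"; the exact wording differs between kernel versions, so the patterns may need adjusting. The function name oom_events is made up for this illustration.

```python
import re

# Assumed message shapes; the wording varies between kernel versions,
# so treat these patterns as a starting point rather than a reference.
INVOKED = re.compile(r"(\S+) invoked oom-killer")
KILLED = re.compile(r"Killed process (\d+) \(([^)]+)\)")


def oom_events(lines):
    """Return (kind, process_name) tuples for OOM-killer lines, in log order."""
    events = []
    for line in lines:
        match = INVOKED.search(line)
        if match:
            # The process whose allocation triggered the OOM killer.
            events.append(("invoked", match.group(1)))
            continue
        match = KILLED.search(line)
        if match:
            # The process the kernel actually killed.
            events.append(("killed", match.group(2)))
    return events
```

Passing it the lines of the kernel log (for example from open("/var/log/kern.log"), if that is where your system keeps it) shows whether the process that triggered the OOM killer is the same one that got killed.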

On Tue, May 7, 2013 at 8:20 PM, Filippo Stenico <filippo.stenico@xxxxxxxxx> wrote:
xfs_repair -L -vv -P /dev/mapper/vg0-lv0 ends in the same kernel panic as in my first report, so there is no point in repeating that information.
I'll try xfs_repair -L -vv -P -m 2000 to keep memory consumption within a limit.

On Tue, May 7, 2013 at 3:36 PM, Filippo Stenico <filippo.stenico@xxxxxxxxx> wrote:

On Tue, May 7, 2013 at 3:20 PM, Eric Sandeen <sandeen@xxxxxxxxxxx> wrote:
On 5/7/13 4:27 AM, Filippo Stenico wrote:
> Hello,
> this is a start-over to try hard to recover some more data out of my toasted raid5 - lvm - xfs volume.
> My goal is both to do my best to get some more data out of the volume and to see if I can reproduce the segfault.
> I compiled xfsprogs 3.1.9 from deb-source. I ran xfs_mdrestore to put the original metadata on the cloned raid volume whose log I had previously zeroed via xfs_repair -L (I figured none of the actual data was modified before, as I am just working on metadata... right?).
> Then I ran a mount, checked a dir that I knew was corrupted, unmounted, and tried an xfs_repair (commands.txt attached for details).
> I went home to sleep, but in the morning I found that the kernel had panicked due to "out of memory and no killable process".
> I ran repair without -P... Should I now try disabling inode prefetch?
> Attached are also the output of "free" and "top" at the time of the panic, as well as the output of xfs_repair and of the strace attached to it. I don't think gdb symbols would help here....


Ho hum, well, no segfault this time, just an out of memory error?
That's right....
No real way to know where it went from the available data I think.

A few things:

> root@ws1000:~# mount /dev/mapper/vg0-lv0 /raid0/data/
> mount: Structure needs cleaning

mount failed?  Now's the time to look at dmesg to see why.
From the attached logs it seems to be:

> XFS internal error xlog_valid_rec_header(1) at line 3466 of file [...2.6.32...]/fs/xfs/xfs_log_recover.c
> XFS: log mount/recovery failed: error 117

> root@ws1000:~# mount

<no raid0 mounted>

> root@ws1000:~# mount /dev/mapper/vg0-lv0 /raid0/data/
> root@ws1000:~# mount | grep raid0
> /dev/mapper/vg0-lv0 on /raid0/data type xfs (rw,relatime,attr2,noquota)

Uh, now it worked, with no other steps in between?  That's a little odd.
Looks odd to me too. But I just copied the commands as they were issued on my console... so yes, nothing in between.
It found a clean log this time:

> XFS mounting filesystem dm-1
> Ending clean XFS mount for filesystem: dm-1

which is unexpected.

So the memory consumption might be a bug but there's not enough info to go on here.

> PS. Let me know if you wish reports like this one on list.

worth reporting, but I'm not sure what we can do with it.
Your storage is in pretty bad shape, and xfs_repair can't make something out
of nothing.

I still got back around 6 TB out of 7.2 TB of total data stored, which tells me xfs is reliable even when major faults occur...

Thanks anyway. I am now trying a "-L" repair; at this step I expect another failure (due to out of memory or something similar, as happened last time). Then I will try "xfs_repair -L -vv -P", and I expect to see that segfault again.

I will report the next steps; maybe something interesting for you will pop up... For me it is not a waste of time, since this last try is worth making.


