
Re: Xfs_repair segfaults.

To: Eric Sandeen <sandeen@xxxxxxxxxxx>
Subject: Re: Xfs_repair segfaults.
From: Filippo Stenico <filippo.stenico@xxxxxxxxx>
Date: Wed, 8 May 2013 19:42:26 +0200
Cc: xfs@xxxxxxxxxxx
Delivered-to: xfs@xxxxxxxxxxx
In-reply-to: <CADNx=KvxtCHEk9G07yS88-casVAXBxRTghV5nGKd_0LtcqpYpQ@xxxxxxxxxxxxxx>
References: <CADNx=KsT9DC=vveyTZx8EovddFx9mhRS-yzygaORHZ_4VyXfzQ@xxxxxxxxxxxxxx> <5187BF8A.2040303@xxxxxxxxxxx> <CADNx=KvZPhbRn9Kc3+KdoAY7jZ3U0uyG6wgUB5vcxX85CkFdQg@xxxxxxxxxxxxxx> <CADNx=Kv0bt3fNGW8Y24GziW9MOO-+b7fBGub4AYP70b5gAegxw@xxxxxxxxxxxxxx> <5188FF88.6000508@xxxxxxxxxxx> <CADNx=KvmA7jgqBUO0YvKddHvFaqxHNZKfF3eWajOW8GWKwNhbA@xxxxxxxxxxxxxx> <CADNx=Kv=7TFkJKom6_JM8B+A6YRckj175TbCcHrTWYL_N-qkYw@xxxxxxxxxxxxxx> <CADNx=KvxtCHEk9G07yS88-casVAXBxRTghV5nGKd_0LtcqpYpQ@xxxxxxxxxxxxxx>
As I was writing, it happened.

I got a segfault at the same place as reported the first time.

So, uhm, what do you need to see?

I was putting together:
- machine info (cat /proc/meminfo, cat /proc/cpuinfo, uname -r)
- kernel log
- core.dump with debugging symbols
- output of xfs_repair
- output of strace attached to the xfs_repair run
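Roughly along these lines (a sketch; the output directory is an arbitrary choice of mine, and the repair command itself, which needs root and the device, is only recorded here, not run):

```shell
# Sketch of the collection step; /tmp/xfs-report is just a placeholder path.
outdir=/tmp/xfs-report
mkdir -p "$outdir"
cat /proc/meminfo > "$outdir/meminfo.txt"
cat /proc/cpuinfo > "$outdir/cpuinfo.txt"
uname -r          > "$outdir/kernel-version.txt"
# The actual repair run under strace (-f follows children, -o logs to a file),
# recorded for reference rather than executed here:
echo 'strace -f -o strace.log xfs_repair -vv /dev/mapper/vg0-lv0' \
  > "$outdir/repair-command.txt"
```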

I will put all the logs in place and send them tomorrow, as ... I forgot to raise the core size limit for my shell ... and today I am toasted, so I'd better go home and sleep.
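For next time, the bit I forgot: raising the core size limit in the shell before re-running the repair, so a segfault leaves a usable core file (a sketch; the core_pattern line is optional, needs root, and the path is just an example, so it is left commented out):

```shell
# Allow unlimited-size core dumps in the current shell.
ulimit -c unlimited
ulimit -c    # should now print "unlimited"
# Optionally give cores a predictable name/location (root only, example path):
# echo '/tmp/core.%e.%p' > /proc/sys/kernel/core_pattern
```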

Tell me if you need anything else.


On Wed, May 8, 2013 at 7:30 PM, Filippo Stenico <filippo.stenico@xxxxxxxxx> wrote:
The -m option does not seem to handle the excessive memory consumption I ran into.
I actually ran xfs_repair -vv -m1750 and, looking into kern.log, it seems that xfs_repair invoked the OOM killer but was killed itself ( !! )

This is the last try to reproduce the segfault:
xfs_repair -vv -P -m1750

On Tue, May 7, 2013 at 8:20 PM, Filippo Stenico <filippo.stenico@xxxxxxxxx> wrote:
xfs_repair -L -vv -P /dev/mapper/vg0-lv0 causes the same kernel panic as in my first report. No use doubling up the info on this.
I'll try xfs_repair -L -vv -P -m 2000 to keep memory consumption within a limit.
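As a belt-and-braces measure I may also cap the process from the shell side, alongside xfs_repair's own -m soft limit. This is just a sketch of the idea, with the 2000 MB figure mirroring the -m value above:

```shell
# Hard-cap virtual memory for everything started from this shell (value in kB),
# so a runaway xfs_repair gets allocation failures instead of driving the box
# into the OOM killer.
ulimit -v $((2000 * 1024))
ulimit -v    # prints 2048000
```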

On Tue, May 7, 2013 at 3:36 PM, Filippo Stenico <filippo.stenico@xxxxxxxxx> wrote:

On Tue, May 7, 2013 at 3:20 PM, Eric Sandeen <sandeen@xxxxxxxxxxx> wrote:
On 5/7/13 4:27 AM, Filippo Stenico wrote:
> Hello,
> this is a start-over, to try hard to recover some more data from my toasted raid5 - lvm - xfs volume.
> My goal is both to get some more data out of the volume and to see if I can reproduce the segfault.
> I compiled xfsprogs 3.1.9 from the deb source. I ran xfs_metarestore to put the original metadata back onto the cloned raid volume whose log I had previously zeroed via xfs_repair -L (I figured none of the actual data was modified before, as I am just working on metadata.. right?).
> Then I mounted, checked a dir that I knew was corrupted, unmounted, and tried an xfs_repair (commands.txt attached for details).
> I went home to sleep, but in the morning I found out that the kernel had panicked due to "out of memory and no killable process".
> I ran repair without -P... Should I now try disabling inode prefetch?
> Attached are also the output of "free" and "top" at the time of the panic, as well as the output of xfs_repair and the strace attached to it. Don't think gdb symbols would help here....


Ho hum, well, no segfault this time, just an out of memory error?
That's right....
No real way to know where it went from the available data I think.

A few things:

> root@ws1000:~# mount /dev/mapper/vg0-lv0 /raid0/data/
> mount: Structure needs cleaning

mount failed?  Now's the time to look at dmesg to see why.
From the attached logs it seems to be:

> XFS internal error xlog_valid_rec_header(1) at line 3466 of file [...2.6.32...]/fs/xfs/xfs_log_recover.c
> XFS: log mount/recovery failed: error 117

> root@ws1000:~# mount

<no raid0 mounted>

> root@ws1000:~# mount /dev/mapper/vg0-lv0 /raid0/data/
> root@ws1000:~# mount | grep raid0
> /dev/mapper/vg0-lv0 on /raid0/data type xfs (rw,relatime,attr2,noquota)

Uh, now it worked, with no other steps in between?  That's a little odd.
Looks odd to me too. But I just copied the commands as they were issued on my console... so yes, nothing in between.
It found a clean log this time:

> XFS mounting filesystem dm-1
> Ending clean XFS mount for filesystem: dm-1

which is unexpected.

So the memory consumption might be a bug but there's not enough info to go on here.

> PS. Let me know if you wish reports like this one on list.

worth reporting, but I'm not sure what we can do with it.
Your storage is in pretty bad shape, and xfs_repair can't make something out
of nothing.

I still got back around 6 TB out of the 7.2 TB of total data stored, so this tells me XFS is reliable even when major faults occur...

Thanks anyway. I am trying a "-L" repair; at this step I expect another failure (due to out of memory or something, as happened last time), then I will try "xfs_repair -L -vv -P" and I expect to see that segfault again.

Will report the next steps; maybe something interesting for you will pop up... for me it is not a waste of time, since this last try is worth making.



