Xfs_repair segfaults.

Filippo Stenico filippo.stenico at gmail.com
Wed May 8 12:42:26 CDT 2013


As I was writing, it happened.

I got a segfault at the same place as reported the first time.

So, uhm, what do you need to see?

I was putting together (roughly collected as sketched below):
- machine info (cat /proc/meminfo, cat /proc/cpuinfo, uname -r)
- kernel log
- core dump with debugging symbols
- output of xfs_repair
- output of strace attached to the xfs_repair run
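
A minimal sketch of how I am gathering these; the diag/ directory is just an
example, and the xfs_repair flags and device are the ones from my runs:

mkdir -p diag

# machine info and kernel log
uname -r          >  diag/machine-info.txt
cat /proc/cpuinfo >> diag/machine-info.txt
cat /proc/meminfo >> diag/machine-info.txt
dmesg             >  diag/kern.log

# xfs_repair output plus a system-call trace of the same run
strace -f -o diag/xfs_repair.strace \
  xfs_repair -vv -P -m1750 /dev/mapper/vg0-lv0 2>&1 | tee diag/xfs_repair.out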

I will put all the logs in place and send them tomorrow, as I forgot to raise
the core size limit for my shell, and today I am toasted; I'd better go home
and sleep.
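
(For the record, what I forgot boils down to something like this, assuming
bash; where the core file actually lands depends on /proc/sys/kernel/core_pattern:

ulimit -c unlimited                  # allow core dumps of any size in this shell
cat /proc/sys/kernel/core_pattern    # check where the kernel writes core files
# then re-run xfs_repair from this same shell so the limit applies
)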

Tell me if you need anything else.

Regards.

On Wed, May 8, 2013 at 7:30 PM, Filippo Stenico
<filippo.stenico at gmail.com> wrote:

> Hello,
> The -m option does not seem to handle the excessive memory consumption I ran into.
> I actually ran xfs_repair -vv -m1750 and, looking into kern.log, it seems
> that xfs_repair triggered the OOM killer and was itself the process that got killed ( !! )
>
> This is my latest attempt to reproduce the segfault:
> xfs_repair -vv -P -m1750
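
(Flag meanings, for anyone following along, as I understand them from
xfs_repair(8); the device is the one from my earlier runs:

# -vv : extra verbose output
# -P  : disable prefetching of inode and directory blocks
# -m  : limit approximate memory usage, in megabytes
xfs_repair -vv -P -m 1750 /dev/mapper/vg0-lv0
)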
>
> On Tue, May 7, 2013 at 8:20 PM, Filippo Stenico <filippo.stenico at gmail.com> wrote:
>
>> xfs_repair -L -vv -P /dev/mapper/vg0-lv0 causes the same kernel panic as my
>> first report; no point in duplicating that information here.
>> I'll try xfs_repair -L -vv -P -m 2000 to keep memory consumption within a
>> limit.
>>
>>
>>
>> On Tue, May 7, 2013 at 3:36 PM, Filippo Stenico <filippo.stenico at gmail.com> wrote:
>>
>>>
>>>
>>> On Tue, May 7, 2013 at 3:20 PM, Eric Sandeen <sandeen at sandeen.net> wrote:
>>>
>>>> On 5/7/13 4:27 AM, Filippo Stenico wrote:
>>>> > Hello,
>>>> > this is a fresh start at trying hard to recover some more data out of my
>>>> toasted raid5 - lvm - xfs volume.
>>>> > My goal is both to do my best to get some more data out of the
>>>> volume and to see if I can reproduce the segfault.
>>>> > I compiled xfsprogs 3.1.9 from the Debian source package. I ran
>>>> xfs_mdrestore to put the original metadata back onto the cloned raid volume
>>>> whose log I had previously zeroed via xfs_repair -L (I figured none of the
>>>> actual data had been modified, since I am only working on metadata... right?).
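
(For context, the metadump/restore workflow referred to here is roughly the
sketch below; the SOURCE and CLONE names and the image path are placeholders,
not the exact commands from my session:

# capture the filesystem metadata from the source volume into an image
xfs_metadump /dev/mapper/SOURCE /tmp/source.metadump

# write that metadata image onto the clone being experimented on
xfs_mdrestore /tmp/source.metadump /dev/mapper/CLONE

# then experiment on the clone, e.g. zero the log and repair
xfs_repair -L -vv /dev/mapper/CLONE
)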
>>>> > Then I mounted it, checked a directory that I knew was corrupted,
>>>> unmounted, and tried an xfs_repair (commands.txt attached for details).
>>>> > I went home to sleep, but in the morning I found that the kernel had
>>>> panicked due to "out of memory and no killable process".
>>>> > I ran repair without -P... Should I now try disabling inode prefetch?
>>>> > Also attached are the output of "free" and "top" at the time of the
>>>> panic, as well as the output of xfs_repair and the strace attached to it.
>>>> I don't think gdb symbols would help here....
>>>> >
>>>>
>>>> >
>>>>
>>>> Ho hum, well, no segfault this time, just an out of memory error?
>>>>
>>> That's right....
>>>
>>>> No real way to know where it went from the available data I think.
>>>>
>>>> A few things:
>>>>
>>>> > root at ws1000:~# mount /dev/mapper/vg0-lv0 /raid0/data/
>>>> > mount: Structure needs cleaning
>>>>
>>>> mount failed?  Now's the time to look at dmesg to see why.
>>>>
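
(For reference, standard ways to do that, nothing specific to this setup:

dmesg | tail -n 50     # most recent kernel messages
dmesg | grep -i xfs    # or only the XFS-related ones
)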
>>>
>>> From the attached logs it seems to be:
>>>>
>>>> > XFS internal error xlog_valid_rec_header(1) at line 3466 of file
>>>> [...2.6.32...]/fs/xfs/xfs_log_recover.c
>>>> > XFS: log mount/recovery failed: error 117
>>>>
>>>> > root at ws1000:~# mount
>>>>
>>>> <no raid0 mounted>
>>>>
>>>> > root at ws1000:~# mount /dev/mapper/vg0-lv0 /raid0/data/
>>>> > root at ws1000:~# mount | grep raid0
>>>> > /dev/mapper/vg0-lv0 on /raid0/data type xfs
>>>> (rw,relatime,attr2,noquota)
>>>>
>>>> Uh, now it worked, with no other steps in between?  That's a little odd.
>>>>
>>> Looks odd to me too. But I just copied the commands as they were issued
>>> on my console... so yes, nothing in between.
>>>
>>>> It found a clean log this time:
>>>>
>>>> > XFS mounting filesystem dm-1
>>>> > Ending clean XFS mount for filesystem: dm-1
>>>>
>>>> which is unexpected.
>>>>
>>>> So the memory consumption might be a bug but there's not enough info to
>>>> go on here.
>>>>
>>>> > PS. Let me know if you want reports like this one on the list.
>>>>
>>>> worth reporting, but I'm not sure what we can do with it.
>>>> Your storage is in pretty bad shape, and xfs_repair can't make
>>>> something out
>>>> of nothing.
>>>>
>>>> -Eric
>>>>
>>>
>>> I still got back around 6 TB out of the 7.2 TB of total data stored, so this
>>> tells me XFS is reliable even when major faults occur...
>>>
>>> Thanks anyway. I am trying a "-L" repair now; at this step I expect
>>> another failure (due to running out of memory or something, as happened
>>> last time), then I will try "xfs_repair -L -vv -P" and I expect to see that
>>> segfault again.
>>>
>>> I will report the next steps; maybe something interesting for you will pop
>>> up... for me it is not a waste of time, since this last try is worth
>>> making.
>>>
>>> --
>>> F
>>
>>
>>
>>
>> --
>> F
>
>
>
>
> --
> F




-- 
F