
Re: Segmentation fault during xfs_repair

To: Richard Kolkovich <richard@xxxxxxxxxxxxx>
Subject: Re: Segmentation fault during xfs_repair
From: Eric Sandeen <sandeen@xxxxxxxxxxx>
Date: Fri, 05 Jun 2009 23:44:05 -0500
Cc: xfs@xxxxxxxxxxx
In-reply-to: <20090605222236.GA39825@xxxxxxxxxxxxxxxxxxxxxx>
References: <20090605222236.GA39825@xxxxxxxxxxxxxxxxxxxxxx>
User-agent: Thunderbird (Macintosh/20090302)
Richard Kolkovich wrote:
> We have a corrupted XFS partition on a storage server.  Attempting to run 
> xfs_repair the first time yielded the message about a corrupt log file, so I 
> have run xfs_repair with -L to clear that.  Now, xfs_repair segfaults in 
> Phase 3.  I have tried -P and a huge -m to no avail.  It always seems to 
> segfault at the same point:
> bad directory block magic # 0 in block 11 for directory inode 341521797
> corrupt block 11 in directory inode 341521797
>         will junk block
> Segmentation fault (core dumped)


> I can provide the full core file, if need be (956M).  The xfs_metadump can be 
> found at:
> http://files.intrameta.com/metadump.gz (735M)
> Any suggestions/ideas on how to proceed are welcome.  Please Reply-All, as 
> I'm not subscribed to the ML.

Ok, on a -g (not -O2) build:

Program terminated with signal 11, Segmentation fault.
#0  0x0000000000418d05 in traverse_int_dir2block (mp=0x7ffff4c4f150,
da_cursor=0x7ffff4c4eb30, rbno=0x7ffff4c4ebdc) at dir2.c:356
356                     da_cursor->level[i].hashval =
(gdb) p i
$1 = 46501

i is set from

i = da_cursor->active = be16_to_cpu(node->hdr.level);

(gdb) p node->hdr.level // note this is big endian
$2 = 42421

that's a crazily deep btree, well beyond anything sane:

#define XFS_DA_NODE_MAXDEPTH    5       /* max depth of Btree */

So repair really should be checking for this before it goes off and
indexes it:

356                     da_cursor->level[i].hashval =

because the cursor only has this much in the array:

        dir2_level_state_t      level[XFS_DA_NODE_MAXDEPTH];

I'll have to ponder what repair should do in this case ... and I'll see
if there's something we can do in xfs_db to just whack out this problem
and let repair continue for now.

