
Re: Seg fault during xfs repair (segmentation fault / segv)

To: Jesse Stroik <jstroik@xxxxxxxxxxxxx>
Subject: Re: Seg fault during xfs repair (segmentation fault / segv)
From: Eric Sandeen <sandeen@xxxxxxxxxxx>
Date: Wed, 01 Jul 2009 14:53:48 -0500
Cc: xfs@xxxxxxxxxxx
In-reply-to: <4A4A7D44.7040009@xxxxxxxxxxxxx>
References: <4A4A596D.8030800@xxxxxxxxxxxxx> <4A4A5C4E.7030605@xxxxxxxxxxx> <4A4A7D44.7040009@xxxxxxxxxxxxx>
User-agent: Thunderbird (Macintosh/20090605)
Jesse Stroik wrote:
> Eric,
> Eric Sandeen wrote:
>> Jesse Stroik wrote:
>>> I have a server with a ~20TB xfs file system on Linux 
>>> (2.6.18-92.1.22.el5) and am running xfsprogs-2.9.4-4.el5.  We had a few 
>>> corrupted files which I believe were due to a SCSI issue after a recent 
>>> power outage.  Due to the corruption, I ran xfs_check and would like to 
>>> run xfs_repair on the system.
>> It'd really be great to test more recent xfsprogs first, that one is
>> about 2 years old.
>> You can probably grab any recent fedora src.rpm and rebuild it, and
>> later go back to the centos version if you wish.
> I fetched the current version from SVN using these directions: 
> http://xfs.org/index.php/Getting_the_latest_source_code
> I get identical results.
> --------
> ...
> reset bad sb for ag 31
> reset bad agf for ag 31
> reset bad agi for ag 31
> Segmentation fault

Ok, from a metadump image Jesse provided (thanks!) it's dying in here:

                bno = be32_to_cpu(agfl->agfl_bno[i]);
                printf("agfl at %p i is %d agfl_bno[i] %u bno is %u\n",
                       agfl, i, agfl->agfl_bno[i], bno);
                if (verify_agbno(mp, be32_to_cpu(agf->agf_seqno), bno))
                        set_agbno_state(mp, be32_to_cpu(agf->agf_seqno),
                                        bno, XR_E_FREE);

agfl_bno looks corrupt, and bno is coming out to be huge.

set_agbno_state() does:

*(ba_bmap[(agno)] + (ag_blockno)/XR_BB_NUM) = ....

where ag_blockno is that bno above; this wanders us off into bad memory
and boom.  I'll see what we can do to fix it up.
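
To make the failure mode concrete, here is a minimal standalone sketch of
the same kind of packed block-state map with a range check in front of the
store. It is not the xfsprogs code and not necessarily the fix that will go
in: struct ag_map, set_state_checked(), get_state(), BLOCKS_PER_WORD and
STATE_FREE are hypothetical names made up for illustration, and the layout
of 4 state bits per block in 64-bit words is an assumption.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins, not the xfsprogs definitions. */
#define BITS_PER_STATE  4u
#define BLOCKS_PER_WORD 16u  /* assumed: 64-bit word / 4 bits per block */
#define STATE_FREE      1u   /* illustrative value only */

/* One per-AG map: a packed array of block states plus its size in blocks. */
struct ag_map {
    uint64_t *words;    /* at least (nblocks + 15) / 16 words */
    uint32_t  nblocks;  /* number of blocks the map covers */
};

/*
 * Store a state for block bno.  Returns 0 on success, or -1 if bno lies
 * outside the map.  Without the range check, a garbage bno read from a
 * corrupt free list would index far past the end of words[].
 */
static int set_state_checked(struct ag_map *m, uint32_t bno, uint64_t state)
{
    assert(m != NULL && m->words != NULL);

    if (bno >= m->nblocks)
        return -1;  /* refuse the out-of-range block number */

    uint32_t shift = (bno % BLOCKS_PER_WORD) * BITS_PER_STATE;
    uint64_t *w = &m->words[bno / BLOCKS_PER_WORD];

    /* Clear the old 4-bit state, then OR in the new one. */
    *w = (*w & ~(0xfULL << shift)) | ((state & 0xfULL) << shift);
    return 0;
}

/* Read back the state for block bno; the caller guarantees bno is in range. */
static uint64_t get_state(const struct ag_map *m, uint32_t bno)
{
    uint32_t shift = (bno % BLOCKS_PER_WORD) * BITS_PER_STATE;

    return (m->words[bno / BLOCKS_PER_WORD] >> shift) & 0xfULL;
}
```

With a map covering 32 blocks, a call for block 17 lands in words[1], while
a huge bno like the one above is rejected instead of being written through.
In the snippet I quoted, verify_agbno() sits in front of set_agbno_state()
and is meant to play that role.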

