
Re: CentOS 5.5 XFS internal errors (XFS_WANT_CORRUPTED_GOTO)

To: Dave Chinner <david@xxxxxxxxxxxxx>
Subject: Re: CentOS 5.5 XFS internal errors (XFS_WANT_CORRUPTED_GOTO)
From: Shaun Adolphson <shaun@xxxxxxxxxxxxx>
Date: Sun, 11 Jul 2010 21:44:07 +1000
Cc: xfs@xxxxxxxxxxx
In-reply-to: <AANLkTilOGA-XF2V_Z_7WntgA23CHNlfx1u0I8xJtHZDz@xxxxxxxxxxxxxx>
References: <AANLkTimKKEvfJx6EQZeQF_HnlBLj6B8Kjfy6jUHGPnz5@xxxxxxxxxxxxxx> <20100706231856.GC25018@dastard> <AANLkTilOGA-XF2V_Z_7WntgA23CHNlfx1u0I8xJtHZDz@xxxxxxxxxxxxxx>
On Thu, Jul 8, 2010 at 9:21 PM, Shaun Adolphson <shaun@xxxxxxxxxxxxx> wrote:
> On Wed, Jul 7, 2010 at 9:18 AM, Dave Chinner <david@xxxxxxxxxxxxx> wrote:
>>
>> On Tue, Jul 06, 2010 at 08:57:45PM +1000, Shaun Adolphson wrote:
>> > Hi,
>> >
>> > We have been able to repeatably produce xfs internal errors
>> > (XFS_WANT_CORRUPTED_GOTO) on one of our fileservers. We are attempting
>> > to locally copy a 248Gig file off a USB drive formatted as NTFS to the
>> > xfs drive. The copy gets about 96% of the way through and we get the
>> > following messages:
>> >
>> > Jun 28 22:14:46 terrorserver kernel: XFS internal error
>> > XFS_WANT_CORRUPTED_GOTO at line 2092 of file fs/xfs/xfs_bmap_btree.c.
>> > Caller 0xffffffff8837446f
>>
>> Interesting. That's a corrupted inode extent btree - I haven't seen
>> one of them for a long while. Were there any errors (like IO errors)
>> reported before this?
>>
>> However, the first step is to determine if the error is on disk or an
>> in-memory error. Can you post output of:
>>
>>        - xfs_info <mntpt>

meta-data=/dev/TerrorVolume/terror isize=256    agcount=130385, agsize=32768 blks
         =                         sectsz=512   attr=1
data     =                         bsize=4096   blocks=4272433152, imaxpct=25
         =                         sunit=0      swidth=0 blks
naming   =version 2                bsize=4096   ascii-ci=0
log      =internal                 bsize=4096   blocks=2560, version=1
         =                         sectsz=512   sunit=0 blks, lazy-count=0
realtime =none                     extsz=4096   blocks=0, rtextents=0
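As an aside (not part of the original mail), a quick sanity check on the geometry numbers reported above: agsize=32768 blocks at bsize=4096 is only 128 MiB per allocation group, and with agcount=130385 that is an unusually large number of very small AGs for a ~16 TiB filesystem, which typically indicates the filesystem was grown many times from a small initial size.

```shell
# Illustrative arithmetic on the xfs_info fields above (values copied from
# the output; this sanity check is an addition, not part of the original mail).
bsize=4096           # data block size in bytes
agsize=32768         # blocks per allocation group
agcount=130385       # number of allocation groups
blocks=4272433152    # total data blocks

tib=$(( 1024 * 1024 * 1024 * 1024 ))
echo "AG size: $(( agsize * bsize / 1024 / 1024 )) MiB"   # 128 MiB per AG
echo "FS size: $(( blocks * bsize / tib )) TiB (approx)"  # roughly 15-16 TiB
```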


>>        - xfs_repair -n after a shutdown

The output of xfs_repair -n is about 6 MB; below is a condensed
version. I can post the whole output if required.

Phase 1 - find and verify superblock...
Phase 2 - using internal log
        - scan filesystem freespace and inode maps...
        - found root inode chunk
Phase 3 - for each AG...
        - scan (but don't clear) agi unlinked lists...
        - process known inodes and perform inode discovery...
        - agno = 0
.
.
.
        - agno = 130384
No modify flag set, skipping phase 5
Phase 6 - check inode connectivity...
        - traversing filesystem ...
        - traversal finished ...
        - moving disconnected inodes to lost+found ...
Phase 7 - verify link counts...
No modify flag set, skipping filesystem flush and exiting.




>>
>> Can you upgrade xfsprogs (i.e. xfs_repair) to the latest version
>> (3.1.2) before you do this as well?

# xfs_repair -V
xfs_repair version 3.1.2


>
> We have upgraded xfsprogs to 3.1.2 and are in the process of
> collecting the required information.
>
>>
>> > We have reproduced the condition 3 times and each time we have been
>> > able to remount the drive (to replay the transaction log) and then
>> > perform an xfs_repair.
>> >
>> > We are just using cp to copy the file.
>> >
>> > Some further details about the system:
>> >
>> > Software:
>> > - Fresh install of CentOS 5.5 64bit all patches up to date
>> > - Kernel 2.6.18-194.3.1.el5.centos.plus
>>
>> I've got no idea exactly what version of XFS that has in it, so I
>> can't say off the top of my head whether this is a fixed bug or not.
>>
>> Cheers,
>>
>> Dave.
>> --
>> Dave Chinner
>> david@xxxxxxxxxxxxx
>
>
>
> During other testing we have also been able to reproduce the issue by
> copying a self-generated 248Gig file from another system disk to the
> XFS disk. The file was generated using dd with an input of /dev/zero.
>
> All the existing data (~6TB) was successfully copied onto the storage
> without hitting the error. The thing to note is that all the existing
> files are much smaller than the one that we are trying to copy in
> (248Gig). And since we have been having the shutdowns we have copied
> many smaller files (files < 30Gig in size) onto the storage area
> without issue.
>
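For anyone wanting to retry that reproduction, a minimal sketch of the dd step described above (the file name and the small count here are placeholders; the original test wrote a 248Gig zero-filled file, which with bs=1M would be count=253952):

```shell
# Illustrative sketch of the reproduction described above: write one large
# zero-filled file with dd. Path and size are placeholders, not from the
# original mail -- scale count up to reproduce the 248Gig case.
dd if=/dev/zero of=./bigfile bs=1M count=16
ls -l ./bigfile
```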

Thanks,

Shaun
