xfs

Re: CentOS 5.5 XFS internal errors (XFS_WANT_CORRUPTED_GOTO)

To: Shaun Adolphson <shaun@xxxxxxxxxxxxx>
Subject: Re: CentOS 5.5 XFS internal errors (XFS_WANT_CORRUPTED_GOTO)
From: Dave Chinner <david@xxxxxxxxxxxxx>
Date: Mon, 12 Jul 2010 11:08:32 +1000
Cc: xfs@xxxxxxxxxxx
In-reply-to: <AANLkTimcM9LfOHAH6q95esD4JU8Geb21750YMAh3JKBl@xxxxxxxxxxxxxx>
References: <AANLkTimKKEvfJx6EQZeQF_HnlBLj6B8Kjfy6jUHGPnz5@xxxxxxxxxxxxxx> <20100706231856.GC25018@dastard> <AANLkTilOGA-XF2V_Z_7WntgA23CHNlfx1u0I8xJtHZDz@xxxxxxxxxxxxxx> <AANLkTimcM9LfOHAH6q95esD4JU8Geb21750YMAh3JKBl@xxxxxxxxxxxxxx>
User-agent: Mutt/1.5.20 (2009-06-14)
On Sun, Jul 11, 2010 at 09:44:07PM +1000, Shaun Adolphson wrote:
> On Thu, Jul 8, 2010 at 9:21 PM, Shaun Adolphson <shaun@xxxxxxxxxxxxx> wrote:
> > On Wed, Jul 7, 2010 at 9:18 AM, Dave Chinner <david@xxxxxxxxxxxxx> wrote:
> >>
> >> On Tue, Jul 06, 2010 at 08:57:45PM +1000, Shaun Adolphson wrote:
> >> > Hi,
> >> >
> >> > We have been able to repeatably produce xfs internal errors
> >> > (XFS_WANT_CORRUPTED_GOTO) on one of our fileservers. We are attempting
> >> > to locally copy a 248GB file off a USB drive formatted as NTFS to the
> >> > xfs drive. The copy gets about 96% of the way through and we get the
> >> > following messages:
> >> >
> >> > Jun 28 22:14:46 terrorserver kernel: XFS internal error
> >> > XFS_WANT_CORRUPTED_GOTO at line 2092 of file fs/xfs/xfs_bmap_btree.c.
> >> > Caller 0xffffffff8837446f
> >>
> >> Interesting. That's a corrupted inode extent btree - I haven't seen
> >> one of them for a long while. Were there any errors (like IO errors)
> >> reported before this?
> >>
> >> However, the first step is to determine if the error is on disk or an
> >> in-memory error. Can you post output of:
> >>
> >>        - xfs_info <mntpt>
> 
> meta-data=/dev/TerrorVolume/terror isize=256    agcount=130385, agsize=32768 blks
>          =                       sectsz=512   attr=1
> data     =                       bsize=4096   blocks=4272433152, imaxpct=25
>          =                       sunit=0      swidth=0 blks
> naming   =version 2              bsize=4096   ascii-ci=0
> log      =internal               bsize=4096   blocks=2560, version=1
>          =                       sectsz=512   sunit=0 blks, lazy-count=0
> realtime =none                   extsz=4096   blocks=0, rtextents=0

Why did you make this filesystem with 128MB allocation groups? The
default for a filesystem of this size is 1TB allocation groups.
More than 100k allocation groups will certainly push internal AG
scanning scalability past its tested limits....
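The numbers behind this point can be checked against the xfs_info output quoted above. A minimal Python sketch, assuming the values are copied by hand from that output; the 1 TiB figure is the default AG size mentioned here, and the ceiling division is only a rough estimate, not the exact calculation mkfs.xfs performs:

```python
# Values copied from the quoted xfs_info output.
BSIZE = 4096            # data block size in bytes (bsize=4096)
AGSIZE_BLKS = 32768     # blocks per allocation group (agsize=32768)
AGCOUNT = 130385        # number of allocation groups (agcount=130385)
DATA_BLKS = 4272433152  # total data blocks (blocks=4272433152)

MiB = 1024 ** 2
TiB = 1024 ** 4

ag_bytes = AGSIZE_BLKS * BSIZE
fs_bytes = DATA_BLKS * BSIZE
print(f"AG size:         {ag_bytes // MiB} MiB")
print(f"filesystem size: {fs_bytes / TiB:.2f} TiB")

# Sanity check: agcount AGs cover the data blocks, with the last AG
# allowed to be partial.
assert (AGCOUNT - 1) * AGSIZE_BLKS < DATA_BLKS <= AGCOUNT * AGSIZE_BLKS

# Rough estimate only: how many AGs 1 TiB allocation groups would give.
default_ag_blks = TiB // BSIZE
estimated_agcount = -(-DATA_BLKS // default_ag_blks)  # ceiling division
print(f"AGs at 1 TiB each: {estimated_agcount}")
```

That works out to 128 MiB AGs on a roughly 15.9 TiB filesystem, versus around 16 AGs if 1 TiB AGs were used.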

Also, a log of 10MB is rather small, and it tells me that you didn't
just create this filesystem directly on the 16TB block device with a
recent mkfs.xfs. That is, with current mkfs.xfs defaults, to get a layout
like this you'd have to start with a 512MB filesystem and grow it to
16TB.
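A minimal Python sketch of the arithmetic behind this paragraph, using values copied by hand from the xfs_info output quoted above. The initial count of four allocation groups is a hypothetical assumption chosen to match the 512MB figure; what is certain is that agsize is fixed when the filesystem is made and xfs_growfs only adds AGs of that same size:

```python
# Values copied from the quoted xfs_info output.
LOG_BSIZE = 4096      # log block size in bytes (log bsize=4096)
LOG_BLKS = 2560       # internal log length in blocks (log blocks=2560)
DATA_BSIZE = 4096     # data block size in bytes (data bsize=4096)
AGSIZE_BLKS = 32768   # blocks per allocation group (agsize=32768)

MiB = 1024 ** 2

log_bytes = LOG_BLKS * LOG_BSIZE
print(f"internal log size: {log_bytes // MiB} MiB")

# Assumption (hypothetical): the filesystem was originally made with 4 AGs.
# Because growing only appends AGs of the existing agsize, the original
# filesystem would then have been 4 x 128 MiB.
INITIAL_AGCOUNT = 4
initial_bytes = INITIAL_AGCOUNT * AGSIZE_BLKS * DATA_BSIZE
print(f"implied initial size: {initial_bytes // MiB} MiB")
```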

Growing a filesystem by 3-4 orders of magnitude does not result in a
particularly sane filesystem layout and pushes it way outside
configurations that are regularly tested.  I strongly suggest you
rebuild this filesystem with a default layout from a recent mkfs.xfs
before going any further....

> >>        - xfs_repair -n after a shutdown
> 
> The output of xfs_repair -n is 6MB; below is the condensed
> version. I can post the whole output if required.

If there were no errors, then I don't need to see it. However, if
you trimmed errors out or you don't know what errors look like, then
I need to see the whole output...

Cheers,

Dave.
-- 
Dave Chinner
david@xxxxxxxxxxxxx
