
Re: kernel errors when XFS filesystem fills up



On Fri, Aug 08, 2003 at 04:12:22PM +1000, Scott Fagg wrote:
> >
> >For some strange reason we are trying to read at AG blk 0 for that
> >inode, which is wrong - block zero in an AG holds the SB/AGF/AGI/
> >AGFL for that allocation group.  It's not clear if this is due to
> >the EA data on disk pointing to that block, or a bug in the kernel
> >code.  The tools not finding anything suggests to me a kernel bug,
> >not sure where though...
> >
> 
> So what should I do to generate more debug info?

The absolute ideal from my point of view is a recipe of steps
that I can follow and that is guaranteed to trigger the problem,
trimmed back to a very basic minimum: e.g. mkfs (-dsize=something
small), one or more dd command lines to fill the filesystem up so
that the problem triggers, and whatever else is needed.

If I have that recipe and can reproduce it, I can be sure of
fixing it (and can verify the fix too).  The simpler the recipe,
the better from my point of view.
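
As a rough idea of the shape such a recipe could take, here is a
minimal sketch.  It is not a known reproducer: the image path, mount
point, size and the run/repro helper names are all made up for
illustration, and a single dd of zeroes may well not be enough to
trigger the problem.  It defaults to a dry run that only prints the
commands; set DRY_RUN=0 and run it as root on a scratch machine to
actually execute them.

```shell
#!/bin/sh
# Hypothetical reproduction sketch. Every path and size below is an
# assumption chosen for illustration, not taken from the report.
IMG=${IMG:-/tmp/xfs-repro.img}   # file-backed filesystem image
MNT=${MNT:-/mnt/xfs-repro}       # scratch mount point
SIZE_MB=${SIZE_MB:-64}           # raise this if mkfs.xfs refuses the size
DRY_RUN=${DRY_RUN:-1}            # 1 = only print the commands

# Print the command in dry-run mode, otherwise execute it.
run() {
    if [ "$DRY_RUN" = 1 ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

repro() {
    # Create a small image file and put a small XFS filesystem on it.
    run dd if=/dev/zero of="$IMG" bs=1M count="$SIZE_MB"
    run mkfs.xfs -f -d size="${SIZE_MB}m" "$IMG"
    run mkdir -p "$MNT"
    run mount -o loop "$IMG" "$MNT"
    # Fill the filesystem; dd is expected to stop with "No space left".
    run dd if=/dev/zero of="$MNT/fill" bs=1M || true
    # Walk the volume, as in the reported sequence of events.
    run find "$MNT" -xdev
    run umount "$MNT"
}

repro
```

If something like this triggers the kernel message on your machine,
then the exact command lines plus the dmesg output are what I need.
If it does not, the extra steps required on top of it are the
interesting part.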

You seem to be able to reproduce it easily, which is promising.

> Not sure if it helps, but this sequence of events might give a clue :

This is a good start, but it is not deterministic between our two
machines (i.e. you hit it but I don't, and there are many variables,
like "heaps of files" and an unknown starting point).

> - run 'find' on the XFS vol
> - it hits a nasty inode and triggers the kernel message I see.
> - track down the inode mentioned and remove it and its parent directory
> - run 'find' again .. no errors triggered
> - copy heaps of files back to the XFS vol and the error will probably occur again a couple of times, even if I'm copying 1000's of files.
> - backup files ( except faulty inodes )
> - re-format XFS partition
> - copy files back
> - .. no errors occur .. until the volume fills up again.
> 
> That help ? 

Getting closer, I think.

cheers.

-- 
Nathan