
Re: kernel errors when XFS filesystem fills up

To: <linux-xfs@xxxxxxxxxxx>
Subject: Re: kernel errors when XFS filesystem fills up
From: "Scott Fagg" <scott.fagg@xxxxxxxx>
Date: Tue, 12 Aug 2003 17:13:51 +1000
Sender: linux-xfs-bounce@xxxxxxxxxxx
>hi Scott,
>
>On Tue, Aug 12, 2003 at 11:34:20AM +1000, Scott Fagg wrote:
>> 
>> I just tried again with kernel-source-2.4.18-SGI_XFS_1.1.i386.rpm
>> from sgi.com, and the problem does not occur. I can fill up volumes
>> and manipulate ACLs without kernel errors.
>> 
>
>I know what's going on now - there's a couple of independent
>problems here.  Firstly, the problem where you see corruption
>stack traces fly past on the console is a buglet in the error
>reporting code - a generic dabuf routine is reporting an error
>which is not actually an error in the context that the extended
>attribute code (and hence ACL code) is calling it from.
>
>The reason you don't see errors on older kernels is because
>there was none of the extra corruption checking code in those
>kernels, and hence no xfs_error_report routine, so we wouldn't
>dump things to the console as we do now.  So, those console
>errors are harmless; I have a fix to shut them up and will
>check that in shortly.

I take it then that I'm not actually getting a corrupt filesystem,
which would explain why xfs_repair and xfs_check never report anything.

Would your observations also fit the behaviour I see when an inode
gets damaged (missing default ACL?) and still triggers the kernel
errors when I access that inode, even though the filesystem is
nowhere near full?

That is:

- fill up fs
- manipulate ACLs and get error
- delete lots of files
- manipulate ACLs again and still get error?
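
For anyone wanting to retrace that sequence, here is a rough dry-run
sketch in Python. It only builds and prints the commands; nothing is
executed. The mount point, the test directory, the single filler file
standing in for "lots of files", and the ACL entry are all hypothetical
placeholders, not the values from my actual tests.

```python
def repro_plan(mount="/mnt/xfs-test", acl="u:nobody:rwx"):
    """Build the four steps as argv lists without running anything.

    Every name here is an illustrative assumption: the mount point,
    the filler file, the test directory (assumed to exist already)
    and the ACL entry.
    """
    filler = f"{mount}/filler"
    target = f"{mount}/testdir"
    set_default_acl = ["setfacl", "-R", "-d", "-m", acl, target]
    return [
        # 1. fill up the filesystem (dd stops when it runs out of space)
        ["dd", "if=/dev/zero", f"of={filler}", "bs=1M"],
        # 2. manipulate ACLs while the filesystem is full
        set_default_acl,
        # 3. free the space again
        ["rm", "-f", filler],
        # 4. manipulate ACLs again, now with plenty of free space
        set_default_acl,
    ]


if __name__ == "__main__":
    for argv in repro_plan():
        print(" ".join(argv))
```

Steps 2 and 4 are deliberately the same command, since the question is
whether the error persists once space has been freed.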

In my experience, once the affected inodes are deleted, running
something like 'find .' or setfacl -R -dm across the filesystem
no longer produces errors.

>
>There's a second problem with handling default ACLs which can
>result in the default ACL not being inherited when we run out
>of space... I have a fix for this too.  The two of these were
>interacting to cause an increased probability of hitting the
>corruption messages (the bogus ones).
>
>Also, I think in one of your earlier mails you mentioned that
>in your test cases the free space fluctuates for a while before
>becoming stable at 100%?  This is probably because of the "-f"
>flag to cp, ie. "overwrite the file if it exists", which means
>cp first truncates (freeing up space), before overwriting (and
>reclaiming that space straight away).

That sounds reasonable.
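
A small Python sketch of the truncate-then-rewrite pattern described
above, as I understand it. It assumes the overwrite opens the existing
destination with truncation, which is what Python's "wb" mode does. It
tracks the size of one temporary file rather than filesystem free space,
so it only illustrates the mechanism; the helper name is made up for
this example.

```python
import os
import tempfile


def overwrite_sizes(payload):
    """Return the file size before, during and after an in-place overwrite."""
    sizes = []
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "target")

        # Create the "existing" destination file.
        with open(path, "wb") as f:
            f.write(payload)
        sizes.append(os.path.getsize(path))

        # Overwrite it: "wb" truncates on open, so the file is empty
        # (its space released) until the new data is written.
        with open(path, "wb") as f:
            sizes.append(os.fstat(f.fileno()).st_size)
            f.write(payload)
        sizes.append(os.path.getsize(path))
    return sizes


if __name__ == "__main__":
    print(overwrite_sizes(b"x" * 4096))  # [4096, 0, 4096]
```

The dip to zero in the middle is the window in which the space is free
before being reclaimed, which would match the fluctuation I saw.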

>
>So, thanks again for all the help in finding test cases - they
>no longer show problems with these fixes in my kernel, and I'll
>get the fixes in soon for you to try out.

Excellent. If only all of my vendors responded so quickly :)

>
>cheers.
>
>-- 
>Nathan
>
>
>


