>hi Scott,
>
>On Tue, Aug 12, 2003 at 11:34:20AM +1000, Scott Fagg wrote:
>>
>> I just tried again with kernel-source-2.4.18-SGI_XFS_1.1.i386.rpm
>> from sgi.com, and the problem does not occur. I can fill up volumes
>> and manipulate ACLs without kernel errors.
>>
>
>I know what's going on now - there are a couple of independent
>problems here. Firstly, the problem where you see corruption
>stack traces fly past on the console is a buglet in the error
>reporting code - a generic dabuf routine is reporting an error
>which is not actually an error in the context that the extended
>attribute code (and hence ACL code) is calling it from.
>
>The reason you don't see errors on older kernels is because
>there was none of the extra corruption checking code in those
>kernels, and hence no xfs_error_report routine, so we wouldn't
>dump things to the console as we do now. So, those console
>errors are harmless; I have a fix to shut them up and will
>check that in shortly.
I take it then that I'm not actually getting a corrupt filesystem,
which would explain why xfs_repair and xfs_check never report anything.
Would your observations also fit the behaviour I see when an
inode gets damaged (missing default ACL?) and still triggers the
kernel errors if I access that inode when the filesystem is nowhere
near full?
That is:
- fill up fs
- manipulate ACLs and get error
- delete lots of files
- manipulate ACLs again and still get error?
I think my experience has been that, after deleting the affected inodes,
running something like 'find .' across the filesystem or setfacl
-R -dm no longer produced errors.
>
>There's a second problem with handling default ACLs which can
>result in the default ACL not being inherited when we run out
>of space... I have a fix for this too. The two of these were
>interacting to cause an increased probability of hitting the
>corruption messages (the bogus ones).
>
>Also, I think in one of your earlier mails you mentioned that
>in your test cases the freespace fluctuates for a while before
>becoming stable at 100%? This is probably because of the "-f"
>flag to cp, ie. "overwrite the file if it exists", which means
>cp first truncates (freeing up space), before overwriting (and
>reclaiming that space straight away).
That sounds reasonable.
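For anyone following along, here is a minimal sketch of the
truncate-then-rewrite pattern described above. It is Python rather than
cp itself, and the function names and the file name "victim" are made up
for illustration. It only shows that an overwrite which truncates first
drops the file to zero bytes before the data is written back, which is
the window in which the space would show up as free.

```python
import os
import tempfile


def overwrite_and_observe(path, payload):
    """Overwrite an existing file with truncation; return the sizes seen."""
    sizes = [os.path.getsize(path)]          # size before the overwrite
    with open(path, "wb") as dst:            # "wb" truncates the file on open
        sizes.append(os.path.getsize(path))  # size right after truncation
        dst.write(payload)
        dst.flush()                          # push the data out before closing
    sizes.append(os.path.getsize(path))      # size once the data is rewritten
    return sizes


def demo():
    """Create a 4096-byte file, overwrite it in place, report the sizes."""
    payload = b"x" * 4096
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "victim")   # hypothetical file name
        with open(path, "wb") as f:
            f.write(payload)
        return overwrite_and_observe(path, payload)
```

Calling demo() returns the three sizes in order: before the overwrite,
just after truncation, and after the rewrite. On a nearly full
filesystem, many such overwrites in flight would be one way to get the
fluctuating free space figure.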
>
>So, thanks again for all the help in finding test cases - they
>no longer show problems with these fixes in my kernel, and I'll
>get the fixes in soon for you to try out.
Excellent. If only all of my vendors responded so quickly :)
>
>cheers.
>
>--
>Nathan
>
>
>