[PATCH] xfs:negative_icount.patch
Stuart Brodsky
sbrodsky at sgi.com
Tue Jul 20 07:56:50 CDT 2010
Thanks for the feedback. I will have an updated patch shortly. There is a slight problem in that f_ffree is an unsigned 64 so that it does not see something going negative but that is easy to work around. I'll send out the tested patch in a bit.
Thanks,
---
Stuart Brodsky
2750 Blue Water Road
Eagan, MN. 55121
651-683-7910
sbrodsky at sgi.com
On Jul 19, 2010, at 6:11 PM, Dave Chinner wrote:
> Hi Stuart,
>
> A couple of minor list-etiquette comments first:
>
> - please don't top post when replying as it makes it
> really hard to understand what you are replying to
> because there is no context to your reply.
>
> - don't remove the mailing list from the CC list while
> discussing a patch as there are plenty more people than
> just the person who sent the email that might want to put
> their 2c worth in...
>
> I've rearranged your reply and re-added the mailing list to the CC
> list....
>
> On Mon, Jul 19, 2010 at 07:11:55AM -0500, Stuart Brodsky wrote:
>> On Jul 18, 2010, at 7:14 PM, Dave Chinner wrote:
>>> On Fri, Jul 16, 2010 at 01:53:41PM -0500, Stuart Brodsky wrote:
>>>> This patch fixes the stat -f icount field going negative. The use of
>>>> lazy counts means that the super block field sb_icount is not updated
>>>> correctly and will allow over allocation of inodes above maxicount.
> .....
>>>> Index: a/fs/xfs/xfs_trans.c
>>>> ===================================================================
>>>> --- a/fs/xfs/xfs_trans.c.orig 2010-07-16 10:12:02.000000000 -0500
>>>> +++ b/fs/xfs/xfs_trans.c 2010-07-16 10:22:47.690249548 -0500
>>>> @@ -805,6 +805,7 @@
>>>> switch (field) {
>>>> case XFS_TRANS_SB_ICOUNT:
>>>> tp->t_icount_delta += delta;
>>>> + atomic64_add(delta,&tp->t_mountp->m_inodesinuse);
>>>> if (xfs_sb_version_haslazysbcount(&mp->m_sb))
>>>> flags &= ~XFS_TRANS_SB_DIRTY;
>>>> break;
>>>
>>> This is still racy. It may fix your specific reproducer but I can't see how
>>> changing the in use inode count from late in xfs_trans_commit() to early in
>>> xfs_trans_commit() solves the problem: if we look at it
>>> like this:
> ....
>>> Realistically, there is nothing wrong with XFS exceeding mp->m_maxicount by a
>>> small amount - it's just an arbitrary limit that we use to prevent all the
>>> filesytem space from being taken up by inodes. I think the problem is just a
>>> reporting problem here in xfs_fs_statfs():
>>>
>>> 1245 fakeinos = statp->f_bfree << sbp->sb_inopblog;
>>> 1246 statp->f_files =
>>> 1247 MIN(sbp->sb_icount + fakeinos, (__uint64_t)XFS_MAXINUMBER);
>>> 1248 if (mp->m_maxicount)
>>> 1249 statp->f_files = min_t(typeof(statp->f_files),
>>> 1250 statp->f_files,
>>> 1251 mp->m_maxicount);
>>> 1252 statp->f_ffree = statp->f_files - (sbp->sb_icount - sbp->sb_ifree);
>>>
>>> Assume no free space and no free inodes (statp->f_bfree = 0, => fakeinos = 0),
>>> and we have just hit the above race condition to allocate more inodes than
>>> mp->m_maxicount. Hence we have sbp->sb_icount > mp->m_maxicount and that gives:
>>>
>>> statp->f_files = min(sbp->sb_icount, mp->m_maxicount);
>>> statp->f_files = mp->m_maxicount;
>>>
>>> And now we've allocated all the free inodes (sbp->sb_ifree = 0). Therefore:
>>>
>>> statp->f_ffree = mp->m_maxicount - (sbp->sb_icount - 0)
>>>
>>> Which gives statp->f_ffree < 0 and hence negative free inodes. I think
>>> all that is required to "fix" this is to clamp statp->f_ffree to zero.
>>> i.e.:
>>>
>>> if (statp->f_ffree < 0)
>>> statp->f_ffree = 0;
>>>
>>> Make sense?
>>
>> I agree that if we are willing to accept the fact that the actual
>> number of inodes allocated can be over maxicount then your fix is
>> correct in just clamping the f_ffree to 0.
>
> Given that it already exceeds maxicount (and has for years ;) and
> nothing breaks except for the stats, I don't see any problem
> here. It seems much more complex to do anythign accurate and AFAICT
> it is not necessary to be absolutely accurate.
>
> Anyway, we have to allow the allocated inode count to exceed
> maxicount as you can use growfs to reduce the allowed inode
> percentage in the filesystem. Hence if we are near maxicount
> allocated inodes and we reduce the max inode percentage then the
> allocated inode count will exceed maxicount. In that case, we really
> do need to report zero free inodes, not some negative number....
>
>> I just don't like
>> using a counter that has the potential to be out of date any where
>> in the code.
>
> I'm not sure I follow you - nothing gets out of date AFAICT.
>
>> In this case as you suggest being over maxicount is
>> not the end of the world. We will report ENOSPC a bit later but
>> we still will detect it.
>
> We still report ENOSPC at exactly the same time as we do now.
> Nothing there changes, only the information going to userspace is
> sanitised.
>
> Cheers,
>
> Dave.
> --
> Dave Chinner
> david at fromorbit.com
More information about the xfs
mailing list