xfs
[Top] [All Lists]

Re: [PATCH 3/4] xfs: add quota-driven speculative preallocation throttli

To: Dave Chinner <david@xxxxxxxxxxxxx>
Subject: Re: [PATCH 3/4] xfs: add quota-driven speculative preallocation throttling
From: Brian Foster <bfoster@xxxxxxxxxx>
Date: Fri, 14 Dec 2012 11:13:00 -0500
Cc: xfs@xxxxxxxxxxx
In-reply-to: <20121213022530.GF16353@dastard>
References: <1354726078-31793-1-git-send-email-bfoster@xxxxxxxxxx> <1354726078-31793-4-git-send-email-bfoster@xxxxxxxxxx> <20121213022530.GF16353@dastard>
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:15.0) Gecko/20120911 Thunderbird/15.0.1
On 12/12/2012 09:25 PM, Dave Chinner wrote:
> On Wed, Dec 05, 2012 at 11:47:57AM -0500, Brian Foster wrote:
...
>>
>> Signed-off-by: Brian Foster <bfoster@xxxxxxxxxx>
> 
> I'm having trouble determining what the algorithm is supposed to be
> and what might be bugs in the algorithm....
> 
> These are notes, somewhat incoherent, but I'll post them anyway
> because i think they convey my concerns and solutions well enough.
> 

Ok...

> Cheers,
> 
> Dave.
> 
> ----
> 
> Describe the algorithm to ensure I have it right:
> 
>       calculate preallocation size (alloc_blocks)
>               uses file size to determine size
> 
>       determine ENOSPC throttling reduction (shift)
> 
>       calculate maximum quota prealloc amount allowed
>               walk each dquot on inode
>                       check hard limit
>                               => over = no prealloc
>                       check soft limit
>                               => none = prealloc unmodified
>                       check over soft limit
>                               => no = prealloc unmodified
>                       calculate "throttle percentage"
>                       calculate max prealloc
>               use minimum prealloc value returned
> 
>       alloc_blocks = MIN(alloc_blocks, quota_alloc_blocks)
> 
>       apply ENOSPC throttle (shift)
> 
> 
> -  prealloc size being overridden by quota throttling, and then the
>    ENOSPC throttle is applied to that.
> 

That sounds accurate to me.

> - the overall algorithm looks good to me, that means my problems are
>   with the implementation....
> 
>       - it calculates stuff dynamically that could be set once in
>         the struct xfs_dquot on initialisation and whenever limits
>         are changed. i.e. the "percentage to throttle". This
>         should be carried by xfs_dquot as it simplifies
>         the logic here.
> 

Indeed.

>       - if there is a hard limit but no soft limit, it *always*
>         throttles preallocation to the default percentage even
>         where there is lots of space available for the full
>         prealloc.
> 

I'm not following you here. xfs_prealloc_dquot_max() should return the
maximum allowed preallocation for a quota by calculating some percentage
of the free space in the quota (regardless of the existence of a soft
limit; that serves as a throttling watermark and percentage modifier).
xfs_prealloc_quota_max() just extends that across all quota types and
returns the most limiting value. For an inode with plenty of space in
its quota set, we should get a large value back for max_quota_prealloc,
which is then ignored because alloc_blocks doesn't violate that limit.

(Regardless, I don't think this changes your design comments that follow.)

>       - it returns a block count based on limits, or -1 for no
>         throttling. I'd prefer a pair of functions - one to check
>         whether throttling is needed, and one to calculate the
>         throttling parameters
> 
>               - should use xfs_this_quota_on() to drive quota
>                 checks completely inside throttle check. will make
>                 the code much cleaner
> 
>               - check function can be boolean
> 
>               - calc function should return both raw space
>                 available and shift values so the global prealloc
>                 values can be overridden independently. i.e.
>                 allows quota throttling to work even when overall
>                 prealloc is less than the maximum quota would
>                 allow.
> 
> 
> Rough code:
> 
> need_throttle()
> {
>       if (!xfs_this_quota_on(mp, type))
>               return false;
> 
>       dq = xfs_inode_dquot(ip, type);
> 
>       /* no hard limit, no throttle */
>       if (!dq->q_hard_limit)
>               return false;
> 
>       /* over hard limit, always throttle */
>       if (dq->q_res_bcount > dq->q_hard_limit)
>               return true;
> 
>       /*
>        * Under soft limit, no throttle.
>        *
>        * Note: we always have a soft limit for prealloc,
>        * calculated at dquot instantiation or limit change
>        */
>       if (dq->q_res_bcount + alloc_blocks < dq->q_soft_limit)
>               return false;
> 
>       /* between soft limit and hard limit, need to throttle */
>       return true;
> }
> 
> - needs struct xfs-dquot to be initialised appropriately and quota
>   limit changes to handle changes correctly.
> 
> - allows soft limit defaults to be set in memory if they aren't on
>   disk. i.e. default throttling values will be no different in
>   implementation to on-disk limits.
> 

This comment is not quite clear to me. I suspect it relates to the
pseudocode comment above with regard to a default soft limit. To this
point, I'm interpreting your design proposal as something like the
following, at a high-level:

- xfs_dquot grows a (in memory) table of low space limits. This table is
populated in when the quota is read from disk, created and/or the input
parameters (hard/soft limit) change.
- The soft limit as a percentage modifier goes away by nature of using
the 3-5% logarithmic throttle (shift).
- The soft limit as a prealloc throttling trigger remains, just
implemented as a separate function as depicted above.

There are no on-disk changes proposed, correct? Are you suggesting we
set a default soft limit value on all quotas with a hard limit?

> calc_throttle()
> {
>       dq = xfs_inode_dquot(ip, type);
> 
>       freesp = dq->q_hard_limit - dq->q_res_bcount;
> 
>       if (freesp < dq->q_low_space[XFS_LOWSP_5_PCNT]) {
>               shift = 2;
>               if (freesp < dq->q_low_space[XFS_LOWSP_4_PCNT])
>                       shift++;
>               if (freesp < dq->q_low_space[XFS_LOWSP_3_PCNT])
>                       shift++;
>               if (freesp < dq->q_low_space[XFS_LOWSP_2_PCNT])
>                       shift++;
>               if (freesp < dq->q_low_space[XFS_LOWSP_1_PCNT])
>                       shift++;
>       }
> 
>       /* only overwrite current values if the result is a smaller prealloc */
>       if ((freesp >> shift) >= (*qblocks >> *qshift))
>               return;
> 
>       *qblocks = freesp;
>       *qshift = shift;
> }
> 
> - similar shift table to the ENOSPC code for a logarithmic mapping
>   rather than a linear mapping.
> - probably doesn't need 5 steps, 3 steps that do shift += 2 is
>   probably sufficient and would reduce per-dquot memory overhead.
> 
> xfs_iomap_prealloc_size()
> {
> .....
> 
>       qblocks = alloc_blocks;
>       qshift = 0;
> 
>       if (need_throttle(ip, XFS_DQ_USER, alloc_blocks)
>               calc_throttle(ip, XFS_DQ_USER, &qblocks, &qshift);
> 
>       if (need_throttle(ip, XFS_DQ_GROUP, alloc_blocks)
>               calc_throttle(ip, XFS_DQ_GROUP, &qblocks, &qshift);
> 
>       if (need_throttle(ip, XFS_DQ_PROJ, alloc_blocks)
>               calc_throttle(ip, XFS_DQ_PROJ, &qblocks, &qshift);
> 
>       /*
>        * The final size of the preallocation is the smaller of the
>        * whole filesystem prealloc size and the quota prealloc
>        * size. i.e. whichever entity has the least space available
>        * for allocation determines the maximum preallocation size.
>        *
>        * The final throttling level is the larger of the ENOSPC
>        * and quota throttles. i.e. which ever is closer to their
>        * respective space limit determines how much we throttle
>        * by.
>        */
>       alloc_blocks = MIN(qblocks, alloc_blocks)
>       shift = MAX(qshift, shift)
> ....
> 

... and this all makes sense.

I'm a little unsure about applying a shift selected for one limit
against the preallocation size of another, but I suppose it can't hurt
to be aggressive. This design should bubble up all the relevant
parameters to a single point anyways, so that should be easier to reason
about and measure when I have some code.

Thanks for the review Dave!

Brian

<Prev in Thread] Current Thread [Next in Thread>