Re: [PATCH v3 06/11] xfs: use and update the finobt on inode allocation

To: Dave Chinner <david@xxxxxxxxxxxxx>
Subject: Re: [PATCH v3 06/11] xfs: use and update the finobt on inode allocation
From: Brian Foster <bfoster@xxxxxxxxxx>
Date: Tue, 11 Feb 2014 11:32:06 -0500
Cc: xfs@xxxxxxxxxxx
Delivered-to: xfs@xxxxxxxxxxx
In-reply-to: <20140211071703.GG13647@dastard>
References: <1391536182-9048-1-git-send-email-bfoster@xxxxxxxxxx> <1391536182-9048-7-git-send-email-bfoster@xxxxxxxxxx> <20140211071703.GG13647@dastard>
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:24.0) Gecko/20100101 Thunderbird/24.2.0

On 02/11/2014 02:17 AM, Dave Chinner wrote:
> On Tue, Feb 04, 2014 at 12:49:37PM -0500, Brian Foster wrote:
>> Replace xfs_dialloc_ag() with an implementation that looks for a
>> record in the finobt. The finobt only tracks records with at least
>> one free inode. This eliminates the need for the intra-ag scan in
>> the original algorithm. Once the inode is allocated, update the
>> finobt appropriately (possibly removing the record) as well as the
>> inobt.
>>
>> Move the original xfs_dialloc_ag() algorithm to
>> xfs_dialloc_ag_slow() and fall back as such if finobt support is
>> not enabled.
>>
>> Signed-off-by: Brian Foster <bfoster@xxxxxxxxxx>
>> ---
>>  fs/xfs/xfs_ialloc.c | 211 +++++++++++++++++++++++++++++++++++++++++++++++++++-
>>  1 file changed, 210 insertions(+), 1 deletion(-)
>>
>> diff --git a/fs/xfs/xfs_ialloc.c b/fs/xfs/xfs_ialloc.c
>> index bdaab76..afc9840 100644
>> --- a/fs/xfs/xfs_ialloc.c
>> +++ b/fs/xfs/xfs_ialloc.c
>> @@ -723,7 +723,7 @@ xfs_ialloc_get_rec(
>>   * available.
>>   */
>>  STATIC int
>> -xfs_dialloc_ag(
>> +xfs_dialloc_ag_slow(
>>      struct xfs_trans        *tp,
>>      struct xfs_buf          *agbp,
>>      xfs_ino_t               parent,
> 
> OK, though I would have called it xfs_dialloc_ag_from_inobt() or
> something similar to indicate what tree it is walking....
> 

Fair enough, that's more specific at least.

>> +STATIC int
>> +xfs_dialloc_ag(
>> +    struct xfs_trans        *tp,
>> +    struct xfs_buf          *agbp,
>> +    xfs_ino_t               parent,
>> +    xfs_ino_t               *inop)
>> +{
> 
> Initial thought: Wow, long, long function. How can we split this up?
> 
>> +    struct xfs_mount                *mp = tp->t_mountp;
>> +    struct xfs_agi                  *agi = XFS_BUF_TO_AGI(agbp);
>> +    xfs_agnumber_t                  agno = be32_to_cpu(agi->agi_seqno);
>> +    xfs_agnumber_t                  pagno = XFS_INO_TO_AGNO(mp, parent);
>> +    xfs_agino_t                     pagino = XFS_INO_TO_AGINO(mp, parent);
>> +    struct xfs_perag                *pag;
>> +    struct xfs_btree_cur            *cur;
>> +    struct xfs_btree_cur            *tcur;
>> +    struct xfs_inobt_rec_incore     rec;
>> +    struct xfs_inobt_rec_incore     trec;
>> +    xfs_ino_t                       ino;
>> +    int                             error;
>> +    int                             offset;
>> +    int                             i, j;
>> +
>> +    if (!xfs_sb_version_hasfinobt(&mp->m_sb))
>> +            return xfs_dialloc_ag_slow(tp, agbp, parent, inop);
>> +
>> +    pag = xfs_perag_get(mp, agno);
>> +
>> +    /*
>> +     * If pagino is 0 (this is the root inode allocation) use newino.
>> +     * This must work because we've just allocated some.
>> +     */
>> +    if (!pagino)
>> +            pagino = be32_to_cpu(agi->agi_newino);
>> +
>> +    cur = xfs_inobt_init_cursor(mp, tp, agbp, agno, XFS_BTNUM_FINO);
>> +
>> +    error = xfs_check_agi_freecount(cur, agi);
>> +    if (error)
>> +            goto error_cur;
>> +
>> +    if (agno == pagno) {
>> +            /*
>> +             * We're in the same AG as the parent inode so allocate the
>> +             * closest inode to the parent.
>> +             */
>> +            error = xfs_inobt_lookup(cur, pagino, XFS_LOOKUP_LE, &i);
>> +            if (error)
>> +                    goto error_cur;
>> +            if (i == 1) {
>> +                    error = xfs_inobt_get_rec(cur, &rec, &i);
>> +                    if (error)
>> +                            goto error_cur;
>> +                    XFS_WANT_CORRUPTED_GOTO(i == 1, error_cur);
>> +
>> +                    /*
>> +                     * See if we've landed in the parent inode record. The
>> +                     * finobt only tracks chunks with at least one free
>> +                     * inode, so record existence is enough.
>> +                     */
>> +                    if (pagino >= rec.ir_startino &&
>> +                        pagino < (rec.ir_startino + XFS_INODES_PER_CHUNK))
>> +                            goto alloc_inode;
>> +            }
>> +
>> +            error = xfs_btree_dup_cursor(cur, &tcur);
>> +            if (error) 
>> +                    goto error_cur;
>> +
>> +            error = xfs_inobt_lookup(tcur, pagino, XFS_LOOKUP_GE, &j);
>> +            if (error)
>> +                    goto error_tcur;
>> +            if (j == 1) {
>> +                    error = xfs_inobt_get_rec(tcur, &trec, &j);
>> +                    if (error)
>> +                            goto error_tcur;
>> +                    XFS_WANT_CORRUPTED_GOTO(j == 1, error_tcur);
>> +            }
>> +
>> +            if (i == 1 && j == 1) {
>> +                    if ((pagino - rec.ir_startino + XFS_INODES_PER_CHUNK - 1) >
>> +                        (trec.ir_startino - pagino)) {
>> +                            rec = trec;
>> +                            xfs_btree_del_cursor(cur, XFS_BTREE_NOERROR);
>> +                            cur = tcur;
>> +                    } else {
>> +                            xfs_btree_del_cursor(tcur, XFS_BTREE_NOERROR);
>> +                    }
>> +            } else if (j == 1) {
>> +                    rec = trec;
>> +                    xfs_btree_del_cursor(cur, XFS_BTREE_NOERROR);
>> +                    cur = tcur;
>> +            } else {
>> +                    xfs_btree_del_cursor(tcur, XFS_BTREE_NOERROR);
>> +            }
> 
> That entire branch can be easily factored into:
> 
>               error = xfs_dialloc_ag_finobt_near(cur, pagino, &rec);
> 

Indeed. The function is already organized to facilitate this via the use
of the cursor/record pointers. A slight difference is the callee may
need to duplicate and replace the caller's cursor.

> /*
>  * Allocate as close to the target agino as possible
>  */
> static int
> xfs_dialloc_ag_finobt_near(
>       struct xfs_btree_cur    *cur,
>       xfs_agino_t             agino,
>       struct xfs_inobt_rec_incore *rec)
> {
>       struct xfs_btree_cur    *rcur;  /* cursor for right search */
>       struct xfs_inobt_rec_incore rrec; /* and the record used */
>       int             error;
>       int             i;
>       int             j;
> 
>       error = xfs_inobt_lookup(cur, agino, XFS_LOOKUP_LE, &i);
>       if (error)
>               return error;
> 
>       if (i == 1) {
>               error = xfs_inobt_get_rec(cur, rec, &i);
>               if (error)
>                       return error;
>               XFS_WANT_CORRUPTED_RETURN(i == 1);
> 
>               /*
>                * See if we've landed in the target inode record. The
>                * finobt only tracks chunks with at least one free
>                * inode, so record existence is enough.
>                */
>               if (agino >= rec->ir_startino &&
>                   agino < (rec->ir_startino + XFS_INODES_PER_CHUNK))
>                       return 0;
>       }
> 
>       error = xfs_btree_dup_cursor(cur, &rcur);
>       if (error)
>               return error;
> 
>       error = xfs_inobt_lookup(rcur, agino, XFS_LOOKUP_GE, &j);
>       if (error)
>               goto error_rcur;
>       if (j == 1) {
>               error = xfs_inobt_get_rec(rcur, &rrec, &j);
>               if (error)
>                       goto error_rcur;
>               XFS_WANT_CORRUPTED_GOTO(j == 1, error_rcur);
>       }
> 
>       if (i == 1 && j == 1) {
>               /*
>                * both left and right records are valid, so choose
>                * the closer inode chunk to the target.
>                */
>               if ((agino - rec->ir_startino + XFS_INODES_PER_CHUNK - 1) >
>                                       (rrec.ir_startino - agino)) {
>                       *rec = rrec;
>                       xfs_btree_del_cursor(cur, XFS_BTREE_NOERROR);
>                       cur = rcur;
>               } else {
>                       xfs_btree_del_cursor(rcur, XFS_BTREE_NOERROR);
>               }
>       } else if (j == 1) {
>               /* only right record is valid */
>               *rec = rrec;
>               xfs_btree_del_cursor(cur, XFS_BTREE_NOERROR);
>               cur = rcur;
>       } else {
>               /* found neither left nor right record */
>               xfs_btree_del_cursor(rcur, XFS_BTREE_NOERROR);
>       }
>       return 0;
> 
> error_rcur:
>       xfs_btree_del_cursor(rcur, XFS_BTREE_ERROR);
>       return error;
> }
> 
>> +    } else {
>> +            /*
>> +             * Different AG from the parent inode. Check the record for the
>> +             * most recently allocated inode.
>> +             */
>> +            if (agi->agi_newino != cpu_to_be32(NULLAGINO)) {
>> +                    error = xfs_inobt_lookup(cur, agi->agi_newino,
>> +                                             XFS_LOOKUP_EQ, &i);
>> +                    if (error)
>> +                            goto error_cur;
>> +                    if (i == 1) {
>> +                            error = xfs_inobt_get_rec(cur, &rec, &i);
>> +                            if (error)
>> +                                    goto error_cur;
>> +                            XFS_WANT_CORRUPTED_GOTO(i == 1, error_cur);
>> +                            goto alloc_inode;
>> +                    }
>> +            }
>> +
>> +            /*
>> +             * Allocate the first inode available in the AG.
>> +             */
>> +            error = xfs_inobt_lookup(cur, 0, XFS_LOOKUP_GE, &i);
>> +            if (error)
>> +                    goto error_cur;
>> +            XFS_WANT_CORRUPTED_GOTO(i == 1, error_cur);
>> +
>> +            error = xfs_inobt_get_rec(cur, &rec, &i);
>> +            if (error)
>> +                    goto error_cur;
>> +            XFS_WANT_CORRUPTED_GOTO(i == 1, error_cur);
> 
> And that can be factored in a similar manner in
> xfs_dialloc_ag_newino()....
> 

Ok.
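
For reference, the selection logic of that branch can be modelled in
isolation. This is only a sketch: the finobt is reduced to a sorted array of
chunk start inodes, and toy_finobt_newino() and TOY_NULLAGINO are hypothetical
stand-ins, not the eventual helper. An empty tree returns -1 here, where the
patch treats it as corruption.

```c
#include <assert.h>
#include <stddef.h>

#define TOY_NULLAGINO 0xffffffffu	/* stands in for NULLAGINO */

/*
 * Toy finobt: a sorted array of chunk start inode numbers, each chunk
 * having at least one free inode. Returns the index of the record to
 * allocate from, or -1 if there are no records at all.
 */
static int toy_finobt_newino(const unsigned int *starts, size_t nrecs,
			     unsigned int newino)
{
	size_t i;

	if (nrecs == 0)
		return -1;

	/* Prefer the record of the most recently allocated chunk. */
	if (newino != TOY_NULLAGINO) {
		for (i = 0; i < nrecs; i++) {
			if (starts[i] == newino)
				return (int)i;
		}
	}

	/* Otherwise fall back to the first record in the AG. */
	return 0;
}
```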

>> +    }
>> +
>> +alloc_inode:
>> +    offset = xfs_lowbit64(rec.ir_free);
>> +    ASSERT(offset >= 0);
>> +    ASSERT(offset < XFS_INODES_PER_CHUNK);
>> +    ASSERT((XFS_AGINO_TO_OFFSET(mp, rec.ir_startino) %
>> +                               XFS_INODES_PER_CHUNK) == 0);
>> +    ino = XFS_AGINO_TO_INO(mp, agno, rec.ir_startino + offset);
>> +
>> +    /*
>> +     * Modify or remove the finobt record.
>> +     */
>> +    rec.ir_free &= ~XFS_INOBT_MASK(offset);
>> +    rec.ir_freecount--;
>> +    if (rec.ir_freecount) 
>> +            error = xfs_inobt_update(cur, &rec);
>> +    else
>> +            error = xfs_btree_delete(cur, &i);
>> +    if (error)
>> +            goto error_cur;
>> +
>> +    /*
>> +     * Lookup and modify the equivalent record in the inobt.
>> +     */
>> +    tcur = xfs_inobt_init_cursor(mp, tp, agbp, agno, XFS_BTNUM_INO);
> 
> In case you hadn't guessed, I don't like the "tcur/trec" variables
> because they make me think "temporary", which they aren't. In this
> case it is the inobt cursor and record....
> 
> In fact, this whole segment could be factored into a function like
> xfs_dialloc_ag_inobt_update(), hence removing the second cursor from
> xfs_dialloc_ag() function altogether and that would clean a lot of
> the logic up....
> 

Sounds reasonable.
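
A standalone model of what such a helper would do to the record, assuming
toy types rather than the real xfs_inobt_rec_incore and cursor API (toy_rec,
toy_lowbit64(), toy_rec_alloc() and toy_inobt_update() are hypothetical
names): clear the allocated inode's bit in the free mask, drop the free
count, and cross-check the result against the already-updated finobt record.

```c
#include <assert.h>
#include <stdint.h>

struct toy_rec {
	uint64_t ir_free;	/* bit set == inode is free */
	int	 ir_freecount;
};

/* Index of the lowest set bit, or -1 if no bit is set. */
static int toy_lowbit64(uint64_t v)
{
	int n;

	if (!v)
		return -1;
	for (n = 0; !(v & 1); n++)
		v >>= 1;
	return n;
}

/* Mark the inode at 'offset' as allocated in one record. */
static void toy_rec_alloc(struct toy_rec *rec, int offset)
{
	assert(offset >= 0 && offset < 64);
	assert(rec->ir_free & ((uint64_t)1 << offset));
	rec->ir_free &= ~((uint64_t)1 << offset);
	rec->ir_freecount--;
}

/*
 * Apply the same allocation to the inobt copy of the record and verify
 * that it now matches the finobt copy. Returns 0 on success, -1 if the
 * two records disagree.
 */
static int toy_inobt_update(struct toy_rec *irec, const struct toy_rec *frec,
			    int offset)
{
	toy_rec_alloc(irec, offset);
	if (irec->ir_free != frec->ir_free ||
	    irec->ir_freecount != frec->ir_freecount)
		return -1;
	return 0;
}
```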

>> +
>> +    error = xfs_check_agi_freecount(tcur, agi);
>> +    if (error)
>> +            goto error_tcur;
>> +
>> +    error = xfs_inobt_lookup(tcur, rec.ir_startino, XFS_LOOKUP_EQ, &i);
>> +    if (error)
>> +            goto error_tcur;
>> +    XFS_WANT_CORRUPTED_GOTO(i == 1, error_tcur);
>> +
>> +    error = xfs_inobt_get_rec(tcur, &trec, &i);
>> +    if (error)
>> +            goto error_tcur;
>> +    XFS_WANT_CORRUPTED_GOTO(i == 1, error_tcur);
>> +    ASSERT((XFS_AGINO_TO_OFFSET(mp, trec.ir_startino) %
>> +                               XFS_INODES_PER_CHUNK) == 0);
>> +
>> +    trec.ir_free &= ~XFS_INOBT_MASK(offset);
>> +    trec.ir_freecount--;
>> +
>> +    XFS_WANT_CORRUPTED_GOTO((rec.ir_free == trec.ir_free) &&
>> +                            (rec.ir_freecount == trec.ir_freecount),
>> +                            error_tcur);
>> +
>> +    error = xfs_inobt_update(tcur, &trec);
>> +    if (error)
>> +            goto error_tcur;
>> +
>> +    /*
>> +     * Update the perag and superblock.
>> +     */
>> +    be32_add_cpu(&agi->agi_freecount, -1);
>> +    xfs_ialloc_log_agi(tp, agbp, XFS_AGI_FREECOUNT);
>> +    pag->pagi_freecount--;
>> +
>> +    xfs_trans_mod_sb(tp, XFS_TRANS_SB_IFREE, -1);
> 
> This will need to be done before you update the inobt, though, so
> you can run xfs_check_agi_freecount() in the new helper and it will
> come out correct....
> 

Right. I suppose this could update the finobt, perag & super, check the
agi against the finobt, then invoke the new helper (which will update
the inobt and check its cursor against the agi).
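
The ordering can be sanity-checked with a small model. This is a sketch
under simplifying assumptions: each tree is an array of per-record free
counts, the finobt keeps zero-count records instead of deleting them, and
toy_check_freecount() only mimics the role of xfs_check_agi_freecount() by
summing the records and comparing against the AGI counter. All toy_* names
are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Sum the per-record free counts and compare with the AGI counter. */
static int toy_check_freecount(const int *freecounts, size_t nrecs,
			       int agi_freecount)
{
	int sum = 0;
	size_t i;

	for (i = 0; i < nrecs; i++)
		sum += freecounts[i];
	return sum == agi_freecount ? 0 : -1;
}

/* Allocate one inode from record 'idx', following the order above. */
static int toy_alloc_one(int *finobt, int *inobt, size_t nrecs, size_t idx,
			 int *agi_freecount)
{
	/* 1. Update the finobt record. */
	finobt[idx]--;

	/* 2. Update the AGI counter (perag and superblock would follow). */
	(*agi_freecount)--;

	/* 3. The finobt now agrees with the AGI. */
	if (toy_check_freecount(finobt, nrecs, *agi_freecount))
		return -1;

	/* 4. Helper: update the inobt, then check it against the AGI. */
	inobt[idx]--;
	return toy_check_freecount(inobt, nrecs, *agi_freecount);
}
```

If step 2 were left until after step 4, the inobt check would compare the
updated tree against the stale AGI counter and fail.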

Brian

> Cheers,
> 
> Dave.
> 
