xfs
[Top] [All Lists]

Re: [PATCH] [RFC] Implement XFS_IOC_FSBULKSTAT_RANGE ioctl

To: Dave Chinner <david@xxxxxxxxxxxxx>
Subject: Re: [PATCH] [RFC] Implement XFS_IOC_FSBULKSTAT_RANGE ioctl
From: Carlos Maiolino <cmaiolino@xxxxxxxxxx>
Date: Fri, 6 Mar 2015 14:34:20 -0300
Cc: xfs@xxxxxxxxxxx
Delivered-to: xfs@xxxxxxxxxxx
In-reply-to: <20150303215557.GU4251@dastard>
Mail-followup-to: Dave Chinner <david@xxxxxxxxxxxxx>, xfs@xxxxxxxxxxx
References: <1425393678-4550-1-git-send-email-cmaiolino@xxxxxxxxxx> <20150303215557.GU4251@dastard>
User-agent: Mutt/1.5.23 (2014-03-12)
Hi Eric and Dave,

thank you for the reply, I completely forgot about our talk regarding refactor
xfs_bulkstat(), the idea to send the patch here was to get more comments into
it, so, looks like I got what I wanted :-)

I'll work on it and refactor xfs_bulkstat() and also implement the ideas you
gave.

Thanks


On Wed, Mar 04, 2015 at 08:55:57AM +1100, Dave Chinner wrote:
> On Tue, Mar 03, 2015 at 11:41:18AM -0300, Carlos Maiolino wrote:
> > This patch aims to implement a ranged bulkstat ioctl, which will make users 
> > able
> > to bulkstat inodes in a filesystem based on range instead on rely only in a
> > whole filesystem bulkstat.
> > 
> > This first implementation add a per AG bulkstat possibility, but it also 
> > adds
> > the necessary infrastructure to implement different kinds of ranged 
> > bulkstat,
> > like per block.
> > 
> > The patch is already working and I've been testing it for a while, so, I 
> > think
> > it's time to send this patch out and ask for comments on it.
> > 
> > The core of the implementation is very similar with what xfs_bulsktat() 
> > does by
> > now, which, instead bulkstat the whole filesystem, it start/stop on the
> > specified range.
> > 
> > As per Dave's suggesting, I added to  rgbulkreq structure (used to pass data
> > to/from userland), a padding, so we can use the same structure for another 
> > kind
> > of ranged bulkstats without the need to actually change the structure size.
> > 
> > Signed-off-by: Carlos Maiolino <cmaiolino@xxxxxxxxxx>
> > ---
> >  fs/xfs/libxfs/xfs_fs.h |  12 +++
> >  fs/xfs/xfs_ioctl.c     |  56 +++++++++++++
> >  fs/xfs/xfs_itable.c    | 210 
> > +++++++++++++++++++++++++++++++++++++++++++++++++
> >  fs/xfs/xfs_itable.h    |  16 ++++
> >  4 files changed, 294 insertions(+)
> > 
> > diff --git a/fs/xfs/libxfs/xfs_fs.h b/fs/xfs/libxfs/xfs_fs.h
> > index 18dc721..88665aa 100644
> > --- a/fs/xfs/libxfs/xfs_fs.h
> > +++ b/fs/xfs/libxfs/xfs_fs.h
> > @@ -334,6 +334,17 @@ typedef struct xfs_fsop_bulkreq {
> >     __s32           __user *ocount; /* output count pointer         */
> >  } xfs_fsop_bulkreq_t;
> >  
> > +typedef struct xfs_fsop_rgbulkreq {
> > +   __u64           __user *lastip; /* last inode # pointer         */
> > +   __s32           icount;         /* count of entries in buffer   */
> > +   void            __user *ubuffer;/* user buffer for inode desc.  */
> > +   __s32           __user *ocount; /* output count pointer         */
> > +   __u64           start;          /* start point of rgbulkstat    */
> > +   __u64           end;            /* end point of rgbulkstat      */
> > +   __u32           flags;          /* multipurpose flag field      */
> > +   __u32           pad[5];         /* reserved space               */
> > +} xfs_fsop_rgbulkreq_t;
> 
> No new typedefs. Also, call it struct xfs_fsop_bulkstat_range, so
> it's clear what ioctl it belongs with.
> 
> And, as eric mentioned, the flag:
> 
> #define XFS_FSOP_BSRANGE_AGNUMBER     1
> 
> also needs to be defined to tell the kernel that start/end are ag
> numbers.
> 
> > diff --git a/fs/xfs/xfs_ioctl.c b/fs/xfs/xfs_ioctl.c
> > index f7afb86..34a4de5 100644
> > --- a/fs/xfs/xfs_ioctl.c
> > +++ b/fs/xfs/xfs_ioctl.c
> > @@ -790,6 +790,59 @@ xfs_ioc_bulkstat(
> >  }
> >  
> >  STATIC int
> > +xfs_ioc_bulkstat_range(
> > +   xfs_mount_t             *mp,
> > +   unsigned int            cmd,
> > +   void                    __user *arg)
> > +{
> > +
> > +   xfs_fsop_rgbulkreq_t    rgbulkreq;
> 
> Again, better name needed. Even just "bsreq" would be fine, because
> it's just a bulkstat request structure....
> 
> > +   int                     count;  /* # of records returned */
> > +   xfs_ino_t               inlast; /* last inode number */
> > +   int                     done;
> > +   int                     error;
> > +
> > +   if (!capable(CAP_SYS_ADMIN))
> > +           return -EPERM;
> > +
> > +   if (XFS_FORCED_SHUTDOWN(mp))
> > +           return -EIO;
> > +
> > +   /* We do not support another ranged calls yet */
> > +   if (cmd != XFS_IOC_FSBULKSTAT_RANGE)
> > +           return -EINVAL;
> 
> Not necessary.
> 
> > +
> > +   if (copy_from_user(&rgbulkreq, arg, sizeof(xfs_fsop_rgbulkreq_t)))
> > +           return -EFAULT;
> > +
> > +   if (copy_from_user(&inlast, rgbulkreq.lastip, sizeof(__s64)))
> > +           return -EFAULT;
> > +
> > +   if ((count = rgbulkreq.icount) <= 0)
> > +           return -EINVAL;
> 
> Assignments need to be on their own line.
> 
> > +
> > +   if (rgbulkreq.ubuffer == NULL)
> > +           return -EINVAL;
> 
> Missing is validation of the start, end and flags fields.
> 
> > +
> > +   error = xfs_bulkstat_range(mp, &inlast, &count, xfs_bulkstat_one,
> > +                              sizeof(xfs_bstat_t), rgbulkreq.ubuffer,
> > +                              rgbulkreq.start, rgbulkreq.end, &done);
> 
> Why do you need a "done" parameter?  Also, just pass the bsreq
> structure, not the individual elements. I don't see any point in
> passing the formatter or size parameters at this point, either,
> because xfs_bulkstat_range() can easily define them.
> 
> > diff --git a/fs/xfs/xfs_itable.c b/fs/xfs/xfs_itable.c
> > index 82e3142..d6b27ee 100644
> > --- a/fs/xfs/xfs_itable.c
> > +++ b/fs/xfs/xfs_itable.c
> > @@ -530,6 +530,216 @@ del_cursor:
> >     return error;
> >  }
> >  
> > +/*
> > + * Return stat information in bulk (by-inode) for specified
> > + * filesystem range.
> > + */
> > +int
> > +xfs_bulkstat_range(
> > +   xfs_mount_t             *mp,    /* mount point for filesystem */
> > +   xfs_ino_t               *lastinop, /* last inode returned */
> > +   int                     *ubcountp, /* size of buffer/count returned */
> > +   bulkstat_one_pf         formatter, /* func that'd fill a single buf */
> > +   size_t                  statstruct_size, /* sizeof struct filling */
> > +   char                    __user *ubuffer, /* buffer with inode stats */
> > +   __u64                   start,  /* start bulkstat here */
> > +   __u64                   end,    /* end bulkstat here */
> > +   int                     *done) /* 1 if there are more stats to get */
> > +{
> > +   xfs_buf_t               *agbp;  /* agi header buffer */
> > +   xfs_agino_t             agino;  /* inode # in allocation group */
> > +   xfs_agnumber_t          agno;   /* allocation group number */
> > +   xfs_btree_cur_t         *cur;   /* btree cursor for ialloc btree */
> > +   size_t                  irbsize; /* size of irec buffer in bytes */
> > +   xfs_inobt_rec_incore_t  *irbuf; /* start of irec buffer */
> > +   int                     nirbuf; /* size of irbuf */
> > +   int                     ubcount; /* size of user's buffer */
> > +   struct xfs_bulkstat_agichunk ac;
> > +   int                     error = 0;
> > +
> > +   /*
> > +    * Get the last inode value, see if there is nothing to do.
> > +    */
> > +   if (*lastinop != 0)
> > +           agno = XFS_INO_TO_AGNO(mp, *lastinop);
> > +   else
> > +           agno = start;
> > +
> > +   agino = XFS_INO_TO_AGINO(mp, *lastinop);
> > +
> > +   /*
> > +    * If specified end is bigger than mp->m_sb.sb_agount, should we
> > +    * gracefully interpret as if there is nothing to do, or trigger
> > +    * an error?
> > +    */
> 
> We should have already errored out before we get here in this case.
> 
> > +   if (agno >= mp->m_sb.sb_agcount) {
> > +           *done = 1;
> > +           *ubcountp = 0;
> > +           return 0;
> > +   }
> 
> From this point onwards, you've pretty much copy-n-pasted the
> xfs_bulkstat() implementation. What should happen here is:
> 
>       for (agno = start; agno < end; agno++) {
>               error = xfs_bulkstat_range_ag(agno, ... );
>               if (error)
>                       break;
>       }
> 
> and all this:
> 
> > +   ubcount = *ubcountp;
> > +   ac.ac_ubuffer = &ubuffer;
> > +   ac.ac_ubleft = ubcount * statstruct_size; /* bytes */
> > +   ac.ac_ubelem = 0;
> > +
> > +   *ubcountp = 0;
> > +   *done = 0;
> > +
> > +   irbuf = kmem_zalloc_greedy(&irbsize, PAGE_SIZE, PAGE_SIZE * 4);
> > +   if (!irbuf)
> > +           return -ENOMEM;
> > +
> > +   nirbuf = irbsize / sizeof(*irbuf);
> > +
> > +   /*
> > +    * Loop over the allocation groups, starting from the last inode
> > +    * returned; - means start of the allocation group.
> > +    */
> > +   while (agno <= end) {
> > +           struct xfs_inobt_rec_incore     *irbp = irbuf;
> > +           struct xfs_inobt_rec_incore     *irbufend = irbuf + nirbuf;
> > +           bool                            end_of_ag = false;
> > +           int                             icount = 0;
> > +           int                             stat;
> > +
> > +           error = xfs_ialloc_read_agi(mp, NULL, agno, &agbp);
> > +           if (error)
> > +                   break;
> > +
> > +           /*
> > +            * Allocate and initialize a btree cursor for ialloc btree.
> > +            */
> > +           cur = xfs_inobt_init_cursor(mp, NULL, agbp, agno,
> > +                                       XFS_BTNUM_INO);
> > +
> > +           if (agino > 0) {
> > +                   /*
> > +                    * In the middle of an allocation group, we need to get
> > +                    * the remainder of the chunk we are in.
> > +                    */
> > +                   struct xfs_inobt_rec_incore     r;
> > +
> > +                   error = xfs_bulkstat_grab_ichunk(cur, agino, &icount, 
> > &r);
> > +                   if (error)
> > +                           goto del_cursor;
> > +                   if (icount) {
> > +                           irbp->ir_startino = r.ir_startino;
> > +                           irbp->ir_freecount = r.ir_freecount;
> > +                           irbp->ir_free = r.ir_free;
> > +                           irbp++;
> > +                   }
> > +                   /* Increment to the next record */
> > +                   error = xfs_btree_increment(cur, 0, &stat);
> > +           } else {
> > +                   /* Start of ag. Lookup the first inode chunk */
> > +                   error = xfs_inobt_lookup(cur, 0, XFS_LOOKUP_GE, &stat);
> > +           }
> > +           if (error || stat == 0) {
> > +                   end_of_ag = true;
> > +                   goto del_cursor;
> > +           }
> > +
> > +           /*
> > +            * Loop through inode btree records in this ag,
> > +            * until we run out of inodes or space in the buffer.
> > +            */
> > +           while (irbp < irbufend && icount < ubcount) {
> > +                   struct xfs_inobt_rec_incore     r;
> > +
> > +                   error = xfs_inobt_get_rec(cur, &r, &stat);
> > +                   if (error || stat == 0) {
> > +                           end_of_ag = true;
> > +                           goto del_cursor;
> > +                   }
> > +
> > +                   /*
> > +                    * If this chunk has any allocated inodes, save it.
> > +                    * Also start read-ahead now for this chunk.
> > +                    */
> > +                   if (r.ir_freecount < XFS_INODES_PER_CHUNK) {
> > +                           xfs_bulkstat_ichunk_ra(mp, agno, &r);
> > +                           irbp->ir_startino = r.ir_startino;
> > +                           irbp->ir_freecount = r.ir_freecount;
> > +                           irbp->ir_free = r.ir_free;
> > +                           irbp++;
> > +                           icount += XFS_INODES_PER_CHUNK - r.ir_freecount;
> > +                   }
> > +                   error = xfs_btree_increment(cur, 0, &stat);
> > +                   if (error || stat == 0) {
> > +                           end_of_ag = true;
> > +                           goto del_cursor;
> > +                   }
> > +                   cond_resched();
> > +           }
> > +
> > +   /*
> > +    * Drop the btree buffers and the agi buffer as we can't hold any of the
> > +    * locks these represent when calling iget. If there is a pending error,
> > +    * then we are done.
> > +    */
> > +del_cursor:
> > +           xfs_btree_del_cursor(cur, XFS_BTREE_NOERROR);
> > +           xfs_buf_relse(agbp);
> > +           if (error)
> > +                   break;
> > +           /*
> > +            * Now format all the good inodes into the user's buffer. The
> > +            * call to xfs_bulkstat_ag_ichunk() sets up the agino pointer
> > +            * for the next loop iteration.
> > +            */
> > +           irbufend = irbp;
> > +           for (irbp = irbuf;
> > +                irbp < irbufend && ac.ac_ubleft >= statstruct_size;
> > +                irbp++) {
> > +                   error = xfs_bulkstat_ag_ichunk(mp, agno, irbp, 
> > formatter,
> > +                                           statstruct_size, &ac, &agino);
> > +                   if (error)
> > +                           break;
> > +
> > +                   cond_resched();
> > +           }
> > +
> > +           /*
> > +            * If we have run out of space or had a formatting error, we
> > +            * are now done
> > +            */
> > +           if (ac.ac_ubleft < statstruct_size || error)
> > +                   break;
> > +
> > +           if (end_of_ag) {
> > +                   agno++;
> > +                   agino = 0;
> > +           }
> > +   }
> > +   /*
> > +    * Done, we are either out of specified inode range or space to
> > +    * put the data.
> > +    */
> > +   kmem_free(irbuf);
> > +   *ubcountp = ac.ac_ubelem;
> 
> Up to here is done by xfs_bulkstat_range_ag(). This factoring should
> be done to xfs_bulkstat() as the first patch in the series, then you
> can add the xfs_bulkstat_range() function that uses it, then finally
> introduce the ioctl that wraps around xfs_bulkstat_range().
> 
> This avoids all of the copy/paste, and ensures that the bulkstat
> APIs all use the same implementation which makes testing and
> maintenance of the code in future much simpler.
> 
> Cheers,
> 
> Dave.
> -- 
> Dave Chinner
> david@xxxxxxxxxxxxx

-- 
Carlos

<Prev in Thread] Current Thread [Next in Thread>