xfs
[Top] [All Lists]

Re: [PATCH] Introduce SEEK_DATA/SEEK_HOLE support to XFS V1

To: Dave Chinner <david@xxxxxxxxxxxxx>
Subject: Re: [PATCH] Introduce SEEK_DATA/SEEK_HOLE support to XFS V1
From: Jeff Liu <jeff.liu@xxxxxxxxxx>
Date: Sun, 20 Nov 2011 21:59:31 +0800
Cc: Christoph Hellwig <hch@xxxxxxxxxxxxx>, aelder@xxxxxxx, Chris Mason <chris.mason@xxxxxxxxxx>, xfs@xxxxxxxxxxx
In-reply-to: <20111120003031.GM7046@dastard>
Organization: Oracle
References: <4E887D7F.2010306@xxxxxxxxxx> <20111114102444.GA27791@xxxxxxxxxxxxx> <4EC10DE8.6030607@xxxxxxxxxx> <20111114125044.GA9802@xxxxxxxxxxxxx> <4EC768F5.4050904@xxxxxxxxxx> <4EC76AB9.9030604@xxxxxxxxxx> <20111120003031.GM7046@dastard>
Reply-to: jeff.liu@xxxxxxxxxx
User-agent: Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.9.2.18) Gecko/20110617 Thunderbird/3.1.11
Hi Dave,

Thanks for your further comments!

On 11/20/2011 08:30 AM, Dave Chinner wrote:

> On Sat, Nov 19, 2011 at 04:37:13PM +0800, Jeff Liu wrote:
>>
>> Signed-off-by: Jie Liu <jeff.liu@xxxxxxxxxx>
> 
> Christoph has commented on the code-related aspects of the patch, so
> I won't repeat that. I'll comment on structural/design issues
> instead.
> 
> Firstly, the patch splits the functionality arbitrarily over three
> different files, and I don't think that is necessary. There really
> is no reason at all for xfs_bmap.c to know anything aout
> SEEK_HOLE/SEEK_DATA semantics - that file is for extent manipulation
> and search functions. SEEK semantics should be entirely encoded into
> the function that deals with the seeking.

Yes, I should merge and isolate those functions in xfs_file.c.

> 
> Secondly, xfs_find_desired_extent() and xfs_seek_hole_data() should
> be one function, and named something like xfs_file_seek_extent().

Definitely!  and sorry for my poor english, xfs_file_seek_extent() is distinct 
in this case.

> 
> Finally, don't think using low level extent search functions like
> xfs_bmap_search_extents() is the right level to implement this
> functionality (see my comments about SEEK_HOLE/SEEK_DATA semantics
> in xfs-bmap.c above), especially as we already have functions for
> looking up holes and extents at given offsets.
> 
> That is, to find the next hole at or after after a given offset, we
> already have xfs_bmap_first_unused(). Indeed, this function already
> has the exact semantics that SEEK_HOLE requires. Simply put:
> 
>       case SEEK_HOLE:
>               fsb = XFS_B_TO_FSBT(mp, start_offset);
>               error = xfs_bmap_first_unused(NULL, ip, 1, &fsb,
>                                               XFS_DATA_FORK);
>               if (error)
>                       return -error;
> 
>               if (fsb <= XFS_B_TO_FSBT(mp, start_offset))
>                       return start_offset;
>               return XFS_FSB_TO_B(mp, fsb);

Thanks for pointing it out, I even don't know XFS has this convenient routine 
at that time. :(

> 
> 
> As to the data extent search, I'd prefer you to use xfs_bmapi_read()
> rather than xfs_bmap_search_extents() directly. I'd prefer that we
> return unwritten extents as holes rather than data from the initial
> implementation, and using the low level xfs_bmap_search_extents()
> makes that quite complex. However, we already have a function for
> handling that complexity: xfs_bmapi_read().
> 
> That is, xfs_bmapi_read() needs to be passed an array of two maps,
> one for the current offset, and one for the next extent type. This
> makes one call sufficient for most transitions. Done in a simple
> loop it will handle all conditions of hole->unwritten->hole....
> until it finds a data extent of EOF.
> 
> 
>       start_fsbno = XFS_B_TO_FSBT(mp, start_offset);
>       while (1) {
>               struct xfs_bmbt_irec    maps[2];
>               int                     nmaps = 2;
> 
>               count_fsb = XFS_B_TO_FSB(mp, XFS_MAXIOFFSET(mp));
>               error = xfs_bmapi_read(ip, fsbno, count_fsb,
>                                       &maps, &nmaps, XFS_BMAPI_ENTIRE);
> 
>               if (error)
>                       return -error;
>               if (!nmaps) {
>                       /* no extents at given offset, must be beyond EOF */
>                       return -ENXIO;
>               }
> 
>               switch (map[0].br_startblock) {
>               case DELAYSTARTBLOCK:
>                       /* landed in an in-memory data extent */
>                       return map[0].br_startoff;
> 
>               default:
>                       /* landed in an allocated extent */
>                       if (map[0].br_state == XFS_EXT_NORM) {
>                               /* a real data extent */
>                               return map[0].br_startoff;
>                       }
>                       /* Fall through to hole handling for unwritten extents 
> */
> 
>               case HOLESTARTBLOCK:
>                       /*
>                        * landed in a hole. If the next extent is a data
>                        * extent, then return the start of it, otherwise
>                        * we need to move the start offset and map more
>                        * blocks.
>                        */
>                       if (map[1].br_startblock == DELAYSTARTBLOCK ||
>                          ((map[1].br_startblock != HOLESTARTBLOCK &&
>                            map[1].br_state == XFS_EXT_NORM)))
>                               return map[1].br_startoff;
> 
>                       start_fsbno = map[1].br_startoff + map[1].br_blockcount;
>                       break;
>               }
> 
>               if (XFS_FSB_TO_B(mp, start_fsbno) > ip->i_size) {
>                       /* Beyond EOF now */
>                       return -ENXIO;
>               }
>       }
> 
> This can pretty much all be done in
> fs/xfs/xfs_file.c::xfs_file_seek_extent() because all the functions
> used by the above code are exported from xfs_bmap.c for external use
> - that solves the scattering problem and uses interfaces that we
> already know work in the intended manner.... ;)

Thanks again for the detailed info!
At first, I have spent a few hours to think it over using 
xfs_bmap_search_extents() or xfs_bmapi_read().
Indeed, it will be more complex to deal with the unwritten extents later if 
using xfs_bmap_search_extents().
This change will reflected in my next post.

> 
> BTW:
> 
>> +xfs_file_llseek(
>> +    struct file *file,
>> +    loff_t offset,
>> +    int origin)
>> +{
>> +    struct inode *inode = file->f_mapping->host;
>> +    int ret;
>> +
>> +    if (origin != SEEK_DATA && origin != SEEK_HOLE)
>> +            return generic_file_llseek(file, offset, origin);
>> +
>> +    mutex_lock(&inode->i_mutex);
>> +    switch (origin) {
> 
> We don't need the i_mutex to be held here. We only need to hold the
> ilock in shared mode for this operation to protect against extent
> list modifications (like unwritten extent conversion and
> truncation).

looks we only need to hold the ilock in shared mode in xfs_file_seek_extent().

> 
>> +int
>> +xfs_find_desired_extent(
>> +    struct inode            *inode,
>> +    loff_t                  *start,
>> +    u32                     type)
>> +{
>> +    xfs_inode_t             *ip = XFS_I(inode);
>> +    xfs_mount_t             *mp = ip->i_mount;
>> +    struct xfs_ifork        *ifp;
>> +    int                     lock;
>> +    int                     error;
>> +
>> +    if (ip->i_d.di_format != XFS_DINODE_FMT_EXTENTS &&
>> +        ip->i_d.di_format != XFS_DINODE_FMT_BTREE &&
>> +        ip->i_d.di_format != XFS_DINODE_FMT_LOCAL)
>> +            return XFS_ERROR(EINVAL);
>> +
>> +    xfs_ilock(ip, XFS_IOLOCK_SHARED);
>> +
>> +    /*
>> +     * Flush the delay alloc blocks. Even after flushing the inode,
>> +     * there can still be delalloc blocks on the inode beyond EOF
>> +     * due to speculative preallocation. These are not removed until
>> +     * the release function is called or the inode is inactivated.
>> +     * Hence we cannot assert here that ip->i_delayed_blks == 0.
>> +     */
>> +    if (ip->i_delayed_blks || ip->i_size > ip->i_d.di_size) {
>> +            error = xfs_flush_pages(ip, 0, -1, 0, FI_REMAPF);
>> +            if (error)
>> +                    goto out_unlock_iolock;
>> +    }
> 
> i.e. this IOLOCK and flush is completely unnecessary because we'll
> find delayed allocation extents in the extent lookup and can handle
> them just like real allocated extents....
> 
>> +    lock = xfs_ilock_map_shared(ip);
> 
> i.e. this is the only lock we need to take.

Yes, if we get rid of the the flush pages here, xfs_ilock() can be removed 
safely.

Thanks,
-Jeff

> 
> Cheers,
> 
> Dave.


<Prev in Thread] Current Thread [Next in Thread>