xfs
[Top] [All Lists]

Re: [PATCH] xfs: reinstate the iolock in xfs_readdir

To: Ben Myers <bpm@xxxxxxx>
Subject: Re: [PATCH] xfs: reinstate the iolock in xfs_readdir
From: Dave Chinner <david@xxxxxxxxxxxxx>
Date: Wed, 4 Dec 2013 09:55:20 +1100
Cc: xfs@xxxxxxxxxxx
Delivered-to: xfs@xxxxxxxxxxx
In-reply-to: <20131203212951.GP1935@xxxxxxx>
References: <20131203212951.GP1935@xxxxxxx>
User-agent: Mutt/1.5.21 (2010-09-15)
On Tue, Dec 03, 2013 at 03:29:51PM -0600, Ben Myers wrote:
> Although it was removed in commit 051e7cd44ab8, ilock needs to be taken in
> xfs_readdir because we might have to read the extent list in from disk.  This
> keeps other threads from reading from or writing to the extent list while it 
> is
> being read in and is still in a transitional state.
>
> This has been associated with "Access to block zero" messages on directories
> with large numbers of extents resulting from excessive filesytem 
> fragmentation,
> as well as extent list corruption.  Unfortunately no test case at this point.

That commit was from 2007, so this is not a result of a recent
change. As it is, access to directories is completely
serialised by the VFS i_mutex. We don't need the iolock to read in
the extent list for a directory, because by the time we get to
readdir it is already guaranteed to be read in by this code:

STATIC int
xfs_dir_open(
        struct inode    *inode,
        struct file     *file)
{
        struct xfs_inode *ip = XFS_I(inode);
        int             mode;
        int             error;

        error = xfs_file_open(inode, file);
        if (error)
                return error;

        /*
         * If there are any blocks, read-ahead block 0 as we're almost
         * certain to have the next operation be a read there.
         */
        mode = xfs_ilock_map_shared(ip);
        if (ip->i_d.di_nextents > 0)
                xfs_dir3_data_readahead(NULL, ip, 0, -1);
        xfs_iunlock(ip, mode);
        return 0;
}

Which means that for a directory in block/leaf/node form the open of
it prior to the first readdir() call will read the extent tree into
memory and it will be pinned in memory for at least the life of the
file descriptor that is open on the directory.

If the directory is not in block form at open time, then the
first modification that takes it to block form will correctly lock
and initialise the extent tree, and concurrent readers will block on
the i_mutex at the VFS, not on the XFS iolock.

Now, if something is accessing the directory without going through
the VFS, there might be an issue, but all read accesses are via a
file descriptor of some kind and so should always have the extent
list initialised and read in correctly before reads occur due to the
code in xfs_dir_open() and serialisation via the i_mutex at the VFS
level.

So I don't see why holding the iolock during readdir is necessary,
nor how not holding it could lead to corrupt extent trees being seen
during directory operations. I need more information to understand
the issue being seen....

Cheers,

Dave.
-- 
Dave Chinner
david@xxxxxxxxxxxxx

<Prev in Thread] Current Thread [Next in Thread>