On 5/8/14, 8:17 PM, Dave Chinner wrote:
> From: Dave Chinner <dchinner@xxxxxxxxxx>
>
> The way discontiguous buffers are currently handled in prefetch is
> by unlocking the prefetch tree and reading them one at a time in
> pf_read_discontig(), inside the normal loop of searching for buffers
> to read in a more optimized fashion.
>
> But by unlocking the tree, we allow other threads to come in and
> find buffers which we've already stashed locally on our bplist[].
> If 2 threads think they own the same set of buffers, they may both
> try to delete them from the prefetch btree, and the second one to
> arrive will not find it, resulting in:
>
> fatal error -- prefetch corruption
>
> To fix this, simply abort the buffer gathering loop when we come
> across a discontiguous buffer, process the gathered list as per
> normal, and then after running the large optimised read, check to
> see if the last buffer on the list is a discontiguous buffer.
> If it is discontiguous, then issue the discontiguous buffer read
> while the locks are not held. We only ever have one discontiguous
> buffer per read loop, so it is safe just to check the last buffer in
> the list.
>
> The fix is loosely based on a patch provided by Eric Sandeen, who
> did all the hard work of finding the bug and demonstrating how to
> fix it.
Ok, this makes sense to me. The comment above the discontig read
seems a bit confusing; you say it's safe to read while unlocked,
but I wouldn't have expected it not to be - the lock is just for
btree manipulation, and that's not being done. So I think the
comment adds a little confusion rather than clarification.
But the code looks fine.
Reviewed-by: Eric Sandeen <sandeen@xxxxxxxxxx>
> Reported-by: Eric Sandeen <sandeen@xxxxxxxxxx>
> Signed-off-by: Dave Chinner <dchinner@xxxxxxxxxx>
> ---
> repair/prefetch.c | 53 +++++++++++++++++++++++++++--------------------------
> 1 file changed, 27 insertions(+), 26 deletions(-)
>
> diff --git a/repair/prefetch.c b/repair/prefetch.c
> index 65fedf5..e269f1f 100644
> --- a/repair/prefetch.c
> +++ b/repair/prefetch.c
> @@ -444,27 +444,6 @@ pf_read_inode_dirs(
> }
>
> /*
> - * Discontiguous buffers require multiple IOs to fill, so we can't use any
> - * linearising, hole filling algorithms on them to avoid seeks. Just remove them
> - * for the prefetch queue and read them straight into the cache and release
> - * them.
> - */
> -static void
> -pf_read_discontig(
> - struct prefetch_args *args,
> - struct xfs_buf *bp)
> -{
> - if (!btree_delete(args->io_queue, XFS_DADDR_TO_FSB(mp, bp->b_bn)))
> - do_error(_("prefetch corruption\n"));
> -
> - pthread_mutex_unlock(&args->lock);
> - libxfs_readbufr_map(mp->m_ddev_targp, bp, 0);
> - bp->b_flags |= LIBXFS_B_UNCHECKED;
> - libxfs_putbuf(bp);
> - pthread_mutex_lock(&args->lock);
> -}
> -
> -/*
> * pf_batch_read must be called with the lock locked.
> */
> static void
> @@ -496,13 +475,19 @@ pf_batch_read(
> }
> while (bplist[num] && num < MAX_BUFS && fsbno < max_fsbno) {
> /*
> - * Handle discontiguous buffers outside the seek
> - * optimised IO loop below.
> + * Discontiguous buffers need special handling, so stop
> + * gathering new buffers and process the list and this
> + * discontiguous buffer immediately. This avoids the
> + * complexity of keeping a separate discontiguous buffer
> + * list and seeking back over ranges we've already done
> + * optimised reads for.
> */
> if ((bplist[num]->b_flags & LIBXFS_B_DISCONTIG)) {
> - pf_read_discontig(args, bplist[num]);
> - bplist[num] = NULL;
> - } else if (which != PF_META_ONLY ||
> + num++;
> + break;
> + }
> +
> + if (which != PF_META_ONLY ||
> !B_IS_INODE(XFS_BUF_PRIORITY(bplist[num])))
> num++;
> if (num == MAX_BUFS)
> @@ -570,6 +555,22 @@ pf_batch_read(
> * now read the data and put into the xfs_but_t's
> */
> len = pread64(mp_fd, buf, (int)(last_off - first_off),
> first_off);
> +
> + /*
> + * Check the last buffer on the list to see if we need to
> + * process a discontiguous buffer. The gather loop guarantees
> + * we only ever have a single discontig buffer on the list,
> + * and that it is the last buffer, so it is safe to do this
> + * check and read here like this while we aren't holding any
> + * locks.
> + */
> + if ((bplist[num - 1]->b_flags & LIBXFS_B_DISCONTIG)) {
> + libxfs_readbufr_map(mp->m_ddev_targp, bplist[num - 1],
> + 0);
> + bplist[num - 1]->b_flags |= LIBXFS_B_UNCHECKED;
> + libxfs_putbuf(bplist[num - 1]);
> + num--;
> + }
> +
> if (len > 0) {
> /*
> * go through the xfs_buf_t list copying from the
>