On 5/7/14, 8:42 PM, Dave Chinner wrote:
> On Wed, May 07, 2014 at 06:29:15PM -0500, Eric Sandeen wrote:
>> The way discontiguous buffers are currently handled in
>> prefetch is by unlocking the prefetch tree and reading
>> them one at a time in pf_read_discontig(), inside the
>> normal loop of searching for buffers to read in a more
>> optimized fashion.
>> But by unlocking the tree, we allow other threads to come
>> in and find buffers which we've already stashed locally
>> on our bplist. If 2 threads think they own the same
>> set of buffers, they may both try to delete them from
>> the prefetch btree, and the second one to arrive will not
>> find it, resulting in:
>> fatal error -- prefetch corruption
>> Fix this by maintaining 2 lists; the original bplist,
>> and a new one containing only discontiguous buffers.
>> The original list can be seek-optimized as before,
>> and the discontiguous list can be read one by one
>> before we do the seek-optimized reads, after all of the
>> tree manipulation has been completed.
> Nice job finding the problem, Eric! It looks like your patch solves
> the problem, but after considering this approach for a while I think
> it's overkill. ;)
Well, that's how it goes. :)
> What the loop is trying to do is linearise all the IO and turn lots
> of small IO into a single large IO, so if we grab all the discontig
> buffers in the range, then do IO on them, then do the large IO, we
> are effectively seeking all over that range, including backwards.
> This is exactly the sort of problem the prefetch loop is trying to
mmmhm... OTOH, discontig buffers are ... fairly rare? And they do
have to be read sometime.
> So what I think is best is that we simply abort the pulling of new
> buffers off the list when we hit a discontiguous buffer. Leave the
> discontig buffer as the last on the list, and process the list as
> per normal. Remove all the remaining buffers from the btree, then
> drop the lock and do the pread64 call.
I kind of half considered something like that, but the optimizing
trims back num based on a few criteria, some involving the last
buffer in the bplist. So it's going to require a bit more awareness
> Then, check the last buffer on the bplist - if it's the discontig
> buffer (i.e. wasn't dropped during list processing), then issue the
> discontig buffer IO. It should at least start as either sequential I
> oor with a small forwards seek, so so shoul be as close to seek
> optimised as we can get for such buffers. Then it can be removed
> from the bplist, num decremented, the lock picked back up and the
> large buffer read in via pread64() can be sliced and diced
> i.e. much less code, no need for a separate list, and the seeks
> shoul dbe minimised as much as possible....
I'll give something like that a shot. Yeah, it felt a bit brute-force-y.