[Top] [All Lists]

Re: [PATCH 4/4] xfs: fix the logspace waiting algorithm

To: Christoph Hellwig <hch@xxxxxxxxxxxxx>
Subject: Re: [PATCH 4/4] xfs: fix the logspace waiting algorithm
From: Dave Chinner <david@xxxxxxxxxxxxx>
Date: Mon, 5 Dec 2011 13:53:16 +1100
Cc: Ben Myers <bpm@xxxxxxx>, Dave Chinner <dchinner@xxxxxxxxxx>, xfs@xxxxxxxxxxx
In-reply-to: <20111202115141.GA21643@xxxxxxxxxxxxx>
References: <20111128081732.350228200@xxxxxxxxxxxxxxxxxxxxxx> <20111128081925.981681380@xxxxxxxxxxxxxxxxxxxxxx> <20111130235641.GX29840@xxxxxxx> <20111201072149.GA1319@xxxxxxxxxxxxx> <20111201195128.GZ29840@xxxxxxx> <20111202115141.GA21643@xxxxxxxxxxxxx>
User-agent: Mutt/1.5.21 (2010-09-15)
On Fri, Dec 02, 2011 at 06:51:41AM -0500, Christoph Hellwig wrote:
> On Thu, Dec 01, 2011 at 01:51:28PM -0600, Ben Myers wrote:
> > Process A reads from the grant reserve head at 2641 (and there currently is 
> > enough space)
> > Process B wakes at either 2646 or 2650, in xlog_reserveq_wait, locks, and 
> > reads from the grant reserve head (and currently there is enough space)
> > Process B removes itself from the list 
> > Process A reads from the reservq list and finds it to be empty
> > Process A finds that there was enough space at 2646
> > Process B returns from xlog_reserveq_wait, unlocks, grants space at 2656, 
> > returns
> > Process A grants log space at 2656, and returns
> > 
> > AFAICS there is nothing that prevents these guys from granting the same
> > space when you approach free_bytes >= need_bytes concurrently. 
> > 
> > This lockless stuff is always a mind job for me.  I'll take another look at
> > some of the other aspects of the patch.  Even if it doesn't resolve my
> > question about the lockless issue, it seems to resolve Chandra's race.
> Indeed, I think we have this race.  Then again I I think we had
> exactly the same one before, too. 

What the current code does is this (ignoring this patch):

        check free space
        if (no free space) {
                add to list;
                                        space becomes available
                                        wake up
                goto again;

        remove from list
        grant space
                do {
                        grant calc
                } while (head does not move)

The problem being talked about, as I understand it, is that free
space can change between the check and the grant, resulting in
process B also seeing free space because the space process A thinks
it has but has not yet accounted for it.

It seems to me that this can happen regardless of whether we have a
queue of waiters or not, so the change to how the waiters are woken
in Christoph's patch is irrelevant.

> The only way to fix it would be
> to do a sort of double cmpxchg that only moves the grant head forward
> if it's still in available space vs the tails lsn.

Actually, it's quite simple to fix simply by having the free space
check return the grant head value at the time of the check and then
using that as the old value in the cmpxchg loop for updating the
grant head. That is:

        free = xlog_space_left(..... &old_head);

        if (xlog_grant_add_space(log, l_grant_reserve_head, need_bytes, 
                goto again;

and xlog_grant_log_space() returns 1 if the old_head does not match
the current head so we go araound again.

I don't think we don't need to do this for the write head update in
xlog_grant_log_space() because once we have been granted reserve
head space we are guarantee to have write head space.

We do, however, need to make the same change for the write head in
xlog_regrant_write_log_space() as that has potential for the same
race when two permanent transactions commit and race to re-reserve
the write space that was just consumed by the committed transaction.

And to retain existing behaviour for other callers, perhaps an
old_head value of -1 (NULL LSN value) can be passed to indicate that
current head value should be used and to loop until the change is

Does that make sense? It retains the same lockless behaviour, but
closes the race condition of the grant head changing from the time
of read to the time of actual modification....


Dave Chinner

<Prev in Thread] Current Thread [Next in Thread>