xfs
[Top] [All Lists]

Re: [PATCH] xfs: Remove the entries from the queue while waking them up.

To: Dave Chinner <david@xxxxxxxxxxxxx>
Subject: Re: [PATCH] xfs: Remove the entries from the queue while waking them up.
From: Chandra Seetharaman <sekharan@xxxxxxxxxx>
Date: Mon, 21 Nov 2011 17:01:52 -0600
Cc: XFS Mailing List <xfs@xxxxxxxxxxx>
In-reply-to: <20111119011450.GL7046@dastard>
Organization: IBM
References: <1321644054.2201.80.camel@xxxxxxxxxxxxxxxxxxxxxxxxxxxx> <20111119011450.GL7046@dastard>
Reply-to: sekharan@xxxxxxxxxx
Hi Dave,

Thanks for the review and comments.

On Sat, 2011-11-19 at 12:14 +1100, Dave Chinner wrote:
> On Fri, Nov 18, 2011 at 01:20:54PM -0600, Chandra Seetharaman wrote:
> > l_reserveq and l_writeq maintains a list of processes waiting to get log
> > space. Processes are supposed to get in the list when the amount of free
> > space available in the log is less than what they need.
> > 
> > When space becomes available current code, wakes up the processes, but
> > expect the processes to remove themselves from the queue.
> > 
> > Since the lock protecting the list is only acquired later by the woken
> > up process, there is a window of time were a new process that is looking
> > for space can wrongly get into the queue while there is enough space
> > available. 
> > 
> > Since there is enough space available, this process can never be woken
> > up, which leads to the hang of the process.
> 
> Excellent work, Chandra.

Thanks Dave.

Made all the changes (except one as noted below). Testing the new code
and will sent it soon.

:
:
:
<snip>

> > @@ -2550,8 +2552,7 @@ redo:
> >     free_bytes = xlog_space_left(log, &log->l_grant_reserve_head);
> >     if (free_bytes < need_bytes) {
> >             spin_lock(&log->l_grant_reserve_lock);
> > -           if (list_empty(&tic->t_queue))
> > -                   list_add_tail(&tic->t_queue, &log->l_reserveq);
> > +           list_add_tail(&tic->t_queue, &log->l_reserveq);
> >  
> >             trace_xfs_log_grant_sleep2(log, tic);
> >  
> 
> Ok, we now have the assumption that when we enter this code the
> ticket is not on any queue at all. Can you add an
> "ASSERT(list_empty(&tic->t_queue));" to the code before the above
> xlog_space_left() call? That way we'll know if we violate that
> assumption as potentially corrupt memory....

As Christoph pointed, we might have been woken up by some system event.
So, I changed the code a bit around here. Please comment on the new
code.


<snip>

<Prev in Thread] Current Thread [Next in Thread>