Hi Dave,
Thanks for the review and comments.
On Sat, 2011-11-19 at 12:14 +1100, Dave Chinner wrote:
> On Fri, Nov 18, 2011 at 01:20:54PM -0600, Chandra Seetharaman wrote:
> > l_reserveq and l_writeq maintains a list of processes waiting to get log
> > space. Processes are supposed to get in the list when the amount of free
> > space available in the log is less than what they need.
> >
> > When space becomes available current code, wakes up the processes, but
> > expect the processes to remove themselves from the queue.
> >
> > Since the lock protecting the list is only acquired later by the woken
> > up process, there is a window of time were a new process that is looking
> > for space can wrongly get into the queue while there is enough space
> > available.
> >
> > Since there is enough space available, this process can never be woken
> > up, which leads to the hang of the process.
>
> Excellent work, Chandra.
Thanks Dave.
Made all the changes (except one as noted below). Testing the new code
and will sent it soon.
:
:
:
<snip>
> > @@ -2550,8 +2552,7 @@ redo:
> > free_bytes = xlog_space_left(log, &log->l_grant_reserve_head);
> > if (free_bytes < need_bytes) {
> > spin_lock(&log->l_grant_reserve_lock);
> > - if (list_empty(&tic->t_queue))
> > - list_add_tail(&tic->t_queue, &log->l_reserveq);
> > + list_add_tail(&tic->t_queue, &log->l_reserveq);
> >
> > trace_xfs_log_grant_sleep2(log, tic);
> >
>
> Ok, we now have the assumption that when we enter this code the
> ticket is not on any queue at all. Can you add an
> "ASSERT(list_empty(&tic->t_queue));" to the code before the above
> xlog_space_left() call? That way we'll know if we violate that
> assumption as potentially corrupt memory....
As Christoph pointed, we might have been woken up by some system event.
So, I changed the code a bit around here. Please comment on the new
code.
<snip>
|