
Re: [PATCH 3/4] xfs: revert to using a kthread for AIL pushing

To: Markus Trippelsdorf <markus@xxxxxxxxxxxxxxx>
Subject: Re: [PATCH 3/4] xfs: revert to using a kthread for AIL pushing
From: Stefan Priebe - Profihost AG <s.priebe@xxxxxxxxxxxx>
Date: Wed, 19 Oct 2011 13:16:23 +0200
Cc: Dave Chinner <david@xxxxxxxxxxxxx>, Christoph Hellwig <hch@xxxxxxxxxxxxx>, Tejun Heo <tj@xxxxxxxxxx>, xfs@xxxxxxxxxxx
In-reply-to: <20111010055546.GA1641@xxxxxxxxxxxxxx>
References: <20111006183257.036884724@xxxxxxxxxxxxxxxxxxxxxx> <20111006183549.770414484@xxxxxxxxxxxxxxxxxxxxxx> <20111010014509.GT3159@dastard> <20111010055546.GA1641@xxxxxxxxxxxxxx>
User-agent: Mozilla/5.0 (X11; U; Linux x86_64; en-US; rv: Gecko/20110831 Thunderbird/3.1.13
On 2011.10.10 at 12:45 +1100, Dave Chinner wrote:
On Thu, Oct 06, 2011 at 02:33:00PM -0400, Christoph Hellwig wrote:
Currently we have a few issues with the way the workqueue code is used to
implement AIL pushing:

  - it accidentally uses the same workqueue as the syncer action, and thus
    can be prevented from running if there are enough sync actions active
    in the system.
  - it doesn't use the HIGHPRI flag, so its work items are not queued at the
    head of the workqueue

At this point I'm not confident enough in getting all the workqueue flags and
tweaks right to provide a perfectly reliable execution context for AIL
pushing, which is the most important piece of XFS for making forward progress
when the log fills.

Revert to using a kthread per filesystem, which fixes all of the above issues
at the cost of keeping a task struct and stack around for each mounted
filesystem.  In addition, this gives us much better ways to diagnose
any issues involving hung AIL pushing, and it removes a small amount of code.
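To make the "one dedicated thread per filesystem" design concrete, here is a minimal userspace sketch in Python. It is an analogy, not XFS code: the class `AilPusher` and its methods `push()`, `wait_pushed()` and `stop()` are hypothetical names, and log sequence numbers are modelled as plain integers. The real AIL pusher is an in-kernel thread created through the kthread API (e.g. `kthread_run()` / `kthread_stop()`). The point illustrated is that the pusher owns its own thread and only ever waits for its own target, so it cannot be stuck behind unrelated sync work the way a shared workqueue item can.

```python
import threading


class AilPusher:
    """Hypothetical userspace analogue of a per-filesystem AIL push thread."""

    def __init__(self, name):
        self._cond = threading.Condition()
        self._target = 0        # highest LSN someone has asked us to push to
        self._pushed = 0        # highest LSN pushed so far
        self._stopping = False
        self.log = []           # (start, end) ranges pushed, for illustration
        # One dedicated thread per "filesystem": it shares no queue with
        # other work, so unrelated jobs cannot delay it.
        self._thread = threading.Thread(
            target=self._run, name=f"ail-pusher/{name}", daemon=True)
        self._thread.start()

    def push(self, target):
        """Raise the push target (never lower it) and wake the pusher."""
        with self._cond:
            if target > self._target:
                self._target = target
                # notify_all: waiters in wait_pushed() share this condition,
                # so a single notify() could wake the wrong thread.
                self._cond.notify_all()

    def wait_pushed(self, target, timeout=None):
        """Block until everything up to `target` has been pushed."""
        with self._cond:
            return self._cond.wait_for(lambda: self._pushed >= target, timeout)

    def stop(self):
        with self._cond:
            self._stopping = True
            self._cond.notify_all()
        self._thread.join()

    def _run(self):
        while True:
            with self._cond:
                while not self._stopping and self._pushed >= self._target:
                    self._cond.wait()
                if self._stopping:
                    return
                start, end = self._pushed, self._target
            # Do the (simulated) writeback outside the lock so that callers
            # of push() never block on it.
            self.log.append((start, end))
            with self._cond:
                self._pushed = end
                self._cond.notify_all()


demo = AilPusher("sda1")
demo.push(100)
demo.wait_pushed(100, timeout=5)
demo.push(250)
demo.wait_pushed(250, timeout=5)
demo.stop()
print(demo.log)  # [(0, 100), (100, 250)]
```

The trade-off named in the patch description shows up directly here: every `AilPusher` instance costs a thread (in the kernel, a task struct and stack), but its progress depends only on its own condition variable.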

Signed-off-by: Christoph Hellwig <hch@xxxxxx>
Reported-by: Stefan Priebe <s.priebe@xxxxxxxxxxxx>
Tested-by: Stefan Priebe <s.priebe@xxxxxxxxxxxx>

I'd much prefer to fix the problems with the workqueue usage rather than
revert to using a thread, but seeing as I cannot reproduce the
hangs, I can't really track down whatever problem there is. So,
a bit reluctantly:

Any news on this problem? What will happen with the next 3.0.x long-term stable kernel? How do you plan to proceed with this bug?

