Re: [xfs-masters] xfs deadlock in stable kernel 3.0.4

To: Dave Chinner <david@xxxxxxxxxxxxx>
Subject: Re: [xfs-masters] xfs deadlock in stable kernel 3.0.4
From: Christoph Hellwig <hch@xxxxxxxxxxxxx>
Date: Thu, 22 Sep 2011 18:01:51 -0400
Cc: Christoph Hellwig <hch@xxxxxxxxxxxxx>, Stefan Priebe - Profihost AG <s.priebe@xxxxxxxxxxxx>, "xfs@xxxxxxxxxxx" <xfs@xxxxxxxxxxx>
In-reply-to: <20110922214956.GX15688@dastard>
References: <20110920172455.GA30757@xxxxxxxxxxxxx> <4E78CEFD.9030603@xxxxxxxxxxxx> <20110920223047.GA13758@xxxxxxxxxxxxx> <20110921021133.GM15688@dastard> <4E7994D3.5020103@xxxxxxxxxxxx> <20110921114237.GP15688@dastard> <20110921122649.GA16602@xxxxxxxxxxxxx> <20110921230718.GS15688@dastard> <20110922141457.GA11929@xxxxxxxxxxxxx> <20110922214956.GX15688@dastard>
User-agent: Mutt/1.5.21 (2010-09-15)
On Fri, Sep 23, 2011 at 07:49:56AM +1000, Dave Chinner wrote:
> On Thu, Sep 22, 2011 at 10:14:57AM -0400, Christoph Hellwig wrote:
> >         By default, a wq guarantees non-reentrance only on the same
> >     CPU.  A work item may not be executed concurrently on the same
> >     CPU by multiple workers but is allowed to be executed
> >     concurrently on multiple CPUs.  This flag makes sure
> >     non-reentrance is enforced across all CPUs.  Work items queued
> >     to a non-reentrant wq are guaranteed to be executed by at most
> >     one worker system-wide at any given time.
> > 
> > So this still seems to be preferable for the ail workqueue, and should be
> > able to replace the XFS_AIL_PUSHING_BIT protections.
> No, we can't. WQ_NON_REENTRANT only protects against concurrency on
> the same CPU, not across all CPUs - it still allows concurrent
> per-CPU work processing on the same work item.

Non-concurrency for a given work_struct on the same CPU is the default;
WQ_NON_REENTRANT extends that to the item not being executed concurrently
at all.  Check the documentation above again, or the code: just look
for the only occurrence of WQ_NON_REENTRANT in kernel/workqueue.c and
the surrounding code (e.g. find_worker_executing_work and the
current_work field in struct worker).

> However, we want only a *single* AIL worker instance executing per
> filesystem, not per-cpu per filesystem. Concurrent per-filesystem
> workers will simply bash on the AIL lock trying to walk the AIL at
> the same time, and this is precisely the issue the single AIL worker
> setup is avoiding. The XFS_AIL_PUSHING_BIT is what enforces the
> single per-filesystem push worker running at any time.

I think that's exactly what WQ_NON_REENTRANT is intended for.
