On Mon, Mar 16, 2009 at 05:21:24AM -0400, Christoph Hellwig wrote:
> On Sun, Mar 15, 2009 at 10:40:42PM +1100, Dave Chinner wrote:
> > Unwritten extent conversion can recurse back into the filesystem due
> > to memory allocation. Memory reclaim requires I/O completions to be
> > processed to allow the callers to make progress. If the I/O
> > completion workqueue thread is doing the recursion, then we have a
> > deadlock situation.
> > Move unwritten extent completion into its own workqueue so it
> > doesn't block I/O completions for normal delayed allocation or
> > overwrite data.
> Hmm. That was the original reason behind splitting the data from the
> xfsbufd queue. So maybe the split should be just unwritten vs the
> rest and three queues?
> Btw, do you have a testcase that can reproduce this?
No, I hit it a couple of times running xfsqa on a low memory UML
image - 256MB of RAM, IIRC - during one of the fstress tests. I got
enough information to determine this was the problem and it hasn't
shown up since. I think someone also posted a lockdep trace
on LKML a couple of months back...
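
For anyone trying to picture the deadlock, here is a rough userspace toy
model of it. It is only a sketch, not the XFS code: every name in it
(toy_wq, toy_run_one, toy_drain, ...) is made up for illustration, and it
assumes one worker per queue and that an unwritten conversion cannot get
its memory until every page under writeback has had its data completion
processed.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MAX_ITEMS 8

/* Hypothetical work types: plain data I/O completion vs. unwritten
 * extent conversion (which allocates memory and may wait on reclaim). */
enum work_kind { WORK_DATA, WORK_UNWRITTEN };

/* Toy single-worker queue; this is not the kernel workqueue API. */
struct toy_wq {
	enum work_kind items[MAX_ITEMS];
	size_t head, tail;
};

static void toy_queue(struct toy_wq *wq, enum work_kind kind)
{
	assert(wq->tail < MAX_ITEMS);
	wq->items[wq->tail++] = kind;
}

/* Try to run the item at the head of the queue. Returns true on progress. */
static bool toy_run_one(struct toy_wq *wq, int *pages_under_writeback)
{
	if (wq->head == wq->tail)
		return false;	/* nothing queued */
	if (wq->items[wq->head] == WORK_UNWRITTEN && *pages_under_writeback > 0)
		return false;	/* worker is stuck waiting for reclaim */
	if (wq->items[wq->head] == WORK_DATA)
		(*pages_under_writeback)--;	/* completion ends writeback */
	wq->head++;
	return true;
}

/* Run all queues until the work is done or nobody can make progress.
 * Returns true if everything completed, false if the model deadlocked. */
static bool toy_drain(struct toy_wq *wqs, size_t nr, int pages)
{
	bool progress = true;

	while (progress) {
		progress = false;
		for (size_t i = 0; i < nr; i++)
			if (toy_run_one(&wqs[i], &pages))
				progress = true;
	}
	for (size_t i = 0; i < nr; i++)
		if (wqs[i].head != wqs[i].tail)
			return false;
	return true;
}

/* One shared queue: the conversion sits in front of the data completion
 * that reclaim is waiting for. */
static bool shared_queue_completes(void)
{
	struct toy_wq wq[1] = {0};

	toy_queue(&wq[0], WORK_UNWRITTEN);
	toy_queue(&wq[0], WORK_DATA);
	return toy_drain(wq, 1, 1);
}

/* Split queues: data completions have their own worker. */
static bool split_queues_complete(void)
{
	struct toy_wq wq[2] = {0};

	toy_queue(&wq[0], WORK_DATA);
	toy_queue(&wq[1], WORK_UNWRITTEN);
	return toy_drain(wq, 2, 1);
}
```

In this model shared_queue_completes() returns false, because the single
worker never gets past the conversion, while split_queues_complete()
returns true, because the data completion runs on its own queue and lets
the conversion finish. The real interaction with reclaim is more involved
than this, of course.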