On Thu, Sep 23, 2010 at 01:58:32PM -0500, Alex Elder wrote:
> On Thu, 2010-09-23 at 12:27 +1000, Dave Chinner wrote:
> > From: Dave Chinner <dchinner@xxxxxxxxxx>
> > I have been seeing relatively frequent pauses in transaction throughput up to
> > 30s long under heavy parallel workloads. The only thing that seemed strange
> > about them was that the xfsaild was active during the pauses, but making no
> > progress. It was running exactly 20 times a second (on the 50ms no-progress
> > backoff), and the number of pushbuf events was constant across this time as
> > well. IOWs, the xfsaild appeared to be stuck on buffers that it could not push
> > out.
> . . .
> If you like I can take this patch directly (i.e., not wait for you to
> send a separate pull request). It fixes a real bug but since delayed
> logging is still an experimental feature I am not inclined to send it to
> Linus at this point in the cycle. Let me know if you disagree.
I think it needs to go to Linus as well, and back to 2.6.35.y, as it can
result in recovery silently corrupting the filesystem if a
checkpoint larger than half the log is present in the log during
recovery. I don't think the experimental status of the code makes
any difference, especially as we've already pushed checkpoint/
recovery corruption fixes into this release....
I'm adding it to the start of the metadata scale patchset branch
right now, which I'll probably be sending a pull request out for