[PATCH 0/6] xfs: direct IO invalidation and related fixes

To: xfs@xxxxxxxxxxx
From: Dave Chinner <david@xxxxxxxxxxxxx>
Date: Thu, 21 Aug 2014 15:09:08 +1000
Cc: clm@xxxxxx

Hi folks,

This patch set started when I was testing Chris Mason's direct IO
read invalidation fix (patch 2 of the series). I hit a problem
testing it on a 1k block size filesystem, and that opened a can of
bugs. Luckily, the can of bugs fell into the fire, and so the bug
hunt wasn't long and drawn out.

In the end, the class of bugs that was uncovered is prevented from
occurring by the first patch (hmmm, not sure the title is accurate
anymore) which then means it is safe to fix the invalidation bugs
in the direct IO code (Chris fixed read, I fixed write). With these
issues fixed, we can trim the range of invalidation to just that of
the direct IO in progress and that finally works without causing
random regressions.

This then left fsx tripping over issues with collapse range calls
that Brian had already found the cause of, so this series adds his
patch to change the inode logging and my temporary workaround of
flushing the entire file before running the collapse range
operation.

With these 6 patches, the troublesome fsx configuration from
generic/263 goes from assert failing on the 1192nd operation to
running for over 60 million operations before tripping over another
collapse range issue. This was repeated on multiple test machines:
a ramdisk-based test got to 61,350,000 ops, a 2-CPU VM got to
63,700,000 ops and a 1-CPU VM got to 65,450,000 ops before
failing.

The series does result in generic/247 now occasionally tripping an
invalidation failure, but that test intentionally races direct IO
writes against mmap writes, so this is an expected failure when the
code is working correctly: page faults cannot be synchronised
against any other IO operation.

Hence I think this patch series fixes the root cause of another
long-standing bufferhead coherency issue that we had otherwise
covered up. Please test and review - I want to send this patchset to
Linus for 3.17-rc3 if possible...

-Dave.
