[ cut to just the important points ]
On Thu, Aug 04, 2016 at 06:40:42PM +0000, Kani, Toshimitsu wrote:
> On Tue, 2016-08-02 at 10:21 +1000, Dave Chinner wrote:
> > If I drop the fsync from the
> > buffered IO path, bandwidth remains the same but runtime drops to
> > 0.55-0.57s, so again the buffered IO write path is faster than DAX
> > while doing more work.
>
> I do not think the test results are relevant on this point because both
> buffered and dax write() paths use uncached copy to avoid clflush. The
> buffered path uses cached copy to the page cache and then uses uncached
> copy to PMEM via writeback. Therefore, the buffered IO path also benefits
> from using uncached copy to avoid clflush.

Except that I tested without the writeback path for buffered IO, so
there was a direct comparison for single cached copy vs single
uncached copy.

The undeniable fact is that a write() with a single cached copy with
all the overhead of dirty page tracking is /faster/ than a much
shorter, simpler IO path that uses an uncached copy. That's what the
numbers say....

> Cached copy (rep movq) is slightly faster than uncached copy,

Not according to Boaz - he claims that uncached is 20% faster than
cached. How about you two get together, do some benchmarking and get
your story straight, eh?

> and should be
> used for writing to the page cache. For writing to PMEM, however, additional
> clflush can be expensive, and allocating cachelines for PMEM leads to
> evicting the application's cachelines.

I keep hearing people tell me why cached copies are slower, but
no-one is providing numbers to back up their statements. The only
numbers we have are the ones I've published showing that a cached copy
w/ full dirty tracking is faster than an uncached copy w/o dirty
tracking.

Show me the numbers that back up your statements, then I'll listen
to you.
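
[ For anyone wanting to produce such numbers: the two copy strategies
argued about above can be compared in userspace with something like the
sketch below. It is only an illustration, not the benchmark behind any
figure in this thread. It assumes x86-64 with SSE2, runs against
ordinary DRAM rather than PMEM, and goes nowhere near the kernel
write() path, so it says nothing about dirty page tracking. The names
copy_cached_flush(), copy_uncached() and time_copy() are made up for
this example. ]

```c
#define _POSIX_C_SOURCE 200809L  /* clock_gettime(), posix_memalign() */
#include <assert.h>
#include <emmintrin.h>           /* SSE2: _mm_stream_si128, _mm_clflush */
#include <stdint.h>
#include <stdlib.h>
#include <string.h>
#include <time.h>

#define CACHELINE 64             /* assumed cacheline size in bytes */

typedef void (*copy_fn)(void *dst, const void *src, size_t len);

/*
 * Cached copy, then flush every destination cacheline.
 * Assumes dst is cacheline aligned.
 */
static void copy_cached_flush(void *dst, const void *src, size_t len)
{
    memcpy(dst, src, len);
    for (size_t off = 0; off < len; off += CACHELINE)
        _mm_clflush((const char *)dst + off);
    _mm_sfence();                /* order the flushes before returning */
}

/*
 * Uncached copy: non-temporal stores that bypass the cache.
 * Requires 16-byte aligned buffers and len a multiple of 16.
 */
static void copy_uncached(void *dst, const void *src, size_t len)
{
    assert(((uintptr_t)dst & 15) == 0);
    assert(((uintptr_t)src & 15) == 0);
    assert((len & 15) == 0);

    __m128i *d = dst;
    const __m128i *s = src;

    for (size_t i = 0; i < len / 16; i++)
        _mm_stream_si128(&d[i], _mm_load_si128(&s[i]));
    _mm_sfence();                /* make the streaming stores visible */
}

/* Wall-clock seconds taken by iters back-to-back calls of fn. */
static double time_copy(copy_fn fn, void *dst, const void *src,
                        size_t len, int iters)
{
    struct timespec t0, t1;

    clock_gettime(CLOCK_MONOTONIC, &t0);
    for (int i = 0; i < iters; i++)
        fn(dst, src, len);
    clock_gettime(CLOCK_MONOTONIC, &t1);

    return (double)(t1.tv_sec - t0.tv_sec) +
           (double)(t1.tv_nsec - t0.tv_nsec) / 1e9;
}
```

[ Allocate both buffers with posix_memalign(&buf, 64, len), call
time_copy() once per strategy with the same len and iters, and compare
the two results. Run it several times and vary len across the cache
sizes before drawing any conclusion. ]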
-Dave.
--
Dave Chinner
david@xxxxxxxxxxxxx