On Tue, Jan 11, 2011 at 04:13:42PM -0500, Lawrence Greenfield wrote:
> On Tue, Nov 9, 2010 at 6:40 PM, Dave Chinner <david@xxxxxxxxxxxxx> wrote:
> > The historical reason for such behaviour existing in XFS was that in
> > 1997 the CPU and IO latency cost of unwritten extent conversion was
> > significant,
> >> (Take for example a trusted cluster filesystem backend that checks the
> >> object checksum before returning any data to the user; and if the
> >> check fails the cluster file system will try to use some other replica
> >> stored on some other server.)
> > IOWs, all they want to do is avoid the unwritten extent conversion
> > overhead. Time has shown that a bad security/performance tradeoff
> > decision was made 13 years ago in XFS, so I see little reason to
> > repeat it for ext4 today....
> I'd make use of FALLOC_FL_EXPOSE_OLD_DATA. It's not the CPU overhead
> of extent conversion. It's that extent conversion causes more metadata
> operations than what you'd have otherwise,
Yes, that's the "IO latency" part of the cost I mentioned above.
> which means systems that
> want to use O_DIRECT and make sure the data doesn't go away either
> have to write O_DIRECT|O_DSYNC or need to call fdatasync().
Seriously, we tell application writers _all the time_ that they
*must* use fsync/fdatasync to guarantee their data is on stable
storage and that they cannot rely on side-effects of filesystem or
storage specific behaviours (like ext3 ordered mode) to do that job.
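
For illustration, a minimal sketch of the portable pattern referred to above:
write the data, then call fdatasync() before treating it as stable. It is
written in Python for brevity. The helper name durable_write and its prealloc
parameter are hypothetical, not something from this thread. It assumes a
Linux/POSIX system where os.posix_fallocate() and os.fdatasync() are available.

```python
import os


def durable_write(path: str, data: bytes, prealloc: int = 0) -> int:
    """Write data to path; return only after fdatasync() has succeeded.

    Hypothetical helper for illustration only.
    """
    fd = os.open(path, os.O_WRONLY | os.O_CREAT, 0o600)
    try:
        if prealloc:
            # Reserve space up front. On filesystems with unwritten-extent
            # support this typically creates unwritten extents, which are
            # converted when data is written into them.
            os.posix_fallocate(fd, 0, prealloc)
        written = 0
        while written < len(data):
            # os.write() may write fewer bytes than requested, so loop.
            written += os.write(fd, data[written:])
        # Ask the kernel to push the data (and the metadata needed to read
        # it back) to stable storage. This does not rely on any
        # filesystem-specific ordering side effect.
        os.fdatasync(fd)
        return written
    finally:
        os.close(fd)
```

The point of the sketch is only that the durability guarantee comes from the
explicit fdatasync() call, whatever the filesystem does with extents underneath.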
You're suggesting that by introducing FALLOC_FL_EXPOSE_OLD_DATA,
applications can rely on filesystem/storage specific behaviour to
guarantee data is on stable storage without the use of
fdatasync/fsync. What you describe is definitely storage specific,
because volatile write caches still need the fdatasync to issue a
Do you see the same conflict here that I do?
> cluster file system implementor