
Re: [PATCH 1/6] fs: add hole punching to fallocate

To: Lawrence Greenfield <leg@xxxxxxxxxx>
Subject: Re: [PATCH 1/6] fs: add hole punching to fallocate
From: Dave Chinner <david@xxxxxxxxxxxxx>
Date: Wed, 12 Jan 2011 23:44:31 +1100
Cc: "Ted Ts'o" <tytso@xxxxxxx>, Josef Bacik <josef@xxxxxxxxxx>, linux-kernel@xxxxxxxxxxxxxxx, linux-btrfs@xxxxxxxxxxxxxxx, linux-ext4@xxxxxxxxxxxxxxx, linux-fsdevel@xxxxxxxxxxxxxxx, xfs@xxxxxxxxxxx, joel.becker@xxxxxxxxxx, cmm@xxxxxxxxxx, cluster-devel@xxxxxxxxxx
In-reply-to: <AANLkTimwmJ_ZoE9oAuA1WGhCgK585jDznqnc6k0=9Ntb@xxxxxxxxxxxxxx>
References: <1289248327-16308-1-git-send-email-josef@xxxxxxxxxx> <20101109011222.GD2715@dastard> <20101109033038.GF3099@xxxxxxxxx> <20101109044242.GH2715@dastard> <20101109214147.GK3099@xxxxxxxxx> <20101109234049.GQ2715@dastard> <AANLkTimwmJ_ZoE9oAuA1WGhCgK585jDznqnc6k0=9Ntb@xxxxxxxxxxxxxx>
User-agent: Mutt/1.5.20 (2009-06-14)
On Tue, Jan 11, 2011 at 04:13:42PM -0500, Lawrence Greenfield wrote:
> On Tue, Nov 9, 2010 at 6:40 PM, Dave Chinner <david@xxxxxxxxxxxxx> wrote:
> > The historical reason for such behaviour existing in XFS was that in
> > 1997 the CPU and IO latency cost of unwritten extent conversion was
> > significant,


> >> (Take for example a trusted cluster filesystem backend that checks the
> >> object checksum before returning any data to the user; and if the
> >> check fails the cluster file system will try to use some other replica
> >> stored on some other server.)
> >
> > IOWs, all they want to do is avoid the unwritten extent conversion
> > overhead. Time has shown that a bad security/performance tradeoff
> > decision was made 13 years ago in XFS, so I see little reason to
> > repeat it for ext4 today....
> I'd make use of FALLOC_FL_EXPOSE_OLD_DATA. It's not the CPU overhead
> of extent conversion. It's that extent conversion causes more metadata
> operations than what you'd have otherwise,

Yes, that's the "IO latency" part of the cost I mentioned above.

> which means systems that
> want to use O_DIRECT and make sure the data doesn't go away either
> have to write O_DIRECT|O_DSYNC or need to call fdatasync().

Seriously, we tell application writers _all the time_ that they
*must* use fsync/fdatasync to guarantee their data is on stable
storage and that they cannot rely on side-effects of filesystem or
storage specific behaviours (like ext3 ordered mode) to do that job
for them.
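To make the "use fdatasync, don't rely on side-effects" point concrete, here is a minimal sketch of the pattern being argued for: preallocate, write into the preallocated range, then call fdatasync before treating the data as stable. The helper name prealloc_write_sync is hypothetical and not from this thread. It uses buffered pwrite rather than O_DIRECT so it runs on any filesystem; the durability rule is the same either way. It also assumes that fallocate-style preallocation on extent-based filesystems such as ext4 and XFS produces unwritten extents.

```c
#define _XOPEN_SOURCE 700
#include <errno.h>
#include <fcntl.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper (illustration only, not from the thread):
 * preallocate `len` bytes, write `buf` into that range, then make it
 * durable with fdatasync. Returns 0 on success or a negative errno. */
static int prealloc_write_sync(const char *path, const void *buf, size_t len)
{
    int fd = open(path, O_WRONLY | O_CREAT | O_TRUNC, 0644);
    if (fd < 0)
        return -errno;

    /* Reserve space up front. On extent-based filesystems this
     * typically allocates unwritten extents, which read back as zeroes. */
    int err = posix_fallocate(fd, 0, (off_t)len);
    if (err != 0) {              /* posix_fallocate returns the error */
        close(fd);
        return -err;
    }

    /* Writing into an unwritten extent later requires converting it to
     * "written", which is a metadata change. */
    const char *p = buf;
    size_t done = 0;
    while (done < len) {
        ssize_t n = pwrite(fd, p + done, len - done, (off_t)done);
        if (n < 0) {
            if (errno == EINTR)
                continue;
            err = errno;
            close(fd);
            return -err;
        }
        done += (size_t)n;
    }

    /* fdatasync persists the data plus any metadata needed to read it
     * back (including the extent conversion). With write barriers
     * enabled it also flushes a volatile drive write cache. */
    if (fdatasync(fd) != 0) {
        err = errno;
        close(fd);
        return -err;
    }
    return close(fd) == 0 ? 0 : -errno;
}
```

Only after fdatasync returns 0 may the application treat the data as stable. Skipping it because the extents were already allocated is exactly the storage-specific assumption being objected to here.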

You're suggesting that by introducing FALLOC_FL_EXPOSE_OLD_DATA,
applications can rely on filesystem/storage specific behaviour to
guarantee data is on stable storage without the use of
fdatasync/fsync. What you describe is definitely storage specific,
because volatile write caches still need the fdatasync to issue a
cache flush.

Do you see the same conflict here that I do?

> cluster file system implementor

Which one?


Dave Chinner
