Re: Files full of zeros with coreutils-8.11 and xfs (FIEMAP related?)

To: Yongqiang Yang <xiaoqiangnk@xxxxxxxxx>
Subject: Re: Files full of zeros with coreutils-8.11 and xfs (FIEMAP related?)
From: Dave Chinner <david@xxxxxxxxxxxxx>
Date: Mon, 18 Apr 2011 10:35:53 +1000
Cc: Andreas Dilger <adilger@xxxxxxxxx>, Pádraig Brady <P@xxxxxxxxxxxxxx>, Eric Sandeen <sandeen@xxxxxxxxxxx>, "linux-ext4@xxxxxxxxxxxxxxx" <linux-ext4@xxxxxxxxxxxxxxx>, "coreutils@xxxxxxx" <coreutils@xxxxxxx>, Markus Trippelsdorf <markus@xxxxxxxxxxxxxxx>, xfs-oss <xfs@xxxxxxxxxxx>
In-reply-to: <BANLkTikEeXcvjgREoRCgriWAhZfnxJVtKQ@xxxxxxxxxxxxxx>
References: <20110414120635.GB1678@xxxxxxxxxxxxxx> <20110414140222.GB1679@xxxxxxxxxxxxxx> <4DA70BD3.1070409@xxxxxxxxxxxxxx> <4DA717B2.3020305@xxxxxxxxxxx> <20110414225904.GK21395@dastard> <4DA7836A.5040604@xxxxxxxxxxxxxx> <20110415000940.GL21395@dastard> <76FFF648-CA02-494B-A862-566C66A8CB82@xxxxxxxxx> <20110416005040.GP21395@dastard> <BANLkTikEeXcvjgREoRCgriWAhZfnxJVtKQ@xxxxxxxxxxxxxx>
User-agent: Mutt/1.5.20 (2009-06-14)
On Sat, Apr 16, 2011 at 02:05:51PM +0800, Yongqiang Yang wrote:
> On Sat, Apr 16, 2011 at 8:50 AM, Dave Chinner <david@xxxxxxxxxxxxx> wrote:
> > On Thu, Apr 14, 2011 at 11:01:04PM -0600, Andreas Dilger wrote:
> >> On 2011-04-14, at 6:09 PM, Dave Chinner <david@xxxxxxxxxxxxx>
> >> wrote:
> >> > No, this was explicitly laid out in the fiemap interface
> >> > discussions - it's up to the application to decide if it needs
> >> > to do a sync first. That's what the FIEMAP_FLAG_SYNC control
> >> > flag is for.  This forces the fiemap call to do an fsync _before_
> >> > getting the mapping. If you want to know what the exact layout of
> >> > the file is, then you must use this flag.
> >> >
> >> > Even so, it is recognised that this is racy - any use of the
> >> > block map has a time-of-read-to-time-of-use race condition that
> >> > means you have to _verify_ the copy after it completes. FYI,
> >> > that's what xfs_fsr does when copying based on extent maps - if
> >> > the inode has changed in _any way_ during the copy, it aborts
> >> > the copy of that file.
> >> >
> >> > i.e. using fiemap for copying is at best a *hint* about the
> >> > regions that need copying, and it is in no way a guarantee that
> >> > you'll get all the information you need to make an accurate copy
> >> > even if you do use the synchronous variant.
> >>
> >> I would tend to agree with Pádraig. If there is data in the
> >> mapping (regardless of whether it is on disk or not), the FIEMAP
> >> should return this to the caller.  The SYNC flag is only intended
> >> to flush the data to disk for tools that are doing
> >> direct-to-disk operations on the data.
> >
> > What you are suggesting is that FIEMAP needs to be page cache
> > coherent, and that is far, far away from the intended use of the
> > interface. Even considering that you need to look for active pages
> > in the page cache when mapping extents says to me that you are
> > doing something very wrong.
> >
> > Unwritten extents remain unwritten until the data is physically
> > written to them. Therefore, to change their state, you need to sync
> No, buffered writes change their state without sync.

They shouldn't.

> > the data covering the range.  _Lying_ about whether an extent is in
> > the unwritten state is a really bad precedent to set, especially as
> > it is then guaranteed to change state when a crash occurs (Why did
> > recovery zero out my file? FIEMAP said it contained data before my
> > system crashed!).
> All filesystems have metadata in memory which is not flushed to
> permanent storage, e.g. an extent that exists in memory when neither
> it nor its corresponding data has been flushed to permanent storage.

Sure, but in the case of unwritten extents, XFS does not change the
metadata state in memory until *after the physical IO is completed*.
I'm pretty sure that btrfs is the same.

IOWs, despite the fact that a buffered write has occurred, no
metadata has changed state in memory, and the extents are still
unwritten in both memory and on disk....
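
To make that concrete, here is a minimal sketch of how an application
could ask for the extent map and see which extents are still flagged
unwritten. It is an illustration only, not code from XFS, xfs_fsr or
coreutils. The helper names (extent_is_unwritten,
report_unwritten_extents) and the MAX_EXTENTS batch size are made up
for this example; the ioctl, structures and flags come from
<linux/fs.h> and <linux/fiemap.h>.

```c
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/ioctl.h>
#include <linux/fiemap.h>  /* struct fiemap, FIEMAP_* flags */
#include <linux/fs.h>      /* FS_IOC_FIEMAP */

#define MAX_EXTENTS 32u    /* hypothetical batch size for this sketch */

/* Return 1 if the extent is allocated but still flagged unwritten. */
static int extent_is_unwritten(uint32_t fe_flags)
{
    return (fe_flags & FIEMAP_EXTENT_UNWRITTEN) != 0;
}

/*
 * Map up to MAX_EXTENTS extents of fd and print one line per extent.
 * If sync_first is non-zero, FIEMAP_FLAG_SYNC asks the kernel to sync
 * the file before the mapping is taken.
 * Returns the number of unwritten extents seen, or -1 on failure.
 */
static int report_unwritten_extents(int fd, int sync_first)
{
    size_t size = sizeof(struct fiemap) +
                  MAX_EXTENTS * sizeof(struct fiemap_extent);
    struct fiemap *fm = calloc(1, size);
    int unwritten = 0;

    if (fm == NULL)
        return -1;

    fm->fm_start = 0;
    fm->fm_length = FIEMAP_MAX_OFFSET;   /* map the whole file */
    fm->fm_flags = sync_first ? FIEMAP_FLAG_SYNC : 0;
    fm->fm_extent_count = MAX_EXTENTS;

    if (ioctl(fd, FS_IOC_FIEMAP, fm) < 0) {
        free(fm);
        return -1;
    }

    for (uint32_t i = 0; i < fm->fm_mapped_extents; i++) {
        const struct fiemap_extent *fe = &fm->fm_extents[i];
        int u = extent_is_unwritten(fe->fe_flags);

        printf("logical=%llu length=%llu %s\n",
               (unsigned long long)fe->fe_logical,
               (unsigned long long)fe->fe_length,
               u ? "unwritten" : "written");
        unwritten += u;
    }

    free(fm);
    return unwritten;
}
```

Calling report_unwritten_extents(fd, 0) right after a buffered write
into a preallocated range, and again with sync_first set to 1, is one
way to compare the two views. Either way the result is only a hint:
the file can change between the mapping and the copy, so the copy
still has to be verified afterwards.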


Dave Chinner
