
Re: Files full of zeros with coreutils-8.11 and xfs (FIEMAP related?)

To: Dave Chinner <david@xxxxxxxxxxxxx>
Subject: Re: Files full of zeros with coreutils-8.11 and xfs (FIEMAP related?)
From: Yongqiang Yang <xiaoqiangnk@xxxxxxxxx>
Date: Sat, 16 Apr 2011 14:05:51 +0800
Cc: Andreas Dilger <adilger@xxxxxxxxx>, Pádraig Brady <P@xxxxxxxxxxxxxx>, Eric Sandeen <sandeen@xxxxxxxxxxx>, "linux-ext4@xxxxxxxxxxxxxxx" <linux-ext4@xxxxxxxxxxxxxxx>, "coreutils@xxxxxxx" <coreutils@xxxxxxx>, Markus Trippelsdorf <markus@xxxxxxxxxxxxxxx>, xfs-oss <xfs@xxxxxxxxxxx>
In-reply-to: <20110416005040.GP21395@dastard>
References: <20110414102608.GA1678@xxxxxxxxxxxxxx> <20110414120635.GB1678@xxxxxxxxxxxxxx> <20110414140222.GB1679@xxxxxxxxxxxxxx> <4DA70BD3.1070409@xxxxxxxxxxxxxx> <4DA717B2.3020305@xxxxxxxxxxx> <20110414225904.GK21395@dastard> <4DA7836A.5040604@xxxxxxxxxxxxxx> <20110415000940.GL21395@dastard> <76FFF648-CA02-494B-A862-566C66A8CB82@xxxxxxxxx> <20110416005040.GP21395@dastard>
On Sat, Apr 16, 2011 at 8:50 AM, Dave Chinner <david@xxxxxxxxxxxxx> wrote:
> On Thu, Apr 14, 2011 at 11:01:04PM -0600, Andreas Dilger wrote:
>> On 2011-04-14, at 6:09 PM, Dave Chinner <david@xxxxxxxxxxxxx>
>> wrote:
>> > No, this was explicitly laid out in the fiemap interface
>> > discussions - it's up to the application to decide if it needs
>> > to do a sync first. That's what the FIEMAP_FLAG_SYNC control
>> > flag is for.  This forces the fiemap call to do a fsync _before_
>> > getting the mapping. If you want to know what the exact layout of
>> > the file is, then you must use this flag.
>> >
>> > Even so, it is recognised that this is racy - any use of the
>> > block map has a time-of-read-to-time-of-use race condition that
>> > means you have to _verify_ the copy after it completes. FYI,
>> > that's what xfs_fsr does when copying based on extent maps - if
>> > the inode has changed in _any way_ during the copy, it aborts
>> > the copy of that file.
>> >
>> > i.e. using fiemap for copying is at best a *hint* about the
>> > regions that need copying, and it is in no way a guarantee that
>> > you'll get all the information you need to make an accurate copy
>> > even if you do use the synchronous variant.
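
A rough sketch of the check-copy-check idea described above, assuming it is
enough to compare size, mtime and ctime from fstat() before and after the
copy. The names inode_snapshot, take_snapshot, snapshot_equal and
copy_if_stable are hypothetical and this is not the actual xfs_fsr code. The
copy itself is a plain read/write loop; only the verification step is the
point here.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <fcntl.h>
#include <stdbool.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <time.h>
#include <unistd.h>

/* Hypothetical snapshot of the inode fields we compare around the copy. */
struct inode_snapshot {
    off_t size;
    struct timespec mtime;
    struct timespec ctime;
};

static int take_snapshot(int fd, struct inode_snapshot *snap)
{
    struct stat st;

    if (fstat(fd, &st) != 0)
        return -1;
    snap->size = st.st_size;
    snap->mtime = st.st_mtim;
    snap->ctime = st.st_ctim;
    return 0;
}

static bool snapshot_equal(const struct inode_snapshot *a,
                           const struct inode_snapshot *b)
{
    return a->size == b->size &&
           a->mtime.tv_sec == b->mtime.tv_sec &&
           a->mtime.tv_nsec == b->mtime.tv_nsec &&
           a->ctime.tv_sec == b->ctime.tv_sec &&
           a->ctime.tv_nsec == b->ctime.tv_nsec;
}

/*
 * Returns 0 if the copy completed and the source looked unchanged,
 * 1 if the source changed during the copy (caller should discard or
 * retry), and -1 on I/O error.
 */
static int copy_if_stable(const char *src_path, const char *dst_path)
{
    struct inode_snapshot before, after;
    char buf[65536];
    ssize_t n = 0;
    int rc = -1;

    int src = open(src_path, O_RDONLY);
    if (src < 0)
        return -1;
    int dst = open(dst_path, O_WRONLY | O_CREAT | O_TRUNC, 0600);
    if (dst < 0) {
        close(src);
        return -1;
    }

    if (take_snapshot(src, &before) != 0)
        goto out;

    while ((n = read(src, buf, sizeof buf)) > 0) {
        char *p = buf;
        while (n > 0) {                 /* handle short writes */
            ssize_t w = write(dst, p, (size_t)n);
            if (w <= 0)
                goto out;
            p += w;
            n -= w;
        }
    }
    if (n < 0)
        goto out;

    if (take_snapshot(src, &after) != 0)
        goto out;
    rc = snapshot_equal(&before, &after) ? 0 : 1;
out:
    close(src);
    close(dst);
    return rc;
}
```

Note that this only catches changes made _during_ the copy, and only as far as
the timestamp granularity allows. It says nothing about data that was already
sitting in the page cache before the first snapshot was taken.
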
>> I would tend to agree with Pádraig. If there is data in the
>> mapping (regardless of whether it is on disk or not), the FIEMAP
>> should return this to the caller.  The SYNC flag is only intended
>> to flush the data to disk for tools that are doing
>> direct-to-disk operations on the data.
> What you are suggesting is that FIEMAP needs to be page cache
> coherent, and that is far, far away from the intended use of the
> interface. Even considering that you need to look for active pages
> in the page cache when mapping extents says to me that you are
> doing something very wrong.
> Unwritten extents remain unwritten until the data is physically
> written to them. Therefore, to change their state, you need to sync
No, buffered writes change their state without sync.

> the data covering the range.  _Lying_ about whether an extent is in
> the unwritten state is a really bad precedent to set, especially as
> it is then guaranteed to change state when a crash occurs (Why did
> recovery zero out my file? FIEMAP said it contained data before my
> system crashed!).

All filesystems keep some metadata in memory that has not yet been flushed
to permanent storage, e.g. an extent can exist in memory while neither the
extent itself nor its corresponding data has been flushed to permanent
storage. So what you describe above can only be achieved by a sync before
FIEMAP. Otherwise, if a crash occurs, the data that FIEMAP reported before
the crash may not be found afterwards.
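
To make "sync before FIEMAP" concrete, here is a minimal sketch of walking a
file's extents with FIEMAP_FLAG_SYNC set. The ioctl, the struct fields and the
FIEMAP_* flags come from <linux/fiemap.h>. BATCH, extent_state and
dump_extents_synced are names I made up for illustration, and the "mapped"
label is only my shorthand for "neither delalloc nor unwritten".

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>
#include <sys/ioctl.h>
#include <linux/fs.h>      /* FS_IOC_FIEMAP */
#include <linux/fiemap.h>  /* struct fiemap, FIEMAP_* flags */

#define BATCH 32  /* hypothetical number of extents fetched per ioctl */

/* Illustrative label for an extent's state, based on its flags. */
static const char *extent_state(__u32 flags)
{
    if (flags & FIEMAP_EXTENT_DELALLOC)
        return "delalloc";
    if (flags & FIEMAP_EXTENT_UNWRITTEN)
        return "unwritten";
    return "mapped";
}

/*
 * Print every extent of fd, asking the kernel to sync the file before
 * mapping it. Returns the number of extents seen, or -1 with errno set
 * (e.g. when the filesystem does not support FIEMAP).
 */
static long dump_extents_synced(int fd)
{
    size_t sz = sizeof(struct fiemap) + BATCH * sizeof(struct fiemap_extent);
    struct fiemap *fm;
    __u64 start = 0;
    long total = 0;
    int last = 0;

    assert(fd >= 0);
    fm = malloc(sz);
    if (!fm)
        return -1;

    while (!last) {
        memset(fm, 0, sz);
        fm->fm_start = start;
        fm->fm_length = FIEMAP_MAX_OFFSET - start;
        fm->fm_flags = FIEMAP_FLAG_SYNC;   /* sync the file before mapping */
        fm->fm_extent_count = BATCH;

        if (ioctl(fd, FS_IOC_FIEMAP, fm) < 0) {
            int saved = errno;
            free(fm);
            errno = saved;
            return -1;
        }
        if (fm->fm_mapped_extents == 0)
            break;                          /* nothing (more) mapped */

        for (__u32 i = 0; i < fm->fm_mapped_extents; i++) {
            const struct fiemap_extent *e = &fm->fm_extents[i];

            printf("logical=%llu length=%llu state=%s\n",
                   (unsigned long long)e->fe_logical,
                   (unsigned long long)e->fe_length,
                   extent_state(e->fe_flags));
            start = e->fe_logical + e->fe_length;
            total++;
            if (e->fe_flags & FIEMAP_EXTENT_LAST)
                last = 1;
        }
    }
    free(fm);
    return total;
}
```

Call it with a descriptor from open(path, O_RDONLY). Even with the SYNC flag,
the result is only a snapshot: the file can change between the ioctl returning
and the caller acting on the extent list.
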

Without delayed allocation, there is no difference between the
preallocation case (fallocate) and the normal case.

> Don't try to mangle the API semantics every time someone doesn't
> understand how to use FIEMAP reliably. If you need the extent list
> returned by FIEMAP to match what is in the page cache *regardless of
>> Otherwise the UNMAPPED flag is useless, since even with "check,
>> copy, check" there is no guarantee that the inode is changed
>> _during_ the copy operation. It could have been written into the
>> cache _before_ the FIEMAP and remain unchanged and in your case
>> there would be no way to know any data was ever written to the
>> file without SYNC on every single file before FIEMAP.
> I can't find any UNMAPPED flag in the FIEMAP interface, so I have no
> idea what you are referring to here.
> Cheers,
> Dave.
> --
> Dave Chinner
> david@xxxxxxxxxxxxx

Best Wishes
Yongqiang Yang
