On Wed, 31 Aug 2011, Dave Chinner wrote:
On Tue, Aug 30, 2011 at 06:17:02PM -0700, Sunil Mushran wrote:
On 08/25/2011 06:35 PM, Dave Chinner wrote:
Agreed, that's the way I'd interpret it, too. So perhaps we need to
ensure that this interpretation is actually tested by this test?
How about some definitions to work by:
Data: a range of the file that contains valid data, regardless of
whether it exists in memory or on disk. The valid data can be
preceded and/or followed by an arbitrary number of zero bytes
dependent on the underlying implementation of hole detection.
Hole: a range of the file that contains no data or is made up
entirely of NULL (zero) data. Holes include preallocated ranges of
files that have not had actual data written to them.
Does that make sense? It has sufficient flexibility in it for the
existing generic "non-implementation", allows for filesystems to
define their own hole detection boundaries (e.g. filesystem block
size), and effectively defines how preallocated ranges from
fallocate() should be treated (i.e. as holes). If we can agree on
those definitions, I think that we should document them in both the
kernel and the man page that defines SEEK_HOLE/SEEK_DATA so everyone
is on the same page...
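
To make the two definitions concrete, here is a minimal sketch of how an
application might walk a file using them. It assumes a Linux system whose
lseek() accepts SEEK_DATA and SEEK_HOLE; walk_segments() is a hypothetical
helper name, not something from the kernel or this thread. Under the generic
"non-implementation" the whole file would show up as a single data segment.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <stdio.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical helper: print every hole and data segment of fd.
 * Returns the number of data segments, or -1 on error.
 */
int walk_segments(int fd)
{
    off_t end = lseek(fd, 0, SEEK_END);
    off_t pos = 0;
    int ndata = 0;

    if (end < 0)
        return -1;

    while (pos < end) {
        /* Find the next offset >= pos that the filesystem calls data. */
        off_t data = lseek(fd, pos, SEEK_DATA);
        if (data < 0) {
            if (errno != ENXIO)
                return -1;
            /* ENXIO: no data after pos, so the tail is one hole. */
            printf("hole [%lld, %lld)\n", (long long)pos, (long long)end);
            break;
        }
        if (data > pos)
            printf("hole [%lld, %lld)\n", (long long)pos, (long long)data);

        /* Find where this data segment stops; EOF counts as a hole. */
        off_t hole = lseek(fd, data, SEEK_HOLE);
        if (hole < 0)
            return -1;
        assert(hole > data); /* an offset reported as data cannot also be a hole */

        printf("data [%lld, %lld)\n", (long long)data, (long long)hole);
        ndata++;
        pos = hole;
    }
    return ndata;
}
```

Where the segment boundaries land depends on the filesystem, which is the
flexibility the Data definition is meant to allow.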
We should not tie in the definition to existing fs technologies.
Such as? If we don't use well known, well defined terminology, we
end up with ambiguous, vague functionality and inconsistent
we should let the fs weigh the cost of providing accurate information
with the possible gain in performance.
Data: A range in a file that could contain something other than nulls.
If in doubt, it is data.
Hole: A range in a file that only contains nulls.
And that's -exactly- the ambiguous, vague definition that has raised
all these questions in the first place. I was in doubt about whether
unwritten extents can be considered a hole, and by your definition
that means it should be data. But Andreas seems to be in no doubt it
should be considered a hole.
Hence if I implement XFS support and Andreas implements ext4 support
by your definition, we end up with vastly different behaviour even
though the two filesystems use the same underlying technology for
preallocated ranges. That's exactly the inconsistency in
implementation that I'd like us to avoid.
IOWs, the definition needs to be clear enough to prevent these
inconsistencies from occurring. Indeed, the phrase "preallocated
ranges that have not had data written to them" is as independent of
filesystem implementation or technologies as possible. However,
because Linux supports preallocation (unlike our reference
platform), and we encourage developers to use it where appropriate,
it is best that we define how we expect such ranges to behave
clearly. That makes life easier for everyone.
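
The difference between the two interpretations can be observed from
userspace. The following is a rough sketch, assuming Linux fallocate() with
mode 0 and an lseek() that accepts SEEK_DATA; probe_prealloc() and the
prealloc_view enum are hypothetical names used only for illustration. It
preallocates a range in an empty file, writes nothing, and asks where the
first data is.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <sys/types.h>
#include <unistd.h>

enum prealloc_view {
    PREALLOC_ERROR = -1,
    PREALLOC_AS_HOLE,   /* unwritten preallocation reported as a hole */
    PREALLOC_AS_DATA    /* unwritten preallocation reported as data */
};

/*
 * Hypothetical probe: fd must refer to an empty regular file.
 * Preallocates len bytes without writing, then checks for data.
 */
enum prealloc_view probe_prealloc(int fd, off_t len)
{
    assert(len > 0);

    /* Mode 0 allocates space and extends the file size to len. */
    if (fallocate(fd, 0, 0, len) != 0)
        return PREALLOC_ERROR;

    off_t data = lseek(fd, 0, SEEK_DATA);
    if (data < 0) {
        /* ENXIO means "no data anywhere": the range is all hole. */
        return errno == ENXIO ? PREALLOC_AS_HOLE : PREALLOC_ERROR;
    }
    return PREALLOC_AS_DATA;
}
```

Under the "if in doubt, it is data" wording both results are permitted, so
two filesystems could legitimately return different answers for the same
sequence of calls.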
Since a sparse file has the holes filled by nulls by definition, it seems
fairly clear that they should count as holes. In fact, I would not be
surprised to see some filesystem _only_ report the unwritten pieces of
sparse files as holes (not any other ranges of nulls).
The question I have is: how large does the range of nulls need to be before
it's reported as a hole? Disk sectors, filesystem blocks, other?
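
One way to see what granularity a given filesystem uses is to measure it.
This is a sketch only, assuming Linux lseek() with SEEK_DATA and SEEK_HOLE;
probe_granularity() and struct extent are hypothetical names. It writes a
single byte into the middle of an otherwise sparse file and reports the
boundaries of the data range the filesystem returns around that byte.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <sys/types.h>
#include <unistd.h>

struct extent {
    off_t start;    /* first offset reported as data */
    off_t end;      /* first offset after it reported as a hole */
};

/*
 * Hypothetical probe: fd must refer to an empty regular file.
 * Returns 0 and fills *out on success, -1 on error.
 */
int probe_granularity(int fd, off_t offset, struct extent *out)
{
    char byte = 'x';

    assert(out != NULL);

    /* Make the file sparse on both sides of the byte we write. */
    if (ftruncate(fd, 2 * offset + 1) != 0)
        return -1;
    if (pwrite(fd, &byte, 1, offset) != 1)
        return -1;

    out->start = lseek(fd, 0, SEEK_DATA);
    if (out->start < 0)
        return -1;
    out->end = lseek(fd, out->start, SEEK_HOLE);
    if (out->end < 0)
        return -1;
    return 0;
}
```

The value of end - start is then the size of the smallest data range that
filesystem reports for a one-byte write; with the generic fallback it would
simply be the whole file.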