On Mon, Nov 10, 2014 at 2:52 PM, Dave Chinner <david@xxxxxxxxxxxxx> wrote:
> On Mon, Nov 10, 2014 at 10:25:40AM +0100, Miklos Szeredi wrote:
>> On Sun, Nov 9, 2014 at 12:42 AM, Dave Chinner <david@xxxxxxxxxxxxx> wrote:
>> > On Fri, Nov 07, 2014 at 11:09:59AM -0800, Christoph Hellwig wrote:
>> >> The overlayfs merge introduces a new rename flag to create whiteouts.
>> >> Should be fairly easy to implement.
>> >> Miklos, do you have any good documentation and/or test cases for this?
>> > So overlayfs uses some weird char dev hack to implement whiteout
>> > inodes in directories? Why do we need a whiteout inode on disk?
>> > What information is actually stored in the whiteout inode that
>> > overlayfs actually needs? Only readdir and lookup care about
>> > whiteouts, and AFAICT nothing of the inode is ever used except
>> > checking the chrdev/whiteoutdev hack via ovl_is_whiteout(dentry).
>> > Indeed, whatever happened to just storing the whiteout in the dirent
>> > via DT_WHT and using that information on lookup in the lower
>> > filesystem to mark the dentry returned appropriately without needing
>> > to look up a real inode?
>> The filesystem is free to implement whiteouts as a dirent without an actual
> Sure, but overlayfs won't make use of it, so we'd
> have to hack around overlayfs's ignorance of DT_WHT in several
> different places to do this. e.g.
> - in mknod to intercept creation of magical whiteout chardevs
> - in readdir so we can convert them to DT_CHR so overlayfs
> can detect them,
> - in ->lookup so we can create magical chardev inodes in
> memory rather than try to read them from disk.
> - in rename we have to detect the magical chardev inodes so
> we know it's a whiteout we are dealing with
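To make the list concrete, here is a user-space C sketch of the conversion points a filesystem storing DT_WHT would need. The helper names (is_whiteout_chardev, wht_readdir_type, wht_lookup_synth) are hypothetical, struct stat stands in for an in-kernel inode, and the 0/0 device number is the convention overlayfs documents for its whiteout chardevs; real kernel code would work on struct inode and the readdir context instead.

```c
#define _DEFAULT_SOURCE
#include <dirent.h>
#include <stdbool.h>
#include <sys/stat.h>
#include <sys/sysmacros.h> /* makedev() */

#ifndef DT_WHT
#define DT_WHT 14 /* same value as the kernel's and glibc's DT_WHT */
#endif

/* The chardev hack: a whiteout is a character device numbered 0/0.
 * Hypothetical mknod and rename hooks would use this to recognise one. */
static bool is_whiteout_chardev(mode_t mode, dev_t rdev)
{
    return S_ISCHR(mode) && rdev == makedev(0, 0);
}

/* Hypothetical readdir hook: report an on-disk DT_WHT entry as DT_CHR,
 * because overlayfs only understands the chardev form. */
static unsigned char wht_readdir_type(unsigned char disk_type)
{
    return disk_type == DT_WHT ? DT_CHR : disk_type;
}

/* Hypothetical lookup hook: a DT_WHT entry has no inode on disk, so fill
 * in the attributes of a synthetic in-memory 0/0 chardev instead of
 * reading one. Returns false when the entry is not a whiteout, in which
 * case the caller reads the real inode as usual. */
static bool wht_lookup_synth(unsigned char disk_type, struct stat *st)
{
    if (disk_type != DT_WHT)
        return false;
    st->st_mode = S_IFCHR;        /* permission bits left at 0 in this sketch */
    st->st_rdev = makedev(0, 0);
    return true;
}
```

In this sketch the mknod and rename hooks would both call is_whiteout_chardev() on the requested mode and device number before deciding to write a DT_WHT dirent instead of allocating an inode.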
I care little where it's intercepted, in the filesystem or in the VFS.
Obviously if this is going to be done by more than one filesystem it
makes sense to gather the common bits and pieces into VFS helpers.
And overlayfs could easily make use of DT_WHT, if it was available, I
really have no problem with that.
As long as we don't have the insane DT_WHT == negative dentry on the
userspace API, it's good (i.e. you *should* be able to remove it).
And if "tar" is able to archive the thing and restore it, that's just
an added bonus. The special chardev does all that. So in my book
it's a perfect solution.
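Because the whiteout is an ordinary character device on the upper filesystem, stock POSIX calls see it, which is why tar can archive and restore it and why it can be removed with a plain unlink(). A minimal sketch, assuming the 0/0 convention; count_whiteouts() is a hypothetical helper, not an existing tool:

```c
#define _DEFAULT_SOURCE
#include <dirent.h>
#include <fcntl.h>         /* AT_SYMLINK_NOFOLLOW */
#include <string.h>
#include <sys/stat.h>
#include <sys/sysmacros.h> /* makedev() */

/* Count 0/0 character devices (overlayfs-style whiteouts) directly inside
 * dir, using nothing but readdir() and lstat-style fstatat().
 * Returns -1 if the directory cannot be opened. */
static int count_whiteouts(const char *dir)
{
    DIR *d = opendir(dir);
    if (!d)
        return -1;

    int n = 0;
    struct dirent *de;
    while ((de = readdir(d)) != NULL) {
        struct stat st;

        if (strcmp(de->d_name, ".") == 0 || strcmp(de->d_name, "..") == 0)
            continue;
        /* Do not follow symlinks: we want the entry itself. */
        if (fstatat(dirfd(d), de->d_name, &st, AT_SYMLINK_NOFOLLOW) != 0)
            continue;
        if (S_ISCHR(st.st_mode) && st.st_rdev == makedev(0, 0))
            n++;
    }
    closedir(d);
    return n;
}
```

Run against an overlayfs upper directory, this would report the whiteouts; through the merged overlay mount they are hidden, since overlayfs filters them out of readdir.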
And while I don't care whether DT_WHT ends up using an inode *on disk*
or not, a negative entry on disk with a positive dentry in the VFS is
bound to end up with complex and fiddly conversion layers that may
not actually be worth it.
- "DT_WHT + S_IFCHR inode" OK, needs some conversion but not too difficult
- "DT_WHT + negative inode" OK, but fiddly
- "DT_CHR + S_IFCHR inode" OK, but slightly suboptimal (or "completely
fucked up" depending on your POV).
The big advantage of not introducing DT_WHT is that it spares the pain
of introducing a possibly incompatible on-disk format (i.e. are old
versions of fsck going to balk at it? will old kernels reject the
image containing DT_WHT?)
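The fsck worry can be illustrated with a hypothetical directory-entry check from a tool that predates DT_WHT: it accepts only the types it knows and flags everything else. Real on-disk formats store their own file-type codes (ext4's EXT4_FT_* values, for example) rather than the DT_* constants, so this is a simplification, and old_fsck_dtype_ok() is an invented name:

```c
#define _DEFAULT_SOURCE
#include <dirent.h>
#include <stdbool.h>

#ifndef DT_WHT
#define DT_WHT 14 /* same value as the kernel's and glibc's DT_WHT */
#endif

/* Hypothetical check in an older fsck: only the dirent types that existed
 * when it was written are accepted; anything else is reported as corrupt. */
static bool old_fsck_dtype_ok(unsigned char t)
{
    switch (t) {
    case DT_UNKNOWN: case DT_FIFO: case DT_CHR: case DT_DIR:
    case DT_BLK:     case DT_REG:  case DT_LNK: case DT_SOCK:
        return true;
    default:
        return false; /* e.g. DT_WHT: flagged as a bad directory entry */
    }
}
```

Under this assumption a DT_CHR whiteout passes unchanged, while a DT_WHT entry trips the check, which is the incompatibility the chardev approach sidesteps.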
If the filesystem and tools are already prepared for the DT_WHT dirent
type then that's a non-issue, obviously.