
Re: [RFC] unifying write variants for filesystems

To: Al Viro <viro@xxxxxxxxxxxxxxxxxx>
Subject: Re: [RFC] unifying write variants for filesystems
From: Anton Altaparmakov <anton@xxxxxxxxxx>
Date: Sun, 2 Feb 2014 23:16:10 +0000
Cc: Linus Torvalds <torvalds@xxxxxxxxxxxxxxxxxxxx>, Christoph Hellwig <hch@xxxxxxxxxxxxx>, Jens Axboe <axboe@xxxxxxxxx>, Mark Fasheh <mfasheh@xxxxxxxx>, Joel Becker <jlbec@xxxxxxxxxxxx>, linux-fsdevel <linux-fsdevel@xxxxxxxxxxxxxxx>, xfs@xxxxxxxxxxx, Sage Weil <sage@xxxxxxxxxxx>, Steve French <sfrench@xxxxxxxxx>, Dave Kleikamp <shaggy@xxxxxxxxxx>
Delivered-to: xfs@xxxxxxxxxxx
In-reply-to: <20140201224301.GS10323@xxxxxxxxxxxxxxxxxx>
References: <20140114132207.GA25170@xxxxxxxxxxxxx> <20140114172033.GU10323@xxxxxxxxxxxxxxxxxx> <20140118064040.GE10323@xxxxxxxxxxxxxxxxxx> <CA+55aFw4LgyYEkygxHUnpKZg3jMACGzsyENc9a9rWFmLcaRefQ@xxxxxxxxxxxxxx> <20140118074649.GF10323@xxxxxxxxxxxxxxxxxx> <CA+55aFzM0N7WjqnLNnuqTkbj3iws9f3bYxei=ZBCM8hvps4zYg@xxxxxxxxxxxxxx> <20140118201031.GI10323@xxxxxxxxxxxxxxxxxx> <20140119051335.GN10323@xxxxxxxxxxxxxxxxxx> <20140120135514.GA21567@xxxxxxxxxxxxx> <CA+55aFzEA-eM9v2PvsWx4v4ANaKXuRGYyGCkegJg++rhtHvnig@xxxxxxxxxxxxxx> <20140201224301.GS10323@xxxxxxxxxxxxxxxxxx>
Sender: Anton Altaparmakov <aia21@xxxxxxxxxxxxxxxx>
Hi Al,

On 1 Feb 2014, at 22:43, Al Viro <viro@xxxxxxxxxxxxxxxxxx> wrote:
> * ntfs_file_buffered_write() should switch to iov_iter as well.  It's
> open-coding a lot of iov_iter stuff.

The NTFS code predates iov_iter by more than 10 years, so it simply wasn't there 
to use when I wrote NTFS...  But yes, I agree completely that NTFS should be 
updated to use it.

> It's not entirely trivial and
> I'd really like to hear from ntfs folks on that, though, and the current
> code looks deadlock-prone.  We prefault everything, then lock the pages
> to which we'll be copying, then attempt to do __copy_from_user_inatomic(),
> falling back to __copy_from_user() if that fails.  Fine, but what happens
> if the source of write() is mmaped from the same file and we lose CPU while
> prefaulting the last page, have memory pressure evict the first one, then
> have __ntfs_grab_cache_pages() allocate and lock (nonuptodate) pages
> we'll be copying to and have __copy_from_user() try to copy *from* those
> same pages?  We are doing it while holding these pages locked, so pagefault
> will have really fun time with them...  Anton?

That would deadlock.  You are (of course) quite right.  But the generic file 
write code used to suffer from the same deadlock problem, and no one ever 
complained much about it IIRC...
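
For reference, the problematic pattern boils down to something like this (a 
simplified sketch, not the actual ntfs_file_buffered_write() code; helper 
signatures quoted from memory):

	/* Prefault the user source pages up front. */
	fault_in_pages_readable(buf, bytes);

	/* Allocate and lock the (not yet uptodate) destination pages. */
	err = __ntfs_grab_cache_pages(mapping, index, nr_pages, pages,
			&cached_page);

	/* Fast path: atomic copy, cannot fault or block. */
	kaddr = kmap_atomic(pages[0]);
	copied = __copy_from_user_inatomic(kaddr + ofs, buf, bytes);
	kunmap_atomic(kaddr);

	if (unlikely(copied != bytes)) {
		/*
		 * Fallback copy that may fault.  If the source buffer is
		 * mmapped from this same file and the fault needs one of
		 * the pages we are holding locked, the fault handler
		 * blocks on a page lock we own: deadlock.
		 */
		kaddr = kmap(pages[0]);
		copied = __copy_from_user(kaddr + ofs, buf, bytes);
		kunmap(pages[0]);
	}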

In the current kernel the problem does not exist any more: the generic code 
does the atomic copy with page faults disabled and, if that makes no progress, 
retries with just the first segment, and keeps retrying until it works.  So 
indeed, updating NTFS to use iov_iter would be a good opportunity to get that 
fixed in NTFS as well.
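
That retry loop is roughly what generic_perform_write() in mm/filemap.c does 
(abridged from memory; error paths and the pos/offset bookkeeping trimmed):

	do {
again:
		/* Prefault the source; no page locks are held here. */
		if (unlikely(iov_iter_fault_in_readable(i, bytes))) {
			status = -EFAULT;
			break;
		}

		status = a_ops->write_begin(file, mapping, pos, bytes,
				flags, &page, &fsdata);
		if (unlikely(status))
			break;

		/*
		 * Page faults are disabled, so the copy cannot block on
		 * a page lock; at worst it copies nothing.
		 */
		pagefault_disable();
		copied = iov_iter_copy_from_user_atomic(page, i, offset,
				bytes);
		pagefault_enable();

		status = a_ops->write_end(file, mapping, pos, bytes,
				copied, page, fsdata);
		iov_iter_advance(i, copied);

		if (unlikely(copied == 0)) {
			/*
			 * No progress (e.g. the source page was evicted
			 * while we held the destination locked).  Shrink
			 * to at most the first segment and retry; the
			 * prefault above runs again with no locks held,
			 * so forward progress is guaranteed.
			 */
			bytes = min_t(unsigned long,
					PAGE_CACHE_SIZE - offset,
					iov_iter_single_seg_count(i));
			goto again;
		}
	} while (iov_iter_count(i));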

It does not matter if this turns out to be quite inefficient for NTFS (due to 
locking/unlocking several pages at once), as it is such a crazy corner case 
that it will hardly ever be triggered in real life.  The only time NTFS 
operates on more than one page at once in ntfs_file_buffered_write() is when a 
write happens into a sparse logical block (an NTFS "cluster") on a volume with 
a logical block size > PAGE_CACHE_SIZE.  Given that PAGE_CACHE_SIZE is at 
minimum 4096 bytes on Linux, and that Windows only ever creates volumes with a 
logical block size of 4096 (unless the user specifically forces a different 
block size when creating the volume), this basically hardly ever happens.  
Also, the NTFS kernel driver never creates sparse files, i.e. the sparse file 
would have had to come from Windows or another NTFS implementation...

I will have a look at moving NTFS to iov_iter and fixing the potential deadlock 
at the same time.

Best regards,

        Anton
-- 
Anton Altaparmakov <aia21 at cam.ac.uk> (replace at with @)
Unix Support, Computing Service, University of Cambridge
J.J. Thomson Avenue, Cambridge, CB3 0RB, UK
