On Thu, Jul 28, 2016 at 12:51:30AM +0300, Kirill A. Shutemov wrote:
> On Sat, Dec 19, 2015 at 12:55:59AM -0800, Darrick J. Wong wrote:
> > Hoist the btrfs EXTENT_SAME ioctl up to the VFS and make the name
> > more systematic (FIDEDUPERANGE).
> >
> > Signed-off-by: Darrick J. Wong <darrick.wong@xxxxxxxxxx>
> > ---
> > fs/compat_ioctl.c | 1
> > fs/ioctl.c | 38 ++++++++++++++++++
> > fs/read_write.c | 100 +++++++++++++++++++++++++++++++++++++++++++++++
> > include/linux/fs.h | 4 ++
> > include/uapi/linux/fs.h | 30 ++++++++++++++
> > 5 files changed, 173 insertions(+)
> >
> >
> > diff --git a/fs/compat_ioctl.c b/fs/compat_ioctl.c
> > index 70d4b10..eab31e7 100644
> > --- a/fs/compat_ioctl.c
> > +++ b/fs/compat_ioctl.c
> > @@ -1582,6 +1582,7 @@ COMPAT_SYSCALL_DEFINE3(ioctl, unsigned int, fd, unsigned int, cmd,
> >
> > case FICLONE:
> > case FICLONERANGE:
> > + case FIDEDUPERANGE:
> > goto do_ioctl;
> >
> > case FIBMAP:
> > diff --git a/fs/ioctl.c b/fs/ioctl.c
> > index 84c6e79..fcdd33b 100644
> > --- a/fs/ioctl.c
> > +++ b/fs/ioctl.c
> > @@ -568,6 +568,41 @@ static int ioctl_fsthaw(struct file *filp)
> > return thaw_super(sb);
> > }
> >
> > +static long ioctl_file_dedupe_range(struct file *file, void __user *arg)
> > +{
> > + struct file_dedupe_range __user *argp = arg;
> > + struct file_dedupe_range *same = NULL;
> > + int ret;
> > + unsigned long size;
> > + u16 count;
> > +
> > + if (get_user(count, &argp->dest_count)) {
> > + ret = -EFAULT;
> > + goto out;
> > + }
> > +
> > + size = offsetof(struct file_dedupe_range __user, info[count]);
(I still hate this interface.)
> Vlastimil triggered this during fuzzing:
>
> http://paste.opensuse.org/view/raw/99203426
>
> High order allocation without __GFP_NOWARN + fallback. That's not good.
>
> Basically, we don't have any sanity check of 'dest_count' here. This u16
> comes directly from userspace. And we call memdup_user() based on it.
>
> Here's a program which makes kernel allocate order-9 page:
>
> https://gist.github.com/kiryl/2b344b51da1fd2725be420a996b10d22
>
> Should we put some reasonable upper limit for the 'dest_count'?
> What is typical 'dest_count'?
There are two userland programs I know of that call this ioctl. The
first is xfs_io, which always sets dest_count = 1.
The other is duperemove, which seems capable of setting dest_count to
however many fragments it finds, up to a max of 120. Capping size to
x86's 4k page size yields 127 entries. On bigger machines with 64k
pages, that increases to 2047. I think that's enough for anybody.
(Honestly, 127 dedupe candidates * max 16M extent length is already
2GB of IO for a single call.)
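
For reference, the arithmetic behind those numbers can be sketched in userspace. This is only an illustration: the two structs are stand-ins that assume the uapi layout (a 24-byte header followed by 32-byte per-destination entries), and dedupe_arg_too_big() is a hypothetical helper showing what a "size must fit in one page" check might look like, not the actual fix.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Userspace stand-ins for the ioctl argument structs. The field layout is
 * an assumption meant to mirror include/uapi/linux/fs.h; the names are
 * made up for this sketch.
 */
struct dedupe_info_sketch {
	int64_t  dest_fd;
	uint64_t dest_offset;
	uint64_t bytes_deduped;
	int32_t  status;
	uint32_t reserved;
};

struct dedupe_range_sketch {
	uint64_t src_offset;
	uint64_t src_length;
	uint16_t dest_count;
	uint16_t reserved1;
	uint32_t reserved2;
	struct dedupe_info_sketch info[];
};

/* Bytes the kernel would be asked to memdup_user() for a given dest_count. */
size_t dedupe_arg_size(uint16_t count)
{
	return offsetof(struct dedupe_range_sketch, info) +
	       (size_t)count * sizeof(struct dedupe_info_sketch);
}

/* Largest dest_count whose argument still fits in a single page. */
size_t max_dest_count(size_t page_size)
{
	size_t hdr = offsetof(struct dedupe_range_sketch, info);

	if (page_size < hdr)
		return 0;
	return (page_size - hdr) / sizeof(struct dedupe_info_sketch);
}

/* Hypothetical check: nonzero if the request exceeds one page. */
int dedupe_arg_too_big(uint16_t count, size_t page_size)
{
	return dedupe_arg_size(count) > page_size;
}
```

With that layout, dest_count = 65535 asks for just under 2MB, which is where the order-9 allocation on 4k pages comes from, and the one-page cap works out to 127 entries on 4k pages and 2047 on 64k pages.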
--D
>
> > +
> > + same = memdup_user(argp, size);
> > + if (IS_ERR(same)) {
> > + ret = PTR_ERR(same);
> > + same = NULL;
> > + goto out;
> > + }
> > +
> > + ret = vfs_dedupe_file_range(file, same);
> > + if (ret)
> > + goto out;
> > +
> > + ret = copy_to_user(argp, same, size);
> > + if (ret)
> > + ret = -EFAULT;
> > +
> > +out:
> > + kfree(same);
> > + return ret;
> > +}
> > +
>
> --
> Kirill A. Shutemov