[Top] [All Lists]

Re: [PATCH] userns: Convert xfs to use kuid/kgid where appropriate

To: "Eric W. Biederman" <ebiederm@xxxxxxxxxxxx>
Subject: Re: [PATCH] userns: Convert xfs to use kuid/kgid where appropriate
From: Dave Chinner <david@xxxxxxxxxxxxx>
Date: Thu, 20 Jun 2013 11:41:33 +1000
Cc: Dwight Engen <dwight.engen@xxxxxxxxxx>, "Eric W. Biederman" <ebiederm@xxxxxxxxx>, xfs@xxxxxxxxxxx
Delivered-to: xfs@xxxxxxxxxxx
In-reply-to: <8761x9x2i5.fsf@xxxxxxxxxxxx>
References: <20130619110948.0bfafa2b@xxxxxxxxxx> <8761x9x2i5.fsf@xxxxxxxxxxxx>
User-agent: Mutt/1.5.21 (2010-09-15)
On Wed, Jun 19, 2013 at 01:35:30PM -0700, Eric W. Biederman wrote:
> I am copying my gmail address so I have a chance of seeing replies from
> Dave Chiner.  So far the only way I have been able to read his replies
> has been to read mailling lists.  Which has not be conductive to having
> this code discussed properly.  Hopefully copying my gmail address will
> allow us to have a reasonable and timely conversation.
> Dwight Engen <dwight.engen@xxxxxxxxxx> writes:
> > Use uint32 from init_user_ns for xfs internal uid/gid representation in
> > acl, xfs_icdinode. 
> From my review of the code earlier that just isn't safe.  It allows all
> kinds of things to slip through.

Such as?

> > Conversion of kuid/gid is done at the vfs boundary,
> > other user visible xfs specific interfaces (bulkstat, eofblocks filter)
> > expect uint32 init_user_ns uid/gid values.
> From my earlier review of the code conversion at the vfs boundary is
> not safe.    

So you've claimed.

> First off kuid_t and kgid_t are not a vfs concepts, they are linux
> kernel concepts, and xfs is in the linux kernel.  What makes this
> relevant is not all filesystem accesses are through the vfs so all of
> the necessary conversions for security and a consistent user experience
> can be had by only performing conversions at the user/kernel boundary.

Right. Boundaries and consistent conversion across them are
important. They are especially important in filesystems for
converting to/from on-disk and kernel-native formats. Blindly
pushing kernel structure down as far are you can make them reach
ignores important boundaries.

You might want to hit XFS with a big hammer but the right solution
is far more nuanced that that.

> In particular by being sloppy and not pushing kuid_t/kgid_t further down
> you did not handle all of the conversions needed at the user/kernel

The kuid_t/kgid_t is actually pushed down this far - it's in the
struct inode - the code currently uses the on-disk XFS uid/gid,
not the struct inode's kuid_t/kgid_t. That's easily fixable.

Indeed, that's where most of the work needs to be done in XFS -
using VFS(ip)->i_uid instead of ip->i_d.di_uid in places where we
aren't dealing with reading or modifying the on-disk structures of
XFS. Once that is done, we will have driven kuid_t/kgid_t as far
down as the in-memory/on-disk format boundary allows, and we'll end
up catching all the sorts of problems you are worried about.

> Which can be called by an
> unprivileged user.

That's an oversite. Needs fixing.

> I honestly don't think avoiding the push down of kuid_t and kgid_t to
> all of the xfs in-core data structures is safe.  Even if the initial
> patch is safe I expect there will be silent breakage when the next ioctl
> that bypasses the vfs is added.

This problem isn't isolated to XFS - ioctls are added to ext4,
btrfs, gfs2, etc in every release and they all face the same
problems.  Hence trying to paint it as an XFS problem is realy
missing the mark....

I'd really like to see some regressions tests in xfstests that we
can use to confirm that filesystems have implemented namespaces
properly (e.g. quota set/get/report tests). That would go a long way
to ensuring that people don't inadvertantly. I'm sure you have some
test scripts for testing all the changes you made, so sharing them
would help us a lot.

[ Hmmmm - the quota netlink warning interface - it just takes a
warning for a specific kqid and emits it to all listeners on the
netlink interface. There's no namespace awareness there, so what
stops quota warning messages from one namespace being heard in all
other namespaces? It's not network namespace aware.... ]

As it is, I'm far more concerned about the security problems
existing ioctls and interfaces pose for user namespaces. i.e. that
CAP_SYS_ADMIN in any namespace can use bulkstat and then the
fs/fhandle.c interfaces to find and access any file in the
filesystem regardless of namespace restrictions. You can guess
filehandles pretty trivially, so you don't need bulkstat and you
don't need XFS for this to be a problem....

Further, bulkstat is used by backup utilities, so I think it needs
the unmodified uid/gid/prid to be passed out, and the restore
program needs a way to push them all back in unmodified. How would
you propose we go about doing this. Alternatively could you tell us
how a sub-namespace level backup/restore program supposed to
handle/detect/avoid being invoked from inside a restricted,
non-global namespace?


Dave Chinner

<Prev in Thread] Current Thread [Next in Thread>