On Thu, 20 Jun 2013 11:41:33 +1000
Dave Chinner <david@xxxxxxxxxxxxx> wrote:
> On Wed, Jun 19, 2013 at 01:35:30PM -0700, Eric W. Biederman wrote:
> > I am copying my gmail address so I have a chance of seeing replies
> > from Dave Chiner. So far the only way I have been able to read his
> > replies has been to read mailling lists. Which has not be
> > conductive to having this code discussed properly. Hopefully
> > copying my gmail address will allow us to have a reasonable and
> > timely conversation.
> > Dwight Engen <dwight.engen@xxxxxxxxxx> writes:
> > > Use uint32 from init_user_ns for xfs internal uid/gid
> > > representation in acl, xfs_icdinode.
> > From my review of the code earlier that just isn't safe. It allows
> > all kinds of things to slip through.
> Such as?
Maybe saying "at the vfs boundary" is misleading, I guess I don't see
how this is all that different from what you did in the other
filesystems. Using ext4 as the example the conversions are done between:
struct inode <-> struct ext4_inode
struct posix_acl <-> ext4_acle_entry
which in xfs is analogous to
struct inode <-> struct xfs_icdinode
struct posix acl <-> struct xfs_acl_entry
which is where I did the conversions.
> > > Conversion of kuid/gid is done at the vfs boundary,
> > > other user visible xfs specific interfaces (bulkstat, eofblocks
> > > filter) expect uint32 init_user_ns uid/gid values.
> > From my earlier review of the code conversion at the vfs boundary is
> > not safe.
> So you've claimed.
> > First off kuid_t and kgid_t are not a vfs concepts, they are linux
> > kernel concepts, and xfs is in the linux kernel. What makes this
> > relevant is not all filesystem accesses are through the vfs so all
> > of the necessary conversions for security and a consistent user
> > experience can be had by only performing conversions at the
> > user/kernel boundary.
> Right. Boundaries and consistent conversion across them are
> important. They are especially important in filesystems for
> converting to/from on-disk and kernel-native formats. Blindly
> pushing kernel structure down as far are you can make them reach
> ignores important boundaries.
> You might want to hit XFS with a big hammer but the right solution
> is far more nuanced that that.
> > In particular by being sloppy and not pushing kuid_t/kgid_t further
> > down you did not handle all of the conversions needed at the
> > user/kernel boundary in XFS_IOC_FREE_EOFBLOCKS.
See other reply for possible way to do this. I think the larger issue
is bulkstat, which I'm not sure should be converted or not.
> The kuid_t/kgid_t is actually pushed down this far - it's in the
> struct inode - the code currently uses the on-disk XFS uid/gid,
> not the struct inode's kuid_t/kgid_t. That's easily fixable.
Yep, I'll go through the code and switch to the inode where possible.
> Indeed, that's where most of the work needs to be done in XFS -
> using VFS(ip)->i_uid instead of ip->i_d.di_uid in places where we
> aren't dealing with reading or modifying the on-disk structures of
> XFS. Once that is done, we will have driven kuid_t/kgid_t as far
> down as the in-memory/on-disk format boundary allows, and we'll end
> up catching all the sorts of problems you are worried about.
> > Which can be called by an
> > unprivileged user.
> That's an oversite. Needs fixing.
> > I honestly don't think avoiding the push down of kuid_t and kgid_t
> > to all of the xfs in-core data structures is safe. Even if the
> > initial patch is safe I expect there will be silent breakage when
> > the next ioctl that bypasses the vfs is added.
> This problem isn't isolated to XFS - ioctls are added to ext4,
> btrfs, gfs2, etc in every release and they all face the same
> problems. Hence trying to paint it as an XFS problem is realy
> missing the mark....
> I'd really like to see some regressions tests in xfstests that we
> can use to confirm that filesystems have implemented namespaces
> properly (e.g. quota set/get/report tests). That would go a long way
> to ensuring that people don't inadvertantly. I'm sure you have some
> test scripts for testing all the changes you made, so sharing them
> would help us a lot.
> [ Hmmmm - the quota netlink warning interface - it just takes a
> warning for a specific kqid and emits it to all listeners on the
> netlink interface. There's no namespace awareness there, so what
> stops quota warning messages from one namespace being heard in all
> other namespaces? It's not network namespace aware.... ]
> As it is, I'm far more concerned about the security problems
> existing ioctls and interfaces pose for user namespaces. i.e. that
> CAP_SYS_ADMIN in any namespace can use bulkstat and then the
> fs/fhandle.c interfaces to find and access any file in the
> filesystem regardless of namespace restrictions. You can guess
> filehandles pretty trivially, so you don't need bulkstat and you
> don't need XFS for this to be a problem....
> Further, bulkstat is used by backup utilities, so I think it needs
> the unmodified uid/gid/prid to be passed out, and the restore
> program needs a way to push them all back in unmodified. How would
> you propose we go about doing this. Alternatively could you tell us
> how a sub-namespace level backup/restore program supposed to
> handle/detect/avoid being invoked from inside a restricted,
> non-global namespace?