
[PATCH 00/14] xfs: Support for interacting with multiple user namespaces

To: <linux-fsdevel@xxxxxxxxxxxxxxx>
Subject: [PATCH 00/14] xfs: Support for interacting with multiple user namespaces
From: ebiederm@xxxxxxxxxxxx (Eric W. Biederman)
Date: Wed, 13 Mar 2013 15:21:11 -0700
Cc: Linux Containers <containers@xxxxxxxxxxxxxxxxxxxxxxxxxx>, xfs@xxxxxxxxxxx, "Serge E. Hallyn" <serge@xxxxxxxxxx>, Ben Myers <bpm@xxxxxxx>, Alex Elder <elder@xxxxxxxxxx>, Dave Chinner <david@xxxxxxxxxxxxx>
Delivered-to: xfs@xxxxxxxxxxx
User-agent: Gnus/5.13 (Gnus v5.13) Emacs/24.1 (gnu/linux)
Or replace uids, gids, and projids with kuids, kgids, and kprojids

In support of user namespaces I have introduced into the kernel kuids,
kgids, and kprojids.  The most important characteristic of these ids is
that they are not assignment compatible with uids, gids, and projids
coming from userspace nor are they assignment compatible with uids, gids
or projids stored on disk.  This assignment incompatibility makes it
easy to ensure that conversions are introduced at the edge of userspace
and at the interface between on-disk and in-memory data structures.
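
As a rough illustration of what "not assignment compatible" means, here
is a minimal userspace sketch.  It is not the kernel's definition: the
names kuid_model_t, uid_model_t, INVALID_ID_MODEL and
owner_matches_model are hypothetical, chosen only for this example.  The
idea is simply that wrapping the id in a struct makes the compiler reject
any place where a raw id is used without an explicit conversion.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical userspace model, not the kernel's definitions. */
typedef uint32_t uid_model_t;                    /* raw id from userspace or disk */
typedef struct { uint32_t val; } kuid_model_t;   /* wrapped in-kernel id */

#define INVALID_ID_MODEL ((uint32_t)-1)

static inline bool kuid_model_valid(kuid_model_t k)
{
	return k.val != INVALID_ID_MODEL;
}

/* Wrapped ids are compared through a helper, never as raw integers. */
static inline bool kuid_model_eq(kuid_model_t a, kuid_model_t b)
{
	return a.val == b.val;
}

/* A permission-style check that only accepts wrapped ids. */
static inline bool owner_matches_model(kuid_model_t owner, kuid_model_t caller)
{
	assert(kuid_model_valid(owner));
	return kuid_model_valid(caller) && kuid_model_eq(owner, caller);
}

/*
 * Neither of these would compile, which is the point:
 *     uid_model_t raw = 1000;
 *     kuid_model_t k = raw;            // error: incompatible types
 *     owner_matches_model(k, raw);     // error: incompatible argument
 */
```

A missed conversion therefore shows up as a build failure rather than as
a silently wrong comparison at run time.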

Getting all of the conversions in all of the right places is important
because if one is missed it can easily become a permission check that
compares the wrong values.

While doing these conversions I have learned time and time again that if
I do not push kuids and kgids down into every in-memory data structure I
can find, important conversions will be missed.

Converting xfs is an interesting challenge because the way xfs handles
its inode data is very atypical.  XFS does two things no other
filesystem in linux does.  First, XFS dumps its in-memory inode structure
directly into the on-disk journal without any conversion.  Second, after
an inode has been evicted from the vfs inode cache XFS continues to cache
the inode for a time, so that if the inode is needed before all of the
state for the inode has been written to disk an up-to-date copy can be
obtained from the in-memory cached inode.

For filesystems, interacting with users in different user namespaces is
for the most part easy.  The vfs data structures hand off kuids and kgids
to the filesystem.  The filesystem then places those kuids and kgids in
its in-memory data structures (if it has any beyond struct inode).
When data is read from disk the uid and gid values are converted from
values in the initial user namespace to kuid and kgid values.  When data
is written to disk the kuids and kgids are converted into uid and gid
values in the initial user namespace.
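
The read and write conversions described above can be sketched in
userspace as follows.  This is a simplified model under stated
assumptions, not xfs code: struct disk_inode_model, struct
mem_inode_model and the *_model helpers are hypothetical names, and the
initial user namespace is modeled as an identity mapping in which only
the reserved value (uint32_t)-1 is invalid.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical userspace model, not xfs or kernel definitions. */
typedef struct { uint32_t val; } kuid_model_t;
#define INVALID_ID_MODEL ((uint32_t)-1)

struct disk_inode_model { uint32_t di_uid; };      /* raw id, initial namespace */
struct mem_inode_model  { kuid_model_t i_uid; };   /* wrapped in-memory id */

static inline bool kuid_model_valid(kuid_model_t k)
{
	return k.val != INVALID_ID_MODEL;
}

/* Assumption: the initial namespace maps every id to itself. */
static inline kuid_model_t init_ns_make_kuid_model(uint32_t uid)
{
	kuid_model_t k = { uid };
	return k;
}

static inline uint32_t init_ns_from_kuid_model(kuid_model_t k)
{
	return k.val;
}

/* Disk to memory: wrap the raw id; report whether the result is valid. */
static inline bool inode_from_disk_model(struct mem_inode_model *to,
					 const struct disk_inode_model *from)
{
	to->i_uid = init_ns_make_kuid_model(from->di_uid);
	return kuid_model_valid(to->i_uid);
}

/* Memory to disk: unwrap back to a raw initial-namespace id. */
static inline void inode_to_disk_model(struct disk_inode_model *to,
				       const struct mem_inode_model *from)
{
	assert(kuid_model_valid(from->i_uid));
	to->di_uid = init_ns_from_kuid_model(from->i_uid);
}
```

Because both directions go through the initial namespace, a value read
from disk and written back unchanged keeps the same on-disk
representation.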

The initial user namespace is chosen for data on disk, because that is
the user namespace that the data on disk uses for unconverted

When interacting with userspace processes the values are stored
in the current user namespace, which is different for each process.

For example, here is what is happening in this chunk of code that has
caused some questions:
+       if (mask & FSX_PROJID) {
+               projid = make_kprojid(current_user_ns(), fa->fsx_projid);
+               if (!projid_valid(projid))
+                       return XFS_ERROR(EINVAL);

fsx_projid is coming from userspace so we convert it from whatever the
userspace value is in the current user namespace to a kprojid. 

+               /*
+                * Disallow 32bit project ids when projid32bit feature is not enabled.
+                */
+               if ((from_kprojid(&init_user_ns, projid) > (__uint16_t)-1) &&

The disk might only support 16bit project ids.  So the kprojid is
converted into a projid in the initial user namespace to see what the
value we will eventually try to store on-disk is.  If the on-disk value
is larger than (2^16-1) an error is flagged.

+                   !xfs_sb_version_hasprojid32bit(&ip->i_mount->m_sb))
+                       return XFS_ERROR(EINVAL);
+       }
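
The two-step check in the chunk above (convert from the caller's
namespace, then look at the value as it would appear on disk) can be
modeled in userspace roughly like this.  It is a sketch under stated
assumptions: struct ns_model describes a single hypothetical mapping
range, all *_model names are invented for the example, and the function
returns a negative errno instead of using XFS_ERROR().

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical userspace model, not xfs or kernel definitions. */
typedef struct { uint32_t val; } kprojid_model_t;
#define INVALID_ID_MODEL ((uint32_t)-1)

/* One mapping range: ids [first, first + count) map to lower_first + offset. */
struct ns_model { uint32_t first, lower_first, count; };

static inline kprojid_model_t make_kprojid_model(const struct ns_model *ns,
						 uint32_t projid)
{
	kprojid_model_t k = { INVALID_ID_MODEL };

	if (projid >= ns->first && projid - ns->first < ns->count)
		k.val = ns->lower_first + (projid - ns->first);
	return k;
}

/* Returns 0 and stores the wrapped id in *out, or -EINVAL on rejection. */
static inline int check_projid_model(const struct ns_model *ns,
				     uint32_t user_projid,
				     bool has_projid32bit,
				     kprojid_model_t *out)
{
	kprojid_model_t k = make_kprojid_model(ns, user_projid);
	uint32_t on_disk;

	assert(out != NULL);
	if (k.val == INVALID_ID_MODEL)
		return -EINVAL;          /* no mapping in the caller's namespace */

	/* Assumption: the initial namespace is an identity mapping. */
	on_disk = k.val;
	if (on_disk > UINT16_MAX && !has_projid32bit)
		return -EINVAL;          /* would not fit in a 16bit on-disk field */

	*out = k;
	return 0;
}
```

With a hypothetical container mapping of { 0, 100000, 65536 }, project id
5 as seen by the caller becomes 100005 on disk, so it is rejected on a
filesystem without 32bit project ids even though 5 itself fits in 16
bits.  That is why the size check has to be made against the initial
namespace value and not against the value the caller passed in.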

In earlier versions of this patchset I ran afoul of the fact that the
in-memory inode is dumped to disk, making a change to that data structure an
ABI change.  Then I ran afoul of the fact that although struct xfs_inode
survives, the embedded struct inode may be evicted from the vfs and become
invalid, with all of its contents stomped with
Given the number of ioctls that xfs supports it would be irresponsible to
do anything except insist that kuids, kgids, and kprojids are used in all of
the in-memory data structures of xfs, as otherwise it becomes trivially easy
to miss a needed conversion with the advent of a new ioctl.

It has been suggested that kuids and kgids are a vfs construct and should
stop at the vfs and should not be used in xfs data structures.  They are
not a vfs construct; they are a kernel construct and are used everywhere in
the kernel.  xfs does not get to be an exception.

To put kuids, kgids, and kprojids in all of the xfs data structures without
breaking the on-disk ABI, this patchset moves struct xfs_icdinode from
struct xfs_inode to xfs_log_item, and introduces a set of conversion
functions.  This introduces the separation between on-disk and in-memory
format that is needed to properly perform this conversion.

I have a few additional patches, not included in this posting, sitting in my
development tree that remove the extra copy that this change introduces
into xfs_inode_item_format.

xfstests in this instance are boring: the same 17 tests keep failing both
before and after my changes.

I don't care through which tree these changes are merged.  If you would
like to take these in through the xfs tree that would be great.  Otherwise I
will be happy to take these changes through my user namespace tree.


Eric W. Biederman (14):
      xfs: Convert uids and gids in xfs acls to/from kuids and kgids
      xfs: Separate the in core and the logged inode.
      xfs: Store projectid as a single variable.
      xfs: Update inode uids, gids, and projids to be kuids, kgids, and kprojids
      xfs: Update xfs_ioctl_setattr to handle projids in any user namespace
      xfs: Use kuids and kgids in xfs_setattr_nonsize
      xfs: Update ioctl(XFS_IOC_FREE_EOFBLOCKS) to handle callers in any 
      xfs: Use kprojids when allocating inodes.
      xfs: Modify xfs_qm_vop_dqalloc to take kuids, kgids, and kprojids.
      xfs: Push struct kqid into xfs_qm_scall_qmlim and xfs_qm_scall_getquota
      xfs: Modify xfs_qm_dqget to take a struct kqid.
      xfs: Remember the kqid for a quota
      xfs: Use q_id instead of q_core.d_id.
      xfs: Enable building with user namespaces enabled.

 fs/xfs/xfs_acl.c         |   23 ++++++-
 fs/xfs/xfs_dquot.c       |   39 ++++++++----
 fs/xfs/xfs_dquot.h       |   13 +++-
 fs/xfs/xfs_icache.c      |   14 ++--
 fs/xfs/xfs_icache.h      |   11 +++-
 fs/xfs/xfs_inode.c       |  160 +++++++++++++++++++++++++++++++++-------------
 fs/xfs/xfs_inode.h       |   51 +++++++++------
 fs/xfs/xfs_inode_item.c  |    3 +-
 fs/xfs/xfs_inode_item.h  |    1 +
 fs/xfs/xfs_ioctl.c       |   52 ++++++++++++---
 fs/xfs/xfs_iops.c        |   14 ++--
 fs/xfs/xfs_itable.c      |   47 +++++++-------
 fs/xfs/xfs_qm.c          |   83 ++++++++++++------------
 fs/xfs/xfs_qm.h          |    4 +-
 fs/xfs/xfs_qm_bhv.c      |    2 +-
 fs/xfs/xfs_qm_syscalls.c |   24 ++++---
 fs/xfs/xfs_quota.h       |    4 +-
 fs/xfs/xfs_quotaops.c    |   20 +-----
 fs/xfs/xfs_rename.c      |    2 +-
 fs/xfs/xfs_trace.h       |    2 +-
 fs/xfs/xfs_trans_dquot.c |    8 +--
 fs/xfs/xfs_utils.c       |    2 +-
 fs/xfs/xfs_utils.h       |    2 +-
 fs/xfs/xfs_vnodeops.c    |   14 ++--
 init/Kconfig             |    1 -
 25 files changed, 366 insertions(+), 230 deletions(-)
