On Wed, 29 Feb 2012 21:19:17 +1100 (EST), Nathan Scott wrote:
nathans> 2. Moved on to discussion of issues Max has encountered while
nathans> working through the NFS client stats PMDA. Issues around:
nathans> - How to handle the instance domain, which is per-mounted
nathans> filesystem but sometimes (when kernel decides to share
nathans> struct superblock for a client NFS mount - same server,
nathans> and same mount options) the instances will share the same
nathans> values. This is unexpected from a user's POV, since the
nathans> I/O went to one mount point or the other, yet both update.
nathans> - Options include having a single instance for these shared
nathans> mounts (assuming correct identification possible), using
nathans> just whichever mount point is observed first as external
nathans> instance name. But, would lead to even more confusing
nathans> behaviour if that mount point goes away, but the other
nathans> remains - stats then reported for an unmounted path.
nathans> - For further details, see Max's promised mail.
Here is an example of what I'm trying to deal with - in
/proc/self/mountstats each NFS mount has a block of data which looks
like this:
$ cat /proc/self/mountstats
....
device 192.168.108.128:/mnt/data/ mounted on /mnt/nfs with fstype nfs4 statvers=1.0
opts: rw,vers=4,rsize=32768,wsize=32768,namlen=255,acregmin=3,acregmax=60,acdirmin=30,acdirmax=60,hard,proto=tcp,port=0,timeo=600,retrans=2,sec=sys,clientaddr=192.168.108.129,minorversion=0
age: 13
caps: caps=0x7ff6,wtmult=512,dtsize=4096,bsize=0,namlen=255
nfsv4: bm0=0xffffefff,bm1=0xf9fe3e,acl=0x0
sec: flavor=1,pseudoflavor=1
events: 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
bytes: 0 0 0 0 0 0 0 0
RPC iostats version: 1.0 p/v: 100003/4 (nfs)
xprt: tcp 973 0 1 0 13 19 19 0 19 0
per-op statistics
NULL: 0 0 0 0 0 0 0 0
READ: 0 0 0 0 0 0 0 0
WRITE: 0 0 0 0 0 0 0 0
COMMIT: 0 0 0 0 0 0 0 0
....
If I were to mount /mnt/data_new from the same 192.168.108.128 server
with the same NFS mount options, I'd get another entry with its own
block in /proc/self/mountstats, but it would share all the counters
(events, bytes, per-op statistics) with /mnt/data, e.g.:
device 192.168.108.128:/mnt/data_new mounted on /mnt/nfs_new with fstype nfs4 statvers=1.0
opts: rw,vers=4,rsize=32768,wsize=32768,namlen=255,acregmin=3,acregmax=60,acdirmin=30,acdirmax=60,hard,proto=tcp,port=0,timeo=600,retrans=2,sec=sys,clientaddr=192.168.108.129,minorversion=0
age: 13
caps: caps=0x7ff6,wtmult=512,dtsize=4096,bsize=0,namlen=255
nfsv4: bm0=0xffffefff,bm1=0xf9fe3e,acl=0x0
sec: flavor=1,pseudoflavor=1
events: 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
bytes: 0 0 0 0 0 0 0 0
RPC iostats version: 1.0 p/v: 100003/4 (nfs)
xprt: tcp 973 0 1 0 13 19 19 0 19 0
per-op statistics
NULL: 0 0 0 0 0 0 0 0
READ: 0 0 0 0 0 0 0 0
WRITE: 0 0 0 0 0 0 0 0
COMMIT: 0 0 0 0 0 0 0 0
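As an aside, for anyone who wants to poke at this data, here is a rough
Python sketch of splitting /proc/self/mountstats into one record per NFS
mount. It only understands the fields shown in the two example blocks
above, and the function name and record layout are mine, not anything
that exists in PCP today:

#!/usr/bin/env python
# Rough sketch only: split /proc/self/mountstats into one record per NFS
# mount.  Only the fields shown in the example blocks in this mail are
# handled; other statvers layouts and non-NFS filesystems are skipped.

def parse_mountstats(path='/proc/self/mountstats'):
    mounts = []
    current = None
    with open(path) as f:
        for line in f:
            words = line.split()
            if not words:
                continue
            if words[0] == 'device':
                # "device <export> mounted on <mountpoint> with fstype <type> ..."
                current = None
                if len(words) > 7 and words[7].startswith('nfs'):
                    current = {
                        'export': words[1],      # e.g. 192.168.108.128:/mnt/data
                        'mountpoint': words[4],  # e.g. /mnt/nfs
                        'fstype': words[7],
                        'opts': [], 'events': [], 'bytes': [],
                        'xprt': [], 'ops': {}, 'in_ops': False,
                    }
                    mounts.append(current)
            elif current is None:
                continue                         # not an NFS mount, skip its lines
            elif words[0] == 'opts:' and len(words) > 1:
                current['opts'] = words[1].split(',')
            elif words[0] == 'events:':
                current['events'] = [int(x) for x in words[1:]]
            elif words[0] == 'bytes:':
                current['bytes'] = [int(x) for x in words[1:]]
            elif words[0] == 'xprt:':
                current['xprt'] = words[1:]      # protocol name, then counters
            elif words[0] == 'per-op':
                current['in_ops'] = True
            elif current['in_ops'] and words[0].endswith(':'):
                # per-op statistics lines, e.g. "READ: 0 0 0 0 0 0 0 0"
                current['ops'][words[0].rstrip(':')] = [int(x) for x in words[1:]]
    return mounts

if __name__ == '__main__':
    for m in parse_mountstats():
        print('%s (%s) READ: %s' % (m['mountpoint'], m['export'],
                                    m['ops'].get('READ')))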
Now if I want to present the user with some information about NFS
operations I can use the mount path as the external instance
identifier, so she can fetch things like
nfsclient.nfs4.reqs.getattr.count["/mnt/nfs"].
This creates two problems:
1. Because the counters are per NFS superblock (the structure is named
   nfs_server in the kernel) and are shared between /mnt/nfs and
   /mnt/nfs_new, any operation on /mnt/nfs will be visible in
   /mnt/nfs_new as well. The user will be right to complain that she
   didn't touch /mnt/nfs_new, so why did its counters change?
2. If I collapse the instances so that only /mnt/nfs is visible ("first
   entry in /proc/self/mountstats wins") and the user then unmounts
   /mnt/nfs, I'll still have to report counters for it - more
   confusion.
Ben decided to use exports to identify the instances, but that does not
work either: one export can be mounted multiple times on a single
client, and exports (things like 192.168.108.128:/mnt/data) suffer from
the same counter-aliasing problem.
The only way to identify these things uniquely is to combine the host
name and the mount options, but that makes instance names exceedingly
long and pretty much useless. I could probably collapse them into
something shorter, e.g. 192.168.108.128_mount1, and provide a mapping
from both mount path and export to this "name", but it is still
sub-marvelous.
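To make that idea concrete, here is a rough sketch of the
"server_mountN" naming, built on the record layout from the parser
sketch above. The function name is mine, and the grouping key (server
address plus sorted mount options) is only an approximation of when the
kernel really shares an nfs_server - things like nosharecache also
matter:

def instance_names(mounts):
    groups = {}      # (server, options) -> short instance name
    lookup = {}      # mount path or export string -> short instance name
    per_server = {}  # running count of distinct groups per server
    for m in mounts:
        server = m['export'].split(':', 1)[0]
        key = (server, tuple(sorted(m['opts'])))
        if key not in groups:
            per_server[server] = per_server.get(server, 0) + 1
            groups[key] = '%s_mount%d' % (server, per_server[server])
        lookup[m['mountpoint']] = groups[key]
        lookup[m['export']] = groups[key]
    return groups, lookup

With the two example mounts above, both /mnt/nfs and /mnt/nfs_new map to
192.168.108.128_mount1, so the shared counters are reported once and the
instance name survives either mount point going away - at the price of
being a name the user never typed.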
The second option is to use the local mount path for instances and
write a long explanation in the help text for each metric.
The third option is to fix the stats, but that would require keeping
them per vfs_mount, and the Linux kernel guys are not too keen on this
approach.
The other problem is transport statistics - the line
xprt: tcp 973 0 1 0 13 19 19 0 19 0
contains information about events associated with a "transport", which
is a connection between a client and a server. The problem is that this
transport can be shared between multiple nfs_server structures, which
means the level of aliasing is even greater.
Plus, there is no way to identify the transport reliably - I cannot use
the local protocol/port because it can change if the transport
reconnects, and I cannot use the remote protocol/host/port because a
single client can occasionally have more than one connection to the
same server.
So in this case I need to come up with a suitable scheme for naming
instances and some way of detecting sharing.
If anyone who wasn't at that meeting has better ideas, I'm all ears.
nathans> - Would be helped by extension to the Perl PCP::PMDA module
nathans> to allow PMDAs to call the pmdaCache family of routines
nathans> (long overdue on my part).
Yes, this would go a long way in making instance management suck a bit
less.
max