[pcp] linux_proc caches; error handling; segv upon unprivileged proc.* reads
Frank Ch. Eigler
fche@redhat.com
Wed Nov 26 20:06:38 CST 2014
Hi -
I'd like to share some notes from analyzing why some gittish
builds of pcp cause a linux_proc pmda segv during normal operation,
or during the qa/022 test case (when invoked as pcpqa). The common
element seems to be invocation as an unprivileged userid rather than as root.
This occurs on a fedora rawhide VM, i.e.:
Linux vm-rawhide-64 3.18.0-rc2-00053-g9f76628da20f #1 SMP Wed Nov 12 15:12:35 EST 2014 x86_64 x86_64 x86_64 GNU/Linux
% pcp
[...] Version 3.10.1-1, 8 agents, 2 clients
% rpm -q pcp
pcp-3.10.1-0.759.gf6d6a93.fc22.x86_64
% sudo service pmcd restart
% sudo pminfo -f proc.memory.maps
[... ok ...]
% pminfo -f proc.memory.maps
proc.memory.maps: pmLookupDesc: No PMCD agent for domain of request
Indeed, the pmdaproc process exits with a SEGV like so:
Program received signal SIGSEGV, Segmentation fault.
0x00007f4cf9bcad8a in strlen () from /lib64/libc.so.6
(gdb) bt
#0 0x00007f4cf9bcad8a in strlen () from /lib64/libc.so.6
#1 0x00007f4cf9f3c66b in __pmStuffValue (avp=avp@entry=0x7fff088f9fb0, vp=vp@entry=0x1eaaf80, type=type@entry=6) at stuffvalue.c:55
#2 0x00007f4cfa181e71 in pmdaFetch (numpmid=numpmid@entry=1, pmidlist=pmidlist@entry=0x1e8a01c, resp=resp@entry=0x7fff088fa1c8, pmda=pmda@entry=0x1e84010)
at callback.c:603
#3 0x0000000000402dc3 in proc_fetch (numpmid=1, pmidlist=0x1e8a01c, resp=0x7fff088fa1c8, pmda=0x1e84010) at pmda.c:2393
#4 0x00007f4cfa184572 in __pmdaMainPDU (dispatch=dispatch@entry=0x7fff088fa270) at mainloop.c:179
#5 0x00007f4cfa184e08 in pmdaMain (dispatch=dispatch@entry=0x7fff088fa270) at mainloop.c:428
#6 0x00000000004025f7 in main (argc=6, argv=<optimized out>) at pmda.c:2668
This happens because the input pmAtomValue is unfilled:
(gdb) frame 1
#1 0x00007f4cf9f3c66b in __pmStuffValue (avp=avp@entry=0x7fff088f9fb0, vp=vp@entry=0x1eaaf80, type=type@entry=6) at stuffvalue.c:55
55 body = strlen(avp->cp) + 1;
(gdb) p *avp
$2 = {l = 0, ul = 0, ll = 0, ull = 0, f = 0, d = 0, cp = 0x0, vbp = 0x0}
This happens because proc_fetchCallBack returns sts >= 0 but fails to
fill in the output atom.
This happens because fetch_proc_pid_maps returns an incomplete pointer
(a proc_pid_entry_t with a NULL maps_buf) *and* a 0 sts.
$9 = {id = 481, flags = 1, name = 0x1ca3ee0 "000481 avahi-daemon: running [vm-rawhide-64.local]",
  stat_buflen = 0, stat_buf = 0x0, statm_buflen = 0, statm_buf = 0x0,
  maps_buflen = 0, maps_buf = 0x0, status_buflen = 0, status_buf = 0x0,
  status_lines = {uid = 0x0, gid = 0x0, sigpnd = 0x0, sigblk = 0x0,
    sigign = 0x0, sigcgt = 0x0, vmsize = 0x0, vmlck = 0x0, vmrss = 0x0,
    vmdata = 0x0, vmstk = 0x0, vmexe = 0x0, vmlib = 0x0, vmswap = 0x0,
    threads = 0x0, vctxsw = 0x0, nvctxsw = 0x0},
  schedstat_buflen = 0, schedstat_buf = 0x0, io_buflen = 0, io_buf = 0x0,
  io_lines = {rchar = 0x0, wchar = 0x0, syscr = 0x0, syscw = 0x0,
    readb = 0x0, writeb = 0x0, cancel = 0x0},
  wchan_buflen = 0, wchan_buf = 0x0, fd_buflen = 0, fd_count = 0,
  fd_buf = 0x0, cgroup_id = 0, label_id = 0}
This happens because, when fetch_proc_pid_maps tries to populate that
data structure on behalf of the client (so already temp-seteuid'd),
proc_open("maps", ep) comes back with an error, but leaves *sts=0 and
skips the rest of the function without initializing ->maps_buf. This is
almost certainly a bug (->maps_buf should be set).
The *sts=0 part happens because maperr() hides (only!) EACCES & EINVAL
errors and maps them to *sts=0. This is mysterious. This is the most
salient place the pmda could pass knowledge up toward libpcp_pmda &
the client to identify this metric instance as missing (e.g.,
PM_ERR_INST). Instead it pretends everything's OK, enabling the above
bug.
This logic was changed as part of commit #85d06e790e, and has seen
at least one full release. I didn't initially figure out why this was
affecting only my VM instead of every recent pcp build & machine. It
turns out that this is likely a kernel version difference. An strace
of the same type of operation on an older 3.14.9 kernel shows:
open("/proc/31830/maps", O_RDONLY) = 5
read(5, 0x7fff8e9833c0, 1024) = -1 EACCES (Permission denied)
whereas on the new 3.18-rc:
open("/proc/481/maps", O_RDONLY) = -1 EACCES (Permission denied)
So it now fails earlier.
On the virtual machine, the qa/022 test passes if run as root, but
fails if run as pcpqa. How do other people run the testsuite? It
seems like we need the test extended to run both ways, and better
describe such invocation issues in qa/README.
Reading related code got me thinking.
The ACL idea for the linux_proc pmda is simply that a user invoking
proc.* fetches should see (or not see) the exact same amount of data
as she would if looking directly at /proc. The linux_proc pmda
doesn't quite accomplish that. One possible problem is that it keeps
a shared hash-table of process-data (pidhash), which could only be
fully populated by root. If that data happens to be retained (i.e.,
escapes a refresh, or ep->flags | PROC_PID_..._FETCHED), then any
subsequent less-privileged pcp client user can copy out the contents!
Any temp-seteuid access control effort is moot:
% hostname
HOST
% service pmcd restart
% sudo pmval -i 1 -f proc.memory.maps
[... prime the cache ...]
then thereafter, from an unprivileged userid:
% pmval -i 1 -f proc.memory.maps
will return the cached information, even though the second
client has no business seeing all that information!
We need to revisit the general issue of these caches in the proc_pmda,
from a security point of view, not just the scaling problems
identified at <http://www.pcp.io/pipermail/pcp/2014-November/006005.html>.
- FChE