On 05/12/2015 05:43 PM, Nathan Scott wrote:
Changes committed to git://git.pcp.io/nathans/pcp.git master
Nathan Scott (3):
pmdalinux: fix container issues, especially with networking metrics
qa: fix typo in an error message in test 540
build: extend gitignore file set for pmdaroot
The code all looks OK to me, but testing failed: pmdaroot segfaulted on
a NULL cp->name in the fetch callback. This is with a script running
that continually creates busybox containers that basically just run
ifconfig -a and then exit.
I also noticed that containers.state.running was showing far too many
instances. It should be zero or at most one ("docker ps" shows none running).
To reproduce, run the following:
# while true; do docker run busybox ifconfig eth0; done
Then in another window, run "pminfo -f containers" a few times.
I'll try to find some time to figure this out later today, but
for now here's a gdb traceback:
Program received signal SIGSEGV, Segmentation fault.
0x000000000040315b in root_fetchCallBack (mdesc=0x607360 <root_metrictab+32>,
inst=<optimized out>, atom=0x7ffcc9973f60) at root.c:202
202 atom->cp = *cp->name == '/' ? cp->name+1 : cp->name;
(gdb) where
#0 0x000000000040315b in root_fetchCallBack (mdesc=0x607360
<root_metrictab+32>, inst=<optimized out>, atom=0x7ffcc9973f60) at root.c:202
#1 0x00007fad98c6c7b4 in pmdaFetch (numpmid=<optimized out>,
pmidlist=<optimized out>, resp=<optimized out>, pmda=0x1913010)
at callback.c:573
#2 0x00007fad98c6ef22 in __pmdaMainPDU (dispatch=dispatch@entry=0x7ffcc9974160)
at mainloop.c:179
#3 0x0000000000402457 in root_main (dp=0x7ffcc9974160) at root.c:682
#4 main (argc=<optimized out>, argv=<optimized out>) at root.c:767
(gdb) l 195
190 containers = INDOM(CONTAINERS_INDOM);
191 sts = pmdaCacheLookup(containers, inst, &name, (void**)&cp);
192 if (sts < 0)
193 return sts;
194 if (sts != PMDA_CACHE_ACTIVE)
195 return PM_ERR_INST;
196 root_refresh_container_values(name, cp);
197 switch (idp->item) {
198 case 0: /* containers.engine */
199 atom->cp = cp->engine->name;
(gdb) l
200 break;
201 case 1: /* containers.name */
202 atom->cp = *cp->name == '/' ? cp->name+1 : cp->name;
203 break;
204 case 2: /* containers.pid */
205 atom->ul = cp->pid;
206 break;
207 case 3: /* containers.state.running */
208 atom->ul = (cp->status & CONTAINER_FLAG_RUNNING) != 0;
209 break;
(gdb) p cp
$1 = (container_t *) 0x1914490
(gdb) p *cp
$2 = {pid = 0, status = 0, name = 0x0,
cgroup =
"system.slice/docker-e720313565c7817bb4aa1c287ef0908d6960b20c7ec36f9d1a50a069a03a5b6b.scope",
'\000' <repeats 37 times>,
stat = {st_dev = 0, st_ino = 0, st_nlink = 0, st_mode = 0, st_uid = 0,
st_gid = 0, __pad0 = 0, st_rdev = 0, st_size = 0, st_blksize = 0,
st_blocks = 0, st_atim = {tv_sec = 0, tv_nsec = 0},
st_mtim = {tv_sec = 0, tv_nsec = 0}, st_ctim = {tv_sec = 0, tv_nsec = 0},
__glibc_reserved = {0, 0, 0}}, engine = 0x6074c0 <engines>}
So we segfaulted dereferencing cp->name, which is NULL for this entry;
note that pid, status and all the stat fields are zero as well.