On 01/21/2015 10:48 AM, Ken McDonell wrote:
On the trail of qa/957 failures ... Mark's looking at the most common case,
but I found another different one that looks like a small mem leak (at
first).
The issue is around the malloc at line 75 in proc_scsi.c.
Nathan and I were talking off-list about this (qa/957 valgrind
failures), and I have an uncommitted patch to fix it, but it looks
like you've made some other overlapping changes.
[...]
So the _fundamental_ question is what are the semantics of this instance
domain? Under what circumstances is it expected to change ... h/w reconfig
or non-determinism in the kernel's scsi scanning code?
The original code pre-dates SCSI hotplug (especially USB devices coming
and going under the guise of the SCSI driver).
If it is subject to expected change, then it should not be backed by
permanent store, but recreated each time as needed.
The h:b:t:l part of the external instance name used to be constant
unless you moved a disk to a different controller, in which case
it's basically a different device. The sd[a-z]* mapping has never
been persistent - hence the whole point of this metric. The same
applies to device-mapper devices (dm-[0-9]* in /proc/partitions).
However, in recent kernels, it seems some devices can change their
h:b:t:l name (particularly the scsi host number), following a
reboot *without* having unplugged or replugged anything. Nathan
noticed this with his cdrom following a reboot. And of course
USB devices are hotplugged/unplugged frequently.
So this instance domain is no longer inherently persistent at all.
We'd be better off using UUID or WWID etc for naming disks, which
are globally unique and persistent (see /dev/disk/by-{id,uuid}).
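
To make the by-id idea concrete, here is a rough sketch (not PCP code, and not the uncommitted patch) of how the transient sd* names could be mapped back to persistent identifiers. It assumes udev has populated /dev/disk/by-id with symlinks pointing at the kernel device nodes; the function name persistent_names is made up for illustration.

```python
import os


def persistent_names(by_id_dir="/dev/disk/by-id"):
    """Map kernel device names (e.g. 'sda') to their persistent link names.

    Hypothetical helper: assumes by_id_dir holds udev-style symlinks
    whose targets are the kernel device nodes.
    """
    mapping = {}
    try:
        entries = sorted(os.listdir(by_id_dir))
    except OSError:
        # Directory missing or unreadable (e.g. no udev): report nothing.
        return mapping
    for entry in entries:
        path = os.path.join(by_id_dir, entry)
        if not os.path.islink(path):
            continue
        # Resolve the symlink and keep only the kernel name (sda, dm-0, ...).
        kernel_name = os.path.basename(os.path.realpath(path))
        mapping.setdefault(kernel_name, []).append(entry)
    return mapping


if __name__ == "__main__":
    for dev, ids in sorted(persistent_names().items()):
        print(dev, "->", ", ".join(ids))
```

A device can have several by-id links (for example an ata-* and a wwn-* name), so the sketch keeps them all and leaves the choice of which one to use as the external instance name open.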
In the process of this investigation I quickly checked some other
pmdaCacheFoo() use in the linux PMDA and there are many issues ... this code
(and possibly across all PMDAs) needs a thoroughly good audit by someone who
understands how the pmdaCache services really work.
Ditto for just about all indoms for h/w devices, including CPUs.
Just about everything is hotpluggable nowadays - we need to audit
all of the affected code, not just for pmdaCache usage, but also
the persistence of the instance names in modern kernels.