Hello!
The problem with the hang on "Finding module dependencies" has been traced
to a deadlock in the kernel, possibly in devfs.
One minilogd calls bind() on /dev/log, and the other calls unlink() on the
same file. The problem only happens with Linux 2.4.x (not 2.5.x) and only
if devfs is mounted.
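To make the two competing operations concrete, here is a minimal sketch
of the race. It is not the minilogd code: the function name, the socket
type and the use of a scratch directory instead of /dev/log are all
assumptions made for illustration. On a kernel without the bug it should
simply run to completion.

```python
import os
import socket
import tempfile


def race_bind_unlink(path, rounds=200):
    """Race bind() in the parent against unlink() in a forked child.

    Hypothetical helper; returns how many bind() calls succeeded.
    """
    pid = os.fork()
    if pid == 0:
        # Child: plays the minilogd that removes the socket file.
        try:
            for _ in range(rounds):
                try:
                    os.unlink(path)
                except FileNotFoundError:
                    pass  # nothing is bound at the moment
        finally:
            os._exit(0)

    # Parent: plays the minilogd that binds the socket.
    bound = 0
    for _ in range(rounds):
        # Socket type is an arbitrary choice for this sketch.
        sock = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
        try:
            sock.bind(path)
            bound += 1
        except OSError:
            pass  # e.g. EADDRINUSE: the file has not been unlinked yet
        finally:
            sock.close()

    os.waitpid(pid, 0)
    try:
        os.unlink(path)  # clean up whatever the last bind() left behind
    except FileNotFoundError:
        pass
    return bound


if __name__ == "__main__":
    workdir = tempfile.mkdtemp()
    count = race_bind_unlink(os.path.join(workdir, "log"))
    print(count, "bind() calls succeeded")
```

Pointing the path at a devfs mount on an affected 2.4.x kernel would be
the interesting case, but I have not verified that this sketch triggers
the hang.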
Given that it's essentially a race condition, we cannot be sure that the
problem doesn't exist without devfs or on 2.5.x kernels. However, devfs
must be involved somehow.
Since both minilogd processes are in the "D" (uninterruptible sleep)
state, I don't think devfsd is involved. Both are waiting in the kernel.
The attached patch can be used as a workaround. It prevents minilogd
from doing unlink() while another one is calling bind() on the same file.
The patch is very ugly and is not meant to be applied. However, it
demonstrates where the problem lies.
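The general idea of such a workaround can be sketched as follows. This
is not the attached patch, only a hypothetical illustration of the same
principle: bind() and unlink() on the socket path are serialized through
an advisory lock on a separate lock file. The helper names and the lock
file are assumptions.

```python
import fcntl
import os
import socket
import tempfile
from contextlib import contextmanager


@contextmanager
def socket_path_lock(lock_path):
    """Hold an exclusive advisory lock for the duration of the block."""
    fd = os.open(lock_path, os.O_CREAT | os.O_RDWR, 0o600)
    try:
        fcntl.flock(fd, fcntl.LOCK_EX)  # blocks until the lock is free
        yield
    finally:
        os.close(fd)  # closing the descriptor releases the lock


def bind_locked(path, lock_path):
    """bind() a Unix socket while holding the lock; returns the socket."""
    sock = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
    try:
        with socket_path_lock(lock_path):
            sock.bind(path)
    except OSError:
        sock.close()
        raise
    return sock


def unlink_locked(path, lock_path):
    """unlink() the socket file while holding the lock.

    Returns True if a file was removed, False if there was none.
    """
    with socket_path_lock(lock_path):
        try:
            os.unlink(path)
            return True
        except FileNotFoundError:
            return False


if __name__ == "__main__":
    workdir = tempfile.mkdtemp()
    path = os.path.join(workdir, "log")
    lock = os.path.join(workdir, "log.lock")
    s = bind_locked(path, lock)
    print("removed:", unlink_locked(path, lock))
    s.close()
```

This only keeps the two calls from overlapping in cooperating processes.
It does nothing about the underlying kernel problem.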
I think there is more than one program to blame. We have two initlog
processes, each calling minilogd. That's the first problem.
We have minilogd removing /dev/log without checking whether it's in use.
This may or may not be OK. At least it's not OK if the kernel is not
fixed. That's the second problem.
unlink() and bind() can deadlock on devfs. That's the third problem. If
it's a known problem that was fixed in 2.5.x kernels, it would be really
nice to propagate the fix to the 2.4.x series. A hanging Red Hat system
is a major annoyance, and it took me months of living with even worse
workarounds, plus a whole day of debugging, to track it down.
There are some more details in this bug:
https://bugzilla.redhat.com/bugzilla/show_bug.cgi?id=85621
I'll try to create a testcase that doesn't involve a reboot. I'll also be
able to run the logs produced by Alt-SysRq T through ksymoops. But I'm
going to be quite busy next week, so I'd appreciate it if someone else
beats me to that.
--
Regards,
Pavel Roskin
minilogd.diff
Description: Text document