>>>>> "richard" == Richard Gooch <rgooch@xxxxxxxxxxxxxxx> writes:
richard> Um, what do you mean you failed to explain it?
1) that I explained myself badly :(
2) that you didn't understand the problem from my explanation :(
Here are the traces that I had at that moment.
I still have that bug with 2.4.18-pre7, which has this patch applied.
Stack traces again (in kernel land):
p1:
schedule()
devfs_de_revalidate_wait()
cached_lookup()
lookup_hash()
sys_unlink()
system_call()
p2:
schedule()
wait_for_devfsd_finished()
devfs_lookup()
lookup_hash()
unix_bind()
sys_bind()
sys_socketcall()
system_call()
The thing that they are trying to create/remove is /dev/log.
And devfsd is already running, in this state:
__schedule()
__down()
__down_failed()
__text_lock_namei()
This worked normally until now; it began to fail yesterday.
What the tasks are doing:
Each task basically does:
unlink("/dev/log");
bind("/dev/log");  -> type AF_LOCAL, we have already done the socket()
listen();
if (fork())
    exit();
else {
    stat("/dev/log");
    <do normal stuff for a syslogd handler>
    stat("/dev/log");  (we need to make sure that nobody has changed the
                        link under our feet)
    exit();
}
As you can see, user space does something that looks perfectly normal,
and it is the kernel's handling of it that looks strange.
Another thing that is perhaps a bug in our setup is that we are storing
/dev/log in /lib/dev-state, and we probably shouldn't (this was Andrej's
discovery). But that is a different story, i.e. I think that:
create unix socket
reboot
devfsd recreates socket
unlink socket
create socket again
stat the name of the socket
should not hang devfsd.
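One way to check from user space whether the final stat step blocks is to run it in a child process and give up waiting after a timeout. The following is a minimal sketch under that assumption; stat_hangs() is a hypothetical name, and it only shows that the call did not return in time, not why.

```c
#include <signal.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: run stat(path) in a child process.
 * Returns 0 if the child finished within timeout_sec seconds,
 * 1 if it was still blocked, and -1 on fork/waitpid failure. */
int stat_hangs(const char *path, int timeout_sec)
{
    pid_t pid = fork();
    int i;

    if (pid < 0)
        return -1;

    if (pid == 0) {
        struct stat st;
        stat(path, &st);        /* result ignored; only completion matters */
        _exit(0);
    }

    /* Poll ten times per second until the child exits or time runs out. */
    for (i = 0; i < timeout_sec * 10; i++) {
        pid_t r = waitpid(pid, NULL, WNOHANG);
        if (r == pid)
            return 0;
        if (r < 0)
            return -1;
        usleep(100000);
    }

    /* If the child sleeps uninterruptibly in the kernel, this signal
     * only takes effect once it wakes up; the child is not reaped here. */
    kill(pid, SIGKILL);
    return 1;
}
```

For example, stat_hangs("/dev/log", 5) returning 1 right after the unlink/bind sequence would match the hang shown in the traces.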
I hope that this time I have been clearer. If the info is not enough, I
will try to get a userspace trace of devfsd while this is happening,
but I don't have a good idea of how to do it yet.
Later, Juan.
--
In theory, practice and theory are the same, but in practice they
are different -- Larry McVoy