Re: [kernel] char/raw.c devfs support

To: Juan Quintela <quintela@xxxxxxxxxxxxxxxx>
Subject: Re: [kernel] char/raw.c devfs support
From: Richard Gooch <rgooch@xxxxxxxxxxxxxxx>
Date: Sun, 17 Feb 2002 23:22:46 -0700
Cc: Borsenkow Andrej <Andrej.Borsenkow@xxxxxxxxxxxxxx>, "'Thierry Vignaud'" <tvignaud@xxxxxxxxxxxxxxxx>, kernel@xxxxxxxxxxxxxxxx, "'devfs mailing list'" <devfs@xxxxxxxxxxx>, "'Frederic Lepied'" <flepied@xxxxxxxxxxxxxxxx>
In-reply-to: <m2d6z8q6lt.fsf@xxxxxxxxxxxxxxxxxxxxxxxxxx>
References: <000601c1b56b$241be2e0$21c9ca95@xxxxxxxxxxxxxx> <m2lmdwq7q2.fsf@xxxxxxxxxxxxxxxxxxxxxxxxxx> <200202141750.g1EHo2G12220@xxxxxxxxxxxxxxxxxxxxxxxx> <m2d6z8q6lt.fsf@xxxxxxxxxxxxxxxxxxxxxxxxxx>
Sender: owner-devfs@xxxxxxxxxxx
Juan Quintela writes:
> >>>>> "richard" == Richard Gooch <rgooch@xxxxxxxxxxxxxxx> writes:
> 
> 
> richard> Um, what do you mean you failed to explain it?
> 
> 1) that I explained myself badly :(
> 
> 2) that you didn't understand the problem from my explanation :(
> 
> Here are the traces that I had at the moment:
> 
> I still have that bug with 2.4.18-pre7, and it has this patch applied.
> 
> Stack traces again (in kernel land):
> 
> p1:
>         schedule()
>         devfs_de_revalidate_wait()
>         cached_lookup()
>         lookup_hash()
>         sys_unlink()
>         system_call()
> 
> p2:
> 
>         schedule()
>         wait_for_devfsd_finished()
>         devfs_lookup()
>         lookup_hash()
>         unix_bind()
>         sys_bind()
>         sys_socketcall()
>         system_call()
> 
> The thing that they are trying to create/remove is /dev/log.
> 
> And devfsd is meanwhile in this state:
> 
>     __schedule()
>     __down()
>     __down_failed()
>     __text_lock_namei()
> 
> This had worked normally until now; it began to fail yesterday.
> 
> What the task does, basically:
> 
> unlink("/dev/log");
> bind("/dev/log") -> type AF_LOCAL, we have already did the socket()
> listen()
> if (fork)
>    exit();
> else {
>      stat(/dev/log);
>      <do normal stuff for a syslogd handler>
>      stat(/dev/log); (we need to make sure that nobody has changed the
>          link under our feet)
>      exit();
> }
> 
> As you can see, the user space does something that looks perfectly
> normal, and the kernel handling of that part looks strange.

Well, I don't see what's strange about the kernel handling part. Devfs
is enforcing serialisation, but that isn't supposed to cause
deadlocks.
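
For reference, the user-space sequence you describe boils down to
roughly the following. This is only a minimal sketch: the SOCK_STREAM
type, the listen backlog and the (lack of) error handling are my
guesses, since your outline only names the calls.

    #include <stdio.h>
    #include <stdlib.h>
    #include <string.h>
    #include <unistd.h>
    #include <sys/socket.h>
    #include <sys/stat.h>
    #include <sys/un.h>

    int main(void)
    {
        struct sockaddr_un sa;
        struct stat st;
        int fd;

        unlink("/dev/log");          /* p1's trace is inside this unlink */

        fd = socket(AF_LOCAL, SOCK_STREAM, 0);
        if (fd < 0) {
            perror("socket");
            exit(1);
        }
        memset(&sa, 0, sizeof sa);
        sa.sun_family = AF_LOCAL;
        strncpy(sa.sun_path, "/dev/log", sizeof sa.sun_path - 1);

        if (bind(fd, (struct sockaddr *) &sa, sizeof sa) < 0) {
            perror("bind");          /* p2's trace is inside this bind */
            exit(1);
        }
        if (listen(fd, 5) < 0) {
            perror("listen");
            exit(1);
        }
        if (fork() != 0)
            exit(0);                 /* parent exits */

        stat("/dev/log", &st);       /* child checks the node */
        /* ... normal syslogd handler work ... */
        stat("/dev/log", &st);       /* nobody replaced it behind us? */
        return 0;
    }

Nothing there looks unusual from user space; the open question is
still what devfsd itself is blocked on when those two callers end up
waiting for it.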

> Another thing that is perhaps a bug in our setup is that we are storing
> /dev/log in /lib/dev-state, and probably we shouldn't (this was Andrej's
> discovery), but that is a different story, i.e. I think that:

I don't think that restoring /dev/log should cause this problem. I
haven't seen a failure path that would cause this deadlock.

> create unix socket
> reboot
> devfsd recreate socket
> unlink socket
> create socket again
> stat the name of the socket 
> 
> should not hang devfsd.

Agreed.

> I hope that this time I have been clearer. If the info is not enough,
> I will try to get a userspace trace of devfsd while this is happening,
> but I don't have a good idea of how to do it yet.

This is the key. So far I have not received *any* indication of what
devfsd is doing. The kernel trace you provided seems incomplete
(somewhere in the call stack I'd expect to see system_call(), and we
don't). And I've not yet received strace output of devfsd. I've just
released devfsd v1.3.24, which fixes the problem with devfsd
re-reading the configuration file while being traced, so please go
forth and trace it. If necessary, hack your boot scripts to trace
devfsd.
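
(Something as simple as replacing the devfsd invocation in your boot
script with, say,

    strace -f -o /tmp/devfsd.trace devfsd /dev

should do it; the output path is just an example, and -f is there so
that any children devfsd spawns get traced as well.)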

But somehow I've got to see what devfsd is trying to do, and the
events that led up to it. At the moment, all I've been told is that
devfsd is waiting on some resource, but no clue as to what resource.

Note that the strace output could be quite large, so you might like to
just send it to me, rather than spamming the list.

                                Regards,

                                        Richard....
Permanent: rgooch@xxxxxxxxxxxxx
Current:   rgooch@xxxxxxxxxxxxxxx
