pagg
[Top] [All Lists]

Re: [patch] Minor PAGG attach/detach semantic change for 2.6.11

To: Kaigai Kohei <kaigai@xxxxxxxxxxxxx>
Subject: Re: [patch] Minor PAGG attach/detach semantic change for 2.6.11
From: Erik Jacobson <erikj@xxxxxxx>
Date: Wed, 28 Sep 2005 09:18:31 -0500
Cc: Erik Jacobson <erikj@xxxxxxx>, Kingsley Cheung <kingsley@xxxxxxxxxx>, pagg@xxxxxxxxxxx, tonyt@xxxxxxxxxx, paulmck@xxxxxxxxxx
In-reply-to: <433A7FE4.5040109@ak.jp.nec.com>
References: <20050617014512.GA10285@aurema.com> <20050927201020.GA30433@sgi.com> <433A7FE4.5040109@ak.jp.nec.com>
Sender: pagg-bounce@xxxxxxxxxxx
User-agent: Mutt/1.5.6i
> In my understanding, any _write_ operations can not be implemented
> without locking, even if we can use RCU.
> (In addition, RCU conscious writing/update style is required.)

I'm surrounding write operations with a writelock rwsem (not a spinlock, at 
least not now... since it is common for pnotify users to use semaphores for 
their own locking).  I think this is similar to your example submission, but
you used a spinlock in those places.

At this moment, I'm close to done with something to look at.  I'm just 
tracking down a bug in the new implementation that showed up in the Job 
patch.  I'll post what I have when I'm ready and maybe we can tear it apart 
then.  This is my first rcu experience so I'd welcome the feedback including
"this just won't work with rcu" if that's what it comes down to.

I also want to be sure the 'stale data' problem isn't actually a problem
for us.

> frequentrly-read pass, such as SELinux's Access Vector Cache(AVC).
> But it's not omnipotence, and it restricts write methodology.

The feeling I had is that most users of pnotify won't be writing super-often.
This is a generalization that may be incorrect.  Taking Job as an example,
once the process is made part of a job, not much usually happens in terms of
adjusting the data pointer associated with the task struct until Job is done.

I could imagine there may be things this isn't the case for, then the writes
will be a penalty possibly.

> I have attention to another respect. The current pnotify implementation
> requires to hold pnotify_event_list_sem before calling pnotify_get_events().

As I recall, that code only would happen at most twice in the life of a kernel
module, right?  The only time the init function pointer would fire, if it's
present, is at pnotify_register time.  A similar piece of code happens at
unregister time I think.  I guess I'm wondering if this happens enough to
worry about?  Please let me know if I missed your entire point.

> Threfore, we must repeat read_lock/unlock(&tasklist_lock) on
> do_each_thread()/while_each_thread() loop as follows:
> 
> ----------------------------
> read_lock(&tasklist_lock);
> do_each_thread(g, p) {
>       get_task_struct(p);
>       read_unlock(&tasklist_lock);
> 
>       down_read(&p->pnotify_subscriber_list_sem);
>       subscriber = pnotify_get_subscriber(p, events->name);
>               :
>       up_read(&p->pnotify_subscriber_list_sem);
>       read_lock(&tasklist_lock);
>       << checking, p is dead or not ? >>
> } while_each_thread(g, p);
> read_unlock(&tasklist_lock);
> ----------------------------
> 
> I'm happy, if pnotify_subscriber_list would be protected by rwlock.
> 
> If rwlock is used, we can not implement pnotify_subscribe() with current
> spec.  But is it impossible to prepare pnotify_subscribe_atomic() or
> pnotify_subscribe_bind() which associates task_struct with pre-allocated
> pnotify_events object ?
> 
> ---- in rwlock world :-) ---
> read_lock(&tasklist_lock);
> do_each_thread(g, p) {
>       read_lock(&p->pnotify_subscriber_list_rwlock);
>       subscriber = pnotify_get_subscriber(p, events->name);
>               :
>       read_unlock(&p->pnotify_subscriber_list_rwlock);
> } while_each_thread(g, p);
> read_unlock(&tasklist_lock);
> ----------------------------
> 
> Thanks,
> 
> 
> >/**
> > * __pnotify_fork - Add kernel module subscriber to same subscribers as 
> > parent
> > * @to_task: The child task that will inherit the parent's subscribers
> > * @from_task: The parent task
> > *
> > * Used to attach a new task to the same subscribers the parent has in its
> > * subscriber list.
> > *
> > * The "from" argument is the parent task.  The "to" argument is the child
> > * task.
> > *
> > * See Documentation/pnotify.txt for details on
> > * how to handle return codes from the attach function pointer.
> > *
> > * Locking: The to_task is currently in-construction, so we don't
> > * need to worry about write-locks.  We do need to be sure the parent's
> > * subscriber list, which we copy here, doesn't go away on us.  This is
> > * done via RCU.
> > *
> > */
> >int
> >__pnotify_fork(struct task_struct *to_task, struct task_struct *from_task)
> >{
> >     struct pnotify_subscriber *from_subscriber;
> >     int ret;
> >
> >     /* We need to be sure the parent's list we copy from doesn't 
> >     disappear */
> >     rcu_read_lock();
> >
> >     list_for_each_entry_rcu(from_subscriber, 
> >     &from_task->pnotify_subscriber_list, entry) {
> >             struct pnotify_subscriber *to_subscriber = NULL;
> >
> >             to_subscriber = pnotify_subscribe(to_task, 
> >             from_subscriber->events);
> >             if (!to_subscriber) {
> >                     ret=-ENOMEM;
> >                     __pnotify_exit(to_task);
> >                     rcu_read_unlock();
> >                     return ret;
> >             }
> >             ret = to_subscriber->events->fork(to_task, to_subscriber,
> >               from_subscriber->data);
> >     
> >             rcu_read_unlock(); /* no more to do with the parent's data */
> 
> rcu_read_unlovk(); should be deployed outside of the 
> list_for_each_entry_rcu(){...}.
> 
> >
> >             if (ret < 0) {
> >                     /* Propagates to copy_process as a fork failure */
> >                     /* No __pnotify_exit because there is one in the 
> >                     failure path
> >                      * for copy_process in fork.c */
> >                     return ret; /* Fork failure */
> >             }
> >             else if (ret > 0) {
> >                     /* Success, but fork function pointer in the 
> >                     pnotify_events structure
> >                      * doesn't want the kenrel module subscribed */
> >                     /* Again, this is the in-construction-child so no 
> >                     write lock */
> >                     pnotify_unsubscribe(to_subscriber);
> >             }
> >     }
> >
> >     return 0;       /* success */
> >}
> 
> -- 
> Linux Promotion Center, NEC
> KaiGai Kohei <kaigai@xxxxxxxxxxxxx>
--
Erik Jacobson - Linux System Software - Silicon Graphics - Eagan, Minnesota

<Prev in Thread] Current Thread [Next in Thread>