pagg
[Top] [All Lists]

Re: PAGG Threading Issues (was Re: [patch] Registration Check for Compul

To: Kingsley Cheung <kingsley@xxxxxxxxxx>
Subject: Re: PAGG Threading Issues (was Re: [patch] Registration Check for Compulsory Hooks)
From: Erik Jacobson <erikj@xxxxxxxxxxxxxxxxxxxxxxx>
Date: Tue, 9 Nov 2004 15:02:24 -0600
Cc: pagg@xxxxxxxxxxx
In-reply-to: <20041108230039.GE18308@xxxxxxxxxx>
References: <20041108033648.GB18308@xxxxxxxxxx> <Pine.SGI.4.53.0411072233350.2612495@xxxxxxxxxxxxxxxxxxxxxxx> <20041108230039.GE18308@xxxxxxxxxx>
Sender: pagg-bounce@xxxxxxxxxxx
I think you make some good points here and I honestly need to do some
research (and probably talk to some co-workers) before I can get you a
good response.  We've got some deadlines we're up against right now.

Would it be ok if I defer this for a couple days?

If you want to be sure the ball isn't dropped, you could file a Bugzilla
bug:

http://oss.sgi.com/bugzilla/

In any case, I hope to get back to you in a few days.  If you need a
response sooner let me know and I'll try to shuffle some things around.

On Tue, 9 Nov 2004, Kingsley Cheung wrote:

> Hi Erik,
>
> Some other issues as well.  Please correct me if I'm wrong :)
>
> 1) I've noticed that pagg_detach() has been moved from
> __put_task_struct() to do_exit() between the pagg patches for Linux
> 2.6.8.  This occurred as described at
> http://oss.sgi.com/archives/pagg/2004-08/msg00002.html
>
> Anyway, I think the move might have raised a problem with the
> registration code.  Below is the relevant segment of code I'd like to
> consider:
>
>       get_task_struct(p);
>       read_unlock(&tasklist_lock);
>       down_write(&p->pagg_sem);
>       paggp = pagg_get(p, pagg_hook_new->name);
>       if (paggp == NULL) {
>               paggp = pagg_alloc(p, pagg_hook_new);
>               if (paggp != NULL)
>                       init_result = pagg_hook_new->init(p, paggp);
>               else
>                       init_result = -ENOMEM;
>       }
>       up_write(&p->pagg_sem);
>       read_lock(&tasklist_lock);
>       /* Like in remove_client_paggs_from_all_tasks, if the task
>        * disappeared on us while we were going through the
>        * for_each_process loop, we need to start over with that loop.
>        * That's why we have the list_empty here */
>       task_exited = list_empty(&p->tasks);
>       put_task_struct(p);
>
> Now the tasks on the task list include those that are zombies.  This
> occurs during exit_notify(), which sets their state to ZOMBIE but
> leaves them on the task list.  Therefore the above code will allocate
> paggs for zombie tasks as well. [Note that task_exited won't trigger
> as the task is still on the task list.]
>
> Now unfortunately, by moving pagg_detach() out of __put_task_struct(),
> when the zombie has its task structure released, the pagg that was
> allocated during registration is no longer detached.  Thus we'll could
> have a memory leak once the task structure is dropped and
> detach_pagg() is not invoked.
>
> In fact the leak could happen not just for zombies, but for tasks just
> about to exit.  The race occurs in do_exit(), in which the
> detach_pagg() could be called before we allocate the pagg.
>
> Perhaps the quickest solution is to place a detach_pagg() call back
> into __put_task_struct().  Another suggestion might be to check for
> the PF_EXITING flag when allocating the pagg.
>
>
> 2) Is pagg intended to be a container for Linux tasks or thread
> groups?  I'm avoiding the use of the word "process" as that could be
> confusing in this context.
>
> The reason I'm asking is because throughout the pagg registration code
> the for_each_process() macro is used.  This happens in
> remove_client_paggs_from_all_tasks() and pagg_hook_register().  The
> problem is, however, since 2.6 for_each_process() only traverses
> thread group leader tasks.  Thus the registration and deregistration
> code only considers thread group leaders.
>
> In contrast the pagg_attach() and pagg_detach() hooks are placed to
> catch all tasks in the fork and exit code.  With the way the
> registration and deregistration code is behaving, this means that we
> won't catch all existing tasks and we won't clean up after all tasks
> properly for multi-threaded applications.
>
> So what is pagg intended for?  To catch every single task?  In that
> case the do_each_thread() and while_each_thread() macros should be
> used.  Or is pagg a container for thread groups only?  The changes for
> that would require much more thought...
>
>
> 3) Is the "module" pointer used anywhere in the pagg_hook struct meant
> to be used in the pagg code or in the client code?  It doesn't seem to
> be used in the pagg code at all.
>
>
> Thanks!
> --
>               Kingsley
>

--
Erik Jacobson - Linux System Software - Silicon Graphics - Eagan, Minnesota

<Prev in Thread] Current Thread [Next in Thread>