I think you make some good points here and I honestly need to do some
research (and probably talk to some co-workers) before I can get you a
good response. We've got some deadlines we're up against right now.
Would it be ok if I defer this for a couple days?
If you want to be sure the ball isn't dropped, you could file a Bugzilla
bug:
http://oss.sgi.com/bugzilla/
In any case, I hope to get back to you in a few days. If you need a
response sooner let me know and I'll try to shuffle some things around.
On Tue, 9 Nov 2004, Kingsley Cheung wrote:
> Hi Erik,
>
> Some other issues as well. Please correct me if I'm wrong :)
>
> 1) I've noticed that pagg_detach() has been moved from
> __put_task_struct() to do_exit() between the pagg patches for Linux
> 2.6.8. This occurred as described at
> http://oss.sgi.com/archives/pagg/2004-08/msg00002.html
>
> Anyway, I think the move might have raised a problem with the
> registration code. Below is the relevant segment of code I'd like to
> consider:
>
> get_task_struct(p);
> read_unlock(&tasklist_lock);
> down_write(&p->pagg_sem);
> paggp = pagg_get(p, pagg_hook_new->name);
> if (paggp == NULL) {
> paggp = pagg_alloc(p, pagg_hook_new);
> if (paggp != NULL)
> init_result = pagg_hook_new->init(p, paggp);
> else
> init_result = -ENOMEM;
> }
> up_write(&p->pagg_sem);
> read_lock(&tasklist_lock);
> /* Like in remove_client_paggs_from_all_tasks, if the task
> * disappeared on us while we were going through the
> * for_each_process loop, we need to start over with that loop.
> * That's why we have the list_empty here */
> task_exited = list_empty(&p->tasks);
> put_task_struct(p);
>
> Now the tasks on the task list include those that are zombies. This
> occurs during exit_notify(), which sets their state to ZOMBIE but
> leaves them on the task list. Therefore the above code will allocate
> paggs for zombie tasks as well. [Note that task_exited won't trigger
> as the task is still on the task list.]
>
> Now unfortunately, by moving pagg_detach() out of __put_task_struct(),
> when the zombie has its task structure released, the pagg that was
> allocated during registration is no longer detached. Thus we'll could
> have a memory leak once the task structure is dropped and
> detach_pagg() is not invoked.
>
> In fact the leak could happen not just for zombies, but for tasks just
> about to exit. The race occurs in do_exit(), in which the
> detach_pagg() could be called before we allocate the pagg.
>
> Perhaps the quickest solution is to place a detach_pagg() call back
> into __put_task_struct(). Another suggestion might be to check for
> the PF_EXITING flag when allocating the pagg.
>
>
> 2) Is pagg intended to be a container for Linux tasks or thread
> groups? I'm avoiding the use of the word "process" as that could be
> confusing in this context.
>
> The reason I'm asking is because throughout the pagg registration code
> the for_each_process() macro is used. This happens in
> remove_client_paggs_from_all_tasks() and pagg_hook_register(). The
> problem is, however, since 2.6 for_each_process() only traverses
> thread group leader tasks. Thus the registration and deregistration
> code only considers thread group leaders.
>
> In contrast the pagg_attach() and pagg_detach() hooks are placed to
> catch all tasks in the fork and exit code. With the way the
> registration and deregistration code is behaving, this means that we
> won't catch all existing tasks and we won't clean up after all tasks
> properly for multi-threaded applications.
>
> So what is pagg intended for? To catch every single task? In that
> case the do_each_thread() and while_each_thread() macros should be
> used. Or is pagg a container for thread groups only? The changes for
> that would require much more thought...
>
>
> 3) Is the "module" pointer used anywhere in the pagg_hook struct meant
> to be used in the pagg code or in the client code? It doesn't seem to
> be used in the pagg code at all.
>
>
> Thanks!
> --
> Kingsley
>
--
Erik Jacobson - Linux System Software - Silicon Graphics - Eagan, Minnesota
|