I have experienced some odd behavior when communicating between multiple
processes through the loopback device using poll() to wait for input.
Attachment 'random-inet.c' is a program that shows the problem. Basically it
starts a number of processes. Each process makes a connection to each of
the other processes (resembling MPI implementations such as lam-mpi). Then a
given number of messages are sent to pseudo-random destinations. When a
process receives one of the messages it forwards it to another randomly
chosen destination. The program is run as follows:
./random-inet <# processes> <# messages>
Problem: One would expect this program to use up all the available
CPU time, but this is not the case. Already with 3 processes and 1 message
there is some idle CPU time, and it gets worse as more processes are
added.
As a sanity check I created the same program using UNIX sockets created by
socketpair() (random-spair.c). This makes the problem go away.
I have also attached the MPI program 'random-mpi.c' showing the same problem
with lam-mpi 7.0.6.
Another MPI program that does NOT have the problem is 'ring-mpi.c'. This
sends the messages around in a ring of processes. The controlled
communication pattern somehow makes the problem go away.
I have attached the 'ver_linux' output of the systems that I have tested. I know
these are not mainline kernels but I have not found any mention of such a
problem in the latest changelogs. I will gladly try it on a mainline kernel if
that would help.
I'm not on the list so please CC.
Hans Henrik Happe
random-inet.c
Description: Text Data
ver_linux.2.4.24
Description: Text document
ver_linux.2.6.11-gentoo-r6
Description: Text document
ver_linux.2.6.3-7mdk
Description: Text document
random-mpi.c
Description: Text Data
random-spair.c
Description: Text Data
ring-mpi.c
Description: Text Data