We're still having this problem.
Marco has observed it with 2.2.19 and 2.4.5.
I've had him instrument the Pluto code, so we have more information.
- select says that the file descriptor is ready to be read from
- poll on that file descriptor, with events = POLLIN | POLLPRI | POLLOUT,
returns 1, with revents = 12 (0x000c). I decode this as these two flags:
#define POLLOUT 0x0004 /* Writing now will not block */
#define POLLERR 0x0008 /* Error condition */
Notice that POLLIN is not on.
- the recvfrom with flags = 0 hangs waiting for a message.
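
For anyone who wants to double-check the decoding of revents above, here is a minimal sketch in C. The helper name decode_revents is made up for illustration and is not part of Pluto; it only assumes the standard flag macros from <poll.h>.

```c
#include <assert.h>
#include <poll.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper, not Pluto code: write the names of the flags set
 * in revents into buf, separated by '|'. Returns buf. */
static const char *decode_revents(short revents, char *buf, size_t len)
{
    static const struct { short bit; const char *name; } flags[] = {
        { POLLIN,   "POLLIN"   },
        { POLLPRI,  "POLLPRI"  },
        { POLLOUT,  "POLLOUT"  },
        { POLLERR,  "POLLERR"  },
        { POLLHUP,  "POLLHUP"  },
        { POLLNVAL, "POLLNVAL" },
    };
    size_t i;

    assert(buf != NULL && len > 0);
    buf[0] = '\0';
    for (i = 0; i < sizeof flags / sizeof flags[0]; i++) {
        if (revents & flags[i].bit) {
            /* Separate names with '|' once something is already in buf. */
            if (buf[0] != '\0')
                strncat(buf, "|", len - strlen(buf) - 1);
            strncat(buf, flags[i].name, len - strlen(buf) - 1);
        }
    }
    return buf;
}
```

Given a buffer such as char buf[64], decode_revents(pfd.revents, buf, sizeof buf) can be printed next to the raw value when instrumenting a poll call.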
Here's a part of the strace:
select(9, [4 6 8], NULL, NULL, {36, 0}) = 2 (in [4 8], left {36, 0})
select says that there is input available on file descriptors
4 and 8. Descriptor 4 is a Unix domain socket for control
information; 8 is a UDP socket. We'll look at 8 first.
select(9, [8], NULL, NULL, {0, 0}) = 1 (in [8], left {0, 0})
Just to make sure (because we're paranoid): yes, select says
8 has input for us.
poll([{fd=8, events=POLLIN|POLLPRI|POLLOUT, revents=POLLOUT|POLLERR}], 1, 0) = 1
Further paranoia: what does poll say about 8? The program
doesn't actually use this info -- just for testing.
It says that there is POLLERR but not POLLIN.
recvfrom(8, 0xbffef5c8, 65536, 0, 0xbffef5b0, 0xbffef5ac) = ? ERESTARTSYS (To be restarted)
--- SIGHUP (Hangup) ---
Hang in recvfrom until SIGHUP liberates us.
Since the select said that there was something to read, the recvfrom
should not hang, but it does.
I'm not sure what the correct kernel behaviour should be. Either the
select should say that there is nothing to read, or the recvfrom
should not hang. Andi suggested earlier that the read should return
immediately, with a -1 result, indicating error.
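
Until the kernel behaviour is settled, the "return immediately with -1" behaviour can be approximated from user space. The following is only a sketch, not what Pluto does: the helper names recv_nowait and pending_socket_error are invented for illustration, and it assumes MSG_DONTWAIT and SO_ERROR are available (they are on Linux). I have not verified that it avoids the hang on the affected kernels.

```c
#include <assert.h>
#include <errno.h>
#include <sys/types.h>
#include <sys/socket.h>

/* Hypothetical helper, not Pluto code: read one datagram without ever
 * blocking. Returns the byte count, or -1 with errno set (EAGAIN or
 * EWOULDBLOCK when nothing is queued). */
static ssize_t recv_nowait(int fd, void *buf, size_t len,
                           struct sockaddr *from, socklen_t *fromlen)
{
    assert(fd >= 0);
    return recvfrom(fd, buf, len, MSG_DONTWAIT, from, fromlen);
}

/* Hypothetical helper: fetch (and thereby clear) the error pending on
 * the socket. Returns 0 if none, an errno value if one was pending,
 * or -1 if getsockopt itself failed. */
static int pending_socket_error(int fd)
{
    int err = 0;
    socklen_t errlen = sizeof err;

    if (getsockopt(fd, SOL_SOCKET, SO_ERROR, &err, &errlen) < 0)
        return -1;
    return err;
}
```

The idea would be to call pending_socket_error when poll reports POLLERR, and to use recv_nowait in place of the blocking recvfrom so that a wrong "readable" indication costs one failed call rather than a hang.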
This looks like a kernel bug to me (or perhaps a documentation bug).
Hugh Redelmeier
hugh@xxxxxxxxxx voice: +1 416 482-8253