netdev

select says I can read, but recvfrom hangs

To: <netdev@xxxxxxxxxxx>
Subject: select says I can read, but recvfrom hangs
From: "D. Hugh Redelmeier" <hugh@xxxxxxxxxx>
Date: Wed, 20 Jun 2001 22:57:59 -0400 (EDT)
Cc: Marco Berizzi <pupilla@xxxxxxxxxxx>
Reply-to: <hugh@xxxxxxxxxx>
Sender: owner-netdev@xxxxxxxxxxx
I'm the maintainer of Pluto, the IKE daemon for the LINUX FreeS/WAN
project.

Some of our users have experienced situations where the Pluto process
becomes unresponsive because it is waiting in a recvfrom.  The thing
that puzzles me is that recvfrom will not be executed unless select
has indicated that there is something to read on that socket.  I have
no idea how that could happen.

Do any of you have any ideas about what could be happening?

Details:

- A couple of people have noticed it happening, but not often.  It may
  be happening without being noticed, but not at great frequency.

- One user has been able to reproduce it fairly consistently.  I've
  mutated the code to try to narrow down what is going on.  Right
  now, there are three selects that say a message is ready, but the
  recvfrom still hangs.

- This user's system is Slackware 7.1 with a kernel.org 2.2.19 kernel,
  patched by FreeS/WAN 1.91.  Richard Briggs, our kernel guy, doesn't
  see a way that FreeS/WAN could affect the input path for UDP
  messages (i.e. not ESP and not AH).

- The socket in question is bound to UDP port 500 on the IP address
  of the public interface, as the RFCs dictate.  Socket options:
  SO_REUSEADDR and IP_RECVERR.  Hmm, I wonder if IP_RECVERR could be
  the problem: I have evidence that not many folks have used it.

- I would not ask you to read the whole of Pluto to help me.  But if
  you wish to, it can be found through www.freeswan.org.  Here is the
  recvfrom that is hanging, and the preceding just-to-be-safe select:

        {
            fd_set nreadfds;
            int nndes;
            struct timeval tm;

            tm.tv_sec = 0;      /* don't wait at all */
            tm.tv_usec = 0;

            FD_ZERO(&nreadfds);
            FD_SET(ifp->fd, &nreadfds);
            do {
                nndes = select(ifp->fd + 1, &nreadfds, NULL, NULL, &tm);
            } while (nndes == -1 && errno == EINTR);
            if (nndes < 0)
            {
                log_errno((e, "re-select() failed in comm_handle"));
                return;
            }
            if (nndes == 0)
            {
                log("SURPRISE: re-select() in comm_handle finds %s no longer ready for input"
                    , ifp->rname);
                return;
            }
            passert(nndes == 1 && FD_ISSET(ifp->fd, &nreadfds));
        }

        passert(select_found == ifp->fd);
        zero(&from.sa);
        packet_len = recvfrom(ifp->fd, bigbuffer, sizeof(bigbuffer), 0
            , &from.sa, &from_len);
        passert(select_found == ifp->fd);       /* true paranoia */
        select_found = NULL_FD;

- The only signal handlers (for SIGHUP and SIGTERM) simply set a
  sig_atomic_t variable and return.  They are not firing.

- The file descriptor in question is not shared with another process.
  Locking prevents two copies of Pluto from running at once.

- The scenario that provokes the problem for the user goes as follows:

  + Pluto is running on a security gateway, with a Windows NT box
    behind it

  + he connects a second Windows box, running PGPnet (an IPsec
    implementation), through the Internet to the public interface
    of the security gateway.  This box negotiates a tunnel with
    the security gateway.

  + he disconnects the second Windows box and reconnects the same way,
    but with a different IP address (the IP address is dynamically
    assigned whenever he connects this box to the Internet).

  + the second box starts and completes IKE negotiation.

  + Pluto is tricked into hanging on a recvfrom.
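
For reference, a socket like the one described above (UDP, bound to one address, with SO_REUSEADDR and IP_RECVERR) might be set up roughly as follows.  This is a minimal sketch, not Pluto's actual code; the helper name open_ike_socket is hypothetical, IP_RECVERR is Linux-specific, and binding to port 500 normally requires root.

```c
#define _GNU_SOURCE
#include <arpa/inet.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

/* Hypothetical helper: open a UDP socket bound to addr:port with
 * SO_REUSEADDR and IP_RECVERR enabled.  Returns the fd, or -1 on error. */
static int open_ike_socket(const char *addr, unsigned short port)
{
    int on = 1;
    struct sockaddr_in sin;
    int fd = socket(AF_INET, SOCK_DGRAM, IPPROTO_UDP);

    if (fd < 0)
        return -1;

    /* Allow the local address to be reused. */
    if (setsockopt(fd, SOL_SOCKET, SO_REUSEADDR, &on, sizeof(on)) < 0)
        goto fail;

    /* Linux-specific: deliver extended error reports (e.g. from ICMP)
     * through the socket's error queue. */
    if (setsockopt(fd, IPPROTO_IP, IP_RECVERR, &on, sizeof(on)) < 0)
        goto fail;

    memset(&sin, 0, sizeof(sin));
    sin.sin_family = AF_INET;
    sin.sin_port = htons(port);
    if (inet_pton(AF_INET, addr, &sin.sin_addr) != 1)
        goto fail;

    if (bind(fd, (struct sockaddr *)&sin, sizeof(sin)) < 0)
        goto fail;

    return fd;

fail:
    close(fd);
    return -1;
}
```

On a gateway it would be called with the public interface's address, e.g. open_ike_socket("192.0.2.1", 500), where 192.0.2.1 is only a placeholder.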
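
The signal handling mentioned above (SIGHUP and SIGTERM handlers that only set a sig_atomic_t flag) follows the usual pattern, sketched minimally below.  The names got_sighup, got_sigterm and install_handlers are hypothetical, and the sketch assumes the handlers are installed with sigaction() without SA_RESTART.

```c
#define _POSIX_C_SOURCE 200809L
#include <signal.h>
#include <string.h>

/* Flags set asynchronously by the handlers; the main loop polls them. */
static volatile sig_atomic_t got_sighup = 0;
static volatile sig_atomic_t got_sigterm = 0;

static void hup_handler(int sig)
{
    (void)sig;
    got_sighup = 1;     /* only set a flag: async-signal-safe */
}

static void term_handler(int sig)
{
    (void)sig;
    got_sigterm = 1;
}

/* Install both handlers without SA_RESTART, so a signal arriving while
 * the process is blocked in select() or recvfrom() makes that call fail
 * with EINTR instead of being restarted.  Returns 0, or -1 on error. */
static int install_handlers(void)
{
    struct sigaction sa;

    memset(&sa, 0, sizeof(sa));
    sigemptyset(&sa.sa_mask);
    sa.sa_flags = 0;

    sa.sa_handler = hup_handler;
    if (sigaction(SIGHUP, &sa, NULL) < 0)
        return -1;

    sa.sa_handler = term_handler;
    if (sigaction(SIGTERM, &sa, NULL) < 0)
        return -1;

    return 0;
}
```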

Is there any way to tell from the system whether the select is wrong
(i.e. there is no message) or the recvfrom is wrong (i.e. there is a
message, but it still hangs reading it)?
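
One way to probe this, sketched minimally and assuming Linux semantics: replace the blocking recvfrom with one that passes MSG_DONTWAIT, so the process can never hang there.  If it fails with EAGAIN right after select reported the socket readable, select and the receive queue disagree.  Because IP_RECVERR is set, the sketch also checks the socket's error queue with recvmsg and MSG_ERRQUEUE, since on Linux select can report a socket readable when only an error-queue entry is pending.  The enum and the function probe_socket are hypothetical names, not part of Pluto.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <stddef.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <sys/uio.h>

/* Hypothetical outcome codes for one non-blocking receive attempt. */
enum probe_result {
    PROBE_GOT_DATAGRAM, /* a datagram was read into buf */
    PROBE_GOT_ERROR,    /* only an error-queue entry (IP_RECVERR) was pending */
    PROBE_NOTHING,      /* nothing at all could be read right now */
    PROBE_FAILED        /* recvfrom failed with another errno, e.g. ECONNREFUSED */
};

/* Hypothetical replacement for the blocking recvfrom: never waits.
 * On PROBE_GOT_DATAGRAM, *out_len holds the datagram length. */
static enum probe_result probe_socket(int fd, char *buf, size_t buflen,
                                      struct sockaddr *from,
                                      socklen_t *from_len, ssize_t *out_len)
{
    char control[512];
    struct iovec iov;
    struct msghdr msg = {0};
    ssize_t n = recvfrom(fd, buf, buflen, MSG_DONTWAIT, from, from_len);

    if (n >= 0) {
        *out_len = n;
        return PROBE_GOT_DATAGRAM;
    }
    if (errno != EAGAIN && errno != EWOULDBLOCK)
        return PROBE_FAILED;

    /* Normal queue is empty: is an extended error waiting instead? */
    iov.iov_base = buf;
    iov.iov_len = buflen;
    msg.msg_iov = &iov;
    msg.msg_iovlen = 1;
    msg.msg_control = control;
    msg.msg_controllen = sizeof(control);
    if (recvmsg(fd, &msg, MSG_ERRQUEUE | MSG_DONTWAIT) >= 0)
        return PROBE_GOT_ERROR;

    return PROBE_NOTHING;
}
```

Called where the quoted code calls recvfrom, a PROBE_NOTHING result immediately after select() reported the descriptor ready is the case worth logging: the readiness report was not backed by anything a read could return.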

Thanks,

Hugh Redelmeier
hugh@xxxxxxxxxx  voice: +1 416 482-8253


