
Re: MSG_EOR flag

To: Steve@xxxxxxxxxxx
Subject: Re: MSG_EOR flag
From: Henner Eisen <eis@xxxxxxxxxxxxx>
Date: Thu, 9 Mar 2000 20:06:42 +0100
Cc: netdev@xxxxxxxxxxx
In-reply-to: <200003072257.WAA18346@xxxxxxxxxxxxxx> (message from Steve Whitehouse on Tue, 7 Mar 2000 22:57:23 +0000 (GMT))
References: <200003072257.WAA18346@xxxxxxxxxxxxxx>
Sender: owner-netdev@xxxxxxxxxxx

>>>>> "Steve" == Steve Whitehouse <steve@xxxxxxxxxxxxxx> writes:
    Steve> If you are suggesting that (I think you are, but I'm not
    Steve> 100% sure) that the protocol not transfer a single byte of

Yes, I was thinking about something like this ...

    Steve> data to userspace in a read() call until the EOR marker has
    Steve> been seen, this has problems. Firstly, upon the "buffer not
    Steve> big enough error" the userspace program has to find out
    Steve> somehow how big the buffer needs to be (probably another
    Steve> ioctl()).  Secondly, the kernel side buffers now have to be
    Steve> big enough to store a complete record from the transmitting
    Steve> application. There is nothing to say how large a record
    Steve> maybe - it could be many times larger than the physical
    Steve> memory of the receiving machine. Within a specific protocol
    Steve> there may well be limits, but in some there aren't. DECnet
    Steve> is one of the protocols that are unlimited in this way,
    Steve> which is most of the reason for the current behaviour.

... and I'm also aware of these problems.

    Steve> Overall, I prefer the option of keeping the behaviour of
    Steve> read() as simple as possible and just using the more
    Steve> comprehensive recvmsg() when more information is required.
    Steve> MSG_WAITALL means don't return until the specified number
    Steve> of bytes have been read. For SEQPACKET, that has to be
    Steve> amended so that early return occurs at message boundaries,
    Steve> otherwise the rule of no more than one record per recvmsg()
    Steve> call could be broken. However I don't think that
    Steve> MSG_WAITALL should be merged into read() for SEQPACKET
    Steve> sockets, simply because it gives no more information to
    Steve> userland than the current scheme,

I agree. The real source of the (potential) compatibility problem is that
certain protocol families implement SOCK_SEQPACKET with the datagram_* methods
in their code. They can be easily identified because they put
datagram_poll() in their proto_ops and call skb_recv_datagram() from
their recvmsg() method. The protocol families affected are
ax25, netrom, rose, and x25.

The result is that their SOCK_SEQPACKET sockets behave more like
SOCK_DGRAM sockets with additional reliability (the reliability is,
however, affected by the problems mentioned above). The other
SOCK_SEQPACKET sockets (and this seems to be in line with POSIX
requirements) behave more like SOCK_STREAM with an additional packet
boundary marker (the MSG_EOR flag).

In practice, the sockets using the datagram methods behave
like this:

 - sendmsg() always behaves as if MSG_EOR was set, although it is
   currently not set by applications. (In 2.2.x, sendmsg() will
   even return an error if MSG_EOR is set.)

 - recvmsg() always behaves as if MSG_WAITALL was set (although
   neither current applications nor sock_read() set it). 2.2.x
   as well as 2.3.x will even return an error if MSG_WAITALL is
   set. MSG_TRUNC is, however, set correctly if the receive buffer
   in recvmsg() is too small; the remaining data are discarded. But
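
To make the truncation case concrete, here is a minimal userspace sketch
of receiving one record and looking at the flags the kernel reports. The
helper recv_record() and struct record_info are hypothetical names for
illustration only; the sketch assumes nothing beyond the standard
recvmsg() call and a socket that reports MSG_TRUNC and MSG_EOR in
msg_flags.

```c
#include <assert.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <sys/uio.h>

/* Result of one receive: byte count plus the flags the kernel reported. */
struct record_info {
    ssize_t len;      /* bytes copied into the buffer, or -1 on error */
    int truncated;    /* non-zero if MSG_TRUNC was set in msg_flags */
    int end_of_rec;   /* non-zero if MSG_EOR was set in msg_flags */
};

/* Hypothetical helper: receive at most `cap` bytes of the next record. */
static struct record_info recv_record(int fd, void *buf, size_t cap)
{
    struct iovec iov;
    struct msghdr msg;
    struct record_info info;

    assert(buf != NULL && cap > 0);

    iov.iov_base = buf;
    iov.iov_len = cap;

    memset(&msg, 0, sizeof msg);
    msg.msg_iov = &iov;
    msg.msg_iovlen = 1;

    /* No MSG_WAITALL here: the families discussed above reject it. */
    info.len = recvmsg(fd, &msg, 0);
    info.truncated = info.len >= 0 && (msg.msg_flags & MSG_TRUNC) != 0;
    info.end_of_rec = info.len >= 0 && (msg.msg_flags & MSG_EOR) != 0;
    return info;
}
```

On a datagram-style SOCK_SEQPACKET socket, a call with a buffer smaller
than the record should come back with truncated set, and the next call
should start at the following record rather than at the discarded tail.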

Until this is fixed, I think that the above-mentioned protocol
families should deal with the MSG_* flags as follows in order
to match POSIX semantics as closely as possible:

(1) sendmsg() should accept the flag MSG_EOR.
(2) recvmsg() should accept the flag MSG_WAITALL.
(3) recvmsg() should set MSG_EOR before it returns.
    Question: shall it also do so if MSG_TRUNC is set?
(4) As long as the implementations do not support sending partial
    messages, sendmsg() should return an error when MSG_EOR is not set.
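
For reference, an application "fixed to set the flags correctly" on the
sending side could look roughly like the sketch below. send_record() is
a hypothetical helper name, and the sketch assumes a kernel that accepts
MSG_EOR as in (1); on a kernel that still rejects the flag, the call
would fail instead.

```c
#include <assert.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <sys/uio.h>

/* Hypothetical helper: send `len` bytes as one complete record. */
static ssize_t send_record(int fd, const void *data, size_t len)
{
    struct iovec iov;
    struct msghdr msg;

    assert(data != NULL);

    iov.iov_base = (void *)data;   /* sendmsg() does not modify the buffer */
    iov.iov_len = len;

    memset(&msg, 0, sizeof msg);
    msg.msg_iov = &iov;
    msg.msg_iovlen = 1;

    /* Mark the end of the record explicitly, as (1) and (4) expect. */
    return sendmsg(fd, &msg, MSG_EOR);
}
```

The return value is the number of bytes accepted, or -1 with errno set,
exactly as for a plain sendmsg() call.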

It is very unlikely that (1) and (2) will break any existing
application. They will allow applications that have been fixed to set
the flags correctly to continue to run with the broken kernel-level protocol
implementation. Thus, I'd suggest that these changes be made even
in 2.2.x.

(3) is a slightly more dangerous compatibility issue. I'd suggest
applying this to 2.3.x only. (Compatibility problems of this sort should be
expected when upgrading to a new major kernel version.)

(4) will cause the largest compatibility problem with current applications.
I think it is o.k. to do that in x25 (which is CONFIG_EXPERIMENTAL)
for 2.3.x. For the other protocol families, the maintainer would probably
decide not to do so. (Maybe, print out a net_rate_limit()'ed kernel debug

Do you agree?

When the protocols are fixed in the future, applications using read()
should be able to reproduce the current semantics
(read() returning fully reassembled messages) by setting the socket options
SO_RCVLOWAT and SO_RCVTIMEO to very large values. If necessary, the default
values for these options can be increased, but this can be discussed when
the fixing is actually done.
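
A rough sketch of what such an application would do is shown below. The
helper name request_whole_records() and the idea of passing the two
values as parameters are assumptions made for illustration; whether a
fixed protocol would actually honour these options in the way suggested
above is still open.

```c
#include <assert.h>
#include <sys/socket.h>
#include <sys/time.h>

/*
 * Hypothetical helper: ask for a large receive low-water mark and a long
 * receive timeout. Returns 0 on success, -1 on error with errno set.
 */
static int request_whole_records(int fd, int lowat_bytes, long timeout_sec)
{
    struct timeval tv;

    assert(lowat_bytes > 0 && timeout_sec >= 0);

    /* SO_RCVLOWAT takes an int: minimum bytes before a read returns. */
    if (setsockopt(fd, SOL_SOCKET, SO_RCVLOWAT,
                   &lowat_bytes, sizeof lowat_bytes) < 0)
        return -1;

    /* SO_RCVTIMEO takes a struct timeval: how long a read may block. */
    tv.tv_sec = timeout_sec;
    tv.tv_usec = 0;
    if (setsockopt(fd, SOL_SOCKET, SO_RCVTIMEO, &tv, sizeof tv) < 0)
        return -1;

    return 0;
}
```

An application would call this once after creating the socket and then
keep using plain read().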

