>>>>> "Steve" == Steve Whitehouse <steve@xxxxxxxxxxxxxx> writes:
Steve> If you are suggesting that (I think you are, but I'm not
Steve> 100% sure) that the protocol not transfer a single byte of
Yes, I was thinking about something like this ...
Steve> data to userspace in a read() call until the EOR marker has
Steve> been seen, this has problems. Firstly, upon the "buffer not
Steve> big enough error" the userspace program has to find out
Steve> somehow how big the buffer needs to be (probably another
Steve> ioctl()). Secondly, the kernel side buffers now have to be
Steve> big enough to store a complete record from the transmitting
Steve> application. There is nothing to say how large a record
Steve> may be - it could be many times larger than the physical
Steve> memory of the receiving machine. Within a specific protocol
Steve> there may well be limits, but in some there aren't. DECnet
Steve> is one of the protocols that are unlimited in this way,
Steve> which is most of the reason for the current behaviour.
... and I'm also aware of these problems.
Steve> Overall, I prefer the option of keeping the behaviour of
Steve> read() as simple as possible and just using the more
Steve> comprehensive recvmsg() when more information is required.
Steve> MSG_WAITALL means don't return until the specified number
Steve> of bytes have been read. For SEQPACKET, that has to be
Steve> amended so that early return occurs at message boundaries,
Steve> otherwise the rule of no more than one record per recvmsg()
Steve> call could be broken. However I don't think that
Steve> MSG_WAITALL should be merged into read() for SEQPACKET
Steve> sockets, simply because it gives no more information to
Steve> userland than the current scheme,
I agree. The real source of the (potential) compatibility problem is that
certain protocol families implement SOCK_SEQPACKET with the datagram_* methods
in their code. They can be easily identified because they put
datagram_poll() in their proto_ops and call skb_recv_datagram() from
their recvmsg() method. The protocol families affected are
ax25, netrom, rose, and x25.
The result is that their SOCK_SEQPACKET sockets behave more like
SOCK_DGRAM sockets with additional reliability (the reliability is,
however, affected by the problems mentioned above). The other
SOCK_SEQPACKET sockets (and this seems to be in line with POSIX
requirements) behave more like SOCK_STREAM with an additional packet
boundary marker (the MSG_EOR flag).
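
To make the difference visible from userspace, here is a minimal sketch
(not taken from any of the protocol families above) of a recvmsg() wrapper
that hands msg_flags back to the caller, so it can look at MSG_EOR and
MSG_TRUNC. The name recv_chunk() is made up for illustration, and whether
MSG_EOR is ever reported depends on the protocol family.

```c
#include <stddef.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <sys/uio.h>

/*
 * Hypothetical helper: receive at most len bytes into buf and store the
 * flags reported by the kernel (MSG_EOR, MSG_TRUNC, ...) in *out_flags.
 * Returns the byte count from recvmsg(), or -1 with errno set.
 */
static ssize_t recv_chunk(int fd, void *buf, size_t len, int *out_flags)
{
    struct iovec iov;
    struct msghdr msg;
    ssize_t n;

    iov.iov_base = buf;
    iov.iov_len = len;

    memset(&msg, 0, sizeof msg);
    msg.msg_iov = &iov;
    msg.msg_iovlen = 1;

    n = recvmsg(fd, &msg, 0);
    if (n >= 0)
        *out_flags = msg.msg_flags;
    return n;
}
```

On a stream-like SOCK_SEQPACKET socket, a caller would keep calling this
until MSG_EOR shows up in *out_flags. On a datagram-style one, a buffer
that is too small shows up as MSG_TRUNC, and the next call returns the
next message rather than the rest of the truncated one.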
The result is that the sockets using the datagram methods behave as follows:
- sendmsg() always behaves as if MSG_EOR was set (although it is
currently not set by applications; in 2.2.x, sendmsg() will
even return an error if MSG_EOR is set).
- recvmsg() always behaves as if MSG_WAITALL was set (although
neither current applications nor sock_read() set it). 2.2.x
as well as 2.3.x will even return an error if MSG_WAITALL is
set. MSG_TRUNC is, however, set correctly if the receive buffer
in recvmsg() is too small, and the remaining data are discarded. But
Until this is fixed, I think that the above-mentioned protocol
families should deal with the MSG_* flags as follows in order
to match POSIX semantics as closely as possible:
(1) sendmsg() should accept the flag MSG_EOR.
(2) recvmsg() should accept the flag MSG_WAITALL.
(3) recvmsg() should set MSG_EOR before it returns.
Question: shall it also do so if MSG_TRUNC is set?
(4) As long as the implementations do not support sending partial
messages, sendmsg() should return an error when MSG_EOR is not set.
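
The rules above can be written down as a small userspace model. This is
only a sketch of the proposed policy, not kernel code: the function names,
the `strict` switch, the `eor_on_trunc` switch (which leaves the question
under (3) open), and the choice of -EINVAL as the error are all my
assumptions. Rule (2) needs no code here, since accepting MSG_WAITALL
just means not rejecting it.

```c
#include <errno.h>
#include <sys/socket.h>

/*
 * Hypothetical model of rules (1) and (4) for sendmsg() flags.
 * strict != 0: a missing MSG_EOR is an error (rule (4)).
 * strict == 0: a missing MSG_EOR is tolerated, *warn is set so the
 *              caller could emit a rate-limited debug message.
 * The -EINVAL return value is an assumption, not taken from any kernel.
 */
static int model_check_send_flags(int flags, int strict, int *warn)
{
    *warn = 0;
    if (flags & MSG_EOR)
        return 0;          /* rule (1): MSG_EOR is accepted */
    if (strict)
        return -EINVAL;    /* rule (4): partial messages unsupported */
    *warn = 1;
    return 0;
}

/*
 * Hypothetical model of rule (3): flags recvmsg() reports on return.
 * eor_on_trunc selects the answer to the open question whether MSG_EOR
 * should also be reported together with MSG_TRUNC.
 */
static int model_recv_out_flags(int truncated, int eor_on_trunc)
{
    int out = 0;

    if (truncated)
        out |= MSG_TRUNC;
    if (!truncated || eor_on_trunc)
        out |= MSG_EOR;
    return out;
}
```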
It is very unlikely that (1) and (2) will break any existing
application. They will allow applications that have been fixed to set
the flags correctly to keep running on the broken kernel-level protocol
implementation. Thus, I'd suggest that these changes are done even
(3) is a slightly more dangerous compatibility issue. I'd suggest
applying this to 2.3.x only. (Compatibility problems of this sort should be
expected when upgrading to a new major kernel version.)
(4) will cause the largest compatibility problem with current applications.
I think it is o.k. to do that in x25 (which is CONFIG_EXPERIMENTAL)
for 2.3.x. For the other protocol families, the maintainer would probably
decide not to do so. (Maybe, print out a net_ratelimit()'ed kernel debug
Do you agree?
When the protocols are fixed in the future, applications using read()
should be able to reproduce the current semantics
(read() returning fully reassembled messages) by setting the socket options
SO_RCVLOWAT and SO_RCVTIMEO to very large values. If necessary, the default
values for these options can be increased, but this can be discussed when
the fixing is actually done.
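
For reference, setting the two options from an application could look
roughly like this. It is a sketch only: request_whole_messages() is a
made-up name, the values are picked by the caller, and how a fixed
SOCK_SEQPACKET implementation would honour them is exactly what still
has to be discussed.

```c
#include <sys/socket.h>
#include <sys/time.h>

/*
 * Hypothetical helper: ask the socket layer to hold back read() until
 * lowat_bytes are queued or timeout_sec seconds have passed.
 * Returns 0 on success, -1 with errno set by setsockopt() on failure.
 */
static int request_whole_messages(int fd, int lowat_bytes, long timeout_sec)
{
    struct timeval tv;

    tv.tv_sec = timeout_sec;
    tv.tv_usec = 0;

    if (setsockopt(fd, SOL_SOCKET, SO_RCVLOWAT,
                   &lowat_bytes, sizeof lowat_bytes) < 0)
        return -1;
    if (setsockopt(fd, SOL_SOCKET, SO_RCVTIMEO, &tv, sizeof tv) < 0)
        return -1;
    return 0;
}
```

An application would call this once after creating the socket, for
example request_whole_messages(fd, 65536, 3600), and then keep using
plain read().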