On Wed, Aug 11, 2004 at 08:18:29PM -0300, Werner Almesberger wrote:
> Suparna Bhattacharya wrote:
> > I was hoping all this while that someone with deeper knowledge
> > in this area than me would respond, but well, maybe they were
> > all quiet chuckles :) ?
> Or they haven't stopped laughing yet ;-)
> > Does your proposal require additional semantics on aio TCP socket
> > reads and writes that differ from the synchronous TCP case, besides
> > not blocking and indicating completion through aio_complete ?
> Unfortunately, yes. First of all, we'd need a definition of where
> in the stream the AIO operation is applied. Two possibilities:
> 1) explicit: apply the concept of a "file position" to the stream,
> and make it visible to applications (through aio_offset)
> 2) implicit: follow the existing principle that any read consumes
> just the next chunk of data, and internally assign positions
> based on the sequence number. As a consequence, AIOs would be
> ordered over time (in the case of individual aio_reads) and
> space (in the case of lio_listio).
> In any case, it's a departure from existing API properties, i.e.
> 1) would introduce an application-visible "stream position" for
> TCP (which doesn't agree with TCP being able to send arbitrarily
> long streams, but then, a nice 64 bit position is probably close
> enough to near-infinity), and 2) adds ordering to AIO, which may
> be undesirable in terms of consistency, and also in terms of
> lock avoidance.
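To make the two positioning models concrete, here is a minimal sketch in Python. It is a toy model, not kernel code: `ImplicitStream`, `explicit_read` and `years_to_wrap` are hypothetical names made up for illustration, and byte ranges are written as half-open (start, end) pairs. The last helper only puts a rough number on the "near-infinity" remark, assuming a sustained 10 Gbit/s link.

```python
class ImplicitStream:
    """Toy model of option 2): positions are assigned internally,
    in submission order, so reads are implicitly ordered."""

    def __init__(self):
        self.next_pos = 0  # stream offset the next submitted read starts at

    def submit_read(self, nbytes):
        # Each read consumes "just the next chunk" of the stream.
        start = self.next_pos
        self.next_pos += nbytes
        return (start, start + nbytes)  # half-open range [start, end)


def explicit_read(offset, nbytes):
    """Toy model of option 1): the application names the stream
    position itself (what aio_offset would carry)."""
    return (offset, offset + nbytes)


def years_to_wrap(bits, bytes_per_second):
    """Time until a position counter of the given width wraps."""
    seconds = 2 ** bits / bytes_per_second
    return seconds / (365.25 * 24 * 3600)
```

With the implicit model, two consecutive 100-byte reads necessarily cover 0-99 and 100-199; with the explicit model the application could ask for 200-299 first. At the assumed 10 Gbit/s (1.25e9 bytes per second), a 64 bit position takes several centuries to wrap.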
> There's also the issue of whether an AIO read should complete
> after retrieving less than aio_nbytes. Three possibilities:
> 1) never (probably not a great idea)
> 2) may always (like "read" does)
> 3) only on the last AIO read returning data
> 2) would be the most flexible approach, but requires either
> application-settable positions (to fetch the missing part) or
> automatic re-arranging of subsequent AIO reads.
> 3) avoids the problems of 2), but doesn't work well if the
> reader didn't correctly predict segment boundaries, and may
> cause the same trouble as 2) if there are pending requests
> after the one that was "short", and new data arrives.
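The two ways out of a short completion under 2) can be sketched as follows. This is only an illustration under assumed conventions: reads are (start, nbytes) pairs, and `missing_part` and `rearrange_after_short` are hypothetical helpers, not part of any existing AIO interface.

```python
def missing_part(start, nbytes, got):
    """With application-settable positions: the (offset, nbytes) the
    application would resubmit to fetch what a short read left behind."""
    return (start + got, nbytes - got)


def rearrange_after_short(reads, index, got):
    """With automatic re-arranging: the read at `index` returned only
    `got` bytes, so every later pending read slides down to start
    where the short read actually ended."""
    start, nbytes = reads[index]
    if not 0 <= got <= nbytes:
        raise ValueError("got must be between 0 and nbytes")
    result = list(reads[:index]) + [(start, got)]
    pos = start + got
    for _, n in reads[index + 1:]:
        result.append((pos, n))  # same length, new starting position
        pos += n
    return result
```

For example, if the first of three 100-byte reads completes after only 60 bytes, the application either resubmits (60, 40) itself, or the remaining two reads are moved to start at 60 and 160 instead of 100 and 200.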
> Last but not least, aio_forget would have to tell TCP that we're
> not only no longer interested in retrieving a certain piece of
> data, but that we'll never be.
> If positions are implicit, aio_cancel would actually have this
> effect (since there would be no way to request the same range of
> data again), so we wouldn't even need aio_forget.
> > The notion of which segment to aio_forget on the Rx path
> > is a little hazy to me (were you indeed referring
> > to the receive side here? I can see this more clearly for
> > the send side when coupled with zero copy).
> Yes, this is mainly about receiving. Similar things could be
> done for sending, but that's largely a separate issue.
> Let's say I'm issuing three AIOs:
> 1: offset = 0, nbytes = 100
> 2: offset = 100, nbytes = 100
> 3: offset = 200, nbytes = 100
> Now a segment arrives for 0-99, and another for 200-299.
> Normal TCP will retry (by ACKing sequence 100) until the
> segment 100-199 has also made it.
> With AIO-TCP, if our application is happy with getting two
> out of the three requests, it can now aio_forget the 2nd
> request. TCP would notice that it can now ACK up to sequence 200,
> for the forgotten read, and even up to sequence 300, because
> the segment 200-299 has been received. So it'll ACK sequence 300 now,
> and happily move on, without caring whether segment 100-199
> ever gets through.
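The ACK arithmetic in this example can be sketched as below. It is a toy model in Python rather than anything resembling the real TCP receive path: `ack_point` is a hypothetical helper, ranges are half-open (start, end) byte offsets, and aio_forget is the call proposed in this thread, not an existing interface.

```python
def ack_point(received, forgotten, start=0):
    """Return the sequence number the receiver could ACK: the lowest
    offset at or above `start` that is neither received nor forgotten."""
    covered = sorted(list(received) + list(forgotten))
    ack = start
    for lo, hi in covered:
        if lo > ack:
            break  # hole in the stream: cannot ACK past it
        ack = max(ack, hi)
    return ack


received = [(0, 100), (200, 300)]  # segments 0-99 and 200-299 arrived

# Plain TCP: the hole at 100-199 holds the ACK at sequence 100.
plain_ack = ack_point(received, forgotten=[])

# After aio_forget on the 2nd request, the hole no longer matters.
forgot_ack = ack_point(received, forgotten=[(100, 200)])
```

Here `plain_ack` is 100 and `forgot_ack` is 300, matching the numbers in the quoted example.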
OK, in the light of the change in semantics you described
earlier, introducing the notion of an offset, this makes sense.
Thanks for clarifying.
> - Werner
> / Werner Almesberger, Buenos Aires, Argentina werner@xxxxxxxxxxxxxxx /
Suparna Bhattacharya (suparna@xxxxxxxxxx)
Linux Technology Center
IBM Software Lab, India