
Re: net-AIO and real-time TCP (blue sky research)

To: Werner Almesberger <werner@xxxxxxxxxxxxxxx>
Subject: Re: net-AIO and real-time TCP (blue sky research)
From: Suparna Bhattacharya <suparna@xxxxxxxxxx>
Date: Thu, 12 Aug 2004 17:37:10 +0530
Cc: netdev@xxxxxxxxxxx
In-reply-to: <20040811201829.T28020@xxxxxxxxxxxxxxx>
References: <20040801235102.K1276@xxxxxxxxxxxxxxx> <20040810155148.GA4630@xxxxxxxxxx> <20040811201829.T28020@xxxxxxxxxxxxxxx>
Reply-to: suparna@xxxxxxxxxx
Sender: netdev-bounce@xxxxxxxxxxx
User-agent: Mutt/1.4i

On Wed, Aug 11, 2004 at 08:18:29PM -0300, Werner Almesberger wrote:
> Suparna Bhattacharya wrote:
> > I was hoping all this while that someone with deeper knowledge
> > in this area than me would respond, but well, maybe they were
> > all quiet chuckles :) ?
> Or they haven't stopped laughing yet ;-)
> > Does your proposal require additional semantics on aio TCP socket
> > reads and writes that differ from the synchronous TCP case, besides
> > not blocking and indicating completion through aio_complete ?
> Unfortunately, yes. First of all, we'd need a definition of where
> in the stream the AIO operation is applied. Two possibilities:
>  1) explicit: apply the concept of a "file position" to the stream,
>     and make it visible to applications (through aio_offset)
>  2) implicit: follow the existing principle that any read consumes
>     just the next chunk of data, and internally assign positions
>     based on the sequence number. As a consequence, AIOs would be
>     ordered over time (in the case of individual aio_reads) and
>     space (in the case of lio_listio).
> In any case, it's a departure from existing API properties, i.e.
> 1) would introduce an application-visible "stream position" for
> TCP (which doesn't agree with TCP being able to send arbitrarily
> long streams, but then, a nice 64 bit position is probably close
> enough to near-infinity), and 2) adds ordering to AIO, which may
> be undesirable in terms of consistency, and also in terms of
> lock avoidance.
> There's also the issue of whether an AIO read should complete
> after retrieving less than aio_nbytes. Three possibilities:
>  1) never (probably not a great idea)
>  2) may always (like "read" does)
>  3) only on the last AIO read returning data
> 2) would be the most flexible approach, but requires either
> application-settable positions (to fetch the missing part) or
> automatic re-arranging of subsequent AIO reads.
> 3) avoids the problems of 2), but doesn't work well if the
> reader didn't correctly predict segment boundaries, and may
> cause trouble (like in 2) if there are pending requests after
> the one that was "short", and new data arrives.
> Last but not least, aio_forget would have to tell TCP that we're
> not only no longer interested in retrieving a certain piece of
> data, but that we'll never be.
> If positions are implicit, aio_cancel would actually have this
> effect (since there would be no way to request the same range of
> data again), so we wouldn't even need aio_forget.
> > The notion of which segment to aio_forget on the Rx path 
> > is a little hazy to me (were you indeed referring
> > to the receive side here ? I can see this more clearly for
> > the send side when coupled with zero copy).
> Yes, this is mainly about receiving. Similar things could be
> done for sending, but that's largely a separate issue.
> Let's say I'm issuing three AIOs:
>  1: offset = 0, nbytes = 100
>  2: offset = 100, nbytes = 100
>  3: offset = 200, nbytes = 100
> Now a segment arrives for 0-99, and another for 200-299.
> Normal TCP will retry (by ACKing sequence 100) until also the
> segment 100-199 has made it.
> With AIO-TCP, if our application is happy with getting two
> out of the three requests, it can now aio_forget the 2nd
> request. TCP would notice that it can now ACK up to sequence 200,
> for the forgotten read, and even up to sequence 300, because
> the segment 200-299 has been received. So it'll ACK sequence 300 now,
> and happily move on, without caring whether segment 100-199
> ever gets through.

OK, in the light of the change in semantics you described
earlier, introducing the notion of an offset, this makes sense. 
Thanks for clarifying.
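
To make the ACK arithmetic in the three-request example concrete, here is
a minimal sketch in Python. It only models the bookkeeping and is not
kernel code. The function name ack_point and the use of half-open
(begin, end) byte ranges are assumptions made for illustration; nothing
like this exists in the current stack, and aio_forget itself is only a
proposed call.

```python
def ack_point(received, forgotten, start=0):
    """Return the sequence number a receiver could ACK.

    received:  half-open (begin, end) byte ranges that have arrived.
    forgotten: half-open (begin, end) ranges the application gave up on
               (the effect of the proposed aio_forget; hypothetical).
    start:     first byte of the stream not yet ACKed.
    """
    # Forgotten ranges count as covered, exactly like received data.
    covered = sorted(list(received) + list(forgotten))
    ack = start
    for begin, end in covered:
        if begin > ack:
            # Hole: bytes ack..begin-1 are neither received nor forgotten.
            break
        ack = max(ack, end)
    return ack


# Segments 0-99 and 200-299 have arrived; 100-199 is missing.
received = [(0, 100), (200, 300)]

# Normal TCP: the hole at 100 holds the ACK back.
print(ack_point(received, []))            # 100

# After forgetting request 2 (bytes 100-199), the ACK can jump to 300.
print(ack_point(received, [(100, 200)]))  # 300
```

With explicit offsets, the forgotten range is simply the
(offset, offset + nbytes) of the request being dropped, which is why the
offset semantics described earlier make the receive-side case well defined.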


> - Werner
> -- 
>   _________________________________________________________________________
>  / Werner Almesberger, Buenos Aires, Argentina     werner@xxxxxxxxxxxxxxx /
> /_

Suparna Bhattacharya (suparna@xxxxxxxxxx)
Linux Technology Center
IBM Software Lab, India
