
Re: net-AIO and real-time TCP (blue sky research)

To: Werner Almesberger <werner@xxxxxxxxxxxxxxx>
Subject: Re: net-AIO and real-time TCP (blue sky research)
From: Suparna Bhattacharya <suparna@xxxxxxxxxx>
Date: Thu, 12 Aug 2004 17:37:10 +0530
Cc: netdev@xxxxxxxxxxx
In-reply-to: <20040811201829.T28020@xxxxxxxxxxxxxxx>
References: <20040801235102.K1276@xxxxxxxxxxxxxxx> <20040810155148.GA4630@xxxxxxxxxx> <20040811201829.T28020@xxxxxxxxxxxxxxx>
Reply-to: suparna@xxxxxxxxxx
Sender: netdev-bounce@xxxxxxxxxxx
User-agent: Mutt/1.4i

On Wed, Aug 11, 2004 at 08:18:29PM -0300, Werner Almesberger wrote:
> Suparna Bhattacharya wrote:
> > I was hoping all this while that someone with deeper knowledge
> > in this area than me would respond, but well, maybe they were
> > all quiet chuckles :) ?
> 
> Or they haven't stopped laughing yet ;-)
> 
> > Does your proposal require additional semantics on aio TCP socket
> > reads and writes that differ from the synchronous TCP case, besides
> > not blocking and indicating completion through aio_complete ?
> 
> Unfortunately, yes. First of all, we'd need a definition of where
> in the stream the AIO operation is applied. Two possibilities:
> 
>  1) explicit: apply the concept of a "file position" to the stream,
>     and make it visible to applications (through aio_offset)
> 
>  2) implicit: follow the existing principle that any read consumes
>     just the next chunk of data, and internally assign positions
>     based on the sequence number. As a consequence, AIOs would be
>     ordered over time (in the case of individual aio_reads) and
>     space (in the case of lio_listio).
> 
> In any case, it's a departure from existing API properties, i.e.
> 1) would introduce an application-visible "stream position" for
> TCP (which doesn't agree with TCP being able to send arbitrarily
> long streams, but then, a nice 64 bit position is probably close
> enough to near-infinity), and 2) adds ordering to AIO, which may
> be undesirable in terms of consistency, and also in terms of
> lock avoidance.
> 
> There's also the issue of whether an AIO read should complete
> after retrieving less than aio_nbytes. Three possibilities:
> 
>  1) never (probably not a great idea)
>  2) may always (like "read" does)
>  3) only on the last AIO read returning data
> 
> 2) would be the most flexible approach, but requires either
> application-settable positions (to fetch the missing part) or
> automatic re-arranging of subsequent AIO reads.
> 
> 3) avoids the problems of 2), but doesn't work well if the
> reader didn't correctly predict segment boundaries, and may
> cause trouble (like in 2) if there are pending requests after
> the one that was "short", and new data arrives.
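
To check my reading of option 2) with explicit positions: when a read
completes short, the application would reissue a request for just the
missing tail. A minimal sketch in Python, purely illustrative;
remainder_request is a hypothetical helper, not part of any existing AIO
interface, and "completed" stands for the byte count the short read
reported.

```python
def remainder_request(offset, nbytes, completed):
    """Return (offset, nbytes) for the follow-up read, or None if done.

    offset, nbytes: the position and length of the original request
                    (the roles played by aio_offset and aio_nbytes).
    completed:      bytes actually delivered by the short read.
    All names here are hypothetical and only illustrate the arithmetic.
    """
    if not 0 <= completed <= nbytes:
        raise ValueError("completed must lie between 0 and nbytes")
    if completed == nbytes:
        return None  # the read was not short, nothing left to fetch
    # The missing part starts right after the delivered bytes.
    return (offset + completed, nbytes - completed)


print(remainder_request(100, 100, 60))   # (160, 40)
print(remainder_request(100, 100, 100))  # None
```

With implicit positions there is no offset to pass back in, which is why
that variant would need the subsequent reads to be re-arranged instead.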
> 
> Last but not least, aio_forget would have to tell TCP that we're
> not only no longer interested in retrieving a certain piece of
> data, but that we'll never be.
> 
> If positions are implicit, aio_cancel would actually have this
> effect (since there would be no way to request the same range of
> data again), so we wouldn't even need aio_forget.
> 
> > The notion of which segment to aio_forget on the Rx path
> > is a little hazy to me (were you indeed referring
> > to the receive side here? I can see this more clearly for
> > the send side when coupled with zero copy).
> 
> Yes, this is mainly about receiving. Similar things could be
> done for sending, but that's largely a separate issue.
> 
> Let's say I'm issuing three AIOs:
> 
>  1: offset = 0, nbytes = 100
>  2: offset = 100, nbytes = 100
>  3: offset = 200, nbytes = 100
> 
> Now a segment arrives for 0-99, and another for 200-299.
> Normal TCP will retry (by ACKing sequence 100) until also the
> segment 100-199 has made it.
> 
> With AIO-TCP, if our application is happy with getting two
> out of the three requests, it can now aio_forget the 2nd
> request. TCP would notice that it can now ACK up to sequence
> 200, for the forgotten read, and even up to sequence 300,
> because segment 200-299 has been received. So it'll ACK
> sequence 300 now, and happily move on, without caring whether
> segment 100-199 ever gets through.
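
The three-request example can be sketched as a small calculation of the
ACK point. This is only a toy model in Python under my own assumptions:
ack_point is a hypothetical function, ranges are (first, end) pairs with
the end exclusive, and a forgotten range is simply treated as if it had
been received. It says nothing about how a real TCP stack would track
this.

```python
def ack_point(received, forgotten, start=0):
    """Return the next sequence number that could be ACKed.

    received:  list of (first, end) byte ranges that have arrived.
    forgotten: list of (first, end) byte ranges the application has
               declared it will never want (the aio_forget idea).
    Ranges are half-open: (0, 100) covers bytes 0 through 99.
    """
    ack = start
    # Walk all covered ranges in stream order and stop at the first hole.
    for first, end in sorted(received + forgotten):
        if first > ack:
            break  # gap: nothing beyond this point can be ACKed yet
        ack = max(ack, end)
    return ack


received = [(0, 100), (200, 300)]

# Normal TCP: the hole at 100-199 holds the ACK at sequence 100.
print(ack_point(received, []))            # 100

# After forgetting the 2nd request, the ACK can move up to 300.
print(ack_point(received, [(100, 200)]))  # 300
```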

OK, in the light of the change in semantics you described
earlier, introducing the notion of an offset, this makes sense. 
Thanks for clarifying.

Regards
Suparna

> 
> - Werner
> 
> -- 
>   _________________________________________________________________________
>  / Werner Almesberger, Buenos Aires, Argentina     werner@xxxxxxxxxxxxxxx /
> /_http://www.almesberger.net/____________________________________________/

-- 
Suparna Bhattacharya (suparna@xxxxxxxxxx)
Linux Technology Center
IBM Software Lab, India

