> Let's say, all N commands transmitted in a burst, and just
> one of these N gets ack-ed by the Target (via StatSN).
Let's say all can_queue commands are transmitted in a burst, and just
one of these can_queue commands gets ack-ed by the Target (via StatSN).
> -----Original Message-----
> From: Alex Aizman [mailto:itn780@xxxxxxxxx]
> Sent: Sunday, March 27, 2005 1:15 PM
> To: open-iscsi@xxxxxxxxxxxxxxxx
> Cc: mpm@xxxxxxxxxxx; andrea@xxxxxxx; michaelc@xxxxxxxxxxx;
> James.Bottomley@xxxxxxxxxxxxxxxxxxxxx; netdev@xxxxxxxxxxx;
> 'David S. Miller'; ksummit-2005-discuss@xxxxxxxxx
> Subject: RE: [Ksummit-2005-discuss] Summary of 2005 Kernel
> Summit Proposed Topics
>
>
> David S. Miller writes:
> >
> > On Sat, 26 Mar 2005 22:33:01 -0800
> > Dmitry Yusupov <dmitry_yus@xxxxxxxxx> wrote:
> >
> > > i.e. the TCP stack should call the NIC driver's callback after all
> > > SKB data has been successfully copied to user space. At that point
> > > the NIC driver will safely replenish the HW ring. This way we could
> > > avoid most memory allocations on receive.
> >
> > How does this solve your problem? This is just simple SKB recycling,
> > and it's a pretty old idea.
> >
> > TCP packets can be held on receive for arbitrary amounts of time.
> >
> > This is especially true if data is received out of order or when
> > packets are dropped. We can't even wake up the user until the holes
> > in the sequence space are filled.
> >
> > Even if data is received properly and in order, there are no hard
> > guarantees about when the user will get back onto the CPU to get the
> > data copied to it.
> >
> > During these gaps in time, you will need to keep your HW receive ring
> > populated with packets.
>
>
> Here's the way I see it.
>
> 1) There are iSCSI connections that should be "protected",
> resources-wise.
> Examples: remote swap device, bank accounts database on RAID
> accessed via iSCSI, etc.
>
> 2) There are two ways to protect the "protected" connections.
> One "Big Brother"-like way is a centralized Resource Manager
> that performs fully deterministic resource accounting
> throughout the system, all the way from NIC descriptors and
> on-chip memory up to iSCSI buffers for Data-Out headers.
>
>
> 3) The 2nd way is *awareness* of the "protected" connections
> propagated throughout the system, along with incremental
> implementation of more sophisticated recovery schemes.
>
> 4) The Resource Manager could be used in the following way.
> At session open time, the iSCSI control plane calculates the iSCSI
> and TCP resources that should be available at all times. The
> calculation is based on: the number of SCSI commands to be
> processed in parallel (the 'can_queue'), the maximum size of
> the SCSI payload in the SG, the negotiated maximum number of
> outstanding R2Ts, and the sizes of Immediate and FirstBurst data.
>
> 5) If the Resource Manager says there are not enough resources,
> iSCSI fails the session open. This is better than getting into
> trouble well into runtime.
>
> 6) For example: to transmit 'can_queue' commands, iSCSI needs
> N skbufs.
> Let's say, all N commands transmitted in a burst, and just
> one of these N gets ack-ed by the Target (via StatSN). In the
> fully deterministic system this does not necessarily mean
> that the scsi-ml can now send one command, because the full
> condition also involves recycling the skbuf(s) used for
> transmitting this one completed command. And although it is
> hard to imagine that the command gets fully done by the
> remote target without Tx buffers getting recycled, the
> theoretical chance exists (e.g., the NIC is slow or the
> driver has a bad Tx recycling implementation), and the fully
> deterministic scheme should take it into account.
>
> 7) Therefore, prior to calling scsi_done(), iSCSI asks the
> Resource Manager whether all the TCP (etc.) resources used for
> this command have already been recycled. If not, scsi_done()
> gets postponed. In addition, iSCSI "complains" to the Resource
> Manager that it is entering the slow path because of this, which
> could prompt the latter to take action. (End of the example.)
>
> 8) If we agree to declare some connections
> "resource-protected", it would immediately mean that there may
> be other connections that are not resource-protected. This in
> turn gives the Resource Manager the flexibility to OOM-kill
> those unprotected connections and cannibalize the
> corresponding resources for the protected ones.
>
> 9) Without some awareness of the resource-protected
> connections, and without some kind of resource counting at
> runtime (let it be partial and incomplete for starters), the
> only remaining way for customers that require HA (High
> Availability) is to over-engineer: use 64GB RAM, TBs of disk
> space, etc.
> That is probably not the end of the world, as long as
> prices go down.
>
> Alex
>
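A rough C sketch of the session-open accounting from points 4 and 5, for illustration only. All struct and function names are hypothetical, and the per-command cost model is an assumption, not part of the proposal: one buffer for the SCSI Command PDU (carrying immediate data), one per unsolicited Data-Out PDU within FirstBurstLength, and one per outstanding R2T answer in flight.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical per-session parameters (names are illustrative). */
struct iscsi_sess_params {
    uint32_t can_queue;           /* SCSI commands processed in parallel */
    uint32_t max_sg_payload;      /* max SCSI payload per command, bytes */
    uint32_t max_outstanding_r2t; /* negotiated MaxOutstandingR2T */
    uint32_t immediate_len;       /* immediate data in the command PDU */
    uint32_t first_burst_len;     /* FirstBurstLength (includes immediate) */
    uint32_t max_pdu_data;        /* max data bytes per Data-Out PDU, > 0 */
};

struct resource_need {
    uint64_t tx_bufs;  /* transmit buffers (skbufs) to keep reserved */
    uint64_t tx_bytes; /* unsolicited payload bytes that may be queued */
};

/* Hypothetical Resource Manager pool. */
struct resource_pool {
    uint64_t free_tx_bufs;
    uint64_t free_tx_bytes;
};

static uint32_t min_u32(uint32_t a, uint32_t b) { return a < b ? a : b; }

/* Crude worst-case model (an assumption). Requires max_pdu_data > 0. */
static struct resource_need calc_need(const struct iscsi_sess_params *p)
{
    uint32_t unsol = min_u32(p->first_burst_len, p->max_sg_payload);
    uint32_t imm = min_u32(p->immediate_len, unsol);
    uint64_t rest = unsol - imm; /* unsolicited bytes sent as Data-Out */
    uint64_t unsol_pdus = (rest + p->max_pdu_data - 1) / p->max_pdu_data;
    struct resource_need n;

    n.tx_bufs = (uint64_t)p->can_queue *
                (1 + unsol_pdus + p->max_outstanding_r2t);
    n.tx_bytes = (uint64_t)p->can_queue * unsol;
    return n;
}

/* Reserve everything up front; on shortfall, leave the pool untouched
 * and fail the session open instead of failing later at runtime. */
static bool session_open_reserve(struct resource_pool *pool,
                                 const struct iscsi_sess_params *p,
                                 struct resource_need *out)
{
    if (p->max_pdu_data == 0)
        return false; /* invalid parameters */
    struct resource_need n = calc_need(p);
    if (n.tx_bufs > pool->free_tx_bufs || n.tx_bytes > pool->free_tx_bytes)
        return false;
    pool->free_tx_bufs -= n.tx_bufs;
    pool->free_tx_bytes -= n.tx_bytes;
    *out = n;
    return true;
}
```

Under this model, can_queue = 32, FirstBurstLength = 64 KB, 8 KB of immediate data, 8 KB Data-Out PDUs and MaxOutstandingR2T = 1 require 32 * (1 + 7 + 1) = 288 transmit buffers, so a pool holding only 256 fails the session open.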
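The completion gating in points 6 and 7 might look roughly like the following C sketch. The command struct, the done callback standing in for scsi_done(), and the counter standing in for "complaining" to the Resource Manager are all hypothetical; the real scsi-ml and iSCSI structures differ.

```c
#include <stdbool.h>

/* Hypothetical per-command state, not the real scsi-ml/iSCSI structs. */
struct iscsi_cmd {
    unsigned int tx_bufs_in_flight;   /* skbufs not yet recycled */
    bool status_seen;                 /* target status (StatSN) received */
    bool completed;                   /* completion already reported */
    void (*done)(struct iscsi_cmd *); /* stand-in for scsi_done() */
};

/* Hypothetical stand-in for "complaining" to the Resource Manager. */
static unsigned long slow_path_events;

/* Complete only when status is in AND all Tx buffers are recycled. */
static void try_complete(struct iscsi_cmd *c)
{
    if (c->completed || !c->status_seen)
        return;
    if (c->tx_bufs_in_flight > 0)
        return; /* postpone: Tx resources not yet recycled */
    c->completed = true;
    c->done(c);
}

/* Called when the target's status for this command arrives. */
static void cmd_status_received(struct iscsi_cmd *c)
{
    c->status_seen = true;
    if (c->tx_bufs_in_flight > 0)
        slow_path_events++; /* entering the slow path */
    try_complete(c);
}

/* Called from a (hypothetical) Tx-recycle hook, once per freed skbuf. */
static void cmd_tx_buf_recycled(struct iscsi_cmd *c)
{
    if (c->tx_bufs_in_flight > 0)
        c->tx_bufs_in_flight--;
    try_complete(c);
}
```

Whichever event arrives last (the status or the final Tx recycle) triggers completion, so the common case, where the Tx buffers are already recycled when the status arrives, costs only one extra counter check.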
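Point 8 could be sketched in C as a reclaim routine that tears down unprotected connections until a protected connection's shortfall is covered. The largest-holder-first victim policy and all names here are assumptions made for illustration.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical connection record. */
struct conn {
    bool is_protected;     /* declared resource-protected */
    bool alive;
    uint64_t tx_bufs_held; /* buffers this connection currently holds */
};

/* Free at least `need` buffers for a protected connection by "OOM-killing"
 * unprotected ones, largest holder first. Protected connections are never
 * touched. Returns true if enough buffers became free. */
static bool reclaim_for_protected(struct conn *conns, size_t n,
                                  uint64_t *free_bufs, uint64_t need)
{
    while (*free_bufs < need) {
        struct conn *victim = NULL;
        for (size_t i = 0; i < n; i++) {
            struct conn *c = &conns[i];
            if (!c->alive || c->is_protected || c->tx_bufs_held == 0)
                continue;
            if (!victim || c->tx_bufs_held > victim->tx_bufs_held)
                victim = c;
        }
        if (!victim)
            return false; /* nothing left to cannibalize */
        *free_bufs += victim->tx_bufs_held;
        victim->tx_bufs_held = 0;
        victim->alive = false;
    }
    return true;
}
```

Killing the largest holder first minimizes the number of connections torn down; a real policy might also weigh connection age or priority.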