
Re: zero copy skbuff.c enhancments

To: DJBARROW@xxxxxxxxxx
Subject: Re: zero copy skbuff.c enhancments
From: Donald Becker <becker@xxxxxxxxx>
Date: Tue, 25 Jul 2000 16:30:41 -0400 (EDT)
Cc: netdev@xxxxxxxxxxx
In-reply-to: <C1256926.004E9D24.00@xxxxxxxxxxxxxxxxxxx>
Sender: owner-netdev@xxxxxxxxxxx

On Mon, 24 Jul 2000 DJBARROW@xxxxxxxxxx wrote:

> I've written a patch for skbuff.c & skbuff.h which allows zero-copying of
> incoming network frames
> from network device drivers by using these new apis.

Everyone has a different idea of what zero copy is.
The rule seems to be "My code is zero copy", no matter how many copies it
makes.

Despite the rule, your scheme doesn't qualify as zero-copy.

> make_zero_copy_skb was added so network device drivers could take do
>  give the network layer data received without memcpying it,

This is pointless with most modern network cards, which almost always use
descriptor-based architecture.  The Linux drivers use this to receive
packets directly into preallocated, full-sized skbuffs.
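As a rough illustration of the descriptor-based receive path described above, here is a userspace sketch. It is not driver code: struct pkt_buf stands in for an skbuff, and rx_ring_init(), nic_dma_frame() and rx_poll() are hypothetical names invented for this example. The point is only that the "NIC" writes straight into a preallocated full-sized buffer, and the driver hands that same buffer up and refills the slot.

```c
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

#define RX_RING_SIZE 4
#define PKT_BUF_SZ   1536   /* assumed size of a full-sized frame buffer */

/* Hypothetical stand-in for an skbuff: owns one full-sized data area. */
struct pkt_buf {
    unsigned char data[PKT_BUF_SZ];
    size_t len;
};

/* Hypothetical Rx descriptor as the adapter would see it. */
struct rx_desc {
    struct pkt_buf *buf;    /* where the adapter writes the next frame */
    int done;               /* set once a frame has landed in buf */
};

struct rx_ring {
    struct rx_desc desc[RX_RING_SIZE];
    unsigned int cur;       /* next descriptor the driver will examine */
};

/* Preallocate one full-sized buffer per descriptor. Returns 0 on success. */
int rx_ring_init(struct rx_ring *r)
{
    for (unsigned int i = 0; i < RX_RING_SIZE; i++) {
        r->desc[i].buf = calloc(1, sizeof(struct pkt_buf));
        if (!r->desc[i].buf)
            return -1;
        r->desc[i].done = 0;
    }
    r->cur = 0;
    return 0;
}

/* Simulated adapter DMA: the frame goes directly into the descriptor's buffer. */
void nic_dma_frame(struct rx_ring *r, unsigned int idx,
                   const void *frame, size_t len)
{
    struct rx_desc *d = &r->desc[idx % RX_RING_SIZE];

    if (len > PKT_BUF_SZ)
        len = PKT_BUF_SZ;
    memcpy(d->buf->data, frame, len);   /* models the adapter's write */
    d->buf->len = len;
    d->done = 1;
}

/*
 * Driver Rx path: pass the filled buffer up as-is (no driver copy) and
 * put a fresh buffer in the slot. Returns NULL if no frame is ready or
 * no replacement buffer could be allocated.
 */
struct pkt_buf *rx_poll(struct rx_ring *r)
{
    struct rx_desc *d = &r->desc[r->cur];
    struct pkt_buf *fresh, *filled;

    if (!d->done)
        return NULL;
    fresh = calloc(1, sizeof(*fresh));
    if (!fresh)
        return NULL;        /* leave the frame pending in the ring */
    filled = d->buf;
    d->buf = fresh;
    d->done = 0;
    r->cur = (r->cur + 1) % RX_RING_SIZE;
    return filled;
}
```

The caller of rx_poll() owns the returned buffer and frees it when done, much as the network layer takes ownership of an skbuff handed up by a driver.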

Your scheme has the driver receiving into a generic buffer, and then stuffing
the pointer into the skbuff.  This approach just adds overhead.

The only place this scheme would be useful at all is with the RTL8139.  The
RTL8139 receives directly into a linear receive ring, with packets written
one after another.  But
   - no one buys the RTL8139 for performance, just for the low cost
   - leaving packets on the small 32KB or 64KB receive ring will result in
     quickly running out of receive space.
   - we already copy-and-checksum out of the receive ring to amortize the
     copy cost.
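To make the copy-and-checksum point concrete, here is a minimal userspace sketch, not the kernel's actual routine. The function name ring_copy_and_csum() and its parameters are hypothetical. It copies a packet out of a circular receive ring, handling wrap-around, and accumulates the 16-bit ones' complement sum used by IP, TCP and UDP in the same pass, so the data is touched only once.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Copy len bytes out of a circular ring, starting at offset start, into
 * dst, and return the folded 16-bit ones' complement sum of the copied
 * bytes (the sum itself, not its complement). Bytes are paired as
 * big-endian 16-bit words; an odd trailing byte is treated as the high
 * byte of a final word padded with zero.
 * Assumes len <= 65535 so the 32-bit accumulator cannot overflow.
 */
uint16_t ring_copy_and_csum(const uint8_t *ring, size_t ring_size,
                            size_t start, uint8_t *dst, size_t len)
{
    uint32_t sum = 0;

    for (size_t i = 0; i < len; i++) {
        uint8_t b = ring[(start + i) % ring_size];  /* wrap at ring end */

        dst[i] = b;                                 /* the copy */
        sum += (i & 1) ? b : (uint32_t)b << 8;      /* the checksum */
    }
    while (sum >> 16)                               /* fold carries back in */
        sum = (sum & 0xffff) + (sum >> 16);
    return (uint16_t)sum;
}
```

A real implementation would work a word at a time and split the copy at the wrap point rather than taking a modulo per byte; the byte loop here is only for clarity.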

> under these circumstances anyway. The only case I can think of where they
> may not work
> is drivers depending on the 16 byte skb_reserve kludge in dev_alloc_skb.

I'm uncertain of which skb_reserve() you are referring to, but skb_reserve()
is often used for performance reasons.  For instance, the drivers use
skb_reserve(skb, 2) to longword- and cache-align the IP header. In some cases
the skbuff data section is offset to allow the chip to linearly write the Rx
status before the packet data in one PCI transaction.  With careful driver
construction each receive packet can be transferred in a single PCI burst
instead of four or more transactions that would otherwise be needed.
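The alignment effect of skb_reserve(skb, 2) is simple arithmetic, sketched below in userspace C. The helper names ip_header_offset() and is_aligned() are made up for this example, and it assumes the buffer itself starts on an aligned boundary. An Ethernet header is 14 bytes, so without the reserve the IP header starts at offset 14; with 2 bytes reserved it starts at offset 16.

```c
#include <stddef.h>

#define ETH_HLEN 14   /* length of an Ethernet header in bytes */

/* Offset of the IP header from the buffer start after reserving some bytes. */
size_t ip_header_offset(size_t reserve)
{
    return reserve + ETH_HLEN;
}

/* Nonzero if offset falls on a multiple of alignment. */
int is_aligned(size_t offset, size_t alignment)
{
    return offset % alignment == 0;
}
```

With a reserve of 0 the IP header sits at offset 14, which is not on a 4-byte boundary. With a reserve of 2 it sits at offset 16, which is aligned to both 4 and 16 bytes.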

Doing true zero-copy receive would require the adapter to
  - verify the IP and TCP payload checksums
  - interpret the IP header and options
  - look up the proper socket
  - verify that this data segment is next in order
  - look up the destination region.
    If the application hasn't done a socket read, you lose.
  - look up the physical page in the page table, doing proper locking.
    If the application has done a read, but the page doesn't exist, you lose.
  - lock the socket.
  - wire down the destination pages
  - copy the data to the user application pages
  - unwire the pages, update the socket info, unlock the socket

I've missed a few steps here, but this is the general idea.  Some of the
steps could be simplified with the adapter handling the protocol stack, but
handling initialization and exceptions must still be done by the main
processor.  So many of these operations are on likely-cached data structures
that having an adapter do zero-copy on receive is just a loss.

Donald Becker                           becker@xxxxxxxxx
Scyld Computing Corporation   
410 Severn Ave. Suite 210               Beowulf Clusters / Linux Installations
Annapolis MD 21403
