
Re: RFC: NAPI packet weighting patch

To: Andi Kleen <ak@xxxxxxx>
Subject: Re: RFC: NAPI packet weighting patch
From: Donald Becker <becker@xxxxxxxxx>
Date: Tue, 21 Jun 2005 17:08:23 -0700 (PDT)
Cc: Rick Jones <rick.jones2@xxxxxx>, <netdev@xxxxxxxxxxx>, <davem@xxxxxxxxxx>
In-reply-to: <20050621223436.GG14251@xxxxxxxxxxxxx>
Sender: netdev-bounce@xxxxxxxxxxx
On Wed, 22 Jun 2005, Andi Kleen wrote:

> > While much has changed since then, the same basic parameters remain
> >    - cache line size
> 
> In 96 we had 32 byte cache lines. These days 64-128 are common,
> with some 256 byte cache line systems around.

Good point.
I believe that the most common line size is 64 bytes for L1 cache.

Most L2 caches that have larger line sizes still fill only 64 byte 
blocks unless prefetching is triggered.  (Feel free to correct me with 
non-obscure CPUs and relevant cases.  For instance, I know that on the 
Itanium the 128 byte line L2 cache is used as L1, but only for FPU 
fetches.  That doesn't count.)
 
The implication here is that as soon as we look at the first byte of the 
MAC address, we have read in 64 bytes.  That's a whole minimum-size 
Ethernet frame.
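(For scale, assuming the common headers: 14-byte Ethernet + 20-byte IPv4 +
20-byte TCP = 54 bytes, so the whole MAC+IP+protocol header sits inside
that single 64-byte line.)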

> >    - frame header size (MAC+IP+ProtocolHeader)
> 
> In 96 people tended to not use time stamps.

Ehh, not a big difference.
 
> >    - hot cache lines from copying or type classification
> Not sure what you mean with that.

See the comment above. We decide if a packet is multicast vs. unicast, IP 
vs. other at approximately interrupt/"rx_copybreak" time.  Very few NICs 
provide this info in status bits, so we end up looking at the packet 
header.  That read moves data we know to be uncached (after all, it 
just came in from a bus write) into the L1 cache for the CPU handling 
the device.  Once it's there, the copy is almost free.
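
For concreteness, here is a minimal sketch of such a copybreak-style Rx
path in the 2.6-era driver idiom.  The function name, the rx_copybreak
default, and the structure are illustrative only; DMA syncing and error
handling are omitted:

#include <linux/netdevice.h>
#include <linux/etherdevice.h>
#include <linux/skbuff.h>

static int rx_copybreak = 200;          /* typical default in old drivers */

static void rx_one_frame(struct net_device *dev, struct sk_buff *rx_skb,
                         unsigned int pkt_len)
{
        struct sk_buff *skb;

        if (pkt_len < rx_copybreak &&
            (skb = dev_alloc_skb(pkt_len + 2)) != NULL) {
                skb_reserve(skb, 2);    /* align the IP header */
                /* First touch of the frame data: this copy pulls the
                   header cache lines into L1 for this CPU. */
                memcpy(skb_put(skb, pkt_len), rx_skb->data, pkt_len);
                /* rx_skb stays in the Rx ring; no new full-size buffer. */
        } else {
                skb = rx_skb;
                skb_put(skb, pkt_len);
                /* The ring slot now needs a replacement skbuff. */
        }

        /* eth_type_trans() reads the MAC header to classify the packet;
           by now those bytes are already warm in the cache. */
        skb->protocol = eth_type_trans(skb, dev);
        netif_receive_skb(skb);
}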

[[ Background: Yes, allocating the new skbuff is very expensive.  But 
we can either allocate a new, correctly-sized skbuff to copy into, or 
allocate a new full-sized skbuff to replace the one we will send to the Rx 
queue.  ]] 
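
The replace-the-ring-buffer branch looks roughly like the sketch below;
the rx_slot structure and the pci_map_single() re-arming are
driver-specific assumptions here, not any particular driver's code:

struct rx_slot {
        struct sk_buff *skb;
        dma_addr_t dma;
};

/* Refill the ring slot whose skbuff we just handed up the stack.
   The allocation is the expensive part either way. */
static int refill_rx_slot(struct pci_dev *pdev, struct rx_slot *slot,
                          unsigned int bufsz)
{
        struct sk_buff *skb = dev_alloc_skb(bufsz);

        if (!skb)
                return -ENOMEM; /* leave the slot unfilled; retry later */
        slot->skb = skb;
        slot->dma = pci_map_single(pdev, skb->data, bufsz,
                                   PCI_DMA_FROMDEVICE);
        /* ... then rewrite the Rx descriptor to point at the new buffer. */
        return 0;
}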
 
> >    - cold memory lines from PCI writes
> 
> I suspect in '96 chipsets also didn't do as aggressive prefetching
> as they do today.

Prefetching helps linear read bandwidth, but we shouldn't be triggering 
it here.  And I claim that cache line prefetching only restores the 
relative balance between the L1 and L2 caches; without it, the long L2 
cache lines would make linear scans through memory very expensive, with a 
bump-read, bump-read pattern.

-- 
Donald Becker                           becker@xxxxxxxxx
Scyld Software                          Scyld Beowulf cluster systems
914 Bay Ridge Road, Suite 220           www.scyld.com
Annapolis MD 21403                      410-990-9993

