I will let Anton respond to this. I think he may have tried this approach
some time back in his early prototypes.
I think the problem was not where the buffer started but where the packet
ended up within the buffer.
Due to the varying sizes of the TCP and IP headers, the packet ended up at
some non-cache-aligned address.
What we need for the DMA to work well is to have the final packet (with
datalink headers) starting on a cache line, as it's the final packet that
must be DMA'd. In fact it may need to be aligned to a higher level than
that (not sure).
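
To illustrate the point, here is a rough sketch in plain C (not kernel code).
The names cache_line_offset() and aligned_headroom() are hypothetical, and
the 128-byte line size is an assumption for illustration only. It shows why
an aligned buffer start does not help once headers of varying length are
prepended, and how much headroom would be needed for the final frame to
start on a cache line:

```c
#include <stddef.h>
#include <stdint.h>

/* Assumed cache line size; the real value is architecture specific. */
#define CACHE_LINE_BYTES 128u

/* Number of bytes by which addr lies past the previous cache line boundary. */
static inline size_t cache_line_offset(uintptr_t addr)
{
    return (size_t)(addr & (CACHE_LINE_BYTES - 1));
}

/*
 * Smallest headroom >= hdr_len such that, once hdr_len bytes of headers
 * (datalink + IP + TCP) are prepended in front of the payload, the frame
 * starts on a cache line. The payload goes at buf + headroom and the
 * frame starts at buf + headroom - hdr_len.
 */
static inline size_t aligned_headroom(uintptr_t buf, size_t hdr_len)
{
    /* Padding needed to move buf itself up to the next boundary. */
    size_t pad = (CACHE_LINE_BYTES - cache_line_offset(buf)) &
                 (CACHE_LINE_BYTES - 1);

    return hdr_len + pad;
}
```

For example, with an aligned buffer at 0x1000, a fixed 128-byte headroom and
54 bytes of headers (14 + 20 + 20), the frame starts 74 bytes into a cache
line. Since the header length is only known late, the headroom would have to
be chosen per packet, or the packet copied afterwards.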
haveblue@xxxxxxxxxxxxxxxxxxxxxxx on 06/13/2003 11:21:03 AM
To: Herman Dierks/Austin/IBM@IBMUS
cc: "Feldman, Scott" <scott.feldman@xxxxxxxxx>, David Gibson
<dwg@xxxxxxxxxxx>, Linux Kernel Mailing List
<linux-kernel@xxxxxxxxxxxxxxx>, Anton Blanchard <anton@xxxxxxxxx>,
Nancy J Milliner/Austin/IBM@IBMUS, Ricardo C
Gonzalez/Austin/IBM@ibmus, Brian Twichell/Austin/IBM@IBMUS,
Subject: RE: e1000 performance hack for ppc64 (Power4)
Too long to quote:
Wouldn't you get most of the benefit from copying that stuff around in
the driver if you allocated the skb->data aligned in the first place?
There's already code to align them on CPU cache boundaries:
#define SKB_DATA_ALIGN(X) (((X) + (SMP_CACHE_BYTES - 1)) & \
~(SMP_CACHE_BYTES - 1))
So, do something like this:
#ifdef ARCH_ALIGN_SKB_BYTES
#define SKB_ALIGN_BYTES ARCH_ALIGN_SKB_BYTES
#else
#define SKB_ALIGN_BYTES SMP_CACHE_BYTES
#endif
#define SKB_DATA_ALIGN(X) (((X) + (SKB_ALIGN_BYTES - 1)) & \
~(SKB_ALIGN_BYTES - 1))
You could easily make this adaptive so that it does not align on the arch
size when the request is bigger than that, just like in the e1000 patch you
posted.
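
A minimal sketch of that adaptive variant, written as plain C so it compiles
outside the kernel. The function skb_data_align_adaptive() and the ALIGN_UP()
helper are hypothetical, and both constants are placeholder values, not the
ones any architecture actually uses:

```c
#include <stddef.h>

/* Placeholder values for illustration; real ones come from the arch. */
#define SMP_CACHE_BYTES      128u
#define ARCH_ALIGN_SKB_BYTES 2048u

/* Round x up to a multiple of a; a must be a power of two. */
#define ALIGN_UP(x, a) (((x) + ((size_t)(a) - 1)) & ~((size_t)(a) - 1))

/*
 * Round a requested data size up to the arch-specific alignment, unless
 * the request is already bigger than that alignment, in which case fall
 * back to ordinary cache line rounding.
 */
static inline size_t skb_data_align_adaptive(size_t x)
{
    if (x > ARCH_ALIGN_SKB_BYTES)
        return ALIGN_UP(x, SMP_CACHE_BYTES);

    return ALIGN_UP(x, ARCH_ALIGN_SKB_BYTES);
}
```

Note that this, like SKB_DATA_ALIGN, only rounds the size of the request. It
says nothing about where the packet ends up inside the buffer once the
headers have been added.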