On Tue, 15 Jul 2003 16:01:55 -0700
David Mosberger <davidm@xxxxxxxxxxxxxxxxx> wrote:
> >>>>> On Mon, 14 Jul 2003 22:38:22 -0700, "David S. Miller"
> >>>>> <davem@xxxxxxxxxx> said:
>
> DaveM> But I don't think that's what is happening here, rather the
> DaveM> PCI controller is "talking" to the CPU's L2 cache with
> DaveM> coherency transactions on all the data of every packet going
> DaveM> to the chip.
>
> That's true. But shouldn't it be true for both the TSO and non-TSO
> case?
Each transfer is longer in the TSO case, so the chip has to
pull more data across the bus just to get _one_ of the
sub-packets of the large TSO frame out. That makes a delay
more likely.
> DaveM> I know how this can be fixed, can you use L2-bypassing stores
> DaveM> in your csum_and_copy_from_user() and copy_from_user()
> DaveM> implementations like we do on sparc64? That would exactly
> DaveM> eliminate this situation where the card is talking to the
> DaveM> cpu's L2 cache for all the data during the PCI DMA transaction
> DaveM> on the send side.
>
> We could, but would it always be a win? Especially for
> copy_from_user(). Most of the time, that data remains cached, so I
> don't think we'd want to use non-temporal stores on those (in
> general). csum_and_copy_from_user() isn't well optimized yet. Let's
> see if I can find a volunteer... ;-)
No, I mean "bypass L2 cache on miss" for stores. Don't
tell me IA64 doesn't have that? 8) I certainly didn't mean
"always bypass L2 cache" for stores :-)
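
For illustration only, here is a rough user-space sketch of a copy
loop built on cache-bypassing stores. It makes several assumptions:
it uses x86 SSE2 streaming stores as a stand-in, so it is neither the
sparc64 code nor an IA64 implementation, and a streaming store is a
non-temporal hint rather than the "bypass L2 cache on miss" behaviour
described above. The name copy_nocache_sketch is hypothetical, not a
kernel interface.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#if defined(__SSE2__)
#include <emmintrin.h>
#endif

/*
 * Hypothetical helper (not a kernel API): copy len bytes from src to
 * dst, using non-temporal stores for the bulk of the data when SSE2
 * is available, and plain memcpy() otherwise.
 */
static void copy_nocache_sketch(void *dst, const void *src, size_t len)
{
	unsigned char *d = dst;
	const unsigned char *s = src;

#if defined(__SSE2__)
	/* Streaming stores need a 16-byte aligned destination. */
	while (len > 0 && ((uintptr_t)d & 15) != 0) {
		*d++ = *s++;
		len--;
	}

	/* Bulk copy: unaligned load, non-temporal store. */
	while (len >= 16) {
		__m128i v = _mm_loadu_si128((const __m128i *)s);
		_mm_stream_si128((__m128i *)d, v);
		d += 16;
		s += 16;
		len -= 16;
	}

	/* Streaming stores are weakly ordered; fence before returning. */
	_mm_sfence();
#endif

	/* Tail bytes, or the whole buffer when SSE2 is not available. */
	memcpy(d, s, len);
}
```

Whether something like this is a win depends on whether the data is
touched again by the CPU soon after the copy, which is the concern
raised about copy_from_user() above.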