
Re: [RFC] netif_rx: receive path optimization

To: netdev <netdev@xxxxxxxxxxx>
Subject: Re: [RFC] netif_rx: receive path optimization
From: Rick Jones <rick.jones2@xxxxxx>
Date: Thu, 31 Mar 2005 14:42:36 -0800
In-reply-to: <1112305084.1073.94.camel@jzny.localdomain>
References: <> <> <1112303431.1073.67.camel@jzny.localdomain> <> <1112305084.1073.94.camel@jzny.localdomain>
Sender: netdev-bounce@xxxxxxxxxxx
User-agent: Mozilla/5.0 (X11; U; HP-UX 9000/785; en-US; rv:1.6) Gecko/20040304
At the risk of again chewing on my toes (yum), if multiple CPUs are pulling packets from the per-device queue there will be packet reordering.

;-> This happens already _today_ on Linux on non-NAPI.

Take the following scenario in non-NAPI.
- packet 1 arrives
- interrupt happens, NIC bound to CPU0
- in the meantime packets 2,3 arrive
- 3 packets put on queue for CPU0
- interrupt processing done

- packet 4 arrives, interrupt, CPU1 is bound to NIC
- in the meantime packets 5,6 arrive
- CPU1 backlog queue used.
- interrupt processing done

Assume CPU0 is overloaded with other system work and CPU1 rx processing
kicks in first ... TCP sees packets 4, 5, 6 before 1, 2, 3 ..
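
The scenario above can be played out in a few lines. This is only a toy model, not kernel code: `deliver`, `arrivals` and `drain_order` are made-up names, each CPU gets one FIFO backlog, and the only thing varied is which CPU gets around to draining its backlog first.

```python
from collections import deque

def deliver(arrivals, drain_order):
    """Return the order in which the upper layer (e.g. TCP) sees packets.

    arrivals    -- list of (packet_id, cpu) in wire order; cpu is the CPU
                   that took the interrupt and queued the packet
    drain_order -- CPU ids in the order their backlog queues get processed
    """
    backlogs = {}
    for pkt, cpu in arrivals:
        # Each CPU has its own FIFO backlog (assumption of this toy model).
        backlogs.setdefault(cpu, deque()).append(pkt)

    seen = []
    for cpu in drain_order:
        queue = backlogs.get(cpu, deque())
        while queue:
            seen.append(queue.popleft())
    return seen

# Packets 1-3 queued on CPU0, packets 4-6 queued on CPU1.
arrivals = [(1, 0), (2, 0), (3, 0), (4, 1), (5, 1), (6, 1)]

# CPU0 is busy, so CPU1 drains first: packets arrive at TCP reordered.
print(deliver(arrivals, drain_order=[1, 0]))  # [4, 5, 6, 1, 2, 3]

# If CPU0 drains first, wire order is preserved.
print(deliver(arrivals, drain_order=[0, 1]))  # [1, 2, 3, 4, 5, 6]
```

Order within a single backlog is never disturbed in this model; the reordering comes entirely from one flow being spread over two queues.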

I "never" see that because I always bind a NIC to a specific CPU :) Just about every networking-intensive benchmark report I've seen has done the same.

Note Linux is quite resilient to reordering compared to other OSes (as
you may know) but avoiding this is a better approach - hence my
suggestion to use NAPI when you want to do serious TCP.

Would the same apply to NIC->CPU interrupt assignments? That is, bind the NIC to a single CPU.

HP-UX 10.0 did just that and it was quite nasty even at low CPU counts (<=4). It was changed by HP-UX 10.20 (ca 1995) to per-CPU queues with queue selection computed from packet headers (hash the IP and TCP/UDP header to pick a CPU). It was called IPS, for Inbound Packet Scheduling. 11.0 (ca 1998) later changed that to "find where the connection last ran and queue to that CPU". That was called TOPS - Thread Optimized Packet Scheduling.
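
A rough sketch of the IPS-style idea (hash the headers, pick a queue) might look like the following. The function name, the choice of fields and the use of CRC32 are all assumptions made for illustration; nothing here is taken from the HP-UX implementation.

```python
import zlib

def ips_select_cpu(src_ip, dst_ip, src_port, dst_port, proto, ncpus):
    """Pick a per-CPU queue from the packet's IP and TCP/UDP header fields.

    Hypothetical helper: the field set and the CRC32 hash are assumptions,
    chosen only because they are deterministic and easy to reproduce.
    """
    key = f"{src_ip}|{dst_ip}|{src_port}|{dst_port}|{proto}".encode()
    # Same header fields always hash to the same CPU, so packets of one
    # connection stay on one queue and keep their relative order.
    return zlib.crc32(key) % ncpus

# Every packet of this connection lands on the same CPU's queue.
cpu_a = ips_select_cpu("10.0.0.1", "10.0.0.2", 5000, 80, "tcp", ncpus=4)
cpu_b = ips_select_cpu("10.0.0.1", "10.0.0.2", 5000, 80, "tcp", ncpus=4)
print(cpu_a == cpu_b)  # True
```

Different connections may or may not share a CPU; the hash only guarantees that a given connection always maps to the same one.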

Don't think we can do that unfortunately: We are screwed by the APIC
architecture on x86.

The IPS and TOPS stuff was/is post-NIC-interrupt. Low-level driver processing still happened/s on a specific CPU; it is the higher-level processing which is done on another CPU. The idea, with TOPS at least, is to try to access the ULP (TCP, UDP etc) structures on the same CPU as last accessed by the app, to minimize cache-to-cache migration.
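
To make the TOPS idea a little more concrete, here is a minimal sketch of "queue to the CPU where the connection last ran". The class, its method names and the fallback of staying on the interrupt CPU for unknown connections are all assumptions for illustration, not a description of the actual HP-UX code.

```python
class TopsTable:
    """Toy model: remember, per connection, the CPU the application last ran on."""

    def __init__(self):
        # flow key -> CPU id where the app last touched this connection
        self.last_cpu = {}

    def note_app_ran(self, flow, cpu):
        """Record that the application accessed this connection on `cpu`."""
        self.last_cpu[flow] = cpu

    def select_cpu(self, flow, interrupt_cpu):
        """Choose the CPU for higher-level (ULP) processing of an inbound packet.

        Driver-level work has already happened on `interrupt_cpu`. If the
        connection is unknown we stay there (an assumption of this sketch).
        """
        return self.last_cpu.get(flow, interrupt_cpu)

table = TopsTable()
flow = ("10.0.0.1", 5000, "10.0.0.2", 80, "tcp")

# No history yet: process on the CPU that took the interrupt.
print(table.select_cpu(flow, interrupt_cpu=0))  # 0

# The app reads from the socket on CPU 3; later packets follow it there.
table.note_app_ran(flow, cpu=3)
print(table.select_cpu(flow, interrupt_cpu=0))  # 3
```

The point of following the application is that the TCP/UDP state for the connection is likely still warm in that CPU's cache.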

rick jones
