Took Linux kernel off the cc list.
On Mon, 19 May 2003, Ralph Doncaster wrote:
> When I looked at the route-cache code, efficient wasn't the word that came
> to mind. Whether the problem is in the route-cache or not, getting
> 100kpps out of a Linux router with <= 1GHz of CPU is not at all an easy
> task. I've tried 2.2 and 2.4 (up to 2.4.20) with 3c905CX cards, with and
> without NAPI, on a 750Mhz AMD. I've never reached 100kpps without
> userland (zebra) getting starved. I've even tried the e1000 with 2.4.20,
> and it still doesn't cut it (about 50% better performance than the 3Com).
> This is always with a full routing table (~110K routes).
>
I just tested a small userland app which does some pseudo routing in
userland. With NAPI I am able to do 148Kpps; without it, on the same
hardware, about 32Kpps.
I can't test beyond 148Kpps because that's the max pps a 100Mbps card can
do. The point I am making is that I don't see the user-space starvation.
Granted, this is not the same thing you are testing.
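For reference, the 148Kpps ceiling is just line rate for minimum-size
frames on 100Mbps ethernet - 64 bytes of frame plus 8 of preamble plus
12 of inter-frame gap is 84 bytes per packet on the wire. A trivial
back-of-the-envelope check:

    #include <stdio.h>

    /* Back-of-the-envelope check on the 148Kpps wire-rate ceiling for
     * 100Mbps ethernet with minimum-size (64 byte) frames: preamble (8)
     * and inter-frame gap (12) bring each frame to 84 bytes on the wire. */
    int main(void)
    {
            const double link_bps = 100e6;          /* 100Mbps link       */
            const double frame_bits = 84 * 8;       /* 672 bits per frame */

            printf("max pps: %.0f\n", link_bps / frame_bits);  /* ~148800 */
            return 0;
    }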
> If I actually had the time to do the code, I'd try dumping the route-cache
> altogether and keep the forwarding table as an r-tree (probably 2 levels
> of 2048 entries since average prefix size is /22). Frequently-used routes
> would look up faster due to CPU cache hits. I'd have all the crap for
> source-based routing ifdef'd out when firewalling is not compiled in.
>
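For concreteness, here is roughly how I read the two-level table idea -
a sketch only; it hand-waves prefixes longer than /22, what a route
entry holds, and how the table gets updated:

    #include <stdint.h>

    struct rt_entry;                            /* opaque route, not shown */

    /* Top 11 bits of the destination index the first level, the next 11
     * bits a second-level chunk allocated on demand: 2 x 2048 entries
     * covering the first 22 bits of the address. */
    struct l2_chunk {
            struct rt_entry *slot[2048];        /* bits 20..10 of daddr */
    };

    struct fwd_table {
            struct l2_chunk *l1[2048];          /* bits 31..21 of daddr */
    };

    static struct rt_entry *fwd_lookup(struct fwd_table *t, uint32_t daddr)
    {
            struct l2_chunk *c = t->l1[daddr >> 21];

            if (!c)
                    return NULL;                /* nothing under this /11 */
            return c->slot[(daddr >> 10) & 0x7ff];
    }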
I think there's definite benefit to the flow/dst cache as is. Modern routing
really should not be just about destination address lookup; that's what's
practical today (as opposed to the 80s). I agree that we should be
flexible enough not to force everybody through the complexity of
looking up via 5 tuples and maintaining flows at that level, if the
cache lookup is the bottleneck. There's a recent patch that made it into
2.5.69 which resolves (or so it seems - haven't tried it myself) the
cache bucket distribution. This was a major problem before.
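Roughly, the complaint was that the old hash let traffic (or a
random-source flood) pile up on a few bucket chains; keying and mixing
the flow tuple spreads it out. Something along these lines - an
illustration of the idea, not the actual kernel code:

    #include <stdint.h>

    #define RT_HASH_BITS    16
    #define RT_HASH_SIZE    (1u << RT_HASH_BITS)

    static uint32_t hash_rnd;                   /* picked at boot/rehash time */

    /* Toy keyed hash over (saddr, daddr, ports): the secret plus the mixing
     * step keep hostile or random-source traffic from all landing in a few
     * bucket chains.  Not the kernel's hash, just the general shape. */
    static unsigned int rt_bucket(uint32_t saddr, uint32_t daddr, uint32_t ports)
    {
            uint32_t h = saddr ^ daddr ^ ports ^ hash_rnd;

            h ^= h >> 16;
            h *= 0x45d9f3bU;
            h ^= h >> 16;

            return h & (RT_HASH_SIZE - 1);
    }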
The second-level issue is how fast you can look up on a cache miss.
So far we are saying "fast enough"; someone needs to prove it is not.
> My next try will be with FreeBSD, using device polling and the e1000 cards
> (since it seems there are no polling patches for the 3c905CX under
> FreeBSD). From the description of how polling under FreeBSD works
> http://info.iet.unipi.it/~luigi/polling/
> vs NAPI under linux, polling sounds better due to the ability to configure
> the polling cycle and CPU load triggers. From the testing and reading
> I've done so far, NAPI doesn't seem to kick in until after 75-80% CPU
> load. With less than 25kpps coming into the box zebra seems to take
> almost 10x longer to bring up a session with full routes than it does with
> no packet load. Since CPU load before zebra becomes active is 70-75%, it
> would seem a lot of cycles are being wasted on context switching when zebra
> gets busy.
>
Not interested in BSD. When they can beat Linux's numbers I'll be
interested.
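On the NAPI "kick in" observation: there is no explicit CPU-load
trigger. The switch from interrupts to polling happens as soon as
packets arrive faster than the softirq can drain them, which on your
box apparently lines up with 75-80% CPU. Caricatured in userspace
below (the rx_*()/irq_*() helpers are stand-ins, not a driver
interface):

    #include <stdbool.h>

    #define WEIGHT 64                   /* max packets drained per poll pass */

    /* Stand-in helpers - a real driver talks to its NIC here. */
    static bool rx_pending(void)  { return false; } /* frames in the ring?    */
    static void rx_one(void)      { }               /* pull and route a frame */
    static void irq_enable(void)  { }
    static void irq_disable(void) { }

    /* Caricature of the NAPI handoff: the first rx interrupt masks further
     * interrupts and schedules the poll loop; the poll loop drains at most
     * WEIGHT packets per pass and only re-enables interrupts once the ring
     * is empty.  Light load never leaves interrupt mode; heavy load
     * effectively stays in polling mode. */
    void rx_interrupt(void)
    {
            irq_disable();              /* stop the interrupt storm */
            /* ...schedule rx_poll() from the networking softirq... */
    }

    void rx_poll(void)
    {
            int work = 0;

            while (rx_pending() && work < WEIGHT) {
                    rx_one();
                    work++;
            }
            if (!rx_pending())
                    irq_enable();       /* ring empty: back to interrupts */
    }

The knob on the Linux side is essentially the per-device weight (how
much work one poll pass may do), not a CPU-load percentage.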
> If there is a way to get the routing performance I'm looking for in Linux,
> I'd really like to know. I've been searching and asking for over a year
> now. When I initially talked to Jamal about it, he told me NAPI was the
> answer. It does help, but from my experience it's not the answer. I get
> the impression nobody involved in the code has tested under real-world
> conditions. If that is, in fact, the problem then I can provide an ebgp
> multihop full feed and a synflood utility for stress testing. If the
> linux routing and ethernet driver code is improved so I can handle 50kpps
> of inbound regular traffic, a 50kpps random-source DOS, and still have 50%
> CPU left for Zebra then Cisco might have something to worry about...
>
I think we could do 50Kpps in a DOS environment.
We live in the same city. I may be able to spare half a weekend day and
meet up with you for some testing.
cheers,
jamal