Jeremy M. Guthrie writes:
> I actually upped the buffer count to 8192 buffers instead of 10k.
> Of the 74 samples I have thus far, 57 have been clean of errors.
> Most of the sample errors appear to be shortly after the cache flush.
I don't really believe in increasing RX buffers to this extent. We verified
that you have CPU available and that the drops occur when the timer-based GC
happens. Increasing buffers decreases overall performance and adds jitter.
We also saw the timer-based GC taking the dst entries from about
600k to 40k in one shot. I think this is what we should look into. The
GC itself is "work", and after the GC a lot of flows have to be recreated,
doing a fib lookup and creating new entries. We want to smooth out the GC
process so that it happens more frequently and does less work each time.
Some time ago an "in-flow" GC (as opposed to timer-based) was added to
the routing code; look for cand in route.c. In a setup like yours (and ours)
it would be better to rely on this process to a higher extent. Anyway,
in /proc/sys/net/ipv4/route/ you have the files gc_elasticity,
gc_interval, gc_thresh etc. I would avoid gc_min_interval.
You can play with these on your running system and watch for drops without
causing your users too much pain.
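
If it helps while experimenting, here is a rough Python sketch (not something
we run ourselves) that dumps the three tunables named above together with the
current dst entry count. The function names are made up for illustration, and
it assumes the first column of /proc/net/stat/rt_cache is the entry count in
hex; please check that against your kernel before trusting the numbers.

```python
from pathlib import Path

# Directory named in the mail; the tunables below are the ones listed there.
ROUTE_DIR = Path("/proc/sys/net/ipv4/route")
TUNABLES = ("gc_elasticity", "gc_interval", "gc_thresh")


def read_tunables(route_dir=ROUTE_DIR, names=TUNABLES):
    """Return {name: int or None}; None means the file was missing or unreadable."""
    values = {}
    for name in names:
        try:
            values[name] = int((Path(route_dir) / name).read_text().split()[0])
        except (OSError, ValueError, IndexError):
            values[name] = None
    return values


def read_cache_entries(stat_file="/proc/net/stat/rt_cache"):
    """Return the route cache entry count, or None if it cannot be read.

    Assumption: the file has one header line followed by per-CPU lines of
    hex fields, and the first field of each line is the global entry count.
    """
    try:
        lines = Path(stat_file).read_text().splitlines()
        return int(lines[1].split()[0], 16)
    except (OSError, ValueError, IndexError):
        return None


if __name__ == "__main__":
    for name, value in read_tunables().items():
        print(f"{name} = {value}")
    print(f"dst entries = {read_cache_entries()}")
```

Run it before and after changing a value, and again around a cache flush, to
see whether the entry count still falls in one big step.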
We'll save the patch for routing without route hash and GC until later.
--ro