oprofile on AMD64 can gather lots of data, DATA_CACHE_MISSES for example...
But I think I know what happens...
nm -v /usr/src/linux/vmlinux | grep -5 rt_cache_stat
ffffffff804c6a80 b rover.5
ffffffff804c6a88 b last_gc.2
ffffffff804c6a90 b rover.3
ffffffff804c6a94 b equilibrium.4
ffffffff804c6a98 b ip_fallback_id.7
ffffffff804c6aa0 B rt_cache_stat
ffffffff804c6aa8 b ip_rt_max_size
ffffffff804c6aac b ip_rt_debug
ffffffff804c6ab0 b rt_deadline
So rt_cache_stat (which is a read-only pointer) sits in the middle of a
hot cache line (some parts of it are written over and over), which
probably ping-pongs between CPUs.
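
To make the cache line claim easy to check, here is a small sketch that maps the addresses from the nm output above onto cache lines. It assumes 64-byte cache lines (my assumption, not something nm reports), and the helper names (cache_line_base, count_sharing_line, print_lines) are made up for illustration.

```c
#include <inttypes.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Assumption: 64-byte cache lines. */
#define CACHE_LINE_SIZE 64u

struct sym {
    uint64_t addr;
    const char *name;
};

/* Addresses copied from the nm output above. */
static const struct sym syms[] = {
    { 0xffffffff804c6a80ULL, "rover.5" },
    { 0xffffffff804c6a88ULL, "last_gc.2" },
    { 0xffffffff804c6a90ULL, "rover.3" },
    { 0xffffffff804c6a94ULL, "equilibrium.4" },
    { 0xffffffff804c6a98ULL, "ip_fallback_id.7" },
    { 0xffffffff804c6aa0ULL, "rt_cache_stat" },
    { 0xffffffff804c6aa8ULL, "ip_rt_max_size" },
    { 0xffffffff804c6aacULL, "ip_rt_debug" },
    { 0xffffffff804c6ab0ULL, "rt_deadline" },
};

#define NSYMS (sizeof(syms) / sizeof(syms[0]))

/* Start address of the cache line that contains addr. */
static uint64_t cache_line_base(uint64_t addr)
{
    return addr & ~(uint64_t)(CACHE_LINE_SIZE - 1);
}

/* Number of listed symbols (target included) that start in target's line. */
static size_t count_sharing_line(uint64_t target)
{
    size_t i, n = 0;

    for (i = 0; i < NSYMS; i++)
        if (cache_line_base(syms[i].addr) == cache_line_base(target))
            n++;
    return n;
}

/* Print each symbol next to the base address of its cache line. */
static void print_lines(void)
{
    size_t i;

    for (i = 0; i < NSYMS; i++)
        printf("%016" PRIx64 "  line %016" PRIx64 "  %s\n",
               syms[i].addr, cache_line_base(syms[i].addr), syms[i].name);
}
```

Under that 64-byte assumption, every symbol listed above starts in the same line as rt_cache_stat (base ffffffff804c6a80), so any write to one of its neighbours invalidates the line holding the pointer on the other CPUs.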
Time to provide a patch that carefully splits all the static data in
net/ipv4/route.c into two parts: mostly read-only, and the rest... :)
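
A rough userspace sketch of what such a split could look like, not the actual patch. The struct names, the field types, and the choice of which variables go in which group are all guesses for illustration; the 64-byte line size is an assumption, and `__attribute__((aligned))` is a GCC/Clang extension.

```c
#include <stddef.h>
#include <stdint.h>

/* Assumption: 64-byte cache lines. */
#define CACHE_LINE_SIZE 64

/*
 * Hypothetical grouping. Aligning the struct type itself pads its size
 * to a multiple of the line size, so nothing else can land in its tail.
 */
struct route_read_mostly {
    void *rt_cache_stat;        /* pointer set once, then only read */
} __attribute__((aligned(CACHE_LINE_SIZE)));

struct route_hot {
    unsigned long last_gc;      /* guessed: rewritten frequently */
    unsigned long rt_deadline;  /* guessed: rewritten frequently */
    int rover;                  /* guessed: rewritten frequently */
    int equilibrium;            /* guessed: rewritten frequently */
} __attribute__((aligned(CACHE_LINE_SIZE)));

static struct route_read_mostly ro_data;
static struct route_hot hot_data;

/* Return 1 if the two addresses fall in the same cache line, else 0. */
static int share_cache_line(const void *a, const void *b)
{
    return (uintptr_t)a / CACHE_LINE_SIZE == (uintptr_t)b / CACHE_LINE_SIZE;
}
```

With this layout, share_cache_line(&ro_data.rt_cache_stat, &hot_data.last_gc) returns 0, so writes to the hot fields no longer dirty the line holding the read-only pointer. The cost is some padding in each group.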