> We dont wanna start changing the whole network stack just so that
> we can fit in VLANS, do we?
I don't think we need to. There may be some minor changes needed in
the 2.5 timeframe in order to efficiently support a lot of devices.
More on that later.
> based on VLANS; thats why building a table is so useful. Put your
> ACL on the "VLAN table"; someone makes a policy call that gets inserted
> into the VLAN table of the appropriate device (which in itself might be a
> simple pointer to a general filter database eg iptables).
> I dont have a problem with extending things like dst cache to have a
> pointer to some VLAN entry
OK, so we can hack on filters. For extra credit, imagine the same
scenario I described, but now the Linux router is running gated to
take OSPF from the two VLANs... how is that going to work if userland
sees only one network device?
As to the difficulty of working with many interfaces - I did a quick
survey of all the code in net/ that looks things up in the dev_base
list. Here's the executive summary - if anyone wants more details
I can type up more stuff from my notes.
The main user is of course core/dev.c - it provides (currently linear)
search functions to retrieve a struct net_device by name or index.
(There's also a search by hwaddr, but apparently that is only used if
you run "/sbin/arp -D hostname hw_addr pub"... not a big deal.)
These searches could be expedited by using a tree or hash, which
would solve the bulk of the problem.
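
To make the hash idea concrete, here is a rough userspace sketch of
what I have in mind. It is not kernel code: struct toy_dev, the
toy_dev_* functions and the table size are all made up for
illustration, and a real version would need locking and an
unregister path.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define DEV_HASH_SIZE 256u   /* power of two, chosen arbitrarily */
#define DEV_NAME_MAX  16

/* Hypothetical stand-in for struct net_device; only what we need here. */
struct toy_dev {
    char name[DEV_NAME_MAX];
    int ifindex;
    struct toy_dev *name_next;   /* chain within a name bucket */
    struct toy_dev *index_next;  /* chain within an index bucket */
};

static struct toy_dev *name_hash[DEV_HASH_SIZE];
static struct toy_dev *index_hash[DEV_HASH_SIZE];

/* Simple multiplicative string hash; any reasonable hash would do. */
static unsigned int hash_name(const char *name)
{
    unsigned int h = 5381;

    while (*name)
        h = h * 33 + (unsigned char)*name++;
    return h & (DEV_HASH_SIZE - 1);
}

static unsigned int hash_index(int ifindex)
{
    return (unsigned int)ifindex & (DEV_HASH_SIZE - 1);
}

/* Link the device into both tables (push onto the bucket chains). */
void toy_dev_register(struct toy_dev *dev)
{
    unsigned int n, i;

    assert(dev != NULL);
    n = hash_name(dev->name);
    i = hash_index(dev->ifindex);
    dev->name_next = name_hash[n];
    name_hash[n] = dev;
    dev->index_next = index_hash[i];
    index_hash[i] = dev;
}

/* Walk one bucket instead of the whole device list. */
struct toy_dev *toy_dev_get_by_name(const char *name)
{
    struct toy_dev *d;

    for (d = name_hash[hash_name(name)]; d; d = d->name_next)
        if (strncmp(d->name, name, DEV_NAME_MAX) == 0)
            return d;
    return NULL;
}

struct toy_dev *toy_dev_get_by_index(int ifindex)
{
    struct toy_dev *d;

    for (d = index_hash[hash_index(ifindex)]; d; d = d->index_next)
        if (d->ifindex == ifindex)
            return d;
    return NULL;
}
```

With a few thousand devices spread over 256 buckets, each lookup
walks a handful of entries rather than the whole list. A tree would
work too; the hash is just the smaller change.
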
The one tricky part is the dev_alloc_name() function... its current
algorithm is to search for "prefix0" through "prefix99" and if
they're all in use bail out. Not only does this mean that you
won't naturally end up with "ppp100", the search is N^2 (each
candidate name costs a linear scan of the device list), meaning
the time to set up N of them is N^3. The best algorithm compromise
isn't clear - I'd say that for each prefix in use ("ppp", "eth") keep
an ordered linked list of them, so that you can quickly linearly
scan looking for the lowest hole. An added optimization would be
to hold a "next_to_try_after" pointer into the list (after adding
an interface, set it to that interface; when deleting an interface,
set it to the deleted one's predecessor if the deleted one is at or
before next_to_try_after), which greatly reduces the length of the
search for many common access patterns.
This sounds complicated, but would actually be pretty easy to implement
using linux/list.h - just keep the next_to_try_after as the first
element and just rotate the list around it as it changes.
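
Here is a rough userspace sketch of that allocator, to show the
bookkeeping is small. It uses an explicit hint pointer plus a
sentinel node instead of the list-rotation trick, and it only covers
automatically numbered names (an interface created with an explicit
number would need a sorted insert). All names here (struct unit,
struct prefix_units, prefix_*) are made up for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* One in-use unit number for a prefix, e.g. the 3 in "ppp3". */
struct unit {
    int num;
    struct unit *prev, *next;
};

/* Per-prefix state: a list sorted by num, plus the search hint. */
struct prefix_units {
    struct unit head;   /* sentinel with num == -1 */
    struct unit *hint;  /* every number from 0 to hint->num is in use */
};

void prefix_init(struct prefix_units *p)
{
    assert(p != NULL);
    p->head.num = -1;
    p->head.prev = NULL;
    p->head.next = NULL;
    p->hint = &p->head;
}

/* Allocate the lowest free unit number; returns -1 if malloc fails. */
int prefix_alloc(struct prefix_units *p)
{
    struct unit *cur = p->hint;
    struct unit *u;

    /* Skip the contiguous run after the hint to reach the first hole. */
    while (cur->next && cur->next->num == cur->num + 1)
        cur = cur->next;

    u = malloc(sizeof(*u));
    if (!u)
        return -1;
    u->num = cur->num + 1;
    u->prev = cur;
    u->next = cur->next;
    if (cur->next)
        cur->next->prev = u;
    cur->next = u;
    p->hint = u;        /* "after adding an interface, set to self" */
    return u->num;
}

/*
 * Release a unit number; returns 0 on success, -1 if it was not in use.
 * The linear search is only here because this sketch has no device
 * structure to embed the node in.
 */
int prefix_free(struct prefix_units *p, int num)
{
    struct unit *u = p->head.next;

    while (u && u->num < num)
        u = u->next;
    if (!u || u->num != num)
        return -1;
    if (num <= p->hint->num)
        p->hint = u->prev;  /* a hole opened at or before the hint */
    u->prev->next = u->next;
    if (u->next)
        u->next->prev = u->prev;
    free(u);
    return 0;
}
```

For the common pattern of bringing up interfaces one after another,
the hint sits at the tail and each allocation is a constant amount
of work. The scan only gets long right after deleting a low-numbered
interface from a densely populated prefix.
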
The trickier part is where the IP stack searches through the list
(ipv6/addrconf.c/ipv6_get_saddr())... I assume these are just used
for corner cases, right? I assume that things on the fast path
(selecting source IPs for outgoing packets) are taken care of by the
routing code, right?!?
Other protocols do things based on searching the list of all the
devices (decnet, netrom, rose)... if that is a big problem they
could easily maintain a list of devices that they're operating on.
The other places that look at all the devices are things like
/proc and netlink code where it explicitly needs to operate on
all of them.
Anyway, assuming that those code paths in ipv4 and ipv6 aren't
hotspots, some basic data structure work in net/core/dev.c would
remove the algorithmic limitations on hosting thousands of devices.