On Tue, 12 Aug 2003, Jamie Lokier wrote:
> David S. Miller wrote:
> > On Tue, 12 Aug 2003 00:39:58 +0200
> > The ipv4 FIB hash tables tackle exactly this problem. The resulting
> > worse cast complexity is O(n_bits_in_prefix_mask).
> >
> > The problem you've described is exactly the same as trying to use
> > hashing for routing tables.
>
> You can do it in O(log(n_bits_in_prefix_mask)).
>
> This is achieved using binary search on the prefix lengths which
> appear in the table. (So if only a few prefix lengths are used, the
> tree is small).
>
> Illustrated by an example. The example assumes 32 bits total and
> every prefix length appears in the table. Each node in the search
> tree has one or more of these fields:
>
>  TABLE: Sparse (hash) or dense (direct) mapping of some
> subset of input bits to subtree nodes.
>  SHORTER: Subtree node, for matching shorter prefixes than the
> one which lead to the current node.
>  BEST: Best matching result given the prefix that lead to this node.
> This is either an exact match for that prefix, or the best
> shorterprefix match for it.
> (On the root node, this will be the default value).
>
> In the worst case of N bits in the prefix, and every prefix length is
> actually used, there would by N tree nodes, however the depth of the
> search tree is ceil(log2(N)).
>
> This is the lookup algorithm:
>
>  Start with the root tree node.
> Note that this isn't the shortest prefix matcher.
> It will match the "middle" prefix length, given your set of lengths.
> E.g. on a 32bit address, with rules of all prefix lenghts, the root
> tree node would match 16 bits.
> LOOP:
>  If NODE>TABLE exists, lookup the appropriate subset of input bits in it.
> For sparse tables, this involves a hash lookup.
> If a match is found, select that as the new tree node and goto LOOP.
>  Otherwise there is no table or no match in the table.
>  If NODE>SHORTER exists, select that as the new tree node and goto LOOP.
>  Otherwise, return NODE>BEST.
>
> This generalises to multiple dimensions e.g. for doing multiple
> prefixes on source+target + different combinations of other bits such
> as protocol, TOS etc.  i.e. arbitrary bitsubset classifiers. The
> basic principle and the algorithm are the same.
>
> In some cases it is worth reducing the tree size a little, e.g. if
> prefix lengths P and P+1 appear, you may as well just hash on P+1 bits
> and insert all the P prefix matches into the P+1 table, instead of
> having a P table. The cost is up to twice the number of entries in
> the P+1 table (depending on how dense your set of P+1 rules is), but
> this may be better than the cost of a extra hash lookup. Naturally,
> there are other optimisations too.
Most of this doesn't make sense to me. The part about merging
prefixlength tables is OK (i.e. 10.10.0.0/16 and 10.10.1.0/17 becomes
10.10.0.0/17 and 10.10.1.0/17).
However if you have 32 tables (one for each prefix length) I can't see any
possible way of avoiding a search in each table for the worst case. So if
my packet is destined to 217.109.118.70 you need to start looking for a
match for a /32 route, and if not found continue checking each prefix size
until you've reached /0.
Ralph
