netdev
[Top] [All Lists]

Re: [RFC] string matching based packet classification/filtering

To: Thomas Graf <tgraf@xxxxxxx>
Subject: Re: [RFC] string matching based packet classification/filtering
From: Pablo Neira <pablo@xxxxxxxxxxx>
Date: Thu, 17 Feb 2005 02:00:57 +0100
Cc: netdev@xxxxxxxxxxx, Netfilter Development Mailinglist <netfilter-devel@xxxxxxxxxxxxxxxxxxx>, Harald Welte <laforge@xxxxxxxxxxxx>
In-reply-to: <20050216223038.GQ31837@xxxxxxxxxxxxxx>
References: <20050215203211.GL31837@xxxxxxxxxxxxxx> <42126C9B.5060406@xxxxxxxxxxx> <20050216223038.GQ31837@xxxxxxxxxxxxxx>
Sender: netdev-bounce@xxxxxxxxxxx
User-agent: Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.6) Gecko/20040413 Debian/1.6-5
Thomas Graf wrote:

Actually I wanted to contact Harald after this friday, once I'm done with my exams. I'd like to merge my work with his libqsearch hackings that were about to be finished.

Thinking about it, we can use the architecture to implement a regular
expression match. The pattern would consist of an array of tokens
in the form of:

struct regexp_token
{
        __u8 type;
        __u8 recur;
        __u8 value;
        __u8 unused;
};

Yes, this is a good idea. This could be useful to match things like interrogation marks or asterisks that are usually used as wildcard.

where type would be
 - specific character (must match `value')
 - wildcard
 - digit
 - xdigit
 - alpha ...

and recur
- 1
- 0..1
- 0..n
- 1..0

The matching algorithm would parse the array as a finite automation
eating up byte by byte in the text.


Looks good but Boyer-Moore algorithm doesn't need this at all. BM doesn't support wildcards like '*' because of the shiftings that it uses to look for matches. This issue together with memory consumption are two limitations that we have to live with if we want to use BM. Anyway I think it's worth it because we can perform search in O(n/m).

--
Pablo

<Prev in Thread] Current Thread [Next in Thread>