* jamal <1115296937.7680.52.camel@xxxxxxxxxxxxxxxxxxxxx> 2005-05-05 08:42
> How is this different from libqsearch? IIRC, it also kept pointers and
> callbacks.
The main difference is that my infrastructure is much simpler.
> BTW, I hope there's sync with libqsearch - at least some cannibalization
> of ideas.
I read the libqsearch code but fixing up all the issues required
for my use would have taken up more time than writing up something new.
> Also hopefully, plugging in new algorithms is trivial (e.g. Boyer-Moore
> could be included in addition to kmp etc)
Very simple, ts_kmp.c is 108 lines whereas simple.c in libqsearch
is over 500.
> I have a lot of questions:
> - does a string have to be terminated by \0?
No, all strings are length-delimited; a terminating \0 is not needed.
> - do you keep state of the string from the beginning? ex: how do you know
> that preceding "hanky" was "Need a"?
I store the shift of a match in struct ts_state. textsearch_next()
adds the length of the pattern to that shift and passes the result
as the offset to the get_text() callback.
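
A minimal sketch of that bookkeeping, assuming a single flat buffer and a
naive comparison loop instead of a real search algorithm. All names here
(sketch_state, sketch_find, sketch_next) are hypothetical and are not the
textsearch API; the only point is that the state remembers the shift of
the last match and the next search resumes at shift + pattern length.

```c
#include <stddef.h>
#include <string.h>

#define SKETCH_NOT_FOUND ((size_t)-1)

/* Hypothetical stand-in for struct ts_state. */
struct sketch_state {
    size_t shift;  /* offset at which the most recent match began */
    int    found;  /* non-zero once a match has been recorded */
};

/* Naive search for pat in text, starting at offset `from`. */
static size_t sketch_search_from(const char *text, size_t text_len,
                                 const char *pat, size_t pat_len,
                                 size_t from, struct sketch_state *st)
{
    size_t i;

    if (pat_len == 0 || pat_len > text_len)
        return SKETCH_NOT_FOUND;

    for (i = from; i + pat_len <= text_len; i++) {
        if (memcmp(text + i, pat, pat_len) == 0) {
            st->shift = i;   /* remember where this match began */
            st->found = 1;
            return i;
        }
    }
    return SKETCH_NOT_FOUND;
}

/* First match: reset the state and search from the start of the text. */
static size_t sketch_find(const char *text, size_t text_len,
                          const char *pat, size_t pat_len,
                          struct sketch_state *st)
{
    st->shift = 0;
    st->found = 0;
    return sketch_search_from(text, text_len, pat, pat_len, 0, st);
}

/* Next match: resume right behind the previous one (shift + pattern length). */
static size_t sketch_next(const char *text, size_t text_len,
                          const char *pat, size_t pat_len,
                          struct sketch_state *st)
{
    if (!st->found)
        return SKETCH_NOT_FOUND;
    return sketch_search_from(text, text_len, pat, pat_len,
                              st->shift + pat_len, st);
}
```

With the text "Need a hanky? hanky!" and the pattern "hanky", sketch_find()
reports offset 7 and a following sketch_next() reports offset 14.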
> - all sorts of limits: how long is the string? etc
Both the length of a pattern and the length of a text block are
limited by INT_MAX. However, since there can be an arbitrary number
of blocks there is no limit on the text length. Depending on the
search algorithm there might be more limits, for example kmp uses
unsigned int for each prefix table entry so this limits the length
of the pattern as well (theoretically).
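
To illustrate the prefix table limit, here is a sketch of the textbook
Knuth-Morris-Pratt prefix function with unsigned int entries. It is not
the code from ts_kmp.c, and the function name is made up. Each entry
holds a length that is always smaller than the pattern length, so the
entry type bounds how long a pattern can be.

```c
#include <stddef.h>

/*
 * prefix_tbl[q] becomes the length of the longest proper prefix of
 * pattern[0..q] that is also a suffix of pattern[0..q].
 * The caller provides room for `len` entries.
 */
static void sketch_kmp_prefix_table(const unsigned char *pattern,
                                    unsigned int len,
                                    unsigned int *prefix_tbl)
{
    unsigned int k = 0, q;

    if (len == 0)
        return;

    prefix_tbl[0] = 0;
    for (q = 1; q < len; q++) {
        /* Fall back to shorter borders until one can be extended. */
        while (k > 0 && pattern[k] != pattern[q])
            k = prefix_tbl[k - 1];
        if (pattern[k] == pattern[q])
            k++;
        prefix_tbl[q] = k;
    }
}
```

For the pattern "ababaca" this fills the table with 0 0 1 2 3 0 1.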
> - what happens if a string spans multiple skbs or even multiple
> fragments?
This was the single most important requirement. Actually you can
compose the text out of anything. I already wrote some code to
handle paged skbs and multiple skbs can be implemented the same
way. You could for example store a ts_state per conntrack and
call textsearch_next() continuously until you find a match. That
would require some magic in get_text() but shouldn't be too hard.
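
A sketch of how a match can span several blocks, assuming the text is a
plain array of fragments rather than real skbs. The names sketch_frag,
sketch_text, sketch_get_text and sketch_search are hypothetical, and the
callback signature is an assumption made for this example. The matcher
asks the callback for the block that starts at the number of bytes
already consumed and keeps its KMP state across block boundaries.

```c
#include <stddef.h>

#define SKETCH_MAX_PAT 64  /* arbitrary limit for this sketch only */

struct sketch_frag { const unsigned char *data; size_t len; };
struct sketch_text { const struct sketch_frag *frags; size_t nfrags; };

/*
 * Hypothetical get_text(): point *dst at the data found at absolute
 * offset `consumed` and return how many bytes are available there,
 * or 0 once the text is exhausted.
 */
static size_t sketch_get_text(size_t consumed, const unsigned char **dst,
                              const struct sketch_text *txt)
{
    size_t start = 0, i;

    for (i = 0; i < txt->nfrags; i++) {
        size_t end = start + txt->frags[i].len;

        if (consumed < end) {
            *dst = txt->frags[i].data + (consumed - start);
            return end - consumed;
        }
        start = end;
    }
    return 0;
}

/*
 * KMP search over all blocks, starting at absolute offset `from`.
 * Returns the absolute offset of the first match, or -1 if none.
 */
static long sketch_search(const struct sketch_text *txt,
                          const unsigned char *pat, size_t pat_len,
                          size_t from)
{
    unsigned int prefix[SKETCH_MAX_PAT];
    const unsigned char *block;
    size_t consumed = from, block_len, q = 0, k = 0, i;

    if (pat_len == 0 || pat_len > SKETCH_MAX_PAT)
        return -1;

    /* Build the prefix table for the pattern. */
    prefix[0] = 0;
    for (i = 1; i < pat_len; i++) {
        while (k > 0 && pat[k] != pat[i])
            k = prefix[k - 1];
        if (pat[k] == pat[i])
            k++;
        prefix[i] = (unsigned int)k;
    }

    /* q (matched pattern bytes so far) survives the block boundaries. */
    while ((block_len = sketch_get_text(consumed, &block, txt)) > 0) {
        for (i = 0; i < block_len; i++) {
            while (q > 0 && pat[q] != block[i])
                q = prefix[q - 1];
            if (pat[q] == block[i])
                q++;
            if (q == pat_len)
                return (long)(consumed + i + 1 - pat_len);
        }
        consumed += block_len;
    }
    return -1;
}
```

With the fragments "Need a ha", "nk" and "y now", a search for "hanky"
from offset 0 reports offset 7 even though the pattern is spread over all
three fragments.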
>
> > You might wonder about the 1 given to _prepare(), it indicates whether
> > to autoload modules because the ematches will need it to be able to drop
> > rtnl sem.
> >
>
> do you really wanna leave that decision up to the user?
We have to, otherwise we can't use it in ematches without the risk
of deadlocks. There is no way around having the caller drop its
locks and call prepare again with no locks held, at least I'm not
aware of one.
> It would be nice to have other utilities which could be loaded, e.g. case
> compare, regular expressions, strchr after you match, etc
Indeed, I'm working on the regular expression thing but it has
some issues with textsearch_next() for patterns like .+abc.+ which
I want to resolve first.