
Re: [ANNOUNCE] NF-HIPAC: High Performance Packet Classification

To: Andi Kleen <ak@xxxxxxx>
Subject: Re: [ANNOUNCE] NF-HIPAC: High Performance Packet Classification
From: Roberto Nibali <ratz@xxxxxxxxxxxx>
Date: Thu, 26 Sep 2002 22:49:14 +0200
Cc: "David S. Miller" <davem@xxxxxxxxxx>, niv@xxxxxxxxxx, linux-kernel@xxxxxxxxxxxxxxx, jamal <hadi@xxxxxxxxxx>, netdev <netdev@xxxxxxxxxxx>
References: <3D924F9D.C2DCF56A@xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx> <20020925.170336.77023245.davem@xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx> <p73n0q5sib2.fsf@xxxxxxxxxxxxxxxx> <20020925.172931.115908839.davem@xxxxxxxxxx> <3D92CCC5.5000206@xxxxxxxxxxxx> <20020926140430.E14485@xxxxxxxxxxxxx>
Sender: netdev-bounce@xxxxxxxxxxx
User-agent: Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.0.1) Gecko/20020826
> For iptables/ipchains you need to write hierarchical/port range rules in this case and try to terminate searches early.

We're still trying to find the right mathematical functions to do this. Trust me, it is not that easy: the mapping of the port matrix and of the network flow through many stacked packet filters and firewalls generates a rather complex graph (partly a bipartite graph; LVS-DR, for example) with complex structures (redundancy and parallelisation). It's not as if we could simply sit down and write a fw-script for our packet filters by hand; the fw-script is generated by a meta-fw layer that knows about the surrounding network nodes.

> But yes, we also found that the L2 cache is limiting here
> (ip_conntrack has the same problem)

I think this weekend I will do my tests while also measuring some CPU performance counters with oprofile, such as DATA_READ_MISS, CODE_CACHE_MISS and NONCACHEABLE_MEMORY_READS.

> At least that is easily fixed. Just increase the LOG_BUF_LEN parameter
> in kernel/printk.c

Tests showed that this only helps in peak situations; I think we should simply forget about printk().
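
(For reference, the knob Andi refers to is a compile-time constant in kernel/printk.c. From memory of the 2.4 sources it looks roughly like the fragment below, so enlarging the buffer means editing the define, keeping it a power of two, and rebuilding the kernel.)

/* kernel/printk.c in 2.4.x (fragment from memory, not a verbatim copy) */
#define LOG_BUF_LEN   (16384)             /* bump to e.g. (65536)           */
#define LOG_BUF_MASK  (LOG_BUF_LEN - 1)   /* assumes a power-of-two length  */

static char log_buf[LOG_BUF_LEN];         /* the ring buffer printk() fills */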

> Alternatively don't use slow printk, but nfnetlink to report bad packets
> and print from user space. That should scale much better.
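
Just to make that concrete, here is a minimal sketch of the user-space side: a bare netlink listener of the kind the 2.4 ipt_ULOG target feeds (NETLINK_NFLOG). The multicast group and buffer size below are placeholder assumptions and must match the iptables rule; a real daemon such as ulogd obviously does a lot more:

/* Minimal sketch: receive packet-log messages over netlink in user space
 * instead of letting the kernel printk() them.  Uses the 2.4-era
 * NETLINK_NFLOG socket that ipt_ULOG talks to; group and buffer size are
 * assumptions and must match --ulog-nlgroup on the logging rule. */
#include <stdio.h>
#include <string.h>
#include <sys/types.h>
#include <sys/socket.h>
#include <linux/netlink.h>

#ifndef NETLINK_NFLOG
#define NETLINK_NFLOG 5          /* value from the 2.4 <linux/netlink.h> */
#endif

#define NFLOG_GROUP   1          /* assumed: matches --ulog-nlgroup 1 */
#define RECV_BUFSIZE  65536      /* assumed: large enough for bursts  */

int main(void)
{
    struct sockaddr_nl local;
    char buf[RECV_BUFSIZE];
    ssize_t len;
    int fd;

    fd = socket(AF_NETLINK, SOCK_RAW, NETLINK_NFLOG);
    if (fd < 0) {
        perror("socket");
        return 1;
    }

    memset(&local, 0, sizeof(local));
    local.nl_family = AF_NETLINK;
    local.nl_groups = 1 << (NFLOG_GROUP - 1);   /* multicast group bitmask */

    if (bind(fd, (struct sockaddr *) &local, sizeof(local)) < 0) {
        perror("bind");
        return 1;
    }

    /* Each recv() may carry several queued log entries (qthreshold > 1).
     * A real daemon would walk the nlmsghdr chain with the NLMSG_* macros
     * and write the payload out in binary, deferring any pretty-printing
     * to a low-priority converter, as described below. */
    while ((len = recv(fd, buf, sizeof(buf), 0)) > 0)
        fprintf(stderr, "got %ld bytes of log data\n", (long) len);

    return 0;
}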

Yes, and there are a few things that my colleague found out during his tests (actually pretty straightforward things):

1. A big log buffer is only useful for getting over peaks.
2. A big log buffer does not help at all when the CPU load is already
   high.
3. The smaller the message, the better (binary logging is thus an
   advantage).
4. Logging via printk() is extremely expensive, because of the
   conversions and whatnot. A rough estimate is 12500 clock cycles per
   log entry generated by printk(). On a PIII/450 one entry thus takes
   about 0.000028s, which leads to the following observation: at
   36000pps, all of which are to be logged, you end up with a system
   at 100% CPU load and 0% idle (the arithmetic is spelled out right
   after this list).
5. The kernel should log a binary stream, and so should the daemon
   that fetches the data. If you want to convert the binary into a
   human-readable format, you start a process with low priority or do
   it on demand.
6. Ideally, the log daemon should be preemptible so that it gets a
   defined time slice to do its job.
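
To spell out the arithmetic behind point 4 (the constants are the rough estimates from above, nothing newly measured):

#include <stdio.h>

/* Back-of-the-envelope check of point 4: rough printk() cost per log
 * entry versus the cycles available per second on a PIII/450. */
int main(void)
{
    const double cycles_per_entry = 12500.0;  /* rough printk() estimate */
    const double cpu_hz           = 450e6;    /* PIII/450                */
    const double pps              = 36000.0;  /* packets/s, all logged   */

    double secs_per_entry = cycles_per_entry / cpu_hz;  /* ~0.000028s      */
    double load           = pps * secs_per_entry;       /* ~1.0, i.e. 100% */

    printf("%.1f us per entry, %.0f%% CPU load at %.0f pps\n",
           secs_per_entry * 1e6, load * 100.0, pps);
    return 0;
}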

Some results from tests conducted by a coworker of mine (Achim Gsell):

Max pkt rate the system can log without losing more than 1% of the messages:
----------------------------------------------------------------------------


kernel:         Linux 2.4.19-gentoo-r7 (low latency scheduling)

daemon:         syslog-ng (nice 0), logbufsiz=16k, pkts=10*10000, CPU=PIII/450
packet-len:     64              256             512             1024

                2873pkt/s       3332pkt/s       3124pkt/s       3067pkt/s
                1.4 Mb/s        6.6Mb/s         12.2Mb/s        23.9Mb/s

daemon:         syslog-ng (nice 0), logbufsiz=16k, pkts=10*10000, CPU=PIVM/1.7
packet-len:     64              256             512             1024

                7808pkt/s       7807pkt/s       7806pkt/s           pkt/s
                3.8 Mb/s        15.2Mb/s        30.5Mb/s            Mb/s

----------------------------------------------------------------------------------------------------------

daemon: cat /proc/kmsg > kernlog, logbufsiz=16k, pkts=10*10000, CPU=PIII/450
packet-len:     64              256             512             1024

                4300pkt/s                                       3076pkt/s
                2.1 Mb/s                                        24.0Mb/s

daemon:         ulogd (nlbufsize=4k, qthreshold=1), pkts=10*10000, CPU=PIII/450
packet-len:     64              256             512             1024

                4097pkt/s                                       4097pkt/s
                2.0 Mb/s                                        32  Mb/s

daemon: ulogd (nlbufsize=2^17 - 1, qthreshold=1), pkts=10*10000, CPU=PIII/450
packet-len:     64              256             512             1024

                6576pkt/s                                       5000pkt/s
                3.2 Mb/s                                        38  Mb/s

daemon:         ulogd (nlbufsize=64k, qthreshold=1), pkts=1*10000, CPU=PIII/450
packet-len:     64              256             512             1024

                                                                    pkt/s
                                                                4.0 Mb/s

daemon: ulogd (nlbufsize=2^17 - 1, qthreshold=50), pkts=10*10000, CPU=PIII/450
packet-len:     64              256             512             1024

                6170pkt/s                                       5000pkt/s
                3.0 Mb/s                                        38  Mb/s


Best regards,
Roberto Nibali, ratz
--
echo '[q]sa[ln0=aln256%Pln256/snlbx]sb3135071790101768542287578439snlbxq'|dc

