
Linux Policy Routing-Based IDS Load Balancer HOWTO

To: netdev@xxxxxxxxxxx
Subject: Linux Policy Routing-Based IDS Load Balancer HOWTO
From: "Jeremy M. Guthrie" <jeremy.guthrie@xxxxxxxxxx>
Date: Tue, 26 Jul 2005 09:46:36 -0500
Organization: Berbee Information Networks
Reply-to: jeremy.guthrie@xxxxxxxxxx
Sender: netdev-bounce@xxxxxxxxxxx
User-agent: KMail/1.8.1

What you say?  Well, some of you may remember that in early January of this 
year I was spamming the list pretty hard trying to get a host to route quite 
a bit of data.  Turns out it was for a Linux-based IDS load balancer.  I am 
80-90% done and I'd like to get your feedback as to how badly I've horked 
this.  Specifically, are there descriptions for things like gc_interval and 
the like that are so technically wrong I should be slapped with a wet 
penguin?

Obvious things I have not completed yet:
1.  spell checker, that will come after the list has beaten this up
2.  formatting, I gave up on general formatting till the very end
3.  the TOC isn't finished yet

If you could, please review the tuning/performance enhancing sections.

BTW, my first howto....

ids-load-balancing-HOWTO

Jeremy M. Guthrie

 jeremy.guthrie@xxxxxxxxxx

February 2004

Version 1.0
 Jeremy M. Guthrie
 2-5-2005

-----------------------------------------------------------------------------
Table of Contents

0. Credit and Notes About the Author
1. Terms
1. The Example Problem
2. Possible Solutions
3. The Linux Answer
    3.1. Further Example Details
    3.2. Needed Software Updates & Other Requirements
    3.3. Notes About Limitations
    3.4. It's All About the 'IP'
    3.5. Other assumptions
    3.6. L2CC vs EBTables & L2-NAT
    3.6.1 L2CC Config
    3.6.2 EBTables & L2-NAT
    3.7. The Policy Router Config
4. Performance Tuning
5. Copyright
    5.1. GNU Free Documentation License
    5.2. PREAMBLE
    5.3. APPLICABILITY AND DEFINITIONS
    5.4. VERBATIM COPYING
    5.5. COPYING IN QUANTITY
    5.6. MODIFICATIONS
    5.7. COMBINING DOCUMENTS
    5.8. COLLECTIONS OF DOCUMENTS
    5.9. AGGREGATION WITH INDEPENDENT WORKS
    5.10. TRANSLATION
    5.11. TERMINATION
    5.12. FUTURE REVISIONS OF THIS LICENSE
    5.13. How to use this License for your documents
   
-----------------------------------------------------------------------------

0.  Credit and Notes About the Author

  First and foremost this document would not be possible if it were not for 
Jim Leu, Robert Olsson, Stephen Hemminger, David Miller, Jesse Brandeburg, 
Chris Gerg, and Jon Vanderhill.  All of these people were instrumental in 
providing code, feedback, resources, or other critical input to help get this 
document and system where it is today.
 
  I myself am a network engineer.  I have been building or managing networks 
since 1992.  I have been a Linux advocate since late 1995.  It is my primary 
OS for what I do.  I am not a kernel developer though I have written other 
software packages.  I make no claim that I am the final authority on systems 
I reference within.  I am providing descriptions of the bits and pieces based 
on existing documentation or discussions I have had with other persons.  IOW, 
if you see a blatant error or a feature omission, it is not on purpose and I 
would love the feedback to correct this documentation.
  
-----------------------------------------------------------------------------

1.  Terms

  EBTables:  A Layer-2 filtering technology in the Linux V2.6 kernel.

  Linear Search:  The process of one-by-one searching through the hash 
collision buckets.

  L2CC:  Layer Two Cross Connect:  A software patch created by Jim Leu that
         takes all input frames from an ethernet controller and NATs the 
         destination MAC address to be that of some host on an outbound 
         port.
         
  L2XC:  Another name for the L2CC software.

  Innovation:  "the introduction of something new" as defined by 
               Merriam-Webster Online(www.m-w.com).
    
  MAC-Munger:  Another name for the L2CC software.

  Policy-Router:  A router which routes because of a dictated policy rather
         than by normal IPv4 per-hop-behavior.  e.g.  route all port 80
         traffic out my DSL link and all SMTP traffic out my cable link,
         rather than sending everything via the default route on my cable.

  SPAN:  Switch port analyzer.  Think of it as a port which mirrors traffic
         off of a VLAN or switchport out another port.  e.g.  mirror all
         traffic going in/out of the Firewall's inside interface to the
         SPAN port.

-----------------------------------------------------------------------------

1.  The Example Problem

  You are a large business with a lot of Internet bandwidth.  Your IDS is
  capable of dealing with some subset of your traffic volume but it cannot
  handle it by itself.  It also would cost too much money to buy a bigger,
  faster, IDS.  You want to buy the same model as you already have.  The two
  IDSes together can handle the volume but now you have to figure out how to
  divide the traffic up between the two.

  Sounds simple doesn't it?  

  Example bandwidth:  800 megabits per second.  Assume you are using Cisco IDS
  4255s that list at $25,000 apiece.  A 4255 is supposed to be capable of
  600mbps, so one sensor cannot keep up but two (1200mbps combined) can.
  Remember, you want the IDS to see all of the data in a flow, not just one
  half!  So you cannot just break up the flow without some thought. 

-----------------------------------------------------------------------------
  
2.  Possible Solutions

   F5, Radware, and Top Layer Networks all make boxes that will split data
   apart.  All of these are commercial products that help load balance
   flows for IDSs or traffic management.  They are not cheap and in some cases
   are budget busters.
   
   The example network hardware I reference in here will be Cisco platforms
   as that is what I am most familiar with.

-----------------------------------------------------------------------------
   
3.  The Linux Answer

   The good news is that Linux has an answer with L2CC/EBTables and Policy 
Routing.  The two main components run in the kernel so they operate securely.  
I will detail scaling the solution upward as it should be able to meet the 
needs of most any bandwidth.

   The Linux solution identified herein has been dubbed by Chris Gerg as the 
(I(DS)^2).  However, I will refer to the whole system as the IDS load 
balancer hereafter.

   The concept is simple:
                                (or EBTables)
   +---------------+                +---+
   | Catalyst 6509 |-[1000SX-SPAN]->| l |           +---------------+
   +---------------+                | 2 |-[1000SX]->| Policy Router |
   +---------------+                | c |           +-+-----+-------+
   | Catalyst 6509 |-[1000SX-SPAN]->| c |                   |
   +---------------+                +---+                   |[1000SX]
                                                            v
                                                  +-------------------+
                                                  | Catalyst 3750 Gig |
                                                  +---+------------+--+
                                                      |            |
                                                      |<--1000TX-->|
                                                  +---+---+        |
                                                  | IDS A |        |
                                                  +-------+        |
                                                               +---+---+
                                                               | IDS B |
                                                               +-------+

   There is a highly available network with two enterprise class switches 
front-ending the network.  Each switch hands off a SPAN port to the Layer Two 
Cross Connect(l2cc/l2xc) host AKA EBTables Host.  The SPAN data is an 
identical copy of the data from the ports they mirror.  
   
   If the destination MAC address of an example packet is 05:05:05:03:03:03, 
then we need to NAT it to the MAC address of the policy router's outside 
interface.  Why?  Normal routers will not route data unless the layer two 
destination address matches their own MAC address.  The l2cc host 
changes/L2-NATs the destination MAC address to be that of the Policy Router's 
'outside' interface.  When the data arrives at the policy router, it looks in 
its ip routing ruleset to determine what 'table' to use to forward the 
incoming packets.  Once a table is found and selected, that table is then 
used to define the per-hop-behavior at that point in time for that packet.

   3.1. Further Example Details
   
   Time to add yet another layer of detail to the example.  We will assume we 
want to run Snort or NTOP against all traffic coming through our network.  In 
this example we will assume that we have 1.0.0.0/16 assigned to us.  We also 
know that we have a pretty good distribution of traffic such that 1.0.0.0/17 
gets about 350mbps and 1.0.128.0/17 gets about 450 mbps.
   
   What we also know is that, because of the network design, traffic to/from 
both /17 subnets comes in over both SPANs but is NOT duplicate traffic.  This 
can be because of things like per-packet or per-flow routing decisions made by 
downstream equipment.
   
   3.2. Needed Software Updates & Other Requirements
   
   You will want to make sure you have the latest tools for the job.  I have 
only ever worked this solution using Intel Gig NICs.  Others should work but 
I have never tested them.
   
   Software to get:
   1. Latest Intel E1000 drivers
   http://sourceforge.net/projects/e1000/
   
   2.  Latest IPRoute2 utilities:
   http://www.policyrouting.org/

   3.  Latest L2CC software:
   http://mpls-linux.sf.net/
   
   4.  A V2.4 & V2.6 Linux Kernel:
   http://www.kernel.org/
   L2CC requires V2.4 and you should use V2.6 for your policy router.
   
   5.  Schedutils
   http://tech9.net/rml/schedutils/

   6.  EBTables
   http://ebtables.sourceforge.net/
   
   As for other requirements....

   I HIGHLY recommend a multi-CPU box.  Hyper-Threading has shown advantages 
and works well in our implementation.  I typically assign one CPU per NIC on 
the EBTables box and/or the Policy Router.  Your mileage will vary but use 
common sense.  If you have a dual 3.2 GHz P4 w/ Hyper-Threading then that will 
have enough horsepower to handle large data volumes.  In some cases you 
could easily assign more than one NIC to a CPU.

   3.3. Notes About Limitations
   
   Every piece of hardware or software has limits.  Keep this in mind when 
deploying your system.  I will show you where counters are instrumented and 
generically how to manage them.  Like any good system administrator, you will 
have to manage your system.
   
   3.4. It's All About the 'IP'
   
   ifconfig, netstat, and other old-style utilities are being phased out 
slowly.  Familiarize yourself with the 'ip' utility from the iproute2 package 
as it replaces the prior listed programs.  The iproute2 package includes 
other programs that will help in providing access to critical performance 
information.  'lnstat' and 'rtstat' can be used to poll routing performance 
information from your kernel DEPENDING on which kernel release you are 
running.    
   
   3.5. Other assumptions
   
   Going forward it will be assumed that you have the appropriate kernels 
and/or features installed unless we talk explicitly about a feature.
   
   3.6 L2CC vs EBTables & L2-NAT

   L2CC was the only option available when this document was first written but 
now EBTables is available.  EBTables was beaten up quite a bit at the May 
2005 Networld+Interop in Vegas.  EBTables proved to be a very reliable bit of 
software.  L2CC has more burn in time for my organization but we are in the 
process of migrating.  This document will provide examples for both L2CC & 
EBTables.

   3.6.1 L2CC Config
   
   The example L2CC host has three Gig NICs.  eth0-1 are gathering SPAN data 
while eth2 is the output port.  The Policy router's eth0 MAC address is 
01:01:01:10:10:10.  
   
   The L2CC host will need the L2CC patch applied against the V2.4 kernel.  
From there do the following:
   make menuconfig
   Select "Networking Options"
   Compile in "Layer 2 Cross Connect (EXPERIMENTAL)", do not build as a 
module.
   Rebuild your kernel and reboot.
   
   #add two entries, one for each NIC
   l2cc -a -i eth0 -o eth2 -m 01:01:01:10:10:10
   l2cc -a -i eth1 -o eth2 -m 01:01:01:10:10:10
   
   #delete two entries, one for each NIC
   l2cc -d -i eth0 -o eth2 -m 01:01:01:10:10:10
   l2cc -d -i eth1 -o eth2 -m 01:01:01:10:10:10
   
   *WARNING* Test that your configuration is working using TCPDump in an 
ISOLATED network.  

   In the above example, I should be able to run three instances of TCPDump 
and see that data coming in eth0 & eth1 is having its MAC address NAT'd when 
being transmitted out eth2.
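
   A rough sketch of how that check could be scripted rather than eyeballed:  
'tcpdump -e' prints the link-level header, so the destination MAC of each 
frame leaving eth2 can be compared against the policy router's address.  The 
helper name count_wrong_dst is made up for this example, and it assumes the 
usual "src-mac > dst-mac," layout at the start of each 'tcpdump -e -n' line.

```shell
# count_wrong_dst (hypothetical helper): read `tcpdump -e -n` output on stdin
# and print how many frames do NOT carry the expected destination MAC.
# Assumed line layout:
#   <time> <src-mac> > <dst-mac>, ethertype IPv4 (0x0800), length 98: ...
count_wrong_dst() {
    awk -v mac="$1" '
        {
            for (i = 1; i < NF; i++) {
                if ($i == ">") {              # first ">" separates the MACs
                    dst = $(i + 1)
                    sub(/,$/, "", dst)        # drop the trailing comma
                    if (tolower(dst) != tolower(mac)) bad++
                    break
                }
            }
        }
        END { print bad + 0 }
    '
}

# Example use on the L2CC host (needs root, so left commented out):
#   tcpdump -e -n -c 1000 -i eth2 | count_wrong_dst 01:01:01:10:10:10
```

   A result of 0 on the output port means every captured frame was rewritten.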

   3.6.2 EBTables & L2-NAT

   The example EBTables host has three Gig NICs.  eth0-1 are gathering SPAN 
data while eth2 is the output port.  The Policy router's eth0 MAC address is 
00:11:25:8c:8c:37.  

   EBTables will require a Linux host running with a V2.6 kernel.

   To build your Linux kernel with EBTables support:
   make menuconfig
   Select "Device Drivers"
   Select "Networking Support"   
   Select "Networking Options"
   Compile in "802.1d Ethernet Bridging"
   Select "Network packet filtering"
   Select "Bridge: Netfilter Configuration"
   Select "Ethernet Bridge tables (ebtables) support"
   Select "ebt: nat table support"
   Select "ebt: dnat target support"
   Select "ebt: snat target support"
   Rebuild your kernel and reboot.

   #prep your interfaces
   ifconfig eth0 up
   ifconfig eth1 up
   ifconfig eth2 up

   #create the br0 interface
   brctl addbr br0

   #turn off spanning-tree
   brctl stp br0 off

   #add interfaces to the br0 broadcast domain
   brctl addif br0 eth0
   brctl addif br0 eth1
   brctl addif br0 eth2

   #Prep ebtables
   ebtables -F INPUT
   ebtables -F OUTPUT
   ebtables -F FORWARD
   ebtables -t nat -F PREROUTING
   
   #NAT all incoming data on eth0 to 00:11:25:8c:8c:37
   #(-i selects the input interface; -I would mean "insert rule")
   ebtables -t nat -A PREROUTING -i eth0 -j dnat --to-destination 00:11:25:8c:8c:37
   #NAT all incoming data on eth1 to 00:11:25:8c:8c:37
   ebtables -t nat -A PREROUTING -i eth1 -j dnat --to-destination 00:11:25:8c:8c:37

   #Tell EBTables to route/bridge data destined to 00:11:25:8c:8c:37 out eth2
   ebtables -A OUTPUT -o eth2 -d 00:11:25:8c:8c:37 -j ACCEPT  

   *WARNING* Test that your configuration is working using TCPDump in an 
ISOLATED network.  

   In the above example, I should be able to run three instances of TCPDump 
and see that data coming in eth0 & eth1 is having its MAC address NAT'd when 
being transmitted out eth2.
   
   3.7.  The Policy Router Config
   
   I am going to assume you have the appropriate iproute2 package for your 
kernel.  You may need a newer version of the iproute2 utilities.  
   ie. V2.6.9 kernel   
   strace -f rtstat ...
   --snip--
   open("/proc/net/rt_cache_stat", O_RDONLY) = -1 ENOENT (No such file or 
directory)
   --snip--
   In this case I would need the latest iproute2 code, which reads 
/proc/net/stat/rt_cache instead.  In fact, rtstat may also be called 
'lnstat'.
   
   *ASSUMPTION* This policy router config assumes there are no overlapping 
subnets.  Overlapping subnets mean you must order your policy rules 
appropriately for them to work.  Overlapping IP examples: 192.168.0.0/16 & 
192.168.0.0/24
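
   As a rough sketch, a pair of blocks can be checked for overlap before they 
go into the rule set.  Two CIDR blocks overlap exactly when they agree on the 
first N bits, where N is the shorter of the two prefix lengths.  The helper 
names ip_to_int and cidr_overlap are made up for this example.

```shell
# ip_to_int (hypothetical helper): dotted quad -> 32-bit integer.
ip_to_int() {
    old_ifs=$IFS
    IFS=.
    set -- $1                 # split the address into its four octets
    IFS=$old_ifs
    echo $(( ($1 << 24) | ($2 << 16) | ($3 << 8) | $4 ))
}

# cidr_overlap (hypothetical helper): exit status 0 if the two blocks overlap.
cidr_overlap() {
    len1=${1#*/}
    len2=${2#*/}
    min=$len1
    if [ "$len2" -lt "$min" ]; then min=$len2; fi
    # netmask for the shorter prefix, kept to 32 bits
    mask=$(( (4294967295 << (32 - min)) & 4294967295 ))
    a=$(ip_to_int "${1%/*}")
    b=$(ip_to_int "${2%/*}")
    [ $(( a & mask )) -eq $(( b & mask )) ]
}

if cidr_overlap 192.168.0.0/16 192.168.0.0/24; then
    echo "overlap: rule order matters"
fi
```

   The two /17s used in this document do not overlap, so for them the check 
fails and rule order is not a concern.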
   
   The policy router is made up of several components.  The policy router uses 
rules and tables.  Rules are used to classify which traffic belongs to which 
table.  This is where the policy in policy routing comes from.  You define 
what policies you want to implement.
   
   Tables are routing tables used to define the per-hop-behavior for the 
packet being routed via that table.  Shortly you will see how rules and 
tables are combined to split traffic.
   
   In our example, we want to split traffic of two /17s to the two sensors.  
With that in mind we will add rules to do the actual policy mapping.
   
   Eth0 is our input device while eth1 is our output device.  Eth0 will have 
an IP address of 10.0.0.1/32.  An IP address is required otherwise the Linux 
kernel will not policy-route for the interface.  Eth1 will have an IP address 
of 10.0.1.1/24.  Sensor 1 will have an IP address of 10.0.1.10, Sensor 2 will 
have an IP address of 10.0.1.11.
   
   Routing at high speed means any hiccup, EVEN a SMALL one, results in lost 
packets as receive rings for network cards can be overrun quickly.  Thus we 
have to minimize any hiccups.  One hiccup is ARP.  Sensors NEVER transmit, so 
they will never answer an ARP request, yet we will always have data to send 
to them.  You will see several changes we will make to account for this.
   
   #First, turn off IP forwarding before we configure routing
   echo 0 > /proc/sys/net/ipv4/ip_forward
   
   #Add static ARP entries as we should ALWAYS know what MAC
   #address to associate with our sensor IPs
   arp -s 10.0.1.10 00:02:50:98:DC:1C
   arp -s 10.0.1.11 00:02:50:A1:5D:5A
   
   #send any traffic to/from 1.0.0.0/17 to table 15
   ip rule add type unicast dev eth0 from 1.0.0.0/17 table 15
   ip rule add type unicast dev eth0 to 1.0.0.0/17 table 15
   
   #send any traffic to/from 1.0.128.0/17 to table 16
   ip rule add type unicast dev eth0 from 1.0.128.0/17 table 16
   ip rule add type unicast dev eth0 to 1.0.128.0/17 table 16
   
   #Tell policy routing code that the only path in table 15 is via Sensor 1!
   ip route add default via 10.0.1.10 dev eth1 table 15
   
   #Tell policy routing code that the only path in table 16 is via Sensor 2!
   ip route add default via 10.0.1.11 dev eth1 table 16
   
   #When done with our changes, flush the cache
   ip route flush cache
   
   #Lastly, turn on IP forwarding
   echo 1 > /proc/sys/net/ipv4/ip_forward
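
   To sanity check the split before sending real traffic, here is a small 
sketch that answers "which table should an address in our example 1.0.0.0/16 
land in?"  The helper name table_for_addr is made up for this example.  It 
only mirrors the two /17 rules above and knows nothing about the kernel's 
actual rule set.

```shell
# table_for_addr (hypothetical helper): print the policy table the example
# rules select for an address, or "none" if it is outside 1.0.0.0/16.
table_for_addr() {
    addr=$1
    o1=${addr%%.*}; rest=${addr#*.}
    o2=${rest%%.*}; rest=${rest#*.}
    o3=${rest%%.*}
    if [ "$o1" -ne 1 ] || [ "$o2" -ne 0 ]; then
        echo none
    elif [ "$o3" -lt 128 ]; then
        echo 15               # 1.0.0.0/17   -> Sensor 1
    else
        echo 16               # 1.0.128.0/17 -> Sensor 2
    fi
}

table_for_addr 1.0.5.9
table_for_addr 1.0.200.1

# On the policy router itself the installed state can be listed with:
#   ip rule show
#   ip route show table 15
#   ip route show table 16
```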
   
   This is all you actually 'have to do' to enable a working system.  There 
are however other changes you SHOULD make to help with performance.
   
   For one, if you left IP forwarding on and blew away the policy routing 
table, all traffic would then follow your normal default route on the 
box!!!!!!  
   
   *WARNING*
   Let's complicate our policy router by adding another interface, 
eth3->192.168.10.5 with a default gw to a firewall so we can remotely manage 
the system.  In our example, we had 800mbps heading towards our policy 
router.  If we turn off policy routing but left IP forwarding on, the Linux 
host will try to forward >>>>800mbps<<<< of traffic towards the firewall on 
the 192.168.10.0/24 network thereby killing it.  Okay?! Follow?  If not, 
re-read till you do.  
   
   Hint:  If your firewall has a 100mbps interface and you fire 800mbps at it, 
the firewall will stop working because you will overload its interface with 
bandwidth.
   
   There are then two ways to protect yourself:
   A)  Always turn off IP forwarding before making ANY changes to the policy 
router. 
   B)  Turn on iptables filtering.  Here is a quick and dirty example of an 
iptables filter to apply to this example host:
   
   #The only data that will be allowed in or out of eth3 will be traffic
   #to or from 192.168.10.5.  So even if you accidentally leave IP forwarding
   #on, you can trust that iptables is stemming the flow and keeping it from
   #burying your gateway/firewall.
   iptables -F
   iptables -A FORWARD -s 192.168.10.5 -o eth3 -j ACCEPT
   iptables -A FORWARD -o eth3 -j DROP
   iptables -A OUTPUT -s 192.168.10.5 -o eth3 -j ACCEPT
   iptables -A OUTPUT -o eth3 -j DROP
   
   The switches will need some adjustments to make sure that the switch knows 
exactly where the sensors are.  If the sensors never transmit data on their 
ports, the switch never learns their MAC addresses and floods their traffic 
out every port, acting like a hub, which is exactly what we don't want.
   
   #3750 config:
   interface GigabitEthernet1/0/1
   description Eth0 of Sensor 1
   switchport access vlan 100
   switchport trunk native vlan 100
   switchport trunk allowed vlan none
   switchport mode access
   switchport nonegotiate
   load-interval 30
   no cdp enable
   !
   interface GigabitEthernet1/0/2
   description Eth0 of Sensor 2
   switchport access vlan 100
   switchport trunk native vlan 100
   switchport trunk allowed vlan none
   switchport mode access
   switchport nonegotiate
   load-interval 30
   no cdp enable
   !
   interface GigabitEthernet1/0/28
   description Eth1 of Policy Router
   switchport access vlan 100
   switchport trunk native vlan 100
   switchport trunk allowed vlan none
   switchport mode access
   switchport nonegotiate
   load-interval 30
   no cdp enable
   !
   mac-address-table static 0002.5098.DC1C vlan 100 interface GigabitEthernet1/0/1
   mac-address-table static 0002.50A1.5D5A vlan 100 interface GigabitEthernet1/0/2
   
   That's it?  Right?  Well, sure.  If your system is blazing fast and does 
not require any tuning.  This assumes your system defaults are adequate.  
That may not be the case.  The rest of this document will be dedicated to 
discussing eliminating the bottlenecks.
   
-----------------------------------------------------------------------------

4. Performance Tuning

   Jumping back to a prior statement, you need to be aware of the limits of 
the policy routing system.  I will list them out here and we will discuss how 
to go about addressing them.  Some are easy to update, others are what they 
are.
   
4.1. General Limits
   
   Kernel Limits:
   Max # of Policy Routing Rules:  32768 - some are reserved
   Max # of Policy Routing Tables:  256 - some are reserved
   Max # of Interfaces in V2.6 Kernel:  4096
   Max # of Interfaces in V2.4 Kernel:  256
   
   IP Route Hash Limits:
   
   Card Limits:
   Intel EtherExpress RX/TX Buffer Count:  256 packets
    Max RX/TX Buffer Count:  4096 packets

4.2.  Instrumentation

   Knowing that any system is running well can take a bit to figure out.  
There will be a few places that we concentrate on monitoring.  Memory, CPU 
utilization, interrupt distribution, network stack drops, network card drops, 
# of routes, garbage collection, routing packets per second, # of hash 
entries, and others.
 
4.2.1.  rtstat or lnstat in the land of iproute2

   Rtstat and lnstat are two tools used to watch routing activity within the 
Linux kernel.  Kernels prior to approximately V2.6.9 will use rtstat.  After 
2.6.9 iproute2 uses lnstat to gather routing detail.  You can tell if your 
kernel works with rtstat or lnstat by trying the existing install of rtstat.  
Both tools pull data from /proc, it is a question of which one.
   
   ie.
   [plato jguthrie 10:29am]~-> rtstat
   fopen: No such file or directory
   
4.2.2  Important /proc files for your reference.

   #Data on each route cache entry/hash
   /proc/net/rt_cache
   
   #Aggregate statistics on route cache entries/hashes
   /proc/net/stat/rt_cache
   
   #General information about the network stack on a per-cpu basis
   /proc/net/softnet_stat

   We will examine each of the files to see what they can tell us.

4.2.2.1.  /proc/net/softnet_stat

   cat /proc/net/softnet_stat
   00000130 00000034 00000000 00000000 00000000 00000000 00000000 00000000 
00000000
   00000150 00000000 00000000 00000000 00000000 00000000 00000000 00000000 
00000000
   
   The basic idea behind softnet_stat is that we use it as a way to tell us if 
the Kernel itself is dropping packets.  The second field in the list of nine 
is the packet drop count.  You can see in this example that we have dropped 
0x34 packets.  Look into using NAPI or other network card features to 
possibly relieve CPU overhead.
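
   Since the counters are hexadecimal, a small sketch like the following can 
turn them into decimal per-CPU numbers.  The helper name softnet_drops is 
made up for this example, and it assumes one row per CPU, in CPU order, with 
the first column being packets processed and the second being drops.

```shell
# softnet_drops (hypothetical helper): print processed/dropped counts per row
# of a softnet_stat-style file, converting the hex fields to decimal.
softnet_drops() {
    file=${1:-/proc/net/softnet_stat}
    cpu=0
    while read -r processed dropped _rest; do
        echo "cpu$cpu processed=$((0x$processed)) dropped=$((0x$dropped))"
        cpu=$((cpu + 1))
    done < "$file"
}

# Example use on the policy router:
#   softnet_drops
```

   Run against the sample above, the first row reports dropped=52, which is 
0x34 in decimal.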

4.2.3  CPUs, Memory, and Interrupts & the Rules To Follow

   As with any rules here, all can be affected by your budget.

   1.  Assume that each CPU on your box will be handling only one NIC's 
interrupts.
   2.  Assume that you will be using 'taskset' to keep non-kernel routing 
functions assigned to a specific CPU.
   3.  EBTables uses little memory, you can skimp here
   4.  Policy Routing Chews Memory, you cannot skimp here - minimum 1 Gigabyte 
of RAM ***HIGHLY*** recommended

4.2.3.1  CPUs should only follow one NIC.

   If you look at the output below you can see that CPU0 is taking the 
interrupts for eth3.  CPU1 is taking interrupts for eth2 & eth0.  Optimising 
any system relies on keeping thrashing to a minimum.  As a result I highly 
recommend disabling IRQ balancing.

   make menuconfig for your kernel config
   Select "Processor type and features"
   Disable "Enable kernel irq balancing"
   Rebuild your kernel and reboot.

   You will have to poke around /proc to set which CPU an interrupt binds to.  
Here is what was used to set the interrupt/CPU bindings down below:
   echo 01 > /proc/irq/18/smp_affinity
   echo 02 > /proc/irq/20/smp_affinity

   The value is a hexadecimal bitmask with one bit per CPU, so each CPU is a 
power of two.  ie. CPU0 is 01, CPU1 is 02, CPU2 is 04, and CPU3 is 08.
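
   A minimal sketch of computing that value, assuming the zero-based CPU 
numbering that /proc/interrupts uses (CPU0, CPU1, ...):  the mask for CPU n 
is 1 shifted left by n, written in hex.  The helper name cpu_mask is made up 
for this example.

```shell
# cpu_mask (hypothetical helper): print the affinity bitmask for a
# zero-based CPU number as two hex digits.
cpu_mask() {
    printf '%02x\n' $((1 << $1))
}

cpu_mask 0
cpu_mask 1

# Example use (needs root, so left commented out):
#   cpu_mask 0 > /proc/irq/18/smp_affinity    # eth3 -> CPU0
#   cpu_mask 1 > /proc/irq/20/smp_affinity    # eth2 -> CPU1
```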

   cat /proc/interrupts
           CPU0       CPU1
  0: 3184569581 1789102599    IO-APIC-edge  timer
  1:       1005        218    IO-APIC-edge  i8042
  7:          0          0   IO-APIC-level  ohci_hcd
  8:          1          1    IO-APIC-edge  rtc
 12:        122         74    IO-APIC-edge  i8042
 14:          2          0    IO-APIC-edge  ide0
 18:  995373697       5139   IO-APIC-level  eth3
 20:          2 1378253801   IO-APIC-level  eth2
 27:    7542100    9352305   IO-APIC-level  eth0
 28:    4150402   13187680   IO-APIC-level  aic7xxx
 30:          0          0   IO-APIC-level  acpi
NMI:          0          0
LOC:  679927478  679903506
ERR:          0
MIS:          0

4.2.3.2  Taskset is your friend.

   Taskset allows an administrator to bind a software process to a specific 
processor on a box.  By using taskset you help cut down on CPU cache 
thrashing.  If your hosts will be running SNMP daemons, snort, etc, then you 
will want to bind snort to the least used CPU.  You want to keep your base 
load balancing system predictable.

4.3  Tune your routing!

   The Linux kernel defaults for routing work great in a lot of situations, 
but unfortunately this is not one of them.  You will find that the Linux 
kernel will take some tuning to get the performance you want.

   The kernel counts on route hash entries to track existing conversations.  
One route hash entry is used per host-host communication.  ie.
   Entry 1:  192.168.1.1 -> 192.168.2.2
   Entry 2:  192.168.2.4 -> 192.168.7.5

   Imagine HOW MANY of these you might have given your volumes of traffic.  
Also imagine that the kernel has NO WAY to tell when a conversation is over.  
The kernel is not following TCP/UDP conversations thus entries age out of 
existence.  You will need to conduct performance testing to confirm how well 
your environment runs.  Just be warned that the more route hash entries you 
run, the more RAM the kernel WILL use.

   Example 'free' from your Policy router:
   /proc/sys/net/ipv4/route/gc_thresh:  786432
             total       used       free     shared    buffers     cached
Mem:       1034088    1007112      26976          0     310840     217220
-/+ buffers/cache:     479052     555036
Swap:      1028120          0    1028120

4.3.1  Adjust one kernel parameter and reboot.

   You will need to bump up the maximum number of supported route hash entries 
the kernel supports.  I recommend setting this rather high and using another 
parameter to set your ceiling.  Add the following to your boot config kernel 
parameters:
 rhash_entries=2400000

4.3.2  Adjust /proc/sys/net/... to tune your routing

   The Linux kernel uses six parameters to adjust how it handles managing the 
routing hashes, collisions, and aging.  

   gc_elasticity can best be described as the average bucket depth the kernel 
will accept before it starts expiring route hash entries.  This will help 
maintain the upper limit of active routes.
   echo 8 > /proc/sys/net/ipv4/route/gc_elasticity

   I had limited success playing with these next two entries seeing as I could 
find little information on the effect of either one.
   echo 60 > /proc/sys/net/ipv4/route/gc_interval
   echo 0 > /proc/sys/net/ipv4/route/gc_min_interval

   gc_thresh is another limiting factor in controlling how much RAM your 
policy routing will eat up.  This number cannot be greater than the 
rhash_entries kernel parameter.  As a rule of thumb, set your rhash_entries 
parameter REALLY high(mine is 2.4million) and control your running limit with 
gc_thresh.
   echo 1048576 > /proc/sys/net/ipv4/route/gc_thresh

   gc_timeout is another parameter that needs better kernel docs.
   echo 300 > /proc/sys/net/ipv4/route/gc_timeout

   The secret_interval instructs the kernel how often to blow away ALL route 
hash entries regardless of how new/old they are.  In our environment this is 
generally bad.  The CPU will be busy rebuilding thousands of entries per 
second every time the cache is cleared.  However we set this to run once a 
day to keep memory leaks at bay(though we've never had one).
   echo 86400 > /proc/sys/net/ipv4/route/secret_interval
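
   Values written with echo are lost at reboot.  One way to keep them, 
sketched here, is to express each /proc/sys path as a sysctl key (drop the 
/proc/sys/ prefix and turn the slashes into dots) and store it in 
/etc/sysctl.conf.  The helper names proc_to_sysctl and sysctl_line are made 
up for this example, and it is an assumption that your distribution loads 
/etc/sysctl.conf at boot.

```shell
# proc_to_sysctl (hypothetical helper): /proc/sys path -> sysctl key name.
proc_to_sysctl() {
    key=${1#/proc/sys/}
    printf '%s\n' "$key" | tr '/' '.'
}

# sysctl_line (hypothetical helper): print a "key = value" line suitable
# for a sysctl.conf-style file.
sysctl_line() {
    printf '%s = %s\n' "$(proc_to_sysctl "$1")" "$2"
}

sysctl_line /proc/sys/net/ipv4/route/gc_elasticity 8
sysctl_line /proc/sys/net/ipv4/route/gc_thresh 1048576
sysctl_line /proc/sys/net/ipv4/route/secret_interval 86400

# To persist, append the printed lines (as root):
#   sysctl_line /proc/sys/net/ipv4/route/gc_thresh 1048576 >> /etc/sysctl.conf
```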

4.3.3  Basic scripts

   These aren't perfect but then again, what is....

4.3.3.1  watcherrors

#!/bin/tcsh
set interval=15
set argc=`echo $argv | wc -w | tr -s " " "\t" | cut -f2`
if ( $argc > 0 ) then
        set interval=$argv[1]
endif

set stats=`ifconfig eth3 | egrep 'RX packets:' | tr -s ": " "\t" | cut -f4,6`
set interrupts=`cat /proc/interrupts | egrep "eth[23]" | tr -s " " "\t" | cut -f 3,4`
while ( 1 )
        sleep $interval
        set newstats=`ifconfig eth3 | egrep 'RX packets:' | tr -s ": " "\t" | cut -f4,6`
        set packets=`expr $newstats[1] - $stats[1]`
        set errors=`expr $newstats[2] - $stats[2]`
        # drop percentage to two decimals; 0.00 if no packets arrived
        set percentage=`echo $errors $packets | awk '{ if ($2 > 0) printf "%.2f", $1 * 100 / $2; else printf "0.00" }'`
        set packetspersec=`expr $packets / $interval`
        set date=`date "+%m/%d/%y %H:%M:%S"`
        set entries=`cat /proc/net/stat/rt_cache | tr -s " " "\t" | cut -f 1 | head -n 2 | tail -n 1`
        set newinterrupts=`cat /proc/interrupts | egrep "eth[23]" | tr -s " " "\t" | cut -f 3,4`
        set eth3int=`expr $newinterrupts[1] - $interrupts[1]`
        set eth2int=`expr $newinterrupts[4] - $interrupts[4]`
        echo "$date entries:  $entries  Pkts:  $packets  Err:  $errors  PPS:  $packetspersec  Drop %:  $percentage%  Eth3RXInt:  $eth3int  Eth2TXInt:  $eth2int"
        set stats=( $newstats )
        set interrupts=( $newinterrupts )
end

4.3.3.2  Policy Routing Control Script

   This script assumes that you will have two files, a policy route file, and 
policy rule file.

   The ROUTE file should have the following format:
   [ip of next hop] [outgoing interface] [table #]

   Example 'routefile' contents:
   10.0.1.10     eth2    31
   10.0.1.11     eth2    32

   The RULE file should have the following format:
   [CIDR BLOCK] [table #]

   Example 'rulefile' contents:
   172.30.0.0/24 31
   172.16.0.0/22 32

   Data to/from 172.30.0.0/24 would be sent to sensor 10.0.1.10.  Data to/from 
172.16.0.0/22 would be sent to sensor 10.0.1.11.
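
   Before pointing the control script at a new rule file, it can help to see 
what it would do.  Here is a rough dry-run sketch in plain sh that only 
prints the 'ip rule' commands for a rule file in the format above.  The 
helper name print_rule_cmds is made up for this example, and the input 
device is passed in rather than hard coded.

```shell
# print_rule_cmds (hypothetical helper): read "[CIDR BLOCK] [table #]" lines
# and print the matching from/to rule commands without running them.
print_rule_cmds() {
    dev=$1
    file=$2
    while read -r cidr table; do
        case $cidr in
            */*) ;;               # looks like a CIDR block, keep it
            *)   continue ;;      # skip blank or malformed lines
        esac
        echo "ip rule add type unicast dev $dev from $cidr table $table"
        echo "ip rule add type unicast dev $dev to $cidr table $table"
    done < "$file"
}

# Example use:
#   print_rule_cmds eth3 /opt/bin/policyrules
```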


#!/bin/tcsh
#POLICY V1.0 script
set policyrulefile=/opt/bin/policyrules
set policyroutefile=/opt/bin/policyroutes

set argc=`echo $argv | wc -w | tr -s " " "\t" | cut -f2`
if ( $argc < 3 ) then
        echo Policy V1.00
        echo "Usage:  policy [add|delete|show] [routefile] [rulefile]"
        exit
endif
set command=`echo $1 | tr "[a-z]" "[A-Z]"`
if ( ( $command == "ADD" ) || ( $command == "DELETE" ) ) then
        if ( ! -e $argv[2] ) then
                echo Route file $argv[2] does not exist
                exit
        endif
        if ( ! -e $argv[3] ) then
                echo Rule file $argv[3] does not exist
                exit
        endif
endif

set policyroutefile=$argv[2]
set policyrulefile=$argv[3]

set policyrulecount=`egrep "[0-9]/[0-9]" $policyrulefile | wc -l | tr -s " " "\t" | cut -f2`
set policyrules=`egrep "[0-9]/[0-9]" $policyrulefile | tr -s " " "\t"`

# word count, not line count: three words (gateway, device, table) per route
set policyroutecount=`egrep "[0-9][[:space:]]+eth" $policyroutefile | wc -w | tr -s " " "\t" | cut -f2`
set policyroutes=`egrep "[0-9][[:space:]]+eth" $policyroutefile | tr -s " " "\t"`

if ( $command == "ADD" ) then
        echo -n "Turning up..."
        set alternate=0
        foreach policyrule ($policyrules )
                if ( $alternate ) then
                        set alternate=0
                        set table=$policyrule
                        #       echo "Adding rule:  $range $table"
                        echo -n "."
                        /sbin/ip rule add type unicast dev eth3 from $range table $table
                        /sbin/ip rule add type unicast dev eth3 to $range table $table
                else
                        set alternate=1
                        set range=$policyrule
                endif
        end
        set loop=0
        while ( $loop != $policyroutecount )
                set loop=`expr $loop + 1`
                set gw=$policyroutes[$loop]
                set loop=`expr $loop + 1`
                set device=$policyroutes[$loop]
                set loop=`expr $loop + 1`
                set table=$policyroutes[$loop]
                #echo "Adding default route:  $gw $device $table"
                echo -n "+"
                /sbin/ip route add default via $gw dev $device table $table
        end
        /sbin/ip route flush cache
        echo 1 > /proc/sys/net/ipv4/ip_forward
        echo ""
endif

if ( $command == "DELETE" ) then
        echo -n "Turning down..."
        echo 0 > /proc/sys/net/ipv4/ip_forward
        set alternate=0
        foreach policyrule ($policyrules )
                if ( $alternate ) then
                        set alternate=0
                        set table=$policyrule
                        #echo "Deleting rule:  $range $table"
                        echo -n "."
                        /sbin/ip rule delete type unicast dev eth3 from $range table $table
                        /sbin/ip rule delete type unicast dev eth3 to $range table $table
                else
                        set alternate=1
                        set range=$policyrule
                endif
        end
        set loop=0
        while ( $loop != $policyroutecount )
                set loop=`expr $loop + 1`
                set gw=$policyroutes[$loop]
                set loop=`expr $loop + 1`
                set device=$policyroutes[$loop]
                set loop=`expr $loop + 1`
                set table=$policyroutes[$loop]
                #echo "Deleting default route:  $gw $device $table"
                echo -n "+"
                /sbin/ip route delete default via $gw dev $device table $table
        end
        /sbin/ip route flush cache
        echo ""
endif

if ( $command == "SHOW" ) then
        set tables=`cat $policyroutefile | egrep "[0-9]" | tr -s " " "\t" | cut -f3 | sort -u`
        ip rule list
        foreach table ($tables)
                ip route list table $table
        end
endif
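
   The script does not check that every table named in the rule file also 
has a default route in the route file.  A rule pointing at a table with no 
route in it will not send traffic to a sensor.  The sketch below is one way 
to catch that before running the script.  It is in POSIX sh rather than 
tcsh, and the function name check_policy_tables is made up for this example.

```shell
#!/bin/sh
# Report rule-file tables that have no matching entry in the route file.
# $1 = route file, $2 = rule file
# Prints one line per missing table; returns non-zero if any are missing.
check_policy_tables() {
    routefile=$1
    rulefile=$2
    missing=0

    # Third column of the route file is the table number.
    route_tables=$(awk 'NF >= 3 { print $3 }' "$routefile" | sort -u)

    # Second column of the rule file is the table number.
    for table in $(awk 'NF >= 2 { print $2 }' "$rulefile" | sort -u); do
        if ! echo "$route_tables" | grep -qx "$table"; then
            echo "table $table: used in $rulefile but has no route in $routefile"
            missing=1
        fi
    done
    return $missing
}
```

   A possible use is to run check_policy_tables routefile rulefile first and 
only call the policy script with add when it returns zero.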



-----------------------------------------------------------------------------

5. Copyright

Copyright © 2005 Jeremy M. Guthrie

Permission is granted to copy, distribute and/or modify this document under
the terms of the GNU Free Documentation License, Version 1.1 or any later
version published by the Free Software Foundation; with no Invariant
Sections, no Front-Cover Texts and no Back-Cover Texts. A copy of the license
is included in the section entitled "GNU Free Documentation License".

-----------------------------------------------------------------------------

5.1. GNU Free Documentation License

Version 1.1, March 2000

   
    Copyright (C) 2000 Free Software Foundation, Inc. 59 Temple Place, Suite
    330, Boston, MA 02111-1307 USA Everyone is permitted to copy and
    distribute verbatim copies of this license document, but changing it is
    not allowed.
   
-----------------------------------------------------------------------------
5.2. PREAMBLE

The purpose of this License is to make a manual, textbook, or other written
document "free" in the sense of freedom: to assure everyone the effective
freedom to copy and redistribute it, with or without modifying it, either
commercially or noncommercially. Secondarily, this License preserves for the
author and publisher a way to get credit for their work, while not being
considered responsible for modifications made by others.

This License is a kind of "copyleft", which means that derivative works of
the document must themselves be free in the same sense. It complements the
GNU General Public License, which is a copyleft license designed for free
software.

We have designed this License in order to use it for manuals for free
software, because free software needs free documentation: a free program
should come with manuals providing the same freedoms that the software does.
But this License is not limited to software manuals; it can be used for any
textual work, regardless of subject matter or whether it is published as a
printed book. We recommend this License principally for works whose purpose
is instruction or reference.
-----------------------------------------------------------------------------

5.3. APPLICABILITY AND DEFINITIONS

This License applies to any manual or other work that contains a notice
placed by the copyright holder saying it can be distributed under the terms
of this License. The "Document", below, refers to any such manual or work.
Any member of the public is a licensee, and is addressed as "you".

A "Modified Version" of the Document means any work containing the Document
or a portion of it, either copied verbatim, or with modifications and/or
translated into another language.

A "Secondary Section" is a named appendix or a front-matter section of the
Document that deals exclusively with the relationship of the publishers or
authors of the Document to the Document's overall subject (or to related
matters) and contains nothing that could fall directly within that overall
subject. (For example, if the Document is in part a textbook of mathematics,
a Secondary Section may not explain any mathematics.) The relationship could
be a matter of historical connection with the subject or with related
matters, or of legal, commercial, philosophical, ethical or political
position regarding them.

The "Invariant Sections" are certain Secondary Sections whose titles are
designated, as being those of Invariant Sections, in the notice that says
that the Document is released under this License.

The "Cover Texts" are certain short passages of text that are listed, as
Front-Cover Texts or Back-Cover Texts, in the notice that says that the
Document is released under this License.

A "Transparent" copy of the Document means a machine-readable copy,
represented in a format whose specification is available to the general
public, whose contents can be viewed and edited directly and
straightforwardly with generic text editors or (for images composed of
pixels) generic paint programs or (for drawings) some widely available
drawing editor, and that is suitable for input to text formatters or for
automatic translation to a variety of formats suitable for input to text
formatters. A copy made in an otherwise Transparent file format whose markup
has been designed to thwart or discourage subsequent modification by readers
is not Transparent. A copy that is not "Transparent" is called "Opaque".

Examples of suitable formats for Transparent copies include plain ASCII
without markup, Texinfo input format, LaTeX input format, SGML or XML using a
publicly available DTD, and standard-conforming simple HTML designed for
human modification. Opaque formats include PostScript, PDF, proprietary
formats that can be read and edited only by proprietary word processors, SGML
or XML for which the DTD and/or processing tools are not generally available,
and the machine-generated HTML produced by some word processors for output
purposes only.

The "Title Page" means, for a printed book, the title page itself, plus such
following pages as are needed to hold, legibly, the material this License
requires to appear in the title page. For works in formats which do not have
any title page as such, "Title Page" means the text near the most prominent
appearance of the work's title, preceding the beginning of the body of the
text.
-----------------------------------------------------------------------------

5.4. VERBATIM COPYING

You may copy and distribute the Document in any medium, either commercially
or noncommercially, provided that this License, the copyright notices, and
the license notice saying this License applies to the Document are reproduced
in all copies, and that you add no other conditions whatsoever to those of
this License. You may not use technical measures to obstruct or control the
reading or further copying of the copies you make or distribute. However, you
may accept compensation in exchange for copies. If you distribute a large
enough number of copies you must also follow the conditions in section 3.

You may also lend copies, under the same conditions stated above, and you may
publicly display copies.
-----------------------------------------------------------------------------

5.5. COPYING IN QUANTITY

If you publish printed copies of the Document numbering more than 100, and
the Document's license notice requires Cover Texts, you must enclose the
copies in covers that carry, clearly and legibly, all these Cover Texts:
Front-Cover Texts on the front cover, and Back-Cover Texts on the back cover.
Both covers must also clearly and legibly identify you as the publisher of
these copies. The front cover must present the full title with all words of
the title equally prominent and visible. You may add other material on the
covers in addition. Copying with changes limited to the covers, as long as
they preserve the title of the Document and satisfy these conditions, can be
treated as verbatim copying in other respects.

If the required texts for either cover are too voluminous to fit legibly, you
should put the first ones listed (as many as fit reasonably) on the actual
cover, and continue the rest onto adjacent pages.

If you publish or distribute Opaque copies of the Document numbering more
than 100, you must either include a machine-readable Transparent copy along
with each Opaque copy, or state in or with each Opaque copy a
publicly-accessible computer-network location containing a complete
Transparent copy of the Document, free of added material, which the general
network-using public has access to download anonymously at no charge using
public-standard network protocols. If you use the latter option, you must
take reasonably prudent steps, when you begin distribution of Opaque copies
in quantity, to ensure that this Transparent copy will remain thus accessible
at the stated location until at least one year after the last time you
distribute an Opaque copy (directly or through your agents or retailers) of
that edition to the public.

It is requested, but not required, that you contact the authors of the
Document well before redistributing any large number of copies, to give them
a chance to provide you with an updated version of the Document.
-----------------------------------------------------------------------------

5.6. MODIFICATIONS

You may copy and distribute a Modified Version of the Document under the
conditions of sections 2 and 3 above, provided that you release the Modified
Version under precisely this License, with the Modified Version filling the
role of the Document, thus licensing distribution and modification of the
Modified Version to whoever possesses a copy of it. In addition, you must do
these things in the Modified Version:

 A. Use in the Title Page (and on the covers, if any) a title distinct from
    that of the Document, and from those of previous versions (which should,
    if there were any, be listed in the History section of the Document). You
    may use the same title as a previous version if the original publisher of
    that version gives permission.
   
 B. List on the Title Page, as authors, one or more persons or entities
    responsible for authorship of the modifications in the Modified Version,
    together with at least five of the principal authors of the Document (all
    of its principal authors, if it has less than five).
   
 C. State on the Title page the name of the publisher of the Modified
    Version, as the publisher.
   
 D. Preserve all the copyright notices of the Document.
   
 E. Add an appropriate copyright notice for your modifications adjacent to
    the other copyright notices.
   
 F. Include, immediately after the copyright notices, a license notice giving
    the public permission to use the Modified Version under the terms of this
    License, in the form shown in the Addendum below.
   
 G. Preserve in that license notice the full lists of Invariant Sections and
    required Cover Texts given in the Document's license notice.
   
 H. Include an unaltered copy of this License.
   
 I. Preserve the section entitled "History", and its title, and add to it an
    item stating at least the title, year, new authors, and publisher of the
    Modified Version as given on the Title Page. If there is no section
    entitled "History" in the Document, create one stating the title, year,
    authors, and publisher of the Document as given on its Title Page, then
    add an item describing the Modified Version as stated in the previous
    sentence.
   
 J. Preserve the network location, if any, given in the Document for public
    access to a Transparent copy of the Document, and likewise the network
    locations given in the Document for previous versions it was based on.
    These may be placed in the "History" section. You may omit a network
    location for a work that was published at least four years before the
    Document itself, or if the original publisher of the version it refers to
    gives permission.
   
 K. In any section entitled "Acknowledgements" or "Dedications", preserve the
    section's title, and preserve in the section all the substance and tone
    of each of the contributor acknowledgements and/or dedications given
    therein.
   
 L. Preserve all the Invariant Sections of the Document, unaltered in their
    text and in their titles. Section numbers or the equivalent are not
    considered part of the section titles.
   
 M. Delete any section entitled "Endorsements". Such a section may not be
    included in the Modified Version.
   
 N. Do not retitle any existing section as "Endorsements" or to conflict in
    title with any Invariant Section.
   

If the Modified Version includes new front-matter sections or appendices that
qualify as Secondary Sections and contain no material copied from the
Document, you may at your option designate some or all of these sections as
invariant. To do this, add their titles to the list of Invariant Sections in
the Modified Version's license notice. These titles must be distinct from any
other section titles.

You may add a section entitled "Endorsements", provided it contains nothing
but endorsements of your Modified Version by various parties--for example,
statements of peer review or that the text has been approved by an
organization as the authoritative definition of a standard.

You may add a passage of up to five words as a Front-Cover Text, and a
passage of up to 25 words as a Back-Cover Text, to the end of the list of
Cover Texts in the Modified Version. Only one passage of Front-Cover Text and
one of Back-Cover Text may be added by (or through arrangements made by) any
one entity. If the Document already includes a cover text for the same cover,
previously added by you or by arrangement made by the same entity you are
acting on behalf of, you may not add another; but you may replace the old
one, on explicit permission from the previous publisher that added the old
one.

The author(s) and publisher(s) of the Document do not by this License give
permission to use their names for publicity for or to assert or imply
endorsement of any Modified Version.
-----------------------------------------------------------------------------

5.7. COMBINING DOCUMENTS

You may combine the Document with other documents released under this
License, under the terms defined in section 4 above for modified versions,
provided that you include in the combination all of the Invariant Sections of
all of the original documents, unmodified, and list them all as Invariant
Sections of your combined work in its license notice.

The combined work need only contain one copy of this License, and multiple
identical Invariant Sections may be replaced with a single copy. If there are
multiple Invariant Sections with the same name but different contents, make
the title of each such section unique by adding at the end of it, in
parentheses, the name of the original author or publisher of that section if
known, or else a unique number. Make the same adjustment to the section
titles in the list of Invariant Sections in the license notice of the
combined work.

In the combination, you must combine any sections entitled "History" in the
various original documents, forming one section entitled "History"; likewise
combine any sections entitled "Acknowledgements", and any sections entitled
"Dedications". You must delete all sections entitled "Endorsements."
-----------------------------------------------------------------------------

5.8. COLLECTIONS OF DOCUMENTS

You may make a collection consisting of the Document and other documents
released under this License, and replace the individual copies of this
License in the various documents with a single copy that is included in the
collection, provided that you follow the rules of this License for verbatim
copying of each of the documents in all other respects.

You may extract a single document from such a collection, and distribute it
individually under this License, provided you insert a copy of this License
into the extracted document, and follow this License in all other respects
regarding verbatim copying of that document.
-----------------------------------------------------------------------------

5.9. AGGREGATION WITH INDEPENDENT WORKS

A compilation of the Document or its derivatives with other separate and
independent documents or works, in or on a volume of a storage or
distribution medium, does not as a whole count as a Modified Version of the
Document, provided no compilation copyright is claimed for the compilation.
Such a compilation is called an "aggregate", and this License does not apply
to the other self-contained works thus compiled with the Document, on account
of their being thus compiled, if they are not themselves derivative works of
the Document.

If the Cover Text requirement of section 3 is applicable to these copies of
the Document, then if the Document is less than one quarter of the entire
aggregate, the Document's Cover Texts may be placed on covers that surround
only the Document within the aggregate. Otherwise they must appear on covers
around the whole aggregate.
-----------------------------------------------------------------------------

5.10. TRANSLATION

Translation is considered a kind of modification, so you may distribute
translations of the Document under the terms of section 4. Replacing
Invariant Sections with translations requires special permission from their
copyright holders, but you may include translations of some or all Invariant
Sections in addition to the original versions of these Invariant Sections.
You may include a translation of this License provided that you also include
the original English version of this License. In case of a disagreement
between the translation and the original English version of this License, the
original English version will prevail.
-----------------------------------------------------------------------------

5.11. TERMINATION

You may not copy, modify, sublicense, or distribute the Document except as
expressly provided for under this License. Any other attempt to copy, modify,
sublicense or distribute the Document is void, and will automatically
terminate your rights under this License. However, parties who have received
copies, or rights, from you under this License will not have their licenses
terminated so long as such parties remain in full compliance.
-----------------------------------------------------------------------------

5.12. FUTURE REVISIONS OF THIS LICENSE

The Free Software Foundation may publish new, revised versions of the GNU
Free Documentation License from time to time. Such new versions will be
similar in spirit to the present version, but may differ in detail to address
new problems or concerns. See http://www.gnu.org/copyleft/.

Each version of the License is given a distinguishing version number. If the
Document specifies that a particular numbered version of this License "or any
later version" applies to it, you have the option of following the terms and
conditions either of that specified version or of any later version that has
been published (not as a draft) by the Free Software Foundation. If the
Document does not specify a version number of this License, you may choose
any version ever published (not as a draft) by the Free Software Foundation.
-----------------------------------------------------------------------------

5.13. How to use this License for your documents

To use this License in a document you have written, include a copy of the
License in the document and put the following copyright and license notices
just after the title page:

   
    Copyright (c) YEAR YOUR NAME. Permission is granted to copy, distribute
    and/or modify this document under the terms of the GNU Free Documentation
    License, Version 1.1 or any later version published by the Free Software
    Foundation; with the Invariant Sections being LIST THEIR TITLES, with the
    Front-Cover Texts being LIST, and with the Back-Cover Texts being LIST. A
    copy of the license is included in the section entitled "GNU Free
    Documentation License".
   
If you have no Invariant Sections, write "with no Invariant Sections" instead
of saying which ones are invariant. If you have no Front-Cover Texts, write
"no Front-Cover Texts" instead of "Front-Cover Texts being LIST"; likewise
for Back-Cover Texts.

If your document contains nontrivial examples of program code, we recommend
releasing these examples in parallel under your choice of free software
license, such as the GNU General Public License, to permit their use in free
software.


-- 

--------------------------------------------------
Jeremy M. Guthrie        jeremy.guthrie@xxxxxxxxxx
Senior Network Engineer        Phone: 608-298-1061
Berbee                           Fax: 608-288-3007
5520 Research Park Drive         NOC: 608-298-1102
Madison, WI 53711
