On Thu, 8 Jun 2000, Jerome Etienne wrote:
> On Thu, Jun 08, 2000 at 02:47:31PM +0300, Julian Anastasov wrote:
> > In LVS the "Backup" server can talk IP while in
> > rfc2338 this is not allowed. In LVS, for example, we can run
> > the service on all hosts and the LVS software (the Primary
> > Router) on one of them. Why should we (1) keep an unused
> > Backup server(s)
> > Of course, it depends on your needs.
> Indeed, vrrp and lvs are similar but still distinct.
> LVS modifies the two peers of the connection (dispatcher and server).
> VRRP is designed to work without the client being aware of it.
> The primary goal is to run several routers/servers, possibly running
> different services (e.g. 2 routers both route but don't advertise
> the same routes). If a router crashes, another one will take over its
> IPs (still keeping its own ones), so the client will go on sending
> the packets to the same IP/MAC without being aware of the transition.
Very good. But I don't know why you think this is a
difference. This is possible with LVS too. The client uses
the same VIP as dest addr. The difference is the Virtual
MAC. The transition is not as easy as you said. In LVS the
dispatcher keeps state for each connection to the real
hosts. When switching from one router to another after a crash
you usually lose this connection table, so all current
connections are broken. Sending a gratuitous ARP broadcast
is always faster than rebuilding this table in the new
router. The new router doesn't know which connection was
forwarded to which real host, so it cannot continue them.
So, keeping the local clients happy with the same VMAC is
not a solution. You have to restore the connection table in
the new router.
Currently, this is not done in LVS. I'm not sure if it will
> > and (2) why with VIP configured?
> In vrrp, the backup must not have the virtual IPs 'online'. Only the
> master can receive/send packets with the virtual ips.
OK, VIP is configured in the Master, not in the
Backup. Or it is configured but flagged with
IFA_F_NO_NDISC? Is it configured in the real hosts? Is NAT
running in the default routers? What does the packet from
the real host look like when it passes through the default
router? Right now I don't know on which host you want to
stop replying for the VIP.
> > Maybe I don't understand rfc2338 well, and I don't
> > know how you are using it. Are you trying to implement
> > rfc2338 or just to build a working setup?
> rfc2338 as it is specified. my current implementation works as specified
> for 1 virtual group per physical interface but uses a non-standard trick
> (i.e. it no longer handles the virtual MAC) to support several groups per
>  http://w3.arobas.net/~jetienne/vrrpd.tgz
I read it. But I don't know if you tested only the
Master-Backup operations (which is the scope of this rfc).
Because I'm not sure the setup described in rfc2338 (which
is very limited) will work without NAT-ing packets in the
Linux dispatcher. If they are NAT-ed I don't know why you
have the VIP in the internal hosts. Where, apart from the
Master, are VIPs configured, even if marked with this flag?
> > I think, the only required kernel support can be:
> > 1. not reply for these (hidden) VIPs
> > - this can be solved with policy routing or other
> > kind of filtering without the "hidden device" flag
> Yes, i read the linux kernel thread and the arp_filter solution
> of andi kleen but honestly i don't understand it. why and how to
> use the route table (so a destination thing) to know if you have
> to 'arp-reply' for a locally configured address (so a local thing) ?
arp_filter is not in our game.
Here are some simple rules to hide the VIP in the Backup/Real
servers. I'm still not sure if the traffic is NAT-ed in
your routers or just forwarded. Because if the packets from
the real hosts are just forwarded and saddr=VIP these
packets will not pass the Linux router due to source address
validation checks. I have a patch to solve this problem but
I don't know how your implementation works. If you reach a
point where you need it just let me know. I can propose it
to netdev. It lets packets with saddr=local_ip be accepted,
controlled by the rp_filter flag. The danger is that rp_filter
defaults to 0, so a patched kernel will allow spoofing if
the flag is not set on the untrusted interfaces.
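Before experimenting with such a patch it helps to see how rp_filter is currently set on each interface. The following is only a sketch: it assumes the usual /proc/sys/net/ipv4/conf/<dev>/rp_filter layout, and the function name show_rp_filter and its optional directory argument are made up for illustration.

```sh
#!/bin/sh
# Print the rp_filter value for every interface directory under the
# given base (default: the live /proc tree). Illustrative helper only.
show_rp_filter() {
    base=${1:-/proc/sys/net/ipv4/conf}
    for f in "$base"/*/rp_filter; do
        [ -r "$f" ] || continue
        dev=${f%/rp_filter}      # strip the trailing file name
        dev=${dev##*/}           # keep only the interface name
        printf '%s rp_filter=%s\n' "$dev" "$(cat "$f")"
    done
}

show_rp_filter
```

Any untrusted interface that reports rp_filter=0 is one where a patched kernel, as described above, would accept spoofed local source addresses.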
The setup is for non-Master hosts:
# Block access from the LAN to the real server's VIP. This
# way we ignore the router's ARP probes. The drawback:
# we ignore the clients' probes too. We have to do this
# because otherwise a client on the LAN can receive replies
# from all real servers.
ip rule add prio 99 from 192.168.0/24 table 99
ip route add table 99 blackhole 192.168.0.100
# Now accept locally any other traffic, i.e. not from
ip rule add prio 100 table 100
ip route add table 100 local 192.168.0.100 dev lo
Being in table 100, the VIP (192.168.0.100) is not selected
as source of any ARP probes. In other words, you don't
configure the VIP using ifconfig or by adding IP addresses.
And with the blackhole setting we block the traffic from the
192.168.0 logical network to the VIP (this solves the ARP
problem). We can talk only with clients reached through the
outgoing router(s), not with local clients. Only the
hidden flag allows clients on the same LAN, in the
192.168.0 logical network.
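For trying the setup above with other addresses, and for undoing it afterwards, a small dry-run helper may be convenient. This is a sketch only: the function names hide_vip_cmds and unhide_vip_cmds are invented here, the table and priority numbers are simply the ones used above, and nothing is executed until the output is piped to a shell as root.

```sh
#!/bin/sh
# Emit the commands from the setup above for a given VIP and LAN.
# Nothing is executed here; pipe the output to "sh" as root to apply it.
hide_vip_cmds() {
    vip=$1 lan=$2
    echo "ip rule add prio 99 from $lan table 99"
    echo "ip route add table 99 blackhole $vip"
    echo "ip rule add prio 100 table 100"
    echo "ip route add table 100 local $vip dev lo"
}

# Emit the matching teardown, in reverse order.
unhide_vip_cmds() {
    vip=$1 lan=$2
    echo "ip route del table 100 local $vip dev lo"
    echo "ip rule del prio 100 table 100"
    echo "ip route del table 99 blackhole $vip"
    echo "ip rule del prio 99 from $lan table 99"
}

hide_vip_cmds 192.168.0.100 192.168.0/24
```

After applying, "ip rule show" and "ip route show table 99" (or table 100) should list the new entries.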
> > 2. not use them as source of the ARP probes
> > - currently this can be solved if the VIP is not
> > defined in the local table but in another table. But
> > I'm not sure if Andrey's planned fixes will
> > allow this. Currently, arp_solicit selects only IP
> > addresses from the local table which allows this
> > requirement to work. After Andrey's patch they will
> > be selected from any table and this requirement will
> > not work.
> which patch are you speaking about ?
I looked at it one month ago:
I see it has not changed for a long time and I
don't know its status. Andrey?
> > Jerome, can the hidden device flag solve your
> > problem? Oh, I now see that maybe you need this VIP to be
> > on an ARP device (in the Backup server)?
> I think so, i coded IFA_F_NO_NDISC because i wasn't aware of 'hidden'.
> It isn't in the kernel source i looked in i.e. 2.4.0-test1.
> As our needs are similar, we should use the same mechanism.
I'm still not sure what the packets look like when leaving
the real hosts in your implementation. You have to
explain this because I can't imagine how your setup
works. Without the above two requirements your patch does not
work for LVS. If the VIP is configured as an address, and so put
in the "local" table, it can be used as source in the ARP
probes. I see this in your "MUST NOT"s for the Backup host.
But this is a requirement for all backup/real servers in LVS.
So, your patch must be tuned if the net folks like the idea
of this address flag. You have to patch inet_select_addr()
too. The problem is arp_solicit, i.e. which local addresses
are allowed in the ARP probes. This is decided by the
routing. Your patch, with the VIP defined as an interface address,
will allow the VIP to be used in the ARP probes. But this is in our
"MUST NOT". So, the big question is how you will handle the
IFA_F_NO_NDISC flag in the routing code (fib_lookup).
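One way to check requirement 2 on a real host is to watch its outgoing ARP requests and flag any that name the VIP as sender. The sketch below assumes tcpdump prints requests in its usual "who-has X tell Y" form; the function name arp_probes_from_vip and the sample lines are illustrative only.

```sh
#!/bin/sh
# Read tcpdump-style ARP lines on stdin and print only the requests
# whose sender ("tell") address is the given VIP.
arp_probes_from_vip() {
    # Escape the dots so they match literally in the regex.
    vip_re=$(printf '%s' "$1" | sed 's/\./\\./g')
    grep -E "who-has .* tell ${vip_re}([^0-9.]|\$)"
}

# Sample input; on a live host one might instead run (as root):
#   tcpdump -l -n -i eth0 arp | arp_probes_from_vip 192.168.0.100
printf '%s\n' \
    'ARP, Request who-has 192.168.0.1 tell 192.168.0.100, length 28' \
    'ARP, Request who-has 192.168.0.1 tell 192.168.0.5, length 28' |
    arp_probes_from_vip 192.168.0.100
```

Any line this prints on a backup or real server means the VIP is leaking into ARP probes, which is exactly the "MUST NOT" case.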
> IFA_F_NO_NDISC is tunable per address so seems to be more flexible
> than 'hidden' which is tunable per interface.
This is not a problem. Just put your VIP on lo or a
dummy device and be happy. There is no difference at all. I
don't know why you/the rfc have the VIP configured if you want
it to be so hidden (even not to talk IP). Just don't
configure the VIP. Maybe I don't understand something; you
have to explain it.
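As a rough illustration of putting the VIP on lo with the hidden flag, the helper below prints the commands rather than running them. It assumes a kernel carrying the "hidden" patch, which adds hidden entries under /proc/sys/net/ipv4/conf; on kernels without it those files do not exist. The function name hidden_vip_cmds is invented for this sketch.

```sh
#!/bin/sh
# Emit the commands to put the VIP on lo and mark lo as hidden.
# Nothing is executed; review the output, then pipe it to "sh" as root.
# Assumption: the kernel has the "hidden" patch, so the conf/*/hidden
# entries exist. Without it the two "echo" commands will fail.
hidden_vip_cmds() {
    vip=$1 dev=${2:-lo}
    echo "ip addr add $vip/32 dev $dev"
    echo "echo 1 > /proc/sys/net/ipv4/conf/all/hidden"
    echo "echo 1 > /proc/sys/net/ipv4/conf/$dev/hidden"
}

hidden_vip_cmds 192.168.0.100 lo
```

Passing dummy0 as the second argument gives the dummy device variant; the flag is still per device, so every address on that device is hidden.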
> Thinking about it, i have another project in which i need to completely
> handle ARP reply and request in user space (still keeping the cache
> in the kernel).
Playing in user space is preferred. But we need to
accept packets destined for VIP in the real hosts (where we
hide these addresses). So, we need the mechanism to select
IP addresses as source for the ARP probes to be determined
in the kernel. It is not possible to control this from user
space. The next step after the "hidden" flag is to remove
the VIP but this is suitable only for your Backups.
> Why not provide a mechanism allowing to handle ARP request/reply from
> userspace ? (af_packet to sniff the reply from the network, a flag
> to prevent the kernel from replying and a kind of CONFIG_ARPD to
> send ARP request to userspace via netlink)
> It would be flexible and would satisfy the needs of my other project and
> vrrp. i think it would be enough for lvs too, correct ?
It is not enough. Read requirement 2. I'm sure you
will run into the same problems as in LVS. If
you explain which IP addresses are involved in your
communications I will tell you where the problem is. Playing
in the happy world with only the Master and Backups is not
enough. Did you add some real hosts and clients in your
tests? I assume rfc2338 is only meant to support failover for
routers which use NAT. Is that true? Can this rfc work for
plain routing (without mangling packets) in Linux?
> If i write that, would it be more acceptable for the maintainers ?
I'm not sure. I'm not against your patch but I think
it will include the same requirements as the patch for the
And please explain how the packets of one client
request and its answer traverse all hosts in the cluster.
I think there are two common variants:
1> Routers using NAT
  - the setup is the same as the pictures in rfc2338
  - you don't need to configure the VIP in the real hosts,
    you can use private IP addresses
  - you don't need to configure the VIP in the backups
    (this is common to both setups)
  - you can use the dispatcher as def gw
2> Router not using NAT (forwarding without packet mangling)
  - the VIP is configured as a hidden IP in the real servers
  - your patch must avoid these ARP requests too:
    "who has MASTER_ROUTER tell VIP" - we call for
    our def gw.
    I think the static routes are obsoleted by
    rfc2338 :) Just play with ARP to solve this problem
    in your patch.
  - you don't need to configure the VIP in the backups
  - you can use any random router for the outgoing
  - the picture from rfc2338 can't work with the
    current restrictions in the Linux router (the
    source address validation); you can't use the
    dispatcher as def gw without patching it because
    the packets reach the dispatcher with saddr=VIP.
    Even policy routing can't help here.
The result: you need to flag only the addresses in the real
servers (if not using NAT). Waiting for your comments :)
Julian Anastasov <uli@xxxxxxxxxxxxxxxxxxxxxx>