
Re: IFA_F_NO_NDISC (for vrrp)

To: Jerome Etienne <jetienne@xxxxxxxxxx>
Subject: Re: IFA_F_NO_NDISC (for vrrp)
From: Julian Anastasov <uli@xxxxxxxxxxxxxxxxxxxxxx>
Date: Fri, 9 Jun 2000 08:57:57 +0300 (EEST)
Cc: Andrey Savochkin <saw@xxxxxxxxxxxxx>, netdev@xxxxxxxxxxx
In-reply-to: <>
Sender: owner-netdev@xxxxxxxxxxx

On Thu, 8 Jun 2000, Jerome Etienne wrote:

> On Thu, Jun 08, 2000 at 02:47:31PM +0300, Julian Anastasov wrote:
> >     In  LVS  the "Backup"  server can  talk IP  while in
> > rfc2338 this is not allowed. In LVS, for example, we can run
> > the  service on all hosts and  the LVS software (the Primary
> > Router)  on one  of them. Why  should we (1)  keep an unused
> > Backup server(s)?
> > 
> >     Of course, it depends on your needs.
> Indeed, vrrp and lvs are similar but still distinct.
> LVS modifies the two peers of the connection (dispatcher and server).
> VRRP is designed to work without the client being aware of it.
> The primary goal is to run several routers/servers, possibly running
> different services (e.g. 2 routers both route but don't advertise
> the same routes). If a router crashes, another one will take over its
> IPs (still keeping its own ones), so the client will go on sending
> the packets to the same IP/MAC without being aware of the transition.

        Very good. But I don't know why you think this is a
difference. This is possible with LVS too: the client uses
the same VIP as destination address. The difference is the
Virtual MAC. The transition is not as easy as you said. In
LVS the dispatcher keeps state for each connection to the
real hosts. When switching from one router to another after
a crash you usually lose this connection table, so all
current connections are broken. Sending a gratuitous ARP
broadcast is always faster than re-establishing this table
in the new router. The new router doesn't know which
connection was forwarded to which real host, so it can't
continue the connections. So keeping the local clients happy
with the same VMAC is not a solution. You have to restore
the connection table in the new router. Currently this is
not done in LVS, and I'm not sure if it will be done.

> > and (2) why with VIP configured?
> In vrrp, the backup must not have the virtual IPs 'online'. Only the 
> master can receive/send packets with the virtual ips.

        OK, the VIP is configured in the Master, not in the
Backup. Or is it configured but flagged with
IFA_F_NO_NDISC? Is it configured in the real hosts? Is NAT
running in the default routers? What does a packet from the
real host look like when it passes through the default
router? For now I don't know on which host you want to stop
replying for the VIP.

> >     May  be  I don't  understand  well rfc2338,  I don't
> > know  how  you are  using it.  Are  you trying  to implement
> > rfc2338  or just to build a working setup? 
> rfc2338 as it is specified. my current implementation[1] works as specified
> for 1 virtual group per physical interface but uses a non-standard trick
> (i.e. no more handle the virtual MAC) to support several groups per 
> interface.
> [1]

        I read it. But I don't know if you tested only the
Master-Backup operations (which is the scope of this rfc),
because I'm not sure the setup described in rfc2338 (which
is very limited) will work without NAT-ing packets in the
Linux dispatcher. If they are NAT-ed I don't know why you
have the VIP in the internal hosts. Where, except in the
Master, are VIPs configured, even if marked with this flag?

> > I think, the only required kernel support can be:
> > 
> > 1. not reply for these (hidden) VIPs
> >     - this  can be  solved with policy  routing or other
> >     kind of filtering without the "hidden device" flag
> Yes, i read the linux kernel thread and the arp_filter solution 
> of andi kleen but honestly i dont understand it. why and how to 
> use the route table (so a destination thing) to know if you have
> to 'arp-reply' for a locally configured address (so a local thing) ?

arp_filter is not in our game.

Here are some simple rules to hide the VIP in the Backup/Real
servers. I'm still not sure if the traffic is NAT-ed in your
routers or just forwarded. If the packets from the real
hosts are just forwarded and have saddr=VIP, they will not
pass the Linux router because of the source address
validation checks. I have a patch to solve this problem but
I don't know how your implementation works. If you reach a
point where you need it just let me know; I can propose it
to netdev. It lets the rp_filter flag control whether
packets with saddr=local_ip are accepted. The danger is that
rp_filter defaults to 0, so a patched kernel will allow
spoofing if the flag is not set for the non-trusted
interfaces.
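
To see which interfaces would be exposed in that case, a
minimal sketch follows. It assumes the usual per-interface
procfs files under /proc/sys/net/ipv4/conf; the function
name list_rp_filter_off and its optional directory argument
are made up for illustration and are not part of any patch.
It looks only at the per-interface value.

```shell
#!/bin/sh
# List interfaces whose rp_filter value is 0 (source validation off).
# $1 = directory holding the per-interface settings; it defaults to
# the usual procfs location and is overridable only so the sketch
# can be tried against a scratch directory.
list_rp_filter_off() {
    base=${1:-/proc/sys/net/ipv4/conf}
    for f in "$base"/*/rp_filter; do
        # Skip when the glob matched nothing or the file is unreadable.
        [ -r "$f" ] || continue
        if [ "$(cat "$f")" = "0" ]; then
            # Print the interface name, i.e. the parent directory name.
            basename "$(dirname "$f")"
        fi
    done
}

list_rp_filter_off
```

Any non-trusted interface printed here would need the flag
set before running a kernel with such a patch.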

The setup is for non-Master hosts:

# Block access from the LAN to the real server's VIP. This
# way we ignore the router's ARP probes. The drawback: we
# ignore the clients' probes too. We have to do this because
# a client on the LAN could otherwise receive replies from
# all real servers.
ip rule add prio 99 from 192.168.0/24 table 99
ip route add table 99 blackhole
# Now accept locally any other traffic, i.e. traffic not
# from 192.168.0/24
ip rule add prio 100 table 100
ip route add table 100 local dev lo

Being in table 100, the VIP is not selected as source of any
ARP probes. In other words, you don't configure the VIP
using ifconfig or by adding IP addresses. And with the
blackhole setting we block the traffic from the 192.168.0
logical network to the VIP (this solves the ARP problem). We
can talk only with clients that reach us through one or more
outgoing routers, not with local clients. Only the hidden
flag allows the clients to be on the same LAN, in the
192.168.0 logical network.
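
For illustration, here is the same set of rules with the
addresses written out. This is only a sketch: the VIP
192.0.2.10 is a hypothetical example value, the helper name
hide_vip_cmds is made up, and placing the VIP after
"blackhole" and "local" is an assumption about where the
address belongs. The function prints the commands instead of
running them, so it needs no root.

```shell
#!/bin/sh
# Print (do not execute) the four commands for a non-Master host.
# $1 = VIP (hypothetical example value), $2 = LAN prefix to block.
hide_vip_cmds() {
    vip=$1
    lan=$2
    # Table 99: drop anything from the LAN that is addressed to the VIP.
    echo "ip rule add prio 99 from $lan table 99"
    echo "ip route add table 99 blackhole $vip"
    # Table 100: deliver all remaining traffic for the VIP locally via lo.
    echo "ip rule add prio 100 table 100"
    echo "ip route add table 100 local $vip dev lo"
}

hide_vip_cmds 192.0.2.10 192.168.0.0/24
```

Review the printed commands first; they can then be piped to
a root shell on a test host.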

> > 2. not use them as source of the ARP probes
> > 
> >     -  currently this  can be solved  if the  VIP is not
> >     defined in the local table but in another table. But
> >     I'm  not  sure if  the  planned Andrey's  fixes will
> >     allow  this.  Currently, arp_solicit selects only IP
> >     addresses  from  the local  table which  allows this
> >     requirement  to work. After Andrey's patch they will
> >     be selected from any table and this requirement will
> >     not work.
> which patch are you speaking about ?

        I looked at it one month ago:

        I see it has not changed for a long time and I
don't know its status. Andrey?

> >     Jerome,  can  the  hidden  device  flag  solve  your
> > problem?  Oh, I now see that may  be you need this VIP to be
> > on  an ARP device (in the  Backup server)?
> I think so, i coded IFA_F_NO_NDISC because i wasnt aware of 'hidden'.
> It isnt in the kernel source i looked in i.e. 2.4.0-test1.
> As our needs are similar, we should use the same mechanism.

        I'm still not sure how the packets look when they
leave the real hosts in your implementation. You have to
explain this because I can't imagine how your setup is
working. Without the above two requirements your patch does
not work for LVS. If the VIP is configured as an address, and
so put in the "local" table, it can be used as source in the
ARP probes. I see this in your "MUST NOT"s for the Backup
host, but it is a requirement for all backup/real servers in
LVS. So your patch must be tuned if the net folks like the
idea of this address flag. You have to patch
inet_select_addr() too. The problem is arp_solicit, i.e.
which local addresses are allowed in the ARP probes. This is
decided by the routing. Your patch plus defining the VIP as
an interface address will allow using the VIP in the ARP
probes, but this is in our "MUST NOT". So the big question is
how you will put the IFA_F_NO_NDISC flag into the routing
code (fib_lookup).

> IFA_F_NO_NDISC is tunable per address so seems to be more flexible
> than 'hidden' which is tunable per interface.

        This is not a problem. Just put your VIP on lo or a
dummy device and be happy. There is no difference at all. I
don't know why you/the rfc have the VIP configured if you
want the VIP to be so hidden (not even to talk IP). Just
don't configure the VIP. Maybe I don't understand
something; you have to explain it.

> Thinking about it, i have another project in which i need to completely 
> handle ARP reply and request in user space (still keeping the cache
> in the kernel).

        Playing in user space is preferred. But we need to
accept packets destined for the VIP in the real hosts (where
we hide these addresses). So we need the mechanism that
selects IP addresses as source for the ARP probes to be
determined in the kernel. It is not possible to control this
from user space. The next step after the "hidden" flag is to
remove the VIP, but this is suitable only for your Backups.

> Why not provide a mechanism allowing to handle ARP request/reply from
> userspace ? (af_packet to sniff the reply from the network, a flag 
> to prevent the kernel from replying and a kind of CONFIG_ARPD to 
> send ARP request to userspace via netlink)
> It would be flexible and would satisfy the needs of my other project and
> vrrp. i think it would be enough for lvs too, correct ?

        It is not enough. Read requirement 2. I'm sure you
will reach the same problems as in LVS. If you explain which
IP addresses are involved in your communications I will tell
you where the problem is. Playing in the happy world of only
Master and Backups is not enough. Did you add some real
hosts and clients in your tests? I assume rfc2338 is only
meant to support failover for routers which use NAT. Is that
true? Can this rfc work for plain routing (without mangling
packets) in Linux?

> If i write that, would it be more acceptable for the maintainers ?

        I'm not sure. I'm not against your patch but I think
it  will include the same requirements  as the patch for the
"hidden" flag.

        And  please explain how the  packets from one client
request and the answer traverse all hosts in the cluster.

        I think there are two common variants:

1> Routers using NAT

        - the setup is same as the pictures in rfc2338

        - you don't need to configure VIP in the real hosts,
        you can use private IP addresses

        - you  don't need  to configure  VIP in  the backups
        (this is common to both setups)

        - you can use the dispatcher as def gw

2> Router not using NAT (forwarding without packet mangling)

        - VIP is configured as hidden IP in the real servers

        - your patch must avoid these ARP requests too:
        "who has MASTER_ROUTER tell VIP" - we call for
        our def gw.
        I  think,  the  static  routes  are  obsoleted  from
        rfc2338 :) Just play with ARP to solve this  problem
        in your patch.

        - you  don't need  to configure  VIP in  the backups

        - you can use any random router for the outgoing
        traffic

        - the picture from rfc2338 can't work with the
        current restrictions in the Linux router (the
        source address validation): you can't use the
        dispatcher as def gw without patching it, because
        the packets reach the dispatcher with saddr=VIP.
        Even policy routing can't help here.

The  result: you need to flag only the addresses in the real
servers (if not using NAT). Waiting for your comments :)


Julian Anastasov <uli@xxxxxxxxxxxxxxxxxxxxxx>
