Date: Thu, 8 Jun 2000 14:47:31 +0300 (EEST)
From: Julian Anastasov
To: Andrey Savochkin
cc: jetienne@arobas.net, netdev@oss.sgi.com
Subject: Re: IFA_F_NO_NDISC (for vrrp)
In-Reply-To: <20000608142528.A11492@saw.sw.com.sg>

Hello,

On Thu, 8 Jun 2000, Andrey Savochkin wrote:

> Hello,
>
> On Wed, Jun 07, 2000 at 05:13:22PM -0400, Jerome Etienne wrote:
> > On Wed, Jun 07, 2000 at 11:53:49AM +0800, Andrey Savochkin wrote:
> > > Keeping the policy decision in user-space is a wise solution.
> > > But you may just set NOARP flag for the device and do all the stuff in
> > > user-land merging your 'virtual MAC' logic with any ARP daemon.
> >
> > I am not sure i understand your suggestion. I made some tests, and if
> > IFF_NOARP is set i dont receive any messages on a listening NETLINK_ARPD
> > socket. I need 2 things:
>
> Oh, I see.  My quick thoughts appear to be wrong.
>
> > - the kernel to keep a arp cache (no arp cache in the kernel implies
> >   a exchange with the userspace at each ip/other packet, so not a reasonable
> >   solution)
> > - the kernel must not reply the native MAC when it receives a ARP request
> >   for a virtual ip.
> >
> > If there is a solution which doesnt require to modify the kernel, i dont
> > see it.
> > If your suggestion fits the needs, can you please elaborate
> > to help me to understand.
> >
> > The other VRRP implementations just run everything in the kernel to solve
> > this problem. My patch is 3 lines in net/ipv4/arp.c seemed a good solution
> > when i wrote it. If anybody see a better one, please tell me...
>
> Julian Anastasov also wanted some solution to block ARP replies for his
> cluster project.  Julian, I don't remember exactly your situation.  May the
> proposed patch solve some problems for you?

Yes, but only "some" :)

Jerome Etienne, can you look at the Linux Virtual Server (LVS)
project:

http://www.linuxvirtualserver.org
http://www.linuxvirtualserver.org/arp.html

Look for the "Direct Route" forwarding method. LVS uses shared
addresses as in rfc2338, but rfc2338 has other requirements. They look
very complex to me, although I haven't looked very deeply. In LVS the
"Backup" server can talk IP, while in rfc2338 this is not allowed. In
LVS, for example, we can run the service on all hosts and the LVS
software (the Primary Router) on one of them. Why should we (1) keep
unused Backup server(s), and (2) why with the VIP configured? Of
course, it depends on your needs.

In LVS the concept is different: only the "Master" advertises the
Virtual IP, but the "Backup" servers can work as normal (Real)
servers. You are free to stop the service when the Backup server takes
the Primary role if you don't want to overload it. The Virtual MAC
address is not needed; we reply with the MAC address of the current
Primary router. You can send gratuitous ARP replies with a user space
tool.

- The "Backup" router(s) and the real servers can talk with the VIP
  but they don't send ARP replies for the VIP. Normally they don't
  receive traffic destined for the VIP, because they don't send ARP
  replies for it. This way we block direct access from the local
  clients to the real servers. The clients must forward their traffic
  through the virtual router.
- The "Backup" and the real servers don't announce the VIP as the
  source of their ARP probes, or they will not receive the expected
  reply (it will be received by the virtual router).

There is a "hidden" device flag in 2.2.14+ which solves the above
problems. We configure all VIPs on a dummy device and set this flag
(we hide all VIPs on this device) on the "Backup" and on the real
servers. There are other good side effects from this flag.

Supporting only IFA_F_NO_NDISC is not enough even for rfc2338. I don't
see a reason for the Backup server to hold the VIP configured. Only
the Primary server must have the VIP configured. I see many
restrictions in rfc2338's solution for supporting shared addresses.
Maybe I don't understand rfc2338 well, and I don't know how you are
using it. Are you trying to implement rfc2338 or just to build a
working setup?

I think the only required kernel support can be:

1. not reply for these (hidden) VIPs - this can be solved with policy
   routing or another kind of filtering, without the "hidden device"
   flag

2. not use them as source of the ARP probes - currently this can be
   solved if the VIP is not defined in the local table but in another
   table. But I'm not sure whether Andrey's planned fixes will allow
   this. Currently, arp_solicit selects only IP addresses from the
   local table, which is what makes this requirement work. After
   Andrey's patch they will be selected from any table and this
   requirement will not work.

These requirements are completely covered by the "hidden device"
feature. Can this feature solve your problem? I have a patch for
2.3.41+ which I can send you if you don't want to play with complex
policy routing rules. The discussion of this feature in 2.3 got stuck
at the point where we don't know how to define VIPs (hidden IP
addresses) correctly: whether we can do it with policy routing rules
or whether there is another solution.
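
The two requirements listed above can be illustrated with a small
user-space model. This is only a sketch of the intended behaviour, not
kernel code and not the actual "hidden device" patch: the Host, Device
and Address classes, the method names, and the example addresses are
all hypothetical and chosen for illustration.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, List, Optional


@dataclass
class Device:
    name: str
    hidden: bool = False  # models the "hidden" device flag (illustrative)


@dataclass
class Address:
    ip: str
    device: str  # name of the device the address is configured on


class Host:
    """Toy model of a host's ARP behaviour for hidden addresses."""

    def __init__(self, devices: Iterable[Device], addresses: Iterable[Address]):
        self.devices: Dict[str, Device] = {d.name: d for d in devices}
        self.addresses: List[Address] = list(addresses)

    def _is_hidden(self, addr: Address) -> bool:
        return self.devices[addr.device].hidden

    def should_reply_arp(self, target_ip: str) -> bool:
        """Requirement 1: do not answer ARP requests for hidden addresses."""
        for addr in self.addresses:
            if addr.ip == target_ip:
                return not self._is_hidden(addr)
        return False  # address is not local at all

    def arp_probe_source(self, preferred_ip: str) -> Optional[str]:
        """Requirement 2: never use a hidden address as ARP probe source."""
        visible = [a for a in self.addresses if not self._is_hidden(a)]
        for addr in visible:
            if addr.ip == preferred_ip:
                return addr.ip
        # Fall back to the first visible address, if there is one.
        return visible[0].ip if visible else None


# A "Backup" or real server: normal address on eth0, VIP hidden on dummy0.
backup = Host(
    devices=[Device("eth0"), Device("dummy0", hidden=True)],
    addresses=[Address("192.168.0.2", "eth0"),
               Address("192.168.0.100", "dummy0")],
)
```

In this model the server still has the VIP configured, so it can talk
with it, but should_reply_arp() refuses to answer for it and
arp_probe_source() falls back to the visible address on eth0.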

You can read our discussion in the linux-kernel archives:

http://kernelnotes.org/lnxlists/linux-kernel/lk_0005_01/

Date: 04-MAY-2000
Subject: arp, kernel 2.2.15 and 2.3.99-pre6

Jerome, can the hidden device flag solve your problem? Oh, I now see
that maybe you need this VIP to be on an ARP device (on the Backup
server)?

Regards

--
Julian Anastasov