(some persisting) troubles with neighbour cache code backport (even) in 2.4.30

To: Linux Netdev List <netdev@xxxxxxxxxxx>
Subject: (some persisting) troubles with neighbour cache code backport (even) in 2.4.30
From: Harald Welte <laforge@xxxxxxxxxxxx>
Date: Sat, 16 Apr 2005 10:21:00 +0200
Cc: David Miller <davem@xxxxxxxxxxxxx>
Sender: netdev-bounce@xxxxxxxxxxx
User-agent: mutt-ng 1.5.8-r168i (Debian)
Hi!

I just received this bug report.

I'll try to look into it later today/tomorrow, but just in case somebody
else has thoughts on this, I'm forwarding it.

Thanks!

-- 
- Harald Welte <laforge@xxxxxxxxxxxx>                   http://gnumonks.org/
============================================================================
"Privacy in residential applications is a desirable marketing option."
                                                  (ETSI EN 300 175-7 Ch. A6)
--- Begin Message ---
To: laforge@xxxxxxxxxxxxx
Subject: (some persisting) troubles with neighbour cache code backport (even) in 2.4.30
From: Jan Rafaj <rafaj@xxxxxxxxxxxxxx>
Date: Wed, 13 Apr 2005 22:18:56 +0200 (MEST)
Hi Harald,

First of all, please excuse me for bothering you this way and for
my amusing English (I'm not used to writing or speaking it every day).
I'll try to be as brief as possible.

Please note I'm not a programmer (although I have some C skills),
so consider me a joe-blow Linux user (a somewhat advanced one, since I
have been doing Linux sysadmin work for 10 years). I'm writing this in a
hurry, so it may look horrible, but it's better to report than to say
nothing, I guess.



I recently decided to upgrade one of my routers to the 2.4.30 kernel,
from the 2.4.26 it had been running up to now.
All went smoothly, except for one thing:
the router suddenly stopped sending traffic via its default route.
Looking at the ARP table, everything seemed OK; specifically,
the gateway IP's ARP record was cached as normal.
I tried deleting the ARP record for the gateway IP, and that
forced the router to send an ARP query for the peer (which it had not
sent for a long time), and traffic was sent to the gateway IP again!
However, this working state lasted semi-randomly, anywhere from several
minutes up to a few hours, according to my observation. After that time,
the router again lost track of the gateway IP (and manual deletion
of the cached ARP record for the gateway IP was needed again). It
occasionally recovered by itself (usually after a few hours), but then
everything just repeated with the same symptoms. And that cycle went on
for several days while I was investigating.
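
The manual workaround described above can be sketched roughly as follows.
This is only an illustration using the net-tools arp command: the address
5.6.7.8 is a placeholder for the gateway IP, and the function name and the
RUN variable are made up for this sketch, not taken from the report.

```shell
#!/bin/sh
# Sketch of the manual workaround: look at, then delete, the gateway's
# cached ARP entry so that the next outgoing packet triggers a new query.
# RUN defaults to "echo", so the commands are only printed; invoke with
# RUN= (empty) as root to actually execute them.
RUN="${RUN-echo}"

flush_gateway_arp() {
    gw="$1"
    # Show the cached entry (it looked normal even while traffic stalled).
    $RUN arp -n "$gw"
    # Delete the entry, forcing a fresh ARP resolution for the gateway.
    $RUN arp -d "$gw"
}

# 5.6.7.8 is a placeholder gateway address.
flush_gateway_arp "${GATEWAY:-5.6.7.8}"
```

Setting GATEWAY in the environment selects a different address.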

Here I have to say that the router in question has a somewhat 'odd'
routing table:
Most (direct, interface) routes are of the 'host route' type, added by a
command like:

ifconfig eth0 1.2.3.4 pointopoint 5.6.7.8 netmask 255.255.255.255

I know that according to the RFCs it shouldn't be done this way (and all
that stuff that says broadcast capability is required on Ethernet media;
indeed it is, but in a scenario with only two endpoints involved, ARP
broadcast capability is really the only thing necessary), but it works,
has worked for a long time, and is still used in some special situations
like mine (a small assigned range of 'public' IPs, where the effort to
save as many of those precious IPs as possible for real machines has led
to using private IPs on the router interfaces pointing to client
machines). Please also consider that all those 'client' Ethernet links
have just two endpoints (the client machine and the router interface).
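
For readers more used to iproute2, a roughly corresponding setup might look
like the sketch below. It is an assumption rather than part of the report,
which only uses ifconfig, and it is not an exact equivalent: ifconfig's
pointopoint keyword also puts the interface into point-to-point mode,
which 'ip addr add ... peer' does not. The addresses and eth0 are the same
placeholders as in the ifconfig example; the function name and the RUN
variable are made up for the sketch.

```shell
#!/bin/sh
# Rough iproute2 counterpart of the ifconfig host-route setup.
# RUN defaults to "echo", so the commands are only printed; invoke with
# RUN= (empty) as root to actually apply them.
RUN="${RUN-echo}"

add_host_route_addr() {
    local_ip="$1"; peer_ip="$2"; dev="$3"
    # A /32 peer address gives the interface a host route to the peer only.
    $RUN ip addr add "$local_ip" peer "$peer_ip/32" dev "$dev"
    # List the routes on the interface to check the result.
    $RUN ip route show dev "$dev"
}

# Placeholder addresses, as in the ifconfig example.
add_host_route_addr 1.2.3.4 5.6.7.8 eth0
```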

At first I suspected it might just be because of old e100 & e1000
drivers, so I replaced them (compiled from scratch, of course; I always
compile everything) with those published directly by Intel (that helped
in the case of 2.4.26, where I had another problem that turned out
to be a driver issue).
That didn't help.
As I didn't want to downgrade back to 2.4.26 *just* because of
this, I googled a bit more and came across a 'dual ARP records' issue
someone had reported with 2.4.28. Starting to suspect the ARP code,
I quickly "backported" the original neighbour/ARP handling
code from 2.4.26 to 2.4.30 by replacing the following files:

./include/net/neighbour.h
./net/core/neighbour.c
./net/ipv4/arp.c
./net/decnet/dn_route.c
./net/decnet/dn_neigh.c
./net/atm/proc.c
./net/netsyms.c

And that helped!
Now I see the router asking for the gateway IP's ARP at 1-minute
intervals again. And there is no more odd 'gateway IP suddenly not
responding & router sends no ARP queries after a minute' behaviour.
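
To see what actually differs between the two kernel versions in the files
listed above, something like the following could be used. It is a sketch
only: the function name is made up, and the tree directories are whatever
paths the unpacked 2.4.26 and 2.4.30 sources happen to live in.

```shell
#!/bin/sh
# Print a unified diff of the replaced neighbour/ARP files between two
# kernel source trees (old tree first, new tree second).
FILES="include/net/neighbour.h
net/core/neighbour.c
net/ipv4/arp.c
net/decnet/dn_route.c
net/decnet/dn_neigh.c
net/atm/proc.c
net/netsyms.c"

diff_neigh_files() {
    old="$1"; new="$2"
    for f in $FILES; do
        # Skip files that exist in neither tree.
        [ -e "$old/$f" ] || [ -e "$new/$f" ] || continue
        # -N treats a file missing from one tree as empty; diff exits
        # non-zero when the files differ, which is not an error here.
        diff -uN "$old/$f" "$new/$f" || true
    done
}
```

For example, with hypothetical directory names:
diff_neigh_files linux-2.4.26 linux-2.4.30 > neigh.diff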

Please note that I haven't observed this strange problem on other routers
here (already running 2.4.30), which, however, have a more 'common' setup
than the one described above (usual routes with commonly used matching
netmasks & broadcast addresses).

If I should investigate more thoroughly (if you want exact reports
from me), please let me know.

Do you think you could look at this issue?

Thanks & best regards,

Jan



--- End Message ---
