netdev

Re: netlink drops messages.

To: Andi Kleen <ak@xxxxxx>
Subject: Re: netlink drops messages.
From: "James R. Leu" <jleu@xxxxxxxxxxxxxx>
Date: Wed, 17 Jan 2001 12:54:05 -0600
Cc: Werner Almesberger <Werner.Almesberger@xxxxxxx>, kuznet@xxxxxxxxxxxxx, gleb@xxxxxxxxxxx, netdev@xxxxxxxxxxx
In-reply-to: <20010117184852.A7146@fred.local>; from ak@muc.de on Wed, Jan 17, 2001 at 06:48:52PM +0100
Organization: none
References: <20010116124319.D1299@doit.wisc.edu> <200101161848.VAA31729@ms2.inr.ac.ru> <20010116131456.E1299@doit.wisc.edu> <20010117040542.Y18286@almesberger.net> <20010117121532.B1830@fred.local> <20010117074952.C2459@doit.wisc.edu> <20010117171310.A5589@fred.local> <20010117110035.G2459@doit.wisc.edu> <20010117184852.A7146@fred.local>
Reply-to: jleu@xxxxxxxxxxxxxx
Sender: owner-netdev@xxxxxxxxxxx
Hello,

More comments at the bottom:

On Wed, Jan 17, 2001 at 06:48:52PM +0100, Andi Kleen wrote:
> On Wed, Jan 17, 2001 at 06:00:42PM +0100, James R. Leu wrote:
> > Hello,
> > 
> > Comments at the bottom:
> > 
> > On Wed, Jan 17, 2001 at 05:13:10PM +0100, Andi Kleen wrote:
> > > On Wed, Jan 17, 2001 at 02:51:19PM +0100, James R. Leu wrote:
> > > > Hello, 
> > > > 
> > > > On Wed, Jan 17, 2001 at 12:15:32PM +0100, Andi Kleen wrote:
> > > > > On Wed, Jan 17, 2001 at 04:08:14AM +0100, Werner Almesberger wrote:
> > > > > > James R. Leu wrote:
> > > > > > > I'm not asking for the impossible.  Sequence numbers and/or client
> > > > > > > to server ACKs would solve the problem.
> > > > > > 
> > > > > > > So what do you do when the client doesn't ACK and you run out of buffer
> > > > > > > space ? Block all activities that may trigger netlink messages ?
> > > > > > 
> > > > > > > Obviously, in this case (interface up/down transitions), netlink doesn't
> > > > > > > scale well. A state-based interface would be better, e.g. netlink could
> > > > > > > generate a bit vector indicating the states (or the transitions, if it
> > > > > > > matters whether any have occurred), and update the vector until it has
> > > > > > > been read by the client. The question is of course whether we really
> > > > > > > need an optimized, scalable solution for this.
> > > > > 
> > > > > > A simple way is to delete ip addresses when you down an interface and use
> > > > > > regular SIOCGIFCONF.
> > > > 
> > > > > That is basically a dump of the entire interface table!  If we are talking
> > > > > about 16K interfaces that is an awful lot of work just because an interface
> > > > > went down or up.
> > > 
> > > 
> > > > The thread was: when only a few interfaces go up/down then netlink messages
> > > > work fine.
> > > > Then someone complained that the netlink buffer overflows when too many interfaces
> > > > go up/down. In this case you can do a whole resynchronization regularly (e.g. every
> > > > minute) and do less work overall.
> > 
> > Sorry for taking the previous comment out of context.
> > 
> > As far as your last comment "resynchronization regularly": I disagree
> > with it as well. :-)
> > 
> > The reason a notification system like netlink is created is to prevent the
> > clients from polling the kernel and doing egregious dumps of information.
> >
> > Simply pushing it off to the clients by making them poll for this information
> > is a hack.  A client should only have to dump the interface or routing table
> > when it first connects; from then on its view of the interface and routing
> > table should be kept consistent via incremental and timely updates.  Period.
> >
> > If netlink cannot provide incremental, reliable and timely updates about the
> > status of interfaces and routes then we should change it so it can.
> 
> I don't think polling is a hack. It's just that a single strategy for
> synchronizing information is not the best in all cases. When you have lots
> of data and only minor bits change then it is best to only transmit the
> differences. When most of the data changes it is better to just dump the
> whole data regularly. 
> 
> Netlink supports both strategies; in addition, you can use SIOCGIFCONF, which
> is a bit cheaper for the dump case because it doesn't queue.
> Netlink even tells you when you should switch strategies.
>
> What you should implement in your application depends on what you need to
> handle. Usually you will implement both strategies, because netlink requires
> you to support resynchronization anyway, since it's not reliable.
> SIOCGIFCONF is just another optional optimization for the dump case.
> 
> None of this is a hack. 

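For readers following the argument, the two-strategy client Andi describes
can be sketched roughly as below. This is only an illustration under stated
assumptions, not code from the thread: InterfaceView, dump and receive are
hypothetical names, the "socket" is a scripted stand-in, and the overflow
signal is modelled as an OSError carrying ENOBUFS, which is the error a
netlink receive reports after messages have been dropped.

```python
import errno


class InterfaceView:
    """Client-side copy of interface state (name -> 'up' or 'down')."""

    def __init__(self, dump):
        self.dump = dump            # hypothetical callable returning the full table
        self.state = dict(dump())   # initial synchronization: one full dump
        self.resyncs = 0

    def apply(self, event):
        """Apply one incremental update, given as (name, new_state)."""
        name, new_state = event
        self.state[name] = new_state

    def pump(self, receive):
        """Drain `receive` until it returns None, resyncing on overflow."""
        while True:
            try:
                event = receive()
            except OSError as exc:
                if exc.errno != errno.ENOBUFS:
                    raise
                # Updates were lost, so the incremental view cannot be
                # trusted any more: fall back to a full dump.
                self.state = dict(self.dump())
                self.resyncs += 1
                continue
            if event is None:
                return
            self.apply(event)


# Simulated kernel table and a scripted receive function (both assumptions).
kernel = {"eth0": "up", "eth1": "up"}
view = InterfaceView(lambda: kernel)

kernel["eth0"] = "down"
kernel["eth1"] = "down"   # the notification for this change is the one dropped
script = [
    ("eth0", "down"),
    OSError(errno.ENOBUFS, "receive buffer overflowed"),
    None,
]


def receive():
    item = script.pop(0)
    if isinstance(item, Exception):
        raise item
    return item


view.pump(receive)
print(view.state, view.resyncs)
```

The cost James objects to is visible in pump(): a single lost message
forces a re-read of the whole table, however little actually changed.
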
If an IP routing stack running on Linux ever hopes to achieve sub-50ms
end-to-end re-routing with 80K routes and 16K interfaces, netlink will have
to guarantee reliable, incremental and timely updates.  If netlink cannot
do it, routing software developers will either look to another operating
system that can provide this, or will completely bypass the Linux IP stack
and relegate Linux to being an embedded trampoline for running their own
IP stack in userland.

I still assert that netlink's requirement for a client to re-read the entire
interface or routing table to maintain synchronization is a hack.
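
One way to avoid the full re-read is the state-based bit vector Werner
suggested earlier in this thread. The sketch below is only a user-space
illustration of that idea under my own assumptions: LinkStateVector,
set_link and read are hypothetical names, not an existing kernel or netlink
interface. State is coalesced per interface, so any number of flaps costs a
fixed amount of memory and a read returns only what changed.

```python
class LinkStateVector:
    """Per-interface 'up' and 'changed' bits, coalesced until read."""

    def __init__(self, n_interfaces):
        self.n = n_interfaces
        self.up = 0        # bit i set: interface i is currently up
        self.changed = 0   # bit i set: interface i changed since the last read

    def set_link(self, index, is_up):
        """Record a transition; repeated flaps need no extra memory."""
        if not 0 <= index < self.n:
            raise IndexError(index)
        bit = 1 << index
        self.up = (self.up | bit) if is_up else (self.up & ~bit)
        self.changed |= bit

    def read(self):
        """Return {index: is_up} for changed interfaces, then clear the marks."""
        changed, self.changed = self.changed, 0
        result = {}
        index = 0
        while changed:
            if changed & 1:
                result[index] = bool((self.up >> index) & 1)
            changed >>= 1
            index += 1
        return result


vec = LinkStateVector(16 * 1024)
for _ in range(1000):          # interface 7 flaps a thousand times
    vec.set_link(7, False)
    vec.set_link(7, True)
vec.set_link(9000, False)

first = vec.read()             # only the two touched interfaces are reported
second = vec.read()            # nothing changed since the previous read
print(first, second)
```

The trade-off is that the client sees the latest state of each interface
rather than every individual transition.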

> -Andi

Jim
-- 
James R. Leu
