Re: 802.1q Was (Re: Plans for 2.5 / 2.6 ???

To: Andrey Savochkin <saw@xxxxxxxxxxxxx>
Subject: Re: 802.1q Was (Re: Plans for 2.5 / 2.6 ???
From: Mitchell Blank Jr <mitch@xxxxxxxxxx>
Date: Mon, 5 Jun 2000 05:35:33 -0700
Cc: Ben Greear <greearb@xxxxxxxxxxxxxxx>, rob@xxxxxxxxxxx, buytenh@xxxxxxx, netdev@xxxxxxxxxxx, gleb@xxxxxxxxxxxxxxxxxxxxx, jamal <hadi@xxxxxxxxxx>
In-reply-to: <>; from on Mon, Jun 05, 2000 at 10:26:27AM +0800
References: <> <> <> <>
Sender: owner-netdev@xxxxxxxxxxx
Andrey Savochkin wrote:
> I want to add my $0.02.

Certainly appreciated.

> Network devices in the second sense is only an abstraction.

Well, everything in networking is an abstraction.. we're just trying to
pick the right one. :-)

> The Linux kernel does not bind IP addresses to devices.

OK, I was being a bit sloppy with the terminology when stating my case.
What I was trying to demonstrate is why ATM cards aren't net_devices
currently.  If we consider that "net_devices" are things like "eth0",
"ppp0", or "lo", then an ATM card isn't at the same level.  L3 protocols
don't run directly on top of an ATM card... you can certainly run
protocols capable of doing L3 on top of an ATM card (say, CLIP, for
example), but there isn't a strict 1-to-1 or even N-to-1 correspondence
between these protocols and cards (you could have two CLIP networks running
over a single card, a CLIP network with two PVCs on different ATM cards,
or any combination of the above).  So it's pretty clear
that the net_device has to be the CLIP (or LANE, or whatever) network,
not the raw ATM card.

Now, using this form of analysis, at what level is an ethernet VLAN?
Well, what can a VLAN do?  Well, it can implement the SIOC[GS]* ioctls
(i.e. it can be taken up or down), it can keep a net_dev_stats,
it can have entries in the ARP table (or AARP for that matter),
it can participate in a bridge group (net_bridge_port->dev),
it can report separate statistics via SNMP (and thus should have
its own dev->ifindex), it can want different settings in
/proc/sys/net/ipv4/{neigh,conf}/DEV/, can have separate IPv6
networks with different autoconfiguration (see ipv6/addrconf.c),
it could have IPX networks of different ipx_dlink_type's, we
could want to bind an AF_PACKET socket to it (tcpdump, etc),
we could want to make filtering or policy route decisions based
on incoming VLAN, etc.

In short, a VLAN can do just about anything a "real" ethernet interface
can do - the only exceptions are that it probably shouldn't be split
into another level of VLANs (well, it could, but what other switch or
OS would support such a thing? :-), and it needs to coordinate with the
master device to handle multicast/promiscuous mode.  So I think that any
solution that suggests VLAN devices should be completely different from
physical network devices (in the eyes of userland and all network
protocols) is SERIOUSLY suspicious.

The other thing is how would it work?  Assuming all the VLANs were
part of one net_device, what happens when a packet gets output to
that device?  Well, we have to determine the VLAN ID somehow, so
we'll need each device to implement an ARP-like table to map the
destination hwaddr to a vlan id, right?  Remember this has to happen
for every packet - when using normal net_devices, the destination
gets cached with each connected socket so this adds a per-packet
lookup where there was none before.
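To make that per-packet cost concrete, here's a rough sketch (names, table size, and the single-slot-per-bucket layout are all invented for illustration) of the kind of hwaddr-to-VLAN-ID table such a device would have to consult on every outgoing frame:

```c
#include <stdint.h>
#include <string.h>

#define VLAN_MAP_BITS 6
#define VLAN_MAP_SIZE (1 << VLAN_MAP_BITS)

struct vlan_map_entry {
    uint8_t  hwaddr[6];   /* destination MAC address */
    uint16_t vlan_id;     /* 802.1q VLAN ID (0 = empty slot) */
};

static struct vlan_map_entry vlan_map[VLAN_MAP_SIZE];

/* Trivial hash of the low bytes of the MAC address. */
static unsigned vlan_hash(const uint8_t *hw)
{
    return (hw[4] ^ hw[5]) & (VLAN_MAP_SIZE - 1);
}

/* Simplification: one entry per bucket; a collision just evicts. */
void vlan_map_add(const uint8_t *hw, uint16_t id)
{
    struct vlan_map_entry *e = &vlan_map[vlan_hash(hw)];
    memcpy(e->hwaddr, hw, 6);
    e->vlan_id = id;
}

/* The lookup that would run for EVERY transmitted frame; returns 0
 * if the destination is unknown. */
uint16_t vlan_map_lookup(const uint8_t *hw)
{
    struct vlan_map_entry *e = &vlan_map[vlan_hash(hw)];
    if (e->vlan_id && memcmp(e->hwaddr, hw, 6) == 0)
        return e->vlan_id;
    return 0;
}
```

With per-VLAN net_devices none of this exists: the VLAN ID falls out of which device the route (and thus the cached dst) points at.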

I think a reasonable goal for VLAN support would be that a machine
with one ethernet card on two VLANs should be able to do pretty much
the same as a machine with two separate ethernet cards, with similar
configuration commands.  That is, after all, the promise of VLANs.
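For instance, with Ben's 802.1q patch the configuration could look something like this (a sketch only - the addresses are made up, and the exact vconfig syntax may differ):

```shell
# Put one physical card on two VLANs, each looking like its own interface:
vconfig add eth0 2              # creates VLAN device eth0.2 (VLAN ID 2)
vconfig add eth0 3              # creates VLAN device eth0.3 (VLAN ID 3)
ifconfig eth0.2 10.0.2.1 netmask 255.255.255.0 up
ifconfig eth0.3 10.0.3.1 netmask 255.255.255.0 up
```

Compare that with configuring two physical cards: it's the same two ifconfig lines, just with eth0/eth1 instead of eth0.2/eth0.3.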

> Netfilter isn't a big problem, either.  A specific VLAN-id matching netfilter
> module is a clean and powerful solution.

Yes (and we definitely need per-interface in/out netfilter hooks like
a "real" router in order to handle lots of ports), but that's just
one of the many differences listed above.

> It misses one of the most important properties of network devices -
> flow control.  Any code that doesn't provide flow control isn't a device,
> but code just manipulating packet contents.

That's nice in theory, but it doesn't work in practice.  The idea of
what makes a net_device includes a lot of things (do a grep for
net_device in net/*/*.[ch] sometime).  MANY protocols rely on each
LAN having its own net_device.

Flow control might not be perfect in VLANs, but that's really the
least of its problems.  The Linux model of flow control can
really only handle simple devices that can be modeled as a FIFO
queue.  Take for example the current mess we have in the ATM
stack - suppose you have an ATM net_device (can be CLIP or LANE,
doesn't matter) with two PVCs to host1 and host2.  These go
across different networks, and thus have different QoS available -
host1 is across a frac-T1 link while host2 is right on our local
OC-12 switch.  Suppose host1 gets enough traffic that the PVC
becomes full - what do we do?
  1. We could netif_stop_queue, but that would shut off all
     connectivity to host2, even though we could easily get
     packets to it
  2. We could just drop packets to host1, but then we're
     not providing the necessary backpressure to make net
     schedulers work

Now before someone says "that's why ATM cards should be net_devices
and their protocols shouldn't be", keep in mind that ATM devices
provide packet scheduling to maintain the QoS on each VC, so
using the 1-1 correspondence between flow-controlled FIFO queues and
net_devices we might need *thousands* of net_devices to really
model a single ATM card.

So I agree that flow control is an important issue, but really we
need to separate it from net_device anyway.  Ideally it could be
abstracted to a level where the ATM code could implement it on
each VC itself rather than putting it before the net_device,
so whether a packet needs to be queued would be decided based on
which VC we want to send the data down.  This would also have
the small advantage of liberating devices where flow control
doesn't make sense anyway (lo, dummy).
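A minimal sketch of what I mean, with all names and the queue-depth scheme invented for illustration: each VC carries its own flow-control state, so filling the frac-T1 PVC to host1 stalls only that VC, never the whole device.

```c
/* Hypothetical per-VC flow-control state, decoupled from net_device. */
struct vc {
    int      stopped;   /* nonzero while this VC's queue is full */
    unsigned queued;    /* frames currently queued on this VC */
    unsigned limit;     /* depth at which we assert backpressure */
};

/* Returns 1 if the frame was accepted, 0 if this VC is exerting
 * backpressure (the caller requeues, as a qdisc would). */
int vc_xmit(struct vc *vc)
{
    if (vc->stopped)
        return 0;
    if (++vc->queued >= vc->limit)
        vc->stopped = 1;        /* stop only THIS VC, not the device */
    return 1;
}

/* Called when the card drains a frame from this VC. */
void vc_tx_done(struct vc *vc)
{
    if (vc->queued)
        vc->queued--;
    if (vc->queued < vc->limit)
        vc->stopped = 0;
}
```

The point is that backpressure propagates per VC, so the net scheduler still sees it, without tying the flow-control granularity to the net_device granularity.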
