From owner-netdev@oss.sgi.com Sun Jul 1 06:39:53 2001
Received: (from majordomo@localhost)
	by oss.sgi.com (8.11.2/8.11.3) id f61Ddrg06095
	for netdev-outgoing; Sun, 1 Jul 2001 06:39:53 -0700
Received: from horus.its.uow.edu.au (horus.its.uow.edu.au [130.130.68.25])
	by oss.sgi.com (8.11.2/8.11.3) with SMTP id f61DdTV06028
	for ; Sun, 1 Jul 2001 06:39:29 -0700
Received: from uow.edu.au (wumpus.its.uow.edu.au [130.130.68.12])
	by horus.its.uow.edu.au (8.9.3/8.9.3) with ESMTP id XAA11578;
	Sun, 1 Jul 2001 23:38:20 +1000 (EST)
Message-ID: <3B3F27B7.D5FCBE75@uow.edu.au>
Date: Sun, 01 Jul 2001 23:37:59 +1000
From: Andrew Morton
X-Mailer: Mozilla 4.76 [en] (X11; U; Linux 2.4.5 i686)
X-Accept-Language: en
MIME-Version: 1.0
To: Jeff Garzik
CC: "David S. Miller", netdev@oss.sgi.com, Alan Cox, Linus Torvalds,
	Russell King, ionut@cs.columbia.edu, Manfred Spraul,
	hfhsu@sis.com.tw, p_gortmaker@yahoo.com, tori@unhappy.mine.nu,
	breed@users.sourceforge.net, linux-tr@linuxtr.net
Subject: Re: alloc_etherdev breaks ether=
References: <3B3C0137.1896A895@uow.edu.au> <3B3C9089.D85A26@mandrakesoft.com>
Content-Type: text/plain; charset=us-ascii
Content-Transfer-Encoding: 7bit
Sender: owner-netdev@oss.sgi.com
Precedence: bulk

OK, here's a first cut at fixing this.

Background for newcomers: the new alloc_etherdev() API broke the
`ether=' boot option because the name of the netdevice is not known
at the start of probing.

Patch is tested fairly thoroughly with 3c59x.  The /sbin/hotplug stuff
works OK.  Modular and statically-linked drivers see the `ether='
commandline correctly.  Failure of probe() works OK.

The patch does the following:

- Makes alloc_etherdev() instantiate the device name, and registers
  the device in `hidden' state.  So the device is on the dev_base list,
  but is not visible to open() attempts.  This also has the effect of
  making the final device name known at the start of probe(), and
  reserves it from other, parallel probe methods.

  When a device is registered in `hidden' state we do *not* run
  /sbin/hotplug or the protocol notification.  The device isn't ready
  yet.

- Create new `publish_netdev()' API function.  This unhides the
  previously registered device and runs /sbin/hotplug and the protocol
  notifier.

- If the probe function fails, the prober must call unregister_netdev()
  on the device, then kfree it.  Consequently:

- unregister_netdev() will not run /sbin/hotplug and the protocol
  notifiers for a hidden device: it assumes the device has never really
  existed, and that the initial /sbin/hotplug and notifier calls were
  never made.

- Converts all users of alloc_etherdev() and alloc_trdev() to use the
  new scheme.

- Goes through all the (many) places where we traverse the dev_base
  list and tests `netif_device_hidden()' where it seems necessary to do
  so.  If this is simply too putrid we can put together a
  `for_each_netdevice' macro.

The "new scheme" is:

	xxx_probe()
	{
		dev = alloc_etherdev(sizeof(private));
			/* allocates final name, reserves it, registers the
			   device in `hidden' mode.  Does not run /sbin/hotplug */
		if (dev == 0)
			return -ESOMETHING;
		if (initialise_more_stuff() < 0)
			goto out_free_dev;
		publish_netdev(dev);
			/* Make device available to open, run /sbin/hotplug and
			   protocol notifiers */
		return 0;

	out_free_dev:
		unregister_netdev(dev);	/* Does not run /sbin/hotplug, notifier */
		kfree(dev);
		return -ESOMETHINGELSE;
	}

A few things I noticed going through the various files:

- tulip_core.c:tulip_mwi_config() local variable csr0 was used
  uninitialised.  Took a punt and set it to zero.

- fealnx.c: fealnx_pci_tbl[] __devinitdata was in the wrong place.

- starfire.c was calling unregister_netdevice() on the output of
  alloc_etherdev().  This *used* to be incorrect.

- drivers/net/Config.in allows me to select drivers/net/am79c961a.c
  on x86 platform.  It'll only compile on ARM.  I didn't fix this.

- 8139too's setup scheme is a little different from the others.  Jeff,
  please check carefully...

- I did *not* go through the affected drivers making them use
  dev->name in their printk's again.  Later.

- I adopted the convention that publish_netdev() is called immediately
  prior to successful return from probe().  Fortunately it cannot fail,
  so there was a bit of cleaning up permitted in various drivers.

- The 3c59x.c patch contains some extra stuff:

  - Instead of disabling scatter/gather if CONFIG_HIGHMEM is turned on,
    do it at runtime by checking the return value of
    nr_free_highpages().  This has the effect of permitting zerocopy to
    be used on highmem kernels if they're running on non-highmem
    hardware.

  - Decrease verbosity of boot messages from ~7 lines to two:

      3c59x: Donald Becker and others. www.scyld.com/network/vortex.html
      eth0: 3Com PCI 3c980 Cyclone at 0xd400. Vers LK1.1.16

    They can be restored by increasing the debug level.

  - debug level can be increased from `ether=' option, as per the docco.


--- linux-2.4.6-pre8/include/linux/netdevice.h	Wed May  2 22:00:07 2001
+++ linux-akpm/include/linux/netdevice.h	Sun Jul  1 23:24:18 2001
@@ -202,7 +202,8 @@ enum netdev_state_t
 	__LINK_STATE_START,
 	__LINK_STATE_PRESENT,
 	__LINK_STATE_SCHED,
-	__LINK_STATE_NOCARRIER
+	__LINK_STATE_NOCARRIER,
+	__LINK_STATE_HIDDEN,	/* Exists, but may not be opened */
 };
 
 
@@ -626,6 +627,11 @@ static inline void netif_device_attach(s
 	}
 }
 
+static inline int netif_device_hidden(struct net_device *dev)
+{
+	return test_bit(__LINK_STATE_HIDDEN, &dev->state);
+}
+
 /* These functions live elsewhere (drivers/net/net_init.c, but related) */
 
 extern void		ether_setup(struct net_device *dev);
@@ -636,6 +642,7 @@ extern void fc_freedev(struct net_devic
 /* Support for loadable net-drivers */
 extern int		register_netdev(struct net_device *dev);
 extern void		unregister_netdev(struct net_device *dev);
+extern void		publish_netdev(struct net_device *dev);
 /* Functions used for multicast support */
 extern void		dev_mc_upload(struct net_device *dev);
 extern int		dev_mc_delete(struct net_device *dev, void *addr, int alen, int all);
--- linux-2.4.6-pre8/drivers/net/net_init.c	Mon May 28 13:31:47 2001
+++ linux-akpm/drivers/net/net_init.c	Sun Jul  1 22:08:19 2001
@@ -29,6 +29,8 @@
 			allocation setups. Abolished the 16 card limits.
 	03/19/2000 - jgarzik and Urban Widmark: init_etherdev 32-byte align
 	03/21/2001 - jgarzik: alloc_etherdev and friends
+	07/01/2001 - akpm: adjust alloc_etherdev() to register the device
+			in `hidden' mode.
 
 */
 
@@ -70,8 +72,7 @@
  */
 
 
-static struct net_device *alloc_netdev(int sizeof_priv, const char *mask,
-				       void (*setup)(struct net_device *))
+static struct net_device *init_alloc_dev(int sizeof_priv)
 {
 	struct net_device *dev;
 	int alloc_size;
@@ -91,32 +92,27 @@ static struct net_device *alloc_netdev(i
 	if (sizeof_priv)
 		dev->priv = (void *) (((long)(dev + 1) + 31) & ~31);
 
-	setup(dev);
-	strcpy(dev->name, mask);
-
 	return dev;
 }
 
-static struct net_device *init_alloc_dev(int sizeof_priv)
+static struct net_device *alloc_netdev(int sizeof_priv, const char *mask,
+				       void (*setup)(struct net_device *))
 {
 	struct net_device *dev;
-	int alloc_size;
-
-	/* ensure 32-byte alignment of the private area */
-	alloc_size = sizeof (*dev) + sizeof_priv + 31;
 
-	dev = (struct net_device *) kmalloc (alloc_size, GFP_KERNEL);
-	if (dev == NULL)
-	{
-		printk(KERN_ERR "alloc_dev: Unable to allocate device memory.\n");
-		return NULL;
+	dev = init_alloc_dev(sizeof_priv);
+	if (dev) {
+		set_bit(__LINK_STATE_HIDDEN, &dev->state);
+		/* register_netdev() will allocate the final interface name */
+		if (register_netdev(dev) < 0) {
+			kfree(dev);
+			dev = NULL;
+			goto out;
+		}
+		netdev_boot_setup_check(dev);
+		(*setup)(dev);
 	}
-
-	memset(dev, 0, alloc_size);
-
-	if (sizeof_priv)
-		dev->priv = (void *) (((long)(dev + 1) + 31) & ~31);
-
+out:
 	return dev;
 }
 
@@ -210,7 +206,9 @@ struct net_device *init_etherdev(struct
  *		for this ethernet device
  *
  *	Fill in the fields of the device structure with ethernet-generic
- *	values. Basically does everything except registering the device.
+ *	values. The device is then registered so that its name is known
+ *	and reserved. However it remains in `hidden' state until it is
+ *	made available to opens via publish_netdev().
  *
  *	Constructs a new net device, complete with a private data area of
  *	size @sizeof_priv. A 32-byte (not bit) alignment is enforced for
--- linux-2.4.6-pre8/net/core/dev.c	Sun Jul  1 16:11:26 2001
+++ linux-akpm/net/core/dev.c	Sun Jul  1 21:05:52 2001
@@ -399,17 +399,25 @@ __setup("netdev=", netdev_boot_setup);
  */
 
 
-struct net_device *__dev_get_by_name(const char *name)
+static struct net_device *____dev_get_by_name(const char *name, int allow_hidden)
 {
 	struct net_device *dev;
 
 	for (dev = dev_base; dev != NULL; dev = dev->next) {
-		if (strncmp(dev->name, name, IFNAMSIZ) == 0)
+		if (strncmp(dev->name, name, IFNAMSIZ) == 0) {
+			if (!allow_hidden && netif_device_hidden(dev))
+				continue;
 			return dev;
+		}
 	}
 	return NULL;
 }
 
+struct net_device *__dev_get_by_name(const char *name)
+{
+	return ____dev_get_by_name(name, 0);
+}
+
 /**
  *	dev_get_by_name - find a device by its name
  *	@name: name to find
@@ -479,7 +487,7 @@ struct net_device * __dev_get_by_index(i
 	struct net_device *dev;
 
 	for (dev = dev_base; dev != NULL; dev = dev->next) {
-		if (dev->ifindex == ifindex)
+		if (dev->ifindex == ifindex && !netif_device_hidden(dev))
 			return dev;
 	}
 	return NULL;
@@ -530,7 +538,8 @@ struct net_device *dev_getbyhwaddr(unsig
 
 	for (dev = dev_base; dev != NULL; dev = dev->next) {
 		if (dev->type == type &&
-		    memcmp(dev->dev_addr, ha, dev->addr_len) == 0)
+		    memcmp(dev->dev_addr, ha, dev->addr_len) == 0 &&
+		    !netif_device_hidden(dev))
 			return dev;
 	}
 	return NULL;
@@ -558,7 +567,7 @@ int dev_alloc_name(struct net_device *de
 	 */
 	for (i = 0; i < 100; i++) {
 		sprintf(buf,name,i);
-		if (__dev_get_by_name(buf) == NULL) {
+		if (____dev_get_by_name(buf, 1) == NULL) {
 			strcpy(dev->name, buf);
 			return i;
 		}
@@ -1624,6 +1633,8 @@ static int dev_ifconf(char *arg)
 
 	total = 0;
 	for (dev = dev_base; dev != NULL; dev = dev->next) {
+		if (netif_device_hidden(dev))
+			continue;
 		for (i=0; ideadbeaf = 0;
 	write_unlock_bh(&dev_base_lock);
 
+	if (!netif_device_hidden(dev)) {
+		/* Notify protocols, that a new device appeared. */
+		notifier_call_chain(&netdev_chain, NETDEV_REGISTER, dev);
+		net_run_sbin_hotplug(dev, "register");
+	}
+	return 0;
+}
+
+/**
+ *	publish_netdev - publish a registered but hidden netdevice
+ *	@dev: device to register
+ *
+ *	Take a completed network device structure and make it available for
+ *	opening. A %NETDEV_REGISTER message is sent to the netdev notifier
+ *	chain. 0 is returned on success. A negative errno code is returned
+ *	on a failure.
+ */
+
+void publish_netdev(struct net_device *dev)
+{
+	rtnl_lock();
+	if (!netif_device_hidden(dev)) {
+		printk(KERN_ERR "Unhidden device `%s' passed to "
+				__FUNCTION__ "\n", dev->name);
+	}
+	clear_bit(__LINK_STATE_HIDDEN, &dev->state);
 	/* Notify protocols, that a new device appeared. */
 	notifier_call_chain(&netdev_chain, NETDEV_REGISTER, dev);
 
-	net_run_sbin_hotplug(dev, "register");
-
-	return 0;
+	rtnl_unlock();
 }
 
+
 /**
  *	netdev_finish_unregister - complete unregistration
  *	@dev: device
@@ -2538,13 +2575,15 @@ int unregister_netdevice(struct net_devi
 	/* Shutdown queueing discipline. */
 	dev_shutdown(dev);
 
-	net_run_sbin_hotplug(dev, "unregister");
-
-	/* Notify protocols, that we are about to destroy
-	   this device. They should clean all the things.
-	 */
-	notifier_call_chain(&netdev_chain, NETDEV_UNREGISTER, dev);
+	if (!netif_device_hidden(dev)) {
+		net_run_sbin_hotplug(dev, "unregister");
+		/* Notify protocols, that we are about to destroy
+		 * this device. They should clean all the things.
+		 */
+		notifier_call_chain(&netdev_chain,
+					NETDEV_UNREGISTER, dev);
+	}
 
 	/*
 	 *	Flush the multicast chain
 	 */
--- linux-2.4.6-pre8/net/core/dev_mcast.c	Tue Oct 17 06:43:50 2000
+++ linux-akpm/net/core/dev_mcast.c	Sun Jul  1 23:12:36 2001
@@ -228,6 +228,8 @@ static int dev_mc_read_proc(char *buffer
 
 	read_lock(&dev_base_lock);
 	for (dev = dev_base; dev; dev = dev->next) {
+		if (netif_device_hidden(dev))
+			continue;
 		spin_lock_bh(&dev->xmit_lock);
 		for (m = dev->mc_list; m; m = m->next) {
 			int i;
--- linux-2.4.6-pre8/net/core/rtnetlink.c	Mon Feb 28 13:45:10 2000
+++ linux-akpm/net/core/rtnetlink.c	Sun Jul  1 23:13:08 2001
@@ -215,6 +215,8 @@ int rtnetlink_dump_ifinfo(struct sk_buff
 
 	read_lock(&dev_base_lock);
 	for (dev=dev_base, idx=0; dev; dev = dev->next, idx++) {
+		if (netif_device_hidden(dev))
+			continue;
 		if (idx < s_idx)
 			continue;
 		if (rtnetlink_fill_ifinfo(skb, dev, RTM_NEWLINK, NETLINK_CB(cb->skb).pid, cb->nlh->nlmsg_seq, 0) <= 0)
--- linux-2.4.6-pre8/net/ipv4/igmp.c	Wed May  2 22:00:08 2001
+++ linux-akpm/net/ipv4/igmp.c	Sun Jul  1 23:15:22 2001
@@ -787,9 +787,14 @@ int ip_mc_procinfo(char *buffer, char **
 
 	read_lock(&dev_base_lock);
 	for(dev = dev_base; dev; dev = dev->next) {
-		struct in_device *in_dev = in_dev_get(dev);
-		char *querier = "NONE";
+		struct in_device *in_dev;
+		char *querier;
 
+		if (netif_device_hidden(dev))
+			continue;
+
+		in_dev = in_dev_get(dev);
+		querier = "NONE";
 		if (in_dev == NULL)
 			continue;
 
--- linux-2.4.6-pre8/net/ipv4/devinet.c	Mon May 28 13:31:50 2001
+++ linux-akpm/net/ipv4/devinet.c	Sun Jul  1 23:25:26 2001
@@ -731,6 +731,9 @@ u32 inet_select_addr(const struct net_de
 	read_lock(&dev_base_lock);
 	read_lock(&inetdev_lock);
 	for (dev=dev_base; dev; dev=dev->next) {
+		/* typecast avoids a compiler constness warning */
+		if (netif_device_hidden((struct net_device *)dev))
+			continue;
 		if ((in_dev=__in_dev_get(dev)) == NULL)
 			continue;
 
@@ -878,6 +881,8 @@ static int inet_dump_ifaddr(struct sk_bu
 	s_ip_idx = ip_idx = cb->args[1];
 	read_lock(&dev_base_lock);
 	for (dev=dev_base, idx=0; dev; dev = dev->next, idx++) {
+		if (netif_device_hidden(dev))
+			continue;
 		if (idx < s_idx)
 			continue;
 		if (idx > s_idx)
@@ -982,6 +987,8 @@ void inet_forward_change()
 	read_lock(&dev_base_lock);
 	for (dev = dev_base; dev; dev = dev->next) {
 		struct in_device *in_dev;
+		if (netif_device_hidden(dev))
+			continue;
 		read_lock(&inetdev_lock);
 		in_dev = __in_dev_get(dev);
 		if (in_dev)
--- linux-2.4.6-pre8/net/sched/sch_api.c	Tue Jan  2 04:57:08 2001
+++ linux-akpm/net/sched/sch_api.c	Sun Jul  1 23:19:47 2001
@@ -804,6 +804,8 @@ static int tc_dump_qdisc(struct sk_buff
 	s_q_idx = q_idx = cb->args[1];
 	read_lock(&dev_base_lock);
 	for (dev=dev_base, idx=0; dev; dev = dev->next, idx++) {
+		if (netif_device_hidden(dev))
+			continue;
 		if (idx < s_idx)
 			continue;
 		if (idx > s_idx)
--- linux-2.4.6-pre8/arch/sparc64/solaris/ioctl.c	Wed Nov 29 16:53:44 2000
+++ linux-akpm/arch/sparc64/solaris/ioctl.c	Sun Jul  1 23:21:40 2001
@@ -670,7 +670,11 @@ static inline int solaris_i(unsigned int
 			int i = 0;
 
 			read_lock_bh(&dev_base_lock);
-			for (d = dev_base; d; d = d->next) i++;
+			for (d = dev_base; d; d = d->next) {
+				if (netif_device_hidden(d))
+					continue;
+				i++;
+			}
 			read_unlock_bh(&dev_base_lock);
 
 			if (put_user (i, (int *)A(arg)))
--- linux-2.4.6-pre8/net/netsyms.c	Sun Jul  1 16:11:26 2001
+++ linux-akpm/net/netsyms.c	Sun Jul  1 20:43:06 2001
@@ -456,6 +456,7 @@ EXPORT_SYMBOL(unregister_netdevice_notif
 EXPORT_SYMBOL(loopback_dev);
 EXPORT_SYMBOL(register_netdevice);
 EXPORT_SYMBOL(unregister_netdevice);
+EXPORT_SYMBOL(publish_netdev);
 EXPORT_SYMBOL(netdev_state_change);
 EXPORT_SYMBOL(dev_new_index);
 EXPORT_SYMBOL(dev_get_by_index);
--- linux-2.4.6-pre8/drivers/net/3c59x.c	Sun Jul  1 16:11:24 2001
+++ linux-akpm/drivers/net/3c59x.c	Sun Jul  1 23:07:11 2001
@@ -149,6 +149,11 @@
     - Implemented alloc_etherdev() API
     - Special-case the 'Tx error 82' message.
 
+   LK1.1.16 1 July 2001 akpm
+    - Make NETIF_F_SG dependent upon nr_free_highpages(), not on CONFIG_HIGHMEM
+    - Lessen verbosity of bootup messages
+    - Use publish_netdev() API
+
     - See http://www.uow.edu.au/~andrewm/linux/#3c59x-2.3 for more details.
     - Also see Documentation/networking/vortex.txt
 */
@@ -164,8 +169,8 @@
 
 
 #define DRV_NAME	"3c59x"
-#define DRV_VERSION	"LK1.1.15"
-#define DRV_RELDATE	"6 June 2001"
+#define DRV_VERSION	"LK1.1.16"
+#define DRV_RELDATE	"29 June 2001"
 
 
 
@@ -223,6 +228,7 @@ static int vortex_debug = 1;
 #include
 #include
 #include
+#include
 #include			/* For NR_IRQS only. */
 #include
 #include
@@ -237,10 +243,11 @@ static int vortex_debug = 1;
 
 
 static char version[] __devinitdata =
-DRV_NAME ".c:" DRV_VERSION " " DRV_RELDATE " Donald Becker and others. http://www.scyld.com/network/vortex.html\n";
+DRV_NAME ": Donald Becker and others. www.scyld.com/network/vortex.html\n";
 
 MODULE_AUTHOR("Donald Becker ");
-MODULE_DESCRIPTION("3Com 3c59x/3c90x/3c575 series Vortex/Boomerang/Cyclone driver");
+MODULE_DESCRIPTION("3Com 3c59x/3c9xx ethernet driver "
+					DRV_VERSION " " DRV_RELDATE);
 MODULE_PARM(debug, "i");
 MODULE_PARM(options, "1-" __MODULE_STRING(8) "i");
 MODULE_PARM(full_duplex, "1-" __MODULE_STRING(8) "i");
@@ -953,16 +960,13 @@ static int __devinit vortex_probe1(struc
 	static int printed_version;
 	int retval;
 	struct vortex_chip_info * const vci = &vortex_info_tbl[chip_idx];
-	char *print_name;
+	int print_info;
 
 	if (!printed_version) {
 		printk (KERN_INFO "%s", version);
-		printk (KERN_INFO "See Documentation/networking/vortex.txt\n");
 		printed_version = 1;
 	}
 
-	print_name = pdev ? pdev->slot_name : "3c59x";
-
 	dev = alloc_etherdev(sizeof(*vp));
 	retval = -ENOMEM;
 	if (!dev) {
@@ -971,8 +975,32 @@ static int __devinit vortex_probe1(struc
 	}
 	SET_MODULE_OWNER(dev);
 
-	printk(KERN_INFO "%s: 3Com %s %s at 0x%lx, ",
-	       print_name,
+	/* The lower four bits are the media type. */
+	if (dev->mem_start) {
+		/*
+		 * The 'options' param is passed in as the third arg to the
+		 * LILO 'ether=' argument for non-modular use
+		 */
+		option = dev->mem_start;
+	}
+	else if (card_idx < MAX_UNITS)
+		option = options[card_idx];
+	else
+		option = -1;
+
+	if (option > 0) {
+		if (option & 0x0400)
+			vortex_debug = 2;
+		if (option & 0x0800)
+			vortex_debug = 7;
+	}
+
+	print_info = (vortex_debug > 1);
+	if (print_info)
+		printk (KERN_INFO "See Documentation/networking/vortex.txt\n");
+
+	printk(KERN_INFO "%s: 3Com %s %s at 0x%lx. Vers " DRV_VERSION "\n",
+	       dev->name,
 	       pdev ? "PCI" : "EISA",
 	       vci->name,
 	       ioaddr);
@@ -996,7 +1024,7 @@ static int __devinit vortex_probe1(struc
 	if (pdev) {
 		/* EISA resources already marked, so only PCI needs to do this here */
 		/* Ignore return value, because Cardbus drivers already allocate for us */
-		if (request_region(ioaddr, vci->io_size, print_name) != NULL)
+		if (request_region(ioaddr, vci->io_size, dev->name) != NULL)
 			vp->must_free_region = 1;
 
 		/* enable bus-mastering if necessary */
@@ -1015,7 +1043,7 @@ static int __devinit vortex_probe1(struc
 			if (pci_latency < new_latency) {
 				printk(KERN_INFO "%s: Overriding PCI latency"
 					" timer (CFLT) setting of %d, new value is %d.\n",
-					print_name, pci_latency, new_latency);
+					dev->name, pci_latency, new_latency);
 				pci_write_config_byte(pdev, PCI_LATENCY_TIMER, new_latency);
 			}
 		}
@@ -1041,19 +1069,6 @@ static int __devinit vortex_probe1(struc
 	if (pdev)
 		pci_set_drvdata(pdev, dev);
 
-	/* The lower four bits are the media type. */
-	if (dev->mem_start) {
-		/*
-		 * AKPM: ewww.. The 'options' param is passed in as the third arg to the
-		 * LILO 'ether=' argument for non-modular use
-		 */
-		option = dev->mem_start;
-	}
-	else if (card_idx < MAX_UNITS)
-		option = options[card_idx];
-	else
-		option = -1;
-
 	vp->media_override = 7;
 	if (option >= 0) {
 		vp->media_override = ((option & 7) == 2) ? 0 : option & 15;
@@ -1110,27 +1125,33 @@ static int __devinit vortex_probe1(struc
 		printk(" ***INVALID CHECKSUM %4.4x*** ", checksum);
 	for (i = 0; i < 3; i++)
 		((u16 *)dev->dev_addr)[i] = htons(eeprom[i + 10]);
-	for (i = 0; i < 6; i++)
-		printk("%c%2.2x", i ? ':' : ' ', dev->dev_addr[i]);
+	if (print_info) {
+		for (i = 0; i < 6; i++)
+			printk("%c%2.2x", i ? ':' : ' ', dev->dev_addr[i]);
+	}
 	EL3WINDOW(2);
 	for (i = 0; i < 6; i++)
 		outb(dev->dev_addr[i], ioaddr + i);
 
 #ifdef __sparc__
-	printk(", IRQ %s\n", __irq_itoa(dev->irq));
+	if (print_info)
+		printk(", IRQ %s\n", __irq_itoa(dev->irq));
 #else
-	printk(", IRQ %d\n", dev->irq);
+	if (print_info)
+		printk(", IRQ %d\n", dev->irq);
 	/* Tell them about an invalid IRQ. */
-	if (vortex_debug && (dev->irq <= 0 || dev->irq >= NR_IRQS))
+	if (dev->irq <= 0 || dev->irq >= NR_IRQS)
 		printk(KERN_WARNING " *** Warning: IRQ %d is unlikely to work! ***\n",
 			   dev->irq);
 #endif
 
 	EL3WINDOW(4);
 	step = (inb(ioaddr + Wn4_NetDiag) & 0x1e) >> 1;
-	printk(KERN_INFO " product code %02x%02x rev %02x.%d date %02d-"
-		   "%02d-%02d\n", eeprom[6]&0xff, eeprom[6]>>8, eeprom[0x14],
-		   step, (eeprom[4]>>5) & 15, eeprom[4] & 31, eeprom[4]>>9);
+	if (print_info) {
+		printk(KERN_INFO " product code %02x%02x rev %02x.%d date %02d-"
+			"%02d-%02d\n", eeprom[6]&0xff, eeprom[6]>>8, eeprom[0x14],
+			step, (eeprom[4]>>5) & 15, eeprom[4] & 31, eeprom[4]>>9);
+	}
 
 
 	if (pdev && vci->drv_flags & HAS_CB_FNS) {
@@ -1144,8 +1165,10 @@ static int __devinit vortex_probe1(struc
 			if (!vp->cb_fn_base)
 				goto free_ring;
 		}
-		printk(KERN_INFO "%s: CardBus functions mapped %8.8lx->%p\n",
-			   print_name, fn_st_addr, vp->cb_fn_base);
+		if (print_info) {
+			printk(KERN_INFO "%s: CardBus functions mapped %8.8lx->%p\n",
+				dev->name, fn_st_addr, vp->cb_fn_base);
+		}
 		EL3WINDOW(2);
 
 		n = inw(ioaddr + Wn2_ResetOptions) & ~0x4010;
@@ -1163,7 +1186,8 @@ static int __devinit vortex_probe1(struc
 
 	if (vp->info1 & 0x8000) {
 		vp->full_duplex = 1;
-		printk(KERN_INFO "Full duplex capable\n");
+		if (print_info)
+			printk(KERN_INFO "Full duplex capable\n");
 	}
 
 	{
@@ -1174,16 +1198,17 @@ static int __devinit vortex_probe1(struc
 		if ((vp->available_media & 0xff) == 0)		/* Broken 3c916 */
 			vp->available_media = 0x40;
 		config = inl(ioaddr + Wn3_Config);
-		if (vortex_debug > 1)
+		if (print_info) {
 			printk(KERN_DEBUG " Internal config register is %4.4x, "
 				   "transceivers %#x.\n", config, inw(ioaddr + Wn3_Options));
-		printk(KERN_INFO " %dK %s-wide RAM %s Rx:Tx split, %s%s interface.\n",
-			   8 << RAM_SIZE(config),
-			   RAM_WIDTH(config) ? "word" : "byte",
-			   ram_split[RAM_SPLIT(config)],
-			   AUTOSELECT(config) ? "autoselect/" : "",
-			   XCVR(config) > XCVR_ExtMII ? "" :
-			   media_tbl[XCVR(config)].name);
+			printk(KERN_INFO " %dK %s-wide RAM %s Rx:Tx split, %s%s interface.\n",
+				   8 << RAM_SIZE(config),
+				   RAM_WIDTH(config) ? "word" : "byte",
+				   ram_split[RAM_SPLIT(config)],
+				   AUTOSELECT(config) ? "autoselect/" : "",
+				   XCVR(config) > XCVR_ExtMII ? "" :
+				   media_tbl[XCVR(config)].name);
+		}
 		vp->default_media = XCVR(config);
 		if (vp->default_media == XCVR_NWAY)
 			vp->has_nway = 1;
@@ -1191,8 +1216,9 @@ static int __devinit vortex_probe1(struc
 	}
 
 	if (vp->media_override != 7) {
-		printk(KERN_INFO " Media override to transceiver type %d (%s).\n",
-			   vp->media_override, media_tbl[vp->media_override].name);
+		printk(KERN_INFO "%s: Media override to transceiver type %d (%s).\n",
+				dev->name, vp->media_override,
+				media_tbl[vp->media_override].name);
 		dev->if_port = vp->media_override;
 	} else
 		dev->if_port = vp->default_media;
@@ -1219,8 +1245,10 @@ static int __devinit vortex_probe1(struc
 			mii_status = mdio_read(dev, phyx, 1);
 			if (mii_status && mii_status != 0xffff) {
 				vp->phys[phy_idx++] = phyx;
-				printk(KERN_INFO " MII transceiver found at address %d,"
-					   " status %4x.\n", phyx, mii_status);
+				if (print_info) {
+					printk(KERN_INFO " MII transceiver found at address %d,"
+						" status %4x.\n", phyx, mii_status);
+				}
 				if ((mii_status & 0x0040) == 0)
 					mii_preamble_required++;
 			}
@@ -1244,8 +1272,10 @@ static int __devinit vortex_probe1(struc
 
 	if (vp->capabilities & CapBusMaster) {
 		vp->full_bus_master_tx = 1;
-		printk(KERN_INFO" Enabling bus-master transmits and %s receives.\n",
-			   (vp->info2 & 1) ? "early" : "whole-frame" );
+		if (print_info) {
+			printk(KERN_INFO " Enabling bus-master transmits and %s receives.\n",
+				(vp->info2 & 1) ? "early" : "whole-frame" );
+		}
 		vp->full_bus_master_rx = (vp->info2 & 1) ? 1 : 2;
 		vp->bus_master = 0;		/* AKPM: vortex only */
 	}
@@ -1254,10 +1284,10 @@ static int __devinit vortex_probe1(struc
 	dev->open = vortex_open;
 	if (vp->full_bus_master_tx) {
 		dev->hard_start_xmit = boomerang_start_xmit;
-#ifndef CONFIG_HIGHMEM
-		/* Actually, it still should work with iommu. */
-		dev->features |= NETIF_F_SG;
-#endif
+		if (nr_free_highpages() == 0) {
+			/* Actually, it still should work with iommu. */
+			dev->features |= NETIF_F_SG;
+		}
 		if (((hw_checksums[card_idx] == -1) && (vp->drv_flags & HAS_HWCKSM)) ||
 					(hw_checksums[card_idx] == 1)) {
 				dev->features |= NETIF_F_IP_CSUM;
@@ -1266,9 +1296,9 @@ static int __devinit vortex_probe1(struc
 		dev->hard_start_xmit = vortex_start_xmit;
 	}
 
-	if (vortex_debug > 0) {
+	if (print_info) {
 		printk(KERN_INFO "%s: scatter/gather %sabled. h/w checksums %sabled\n",
-				print_name,
+				dev->name,
 				(dev->features & NETIF_F_SG) ? "en":"dis",
 				(dev->features & NETIF_F_IP_CSUM) ? "en":"dis");
 	}
@@ -1279,9 +1309,8 @@ static int __devinit vortex_probe1(struc
 	dev->set_multicast_list = set_rx_mode;
 	dev->tx_timeout = vortex_tx_timeout;
 	dev->watchdog_timeo = (watchdog * HZ) / 1000;
-	retval = register_netdev(dev);
-	if (retval == 0)
-		return 0;
+	publish_netdev(dev);
+	return 0;
 
 free_ring:
 	pci_free_consistent(pdev,
@@ -1292,6 +1321,7 @@ free_ring:
 free_region:
 	if (vp->must_free_region)
 		release_region(ioaddr, vci->io_size);
+	unregister_netdev(dev);
 	kfree (dev);
 	printk(KERN_ERR PFX "vortex_probe1 fails. Returns %d\n", retval);
 out:
@@ -1345,19 +1375,23 @@ vortex_up(struct net_device *dev)
 		dev->if_port = vp->media_override;
 	} else if (vp->autoselect) {
 		if (vp->has_nway) {
-			printk(KERN_INFO "%s: using NWAY device table, not %d\n", dev->name, dev->if_port);
+			if (vortex_debug > 1)
+				printk(KERN_INFO "%s: using NWAY device table, not %d\n",
+								dev->name, dev->if_port);
 			dev->if_port = XCVR_NWAY;
 		} else {
 			/* Find first available media type, starting with 100baseTx. */
 			dev->if_port = XCVR_100baseTx;
 			while (! (vp->available_media & media_tbl[dev->if_port].mask))
 				dev->if_port = media_tbl[dev->if_port].next;
-			printk(KERN_INFO "%s: first available media type: %s\n",
+			if (vortex_debug > 1)
+				printk(KERN_INFO "%s: first available media type: %s\n",
 					dev->name, media_tbl[dev->if_port].name);
 		}
 	} else {
 		dev->if_port = vp->default_media;
-		printk(KERN_INFO "%s: using default media %s\n",
+		if (vortex_debug > 1)
+			printk(KERN_INFO "%s: using default media %s\n",
 				dev->name, media_tbl[dev->if_port].name);
 	}
 
--- linux-2.4.6-pre8/drivers/net/starfire.c	Sun Jul  1 16:11:24 2001
+++ linux-akpm/drivers/net/starfire.c	Sun Jul  1 22:01:34 2001
@@ -731,10 +731,6 @@ static int __devinit starfire_init_one(s
 	if (mtu)
 		dev->mtu = mtu;
 
-	i = register_netdev(dev);
-	if (i)
-		goto err_out_cleardev;
-
 	printk(KERN_INFO "%s: %s at 0x%lx, ",
 		   dev->name, netdrv_tbl[chip_idx].name, ioaddr);
 	for (i = 0; i < 5; i++)
@@ -769,11 +765,9 @@ static int __devinit starfire_init_one(s
 		np->phy_cnt = phy_idx;
 	}
 
+	publish_netdev(dev);
 	return 0;
 
-err_out_cleardev:
-	pci_set_drvdata(pdev, NULL);
-	iounmap((void *)ioaddr);
 err_out_free_res:
 	pci_release_regions (pdev);
 err_out_free_netdev:
--- linux-2.4.6-pre8/drivers/net/via-rhine.c	Sun Jul  1 16:11:24 2001
+++ linux-akpm/drivers/net/via-rhine.c	Sun Jul  1 22:01:47 2001
@@ -640,10 +640,6 @@ static int __devinit via_rhine_init_one
 	dev->do_ioctl = mii_ioctl;
 	dev->tx_timeout = via_rhine_tx_timeout;
 	dev->watchdog_timeo = TX_TIMEOUT;
-
-	i = register_netdev(dev);
-	if (i)
-		goto err_out_unmap;
 
 	printk(KERN_INFO "%s: %s at 0x%lx, ",
 		   dev->name, via_rhine_chip_info[chip_id].name, ioaddr);
@@ -694,6 +690,7 @@ static int __devinit via_rhine_init_one
 		}
 	}
 
+	publish_netdev(dev);
 	return 0;
 
 err_out_unmap:
@@ -703,6 +700,7 @@ err_out_free_res:
 #endif
 	pci_release_regions(pdev);
 err_out_free_netdev:
+	unregister_netdev(dev);
 	kfree (dev);
 err_out:
 	return -ENODEV;
--- linux-2.4.6-pre8/drivers/net/yellowfin.c	Sun Jul  1 16:11:25 2001
+++ linux-akpm/drivers/net/yellowfin.c	Sun Jul  1 22:01:56 2001
@@ -496,10 +496,6 @@ static int __devinit yellowfin_init_one(
 	if (mtu)
 		dev->mtu = mtu;
 
-	i = register_netdev(dev);
-	if (i)
-		goto err_out_cleardev;
-
 	printk(KERN_INFO "%s: %s type %8x at 0x%lx, ",
 		   dev->name, pci_id_tbl[chip_idx].name, inl(ioaddr + ChipRev), ioaddr);
 	for (i = 0; i < 5; i++)
@@ -523,6 +519,7 @@ static int __devinit yellowfin_init_one(
 
 	find_cnt++;
 
+	publish_netdev(dev);
 	return 0;
 
 err_out_cleardev:
@@ -533,6 +530,7 @@ err_out_free_res:
 #endif
 	pci_release_regions(pdev);
 err_out_free_netdev:
+	unregister_netdev(dev);
 	kfree (dev);
 	return -ENODEV;
 }
--- linux-2.4.6-pre8/drivers/net/sis900.c	Sun Jul  1 16:11:24 2001
+++ linux-akpm/drivers/net/sis900.c	Sun Jul  1 22:02:04 2001
@@ -362,10 +362,6 @@ static int __devinit sis900_probe (struc
 	net_dev->do_ioctl = &mii_ioctl;
 	net_dev->tx_timeout = sis900_tx_timeout;
 	net_dev->watchdog_timeo = TX_TIMEOUT;
-
-	ret = register_netdev(net_dev);
-	if (ret)
-		goto err_out_cleardev;
 
 	/* Get Mac address according to the chip revision */
 	pci_read_config_byte(pci_dev, PCI_CLASS_REVISION, &revision);
@@ -380,13 +376,13 @@ static int __devinit sis900_probe (struc
 
 	if (ret == 0) {
 		ret = -ENODEV;
-		goto err_out_unregister;
+		goto err_out_cleardev;
 	}
 
 	/* probe for mii transciver */
 	if (sis900_mii_probe(net_dev) == 0) {
 		ret = -ENODEV;
-		goto err_out_unregister;
+		goto err_out_cleardev;
 	}
 
 	/* print some information about our NIC */
@@ -396,14 +392,14 @@ static int __devinit sis900_probe (struc
 		printk("%2.2x:", (u8)net_dev->dev_addr[i]);
 	printk("%2.2x.\n", net_dev->dev_addr[i]);
 
+	publish_netdev(net_dev);
 	return 0;
 
-err_out_unregister:
-	unregister_netdev(net_dev);
 err_out_cleardev:
 	pci_set_drvdata(pci_dev, NULL);
 	pci_release_regions(pci_dev);
 err_out:
+	unregister_netdev(net_dev);
 	kfree(net_dev);
 	return ret;
 }
--- linux-2.4.6-pre8/drivers/net/epic100.c	Sun Jul  1 16:11:24 2001
+++ linux-akpm/drivers/net/epic100.c	Sun Jul  1 22:02:10 2001
@@ -520,16 +520,13 @@ static int __devinit epic_init_one (stru
 	dev->watchdog_timeo = TX_TIMEOUT;
 	dev->tx_timeout = &epic_tx_timeout;
 
-	i = register_netdev(dev);
-	if (i)
-		goto err_out_unmap_tx;
-
 	printk(KERN_INFO "%s: %s at %#lx, IRQ %d, ",
 		   dev->name, pci_id_tbl[chip_idx].name, ioaddr, dev->irq);
 	for (i = 0; i < 5; i++)
 		printk("%2.2x:", dev->dev_addr[i]);
 	printk("%2.2x.\n", dev->dev_addr[i]);
 
+	publish_netdev(dev);
 	return 0;
 
 err_out_unmap_tx:
@@ -541,6 +538,7 @@ err_out_free_res:
 #endif
 	pci_release_regions(pdev);
 err_out_free_netdev:
+	unregister_netdev(dev);
 	kfree(dev);
 	return -ENODEV;
 }
--- linux-2.4.6-pre8/drivers/net/ne2k-pci.c	Sun Jul  1 16:11:24 2001
+++ linux-akpm/drivers/net/ne2k-pci.c	Sun Jul  1 22:02:16 2001
@@ -361,10 +361,6 @@ static int __devinit ne2k_pci_init_one (
 	dev->do_ioctl = &netdev_ioctl;
 	NS8390_init(dev, 0);
 
-	i = register_netdev(dev);
-	if (i)
-		goto err_out_free_8390;
-
 	printk("%s: %s found at %#lx, IRQ %d, ",
 		   dev->name, pci_clone_list[chip_idx].name, ioaddr, dev->irq);
 	for(i = 0; i < 6; i++) {
@@ -372,11 +368,13 @@ static int __devinit ne2k_pci_init_one (
 		dev->dev_addr[i] = SA_prom[i];
 	}
 
+	publish_netdev(dev);
 	return 0;
 
 err_out_free_8390:
 	kfree(dev->priv);
 err_out_free_netdev:
+	unregister_netdev(dev);
 	kfree (dev);
 err_out_free_res:
 	release_region (ioaddr, NE_IO_EXTENT);
--- linux-2.4.6-pre8/drivers/net/dmfe.c	Sun Jul  1 16:11:24 2001
+++ linux-akpm/drivers/net/dmfe.c	Sun Jul  1 22:02:22 2001
@@ -496,9 +496,6 @@ static int __devinit dmfe_init_one (stru
 	for (i = 0; i < 6; i++)
 		dev->dev_addr[i] = db->srom[20 + i];
 
-	i = register_netdev (dev);
-	if (i) goto err_out;
-
 	printk(KERN_INFO "%s: Davicom DM%04lx at 0x%lx,",
 		dev->name,
 		ent->driver_data >> 16,
@@ -507,10 +504,12 @@ static int __devinit dmfe_init_one (stru
 		printk("%c%02x", i ? ':' : ' ', dev->dev_addr[i]);
 	printk(", IRQ %d\n", pci_irqline);
 
+	publish_netdev (dev);
 	return 0;
 
 err_out:
 	pci_set_drvdata(pdev, NULL);
+	unregister_netdev(dev);
 	kfree(dev);
 	return -ENODEV;
 }
--- linux-2.4.6-pre8/drivers/net/8139too.c	Sun Jul  1 16:11:24 2001
+++ linux-akpm/drivers/net/8139too.c	Sun Jul  1 23:29:00 2001
@@ -898,6 +898,7 @@ match:
 	return 0;
 
 err_out:
+	unregister_netdev(dev);
 	__rtl8139_cleanup_dev (dev);
 	DPRINTK ("EXIT, returning %d\n", rc);
 	return rc;
@@ -971,11 +972,6 @@ static int __devinit rtl8139_init_one (s
 	init_waitqueue_head (&tp->thr_wait);
 	init_MUTEX_LOCKED (&tp->thr_exited);
 
-	/* dev is fully set up and ready to use now */
-	DPRINTK("about to register device named %s (%p)...\n", dev->name, dev);
-	i = register_netdev (dev);
-	if (i) goto err_out;
-
 	pci_set_drvdata (pdev, dev);
 
 	printk (KERN_INFO "%s: %s at 0x%lx, "
@@ -1047,10 +1043,13 @@ static int __devinit rtl8139_init_one (s
 	if (rtl_chip_info[tp->chipset].flags & HasHltClk)
 		RTL_W8 (HltClk, 'H');	/* 'R' would leave the clock running. */
 
+	/* dev is fully set up and ready to use now */
+	DPRINTK("about to publish device named %s (%p)...\n", dev->name, dev);
+	publish_netdev (dev);
+
 	DPRINTK ("EXIT - returning 0\n");
 	return 0;
 
-err_out:
 	__rtl8139_cleanup_dev (dev);
 	DPRINTK ("EXIT - returning %d\n", i);
 	return i;
--- linux-2.4.6-pre8/drivers/net/tulip/tulip_core.c	Sun Jul  1 16:11:24 2001
+++ linux-akpm/drivers/net/tulip/tulip_core.c	Sun Jul  1 22:14:18 2001
@@ -1253,7 +1253,7 @@ static void __devinit tulip_mwi_config (
 	if (tulip_debug > 3)
 		printk(KERN_DEBUG "%s: tulip_mwi_config()\n", pdev->slot_name);
 
-	tp->csr0 = 0;
+	tp->csr0 = csr0 = 0;
 
 	/* check for sane cache line size. from acenic.c. */
 	pci_read_config_byte(pdev, PCI_CACHE_LINE_SIZE, &cache);
@@ -1314,7 +1314,6 @@ static void __devinit tulip_mwi_config (
 	tp->csr0 = csr0;
 	goto out;
 
-early_out:
 	if (csr0 & MWI) {
 		pci_command &= ~PCI_COMMAND_INVALIDATE;
 		pci_write_config_word(pdev, PCI_COMMAND, pci_command);
@@ -1657,9 +1656,6 @@ static int __devinit tulip_init_one (str
 	dev->do_ioctl = private_ioctl;
 	dev->set_multicast_list = set_rx_mode;
 
-	if (register_netdev(dev))
-		goto err_out_free_ring;
-
 	printk(KERN_INFO "%s: %s rev %d at %#3lx,",
 	       dev->name, tulip_tbl[chip_idx].chip_name, chip_rev, ioaddr);
 	pci_set_drvdata(pdev, dev);
@@ -1742,6 +1738,7 @@ static int __devinit tulip_init_one (str
 	/* put the chip in snooze mode until opened */
 	tulip_set_power_state (tp, 0, 1);
 
+	publish_netdev(dev);
 	return 0;
 
 err_out_free_ring:
@@ -1761,6 +1758,7 @@ err_out_free_res:
 	pci_release_regions (pdev);
 
 err_out_free_netdev:
+	unregister_netdev(dev);
 	kfree (dev);
 	return -ENODEV;
 }
--- linux-2.4.6-pre8/drivers/net/tulip/ChangeLog	Sun Jul  1 16:11:24 2001
+++ linux-akpm/drivers/net/tulip/ChangeLog	Sun Jul  1 21:50:24 2001
@@ -1,3 +1,9 @@
+2001-07-01  Andrew Morton
+
+	* tulip_core.c:
+	Use new publish_netdev() API.
+	Only publish the device after all initialisation has completed.
+ 2001-06-16 Jeff Garzik * tulip.h, tulip_core.c: --- linux-2.4.6-pre8/drivers/net/pcmcia/xircom_tulip_cb.c Sun Jul 1 16:11:24 2001 +++ linux-akpm/drivers/net/pcmcia/xircom_tulip_cb.c Sun Jul 1 22:03:40 2001 @@ -765,15 +765,6 @@ static struct net_device *tulip_probe1(s break; } - if (register_netdev(dev)) { - request_region(ioaddr, tulip_tbl[chip_idx].io_size, "xircom_tulip_cb"); - if (tp->mtable) - kfree(tp->mtable); - kfree(dev->priv); - kfree(dev); - return NULL; - } - printk(KERN_INFO "%s: %s rev %d at %#3lx,", dev->name, tulip_tbl[chip_idx].chip_name, chip_rev, ioaddr); for (i = 0; i < 6; i++) @@ -781,6 +772,7 @@ static struct net_device *tulip_probe1(s last_phys_addr[i] = dev->dev_addr[i]); printk(", IRQ %d.\n", irq); + publish_netdev(dev); return dev; } --- linux-2.4.6-pre8/drivers/net/sundance.c Sun Jul 1 16:11:24 2001 +++ linux-akpm/drivers/net/sundance.c Sun Jul 1 22:25:21 2001 @@ -470,10 +470,6 @@ static int __devinit sundance_probe1 (st if (mtu) dev->mtu = mtu; - i = register_netdev(dev); - if (i) - goto err_out_cleardev; - printk(KERN_INFO "%s: %s at 0x%lx, ", dev->name, pci_id_tbl[chip_idx].name, ioaddr); for (i = 0; i < 5; i++) @@ -508,16 +504,16 @@ static int __devinit sundance_probe1 (st printk("ASIC Control is now %x.\n", readl(ioaddr + ASICCtrl)); card_idx++; + publish_netdev(dev); return 0; -err_out_cleardev: - pci_set_drvdata(pdev, NULL); #ifndef USE_IO_OPS iounmap((void *)ioaddr); err_out_res: #endif pci_release_regions(pdev); err_out_netdev: + unregister_netdev(dev); kfree (dev); return -ENODEV; } --- linux-2.4.6-pre8/drivers/net/hamachi.c Sun Jul 1 16:11:24 2001 +++ linux-akpm/drivers/net/hamachi.c Sun Jul 1 21:53:50 2001 @@ -712,15 +712,6 @@ static int __init hamachi_init_one (stru if (mtu) dev->mtu = mtu; - i = register_netdev(dev); - if (i) { - kfree(dev); - iounmap((char *)ioaddr); - pci_release_regions(pdev); - pci_set_drvdata(pdev, NULL); - return i; - } - printk(KERN_INFO "%s: %s type %x at 0x%lx, ", dev->name, chip_tbl[chip_id].name, 
readl(ioaddr + ChipRev), ioaddr); @@ -755,6 +746,7 @@ static int __init hamachi_init_one (stru writew(0x1000, ioaddr + ANCtrl); /* Enable negotiation */ card_idx++; + publish_netdev(dev); return 0; } --- linux-2.4.6-pre8/drivers/net/natsemi.c Sun Jul 1 16:11:24 2001 +++ linux-akpm/drivers/net/natsemi.c Sun Jul 1 21:56:15 2001 @@ -457,18 +457,13 @@ static int __devinit natsemi_probe1 (str SET_MODULE_OWNER(dev); i = pci_request_regions(pdev, dev->name); - if (i) { - kfree(dev); - return i; - } + if (i) + goto out_release_dev; { void *mmio = ioremap (ioaddr, iosize); - if (!mmio) { - pci_release_regions(pdev); - kfree(dev); - return -ENOMEM; - } + if (!mmio) + goto out_release_regions; ioaddr = (unsigned long) mmio; } @@ -521,15 +516,6 @@ static int __devinit natsemi_probe1 (str if (mtu) dev->mtu = mtu; - i = register_netdev(dev); - if (i) { - pci_release_regions(pdev); - unregister_netdev(dev); - kfree(dev); - pci_set_drvdata(pdev, NULL); - return i; - } - printk(KERN_INFO "%s: %s at 0x%lx, ", dev->name, natsemi_pci_info[chip_idx].name, ioaddr); for (i = 0; i < ETH_ALEN-1; i++) @@ -548,8 +534,13 @@ static int __devinit natsemi_probe1 (str } printk(KERN_INFO "%s: Transceiver status 0x%4.4x advertising %4.4x.\n", dev->name, (int)readl(ioaddr + 0x84), np->advertising); - + publish_netdev(dev); return 0; +out_release_regions: + pci_release_regions(pdev); +out_release_dev: + kfree(dev); + return -ENOMEM; } --- linux-2.4.6-pre8/drivers/net/winbond-840.c Sun Jul 1 16:11:24 2001 +++ linux-akpm/drivers/net/winbond-840.c Sun Jul 1 22:24:52 2001 @@ -470,10 +470,6 @@ static int __devinit w840_probe1 (struct dev->tx_timeout = &tx_timeout; dev->watchdog_timeo = TX_TIMEOUT; - i = register_netdev(dev); - if (i) - goto err_out_cleardev; - printk(KERN_INFO "%s: %s at 0x%lx, ", dev->name, pci_id_tbl[chip_idx].name, ioaddr); for (i = 0; i < 5; i++) @@ -500,16 +496,16 @@ static int __devinit w840_probe1 (struct } find_cnt++; + publish_netdev(dev); return 0; -err_out_cleardev: - 
pci_set_drvdata(pdev, NULL); #ifndef USE_IO_OPS iounmap((void *)ioaddr); err_out_free_res: #endif pci_release_regions(pdev); err_out_netdev: + unregister_netdev(dev); kfree (dev); return -ENODEV; } --- linux-2.4.6-pre8/drivers/net/pci-skeleton.c Sun Jul 1 16:11:24 2001 +++ linux-akpm/drivers/net/pci-skeleton.c Sun Jul 1 21:57:47 2001 @@ -710,13 +710,10 @@ match: tp->chipset, rtl_chip_info[tp->chipset].name); - i = register_netdev (dev); - if (i) - goto err_out_unmap; - DPRINTK ("EXIT, returning 0\n"); *ioaddr_out = ioaddr; *dev_out = dev; + publish_netdev (dev); return 0; err_out_unmap: @@ -726,6 +723,7 @@ err_out_free_res: #endif pci_release_regions (pdev); err_out: + unregister_netdev(dev); kfree (dev); DPRINTK ("EXIT, returning %d\n", rc); return rc; --- linux-2.4.6-pre8/drivers/net/fealnx.c Sun Jul 1 16:11:24 2001 +++ linux-akpm/drivers/net/fealnx.c Sun Jul 1 22:20:56 2001 @@ -650,23 +650,20 @@ static int __devinit fealnx_init_one(str dev->tx_timeout = tx_timeout; dev->watchdog_timeo = TX_TIMEOUT; - err = register_netdev(dev); - if (err) - goto err_out_free_tx; - printk(KERN_INFO "%s: %s at 0x%lx, ", dev->name, skel_netdrv_tbl[chip_id].chip_name, ioaddr); for (i = 0; i < 5; i++) printk("%2.2x:", dev->dev_addr[i]); printk("%2.2x, IRQ %d.\n", dev->dev_addr[i], irq); + publish_netdev(dev); return 0; -err_out_free_tx: pci_free_consistent(pdev, TX_TOTAL_SIZE, np->tx_ring, np->tx_ring_dma); err_out_free_rx: pci_free_consistent(pdev, RX_TOTAL_SIZE, np->rx_ring, np->rx_ring_dma); err_out_free_dev: + unregister_netdev(dev); kfree(dev); err_out_unmap: #ifndef USE_IO_OPS @@ -1808,7 +1805,7 @@ static int netdev_close(struct net_devic return 0; } -static struct pci_device_id fealnx_pci_tbl[] = __devinitdata { +static struct pci_device_id fealnx_pci_tbl[] __devinitdata = { {0x1516, 0x0800, PCI_ANY_ID, PCI_ANY_ID, 0, 0, 0}, {0x1516, 0x0803, PCI_ANY_ID, PCI_ANY_ID, 0, 0, 1}, {0x1516, 0x0891, PCI_ANY_ID, PCI_ANY_ID, 0, 0, 2}, --- linux-2.4.6-pre8/drivers/net/wireless/airo.c Sun 
Jul 1 16:11:24 2001 +++ linux-akpm/drivers/net/wireless/airo.c Sun Jul 1 22:01:09 2001 @@ -1024,6 +1024,7 @@ struct net_device *init_airo_card( unsig printk(KERN_ERR "airo: Couldn't alloc_etherdev\n"); return NULL; } + SET_MODULE_OWNER(dev); ai = dev->priv; ai->registered = 1; ai->dev = dev; @@ -1051,15 +1052,11 @@ struct net_device *init_airo_card( unsig dev->irq = irq; dev->base_addr = port; - rc = register_netdev(dev); - if (rc) - goto err_out_unlink; - rc = request_irq( dev->irq, airo_interrupt, SA_SHIRQ | SA_INTERRUPT, dev->name, dev ); if (rc) { printk(KERN_ERR "airo: register interrupt %d failed, rc %d\n", irq, rc ); - goto err_out_unregister; + goto err_out_unlink; } if (!is_pcmcia) { if (!request_region( dev->base_addr, 64, dev->name )) { @@ -1085,7 +1082,7 @@ struct net_device *init_airo_card( unsig setup_proc_entry( dev, dev->priv ); /* XXX check for failure */ netif_start_queue(dev); - SET_MODULE_OWNER(dev); + publish_netdev(dev); return dev; err_out_res: @@ -1093,11 +1090,10 @@ err_out_res: release_region( dev->base_addr, 64 ); err_out_irq: free_irq(dev->irq, dev); -err_out_unregister: - unregister_netdev(dev); err_out_unlink: del_airo_dev(dev); err_out_free: + unregister_netdev(dev); kfree(dev); return NULL; } --- linux-2.4.6-pre8/drivers/net/tokenring/olympic.c Sun Jul 1 16:11:24 2001 +++ linux-akpm/drivers/net/tokenring/olympic.c Sun Jul 1 22:06:24 2001 @@ -51,6 +51,8 @@ * * 06/02/01 - Clean up, copy skb for small packets * + * 07/01/01 - Use publish_netdev() (andrewm@uow.edu.au) + * * To Do: * * Complete full Cardbus / hot-swap support. 
@@ -235,6 +237,7 @@ static int __devinit olympic_probe(struc if((i = olympic_init(dev))) { iounmap(olympic_priv->olympic_mmio) ; iounmap(olympic_priv->olympic_lap) ; + unregister_netdev(dev) ; kfree(dev) ; pci_release_regions(pdev) ; return i ; @@ -251,7 +254,6 @@ static int __devinit olympic_probe(struc SET_MODULE_OWNER(dev) ; pci_set_drvdata(pdev,dev) ; - register_netdev(dev) ; printk("Olympic: %s registered as: %s\n",olympic_priv->olympic_card_name,dev->name); if (olympic_priv->olympic_network_monitor) { /* Must go after register_netdev as we need the device name */ char proc_name[20] ; @@ -260,6 +262,7 @@ static int __devinit olympic_probe(struc create_proc_read_entry(proc_name,0,0,olympic_proc_info,(void *)dev) ; printk("Olympic: Network Monitor information: /proc/%s\n",proc_name); } + publish_netdev(dev); return 0 ; } --- linux-2.4.6-pre8/Documentation/networking/vortex.txt Sun Jul 1 16:11:21 2001 +++ linux-akpm/Documentation/networking/vortex.txt Fri Jun 29 13:26:54 2001 @@ -104,6 +104,8 @@ options=N1,N2,N3,... When generating a value for the 'options' setting, the above media selection values may be OR'ed (or added to) the following: + 2048 (0x800) Set driver debugging level to 7 + 1024 (0x400) Set driver debugging level to 2 512 (0x200) Force full duplex mode. 16 (0x10) Bus-master enable bit (Old Vortex cards only) From owner-netdev@oss.sgi.com Sun Jul 1 11:24:54 2001 Received: (from majordomo@localhost) by oss.sgi.com (8.11.2/8.11.3) id f61IOsC26775 for netdev-outgoing; Sun, 1 Jul 2001 11:24:54 -0700 Received: from smtp102.urscorp.com (smtp102.urscorp.com [64.17.27.233]) by oss.sgi.com (8.11.2/8.11.3) with SMTP id f61IOpV26772; Sun, 1 Jul 2001 11:24:51 -0700 To: Andrew Morton Cc: Alan Cox , breed@users.sourceforge.net, "David S. 
Miller" , hfhsu@sis.com.tw, ionut@cs.columbia.edu, Jeff Garzik , linux-tr@linuxtr.net, Manfred Spraul , netdev@oss.sgi.com, owner-netdev@oss.sgi.com, p_gortmaker@yahoo.com, Russell King , tori@unhappy.mine.nu, Linus Torvalds Subject: Re: alloc_etherdev breaks ether= X-Mailer: Lotus Notes Release 5.0.5 September 22, 2000 From: mike_phillips@urscorp.com Message-ID: Date: Sun, 1 Jul 2001 15:23:09 -0300 X-MIMETrack: Serialize by Router on SMTP102/URSCorp(Release 5.0.5 |September 22, 2000) at 07/01/2001 02:18:58 PM, Serialize complete at 07/01/2001 02:18:58 PM MIME-Version: 1.0 Content-Type: text/plain; charset="us-ascii" Sender: owner-netdev@oss.sgi.com Precedence: bulk Content-Length: 69 Lines: 6 Andrew: Doesn't cause me any problems with olympic. Mike Phillips From owner-netdev@oss.sgi.com Sun Jul 1 14:43:51 2001 Received: (from majordomo@localhost) by oss.sgi.com (8.11.2/8.11.3) id f61Lhpj09844 for netdev-outgoing; Sun, 1 Jul 2001 14:43:51 -0700 Received: from dea.waldorf-gmbh.de (u-184-19.karlsruhe.ipdial.viaginterkom.de [62.180.19.184]) by oss.sgi.com (8.11.2/8.11.3) with SMTP id f61LhfV09837 for ; Sun, 1 Jul 2001 14:43:44 -0700 Received: (from ralf@localhost) by dea.waldorf-gmbh.de (8.11.1/8.11.1) id f61LhNs16426; Sun, 1 Jul 2001 23:43:23 +0200 Date: Sun, 1 Jul 2001 23:43:23 +0200 From: Ralf Baechle To: mike_phillips@urscorp.com Cc: netdev@oss.sgi.com Subject: Re: alloc_etherdev breaks ether= Message-ID: <20010701234323.B16341@bacchus.dhis.org> References: Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline User-Agent: Mutt/1.2.5i In-Reply-To: ; from mike_phillips@urscorp.com on Sun, Jul 01, 2001 at 03:23:09PM -0300 X-Accept-Language: de,en,fr Sender: owner-netdev@oss.sgi.com Precedence: bulk Content-Length: 725 Lines: 17 On Sun, Jul 01, 2001 at 03:23:09PM -0300, mike_phillips@urscorp.com wrote: > To: Andrew Morton > Cc: Alan Cox , breed@users.sourceforge.net, > "David S. 
Miller" , hfhsu@sis.com.tw, > ionut@cs.columbia.edu, Jeff Garzik , > linux-tr@linuxtr.net, Manfred Spraul , > netdev@oss.sgi.com, owner-netdev@oss.sgi.com, p_gortmaker@yahoo.com, ^^^^^^^^^^^^^^^^^^^^^^^^ Don't send postings to the list owners address. > Subject: Re: alloc_etherdev breaks ether= > X-Mailer: Lotus Notes Release 5.0.5 September 22, 2000 And this is the culprit ... Ralf From owner-netdev@oss.sgi.com Thu Jul 5 01:09:23 2001 Received: (from majordomo@localhost) by oss.sgi.com (8.11.2/8.11.3) id f6589NG10800 for netdev-outgoing; Thu, 5 Jul 2001 01:09:23 -0700 Received: from sj-msg-core-1.cisco.com (sj-msg-core-1.cisco.com [171.71.163.11]) by oss.sgi.com (8.11.2/8.11.3) with SMTP id f6589LV10797 for ; Thu, 5 Jul 2001 01:09:21 -0700 Received: from kaspit.cisco.com (kaspit.cisco.com [144.254.91.49]) by sj-msg-core-1.cisco.com (8.11.3/8.9.1) with ESMTP id f6589Bh02252; Thu, 5 Jul 2001 01:09:11 -0700 (PDT) Received: from drgoldstw2k (dhcp-64-103-121-200.cisco.com [64.103.121.200]) by kaspit.cisco.com (Mirapoint) with SMTP id AJG08770; Thu, 5 Jul 2001 11:09:08 +0300 (GMT-3) From: "Dror Goldstein" To: Cc: "Dror Goldstein" Subject: Does Linux (Kernel & mrouted) supports multiple VIFs on a single i/f (device)? Date: Thu, 5 Jul 2001 11:10:31 +0300 Message-ID: MIME-Version: 1.0 Content-Type: text/plain; charset="iso-8859-1" Content-Transfer-Encoding: 7bit X-Priority: 3 (Normal) X-MSMail-Priority: Normal X-Mailer: Microsoft Outlook IMO, Build 9.0.2416 (9.0.2910.0) X-MimeOLE: Produced By Microsoft MimeOLE V5.50.4133.2400 Importance: Normal Sender: owner-netdev@oss.sgi.com Precedence: bulk Content-Length: 2379 Lines: 76 Hi all, In my Linux box, I have kernel 2.4.2 with multicast support and mrouted-3.9-beta3. 
I configured the Ethernet devices as follows:

1: lo: mtu 3924 qdisc noqueue
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    inet 127.0.0.1/8 brd 127.255.255.255 scope host lo
2: eth0: mtu 1500 qdisc pfifo_fast qlen 100
    link/ether 00:50:56:bf:79:c8 brd ff:ff:ff:ff:ff:ff
    inet 64.103.121.40/24 brd 64.103.121.255 scope global eth0
3: eth1: mtu 1500 qdisc pfifo_fast qlen 100
    link/ether 00:50:56:bf:73:41 brd ff:ff:ff:ff:ff:ff
    inet 10.0.0.10/24 brd 10.0.0.255 scope global eth1
    inet 10.0.1.10/32 scope global eth1

When I started mrouted (using -d in order to see debug messages) I saw that the kernel reported a list of 3 interfaces as candidates to be VIFs. mrouted indeed recognized 3 VIFs: one on eth0 and two on eth1.

Vif #0: device eth0, local vif address 64.103.121.40
Vif #1: device eth1, local vif address 10.0.0.10
Vif #2: device eth1, local vif address 10.0.1.10

Looking at the mrouted code, it looks like general queries will be sent twice on eth1, once per VIF (the queries differ in their source IP address).

Looking at the kernel code (.../ipv4/ipmr.c), it looks like the kernel doesn't handle such a case appropriately. For example, ipmr_find_vif uses the device and not the IP address to find the appropriate vif, which in my opinion is wrong.

int ipmr_find_vif(struct net_device *dev)
{
	int ct;
	for (ct=maxvif-1; ct>=0; ct--) {
		if (vif_table[ct].dev == dev)
			break;
	}
	return ct;
}

In my opinion (and maybe I'm wrong) another problem occurs when the kernel forwards multicast traffic. In the function ip_mr_forward, looking at the following code, it seems that if two clients that are connected to eth1 but belong to different VIFs join the same multicast group, each of them will receive every multicast packet twice, as the traffic will be sent on both VIFs.
	/*
	 *	Forward the frame
	 */
	for (ct = cache->mfc_un.res.maxvif-1; ct >= cache->mfc_un.res.minvif; ct--) {
		if (skb->nh.iph->ttl > cache->mfc_un.res.ttls[ct]) {
			if (psend != -1)
				ipmr_queue_xmit(skb, cache, psend, 0);
			psend=ct;
		}
	}
	if (psend != -1)
		ipmr_queue_xmit(skb, cache, psend, !local);

Can someone help me with that?

Thanks
Dror G.

From owner-netdev@oss.sgi.com Thu Jul 5 03:49:36 2001
Received: (from majordomo@localhost) by oss.sgi.com (8.11.2/8.11.3) id f65AnaJ13598 for netdev-outgoing; Thu, 5 Jul 2001 03:49:36 -0700
Received: from sj-msg-core-1.cisco.com (sj-msg-core-1.cisco.com [171.71.163.11]) by oss.sgi.com (8.11.2/8.11.3) with SMTP id f65AnYV13594 for ; Thu, 5 Jul 2001 03:49:34 -0700
Received: from kaspit.cisco.com (kaspit.cisco.com [144.254.91.49]) by sj-msg-core-1.cisco.com (8.11.3/8.9.1) with ESMTP id f65AnWh04032 for ; Thu, 5 Jul 2001 03:49:33 -0700 (PDT)
Received: from drgoldstw2k (dhcp-64-103-121-200.cisco.com [64.103.121.200]) by kaspit.cisco.com (Mirapoint) with SMTP id AJG09797; Thu, 5 Jul 2001 13:49:25 +0300 (GMT-3)
From: "Dror Goldstein"
To:
Subject: Problem forwarding fragmented multicast packets.
Date: Thu, 5 Jul 2001 13:50:48 +0300
Message-ID:
MIME-Version: 1.0
Content-Type: text/plain; charset="iso-8859-1"
Content-Transfer-Encoding: 7bit
X-Priority: 3 (Normal)
X-MSMail-Priority: Normal
X-Mailer: Microsoft Outlook IMO, Build 9.0.2416 (9.0.2910.0)
X-MimeOLE: Produced By Microsoft MimeOLE V5.50.4133.2400
Importance: Normal
Sender: owner-netdev@oss.sgi.com
Precedence: bulk
Content-Length: 2024
Lines: 59

Hi,

When a multicast data packet has to be fragmented there is a bug (at least in my opinion) and the packets are not forwarded. Following is a description of the problem as I found it in the kernel code, and after that a patch that solved it.

In the function ip_fragment (ip_output.c), when a datagram is fragmented the flags field (which is part of the skb->cb field) is not copied from the old skb to the new skb (skb2).
Later on, ip_mc_output (ip_output.c) checks this flag in order to determine whether the packet should be looped back.

Note:
- I checked the fix only for multicast traffic, when the flag was IPSKB_FORWARD. I didn't check it for the IPSKB_TRANSLATED and IPSKB_MASQUERADED flags.
- I compared the ip_output.c of kernel 2.4.0-test9 and kernel 2.4.2.

--- ../../../../linux-2.4.2/linux/net/ipv4/ip_output.c Fri Oct 27 20:03:14 2000
+++ ip_output.c Wed Mar 14 18:42:21 2001
@@ -5,7 +5,7 @@
  *
  * The Internet Protocol (IP) output module.
  *
- * Version: $Id: ip_output.c,v 1.87 2000/10/25 20:07:22 davem Exp $
+ * Version: $Id: ip_output.c,v 1.85 2000/08/31 23:39:12 davem Exp $
  *
  * Authors: Ross Biro,
  * Fred N. van Kempen,
@@ -37,6 +37,8 @@
  * and more readibility.
  * Marc Boucher : When call_out_firewall returns FW_QUEUE,
  * silently drop skb instead of failing with -EPERM.
+ * Dror Goldstein: Copy the the flags (of inet_skb_parm structure)
+ * to each IP packet fragment.
  */

 #include
@@ -828,6 +830,10 @@
 		 */
 		if (offset == 0)
 			ip_options_fragment(skb);
+
+		/* Dror g.:Copy the flags to each fragment */
+		IPCB(skb2)->flags = IPCB(skb)->flags;
+
+
 		/*
 		 * Added AC : If we are fragmenting a fragment that's not the

From owner-netdev@oss.sgi.com Thu Jul 5 04:10:07 2001
Received: (from majordomo@localhost) by oss.sgi.com (8.11.2/8.11.3) id f65BA7o14122 for netdev-outgoing; Thu, 5 Jul 2001 04:10:07 -0700
Received: from l.himel.bg (IDENT:root@unamed.infotel.bg [212.39.68.18] (may be forged)) by oss.sgi.com (8.11.2/8.11.3) with SMTP id f65BA4V14116 for ; Thu, 5 Jul 2001 04:10:05 -0700
Received: from linux.himel.bg (IDENT:ja@linux.himel.bg [127.0.0.1]) by l.himel.bg (8.9.3/8.9.3) with ESMTP id OAA12975; Thu, 5 Jul 2001 14:11:17 +0300
Date: Thu, 5 Jul 2001 14:11:17 +0300 (EEST)
From: Julian Anastasov
X-Sender: ja@l
To: Dror Goldstein
cc: netdev@oss.sgi.com
Subject: Re: Problem forwarding fragmented multicast packets.
In-Reply-To: Message-ID: MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII Sender: owner-netdev@oss.sgi.com Precedence: bulk Content-Length: 612 Lines: 30 Hello, On Thu, 5 Jul 2001, Dror Goldstein wrote: > In the function ip_fragment (ip_output.c) when datagrame is fragmented the > flags field (that is included in skb->cb field) is not copied from the old > Skb to the new skb (skb2). ... > + > + /* Dror g.:Copy the flags to each fragment */ > + IPCB(skb2)->flags = IPCB(skb)->flags; > + I see this fix in 2.4.4. But what about the other fields: tc_index nfmark This will allow matching these fragments by fw mark and tc index classifiers. Or I'm overlooking something. QoS gurus? Regards -- Julian Anastasov From owner-netdev@oss.sgi.com Thu Jul 5 06:47:06 2001 Received: (from majordomo@localhost) by oss.sgi.com (8.11.2/8.11.3) id f65Dl6J17306 for netdev-outgoing; Thu, 5 Jul 2001 06:47:06 -0700 Received: from nic.nigdzie (pa3.gliwice.sdi.tpnet.pl [213.25.220.3]) by oss.sgi.com (8.11.2/8.11.3) with SMTP id f65DkuV17303 for ; Thu, 5 Jul 2001 06:46:59 -0700 Received: (qmail 4006 invoked by uid 500); 4 Jul 2001 14:46:54 -0000 Date: Wed, 4 Jul 2001 16:46:54 +0200 From: Jacek Konieczny To: netdev@oss.sgi.com Cc: pld-devel-en@pld.org.pl Subject: Oops in arp_rcv, patch Message-ID: <20010704164654.B3805@nic.gliwice.sdi.tpnet.pl> Mail-Followup-To: netdev@oss.sgi.com, pld-devel-en@pld.org.pl Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline User-Agent: Mutt/1.3.18i Sender: owner-netdev@oss.sgi.com Precedence: bulk Content-Length: 4171 Lines: 98 Hi, One of my router has rebooted a lot last days. I couldn't find the reason as the oops were not logged, and during most of crashes there was noone at the console. But finnaly I got the oops on a serial console. After decoding the oops and examining kernel sources I found the problem --- it was neigh_release() function which failed. 
Everywhere else in the code its argument is protected against being NULL, but not in this one place. Here is my patch:

===== cut ====
--- linux/net/ipv4/arp.c.orig Thu Jun 28 17:29:10 2001
+++ linux/net/ipv4/arp.c Tue Jul 3 19:37:25 2001
@@ -738,7 +738,7 @@
 		    (addr_type == RTN_UNICAST && rt->u.dst.dev != dev &&
 		     (IN_DEV_PROXY_ARP(in_dev) || pneigh_lookup(&arp_tbl, &tip, dev, 0)))) {
 			n = neigh_event_ns(&arp_tbl, sha, &sip, dev);
-			neigh_release(n);
+			if (n) neigh_release(n);

 			if (skb->stamp.tv_sec == 0 ||
 			    skb->pkt_type == PACKET_HOST ||
==============

The bug shows up when proxy ARP is configured. It seems the number of entries in the ARP table matters too (on my host "ip neighb show|wc" gives more than 1000), or it may be the number of Ethernet ports (I have 10). The buggy code seems unchanged in the 2.4.5 kernel.

Here is the decoded oops:

===============
ksymoops 2.4.1 on i686 2.2.19. Options used
     -V (default)
     -k /proc/ksyms (default)
     -l /lib/modules/2.2.19-16/ (specified)
     -o /lib/modules/2.2.19/ (default)
     -m /boot/System.map (specified)

Error (expand_objects): cannot stat(/lib/ext2.o) for ext2
Error (expand_objects): cannot stat(/lib/ide-disk.o) for ide-disk
Error (expand_objects): cannot stat(/lib/ide-probe-mod.o) for ide-probe-mod
Error (expand_objects): cannot stat(/lib/ide-mod.o) for ide-mod
Error (regular_file): read_lsmod /lib/modules/2.2.19-16/ is not a regular file, ignored
Warning (map_ksym_to_module): cannot match loaded module ext2 to a unique module object. Trace may not be reliable.
Oops: 0002 CPU: 0 EIP: 0010:[] Using defaults from ksymoops -t elf32-i386 -a i386 EFLAGS: 00010286 eax: 00000000 ebx: 00000000 ecx: 5343e3d5 edx: 00000401 esi: c62a7430 edi: ca4d71f0 ebp: c62a7438 esp: c0211f04 ds: 0018 es: 0018 ss: 0018 Process swapper (pid: 0, process nr: 0, stackpage=c0211000) Stack: c0200494 c0210608 5343e3d5 c0211f20 c0211f24 cb8f0750 00017b80 5343e3d5 5143e3d5 c0148038 c7bf6640 ca4d71f0 c0200494 00000001 c023d8e4 0003c15b c0211f60 c7bf6640 0003c15b c011a269 00000000 c0210000 c010b56a 00001000 Call Trace: [] [] [] [] [] [] [] [] [] [] [] Code: ff 4b 2c 0f 94 c0 84 c0 74 0f 83 7b 04 00 75 09 53 e8 a9 cd >>EIP; c016d089 <===== Trace; c0148038 Trace; c011a269 Trace; c010b56a Trace; c010b230 Trace; c01088dd Trace; c0106000 Trace; c010a008 Trace; c0106000 Trace; c0106077 Trace; c0106000 Trace; c0100175 Code; c016d089 00000000 <_EIP>: Code; c016d089 <===== 0: ff 4b 2c decl 0x2c(%ebx) <===== Code; c016d08c 3: 0f 94 c0 sete %al Code; c016d08f 6: 84 c0 test %al,%al Code; c016d091 8: 74 0f je 19 <_EIP+0x19> c016d0a2 Code; c016d093 a: 83 7b 04 00 cmpl $0x0,0x4(%ebx) Code; c016d097 e: 75 09 jne 19 <_EIP+0x19> c016d0a2 Code; c016d099 10: 53 push %ebx Code; c016d09a 11: e8 a9 cd 00 00 call cdbf <_EIP+0xcdbf> c0179e48 Aiee, killing interrupt handler Kernel panic: Attempted to kill the idle task! In swapper task - not syncing 1 warning and 5 errors issued. Results may not be reliable. 
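
The decoded oops above faults on `decl 0x2c(%ebx)` with ebx = 00000000, which reads like a reference count being decremented through a NULL pointer. As a rough illustration only, here is a small user-space model of the guard the patch proposes. It assumes, as the patch does, that the lookup can return NULL on this path. `struct neigh_model`, `neigh_model_lookup()` and `neigh_model_release()` are hypothetical stand-ins invented for this sketch, not the kernel's `struct neighbour`, `neigh_event_ns()` or `neigh_release()`.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-in for a refcounted neighbour entry; not the kernel struct. */
struct neigh_model {
    int refcnt;
};

/* Model of a lookup that may fail: returns NULL when nothing is found. */
struct neigh_model *neigh_model_lookup(int found)
{
    struct neigh_model *n;

    if (!found)
        return NULL;
    n = malloc(sizeof(*n));
    if (n != NULL)
        n->refcnt = 1;      /* the caller owns one reference */
    return n;
}

/*
 * Drop one reference. A NULL argument is tolerated and ignored, which is
 * the behaviour the "if (n)" guard in the patch gives the call site.
 * Returns 1 if the entry was freed, 0 otherwise.
 */
int neigh_model_release(struct neigh_model *n)
{
    if (n == NULL)
        return 0;           /* without this check the decrement below faults */
    assert(n->refcnt > 0);
    if (--n->refcnt == 0) {
        free(n);
        return 1;
    }
    return 0;
}
```

Whether the check belongs in the caller (as in the patch) or in the release helper is a design choice; the model puts it in the helper only to keep the example short.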
From owner-netdev@oss.sgi.com Thu Jul 5 07:11:23 2001 Received: (from majordomo@localhost) by oss.sgi.com (8.11.2/8.11.3) id f65EBNW17763 for netdev-outgoing; Thu, 5 Jul 2001 07:11:23 -0700 Received: from yue.hongo.wide.ad.jp (yue.hongo.wide.ad.jp [203.178.140.186]) by oss.sgi.com (8.11.2/8.11.3) with SMTP id f65EBMV17760 for ; Thu, 5 Jul 2001 07:11:22 -0700 Received: from localhost (localhost [127.0.0.1]) by yue.hongo.wide.ad.jp (8.9.3+3.2W/8.9.3/Debian 8.9.3-21) with ESMTP id XAA06078; Thu, 5 Jul 2001 23:02:55 +0900 To: jajcus@bnet.pl Cc: netdev@oss.sgi.com, pld-devel-en@pld.org.pl Subject: Re: Oops in arp_rcv, patch In-Reply-To: <20010704164654.B3805@nic.gliwice.sdi.tpnet.pl> References: <20010704164654.B3805@nic.gliwice.sdi.tpnet.pl> X-Mailer: Mew version 1.94.2 on Emacs 20.7 / Mule 4.1 (AOI) X-URL: http://www.hongo.wide.ad.jp/%7Eyoshfuji/ X-Fingerprint: F7 31 65 99 5E B2 BB A7 15 15 13 23 18 06 A9 6F 57 00 6B 25 X-Pgp5-Key-Url: http://cerberus.nemoto.ecei.tohoku.ac.jp/%7Eyoshfuji/yoshfuji@ecei.tohoku.ac.jp.asc Mime-Version: 1.0 Content-Type: Text/Plain; charset=us-ascii Content-Transfer-Encoding: 7bit Message-Id: <20010705230255N.yoshfuji@wide.ad.jp> Date: Thu, 05 Jul 2001 23:02:55 +0900 From: YOSHIFUJI Hideaki / =?iso-2022-jp?B?GyRCNUhGIzFRTEAbKEI=?= X-Dispatcher: imput version 991025(IM133) Sender: owner-netdev@oss.sgi.com Precedence: bulk Content-Length: 857 Lines: 22 In article <20010704164654.B3805@nic.gliwice.sdi.tpnet.pl> (at Wed, 4 Jul 2001 16:46:54 +0200), Jacek Konieczny says: > --- it was neigh_release() function which failed. Everywhere else in > the code its argument is protected against being NULL, but not in the > one place. 
Here is my patch: > ===== cut ==== > --- linux/net/ipv4/arp.c.orig Thu Jun 28 17:29:10 2001 > +++ linux/net/ipv4/arp.c Tue Jul 3 19:37:25 2001 > @@ -738,7 +738,7 @@ > (addr_type == RTN_UNICAST && rt->u.dst.dev != dev && > (IN_DEV_PROXY_ARP(in_dev) || pneigh_lookup(&arp_tbl, &tip, dev, 0)))) { > n = neigh_event_ns(&arp_tbl, sha, &sip, dev); > - neigh_release(n); > + if (n) neigh_release(n); > > if (skb->stamp.tv_sec == 0 || > skb->pkt_type == PACKET_HOST || linux-2.4.x (0-test1 to 6) kernels are ok. -- yoshfuji From owner-netdev@oss.sgi.com Thu Jul 5 10:21:26 2001 Received: (from majordomo@localhost) by oss.sgi.com (8.11.2/8.11.3) id f65HLQa22822 for netdev-outgoing; Thu, 5 Jul 2001 10:21:26 -0700 Received: from c0mailgw06.prontomail.com (mailgw.prontomail.com [216.163.180.10]) by oss.sgi.com (8.11.2/8.11.3) with SMTP id f65HLOV22819 for ; Thu, 5 Jul 2001 10:21:24 -0700 Received: from c6web104 (216.163.178.10) by c0mailgw06.prontomail.com (NPlex 5.5.029) id 3B425D35000657C6 for netdev@oss.sgi.com; Thu, 5 Jul 2001 10:16:34 -0700 X-Version: beer 6.3.3353.0 From: "william fitzgerald" Message-Id: Date: Thu, 5 Jul 2001 18:21:13 +0200 X-Priority: Normal Content-Type: text/plain; charset=iso-8859-1 To: netdev@oss.sgi.com Subject: dev.c query X-Mailer: Web Based Pronto Mime-Version: 1.0 Content-Transfer-Encoding: 7bit Sender: owner-netdev@oss.sgi.com Precedence: bulk Content-Length: 2796 Lines: 68 hi All, My name is william fitzgerald and i'm doing my first year of research in Maynooth N.U.I in Ireland.My research is basically about measuring and documenting the points in which packets are lost by a router on a network.To be perfectly honest i am a complete novice in this whole area of linux and routing but i'm learning.In the future i would like to develop some optimization strategies in the area of routing for the linux community. i have a query that i hope you can help me with because i just can't seem to get my head around some stuff. 
To begin with, I'm trying to follow a packet's life through dev.c (on my router) so that I can soon add some probes in certain places to measure what happens. So at the moment I am trying to figure out which functions are called as a packet travels through dev.c.

I presume that during the initial vortex interrupt, netif_rx() is called to put the sk_buff on the backlog queue of size 300. When that is done it returns, and then the vortex interrupt finishes.

Then I think net_rx_action() is called to process the received packet. In this function the backlog queue is dequeued, I think with the line:

skb = __skb_dequeue(&queue->input_pkt_queue);

I'm not sure what FASTROUTE is about yet.

I guess that while net_rx_action() is dequeuing, a call is made to ip_rcv() in ip_input.c. I can't really see a direct reference to ip_rcv(), but I think it makes sense that as packets are dequeued the IP receive function is called to decide whether the packet is for the local host or for forwarding (which is what I'm looking at).

Since I think net_rx_action() deals with handling received packets, I conclude that net_tx_action() deals with transmitting packets onto some queue for exiting the system.

Am I close, or at least half right so far? It's just that I'm hopeless at reading code; I was never good at programming, but I am learning, although slowly!

I was wondering why there is no net_bh() function like in the 2.2.14 kernel. I thought that bottom half handlers handle incoming packets from the NIC. Is this gone in the 2.4 kernel, or is it wrapped up somewhere else in dev.c? (The kernel I am using is 2.4.0-test9.)

What does the function

static void deliver_to_old_ones(struct packet_type *pt, struct sk_buff *skb, int last)

do? It's the only place where I found a reference to what I think may be net_bh() (i.e. what I think is the network bottom half handler).
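
For anyone wanting to reason about where a probe could count losses, the receive path described earlier in this message can be approximated by a toy user-space model: one side queues packets onto a bounded backlog, the other side drains it, and anything arriving while the backlog is full is dropped. This is only a sketch under that assumption. `struct backlog`, `backlog_enqueue()` and `backlog_dequeue()` are hypothetical names, packets are plain integers rather than sk_buffs, and the limit of 300 is simply the figure quoted in the message.

```c
#include <assert.h>
#include <stddef.h>

#define BACKLOG_MAX 300     /* queue limit quoted in the message */

/* Hypothetical stand-in for the input queue; not a kernel type. */
struct backlog {
    int pkts[BACKLOG_MAX];  /* packet ids instead of sk_buff pointers */
    size_t head;            /* index of the oldest queued packet */
    size_t len;             /* number of packets currently queued */
    unsigned long dropped;  /* packets refused because the queue was full */
};

/* Interrupt-side step (the role netif_rx() plays here): queue or drop. */
int backlog_enqueue(struct backlog *q, int pkt)
{
    assert(q != NULL);
    if (q->len == BACKLOG_MAX) {
        q->dropped++;       /* one place where a router can lose packets */
        return -1;
    }
    q->pkts[(q->head + q->len) % BACKLOG_MAX] = pkt;
    q->len++;
    return 0;
}

/* Deferred step (the role net_rx_action() plays here): take the oldest packet. */
int backlog_dequeue(struct backlog *q, int *pkt)
{
    assert(q != NULL && pkt != NULL);
    if (q->len == 0)
        return -1;          /* nothing waiting */
    *pkt = q->pkts[q->head];
    q->head = (q->head + 1) % BACKLOG_MAX;
    q->len--;
    return 0;
}
```

In this model the `dropped` counter is the kind of value a probe at the enqueue step would report; it says nothing about losses elsewhere in the path.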
i know your busy with work and all but you would be helping me a great deal if you could respond to me what ever the outcome just so i know one way or the other. many thanks in advance, regards william A beer.com Beer Mail fanatic Beer Mail, brought to you by your friends at beer.com. From owner-netdev@oss.sgi.com Thu Jul 5 12:29:04 2001 Received: (from majordomo@localhost) by oss.sgi.com (8.11.2/8.11.3) id f65JT4s25607 for netdev-outgoing; Thu, 5 Jul 2001 12:29:04 -0700 Received: from cr416993-a.ym1.on.wave.home.com (cr416993-a.ym1.on.wave.home.com [24.112.193.232]) by oss.sgi.com (8.11.2/8.11.3) with SMTP id f65JSxV25604 for ; Thu, 5 Jul 2001 12:29:02 -0700 Received: from redshift.mimosa.com (IDENT:root@redshift.mimosa.com [192.139.70.107]) by cr416993-a.ym1.on.wave.home.com (8.9.3/8.9.3) with ESMTP id PAA07055; Thu, 5 Jul 2001 15:31:20 -0400 Received: from localhost (hugh@localhost) by redshift.mimosa.com (8.11.0/8.11.0) with ESMTP id f65JXZb07278; Thu, 5 Jul 2001 15:33:35 -0400 X-Authentication-Warning: redshift.mimosa.com: hugh owned process doing -bs Date: Thu, 5 Jul 2001 15:33:35 -0400 (EDT) From: "D. Hugh Redelmeier" Reply-To: To: Andi Kleen cc: , Marco Berizzi Subject: Re: select says I can read, but recvfrom hangs In-Reply-To: Message-ID: MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII Sender: owner-netdev@oss.sgi.com Precedence: bulk Content-Length: 1866 Lines: 55 We're still having this problem. Marco has observed it with 2.2.19 and 2.4.5. I've had him instrument the Pluto code, so we have more information. - select says that the file descriptor is ready to be read from - poll on that file descriptor, with POLLIN | POLLPRI | POLLOUT, returns 1, with revents = 12. I decode this to be these two flags: #define POLLOUT 0x0004 /* Writing now will not block */ #define POLLERR 0x0008 /* Error condition */ Notice that POLLIN is not on. - the recvfrom with flags = 0 hangs waiting for a message. 
Here's a part of the strace:

select(9, [4 6 8], NULL, NULL, {36, 0}) = 2 (in [4 8], left {36, 0})

select says that there is input available on file descriptors 4 and 8. 4 is a unix domain socket for control information. 8 is a UDP socket. We'll look at 8 first.

select(9, [8], NULL, NULL, {0, 0}) = 1 (in [8], left {0, 0})

Just to make sure (because we're paranoid): yes, select says 8 has input for us.

poll([{fd=8, events=POLLIN|POLLPRI|POLLOUT, revents=POLLOUT|POLLERR}], 1, 0) = 1

Further paranoia: what does poll say about 8? The program doesn't actually use this info -- just for testing. It says that there is POLLERR but not POLLIN.

recvfrom(8, 0xbffef5c8, 65536, 0, 0xbffef5b0, 0xbffef5ac) = ? ERESTARTSYS (To be restarted)
--- SIGHUP (Hangup) ---

Hang in recvfrom until SIGHUP liberates us.

Since the select said that there was something to read, the recvfrom must not hang, but it does.

I'm not sure what the correct kernel behaviour should be. Either the select should say that there is nothing to read, or the recvfrom should not hang. Andi suggested earlier that the read should return immediately, with a -1 result, indicating error.

This looks like a kernel bug to me (or perhaps a documentation bug).

Hugh Redelmeier
hugh@mimosa.com voice: +1 416 482-8253

From owner-netdev@oss.sgi.com Thu Jul 5 15:37:34 2001 Received: (from majordomo@localhost) by oss.sgi.com (8.11.2/8.11.3) id f65MbYG08560 for netdev-outgoing; Thu, 5 Jul 2001 15:37:34 -0700 Received: from almesberger.net (IDENT:root@lsb-catv-1-p021.vtxnet.ch [212.147.5.21]) by oss.sgi.com (8.11.2/8.11.3) with SMTP id f65MbWV08555 for ; Thu, 5 Jul 2001 15:37:32 -0700 Received: (from almesber@localhost) by almesberger.net (8.9.3/8.9.3) id AAA28790; Fri, 6 Jul 2001 00:37:07 +0200 Date: Fri, 6 Jul 2001 00:37:07 +0200 From: Werner Almesberger To: Julian Anastasov Cc: Dror Goldstein , netdev@oss.sgi.com Subject: Re: Problem forwarding fragmented multicast packets.
Message-ID: <20010706003707.L22891@almesberger.net> References: Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: ; from ja@ssi.bg on Thu, Jul 05, 2001 at 02:11:17PM +0300 Sender: owner-netdev@oss.sgi.com Precedence: bulk Content-Length: 381 Lines: 13 Julian Anastasov wrote: > I see this fix in 2.4.4. But what about the other fields: > > tc_index Yes, tc_index should be preserved too. - Werner -- _________________________________________________________________________ / Werner Almesberger, Lausanne, CH wa@almesberger.net / /_http://icawww.epfl.ch/almesberger/_____________________________________/ From owner-netdev@oss.sgi.com Thu Jul 5 17:06:37 2001 Received: (from majordomo@localhost) by oss.sgi.com (8.11.2/8.11.3) id f6606br21803 for netdev-outgoing; Thu, 5 Jul 2001 17:06:37 -0700 Received: from pizda.ninka.net (IDENT:root@pizda.ninka.net [216.101.162.242]) by oss.sgi.com (8.11.2/8.11.3) with SMTP id f6606ZV21796 for ; Thu, 5 Jul 2001 17:06:35 -0700 Received: (from davem@localhost) by pizda.ninka.net (8.9.3/8.9.3) id RAA29194; Thu, 5 Jul 2001 17:05:58 -0700 From: "David S. Miller" MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Transfer-Encoding: 7bit Message-ID: <15173.230.571263.353935@pizda.ninka.net> Date: Thu, 5 Jul 2001 17:05:58 -0700 (PDT) To: Jacek Konieczny Cc: netdev@oss.sgi.com, pld-devel-en@pld.org.pl Subject: Re: Oops in arp_rcv, patch In-Reply-To: <20010704164654.B3805@nic.gliwice.sdi.tpnet.pl> References: <20010704164654.B3805@nic.gliwice.sdi.tpnet.pl> X-Mailer: VM 6.75 under 21.1 (patch 13) "Crater Lake" XEmacs Lucid Sender: owner-netdev@oss.sgi.com Precedence: bulk Content-Length: 663 Lines: 28 Jacek Konieczny writes: > - neigh_release(n); > + if (n) neigh_release(n); ... > The buggy code seems unchanged in 2.4.5 kernel. It looks perfectly fine to me, all neigh_event_ns() return arguments are checked in the current 2.4.x sources, I don't know what you are talking about wrt. 
2.4.5 since it is fine there too: n = neigh_event_ns(&arp_tbl, sha, &sip, dev); if (n) { ... neigh_release(n); } ... n = neigh_event_ns(&arp_tbl, sha, &sip, dev); if (n) neigh_release(n); ... What made you think that 2.4.5 still has the bug? You must have been looking at some other source tree. Later, David S. Miller davem@redhat.com From owner-netdev@oss.sgi.com Fri Jul 6 01:35:46 2001 Received: (from majordomo@localhost) by oss.sgi.com (8.11.2/8.11.3) id f668ZkH31633 for netdev-outgoing; Fri, 6 Jul 2001 01:35:46 -0700 Received: from nic.nigdzie (pa3.gliwice.sdi.tpnet.pl [213.25.220.3]) by oss.sgi.com (8.11.2/8.11.3) with SMTP id f668ZVV31620 for ; Fri, 6 Jul 2001 01:35:32 -0700 Received: (qmail 4211 invoked by uid 500); 6 Jul 2001 08:35:08 -0000 Date: Fri, 6 Jul 2001 10:35:08 +0200 From: Jacek Konieczny To: "David S. Miller" Cc: netdev@oss.sgi.com Subject: Re: Oops in arp_rcv, patch Message-ID: <20010706103507.A3043@nic.gliwice.sdi.tpnet.pl> Mail-Followup-To: "David S. Miller" , netdev@oss.sgi.com Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <15173.230.571263.353935@pizda.ninka.net> User-Agent: Mutt/1.3.18i Sender: owner-netdev@oss.sgi.com Precedence: bulk Content-Length: 978 Lines: 32 On Thu, Jul 05, 2001 at 05:05:58PM -0700, David S. Miller wrote: > > Jacek Konieczny writes: > > - neigh_release(n); > > + if (n) neigh_release(n); > ... > > The buggy code seems unchanged in 2.4.5 kernel. > > It looks perfectly fine to me, all neigh_event_ns() return arguments > are checked in the current 2.4.x sources, I don't know what you are > talking about wrt. 2.4.5 since it is fine there too: > > n = neigh_event_ns(&arp_tbl, sha, &sip, dev); > if (n) { > ... > neigh_release(n); > } > ... > n = neigh_event_ns(&arp_tbl, sha, &sip, dev); > if (n) > neigh_release(n); > ... > > What made you think that 2.4.5 still has the bug? You must have been > looking at some other source tree. Or maybe I was looking not good enough. 
Sorry, for false alarm. But the bug is really present in 2.2.19 and the patch was made for that version. At least, after applying that my router started building its uptime again. Greets, Jacek From owner-netdev@oss.sgi.com Fri Jul 6 04:40:16 2001 Received: (from majordomo@localhost) by oss.sgi.com (8.11.2/8.11.3) id f66BeGJ16006 for netdev-outgoing; Fri, 6 Jul 2001 04:40:16 -0700 Received: from pizda.ninka.net (IDENT:root@pizda.ninka.net [216.101.162.242]) by oss.sgi.com (8.11.2/8.11.3) with SMTP id f66BeFV16001 for ; Fri, 6 Jul 2001 04:40:15 -0700 Received: (from davem@localhost) by pizda.ninka.net (8.9.3/8.9.3) id EAA10834; Fri, 6 Jul 2001 04:37:53 -0700 From: "David S. Miller" MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Transfer-Encoding: 7bit Message-ID: <15173.41745.282549.680549@pizda.ninka.net> Date: Fri, 6 Jul 2001 04:37:53 -0700 (PDT) To: Jacek Konieczny Cc: netdev@oss.sgi.com Subject: Re: Oops in arp_rcv, patch In-Reply-To: <20010706103507.A3043@nic.gliwice.sdi.tpnet.pl> References: <15173.230.571263.353935@pizda.ninka.net> <20010706103507.A3043@nic.gliwice.sdi.tpnet.pl> X-Mailer: VM 6.75 under 21.1 (patch 13) "Crater Lake" XEmacs Lucid Sender: owner-netdev@oss.sgi.com Precedence: bulk Content-Length: 352 Lines: 12 Jacek Konieczny writes: > Or maybe I was looking not good enough. Sorry, for false alarm. > But the bug is really present in 2.2.19 and the patch was made for that > version. At least, after applying that my router started building its > uptime again. Thanks, fixed in my 2.2.x tree and sent off to Alan. Later, David S. 
Miller davem@redhat.com From owner-netdev@oss.sgi.com Fri Jul 6 18:02:34 2001 Received: (from majordomo@localhost) by oss.sgi.com (8.11.2/8.11.3) id f6712Yi30822 for netdev-outgoing; Fri, 6 Jul 2001 18:02:34 -0700 Received: from yoda.planetinternet.be (yoda.planetinternet.be [195.95.30.146]) by oss.sgi.com (8.11.2/8.11.3) with SMTP id f6712WV30819 for ; Fri, 6 Jul 2001 18:02:33 -0700 Received: from dialup.planetinternet.be (postfix@u212-239-144-5.dialup.planetinternet.be [212.239.144.5]) by yoda.planetinternet.be (8.11.3/8.11.1) with ESMTP id f6712U909093 for ; Sat, 7 Jul 2001 03:02:30 +0200 Received: by dialup.planetinternet.be (Postfix, from userid 501) id 9FEE226133; Sat, 07 Jul 2001 03:02:27 +0200 (CEST) Date: Sat, 7 Jul 2001 03:02:27 +0200 From: Kurt Roeckx To: netdev@oss.sgi.com Subject: ICMP NDISC: fake message with non-255 Hop Limit received: 249 Message-ID: <20010707030227.A1676@ping.be> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii X-Mailer: Mutt 1.0pre2i Sender: owner-netdev@oss.sgi.com Precedence: bulk Content-Length: 1292 Lines: 40 I upgraded from 2.4.3 to 2.4.6 two days ago, and suddenly get message like this: Jul 5 19:05:51 thunderbird kernel: ICMP NDISC: fake message with non-255 Hop Limit received: 249 Jul 5 19:05:53 thunderbird last message repeated 2 times Jul 5 19:07:16 thunderbird last message repeated 20 times Which seems to go on for about 4 minutes. 
Then:

Jul 6 06:38:18 thunderbird kernel: ICMP NDISC: fake message with non-255 Hop Limit received: 250
Jul 6 06:38:37 thunderbird last message repeated 8 times
Jul 6 07:13:49 thunderbird kernel: ICMP NDISC: fake message with non-255 Hop Limit received: 250
Jul 6 07:13:51 thunderbird last message repeated 2 times
Jul 6 07:19:28 thunderbird kernel: ICMP NDISC: fake message with non-255 Hop Limit received: 250
Jul 6 07:19:55 thunderbird last message repeated 8 times

Which went on until 7:39.

I also see a lot of "ICMPv6 checksum failed" messages, which seem to come from the same source most of the time. I did get those before, so maybe it's related?

Is this a bad thing? Can I do something to help debug this?

This box is mostly doing IPv6 traffic. The counter says we received 433 MB and sent 338 MB of IPv6 traffic so far during the last 2 days, which is about 2.4 KB/s receive and 1.9 KB/s send.

Please CC me, as I'm not on the list.

Kurt

From owner-netdev@oss.sgi.com Sat Jul 7 00:04:38 2001 Received: (from majordomo@localhost) by oss.sgi.com (8.11.2/8.11.3) id f6774ce21469 for netdev-outgoing; Sat, 7 Jul 2001 00:04:38 -0700 Received: from netcore.fi (netcore.fi [193.94.160.1]) by oss.sgi.com (8.11.2/8.11.3) with SMTP id f6774aV21465 for ; Sat, 7 Jul 2001 00:04:36 -0700 Received: from localhost (pekkas@localhost) by netcore.fi (8.11.1/8.11.1) with ESMTP id f6774Kb24019; Sat, 7 Jul 2001 10:04:20 +0300 Date: Sat, 7 Jul 2001 10:04:19 +0300 (EEST) From: Pekka Savola To: Kurt Roeckx cc: Subject: Re: ICMP NDISC: fake message with non-255 Hop Limit received: 249 In-Reply-To: <20010707030227.A1676@ping.be> Message-ID: MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII Sender: owner-netdev@oss.sgi.com Precedence: bulk Content-Length: 1072 Lines: 31

On Sat, 7 Jul 2001, Kurt Roeckx wrote: > I upgraded from 2.4.3 to 2.4.6 two days ago, and suddenly get > message like this: > > Jul 5 19:05:51 thunderbird kernel: ICMP NDISC: fake message with > non-255 Hop Limit received: 249 > Jul 5
19:05:53 thunderbird last message repeated 2 times > Jul 5 19:07:16 thunderbird last message repeated 20 times [snip] > Is this a bad thing? Can I do something to help debug this? The specs require that all IPv6 neighbour discovery messages MUST be originated in the same network. In your case, you're getting these messages from over the Internet. Someone is probably trying to do something nasty. Still, I'd suggest getting tcpdump 3.6.2 (compiled with ipv6), and capturing the traffic a bit if this happens again: # tcpdump -n -s 512 -vvv icmp6 If you do capture something, please also describe your network topology. -- Pekka Savola "Tell me of difficulties surmounted, Netcore Oy not those you stumble over and fall" Systems. Networks. Security. -- Robert Jordan: A Crown of Swords From owner-netdev@oss.sgi.com Mon Jul 9 17:40:36 2001 Received: (from majordomo@localhost) by oss.sgi.com (8.11.2/8.11.3) id f6A0eae22543 for netdev-outgoing; Mon, 9 Jul 2001 17:40:36 -0700 Received: from pizda.ninka.net (IDENT:root@pizda.ninka.net [216.101.162.242]) by oss.sgi.com (8.11.2/8.11.3) with SMTP id f6A0eZV22540 for ; Mon, 9 Jul 2001 17:40:35 -0700 Received: (from davem@localhost) by pizda.ninka.net (8.9.3/8.9.3) id RAA09790; Mon, 9 Jul 2001 17:40:29 -0700 From: "David S. Miller" MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Transfer-Encoding: 7bit Message-ID: <15178.20220.911259.162566@pizda.ninka.net> Date: Mon, 9 Jul 2001 17:40:28 -0700 (PDT) To: Werner Almesberger Cc: Julian Anastasov , Dror Goldstein , netdev@oss.sgi.com Subject: Re: Problem forwarding fragmented multicast packets. In-Reply-To: <20010706003707.L22891@almesberger.net> References: <20010706003707.L22891@almesberger.net> X-Mailer: VM 6.75 under 21.1 (patch 13) "Crater Lake" XEmacs Lucid Sender: owner-netdev@oss.sgi.com Precedence: bulk Content-Length: 263 Lines: 14 Werner Almesberger writes: > Julian Anastasov wrote: > > I see this fix in 2.4.4. 
But what about the other fields: > > > > tc_index > > Yes, tc_index should be preserved too.

I've fixed this in my tree, thanks.

Later,
David S. Miller
davem@redhat.com

From owner-netdev@oss.sgi.com Tue Jul 10 08:32:25 2001 Received: (from majordomo@localhost) by oss.sgi.com (8.11.2/8.11.3) id f6AFWPY28397 for netdev-outgoing; Tue, 10 Jul 2001 08:32:25 -0700 Received: from c0mailgw08.prontomail.com (mailgw.prontomail.com [216.163.180.10]) by oss.sgi.com (8.11.2/8.11.3) with SMTP id f6AFWOV28394 for ; Tue, 10 Jul 2001 08:32:25 -0700 Received: from c6web109 (216.163.178.10) by c0mailgw08.prontomail.com (NPlex 5.5.029) id 3B4A4F1800021CC9 for netdev@oss.sgi.com; Tue, 10 Jul 2001 08:27:21 -0700 X-Version: beer 6.3.3353.0 From: "william fitzgerald" Message-Id: <9976AFD313575D115AE50005B83D1402@william.fitzgerald.beer.com> Date: Tue, 10 Jul 2001 16:31:51 +0200 X-Priority: Normal Content-Type: text/plain; charset=iso-8859-1 To: netdev@oss.sgi.com Subject: is there any linux 2.4 ip networking documentation? X-Mailer: Web Based Pronto Mime-Version: 1.0 Content-Transfer-Encoding: 7bit Sender: owner-netdev@oss.sgi.com Precedence: bulk Content-Length: 745 Lines: 17

Hi all,

My name is William Fitzgerald and I'm doing a research master's at Maynooth, N.U.I., Ireland. My research area is the performance and optimisation of Linux routers. At the moment I'm interested in following the life cycle of a packet going through the Linux network stack.

I was wondering whether anyone can direct me to any documentation on IP networking in the 2.4 Linux kernel. The kernel I'm using is 2.4.0-test9. I have come across documentation on Linux IP networking for the 2.2 kernel (which has completely different code structures) from Gregory Parrott (thanks Gregory, that version did help me some of the way).

Many thanks in advance,
William
From owner-netdev@oss.sgi.com Tue Jul 10 10:02:56 2001 Received: (from majordomo@localhost) by oss.sgi.com (8.11.2/8.11.3) id f6AH2uk31441 for netdev-outgoing; Tue, 10 Jul 2001 10:02:56 -0700 Received: from imr2.ericy.com (imr2.ericy.com [12.34.240.68]) by oss.sgi.com (8.11.2/8.11.3) with SMTP id f6AH2tV31438 for ; Tue, 10 Jul 2001 10:02:55 -0700 Received: from mr5.exu.ericsson.se (mr5att.ericy.com [138.85.92.13]) by imr2.ericy.com (8.11.3/8.11.3) with ESMTP id f6AH2o523024 for ; Tue, 10 Jul 2001 12:02:50 -0500 (CDT) Received: from noah.lmc.ericsson.se (noah.lmc.ericsson.se [142.133.1.1]) by mr5.exu.ericsson.se (8.11.3/8.11.3) with ESMTP id f6AH2nb20489 for ; Tue, 10 Jul 2001 12:02:49 -0500 (CDT) Received: from LMC37.lmc.ericsson.se (lmc51.lmc.ericsson.se [142.133.16.191]) by noah.lmc.ericsson.se (8.11.2/8.9.2) with ESMTP id f6AH2kA11463 for ; Tue, 10 Jul 2001 13:02:49 -0400 (EDT) Received: by lmc37.lmc.ericsson.se with Internet Mail Service (5.5.2653.19) id <34MD7V1V>; Tue, 10 Jul 2001 13:02:45 -0400 Message-ID: <32CD630F6CBED411AE180008C7894CBC04DA8B58@lmc37.lmc.ericsson.se> From: "Ibrahim Haddad (LMC)" To: netdev@oss.sgi.com Subject: IPv6 Linux Implementation Date: Tue, 10 Jul 2001 13:02:36 -0400 MIME-Version: 1.0 X-Mailer: Internet Mail Service (5.5.2653.19) Content-Type: text/plain; charset="iso-8859-1" Sender: owner-netdev@oss.sgi.com Precedence: bulk Content-Length: 421 Lines: 18 Hello all, I am searching for a reference on who wrote/writing the IPv6 stack in the Linux kernel. Is there a web site where we can get all the info on the supported features, who is writing what .. and so on. I would really appreciate your feedback. Best Regards, Ibrahim Haddad Open Architecture Research Lab Ericsson Canada Inc. 
Tel: +1 514 345 7900 x5484 Fax: +1 514 345 6105 Mailto:Ibrahim.Haddad@Ericsson.ca From owner-netdev@oss.sgi.com Wed Jul 11 03:36:30 2001 Received: (from majordomo@localhost) by oss.sgi.com (8.11.2/8.11.3) id f6BAaUZ07984 for netdev-outgoing; Wed, 11 Jul 2001 03:36:30 -0700 Received: from netcore.fi (netcore.fi [193.94.160.1]) by oss.sgi.com (8.11.2/8.11.3) with SMTP id f6BAaSV07980 for ; Wed, 11 Jul 2001 03:36:28 -0700 Received: from localhost (pekkas@localhost) by netcore.fi (8.11.1/8.11.1) with ESMTP id f6BAZRH20954; Wed, 11 Jul 2001 13:35:27 +0300 Date: Wed, 11 Jul 2001 13:35:27 +0300 (EEST) From: Pekka Savola To: "Ibrahim Haddad (LMC)" cc: Subject: Re: IPv6 Linux Implementation In-Reply-To: <32CD630F6CBED411AE180008C7894CBC04DA8B58@lmc37.lmc.ericsson.se> Message-ID: MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII Sender: owner-netdev@oss.sgi.com Precedence: bulk Content-Length: 1129 Lines: 27 On Tue, 10 Jul 2001, Ibrahim Haddad (LMC) wrote: > I am searching for a reference on who wrote/writing the > IPv6 stack in the Linux kernel. Is there a web site where > we can get all the info on the supported features, who is > writing what .. and so on. The original writer Pedro Roque has left the community. Alexey Kuznetsov and others have made quite a few enhancements over the years. Current "official" maintainers are listed in MAINTAINERS file. See the source code in net/ipv6/ headers for details on changes (not all are listed of course). USAGI Project (http://www.linux-ipv6.org/) is working on IPv6 too. So far they've submitted quite a few smaller fixes, but more complete integration is an open issue yet. Unfortunately there does not exist a complete "IMPLEMENTATION" document, describing all the supported RFC's etc. and motivations behind different design decisions. Source Code Is Your Friend. ;-) -- Pekka Savola "Tell me of difficulties surmounted, Netcore Oy not those you stumble over and fall" Systems. Networks. Security. 
-- Robert Jordan: A Crown of Swords

From owner-netdev@oss.sgi.com Thu Jul 12 01:14:15 2001 Received: (from majordomo@localhost) by oss.sgi.com (8.11.2/8.11.3) id f6C8EFj09459 for netdev-outgoing; Thu, 12 Jul 2001 01:14:15 -0700 Received: from yue.hongo.wide.ad.jp (yue.hongo.wide.ad.jp [203.178.140.186]) by oss.sgi.com (8.11.2/8.11.3) with SMTP id f6C8ECV09438 for ; Thu, 12 Jul 2001 01:14:12 -0700 Received: from localhost (localhost [127.0.0.1]) by yue.hongo.wide.ad.jp (8.9.3+3.2W/8.9.3/Debian 8.9.3-21) with ESMTP id RAA07058; Thu, 12 Jul 2001 17:15:53 +0900 To: netdev@oss.sgi.com, usagi-users@linux-ipv6.org CC: usagi-core@linux-ipv6.org Subject: USAGI Project in 2001 Linux Symposium X-Mailer: Mew version 1.94.2 on Emacs 20.7 / Mule 4.1 (AOI) X-URL: http://www.hongo.wide.ad.jp/%7Eyoshfuji/ X-Fingerprint: F7 31 65 99 5E B2 BB A7 15 15 13 23 18 06 A9 6F 57 00 6B 25 X-Pgp5-Key-Url: http://cerberus.nemoto.ecei.tohoku.ac.jp/%7Eyoshfuji/yoshfuji@ecei.tohoku.ac.jp.asc Mime-Version: 1.0 Content-Type: Text/Plain; charset=us-ascii Content-Transfer-Encoding: 7bit Message-Id: <20010712171553C.yoshfuji@linux-ipv6.org> Date: Thu, 12 Jul 2001 17:15:53 +0900 From: YOSHIFUJI Hideaki / =?iso-2022-jp?B?GyRCNUhGIzFRTEAbKEI=?= X-Dispatcher: imput version 991025(IM133) Sender: owner-netdev@oss.sgi.com Precedence: bulk Content-Length: 706 Lines: 19

Hi, all,

Most core members of the USAGI (UniverSAl playGround for Ipv6) Project will attend the 2001 Linux Symposium, held in Ottawa, Canada, the week after next (July 25th-28th, 2001; ).

Though our project does not have an official presentation or BOF slot at the venue, we'd like to meet Linux developers and users there and to hear any input from you, such as your impressions of our work. We'd also like to talk about the following items:
 o current status
 o coding style
 o collaboration
etc.

Thanks for your interest. We're looking forward to seeing you there!
Best Regards, -- Hideaki YOSHIFUJI / USAGI Project From owner-netdev@oss.sgi.com Thu Jul 12 11:44:44 2001 Received: (from majordomo@localhost) by oss.sgi.com (8.11.2/8.11.3) id f6CIii828169 for netdev-outgoing; Thu, 12 Jul 2001 11:44:44 -0700 Received: from the-village.bc.nu (router-100M.swansea.linux.org.uk [194.168.151.17]) by oss.sgi.com (8.11.2/8.11.3) with SMTP id f6CIiZV28165 for ; Thu, 12 Jul 2001 11:44:35 -0700 Received: from alan by the-village.bc.nu with local (Exim 3.22 #1) id 15KlSp-0006c6-00 for netdev@oss.sgi.com; Thu, 12 Jul 2001 19:45:27 +0100 Received: from [199.183.24.194] (helo=vger.kernel.org) by the-village.bc.nu with esmtp (Exim 3.22 #1) id 15KkA8-0006UX-00 for alan@lxorguk.ukuu.org.uk; Thu, 12 Jul 2001 18:22:04 +0100 Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id ; Thu, 12 Jul 2001 13:19:54 -0400 Received: (majordomo@vger.kernel.org) by vger.kernel.org id ; Thu, 12 Jul 2001 13:19:45 -0400 Received: from mail.linuxcare.com ([216.88.157.164]:3050 "HELO mail.linuxcare.com") by vger.kernel.org with SMTP id ; Thu, 12 Jul 2001 13:19:31 -0400 Received: from linuxcare.com (dmz-gw.linuxcare.com [216.88.157.161]) by mail.linuxcare.com (Postfix) with SMTP id AC7E13CE for ; Thu, 12 Jul 2001 10:19:30 -0700 (PDT) Received: by linuxcare.com (sSMTP sendmail emulation); Thu, 12 Jul 2001 10:12:19 -0700 From: Ned Bass Date: Thu, 12 Jul 2001 10:12:19 -0700 To: linux-kernel@vger.kernel.org Subject: Oops triggered by ftp connection attempt through Linux firewall Message-ID: <20010712101219.D5476@linuxcare.com> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline User-Agent: Mutt/1.2.5i X-Mailing-List: linux-kernel@vger.kernel.org Sender: owner-netdev@oss.sgi.com Precedence: bulk Content-Length: 13412 Lines: 322 Hello, I wish to be personally CC'ed the answers/comments posted to the list in response to my posting. [1.] 
One line summary of the problem: Ftp connection attempt through Linux firewall triggers kernel Oops. [2.] Full description of the problem/report: A kernel Oops error occurs on a Linux 2.4.6 system that provides IP masquerading for a local area network. The crash is triggered when a connection is attempted to port 21 on ftp.freesoftware.com from any host on the local network which uses the Linux system as its default gateway. Attempting to connect to port 21 on the Linux system itself results in a no route to host error message. The severity of the crash prevents interactive access to the system, and a hard reboot is required. The error has been 100% reproducible using the scenario described above. After the Oops, the filesystems can be synced using Alt-Printscreen-S, however attempting to remount readonly using Alt-Printscreen-U caused additional kernel panics. [3.] Keywords (i.e., modules, networking, kernel): NAT, IP masquerading, networking [4.] Kernel version (from /proc/version): Linux version 2.4.6 (root@debian) (gcc version 2.95.4 20010319 (Debian prerelease)) #8 Tue Jul 10 20:50:32 PDT 2001 [5.] Output of Oops.. message (if applicable) with symbolic information resolved (see Documentation/oops-tracing.txt) ksymoops 2.4.1 on i586 2.4.6. Options used -V (default) -k /proc/ksyms (default) -l /proc/modules (default) -o /lib/modules/2.4.6/ (default) -m /boot/System.map-2.4.6 (default) Warning: You did not tell me where to find symbol information. I will assume that the log matches the kernel and modules that are running right now and I'll use the default options above for symbol resolution. If the current kernel and/or modules do not match the log, you can get more accurate output by telling me the kernel version and where to find map, modules, ksyms etc. ksymoops -h explains the options. 
Error (regular_file): read_ksyms stat /proc/ksyms failed ksymoops: No such file or directory No modules in ksyms, skipping objects No ksyms, skipping lsmod Unable to handle kernel paging request at virtual address 000078e5 c02393c2 *pde = 00000000 Oops: 0000 CPU: 0 EIP: 0010:[] Using defaults from ksymoops -t elf32-i386 -a i386 EFLAGS: 00010202 eax: c7c738e0 ebx: 000078e5 ecx: 00000000 edx: 000078e5 esi: c7dfe4e0 edi: c7c73680 ebp: 00000000 esp: c030bdd8 ds: 0018 es: 0018 ss: 0018 Process swapper (pid: 0, stackpage=c030b000) Stack: fffffe00 c023936b c7dfe4e0 fffffe00 c7dfe4e0 c0239923 c7dfe4e0 c7dfe4e0 c12dd800 c62ef120 c12dd800 ffffffe6 c023c807 c7dfe4e0 00000020 c4d4a320 c7dfe4e0 c62ef120 c023fb0f c7dfe4e0 c7dfe4e0 00000000 00000004 c02487ac Call Trace: [] [] [] [] ] [] [] [] [] [] [] [] [] [] [] [] [] [] [] [] [] [] [] [] [] [] [] [] [] [] Code: 8b 1b 8b 42 70 83 f8 01 74 0a ff 4a 70 0f 94 c0 84 c0 74 09 >>EIP; c02392c2 <===== Trace; c023936b Trace; c0239923 Trace; c023c807 Trace; c023fb0f Trace; c02487ac Trace; c024883d Trace; c0240a96 Trace; c0245e9c Trace; c0248792 Trace; c02487ac Trace; c0245eea Trace; c0240a96 Trace; c02405dc Trace; c0245e44 Trace; c0245e9c Trace; c0245118 Trace; c0245299 Trace; c0245118 Trace; c0240a96 Trace; c0244f76 Trace; c0245118 Trace; c023ce2d Trace; c0113e3f Trace; c0107e6d Trace; c0105130 Trace; c0106b70 Trace; c0105130 Trace; c0105153 Trace; c01051b7 Trace; c0105000 <_stext+0/0> Code; c02392c2 00000000 <_EIP>: Code; c02392c2 <===== 0: 8b 1b mov (%ebx),%ebx <===== Code; c02392c4 2: 8b 42 70 mov 0x70(%edx),%eax Code; c02392c7 5: 83 f8 01 cmp $0x1,%eax Code; c02392ca 8: 74 0a je 14 <_EIP+0x14> c02392d6 Code; c02392cc a: ff 4a 70 decl 0x70(%edx) Code; c02392cf d: 0f 94 c0 sete %al Code; c02392d2 10: 84 c0 test %al,%al Code; c02392d4 12: 74 09 je 1d <_EIP+0x1d> c02392df Kernel panic: Aiee, killing interrup handler! 1 warning and 1 error issued. Results may not be reliable. [6.] 
A small shell script or example program which triggers the problem (if possible)

# run from host on local network
# with Linux box as default gw
telnet ftp.freesoftware.com 21

[7.] Environment

Linux 2.4.6 IP masquerading firewall shares DSL Internet connection with home LAN. DSL provider uses PPP over Ethernet. PPPoE support in the kernel is being used with a patched version of pppd. Firewall has two network interfaces; ppp0 has a dynamically assigned public IP address and eth1 uses a static private IP address. IP masquerading is allowed for the local network. Loadable module support is disabled in the kernel for security reasons.

[7.1.] Software (add the output of the ver_linux script here)

Linux debian 2.4.6 #8 Tue Jul 10 20:50:32 PDT 2001 i586 unknown

Gnu C                  2.95.4
Gnu make               3.79.1
binutils               2.11.90.0.7
util-linux
util-linux             Note: /usr/bin/fdformat is obsolete and is no longer available.
util-linux             Please use /usr/bin/superformat instead (make sure you have the
util-linux             fdutils package installed first). Also, there had been some
util-linux             major changes from version 4.x. Please refer to the documentation.
util-linux
mount                  2.10s
modutils               2.4.2
e2fsprogs              1.19
reiserfsprogs          3.x.0k-pre8
PPP                    2.4.0
Linux C Library        2.2.3
Dynamic linker (ldd)   2.2.3
Procps                 2.0.7
Net-tools              1.58
Console-tools          0.2.3
Sh-utils               2.0.11
Modules Loaded

[7.2.] Processor information (from /proc/cpuinfo):

processor       : 0
vendor_id       : GenuineIntel
cpu family      : 5
model           : 4
model name      : Pentium MMX
stepping        : 3
cpu MHz         : 199.907
fdiv_bug        : no
hlt_bug         : no
f00f_bug        : yes
coma_bug        : no
fpu             : yes
fpu_exception   : yes
cpuid level     : 1
wp              : yes
flags           : fpu vme de pse tsc msr mce cx8 mmx
bogomips        : 398.95

[7.3.] Module information (from /proc/modules):

Loadable module support disabled in kernel.

[7.4.]
Loaded driver and hardware information (/proc/ioports, /proc/iomem) 0000-001f : dma1 0020-003f : pic1 0040-005f : timer 0060-006f : keyboard 0080-008f : dma page reg 00a0-00bf : pic2 00c0-00df : dma2 00f0-00ff : fpu 0220-022f : soundblaster 02e8-02ef : serial(set) 02f8-02ff : serial(auto) 0330-0333 : MPU-401 UART 0378-037a : parport0 03c0-03df : vga+ 03c0-03df : matrox 03f8-03ff : serial(set) 0cf8-0cff : PCI conf1 d400-d4ff : Adaptec 7892A d800-d87f : 3Com Corporation 3c900B-TPO [Etherlink XL TPO] d800-d87f : 00:0b.0 e000-e03f : Intel Corporation 82557 [Ethernet Pro 100] e000-e03f : eepro100 e800-e80f : Intel Corporation 82371SB PIIX3 IDE [Natoma/Triton II] 00000000-0009fbff : System RAM 0009fc00-0009ffff : reserved 000a0000-000bffff : Video RAM area 000c0000-000c7fff : Video ROM 000c8000-000c8fff : Extension ROM 000cc000-000d25ff : Extension ROM 000f0000-000fffff : System ROM 00100000-07ffffff : System RAM 00100000-00285619 : Kernel code 0028561a-00309d7b : Kernel data e4800000-e4800fff : Adaptec 7892A e4800000-e4800fff : aic7xxx e5000000-e500007f : 3Com Corporation 3c900B-TPO [Etherlink XL TPO] e5800000-e58fffff : Intel Corporation 82557 [Ethernet Pro 100] e6000000-e6000fff : Intel Corporation 82557 [Ethernet Pro 100] e6000000-e6000fff : eepro100 e6800000-e6803fff : Matrox Graphics, Inc. MGA 2064W [Millennium] e6800000-e6803fff : matroxfb MMIO e7800000-e7ffffff : Matrox Graphics, Inc. MGA 2064W [Millennium] e7800000-e7ffffff : matroxfb FB ffff0000-ffffffff : reserved [7.5.] 
PCI information ('lspci -vvv' as root) 00:00.0 Host bridge: Intel Corporation 430HX - 82439HX TXC [Triton II] (rev 03) Control: I/O- Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR+ FastB2B- Status: Cap- 66Mhz- UDF- FastB2B- ParErr- DEVSEL=medium >TAbort- SERR- TAbort- SERR- TAbort- SERR- TAbort- SERR- [disabled] [size=64K] 00:0a.0 Ethernet controller: Intel Corporation 82557 [Ethernet Pro 100] (rev 08) Subsystem: Intel Corporation EtherExpress PRO/100+ Management Adapter Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV+ VGASnoop- ParErr- Stepping- SERR- FastB2B- Status: Cap+ 66Mhz- UDF- FastB2B+ ParErr- DEVSEL=medium >TAbort- SERR- [disabled] [size=1M] Capabilities: [dc] Power Management version 2 Flags: PMEClk- DSI+ D1+ D2+ AuxCurrent=0mA PME(D0+,D1+,D2+,D3hot+,D3cold-) Status: D0 PME-Enable- DSel=0 DScale=2 PME- 00:0b.0 Ethernet controller: 3Com Corporation 3c900B-TPO [Etherlink XL TPO] (rev 04) Subsystem: 3Com Corporation 3C900B-TPO Etherlink XL TPO 10Mb Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV+ VGASnoop- ParErr- Stepping- SERR- FastB2B- Status: Cap+ 66Mhz- UDF- FastB2B- ParErr- DEVSEL=medium >TAbort- SERR- [disabled] [size=128K] Capabilities: [dc] Power Management version 1 Flags: PMEClk- DSI- D1+ D2+ AuxCurrent=0mA PME(D0-,D1+,D2+,D3hot+,D3cold-) Status: D0 PME-Enable- DSel=0 DScale=0 PME- 00:0c.0 SCSI storage controller: Adaptec 7892A (rev 02) Subsystem: Adaptec: Unknown device e2a0 Control: I/O- Mem+ BusMaster+ SpecCycle- MemWINV+ VGASnoop- ParErr- Stepping- SERR- FastB2B- Status: Cap+ 66Mhz+ UDF- FastB2B+ ParErr- DEVSEL=medium >TAbort- SERR- [disabled] [size=128K] Capabilities: [dc] Power Management version 2 Flags: PMEClk- DSI- D1- D2- AuxCurrent=0mA PME(D0-,D1-,D2-,D3hot-,D3cold-) Status: D0 PME-Enable- DSel=0 DScale=0 PME- [7.6.] 
SCSI information (from /proc/scsi/scsi)

Attached devices:
Host: scsi0 Channel: 00 Id: 00 Lun: 00
  Vendor: HITACHI Model: DK32CJ-18MC Rev: J6A6
  Type: Direct-Access ANSI SCSI revision: 03

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majordomo@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/

From owner-netdev@oss.sgi.com Fri Jul 13 09:22:53 2001 Received: (from majordomo@localhost) by oss.sgi.com (8.11.2/8.11.3) id f6DGMrW21069 for netdev-outgoing; Fri, 13 Jul 2001 09:22:53 -0700 Received: from iisc.ernet.in (mail-relay.iisc.ernet.in [144.16.64.3]) by oss.sgi.com (8.11.2/8.11.3) with SMTP id f6DGMmV21064 for ; Fri, 13 Jul 2001 09:22:51 -0700 Received: from eis.iisc.ernet.in (eis.iisc.ernet.in [144.16.64.5]) by iisc.ernet.in (8.9.2/8.9.0) with SMTP id VAA91041 for ; Fri, 13 Jul 2001 21:58:05 +0530 (IST) Received: by eis.iisc.ernet.in (SMI-8.6/SMI-4.1) id VAA10368; Fri, 13 Jul 2001 21:52:42 +0530 From: anand@eis.iisc.ernet.in (SVR Anand) Message-Id: <200107131622.VAA10368@eis.iisc.ernet.in> Subject: Diffserv and NetApplications To: netdev@oss.sgi.com Date: Fri, 13 Jul 2001 21:52:41 +0530 (GMT+05:30) X-Mailer: ELM [version 2.5 PL2] MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Transfer-Encoding: 7bit Sender: owner-netdev@oss.sgi.com Precedence: bulk Content-Length: 1001 Lines: 21

Hi,

Considering diffserv as a QoS mechanism, I am contemplating the possibility of making some of the well-known net applications, such as http, telnet, ftp and so on, inherently diffserv aware. By this I mean that the applications automatically send their packets out of the box with appropriate DSCP values depending on their QoS requirements. While tc is one way of achieving the same result, it is an external mechanism requiring a separate agent to configure diffserv.
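What "inherently diffserv aware" could look like inside an application can be sketched with the standard sockets API. This is a minimal sketch, assuming an IPv4 socket: the DSCP occupies the upper six bits of the old TOS byte, so the value handed to the IP_TOS socket option is the DSCP shifted left by two. The helper names dscp_to_tos and set_socket_dscp are hypothetical, and 46 (Expedited Forwarding) is used only as an example code point.

```c
#include <assert.h>
#include <netinet/in.h>
#include <sys/socket.h>

#define DSCP_EF 46   /* Expedited Forwarding code point, example value only */

/* Map a 6-bit DSCP to the byte expected by the IP_TOS socket option.
 * The two low-order bits of that byte are not part of the DSCP. */
int dscp_to_tos(unsigned int dscp)
{
    assert(dscp <= 63);
    return (int)(dscp << 2);
}

/* Mark all packets sent on an IPv4 socket with the given DSCP.
 * Returns 0 on success, -1 with errno set on failure. */
int set_socket_dscp(int fd, unsigned int dscp)
{
    int tos = dscp_to_tos(dscp);

    return setsockopt(fd, IPPROTO_IP, IP_TOS, &tos, sizeof(tos));
}
```

An application would call set_socket_dscp(fd, DSCP_EF) once after creating the socket and should check the result, since the kernel may refuse the request. Whether the marking is honoured further along the path is a matter of network policy, which is the part tc and router configuration would still have to cover.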
I tend to believe that the QoS requirements are best known to the applications, and the developer of the application knows what QoS support is required from the net. Can I contribute in some way to create new versions of the well known apps that provide diffserv as an option which, if enabled, internally triggers the diffserv supporting code? My wish is that all future netapps on Linux take advantage of QoS naturally, where applicable.

I appreciate your feedback on my initial ideas.

Regards
Anand

From owner-netdev@oss.sgi.com Sat Jul 14 00:18:40 2001
Received: (from majordomo@localhost) by oss.sgi.com (8.11.2/8.11.3) id f6E7Iea27227 for netdev-outgoing; Sat, 14 Jul 2001 00:18:40 -0700
Received: from dmz.ruault.com (adsl-63-193-243-214.dsl.snfc21.pacbell.net [63.193.243.214]) by oss.sgi.com (8.11.2/8.11.3) with SMTP id f6E7IaV27223 for ; Sat, 14 Jul 2001 00:18:36 -0700
Received: from ruault.com (ce.ruault.com [10.1.0.2]) by dmz.ruault.com (8.11.4/8.11.4) with ESMTP id f6E7JTZ22622; Sat, 14 Jul 2001 00:19:29 -0700
Message-ID: <3B4FF220.29B208AB@ruault.com>
Date: Sat, 14 Jul 2001 00:17:53 -0700
From: Charles-Edouard Ruault
X-Mailer: Mozilla 4.77 [en] (X11; U; Linux 2.4.6 i686)
X-Accept-Language: en
MIME-Version: 1.0
To: davem@redhat.com, netdev@oss.sgi.com
Subject: PROBLEM: kernel oops in skbuff.c when tcp mss>mtu
Content-Type: text/plain; charset=us-ascii
Content-Transfer-Encoding: 7bit
Sender: owner-netdev@oss.sgi.com
Precedence: bulk
Content-Length: 8333
Lines: 238

Hi,

I hope I'm sending the report to the right persons; if not, please forward it to the appropriate person.

I was upgrading my firewall from kernel 2.2.19 to 2.4.6 today and I found the following problem: I'm using pptp to connect my internal network to my company network. The tunnel is established from my firewall to my company and I'm using masquerading to allow the machines on my home network (including my firewall) to connect to the company network.
As you certainly know, pptp uses gre to encapsulate the encrypted data. To allow my internal machines to communicate properly with machines on the other side of the tunnel, I have to use the following rules in the forwarding chain of my firewall:

iptables -A FORWARD -p tcp --tcp-flags SYN,RST SYN -j TCPMSS --clamp-mss-to-pmtu -o ppp+
iptables -A FORWARD -p tcp --tcp-flags SYN,RST SYN -j TCPMSS --clamp-mss-to-pmtu -i ppp+

Without that, the mss that is used in the tcp connection is larger than the Path MTU and large tcp packets are dropped.

When rewriting my firewall config file (migrating from ipchains to iptables) I made the mistake of specifying my rule allowing forwarding from my internal net to the VPN before the clamp-mss-to-pmtu rule, which caused the latter to be ignored. In this config, the tcp connections established from my internal net to my company net have a mss larger than the MTU. When a tcp packet larger than the mtu is sent, the kernel crashes and I need to reboot my machine. After fixing the rules and moving the clamp-mss-to-pmtu rule before the forwarding rules, it works fine. This was not occurring with kernel 2.2.19 (I was using ipchains and the mssclampfw module).
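The arithmetic behind clamping can be sketched as follows. This is a simplified illustration, not the netfilter code: it assumes IPv4 and TCP headers without options (20 bytes each), and the function names are hypothetical.

```c
#include <assert.h>

enum { IPV4_HDR_LEN = 20, TCP_HDR_LEN = 20 };

/* Largest TCP payload that fits in one unfragmented IPv4 packet
 * on a path with the given MTU, assuming no IP or TCP options. */
unsigned int mss_for_mtu(unsigned int path_mtu)
{
    assert(path_mtu > IPV4_HDR_LEN + TCP_HDR_LEN);
    return path_mtu - IPV4_HDR_LEN - TCP_HDR_LEN;
}

/* Clamp the MSS advertised in a SYN so that full-sized segments
 * still fit the path MTU. An MSS that already fits is unchanged. */
unsigned int clamp_mss(unsigned int advertised_mss, unsigned int path_mtu)
{
    unsigned int limit = mss_for_mtu(path_mtu);

    return advertised_mss < limit ? advertised_mss : limit;
}
```

With an Ethernet host advertising an MSS of 1460 and a tunnel path MTU of, say, 1400 (a made-up figure, the report does not give the real one), the clamped MSS is 1360. Without the clamp the peer keeps sending 1460-byte segments that do not fit the tunnel.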
here's the oops ( i had to copy it manually since the whole machine is locked up after the oops and it's not logged anywhere ) this is on kernel 2.4.6 skput:over: c385100e:1405 put: 1504 dev: kernel bug at skbuff.c line 93 invalid operand 0000 cpu: 0 EIP: 0010:[] EFLAGS: 00010296 EAX: 1b EBX: c184d400 ecx: c142c000 edx: 01 esi: c184d400 edi: 5e0 ebp: c11d9d60 esp: c0559e28 ds:18 es:18 ss:18 process: pptp stackpage: c0559000 stack: c01f7a0c c01f7c00 5d c3851016 c184d400 5e0 c3851002 c11d9d60 c163440 c17b05a0 c1437813 c3850a0d c11d9d60 c17b05a0 c11d9d60 c16a3440 call trace: c015c416 c015acef c0156d59 c015ab54 c012ee6a c0106b53 code: 0f 0b 83 c4 0c c3 90 8b 54 24 04 8b 42 18 85 c0 75 05 b8 c7 kernel panic: killing interrupt handler in interrupt handler, not syncing i've tried several times, it crashes everytime with the same error. here's the output of scripts/ver_linux Linux ns.ruault.com 2.4.6 #1 Fri Jul 13 16:26:58 PDT 2001 i586 unknown Gnu C egcs-2.91.66 Gnu make 3.77 binutils 2.11.90.0.8 util-linux 2.10s mount 2.9o modutils 2.4.6 e2fsprogs 1.22 pcmcia-cs 3.0.9 PPP 2.4.0 Linux C Library 2.1.3 Dynamic linker (ldd) 2.1.3 Procps 2.0.2 Net-tools 1.52 Console-tools 1999.03.02 Sh-utils 1.16 Modules Loaded mppe ppp_async ppp_generic slhc eepro100 3c59x my cpuinfo is the following : cat /proc/cpuinfo processor : 0 vendor_id : GenuineIntel cpu family : 5 model : 2 model name : Pentium 75 - 200 stepping : 12 cpu MHz : 180.001 fdiv_bug : no hlt_bug : no f00f_bug : yes coma_bug : no fpu : yes fpu_exception : yes cpuid level : 1 wp : yes flags : fpu vme de pse tsc msr mce cx8 bogomips : 358.80 moduleinfo is the following : mppe 21984 2 (autoclean) ppp_async 6640 1 (autoclean) ppp_generic 14624 3 (autoclean) [mppe ppp_async] slhc 5056 1 (autoclean) [ppp_generic] eepro100 16400 2 (autoclean) 3c59x 26016 1 (autoclean) ioports: 0000-001f : dma1 0020-003f : pic1 0040-005f : timer 0060-006f : keyboard 0070-007f : rtc 0080-008f : dma page reg 00a0-00bf : pic2 00c0-00df : dma2 
00f0-00ff : fpu 01f0-01f7 : ide0 02f8-02ff : serial(auto) 03c0-03df : vga+ 03f6-03f6 : ide0 03f8-03ff : serial(auto) 0cf8-0cff : PCI conf1 6100-613f : Intel Corporation 82557 [Ethernet Pro 100] 6100-613f : eepro100 6200-623f : Intel Corporation 82557 [Ethernet Pro 100] (#2) 6200-623f : eepro100 6300-633f : 3Com Corporation 3c905 100BaseTX [Boomerang] 6300-633f : 00:0a.0 f000-f00f : Intel Corporation 82371SB PIIX3 IDE [Natoma/Triton II] f000-f007 : ide0 f008-f00f : ide1 iomem: 00000000-0009ffff : System RAM 000a0000-000bffff : Video RAM area 000c0000-000c7fff : Video ROM 000c8000-000c8fff : Extension ROM 000c9000-000c9fff : Extension ROM 000f0000-000fffff : System ROM 00100000-02ffffff : System RAM 00100000-001d667f : Kernel code 001d6680-0022227f : Kernel data e0000000-e3ffffff : S3 Inc. 86c325 [ViRGE] e4000000-e40fffff : Intel Corporation 82557 [Ethernet Pro 100] (#2) e4100000-e41fffff : Intel Corporation 82557 [Ethernet Pro 100] e4200000-e4200fff : Intel Corporation 82557 [Ethernet Pro 100] e4200000-e4200fff : eepro100 e4201000-e4201fff : Intel Corporation 82557 [Ethernet Pro 100] (#2) e4201000-e4201fff : eepro100 ffff0000-ffffffff : reserved lspci -vvv 00:00.0 Host bridge: Intel Corporation 430VX - 82437VX TVX [Triton VX] (rev 02) Control: I/O- Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- Status: Cap- 66Mhz- UDF- FastB2B- ParErr- DEVSEL=medium >TAbort- SERR- TAbort- SERR- TAbort- SERR- TAbort- SERR- TAbort- SERR- TAbort- SERR- TAbort- SERR- ; Sat, 14 Jul 2001 13:18:59 -0700 Received: (qmail 8123 invoked by uid 99); 14 Jul 2001 20:18:57 -0000 To: netdev@oss.sgi.com Subject: OOPS in tcp_input.c..? 
(stock redhat, 2.4.2-2) Message-ID: <995141937.3b50a931331ac@mail1.digitalcyclone.com> Date: Sat, 14 Jul 2001 15:18:57 -0500 (CDT) From: Ray Pitmon Cc: rp@digitalcyclone.com MIME-Version: 1.0 Content-Type: text/plain Content-Transfer-Encoding: 8bit X-IMP-Server: Mail1 Sender: owner-netdev@oss.sgi.com Precedence: bulk Content-Length: 5115 Lines: 120 Hi, First time posting here, please let me know if this is right, or if more info is needed. I installed RedHat 7.1 on two similar systems, with the same results. At a random time, maybe after a day or 2, maybe hours, I would get a kernel panic. It started after I started serving up a website (using apache, all static HTML) that produces ~30k/sec in traffic. It was a stock 7.1 install, with all the errata applied. One of the machines had an uptime of a month or so, producing ~10k/sec web traffic, but then died a couple days after adding this additional load. The machines are: Dell Optiplex GX1, 640 MB RAM, P3-733 (coppermine) Dell Optiplex GX1, 640 MB RAM, P3-550 (katmai) (please send any replies to my email as well.. thx, -ray) ksymoops output: [root@web4 /tmp]# cat ksym-out ksymoops 2.4.0 on i686 2.4.2-2. Options used -v /boot/vmlinux-2.4.2-2 (specified) -k /proc/ksyms (default) -l /proc/modules (default) -o /lib/modules/2.4.2-2/ (default) -m /boot/System.map (specified) Warning (compare_maps): ksyms_base symbol __VERSIONED_SYMBOL(shmem_file_setup) not found in vmlinux. Ignoring ksyms_base entry Warning (compare_maps): mismatch on symbol partition_name , ksyms_base says c01af860, vmlinux says c0153510. Ignoring ksyms_base entry Warning (compare_maps): mismatch on symbol usb_devfs_handle , usbcore says e088e1a0, /lib/modules/2.4.2-2/kernel/drivers/usb/usbcore.o says e088dcc0. 
Ignoring /lib/modules/2.4.2-2/kernel/drivers/usb/usbcore.o entry Unable to handle kernel NULL pointer dereference at virtual address 00000044 c01d549c Oops: 0000 CPU: 0 EIP: 0010:[] Using defaults from ksymoops -t elf32-i386 -a i386 EFLAGS: 00010246 eax: 0063a570 ebx: d12fcbf4 ecx: 00000001 edx: 00000000 esi: 00000000 edi: d12fcac0 ebp: 00000003 esp: c026fcd8 ds: 0018 es: 0018 ss: 0018 Process swapper (pid: 0, stackpage=c026f000) Stack: 0000000c d12fcbf4 00000000 d12fcac0 0000010e c01d61b6 d12fcac0 d12fcbf4 00000000 00000002 7e04a897 d12fcac0 7e04a897 0000010e 00000002 c01d6ac5 d12fcac0 7e04a897 00000002 0000010e 00000002 7e04a897 c1afad00 d12fcbf4 Call Trace: [] [] [] [] [] [] [] [] [] [] [] [] [] [] [] [] [] [] [] [] [] [] [] [] [] [] [] [] [] [] [] [] [] [] [] [] [] [] [] Code: 2b 42 44 ba 01 00 00 00 3b 83 84 00 00 00 0f 47 f2 85 f6 b8 >>EIP; c01d549c <===== Trace; c01d61b6 Trace; c01d6ac5 Trace; c01d98f9 Trace; c01d8b0c Trace; c01e249d Trace; c01b93fc Trace; c01dfcc1 Trace; c01dfb1b Trace; c01e00bd Trace; c01c104e Trace; c01ca6eb Trace; c01ca5f0 Trace; c01c12db Trace; c01ca5f0 Trace; c01c1312 Trace; c01ca26b Trace; c01ca5f0 Trace; c01ca8e3 Trace; c01ca760 Trace; c01c12db Trace; c01ca760 Trace; c01c1312 Trace; c01bbf03 Trace; c01ca5bb Trace; c01ca760 Trace; c01bc26a Trace; c010a30a Trace; c0119a8b Trace; c010a4bf Trace; c0107240 Trace; c0107240 Trace; c01090c4 Trace; c0107240 Trace; c0107240 Trace; c0100018 Trace; c0107263 Trace; c01072e2 Trace; c0105000 Trace; c0100191 Code; c01d549c 00000000 <_EIP>: Code; c01d549c <===== 0: 2b 42 44 sub 0x44(%edx),%eax <===== Code; c01d549f 3: ba 01 00 00 00 mov $0x1,%edx Code; c01d54a4 8: 3b 83 84 00 00 00 cmp 0x84(%ebx),%eax Code; c01d54aa e: 0f 47 f2 cmova %edx,%esi Code; c01d54ad 11: 85 f6 test %esi,%esi Code; c01d54af 13: b8 00 00 00 00 mov $0x0,%eax Kernel panic: Aiee, killing interrupt handler! 3 warnings issued. Results may not be reliable. 
From owner-netdev@oss.sgi.com Sat Jul 14 13:45:54 2001
Received: (from majordomo@localhost) by oss.sgi.com (8.11.2/8.11.3) id f6EKjsF31065 for netdev-outgoing; Sat, 14 Jul 2001 13:45:54 -0700
Received: from the-village.bc.nu (router-100M.swansea.linux.org.uk [194.168.151.17]) by oss.sgi.com (8.11.2/8.11.3) with SMTP id f6EKjoV31062 for ; Sat, 14 Jul 2001 13:45:50 -0700
Received: from alan by the-village.bc.nu with local (Exim 3.22 #1) id 15LWJO-0001j5-00 for netdev@oss.sgi.com; Sat, 14 Jul 2001 21:46:50 +0100
Received: from [128.227.128.108] (helo=smtp.ufl.edu) by the-village.bc.nu with esmtp (Exim 3.22 #1) id 15LWDw-0001id-00 for alan@lxorguk.ukuu.org.uk; Sat, 14 Jul 2001 21:41:13 +0100
Received: from ufl.edu (sp17fe.nerdc.ufl.edu [128.227.128.97]) by smtp.ufl.edu (8.11.2/8.11.3/2.2.1) with ESMTP id f6EKdxX130966 for ; Sat, 14 Jul 2001 16:39:59 -0400
Received: from ufl.edu (labop.ecel.ufl.edu [128.227.232.206]) by ufl.edu (8.9.3/8.9.3/2.3.3) with ESMTP id QAA462958 for ; Sat, 14 Jul 2001 16:39:59 -0400
Message-ID: <3B50B10A.7E1A8DC9@ufl.edu>
Date: Sat, 14 Jul 2001 16:52:26 -0400
From: sridhar
X-Mailer: Mozilla 4.73 [en]C-CCK-MCD (WinNT; U)
X-Accept-Language: en
MIME-Version: 1.0
To: alan@lxorguk.ukuu.org.uk
Subject: firewall code in 2.2.19
Content-Type: text/plain; charset=us-ascii
Content-Transfer-Encoding: 7bit
Sender: owner-netdev@oss.sgi.com
Precedence: bulk
Content-Length: 529
Lines: 14

Hi Alan,

I have a question about the firewall code in the 2.2.19 kernel. It has 2 functions called register_fw and call_fw. register_fw adds a packet filter to the linked list, and call_fw traverses the list and calls the appropriate functions. call_fw doesn't take any semaphore when accessing the linked list. Isn't one necessary? register_fw uses a semaphore to prevent 2 callers from adding at the same time. Please clarify.

sridhar

At first I thought it may be because a process may not be preempted in kernel mode, but that doesn't seem to be so.
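One general pattern under which an unlocked traversal can be safe is sketched below in userspace C11. It is offered as an assumption about the design, not as the 2.2.19 code, so please check the kernel source before relying on it. Writers are serialised by some outside lock, each new node is fully initialised before a single pointer store publishes it, and nodes are never removed while a walker may be running. All names here are hypothetical.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

struct hook {
    int (*fn)(int pkt);           /* returns non-zero to stop the walk */
    struct hook *_Atomic next;
};

static struct hook *_Atomic hook_list;

/* Writers must be serialised by the caller (for example with a
 * semaphore); this function only guarantees safe publication. */
void register_hook(struct hook *h)
{
    assert(h != NULL && h->fn != NULL);
    struct hook *first =
        atomic_load_explicit(&hook_list, memory_order_relaxed);

    /* Finish initialising the node before anyone can see it. */
    atomic_store_explicit(&h->next, first, memory_order_relaxed);
    /* Release store: a walker that sees h also sees h->next and h->fn. */
    atomic_store_explicit(&hook_list, h, memory_order_release);
}

/* Lock-free walker: sees either the old list or the new one,
 * never a half-linked node. Returns the first non-zero verdict. */
int call_hooks(int pkt)
{
    struct hook *h = atomic_load_explicit(&hook_list, memory_order_acquire);

    while (h != NULL) {
        int verdict = h->fn(pkt);
        if (verdict != 0)
            return verdict;
        h = atomic_load_explicit(&h->next, memory_order_acquire);
    }
    return 0;
}

/* Two toy filters used only to exercise the list. */
int accept_all(int pkt) { (void)pkt; return 0; }
int drop_odd(int pkt)   { return pkt & 1; }
```

Unregistering is the hard part that this sketch leaves out: freeing a node while a walker still holds a pointer to it needs extra care.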
From owner-netdev@oss.sgi.com Sat Jul 14 21:12:12 2001
Received: (from majordomo@localhost) by oss.sgi.com (8.11.2/8.11.3) id f6F4CC223175 for netdev-outgoing; Sat, 14 Jul 2001 21:12:12 -0700
Received: from weta.f00f.org (weta.f00f.org [203.167.249.89]) by oss.sgi.com (8.11.2/8.11.3) with SMTP id f6F4CAV23169 for ; Sat, 14 Jul 2001 21:12:10 -0700
Received: by weta.f00f.org (Postfix, from userid 1000) id B95DB13D6E; Sun, 15 Jul 2001 16:12:11 +1200 (NZST)
Date: Sun, 15 Jul 2001 16:12:11 +1200
From: Chris Wedgwood
To: SVR Anand
Cc: netdev@oss.sgi.com
Subject: Re: Diffserv and NetApplications
Message-ID: <20010715161211.J7624@weta.f00f.org>
References: <200107131622.VAA10368@eis.iisc.ernet.in>
Mime-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
In-Reply-To: <200107131622.VAA10368@eis.iisc.ernet.in>
User-Agent: Mutt/1.3.18i
X-No-Archive: Yes
Sender: owner-netdev@oss.sgi.com
Precedence: bulk
Content-Length: 914
Lines: 26

On Fri, Jul 13, 2001 at 09:52:41PM +0530, SVR Anand wrote:

>Considering diffserv as a QoS supporting mechanism, I am contemplating the possibility of making some of the known netapplications like http, telnet, ftp and so on, inherently diffserv aware.

Many already are; I believe telnet and ssh are, for example.

>By this I mean the applications automatically send out their packets out of the box with appropriate dscp values depending on their QoS requirements. While tc is one way of achieving the same result, it is an external mechanism requiring a separate agent to configure diffserv.

>My wish is, all the future netapps from Linux should take advantage of QoS as applicable, naturally.

As I said, some (many) already do... QoS means different things to different people. Nice idea in theory; in reality it's not worth much to achieve what you want.
--cw

From owner-netdev@oss.sgi.com Sat Jul 14 21:13:17 2001
Received: (from majordomo@localhost) by oss.sgi.com (8.11.2/8.11.3) id f6F4DHN23256 for netdev-outgoing; Sat, 14 Jul 2001 21:13:17 -0700
Received: from weta.f00f.org (weta.f00f.org [203.167.249.89]) by oss.sgi.com (8.11.2/8.11.3) with SMTP id f6F4DGV23253 for ; Sat, 14 Jul 2001 21:13:16 -0700
Received: by weta.f00f.org (Postfix, from userid 1000) id 43A5413D5B; Sun, 15 Jul 2001 16:13:18 +1200 (NZST)
Date: Sun, 15 Jul 2001 16:13:18 +1200
From: Chris Wedgwood
To: Ray Pitmon
Cc: netdev@oss.sgi.com
Subject: Re: OOPS in tcp_input.c..? (stock redhat, 2.4.2-2)
Message-ID: <20010715161318.K7624@weta.f00f.org>
References: <995141937.3b50a931331ac@mail1.digitalcyclone.com>
Mime-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
In-Reply-To: <995141937.3b50a931331ac@mail1.digitalcyclone.com>
User-Agent: Mutt/1.3.18i
X-No-Archive: Yes
Sender: owner-netdev@oss.sgi.com
Precedence: bulk
Content-Length: 309
Lines: 13

On Sat, Jul 14, 2001 at 03:18:57PM -0500, Ray Pitmon wrote:

>First time posting here, please let me know if this is right, or if more info is needed.

Is this repeatable? If so, can you try a recent non-redhat kernel, say 2.4.6-pre

There have been networking updates since then.
--cw From owner-netdev@oss.sgi.com Sun Jul 15 19:11:29 2001 Received: (from majordomo@localhost) by oss.sgi.com (8.11.2/8.11.3) id f6G2BTX14000 for netdev-outgoing; Sun, 15 Jul 2001 19:11:29 -0700 Received: from gollum.w-link.net (dedmons@gollum.w-link.net [206.98.114.18]) by oss.sgi.com (8.11.2/8.11.3) with SMTP id f6G2BNV13995 for ; Sun, 15 Jul 2001 19:11:23 -0700 Received: (from dedmons@localhost) by gollum.w-link.net (8.9.0/8.9.0) id TAA08968; Sun, 15 Jul 2001 19:10:48 -0700 (PDT) From: Dale Edmons (Main Dialup) Message-Id: <200107160210.TAA08968@gollum.w-link.net> Subject: K7 unresolved symbols in modules To: mec@shout.net Date: Sun, 15 Jul 2001 19:10:47 -0700 (PDT) Cc: kbuild@lists.sourceforge.net, davem@redhat.com, netdev@oss.sgi.com, torvalds@transmeta.com X-Mailer: ELM [version 2.4 PL25] MIME-Version: 1.0 Content-Type: text/plain; charset=US-ASCII Content-Transfer-Encoding: 7bit Sender: owner-netdev@oss.sgi.com Precedence: bulk Content-Length: 13971 Lines: 379 From: Dale E. Edmons dedmons@w-link.net Subject: Bug Report (First time I've sent one.) CC: mec@shout.net kbuild@lists.sourceforge.net davem@redhat.com netdev@oss.sgi.com torvalds@transmeta.com ------------------------------------------------------------------------------- 1.> Summary Build for AMD K7 Athlon produces many modules that fail to load. Using K6 is a workaround. ------------------------------------------------------------------------------- 2.> Full description ------------------------------------------------------------------------------- Sources: v2.4.4 Clean Source Tree --> v2.4.5 Patch of clean source tree --> v2.4.6 Patch of clean source tree Commands: cp ../linux-2.4.5/.config ../linux-2.4.6 # from unclean make oldconfig # otherwise build failed make menuconfig; make dep; make clean; make bzImage make modules; make modules_install; "lilo -C lilo.conf" On Boot: Many modules have unresolved symbols. 
The following are most of the affected modules: loop.o, rd.o, cdrom.o, zft_compressor.o, zftape.o, ide-cd.o, 3c59x.o, eepro100.o, hdlcdru.o, soundmodem.o, natsemi.o, ppp_async.o, ppp_deflate.o, ppp_generic.o,... Workaround: Change cpu from to . All modules except nfsd then compile without 'unresolved symbols'. Also, I can 'insmod /lib/modules/2.4.6-pentium/kernel/drivers/net/3c59x.o' and that Pentium module loads (as seems reasonable). ------------------------------------------------------------------------------- 3.> Keywords K7, Athlon, unresolved symbols, K6, networking, menuconfig, xconfig, nfsd, ppp ------------------------------------------------------------------------------- 4.> Kernel Version 2.4.6 Sources: v2.4.4 Clean Source Tree --> v2.4.5 Patch of clean source tree --> v2.4.6 Patch of clean source tree ------------------------------------------------------------------------------- 5.> oops none ------------------------------------------------------------------------------- 6.> script none ------------------------------------------------------------------------------- 7.> Environment Originally Slackware 7 distribution. Numerous packages updated. Some Debian tar.bz2 packages installed. (New web page nice huh?) ------------------------------------------------------------------------------- 7.1> Output from 'sh scripts/ver_linux' /* If some fields are empty or look unusual you may have an old version. Compare to the current minimal requirements in Documentation/Changes. 
Linux cathy 2.4.6-cathy-bug #9 Sun Jul 15 16:47:40 PDT 2001 i686 unknown Gnu C 2.95.3 Gnu make 3.79.1 binutils 2.9.1.0.25 mount 2.9v modutils 2.4.5 e2fsprogs 1.19 pcmcia-cs 3.0.14 PPP 2.4.0b4 Linux C Library 2.1.2 ldd: version 1.9.9 Procps 2.0.2 Net-tools 1.52 Kbd 0.99 Sh-utils 1.16 Modules Loaded es1371 soundcore gameport ac97_codec */ ------------------------------------------------------------------------------- 7.1> /proc/cpuinfo /* processor : 0 vendor_id : AuthenticAMD cpu family : 6 model : 4 model name : AMD Athlon(tm) Processor stepping : 2 cpu MHz : 1007.353 cache size : 256 KB fdiv_bug : no hlt_bug : no f00f_bug : no coma_bug : no fpu : yes fpu_exception : yes cpuid level : 1 wp : yes flags : fpu vme de pse tsc msr pae mce cx8 sep mtrr pge mca cmov pat pse36 mmx fxsr syscall mmxext 3dnowext 3dnow bogomips : 2011.95 */ ------------------------------------------------------------------------------- 7.3> /proc/modules (eg. The BUG. Only these load.) /* es1371 26256 0 (unused) soundcore 4208 4 [es1371] gameport 1888 0 [es1371] ac97_codec 8736 0 [es1371] */ ------------------------------------------------------------------------------- 7.4> /proc/ioports, /proc/iomem /* 0000-001f : dma1 0020-003f : pic1 0040-005f : timer 0060-006f : keyboard 0070-007f : rtc 0080-008f : dma page reg 00a0-00bf : pic2 00c0-00df : dma2 00f0-00ff : fpu 01f0-01f7 : ide0 02f8-02ff : serial(auto) 03c0-03df : vga+ 03f6-03f6 : ide0 03f8-03ff : serial(auto) 0cf8-0cff : PCI conf1 a800-a8ff : Symbios Logic Inc. (formerly NCR) 53c875 a800-a87f : sym53c8xx b000-b007 : US Robotics/3Com 56K FaxModem Model 5610 b000-b007 : serial(auto) b400-b43f : Ensoniq CT5880 [AudioPCI] b400-b43f : es1371 b800-b83f : 3Com Corporation 3c905 100BaseTX [Boomerang] d400-d40f : Acer Laboratories Inc. [ALi] M5229 IDE d400-d407 : ide0 d408-d40f : ide1 e400-e43f : Acer Laboratories Inc. [ALi] M7101 PMU e800-e81f : Acer Laboratories Inc. 
[ALi] M7101 PMU */ /* 00000000-0009f7ff : System RAM 0009f800-0009ffff : reserved 000a0000-000bffff : Video RAM area 000c0000-000c7fff : Video ROM 000cc000-000cdfff : Extension ROM 000f0000-000fffff : System ROM 00100000-0ffebfff : System RAM 00100000-00201acb : Kernel code 00201acc-0026a47f : Kernel data 0ffec000-0ffeefff : ACPI Tables 0ffef000-0fffefff : reserved 0ffff000-0fffffff : ACPI Non-volatile Storage f1800000-f1800fff : Symbios Logic Inc. (formerly NCR) 53c875 f2000000-f20000ff : Symbios Logic Inc. (formerly NCR) 53c875 f3000000-f3000fff : Acer Laboratories Inc. [ALi] M5237 USB (#2) f4000000-f4000fff : Acer Laboratories Inc. [ALi] M5237 USB f4800000-f5dfffff : PCI Bus #01 f4800000-f4ffffff : Matrox Graphics, Inc. MGA G400 AGP f5000000-f5003fff : Matrox Graphics, Inc. MGA G400 AGP f5f00000-f7ffffff : PCI Bus #01 f6000000-f7ffffff : Matrox Graphics, Inc. MGA G400 AGP f8000000-fbffffff : PCI device 10b9:1647 (Acer Laboratories Inc. [ALi]) ffff0000-ffffffff : reserved */ ------------------------------------------------------------------------------- 7.5> lspci -vvv /* 00:00.0 Host bridge: Acer Laboratories Inc.: Unknown device 1647 (rev 04) Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV+ VGASnoop- ParErr- Stepping- SERR- FastB2B- Status: Cap+ 66Mhz- UDF- FastB2B- ParErr- DEVSEL=medium >TAbort- SERR- TAbort- SERR- Reset- FastB2B- 00:02.0 USB Controller: Acer Laboratories Inc. 
M5237 (rev 03) (prog-if 10) Subsystem: Unknown device 10b9:5237 Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV+ VGASnoop- ParErr- Stepping- SERR- FastB2B- Status: Cap+ 66Mhz- UDF- FastB2B+ ParErr- DEVSEL=medium >TAbort- SERR- TAbort- SERR- TAbort- SERR- TAbort- SERR- TAbort- SERR- TAbort- SERR- TAbort- SERR- TAbort- SERR- TAbort- SERR- TAbort- SERR- SCSI /* CdD attAched devices: Host: scsi0 Channel: 00 Id: 01 Lun: 00 Vendor: SEAGATE Model: ST318203LC Rev: 0003 Type: Direct-Access ANSI SCSI revision: 02 */ ------------------------------------------------------------------------------- 7.7> Other ------------------------------------------------------------------------------- 8.> Notes, Patches, workarounds. 8.1 Running 'make menuconfig' and changing from Athlon to K6 allows the kernel to compile correctly (except nfsd.o). ------------------------------------------------------------------------------- 8.2 Motherboard ASUS, BIOS as delivered (upgrade not possible since manufacture does not seem to support linux). I don't even run as much as DOSEMU or WINE on the basis that if Linux is a standalone system it should not need a crutch. 8.3 Nothing to do with this bug report, but v2.4.6 pcmcia actually works on my thinkpad 240! I very much don't like having to patch the kernel and *support* having "extras" included into the kernel (bloating). Now, if only RTAI could be included.... (Not rtlinux.) As always, many many thanks to all Linux and GNU developers etc.... 8.4 My first linux version was (>) 1.0.19/1.2.3 and I've been hooked ever since. ------------------------------------------------------------------------------- Thanks in advance for looking at this bug report. No reply is required but would be appreciated. I may be contacted (when my server does the thing right) at: dedmons@w-link.net ------------------------------------------------------------------------------- Dale E. 
Edmons

From owner-netdev@oss.sgi.com Sun Jul 15 19:40:22 2001
Received: (from majordomo@localhost) by oss.sgi.com (8.11.2/8.11.3) id f6G2eMJ16604 for netdev-outgoing; Sun, 15 Jul 2001 19:40:22 -0700
Received: from lacrosse.corp.redhat.com (host154.207-175-42.redhat.com [207.175.42.154]) by oss.sgi.com (8.11.2/8.11.3) with SMTP id f6G2eIV16601 for ; Sun, 15 Jul 2001 19:40:19 -0700
Received: from toomuch.toronto.redhat.com (toomuch.toronto.redhat.com [172.16.14.22]) by lacrosse.corp.redhat.com (8.9.3/8.9.3) with ESMTP id WAA01444; Sun, 15 Jul 2001 22:40:17 -0400
Date: Sun, 15 Jul 2001 22:37:03 -0400 (EDT)
From: Ben LaHaise
X-X-Sender:
To:
cc:
Subject: [PATCH] skb_{over,under}_panic optimizations
Message-ID:
MIME-Version: 1.0
Content-Type: TEXT/PLAIN; charset=US-ASCII
Sender: owner-netdev@oss.sgi.com
Precedence: bulk
Content-Length: 6545
Lines: 167

Hello Dave & all,

Below is a patch that moves the skb_{over,under}_panic function calls in skb_put and skb_push out of line on i386 and hopefully improves register allocation. If this turns out to be useful, there are a bunch of other places where we can move these kinds of useful debugging assertions out of line. Comments?
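For readers who want the idea without the i386 inline assembly, here is a rough userspace analogue, assuming GCC or Clang. It is not the patch: struct buf, buf_init(), buf_put() and buf_over_panic() are hypothetical stand-ins for the skb code. The hot path keeps only a compare and a rarely taken branch, and the reporting code lives in a separate cold function.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <stdlib.h>

struct buf {
    unsigned char *data;   /* start of the payload */
    unsigned char *tail;   /* one past the last used byte */
    unsigned char *end;    /* one past the last usable byte */
    size_t len;
};

/* Cold path: kept out of line so the argument setup and the call
 * do not cost registers or instruction cache in every caller. */
__attribute__((noinline, cold, noreturn))
static void buf_over_panic(const struct buf *b, size_t len)
{
    fprintf(stderr, "buf_put: %zu bytes requested, %zu available\n",
            len, (size_t)(b->end - b->tail));
    abort();
}

static inline void buf_init(struct buf *b, unsigned char *storage, size_t size)
{
    assert(b != NULL && storage != NULL);
    b->data = b->tail = storage;
    b->end = storage + size;
    b->len = 0;
}

/* Hot path: one compare and one branch that is hinted as unlikely.
 * Unlike skb_put, the check runs before the pointer is advanced. */
static inline unsigned char *buf_put(struct buf *b, size_t len)
{
    unsigned char *old_tail = b->tail;

    if (__builtin_expect(len > (size_t)(b->end - b->tail), 0))
        buf_over_panic(b, len);
    b->tail += len;
    b->len += len;
    return old_tail;
}
```

Whether the compiler places the cold function in a separate section is up to the toolchain; the patch forces the placement by hand with .text.lock.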
-ben diff -urN v2.4.6/include/asm-alpha/skbuff.h linux-skbhacks1/include/asm-alpha/skbuff.h --- v2.4.6/include/asm-alpha/skbuff.h Wed Dec 31 19:00:00 1969 +++ linux-skbhacks1/include/asm-alpha/skbuff.h Sun Jul 15 10:12:37 2001 @@ -0,0 +1 @@ +#include diff -urN v2.4.6/include/asm-arm/skbuff.h linux-skbhacks1/include/asm-arm/skbuff.h --- v2.4.6/include/asm-arm/skbuff.h Wed Dec 31 19:00:00 1969 +++ linux-skbhacks1/include/asm-arm/skbuff.h Sun Jul 15 10:12:37 2001 @@ -0,0 +1 @@ +#include diff -urN v2.4.6/include/asm-cris/skbuff.h linux-skbhacks1/include/asm-cris/skbuff.h --- v2.4.6/include/asm-cris/skbuff.h Wed Dec 31 19:00:00 1969 +++ linux-skbhacks1/include/asm-cris/skbuff.h Sun Jul 15 10:12:37 2001 @@ -0,0 +1 @@ +#include diff -urN v2.4.6/include/asm-generic/skbuff.h linux-skbhacks1/include/asm-generic/skbuff.h --- v2.4.6/include/asm-generic/skbuff.h Wed Dec 31 19:00:00 1969 +++ linux-skbhacks1/include/asm-generic/skbuff.h Sun Jul 15 10:54:15 2001 @@ -0,0 +1,16 @@ +#ifndef ASM_GENERIC__SKBUF_H +#define ASM_GENERIC__SKBUF_H + +#define arch_skb_under_check(skb, len) \ + do {\ + if(skb->datahead)\ + skb_under_panic(skb, len, current_text_addr());\ + } while(0) + +#define arch_skb_over_check(skb, len) \ + do {\ + if(skb->tail>skb->end)\ + skb_over_panic(skb, len, current_text_addr());\ + } while(0) + +#endif /*ASM_GENERIC__SKBUF_H*/ diff -urN v2.4.6/include/asm-i386/skbuff.h linux-skbhacks1/include/asm-i386/skbuff.h --- v2.4.6/include/asm-i386/skbuff.h Wed Dec 31 19:00:00 1969 +++ linux-skbhacks1/include/asm-i386/skbuff.h Sun Jul 15 21:58:01 2001 @@ -0,0 +1,24 @@ +#ifndef ASM_I386__SKBUF_H +#define ASM_I386__SKBUF_H + +#define __arch_skb_check(insn, left, right, skb, why, len) \ + __asm__ __volatile__( \ + " cmpl %1,%0\n" \ + "1: " insn " 2f\n" \ + ".section .text.lock,\"ax\"\n" \ + "2: pushl 1b\n" \ + " pushl %3\n" \ + " pushl %2\n" \ + " call skb_" why "_panic\n" \ + " ud2a\n" \ + ".previous\n" \ + : : "r" (left), "rm" (right), "rmi" (skb), "rmi" (len) \ + : "cc" ) + 
+#define arch_skb_under_check(skb, len) \ + __arch_skb_check("jl", skb->data, skb->head, skb, "under", len) + +#define arch_skb_over_check(skb, len) \ + __arch_skb_check("jg", skb->tail, skb->end, skb, "over", len) + +#endif /*ASM_I386__SKBUF_H*/ diff -urN v2.4.6/include/asm-ia64/skbuff.h linux-skbhacks1/include/asm-ia64/skbuff.h --- v2.4.6/include/asm-ia64/skbuff.h Wed Dec 31 19:00:00 1969 +++ linux-skbhacks1/include/asm-ia64/skbuff.h Sun Jul 15 10:12:37 2001 @@ -0,0 +1 @@ +#include diff -urN v2.4.6/include/asm-m68k/skbuff.h linux-skbhacks1/include/asm-m68k/skbuff.h --- v2.4.6/include/asm-m68k/skbuff.h Wed Dec 31 19:00:00 1969 +++ linux-skbhacks1/include/asm-m68k/skbuff.h Sun Jul 15 10:12:37 2001 @@ -0,0 +1 @@ +#include diff -urN v2.4.6/include/asm-mips/skbuff.h linux-skbhacks1/include/asm-mips/skbuff.h --- v2.4.6/include/asm-mips/skbuff.h Wed Dec 31 19:00:00 1969 +++ linux-skbhacks1/include/asm-mips/skbuff.h Sun Jul 15 10:12:37 2001 @@ -0,0 +1 @@ +#include diff -urN v2.4.6/include/asm-mips64/skbuff.h linux-skbhacks1/include/asm-mips64/skbuff.h --- v2.4.6/include/asm-mips64/skbuff.h Wed Dec 31 19:00:00 1969 +++ linux-skbhacks1/include/asm-mips64/skbuff.h Sun Jul 15 10:12:37 2001 @@ -0,0 +1 @@ +#include diff -urN v2.4.6/include/asm-parisc/skbuff.h linux-skbhacks1/include/asm-parisc/skbuff.h --- v2.4.6/include/asm-parisc/skbuff.h Wed Dec 31 19:00:00 1969 +++ linux-skbhacks1/include/asm-parisc/skbuff.h Sun Jul 15 10:12:37 2001 @@ -0,0 +1 @@ +#include diff -urN v2.4.6/include/asm-ppc/skbuff.h linux-skbhacks1/include/asm-ppc/skbuff.h --- v2.4.6/include/asm-ppc/skbuff.h Wed Dec 31 19:00:00 1969 +++ linux-skbhacks1/include/asm-ppc/skbuff.h Sun Jul 15 10:12:37 2001 @@ -0,0 +1 @@ +#include diff -urN v2.4.6/include/asm-s390/skbuff.h linux-skbhacks1/include/asm-s390/skbuff.h --- v2.4.6/include/asm-s390/skbuff.h Wed Dec 31 19:00:00 1969 +++ linux-skbhacks1/include/asm-s390/skbuff.h Sun Jul 15 10:12:37 2001 @@ -0,0 +1 @@ +#include diff -urN v2.4.6/include/asm-s390x/skbuff.h 
linux-skbhacks1/include/asm-s390x/skbuff.h --- v2.4.6/include/asm-s390x/skbuff.h Wed Dec 31 19:00:00 1969 +++ linux-skbhacks1/include/asm-s390x/skbuff.h Sun Jul 15 10:12:37 2001 @@ -0,0 +1 @@ +#include diff -urN v2.4.6/include/asm-sh/skbuff.h linux-skbhacks1/include/asm-sh/skbuff.h --- v2.4.6/include/asm-sh/skbuff.h Wed Dec 31 19:00:00 1969 +++ linux-skbhacks1/include/asm-sh/skbuff.h Sun Jul 15 10:12:37 2001 @@ -0,0 +1 @@ +#include diff -urN v2.4.6/include/asm-sparc/skbuff.h linux-skbhacks1/include/asm-sparc/skbuff.h --- v2.4.6/include/asm-sparc/skbuff.h Wed Dec 31 19:00:00 1969 +++ linux-skbhacks1/include/asm-sparc/skbuff.h Sun Jul 15 10:12:37 2001 @@ -0,0 +1 @@ +#include diff -urN v2.4.6/include/asm-sparc64/skbuff.h linux-skbhacks1/include/asm-sparc64/skbuff.h --- v2.4.6/include/asm-sparc64/skbuff.h Wed Dec 31 19:00:00 1969 +++ linux-skbhacks1/include/asm-sparc64/skbuff.h Sun Jul 15 10:12:37 2001 @@ -0,0 +1 @@ +#include diff -urN v2.4.6/include/linux/skbuff.h linux-skbhacks1/include/linux/skbuff.h --- v2.4.6/include/linux/skbuff.h Tue Jul 3 18:43:04 2001 +++ linux-skbhacks1/include/linux/skbuff.h Sun Jul 15 10:56:16 2001 @@ -22,6 +22,7 @@ #include #include +#include #include #include #include @@ -787,9 +788,7 @@ SKB_LINEAR_ASSERT(skb); skb->tail+=len; skb->len+=len; - if(skb->tail>skb->end) { - skb_over_panic(skb, len, current_text_addr()); - } + arch_skb_over_check(skb, len); return tmp; } @@ -814,9 +813,7 @@ { skb->data-=len; skb->len+=len; - if(skb->datahead) { - skb_under_panic(skb, len, current_text_addr()); - } + arch_skb_under_check(skb, len); return skb->data; } -- "The world would be a better place if Larry Wall had been born in Iceland, or any other country where the native language actually has syntax" -- Peter da Silva From owner-netdev@oss.sgi.com Sun Jul 15 20:12:29 2001 Received: (from majordomo@localhost) by oss.sgi.com (8.11.2/8.11.3) id f6G3CTW19189 for netdev-outgoing; Sun, 15 Jul 2001 20:12:29 -0700 Received: from deliverator.sgi.com 
(deliverator.sgi.com [204.94.214.10]) by oss.sgi.com (8.11.2/8.11.3) with SMTP id f6G3CSV19186 for ; Sun, 15 Jul 2001 20:12:28 -0700 Received: from larry.melbourne.sgi.com (larry.melbourne.sgi.com [134.14.52.130]) by deliverator.sgi.com (980309.SGI.8.8.8-aspam-6.2/980310.SGI-aspam) via SMTP id UAA06974 for ; Sun, 15 Jul 2001 20:12:17 -0700 (PDT) mail_from (kaos@ocs.com.au) Received: from kao2.melbourne.sgi.com (kao2.melbourne.sgi.com [134.14.55.180]) by larry.melbourne.sgi.com (950413.SGI.8.6.12/950213.SGI.AUTOCF) via ESMTP id NAA20438; Mon, 16 Jul 2001 13:11:06 +1000 X-Mailer: exmh version 2.1.1 10/15/1999 From: Keith Owens To: Ben LaHaise cc: davem@redhat.com, netdev@oss.sgi.com Subject: Re: [PATCH] skb_{over,under}_panic optimizations In-reply-to: Your message of "Sun, 15 Jul 2001 22:37:03 -0400." Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Date: Mon, 16 Jul 2001 13:11:05 +1000 Message-ID: <22936.995253065@kao2.melbourne.sgi.com> Sender: owner-netdev@oss.sgi.com Precedence: bulk Content-Length: 947 Lines: 29 On Sun, 15 Jul 2001 22:37:03 -0400 (EDT), Ben LaHaise wrote: >Below is a patch that moves the skb_{over,under}_panic function calls in >skb_put and skb_push out of line on i386 and hopefully improves register >allocation. >+#define __arch_skb_check(insn, left, right, skb, why, len) \ >+ __asm__ __volatile__( \ >+ " cmpl %1,%0\n" \ >+ "1: " insn " 2f\n" \ >+ ".section .text.lock,\"ax\"\n" \ >+ "2: pushl 1b\n" \ >+ " pushl %3\n" \ >+ " pushl %2\n" \ >+ " call skb_" why "_panic\n" \ >+ " ud2a\n" \ >+ ".previous\n" \ >+ : : "r" (left), "rm" (right), "rmi" (skb), "rmi" (len) \ >+ : "cc" ) 'call skb_" why "_panic' will break using module symbol versions, skb_*_panic symbols are exported so all references to those symbols need to be visible to cpp, not inside string quotes. This should work (untested). " call %c4\n" \ .... 
: : "r" (left), "rm" (right), "rmi" (skb), "rmi" (len) \ "i" (skb_ ## why ## _panic) \ From owner-netdev@oss.sgi.com Sun Jul 15 20:22:03 2001 Received: (from majordomo@localhost) by oss.sgi.com (8.11.2/8.11.3) id f6G3M3819516 for netdev-outgoing; Sun, 15 Jul 2001 20:22:03 -0700 Received: from pneumatic-tube.sgi.com (pneumatic-tube.sgi.com [204.94.214.22]) by oss.sgi.com (8.11.2/8.11.3) with SMTP id f6G3M2V19511 for ; Sun, 15 Jul 2001 20:22:02 -0700 Received: from larry.melbourne.sgi.com (larry.melbourne.sgi.com [134.14.52.130]) by pneumatic-tube.sgi.com (980327.SGI.8.8.8-aspam/980310.SGI-aspam) via SMTP id UAA07140 for ; Sun, 15 Jul 2001 20:19:32 -0700 (PDT) mail_from (kaos@ocs.com.au) Received: from kao2.melbourne.sgi.com (kao2.melbourne.sgi.com [134.14.55.180]) by larry.melbourne.sgi.com (950413.SGI.8.6.12/950213.SGI.AUTOCF) via ESMTP id NAA20480; Mon, 16 Jul 2001 13:20:43 +1000 X-Mailer: exmh version 2.1.1 10/15/1999 From: Keith Owens To: Ben LaHaise cc: davem@redhat.com, netdev@oss.sgi.com Subject: Re: [PATCH] skb_{over,under}_panic optimizations In-reply-to: Your message of "Sun, 15 Jul 2001 22:37:03 -0400." Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Date: Mon, 16 Jul 2001 13:20:43 +1000 Message-ID: <23105.995253643@kao2.melbourne.sgi.com> Sender: owner-netdev@oss.sgi.com Precedence: bulk Content-Length: 1498 Lines: 45 On Sun, 15 Jul 2001 22:37:03 -0400 (EDT), Ben LaHaise wrote: >Below is a patch that moves the skb_{over,under}_panic function calls in >skb_put and skb_push out of line on i386 and hopefully improves register >allocation. >+#define __arch_skb_check(insn, left, right, skb, why, len) \ >+ __asm__ __volatile__( \ >+ " cmpl %1,%0\n" \ >+ "1: " insn " 2f\n" \ >+ ".section .text.lock,\"ax\"\n" \ >+ "2: pushl 1b\n" \ >+ " pushl %3\n" \ >+ " pushl %2\n" \ >+ " call skb_" why "_panic\n" \ >+ " ud2a\n" \ >+ ".previous\n" \ >+ : : "r" (left), "rm" (right), "rmi" (skb), "rmi" (len) \ >+ : "cc" ) Oops, pressed send too soon. 
Because you are putting the code in .text.lock which is an unusual section, it will make debugging easier if you make the skb code look like the existing lock code, where possible. In particular include a branch back to the mainline code, existing debuggers look for the return branch to determine where the "lock" code was entered from. So the whole asm code is #define __arch_skb_check(insn, left, right, skb, why, len) \ __asm__ __volatile__( \ " cmpl %1,%0\n" \ "1: " insn " 2f\n" \ ".section .text.lock,\"ax\"\n" \ "2: pushl 1b\n" \ " pushl %3\n" \ " pushl %2\n" \ " call %c4\n" \ " ud2a\n" \ " jmp 1b\n" \ ".previous\n" \ : : "r" (left), "rm" (right), "rmi" (skb), "rmi" (len) \ "i" (skb_ ## why ## _panic) \ : "cc" ) The jmp 1b should never be executed so is a small waste of space but the debugging improvements are worth it. From owner-netdev@oss.sgi.com Mon Jul 16 05:00:00 2001 Received: (from majordomo@localhost) by oss.sgi.com (8.11.2/8.11.3) id f6GC00O26514 for netdev-outgoing; Mon, 16 Jul 2001 05:00:00 -0700 Received: from popeye.ipv6.univ-nantes.fr (postfix@popeye.ipv6.univ-nantes.fr [193.52.101.20]) by oss.sgi.com (8.11.2/8.11.3) with SMTP id f6GBxxV26498 for ; Mon, 16 Jul 2001 04:59:59 -0700 Received: from IPv6.univ-nantes.fr (olive.ipv6.univ-nantes.fr [193.52.101.22]) by popeye.ipv6.univ-nantes.fr (Postfix) with ESMTP id C610367D for ; Mon, 16 Jul 2001 13:59:56 +0200 (CEST) Message-ID: <3B52D73C.4060004@IPv6.univ-nantes.fr> Date: Mon, 16 Jul 2001 13:59:56 +0200 From: Yann Dupont User-Agent: Mozilla/5.0 (X11; U; Linux i686; en-US; rv:0.9.2+) Gecko/20010715 X-Accept-Language: en-us MIME-Version: 1.0 To: netdev@oss.sgi.com Subject: dst cache overflow on 2.2.16 kernel. Content-Type: text/plain; charset=us-ascii; format=flowed Content-Transfer-Encoding: 7bit Sender: owner-netdev@oss.sgi.com Precedence: bulk Content-Length: 686 Lines: 19 Hello. 
We have a firewall here (Checkpoint FW 1), installed on a RH 6.2 Every week or so, the FW logs this error : dst cache overflow and the routing stops. Is changing the value of /proc/sys/net/ipv4/route (actually set to 4096) a way to prevent this ? OR is this a kernel bug with this 2.2.16 realease ? I CAN'T change the kernel, nor the distro, as the whole is under contract ... and validated for this exact combination :( Yann Dupont. -- \|/ ____ \|/ Fac. des sciences de Nantes-Linux-Python-IPv6-ATM-BONOM.... "@'/ ,. \@" Tel :(+33) [0]251125865 [0]251125868(Fax) /_| \__/ |_\ Yann.Dupont@sciences.univ-nantes.fr \__U_/ http://www.unantes.univ-nantes.fr/~dupont From owner-netdev@oss.sgi.com Mon Jul 16 06:11:37 2001 Received: (from majordomo@localhost) by oss.sgi.com (8.11.2/8.11.3) id f6GDBbC32746 for netdev-outgoing; Mon, 16 Jul 2001 06:11:37 -0700 Received: from zero.aec.at (qmailr@zero.aec.at [195.3.98.22]) by oss.sgi.com (8.11.2/8.11.3) with SMTP id f6GDBZV32740 for ; Mon, 16 Jul 2001 06:11:36 -0700 Received: (qmail 14436 invoked by uid 99); 16 Jul 2001 13:11:27 -0000 Received: from unknown (HELO fred.muc.de) (unknown) by unknown with SMTP; 16 Jul 2001 13:11:27 -0000 Received: by fred.muc.de (Postfix, from userid 500) id 57B8AE2D4B; Mon, 16 Jul 2001 15:11:03 +0200 (CEST) Date: Mon, 16 Jul 2001 15:11:03 +0200 From: Andi Kleen To: Ben LaHaise Cc: davem@redhat.com, netdev@oss.sgi.com Subject: Re: [PATCH] skb_{over,under}_panic optimizations Message-ID: <20010716151103.A945@fred.local> References: Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii X-Mailer: Mutt 1.0.1i In-Reply-To: ; from bcrl@redhat.com on Mon, Jul 16, 2001 at 04:37:03AM +0200 Sender: owner-netdev@oss.sgi.com Precedence: bulk Content-Length: 564 Lines: 17 On Mon, Jul 16, 2001 at 04:37:03AM +0200, Ben LaHaise wrote: > Hello Dave & all, > > Below is a patch that moves the skb_{over,under}_panic function calls in > skb_put and skb_push out of line on i386 and hopefully improves register > 
allocation. If this turns out to be useful, there are bunch of other > places we can move these kinds useful debugging assertions out of line. > Comments? How ugly. Does this really make any benchmarkable difference? (very much doubting it) If you really care about such microoptimization just use __builtin_expect. -Andi From owner-netdev@oss.sgi.com Mon Jul 16 06:18:34 2001 Received: (from majordomo@localhost) by oss.sgi.com (8.11.2/8.11.3) id f6GDIYJ00961 for netdev-outgoing; Mon, 16 Jul 2001 06:18:34 -0700 Received: from iisc.ernet.in (iisc.ernet.in [144.16.64.3]) by oss.sgi.com (8.11.2/8.11.3) with SMTP id f6GDIRV00954 for ; Mon, 16 Jul 2001 06:18:31 -0700 Received: from eis.iisc.ernet.in (eis.iisc.ernet.in [144.16.64.5]) by iisc.ernet.in (8.9.2/8.9.0) with SMTP id SAA43765; Mon, 16 Jul 2001 18:53:28 +0530 (IST) Received: by eis.iisc.ernet.in (SMI-8.6/SMI-4.1) id SAA17536; Mon, 16 Jul 2001 18:48:08 +0530 From: anand@eis.iisc.ernet.in (SVR Anand) Message-Id: <200107161318.SAA17536@eis.iisc.ernet.in> Subject: Re: Diffserv and NetApplications To: cw@f00f.org (Chris Wedgwood) Date: Mon, 16 Jul 2001 18:48:07 +0530 (GMT+05:30) Cc: netdev@oss.sgi.com In-Reply-To: <20010715161211.J7624@weta.f00f.org> from "Chris Wedgwood" at Jul 15, 2001 04:12:11 PM X-Mailer: ELM [version 2.5 PL2] MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Transfer-Encoding: 7bit Sender: owner-netdev@oss.sgi.com Precedence: bulk Content-Length: 2165 Lines: 56 Chris Wedgwood, Thanks for your reply. I would certainly be happy to visit the sites that have tcp/udp applications using diffserv QoS mechanism. Can you please pass it on ? I searched on Google without much success :( With respect to your other point of the practicality of the proposed QoS, all I have to say is that it is atleast good to have QoS provisioning rather than not having at all. Those who can make use of it, let them use it. 
What I am suggesting is an augmentation to what is already available rather than a replacement. At least in one case, I can clearly see the advantage of adaptively using diffserv dscp values in the http servers to serve the urls in an intelligent and responsive way to its clients. The criteria for choosing these values can be part of server configuration, or run-time. Now, if the network wants to re-mark or drop the packets, let it. The server conveyed its priorities.

What is the point of having an architecture and no one to use it? Don't you think we have to make a beginning somewhere?

Since you pointed out that there are indeed applications that take advantage of QoS (if it exists! :)) I will try to hunt for them.

Regards
Anand

>
> On Fri, Jul 13, 2001 at 09:52:41PM +0530, SVR Anand wrote:
>
> Considering diffserv as QoS supporting mechanism, I am
> contemplating on the possibility of making some of the known
> netapplications like http, telnet, ftp and so on, inherently
> diffserv aware.
>
> Many already are, I believe telnet and ssh are for example.
>
> By this I mean the applications automatically send
> out their packets out of the box with appropriate dscp values
> depending on their QoS requirements. While tc is one way of
> achieving the same result, it is an external mechanism requiring a
> separate agent to configure diffserv.
>
> My wish is, all the future netapps
> from Linux should take advantage of QoS as applicable, naturally.
>
> As I said, some (many) already do... QoS means different things to
> different people. Nice idea in theory, in reality it's not worth much
> to achieve what you want.
> > > > > --cw > From owner-netdev@oss.sgi.com Mon Jul 16 06:31:03 2001 Received: (from majordomo@localhost) by oss.sgi.com (8.11.2/8.11.3) id f6GDV3G02051 for netdev-outgoing; Mon, 16 Jul 2001 06:31:03 -0700 Received: from weta.f00f.org (weta.f00f.org [203.167.249.89]) by oss.sgi.com (8.11.2/8.11.3) with SMTP id f6GDV1V02045 for ; Mon, 16 Jul 2001 06:31:01 -0700 Received: by weta.f00f.org (Postfix, from userid 1000) id 6B6862856; Tue, 17 Jul 2001 01:31:00 +1200 (NZST) Date: Tue, 17 Jul 2001 01:31:00 +1200 From: Chris Wedgwood To: SVR Anand Cc: netdev@oss.sgi.com Subject: Re: Diffserv and NetApplications Message-ID: <20010717013100.A12401@weta.f00f.org> References: <200107161318.SAA17536@eis.iisc.ernet.in> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <200107161318.SAA17536@eis.iisc.ernet.in> User-Agent: Mutt/1.3.18i X-No-Archive: Yes Sender: owner-netdev@oss.sgi.com Precedence: bulk Content-Length: 1944 Lines: 50 On Mon, Jul 16, 2001 at 06:48:07PM +0530, SVR Anand wrote: Thanks for your reply. I would certainly be happy to visit the sites that have tcp/udp applications using diffserv QoS mechanism. Can you please pass it on ? I searched on Google without much success :( No idea, I just noticed the QoS bits are sometimes set when using tcpdump for telnet and ssh. I don't think any special version exist, just grab the source and take as look. With respect to your other point of the practicality of the proposed QoS, all I have to say is that it is atleast good to have QoS provisioning rather than not having at all. But this means different things to different people. I know of many uses for QoS bits, that aren't strictly related to QoS issues (such as tagging traffic on ingress to specific parts on an AS, for internal metering purposes). 
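For reference, an application can request a marking for its own traffic through the standard IP_TOS socket option, which is presumably how the bits seen in tcpdump for telnet and ssh get set. The following is a minimal sketch, assuming an IPv4 socket on a Linux/POSIX system. The helper names set_dscp and get_tos are hypothetical and are not taken from any of the applications mentioned in this thread. As noted above, the network is free to re-mark or ignore the value.

```c
#include <assert.h>
#include <netinet/in.h>
#include <netinet/ip.h>
#include <sys/socket.h>
#include <unistd.h>

/* Hypothetical helper: ask the stack to mark packets sent on an IPv4
 * socket with the given DSCP. The 6-bit DSCP sits in the upper six bits
 * of the old TOS byte, so it is shifted left by two for IP_TOS.
 * Returns 0 on success, -1 on error. */
int set_dscp(int fd, unsigned int dscp)
{
    int tos;

    if (dscp > 63)
        return -1;              /* a DSCP is only six bits wide */
    tos = (int)(dscp << 2);
    return setsockopt(fd, IPPROTO_IP, IP_TOS, &tos, sizeof(tos));
}

/* Hypothetical helper: read back the TOS byte configured on the socket,
 * or -1 on error. */
int get_tos(int fd)
{
    int tos = 0;
    socklen_t len = sizeof(tos);

    if (getsockopt(fd, IPPROTO_IP, IP_TOS, &tos, &len) < 0)
        return -1;
    assert(len == sizeof(tos)); /* we passed an int-sized buffer */
    return tos;
}
```

A server could call set_dscp(fd, 18) (AF21, chosen here only as an illustration) on a connection it wants to prioritise, with the value coming from its configuration.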
At least in one case, I can clearly see advantage of adaptively using diffserv dscp values in the http servers to serve the urls in an intelligent and responsive way to its clients. And the upstream provider and go and changes these value on you, ignoring your QoS requests :) What is the big point of having architecture, and no one to use it? QoS has great intentions, they just never really happenned for lack of good core support, which we are only just beginning to see appear now. Don't you think we have to make a beginning somewhere? Indeed, but I think QoS has almost missed the boat completely. With the rise of MPLS (be that good or bad), QoS might be finally become something useful, or die completely. Core routers now support LSP selection and even in the non-MPLS case queue selection (for use with WRED) based on TOS bits. (Actually, this is a lie, Juniper support this, I don't think IOS does yet, and if it does, I doubt it will at line-speed, but still, thats the majority of the emerginghigh-end anyhow). --cw From owner-netdev@oss.sgi.com Mon Jul 16 06:35:23 2001 Received: (from majordomo@localhost) by oss.sgi.com (8.11.2/8.11.3) id f6GDZNp02539 for netdev-outgoing; Mon, 16 Jul 2001 06:35:23 -0700 Received: from weta.f00f.org (weta.f00f.org [203.167.249.89]) by oss.sgi.com (8.11.2/8.11.3) with SMTP id f6GDZMV02536 for ; Mon, 16 Jul 2001 06:35:22 -0700 Received: by weta.f00f.org (Postfix, from userid 1000) id 233BA285D; Tue, 17 Jul 2001 01:35:26 +1200 (NZST) Date: Tue, 17 Jul 2001 01:35:26 +1200 From: Chris Wedgwood To: Yann Dupont Cc: netdev@oss.sgi.com Subject: Re: dst cache overflow on 2.2.16 kernel. 
Message-ID: <20010717013526.B12401@weta.f00f.org> References: <3B52D73C.4060004@IPv6.univ-nantes.fr> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <3B52D73C.4060004@IPv6.univ-nantes.fr> User-Agent: Mutt/1.3.18i X-No-Archive: Yes Sender: owner-netdev@oss.sgi.com Precedence: bulk Content-Length: 178 Lines: 10 On Mon, Jul 16, 2001 at 01:59:56PM +0200, Yann Dupont wrote: Every week or so, the FW logs this error : dst cache overflow and the routing stops. is lo down? --cw From owner-netdev@oss.sgi.com Mon Jul 16 07:13:34 2001 Received: (from majordomo@localhost) by oss.sgi.com (8.11.2/8.11.3) id f6GEDYh05033 for netdev-outgoing; Mon, 16 Jul 2001 07:13:34 -0700 Received: from popeye.ipv6.univ-nantes.fr (postfix@popeye.ipv6.univ-nantes.fr [193.52.101.20]) by oss.sgi.com (8.11.2/8.11.3) with SMTP id f6GEDWV05030 for ; Mon, 16 Jul 2001 07:13:32 -0700 Received: from IPv6.univ-nantes.fr (olive.ipv6.univ-nantes.fr [193.52.101.22]) by popeye.ipv6.univ-nantes.fr (Postfix) with ESMTP id 86D8467D; Mon, 16 Jul 2001 16:13:31 +0200 (CEST) Message-ID: <3B52F68B.9050504@IPv6.univ-nantes.fr> Date: Mon, 16 Jul 2001 16:13:31 +0200 From: Yann Dupont User-Agent: Mozilla/5.0 (X11; U; Linux 2.4.7-pre5 i686; en-US; rv:0.9.1) Gecko/20010620 X-Accept-Language: en-us MIME-Version: 1.0 To: netdev@oss.sgi.com, cw@f00f.org Subject: Re: dst cache overflow on 2.2.16 kernel. Content-Type: text/plain; charset=us-ascii; format=flowed Content-Transfer-Encoding: 7bit Sender: owner-netdev@oss.sgi.com Precedence: bulk Content-Length: 560 Lines: 30 Hello... lo is up and running.... Now, I don't know if some rules by default of fw-1 make it looks like lo is down ... Yann Dupont. -- On Mon, Jul 16, 2001 at 01:59:56PM +0200, Yann Dupont wrote: Every week or so, the FW logs this error : dst cache overflow and the routing stops. is lo down? --cw -- \|/ ____ \|/ Fac. des sciences de Nantes-Linux-Python-IPv6-ATM-BONOM.... "@'/ ,. 
\@" Tel :(+33) [0]251125865 [0]251125868(Fax) /_| \__/ |_\ Yann.Dupont@sciences.univ-nantes.fr \__U_/ http://www.unantes.univ-nantes.fr/~dupont From owner-netdev@oss.sgi.com Mon Jul 16 07:19:22 2001 Received: (from majordomo@localhost) by oss.sgi.com (8.11.2/8.11.3) id f6GEJMY05572 for netdev-outgoing; Mon, 16 Jul 2001 07:19:22 -0700 Received: from weta.f00f.org (weta.f00f.org [203.167.249.89]) by oss.sgi.com (8.11.2/8.11.3) with SMTP id f6GEJLV05568 for ; Mon, 16 Jul 2001 07:19:21 -0700 Received: by weta.f00f.org (Postfix, from userid 1000) id B62EF289B; Tue, 17 Jul 2001 02:19:25 +1200 (NZST) Date: Tue, 17 Jul 2001 02:19:25 +1200 From: Chris Wedgwood To: Yann Dupont Cc: netdev@oss.sgi.com Subject: Re: dst cache overflow on 2.2.16 kernel. Message-ID: <20010717021925.B12532@weta.f00f.org> References: <3B52F68B.9050504@IPv6.univ-nantes.fr> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <3B52F68B.9050504@IPv6.univ-nantes.fr> User-Agent: Mutt/1.3.18i X-No-Archive: Yes Sender: owner-netdev@oss.sgi.com Precedence: bulk Content-Length: 350 Lines: 16 On Mon, Jul 16, 2001 at 04:13:31PM +0200, Yann Dupont wrote: lo is up and running.... so ping 127.0.0.1 works? Now, I don't know if some rules by default of fw-1 make it looks like lo is down ... I'm not sure I follow here, are you saying fw-1 makes it look like lo is down, or that perhaps it does because of my question? 
--cw From owner-netdev@oss.sgi.com Mon Jul 16 07:37:49 2001 Received: (from majordomo@localhost) by oss.sgi.com (8.11.2/8.11.3) id f6GEbna07072 for netdev-outgoing; Mon, 16 Jul 2001 07:37:49 -0700 Received: from popeye.ipv6.univ-nantes.fr (postfix@popeye.ipv6.univ-nantes.fr [193.52.101.20]) by oss.sgi.com (8.11.2/8.11.3) with SMTP id f6GEbmV07069 for ; Mon, 16 Jul 2001 07:37:48 -0700 Received: from IPv6.univ-nantes.fr (olive.ipv6.univ-nantes.fr [193.52.101.22]) by popeye.ipv6.univ-nantes.fr (Postfix) with ESMTP id 3642A67D; Mon, 16 Jul 2001 16:37:47 +0200 (CEST) Message-ID: <3B52FC3B.7050107@IPv6.univ-nantes.fr> Date: Mon, 16 Jul 2001 16:37:47 +0200 From: Yann Dupont User-Agent: Mozilla/5.0 (X11; U; Linux i686; en-US; rv:0.9.2+) Gecko/20010715 X-Accept-Language: en-us MIME-Version: 1.0 To: cw@f00f.org Cc: netdev@oss.sgi.com Subject: Re: dst cache overflow on 2.2.16 kernel. Content-Type: text/plain; charset=us-ascii; format=flowed Content-Transfer-Encoding: 7bit Sender: owner-netdev@oss.sgi.com Precedence: bulk Content-Length: 1019 Lines: 34 >On Mon, Jul 16, 2001 at 04:13:31PM +0200, Yann Dupont wrote: > > lo is up and running.... > >so ping 127.0.0.1 works? > yes. > Now, I don't know if some rules by default of fw-1 make it looks > like lo is down ... > >I'm not sure I follow here, are you saying fw-1 makes it look like lo >is down, or that perhaps it does because of my question? Sorry for my poor english ... fw-1 is a module that sits beetwen the IP layer & the interface (Well. I think it perform like this. Mayne I'm totally wrong here) So even if I can ping 127.0.0.1, Maybe it blocks acces to lo for some other programs. As the firewall as been installed by a 3rd party - And not very well installed, because there are LOTS of daemons still there - I can't say... Yann Dupont. -- \|/ ____ \|/ Fac. des sciences de Nantes-Linux-Python-IPv6-ATM-BONOM.... "@'/ ,. 
\@" Tel :(+33) [0]251125865 [0]251125868(Fax) /_| \__/ |_\ Yann.Dupont@sciences.univ-nantes.fr \__U_/ http://www.unantes.univ-nantes.fr/~dupont From owner-netdev@oss.sgi.com Mon Jul 16 07:52:48 2001 Received: (from majordomo@localhost) by oss.sgi.com (8.11.2/8.11.3) id f6GEqm308333 for netdev-outgoing; Mon, 16 Jul 2001 07:52:48 -0700 Received: from weta.f00f.org (weta.f00f.org [203.167.249.89]) by oss.sgi.com (8.11.2/8.11.3) with SMTP id f6GEqlV08329 for ; Mon, 16 Jul 2001 07:52:47 -0700 Received: by weta.f00f.org (Postfix, from userid 1000) id 937C228BF; Tue, 17 Jul 2001 02:52:51 +1200 (NZST) Date: Tue, 17 Jul 2001 02:52:51 +1200 From: Chris Wedgwood To: Yann Dupont Cc: netdev@oss.sgi.com Subject: Re: dst cache overflow on 2.2.16 kernel. Message-ID: <20010717025251.A12614@weta.f00f.org> References: <3B52FC3B.7050107@IPv6.univ-nantes.fr> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <3B52FC3B.7050107@IPv6.univ-nantes.fr> User-Agent: Mutt/1.3.18i X-No-Archive: Yes Sender: owner-netdev@oss.sgi.com Precedence: bulk Content-Length: 781 Lines: 27 On Mon, Jul 16, 2001 at 04:37:47PM +0200, Yann Dupont wrote: Sorry for my poor english ... Mine is probably worse. fw-1 is a module that sits beetwen the IP layer & the interface (Well. I think it perform like this. Mayne I'm totally wrong here) I assume it works this way. So even if I can ping 127.0.0.1, Maybe it blocks acces to lo for some other programs. Depending on the rules, it may well do so. I don't think that is the cause here, its just the reported error I have seen as the result of there being no loopback device. As the firewall as been installed by a 3rd party - And not very well installed, because there are LOTS of daemons still there - I can't say... Does the problem still occur with the firewall unloaded? 
--cw From owner-netdev@oss.sgi.com Mon Jul 16 15:00:27 2001 Received: (from majordomo@localhost) by oss.sgi.com (8.11.2/8.11.3) id f6GM0Rm00637 for netdev-outgoing; Mon, 16 Jul 2001 15:00:27 -0700 Received: from mail2.lsil.com (mail2.lsil.com [147.145.40.22]) by oss.sgi.com (8.11.2/8.11.3) with SMTP id f6GM0QV00633 for ; Mon, 16 Jul 2001 15:00:26 -0700 Received: from mhbs.lsil.com (mhbs [147.145.31.100]) by mail2.lsil.com (8.9.3+Sun/8.9.1) with ESMTP id PAA03951 for ; Mon, 16 Jul 2001 15:00:25 -0700 (PDT) Received: from [153.79.12.11] by mhbs.lsil.com with ESMTP; Mon, 16 Jul 2001 15:00:07 -0700 Received: from lsil.com (nromernt.ks.lsil.com [153.79.8.107]) by exw-ks.ks.lsil.com with SMTP (Microsoft Exchange Internet Mail Service Version 5.5.2653.13) id NKW2X27F; Mon, 16 Jul 2001 16:59:24 -0500 Message-Id: <3B5355AB.C5D62090@lsil.com> Date: Mon, 16 Jul 2001 16:59:24 -0400 From: Noah Romer Organization: LSI Logic X-Mailer: Mozilla 4.76 [en] (X11; U; Linux 2.4.3 i686) X-Accept-Language: en MIME-Version: 1.0 To: netdev CC: "Romer, Noah" Subject: hard_start_xmit behavior Content-Type: text/plain; charset=us-ascii Content-Transfer-Encoding: 7bit Sender: owner-netdev@oss.sgi.com Precedence: bulk Content-Length: 1128 Lines: 24 I'm tracking down some obscure bugs in the mpt fusion network driver and am wondering about the correct behavior of hard_start_xmit implementations. Specifically, if the network stack gives you a packet to transmit and you immediately know you're not going to be able to do so (in my case, when I'm unable to get a message frame so I can talk with the board, a rare condition), what's the expected handling of the skb and what does the network stack want to see as the return value? The existing behavior of the driver is to call netif_stop_queu, call dev_kfree_skb, and then return a non-zero value. From the results I'm seeing, calling dev_kfree_skb isn't a good thing. 
Should I be placing the skb on an internal queue and retrying once I've got the resources to do so or just leave it alone and let the network stack deal with it? I haven't yet found anything to indicate that the return value of hard_start_xmit is checked for anything other than 0, does it make any diff what non-zero value is used? Thanks. -- Noah Romer Driver Developer, CM gopher and Linux Whipping Boy Storage Components Firmware LSI Logic Corp. From owner-netdev@oss.sgi.com Mon Jul 16 15:08:42 2001 Received: (from majordomo@localhost) by oss.sgi.com (8.11.2/8.11.3) id f6GM8gR01062 for netdev-outgoing; Mon, 16 Jul 2001 15:08:42 -0700 Received: from havoc.gtf.org (IDENT:postfix@panic.ohr.gatech.edu [130.207.47.194]) by oss.sgi.com (8.11.2/8.11.3) with SMTP id f6GM8eV01057 for ; Mon, 16 Jul 2001 15:08:40 -0700 Received: from mandrakesoft.com (adsl-20-73-169.asm.bellsouth.net [66.20.73.169]) by havoc.gtf.org (Postfix) with ESMTP id 9867A1F73; Mon, 16 Jul 2001 18:08:38 -0400 (EDT) Message-ID: <3B536600.B6D09BFA@mandrakesoft.com> Date: Mon, 16 Jul 2001 18:09:04 -0400 From: Jeff Garzik Organization: MandrakeSoft X-Mailer: Mozilla 4.77 [en] (X11; U; Linux 2.4.7-pre5 i686) X-Accept-Language: en MIME-Version: 1.0 To: Noah Romer Cc: netdev Subject: Re: hard_start_xmit behavior References: <3B5355AB.C5D62090@lsil.com> Content-Type: text/plain; charset=us-ascii Content-Transfer-Encoding: 7bit Sender: owner-netdev@oss.sgi.com Precedence: bulk Content-Length: 1782 Lines: 42 Noah Romer wrote: > I'm tracking down some obscure bugs in the mpt fusion network driver and am > wondering about the correct behavior of hard_start_xmit implementations. > Specifically, if the network stack gives you a packet to transmit and you > immediately know you're not going to be able to do so (in my case, when I'm > unable to get a message frame so I can talk with the board, a rare condition), > what's the expected handling of the skb and what does the network stack want to > see as the return value? 
> > The existing behavior of the driver is to call netif_stop_queu, call > dev_kfree_skb, and then return a non-zero value. From the results I'm seeing, > calling dev_kfree_skb isn't a good thing. Should I be placing the skb on an > internal queue and retrying once I've got the resources to do so or just leave > it alone and let the network stack deal with it? I haven't yet found anything > to indicate that the return value of hard_start_xmit is checked for anything > other than 0, does it make any diff what non-zero value is used? It sounds like the original author was a bit confused :) Is it possible to decide whether there is room to transmit more, after you queue a packet? If so, the hard_start_xmit logic is ...send 1 skb to hardware... if (no more tx buffers available) netif_stop_queue If it is not possible to use this scheme, then the driver's existing logic simply needs to remove the dev_kfree_skb: if (no more tx buffers available) { netif_stop_queue(dev); return 1; } ...send 1 skb to hardware... If your driver supports fragmented skbs, then the latter logic is what must be used. -- Jeff Garzik | A recent study has shown that too much soup Building 1024 | can cause malaise in laboratory mice. 
MandrakeSoft | From owner-netdev@oss.sgi.com Mon Jul 16 15:28:00 2001 Received: (from majordomo@localhost) by oss.sgi.com (8.11.2/8.11.3) id f6GMS0r01716 for netdev-outgoing; Mon, 16 Jul 2001 15:28:00 -0700 Received: from mail2.lsil.com (mail2.lsil.com [147.145.40.22]) by oss.sgi.com (8.11.2/8.11.3) with SMTP id f6GMRxV01712 for ; Mon, 16 Jul 2001 15:27:59 -0700 Received: from mhbs.lsil.com (mhbs [147.145.31.100]) by mail2.lsil.com (8.9.3+Sun/8.9.1) with ESMTP id PAA08689 for ; Mon, 16 Jul 2001 15:27:57 -0700 (PDT) Received: from [153.79.12.11] by mhbs.lsil.com with ESMTP; Mon, 16 Jul 2001 15:27:50 -0700 Received: from lsil.com (nromernt.ks.lsil.com [153.79.8.107]) by exw-ks.ks.lsil.com with SMTP (Microsoft Exchange Internet Mail Service Version 5.5.2653.13) id NKW2XJBP; Mon, 16 Jul 2001 17:27:04 -0500 Message-Id: <3B535C25.1EC01A0F@lsil.com> Date: Mon, 16 Jul 2001 17:27:01 -0400 From: Noah Romer Organization: LSI Logic X-Mailer: Mozilla 4.76 [en] (X11; U; Linux 2.4.3 i686) X-Accept-Language: en MIME-Version: 1.0 To: Jeff Garzik CC: Noah Romer , netdev Subject: Re: hard_start_xmit behavior References: <3B5355AB.C5D62090@lsil.com> <3B536600.B6D09BFA@mandrakesoft.com> Content-Type: text/plain; charset=us-ascii Content-Transfer-Encoding: 7bit Sender: owner-netdev@oss.sgi.com Precedence: bulk Content-Length: 2898 Lines: 62 Jeff Garzik wrote: > > Noah Romer wrote: > > I'm tracking down some obscure bugs in the mpt fusion network driver and am > > wondering about the correct behavior of hard_start_xmit implementations. > > Specifically, if the network stack gives you a packet to transmit and you > > immediately know you're not going to be able to do so (in my case, when I'm > > unable to get a message frame so I can talk with the board, a rare condition), > > what's the expected handling of the skb and what does the network stack want to > > see as the return value? 
> > > > The existing behavior of the driver is to call netif_stop_queu, call > > dev_kfree_skb, and then return a non-zero value. From the results I'm seeing, > > calling dev_kfree_skb isn't a good thing. Should I be placing the skb on an > > internal queue and retrying once I've got the resources to do so or just leave > > it alone and let the network stack deal with it? I haven't yet found anything > > to indicate that the return value of hard_start_xmit is checked for anything > > other than 0, does it make any diff what non-zero value is used? > > It sounds like the original author was a bit confused :) Yep. That would be me. Back when I wrote that section of code, I was pretty much confused about everything. :) > Is it possible to decide whether there is room to transmit more, after > you queue a packet? If so, the hard_start_xmit logic is > > ...send 1 skb to hardware... > if (no more tx buffers available) > netif_stop_queue Well, there's two limits I can run up against: the boards LAN Tx queue depth (128), and the driver's request frame queue depth (defaults to 256). When the board is being used for lan only, I'll always hit the Tx queue depth first, and I check for that, but there's two scenarios where I might be refused a request message frame: 1) The scsih host driver module is also loaded and in use (the board in question is a Fibre Channel board that supports concurent FCP/scsi and lan traffic) and has eaten up < 128 message frames or 2) we're in the middle of a diagnostic/hard reset and the message queues are stopped. In either of the latter two cases, the hard_start_xmit function won't know until it tries to get a message frame. So, I guess I'll be doing what you suggest below, which is actually what I'm doing now, minus the brain-dead call to dev_kfree_skb. 
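The rule that comes out of this exchange (stop the queue, return non-zero, and leave the skb alone) can be sketched in plain C. This is a user-space model under stated assumptions, not the mpt fusion driver: mock_skb, mock_dev, mock_hard_start_xmit, mock_tx_complete and the ring size of 4 are all hypothetical stand-ins, and the queue_stopped flag only imitates what netif_stop_queue() and netif_wake_queue() do in a real driver.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define TX_RING_SIZE 4  /* illustrative only; the board's Tx depth is 128 */

struct mock_skb { int id; };             /* stand-in for struct sk_buff */

struct mock_dev {                        /* stand-in for the net device */
    struct mock_skb *ring[TX_RING_SIZE]; /* frames handed to "hardware" */
    size_t in_flight;
    bool queue_stopped;                  /* models the stopped-queue state */
};

/* Returns 0 if the skb was accepted (the driver now owns it), or 1 if it
 * could not be queued (the skb is untouched and still owned by the caller). */
int mock_hard_start_xmit(struct mock_skb *skb, struct mock_dev *dev)
{
    if (dev->in_flight == TX_RING_SIZE) {
        dev->queue_stopped = true;  /* netif_stop_queue(dev) in a driver */
        return 1;                   /* note: the skb is NOT freed here */
    }

    dev->ring[dev->in_flight++] = skb;

    if (dev->in_flight == TX_RING_SIZE)
        dev->queue_stopped = true;  /* stop early once the ring is full */
    return 0;
}

/* Tx-completion path: release one slot and allow transmission again.
 * Completion order is not modeled; the most recent frame is returned. */
struct mock_skb *mock_tx_complete(struct mock_dev *dev)
{
    struct mock_skb *done;

    if (dev->in_flight == 0)
        return NULL;
    done = dev->ring[--dev->in_flight];
    dev->ring[dev->in_flight] = NULL;
    dev->queue_stopped = false;     /* netif_wake_queue(dev) in a driver */
    return done;
}
```

The point of the model is ownership: on the return-1 path nothing frees the skb, so the caller can hand the same buffer back once the queue has been woken.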
> If it is not possible to use this scheme, then the driver's existing > logic simply needs to remove the dev_kfree_skb: > > if (no more tx buffers available) { > netif_stop_queue(dev); > return 1; > } > ...send 1 skb to hardware... > > If your driver supports fragmented skbs, then the latter logic is what > must be used. Thanks. -- Noah Romer Driver Developer, CM gopher and Linux Whipping Boy Storage Components Firmware LSI Logic Corp. From owner-netdev@oss.sgi.com Mon Jul 16 15:30:24 2001 Received: (from majordomo@localhost) by oss.sgi.com (8.11.2/8.11.3) id f6GMUOS01856 for netdev-outgoing; Mon, 16 Jul 2001 15:30:24 -0700 Received: from pizda.ninka.net (IDENT:root@pizda.ninka.net [216.101.162.242]) by oss.sgi.com (8.11.2/8.11.3) with SMTP id f6GMUNV01853 for ; Mon, 16 Jul 2001 15:30:23 -0700 Received: (from davem@localhost) by pizda.ninka.net (8.9.3/8.9.3) id PAA16168; Mon, 16 Jul 2001 15:30:22 -0700 From: "David S. Miller" MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Transfer-Encoding: 7bit Message-ID: <15187.27389.948344.370461@pizda.ninka.net> Date: Mon, 16 Jul 2001 15:30:21 -0700 (PDT) To: Noah Romer Cc: netdev , "Romer, Noah" Subject: Re: hard_start_xmit behavior In-Reply-To: <3B5355AB.C5D62090@lsil.com> References: <3B5355AB.C5D62090@lsil.com> X-Mailer: VM 6.75 under 21.1 (patch 13) "Crater Lake" XEmacs Lucid Sender: owner-netdev@oss.sgi.com Precedence: bulk Content-Length: 505 Lines: 18 Noah Romer writes: > Specifically, if the network stack gives you a packet to transmit and you > immediately know you're not going to be able to do so (in my case, when I'm > unable to get a message frame so I can talk with the board, a rare condition), > what's the expected handling of the skb and what does the network stack want to > see as the return value? If you cannot queue the packet then: netif_stop_queue(dev); return 1; And do nothing else. Later, David S. 
Miller davem@redhat.com From owner-netdev@oss.sgi.com Mon Jul 16 15:36:49 2001 Received: (from majordomo@localhost) by oss.sgi.com (8.11.2/8.11.3) id f6GManc02206 for netdev-outgoing; Mon, 16 Jul 2001 15:36:49 -0700 Received: from pizda.ninka.net (IDENT:root@pizda.ninka.net [216.101.162.242]) by oss.sgi.com (8.11.2/8.11.3) with SMTP id f6GMamV02203 for ; Mon, 16 Jul 2001 15:36:48 -0700 Received: (from davem@localhost) by pizda.ninka.net (8.9.3/8.9.3) id PAA16263; Mon, 16 Jul 2001 15:36:46 -0700 From: "David S. Miller" MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Transfer-Encoding: 7bit Message-ID: <15187.27774.725094.688515@pizda.ninka.net> Date: Mon, 16 Jul 2001 15:36:46 -0700 (PDT) To: Ben LaHaise Cc: Subject: Re: [PATCH] skb_{over,under}_panic optimizations In-Reply-To: References: X-Mailer: VM 6.75 under 21.1 (patch 13) "Crater Lake" XEmacs Lucid Sender: owner-netdev@oss.sgi.com Precedence: bulk Content-Length: 802 Lines: 24 Ben LaHaise writes: > Below is a patch that moves the skb_{over,under}_panic function calls in > skb_put and skb_push out of line on i386 and hopefully improves register > allocation. If this turns out to be useful, there are bunch of other > places we can move these kinds useful debugging assertions out of line. > Comments? I can't see how this can make all that much of a difference, as Andi has stated already. Especially on x86. I mean, GCC is going to generate a branch around the "call skb_foo_panic" code block, and that code block will likely sit in (at least mostly) it's own cache line. x86 code is so dense anyways, so this change can hardly make a difference. I'm willing to, and would happily be, proven wrong. Show me some numbers. Later, David S. 
Miller davem@redhat.com

From owner-netdev@oss.sgi.com Tue Jul 17 01:33:29 2001
Received: (from majordomo@localhost)
  by oss.sgi.com (8.11.2/8.11.3) id f6H8XTd12768
  for netdev-outgoing; Tue, 17 Jul 2001 01:33:29 -0700
Received: from stsl.siemens.com.tw (stsl.siemens.com.tw [192.72.45.189])
  by oss.sgi.com (8.11.2/8.11.3) with SMTP id f6H8XLV12764
  for ; Tue, 17 Jul 2001 01:33:22 -0700
Received: from stslex.siemens.com.tw (stslex [192.72.45.13])
  by stsl.siemens.com.tw (8.11.4/8.11.4) with ESMTP id f6H8kNX02864
  for ; Tue, 17 Jul 2001 16:46:23 +0800 (CST)
Received: by stslex.siemens.com.tw with Internet Mail Service (5.5.2650.21)
  id <3KMST67B>; Tue, 17 Jul 2001 16:31:23 +0800
Message-ID: <92C0C0AC8AE8D411864300105A835CBB5015E7@stslex.siemens.com.tw>
From: Moter Du
To: netdev@oss.sgi.com
Subject: illegal source address & source address selection
Date: Tue, 17 Jul 2001 16:30:55 +0800
MIME-Version: 1.0
X-Mailer: Internet Mail Service (5.5.2650.21)
Content-Type: multipart/alternative;
  boundary="----_=_NextPart_001_01C10E9A.CC45BF40"
Sender: owner-netdev@oss.sgi.com
Precedence: bulk
Content-Length: 1654
Lines: 60

This message is in MIME format. Since your mail reader does not understand
this format, some or all of this message may not be legible.

------_=_NextPart_001_01C10E9A.CC45BF40
Content-Type: text/plain; charset="BIG5"

Hi,

Current linux ipv6 source address selection doesn't check if an address is
illegal as specified in RFC2373 and the draft 'source address selection
(s.a.s.)';
- RFC2373: must not use multicast and anycast address as source.
- I-D 's.a.s.': must not use 6to4, etc., as source.

Is there already any patch to improve the checking? Thanks a lot.

Regards,
Moter

------_=_NextPart_001_01C10E9A.CC45BF40--

From owner-netdev@oss.sgi.com Tue Jul 17 02:23:07 2001
Received: (from majordomo@localhost)
  by oss.sgi.com (8.11.2/8.11.3) id f6H9N7c13669
  for netdev-outgoing; Tue, 17 Jul 2001 02:23:07 -0700
Received: from netcore.fi (netcore.fi [193.94.160.1])
  by oss.sgi.com (8.11.2/8.11.3) with SMTP id f6H9N4V13666
  for ; Tue, 17 Jul 2001 02:23:04 -0700
Received: from localhost (pekkas@localhost)
  by netcore.fi (8.11.1/8.11.1) with ESMTP id f6H9Mod26089;
  Tue, 17 Jul 2001 12:22:51 +0300
Date: Tue, 17 Jul 2001 12:22:50 +0300 (EEST)
From: Pekka Savola
To: Moter Du
cc:
Subject: Re: illegal source address & source address selection
In-Reply-To: <92C0C0AC8AE8D411864300105A835CBB5015E7@stslex.siemens.com.tw>
Message-ID:
MIME-Version: 1.0
Content-Type: TEXT/PLAIN; charset=US-ASCII
Sender: owner-netdev@oss.sgi.com
Precedence: bulk
Content-Length: 826
Lines: 23

On Tue, 17 Jul 2001, Moter Du wrote:
> Current linux ipv6 source address selection doesn't check if an address is
> illegal as specified in RFC2373 and the draft 'source address selection
> (s.a.s.)';
> - RFC2373: must not use multicast and anycast address as source.

If you configure a multicast address to an interface, unfortunately it is
used, yes.

> - I-D 's.a.s.': must not use 6to4, etc., as source.

6to4 is a legal source address.

> Is there already any patch to improve the checking? Thanks a lot.

Nope. Please submit one ;-). USAGI have made minor tweaks wrt. wrongly
configured multicast recently AFAIR.

--
Pekka Savola                 "Tell me of difficulties surmounted,
Netcore Oy                   not those you stumble over and fall"
Systems. Networks. Security.
-- Robert Jordan: A Crown of Swords

From owner-netdev@oss.sgi.com Tue Jul 17 13:49:46 2001
Received: (from majordomo@localhost)
  by oss.sgi.com (8.11.2/8.11.3) id f6HKnkK32050
  for netdev-outgoing; Tue, 17 Jul 2001 13:49:46 -0700
Received: from zero.aec.at (qmailr@zero.aec.at [195.3.98.22])
  by oss.sgi.com (8.11.2/8.11.3) with SMTP id f6HKniV32044
  for ; Tue, 17 Jul 2001 13:49:45 -0700
Received: (qmail 19443 invoked by uid 99); 17 Jul 2001 20:49:34 -0000
Received: from unknown (HELO fred.muc.de) (unknown)
  by unknown with SMTP; 17 Jul 2001 20:49:34 -0000
Received: by fred.muc.de (Postfix, from userid 500)
  id 79C1AE2D53; Tue, 17 Jul 2001 17:30:05 +0200 (CEST)
Date: Tue, 17 Jul 2001 17:30:05 +0200
From: Andi Kleen
To: Yann Dupont
Cc: netdev@oss.sgi.com
Subject: Re: dst cache overflow on 2.2.16 kernel.
Message-ID: <20010717173005.A23851@fred.local>
References: <3B52D73C.4060004@IPv6.univ-nantes.fr>
Mime-Version: 1.0
Content-Type: text/plain; charset=us-ascii
X-Mailer: Mutt 1.0.1i
In-Reply-To: <3B52D73C.4060004@IPv6.univ-nantes.fr>; from Yann.Dupont@IPv6.univ-nantes.fr on Mon, Jul 16, 2001 at 01:59:56PM +0200
Sender: owner-netdev@oss.sgi.com
Precedence: bulk
Content-Length: 693
Lines: 19

On Mon, Jul 16, 2001 at 01:59:56PM +0200, Yann Dupont wrote:
>
> Hello. We have a firewall here (Checkpoint FW 1), installed on a RH 6.2
>
> Every week or so, the FW logs this error : dst cache overflow
> and the routing stops.
>
> Is changing the value of /proc/sys/net/ipv4/route (actually set to 4096)
> a way to prevent this ? OR is this a kernel bug with this 2.2.16 release ?
>
> I CAN'T change the kernel, nor the distro, as the whole is under
> contract ... and validated for this exact combination :(

You can increase the /proc/sys/net/ipv4/route/gc_thresh sysctl to try to work
around it, but it's likely a bug in the FW-1 kernel module. I would talk to
Checkpoint.
-Andi

From owner-netdev@oss.sgi.com Tue Jul 17 18:19:04 2001
Received: (from majordomo@localhost)
  by oss.sgi.com (8.11.2/8.11.3) id f6I1J4S19337
  for netdev-outgoing; Tue, 17 Jul 2001 18:19:04 -0700
Received: from netbank.com.br (IDENT:postfix@garrincha.netbank.com.br [200.203.199.88])
  by oss.sgi.com (8.11.2/8.11.3) with SMTP id f6I1J2V19334
  for ; Tue, 17 Jul 2001 18:19:02 -0700
Received: from brinquedo.distro.conectiva (1-102.ctame701-2.telepar.net.br [200.181.138.102])
  by netbank.com.br (Postfix) with ESMTP id E5FE346807;
  Tue, 17 Jul 2001 22:18:10 -0300 (BRST)
Received: by brinquedo.distro.conectiva (Postfix, from userid 501)
  id 979FBCC25; Tue, 17 Jul 2001 22:18:57 -0300 (BRT)
Date: Tue, 17 Jul 2001 22:18:57 -0300
From: Arnaldo Carvalho de Melo
To: Steven Whitehouse
Cc: "David S.Miller" , netdev@oss.sgi.com
Subject: [PATCH + RFC] decnet fib s/rwlock/spinlock/g
Message-ID: <20010717221857.H10373@conectiva.com.br>
Mime-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
User-Agent: Mutt/1.3.17i
X-Url: http://advogato.org/person/acme
Sender: owner-netdev@oss.sgi.com
Precedence: bulk
Content-Length: 1737
Lines: 61

Hi,

Please take a look and see if this is worth applying. From a quick look,
the code only ever uses write_lock and never read_lock. Maybe the rwlock is
needed to protect some other list usage sections, but as far as I can see
we can just use plain spinlocks for this. The patch is against 2.4.5, but
it should apply to newer kernels, I think.
- Arnaldo Index: net/decnet/dn_fib.c =================================================================== RCS file: /home/cvs/kernel-acme/net/decnet/dn_fib.c,v retrieving revision 1.1.1.1 diff -u -r1.1.1.1 dn_fib.c --- net/decnet/dn_fib.c 2001/06/26 17:29:34 1.1.1.1 +++ net/decnet/dn_fib.c 2001/07/18 01:13:00 @@ -56,7 +56,7 @@ static struct dn_fib_info *dn_fib_info_list; -static rwlock_t dn_fib_info_lock = RW_LOCK_UNLOCKED; +static spinlock_t dn_fib_info_lock = SPIN_LOCK_UNLOCKED; int dn_fib_info_cnt; static struct @@ -96,7 +96,7 @@ void dn_fib_release_info(struct dn_fib_info *fi) { - write_lock(&dn_fib_info_lock); + spin_lock(&dn_fib_info_lock); if (fi && --fi->fib_treeref == 0) { if (fi->fib_next) fi->fib_next->fib_prev = fi->fib_prev; @@ -107,7 +107,7 @@ fi->fib_dead = 1; dn_fib_info_put(fi); } - write_unlock(&dn_fib_info_lock); + spin_unlock(&dn_fib_info_lock); } static __inline__ int dn_fib_nh_comp(const struct dn_fib_info *fi, const struct dn_fib_info *ofi) @@ -345,14 +345,14 @@ fi->fib_treeref++; atomic_inc(&fi->fib_clntref); - write_lock(&dn_fib_info_lock); + spin_lock(&dn_fib_info_lock); fi->fib_next = dn_fib_info_list; fi->fib_prev = NULL; if (dn_fib_info_list) dn_fib_info_list->fib_prev = fi; dn_fib_info_list = fi; dn_fib_info_cnt++; - write_unlock(&dn_fib_info_lock); + spin_unlock(&dn_fib_info_lock); return fi; err_inval: From owner-netdev@oss.sgi.com Tue Jul 17 21:00:47 2001 Received: (from majordomo@localhost) by oss.sgi.com (8.11.2/8.11.3) id f6I40li27091 for netdev-outgoing; Tue, 17 Jul 2001 21:00:47 -0700 Received: from pizda.ninka.net (IDENT:root@pizda.ninka.net [216.101.162.242]) by oss.sgi.com (8.11.2/8.11.3) with SMTP id f6I40cV27083 for ; Tue, 17 Jul 2001 21:00:38 -0700 Received: (from davem@localhost) by pizda.ninka.net (8.9.3/8.9.3) id UAA09533; Tue, 17 Jul 2001 20:58:32 -0700 From: "David S. 
Miller"
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Transfer-Encoding: 7bit
Message-ID: <15189.2408.59953.395204@pizda.ninka.net>
Date: Tue, 17 Jul 2001 20:58:32 -0700 (PDT)
To: Andrew Friedley
Cc: linux-kernel@vger.kernel.org, linux-net@vger.kernel.org,
  netdev@oss.sgi.com, mostrows@styx.uwaterloo.ca, prefect_@gmx.net,
  moritz@chaosdorf.de, egger@suse.de, srwalter@yahoo.com,
  kuznet@ms2.inr.ac.ru, rusty@rustcorp.com.au
Subject: [PATCH] PPPOE can kfree SKB twice (was Re: kernel panic problem. (smp, iptables?))
In-Reply-To: <005f01c10e69$28273e60$0200a8c0@loki>
References: <005f01c10e69$28273e60$0200a8c0@loki>
X-Mailer: VM 6.75 under 21.1 (patch 13) "Crater Lake" XEmacs Lucid
Sender: owner-netdev@oss.sgi.com
Precedence: bulk
Content-Length: 10756
Lines: 446

Andrew Friedley writes:
 > >>EIP; c01d6fc3 <=====
 > Trace; c01d706b
 > Trace; c01d7653
 > Trace; c01da0f7
 > Trace; c01ddfed

This report has been plaguing us for a month or two now, from
different people. But we hadn't been able to track it down.

Initially we believed it might be some obscure bug in netfilter
which got triggered more easily when the zerocopy stuff went in.
But all of our code audits turned up nothing.

Then I began to notice that "pppoe" was showing up in all the reports
where the user actually bothered to mention what net devices the
machine was using when it crashed.

I spent the past few days auditing the driver and I think I figured
out what was causing the OOPS (along with some other bugs I found
along the way).

I have no way to test out these changes, so can folks do that for me?
If I didn't screw something else up, then I'm pretty sure the OOPS
will go away. Let me know if something goes wrong due to these
changes.

Thanks.
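To make the failure mode concrete, here is a minimal user-space sketch of the double free, for illustration only. It is not the driver code: struct buf, buf_free(), queue_xmit(), xmit_buggy(), xmit_fixed() and frees_of_original() are hypothetical names. It assumes that the queueing function consumes the buffer it is given whether or not it reports an error, and that a return of 0 from the xmit routine tells the caller it still owns the original buffer.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for an skb: it only counts how often it was freed. */
struct buf {
    int frees;
};

static void buf_free(struct buf *b)
{
    assert(b != NULL);
    b->frees++;
}

/* Stand-in for the device queueing call. Assumption: it consumes the
 * buffer regardless of the value it returns. */
static int queue_xmit(struct buf *b, int fail)
{
    buf_free(b);
    return fail ? -1 : 0;
}

/* Old logic: the original was replaced by a copy and already freed, yet
 * a queueing error is still reported back as "not consumed" (0). */
static int xmit_buggy(struct buf *orig, struct buf *copy, int fail)
{
    buf_free(orig);                 /* original released after copying */
    if (queue_xmit(copy, fail) < 0)
        return 0;                   /* caller believes it still owns orig */
    return 1;
}

/* Fixed logic: once the original is gone, never report failure. */
static int xmit_fixed(struct buf *orig, struct buf *copy, int fail)
{
    buf_free(orig);
    (void)queue_xmit(copy, fail);   /* result deliberately ignored */
    return 1;
}

/* Caller contract: a return of 0 means the caller disposes of orig.
 * Returns how many times the original ended up being freed. */
static int frees_of_original(int (*xmit)(struct buf *, struct buf *, int),
                             int fail)
{
    struct buf orig = { 0 }, copy = { 0 };

    if (!xmit(&orig, &copy, fail))
        buf_free(&orig);
    return orig.frees;
}
```

With a failing queue, frees_of_original(xmit_buggy, 1) counts two frees of the original, while frees_of_original(xmit_fixed, 1) counts one.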
--- drivers/net/pppoe.c.~1~ Mon Jul 9 17:52:06 2001 +++ drivers/net/pppoe.c Tue Jul 17 20:46:24 2001 @@ -5,7 +5,7 @@ * PPPoE --- PPP over Ethernet (RFC 2516) * * - * Version: 0.6.6 + * Version: 0.6.7 * * 030700 : Fixed connect logic to allow for disconnect. * 270700 : Fixed potential SMP problems; we must protect against @@ -20,10 +20,16 @@ * 111100 : Fix recvmsg. * 050101 : Fix PADT procesing. * 140501 : Use pppoe_rcv_core to handle all backlog. (Alexey) + * 170701 : Do not lock_sock with rwlock held. (DaveM) + * Ignore discovery frames if user has socket + * locked. (DaveM) + * Ignore return value of dev_queue_xmit in __pppoe_xmit + * or else we may kfree an SKB twice. (DaveM) * * Author: Michal Ostrowski * Contributors: * Arnaldo Carvalho de Melo + * David S. Miller (davem@redhat.com) * * License: * This program is free software; you can redistribute it and/or @@ -100,10 +106,11 @@ static int hash_item(unsigned long sid, unsigned char *addr) { - char hash=0; - int i,j; - for (i = 0; i < ETH_ALEN ; ++i){ - for (j = 0; j < 8/PPPOE_HASH_BITS ; ++j){ + char hash = 0; + int i, j; + + for (i = 0; i < ETH_ALEN ; ++i) { + for (j = 0; j < 8/PPPOE_HASH_BITS ; ++j) { hash ^= addr[i] >> ( j * PPPOE_HASH_BITS ); } } @@ -188,7 +195,7 @@ read_lock_bh(&pppoe_hash_lock); po = __get_item(sid, addr); - if(po) + if (po) sock_hold(po->sk); read_unlock_bh(&pppoe_hash_lock); @@ -233,62 +240,83 @@ * Certain device events require that sockets be unconnected. 
* **************************************************************************/ + +static void pppoe_flush_dev(struct net_device *dev) +{ + int hash; + + if (dev == NULL) + BUG(); + + read_lock_bh(&pppoe_hash_lock); + for (hash = 0; hash < PPPOE_HASH_SIZE; hash++) { + struct pppox_opt *po = item_hash_table[hash]; + + while (po != NULL) { + if (po->pppoe_dev == dev) { + struct sock *sk = po->sk; + + sock_hold(sk); + po->pppoe_dev = NULL; + + /* We hold a reference to SK, now drop the + * hash table lock so that we may attempt + * to lock the socket (which can sleep). + */ + read_unlock_bh(&pppoe_hash_lock); + + lock_sock(sk); + + if (sk->state & (PPPOX_CONNECTED | PPPOX_BOUND)) { + pppox_unbind_sock(sk); + dev_put(dev); + sk->state = PPPOX_DEAD; + sk->state_change(sk); + } + + release_sock(sk); + + sock_put(sk); + + read_lock_bh(&pppoe_hash_lock); + + /* Now restart from the beginning of this + * hash chain. We always NULL out pppoe_dev + * so we are guarenteed to make forward + * progress. + */ + po = item_hash_table[hash]; + continue; + } + po = po->next; + } + } + read_unlock_bh(&pppoe_hash_lock); +} + static int pppoe_device_event(struct notifier_block *this, unsigned long event, void *ptr) { - int error = NOTIFY_DONE; struct net_device *dev = (struct net_device *) ptr; - struct pppox_opt *po = NULL; - int hash = 0; /* Only look at sockets that are using this specific device. */ switch (event) { case NETDEV_CHANGEMTU: - /* A change in mtu is a bad thing, requiring - * LCP re-negotiation. - */ + /* A change in mtu is a bad thing, requiring + * LCP re-negotiation. + */ + case NETDEV_GOING_DOWN: case NETDEV_DOWN: - /* Find every socket on this device and kill it. 
*/ - read_lock_bh(&pppoe_hash_lock); - - while (!po && hash < PPPOE_HASH_SIZE){ - po = item_hash_table[hash]; - ++hash; - } - - while (po && hash < PPPOE_HASH_SIZE){ - if(po->pppoe_dev == dev){ - lock_sock(po->sk); - if (po->sk->state & (PPPOX_CONNECTED|PPPOX_BOUND)){ - pppox_unbind_sock(po->sk); - - dev_put(po->pppoe_dev); - po->pppoe_dev = NULL; - - po->sk->state = PPPOX_DEAD; - po->sk->state_change(po->sk); - } - release_sock(po->sk); - } - if (po->next) { - po = po->next; - } else { - po = NULL; - while (!po && hash < PPPOE_HASH_SIZE){ - po = item_hash_table[hash]; - ++hash; - } - } - } - read_unlock_bh(&pppoe_hash_lock); + pppoe_flush_dev(dev); break; + default: break; }; - return error; + return NOTIFY_DONE; } @@ -304,40 +332,39 @@ * Do the real work of receiving a PPPoE Session frame. * ***********************************************************************/ -int pppoe_rcv_core(struct sock *sk, struct sk_buff *skb){ - struct pppox_opt *po=sk->protinfo.pppox; +int pppoe_rcv_core(struct sock *sk, struct sk_buff *skb) +{ + struct pppox_opt *po = sk->protinfo.pppox; struct pppox_opt *relay_po = NULL; if (sk->state & PPPOX_BOUND) { skb_pull(skb, sizeof(struct pppoe_hdr)); - ppp_input(&po->chan, skb); - } else if( sk->state & PPPOX_RELAY ){ - - relay_po = get_item_by_addr( &po->pppoe_relay ); + } else if (sk->state & PPPOX_RELAY) { + relay_po = get_item_by_addr(&po->pppoe_relay); - if( relay_po == NULL || - !( relay_po->sk->state & PPPOX_CONNECTED ) ){ - goto abort; - } + if (relay_po == NULL) + goto abort_kfree; + + if ((relay_po->sk->state & PPPOX_CONNECTED) == 0) + goto abort_put; skb_pull(skb, sizeof(struct pppoe_hdr)); - if( !__pppoe_xmit( relay_po->sk , skb) ){ - goto abort; - } + if (!__pppoe_xmit( relay_po->sk , skb)) + goto abort_put; } else { sock_queue_rcv_skb(sk, skb); } - return 1; -abort: - if(relay_po) - sock_put(relay_po->sk); - return 0; - -} + return NET_RX_SUCCESS; +abort_put: + sock_put(relay_po->sk); +abort_kfree: + kfree_skb(skb); + return 
NET_RX_DROP; +} /************************************************************************ * @@ -356,24 +383,25 @@ po = get_item((unsigned long) ph->sid, skb->mac.ethernet->h_source); - if(!po){ + if (!po) { kfree_skb(skb); - return 0; + return NET_RX_DROP; } sk = po->sk; bh_lock_sock(sk); /* Socket state is unknown, must put skb into backlog. */ - if( sk->lock.users != 0 ){ - sk_add_backlog( sk, skb); - ret = 1; - }else{ + if (sk->lock.users != 0) { + sk_add_backlog(sk, skb); + ret = NET_RX_SUCCESS; + } else { ret = pppoe_rcv_core(sk, skb); } bh_unlock_sock(sk); sock_put(sk); + return ret; } @@ -390,24 +418,31 @@ { struct pppoe_hdr *ph = (struct pppoe_hdr *) skb->nh.raw; struct pppox_opt *po; - struct sock *sk = NULL; if (ph->code != PADT_CODE) goto abort; po = get_item((unsigned long) ph->sid, skb->mac.ethernet->h_source); + if (po) { + struct sock *sk = po->sk; - if (!po) - goto abort; + bh_lock_sock(sk); - sk = po->sk; + /* If the user has locked the socket, just ignore + * the packet. With the way two rcv protocols hook into + * one socket family type, we cannot (easily) distinguish + * what kind of SKB it is during backlog rcv. + */ + if (sk->lock.users == 0) + pppox_unbind_sock(sk); - pppox_unbind_sock(sk); + bh_unlock_sock(sk); + sock_put(sk); + } - sock_put(sk); - abort: +abort: kfree_skb(skb); - return 0; + return NET_RX_SUCCESS; /* Lies... :-) */ } struct packet_type pppoes_ptype = { @@ -524,7 +559,7 @@ struct sock *sk = sock->sk; struct net_device *dev = NULL; struct sockaddr_pppox *sp = (struct sockaddr_pppox *) uservaddr; - struct pppox_opt *po=sk->protinfo.pppox; + struct pppox_opt *po = sk->protinfo.pppox; int error; lock_sock(sk); @@ -569,8 +604,9 @@ po->pppoe_dev = dev; - if( ! 
(dev->flags & IFF_UP) ) + if (!(dev->flags & IFF_UP)) goto err_put; + memcpy(&po->pppoe_pa, &sp->sa_addr.pppoe, sizeof(struct pppoe_addr)); @@ -687,7 +723,7 @@ /* PPPoE address from the user specifies an outbound PPPoE address to which frames are forwarded to */ err = -EFAULT; - if( copy_from_user(&po->pppoe_relay, + if (copy_from_user(&po->pppoe_relay, (void*)arg, sizeof(struct sockaddr_pppox))) break; @@ -752,7 +788,7 @@ dev = sk->protinfo.pppox->pppoe_dev; error = -EMSGSIZE; - if(total_len > dev->mtu+dev->hard_header_len) + if (total_len > (dev->mtu + dev->hard_header_len)) goto end; @@ -775,7 +811,7 @@ ph = (struct pppoe_hdr *) skb_put(skb, total_len + sizeof(struct pppoe_hdr)); start = (char *) &ph->tag[0]; - error = memcpy_fromiovec( start, m->msg_iov, total_len); + error = memcpy_fromiovec(start, m->msg_iov, total_len); if (error < 0) { kfree_skb(skb); @@ -793,7 +829,7 @@ dev_queue_xmit(skb); - end: +end: release_sock(sk); return error; } @@ -811,9 +847,8 @@ int headroom = skb_headroom(skb); int data_len = skb->len; - if (sk->dead || !(sk->state & PPPOX_CONNECTED)) { + if (sk->dead || !(sk->state & PPPOX_CONNECTED)) goto abort; - } hdr.ver = 1; hdr.type = 1; @@ -821,9 +856,8 @@ hdr.sid = sk->num; hdr.length = htons(skb->len); - if (!dev) { + if (!dev) goto abort; - } /* Copy the skb if there is no space for the header. */ if (headroom < (sizeof(struct pppoe_hdr) + dev->hard_header_len)) { @@ -844,6 +878,12 @@ skb = skb2; } + /* We may not return error beyond this point. Potentially this + * is a new SKB and in such a case we've already freed the + * original SKB. So if we were to return error, our caller would + * double free that original SKB. 
-DaveM + */ + ph = (struct pppoe_hdr *) skb_push(skb, sizeof(struct pppoe_hdr)); memcpy(ph, &hdr, sizeof(struct pppoe_hdr)); skb->protocol = __constant_htons(ETH_P_PPP_SES); @@ -856,11 +896,10 @@ sk->protinfo.pppox->pppoe_pa.remote, NULL, data_len); - if (dev_queue_xmit(skb) < 0) - goto abort; - + dev_queue_xmit(skb); return 1; - abort: + +abort: return 0; } From owner-netdev@oss.sgi.com Wed Jul 18 01:11:05 2001 Received: (from majordomo@localhost) by oss.sgi.com (8.11.2/8.11.3) id f6I8B5P11285 for netdev-outgoing; Wed, 18 Jul 2001 01:11:05 -0700 Received: from gw.chygwyn.com (IDENT:root@gw.chygwyn.com [62.172.158.50]) by oss.sgi.com (8.11.2/8.11.3) with SMTP id f6I8B2V11267 for ; Wed, 18 Jul 2001 01:11:02 -0700 Received: (from steve@localhost) by gw.chygwyn.com (8.9.3/8.9.3) id JAA13666; Wed, 18 Jul 2001 09:20:47 +0100 From: Steven Whitehouse Message-Id: <200107180820.JAA13666@gw.chygwyn.com> Subject: Re: [PATCH + RFC] decnet fib s/rwlock/spinlock/g To: acme@conectiva.com.br (Arnaldo Carvalho de Melo) Date: Wed, 18 Jul 2001 09:20:47 +0100 (BST) Cc: davem@redhat.com (David S.Miller), netdev@oss.sgi.com In-Reply-To: <20010717221857.H10373@conectiva.com.br> from "Arnaldo Carvalho de Melo" at Jul 17, 2001 10:18:57 PM Organization: ChyGywn Limited X-RegisteredOffice: 7, New Yatt Road, Witney, Oxfordshire. OX28 1NU England X-RegisteredNumber: 03887683 Reply-To: Steve Whitehouse X-Mailer: ELM [version 2.5 PL1] MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Transfer-Encoding: 7bit Sender: owner-netdev@oss.sgi.com Precedence: bulk Content-Length: 2108 Lines: 73 Hi, I think thats probably Ok. I'm sure it landed up with a write_lock() to match the IP functions, but I don't think DECnet has a use for a version of ip_fib_check_default() so there is no reason not to make it a normal spinlock, Steve. 
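As a rough illustration of why a plain spinlock is enough when every user of the lock is a writer, here is a user-space sketch of the same kind of list manipulation. It is a model under stated assumptions, not the DECnet code: struct info, info_add(), info_del(), list_lock_acquire() and list_lock_release() are hypothetical, and the lock is a toy built on C11 atomic_flag rather than the kernel's spinlock_t.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical list node, loosely modelled on a doubly linked info list. */
struct info {
    struct info *next, *prev;
};

static atomic_flag list_lock = ATOMIC_FLAG_INIT;  /* toy spinlock */
static struct info *info_list;                    /* list head */
static int info_cnt;                              /* number of entries */

static void list_lock_acquire(void)
{
    /* Spin until the flag was previously clear. */
    while (atomic_flag_test_and_set_explicit(&list_lock,
                                             memory_order_acquire))
        ;
}

static void list_lock_release(void)
{
    atomic_flag_clear_explicit(&list_lock, memory_order_release);
}

/* Insert at the head. Every path that touches the list modifies it, so
 * an exclusive lock loses nothing compared with a reader/writer lock. */
static void info_add(struct info *fi)
{
    list_lock_acquire();
    fi->next = info_list;
    fi->prev = NULL;
    if (info_list)
        info_list->prev = fi;
    info_list = fi;
    info_cnt++;
    list_lock_release();
}

/* Unlink an entry that is currently on the list. */
static void info_del(struct info *fi)
{
    list_lock_acquire();
    if (fi->next)
        fi->next->prev = fi->prev;
    if (fi->prev)
        fi->prev->next = fi->next;
    if (info_list == fi)
        info_list = fi->next;
    info_cnt--;
    list_lock_release();
}
```

A reader/writer lock only pays off when some paths take the lock for reading; in this model, as in the patch under discussion, there are none.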
> > Hi, > > Please take a look if this is worth applying, from a quick look it > just uses write_lock and no read_lock, maybe its needed to protect some > other list usage sections, but as it seems we can just use plain spinlocks > for this. patch against 2.4.5, but it should apply to newer kernels, I > think. > > - Arnaldo > > Index: net/decnet/dn_fib.c > =================================================================== > RCS file: /home/cvs/kernel-acme/net/decnet/dn_fib.c,v > retrieving revision 1.1.1.1 > diff -u -r1.1.1.1 dn_fib.c > --- net/decnet/dn_fib.c 2001/06/26 17:29:34 1.1.1.1 > +++ net/decnet/dn_fib.c 2001/07/18 01:13:00 > @@ -56,7 +56,7 @@ > > > static struct dn_fib_info *dn_fib_info_list; > -static rwlock_t dn_fib_info_lock = RW_LOCK_UNLOCKED; > +static spinlock_t dn_fib_info_lock = SPIN_LOCK_UNLOCKED; > int dn_fib_info_cnt; > > static struct > @@ -96,7 +96,7 @@ > > void dn_fib_release_info(struct dn_fib_info *fi) > { > - write_lock(&dn_fib_info_lock); > + spin_lock(&dn_fib_info_lock); > if (fi && --fi->fib_treeref == 0) { > if (fi->fib_next) > fi->fib_next->fib_prev = fi->fib_prev; > @@ -107,7 +107,7 @@ > fi->fib_dead = 1; > dn_fib_info_put(fi); > } > - write_unlock(&dn_fib_info_lock); > + spin_unlock(&dn_fib_info_lock); > } > > static __inline__ int dn_fib_nh_comp(const struct dn_fib_info *fi, const struct dn_fib_info *ofi) > @@ -345,14 +345,14 @@ > > fi->fib_treeref++; > atomic_inc(&fi->fib_clntref); > - write_lock(&dn_fib_info_lock); > + spin_lock(&dn_fib_info_lock); > fi->fib_next = dn_fib_info_list; > fi->fib_prev = NULL; > if (dn_fib_info_list) > dn_fib_info_list->fib_prev = fi; > dn_fib_info_list = fi; > dn_fib_info_cnt++; > - write_unlock(&dn_fib_info_lock); > + spin_unlock(&dn_fib_info_lock); > return fi; > > err_inval: > From owner-netdev@oss.sgi.com Wed Jul 18 07:23:59 2001 Received: (from majordomo@localhost) by oss.sgi.com (8.11.2/8.11.3) id f6IENxX10864 for netdev-outgoing; Wed, 18 Jul 2001 07:23:59 -0700 Received: from 
igw3.watson.ibm.com (igw3.watson.ibm.com [198.81.209.18])
  by oss.sgi.com (8.11.2/8.11.3) with SMTP id f6IENwV10860
  for ; Wed, 18 Jul 2001 07:23:58 -0700
Received: from sp1n189at0.watson.ibm.com (sp1n189at0.watson.ibm.com [9.2.104.62])
  by igw3.watson.ibm.com (8.11.4/8.11.4) with ESMTP id f6IENC710820;
  Wed, 18 Jul 2001 10:23:12 -0400
Received: from kitch0.watson.ibm.com (kitch0.watson.ibm.com [9.2.251.57])
  by sp1n189at0.watson.ibm.com (8.9.3/Feb-20-98) with ESMTP id KAA19750;
  Wed, 18 Jul 2001 10:23:11 -0400
Received: from slug.watson.ibm.com (slug.watson.ibm.com [9.2.245.153])
  by kitch0.watson.ibm.com (AIX4.3/8.9.3/8.9.3/01-10-2000) with ESMTP id KAA74194;
  Wed, 18 Jul 2001 10:23:08 -0400
Reply-To: mostrows@speakeasy.net
To: "David S. Miller"
Cc: Andrew Friedley , linux-kernel@vger.kernel.org,
  linux-net@vger.kernel.org, netdev@oss.sgi.com, prefect_@gmx.net,
  moritz@chaosdorf.de, egger@suse.de, srwalter@yahoo.com,
  kuznet@ms2.inr.ac.ru, rusty@rustcorp.com.au
Subject: Re: [PATCH] PPPOE can kfree SKB twice (was Re: kernel panic problem. (smp, iptables?))
References: <005f01c10e69$28273e60$0200a8c0@loki> <15189.2408.59953.395204@pizda.ninka.net>
From: Michal Ostrowski
Date: 18 Jul 2001 10:23:36 -0400
In-Reply-To: <15189.2408.59953.395204@pizda.ninka.net>
Message-ID:
User-Agent: Gnus/5.0808 (Gnus v5.8.8) XEmacs/21.4 (Copyleft)
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Sender: owner-netdev@oss.sgi.com
Precedence: bulk
Content-Length: 1964
Lines: 46

"David S. Miller" writes:

>
> This report has been plaguing us for a month or two now, from
> different people. But we hadn't been able to track it down.
>
> Initially we believed it might be some obscure bug in netfilter
> which got triggered more easily when the zerocopy stuff went in.
> But all of our code audits turned up nothing.

This brings up an interesting point.
One of the initial issues in developing PPPoE support was how to
ensure that the layers passing packets to PPPoE allocated the correct
amount of header space in the skb (so as to avoid a copy of the skb to
create space for PPPoE headers).

This issue was resolved by having the PPP layer respect the header
lengths specified by the underlying transport layers (PPPoE) when
defining dev->hard_header_len. However, just to be on the safe side,
the code that copied the packet was left in place.

My guess is that before zerocopy netfilter, netfilter made an skb that
conformed to the header requirements of PPPoE. Once netfilter stopped
making copies, PPPoE was passed skbs without space for PPPoE headers
and thus invoked the code path you've just fixed.

Is it possible for netfilter to pass to the PPP device a packet that
respects the PPP device's hard_header_len? (This would avoid the copy
in PPPoE.)

More generally, I'm concerned (without having seen the code) that we
may have problems when passing skbs between devices via
zerocopy-netfilter when those devices have varying hard_header_len
requirements.

>
> I have no way to test out these changes, so can folks do that for me?
> If I didn't screw something else up, then I'm pretty sure the OOPS
> will go away. Let me know if something goes wrong due to these
> changes.
>

As far as I can tell it all seems fine. (I'd like to hear some
success stories, though, from some of the people who use netfilter
heavily with this and have experienced the oopses.)
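The copy being discussed can be pictured with a small user-space model. This is only a sketch under stated assumptions: struct pkt, pkt_init(), pkt_headroom() and pkt_ensure_headroom() are hypothetical stand-ins for an skb and its helpers, not kernel API. It shows the check made before a header is pushed in front of the payload: if the space in front of the payload is too small, the payload is copied into a new buffer with enough room reserved.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical packet buffer: payload sits at `data`, somewhere after `head`. */
struct pkt {
    unsigned char *head;  /* start of the allocation */
    unsigned char *data;  /* start of the payload */
    size_t len;           /* payload length in bytes */
};

/* Bytes available in front of the payload for prepending headers. */
static size_t pkt_headroom(const struct pkt *p)
{
    return (size_t)(p->data - p->head);
}

/* Allocate a buffer with `headroom` bytes reserved in front of a copy of
 * `len` payload bytes from `src`. Returns 0 on success, -1 on failure. */
static int pkt_init(struct pkt *p, size_t headroom, const void *src, size_t len)
{
    p->head = malloc(headroom + len);
    if (!p->head)
        return -1;
    p->data = p->head + headroom;
    p->len = len;
    memcpy(p->data, src, len);
    return 0;
}

/* Make sure at least `need` bytes of headroom exist. Returns 0 if the
 * buffer already had enough, 1 if the payload had to be copied into a
 * larger buffer, -1 if that allocation failed (buffer left untouched). */
static int pkt_ensure_headroom(struct pkt *p, size_t need)
{
    struct pkt bigger;

    if (pkt_headroom(p) >= need)
        return 0;
    if (pkt_init(&bigger, need, p->data, p->len) < 0)
        return -1;
    free(p->head);      /* the old buffer is replaced by the copy */
    *p = bigger;
    return 1;
}
```

When the sender already reserved enough space, pkt_ensure_headroom() returns 0 and no copy is made, which is the case the hard_header_len arrangement described above is meant to guarantee.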
Michal Ostrowski
mostrows@speakeasy.net

From owner-netdev@oss.sgi.com Wed Jul 18 18:40:24 2001
Received: (from majordomo@localhost)
  by oss.sgi.com (8.11.2/8.11.3) id f6J1eOS22743
  for netdev-outgoing; Wed, 18 Jul 2001 18:40:24 -0700
Received: from grok.yi.org (IDENT:root@cx97923-a.phnx3.az.home.com [24.9.112.194])
  by oss.sgi.com (8.11.2/8.11.3) with SMTP id f6J1eNV22740
  for ; Wed, 18 Jul 2001 18:40:23 -0700
Received: from candelatech.com (IDENT:greear@localhost.localdomain [127.0.0.1])
  by grok.yi.org (8.11.2/8.11.2) with ESMTP id f6J1eJ918779;
  Wed, 18 Jul 2001 18:40:20 -0700
Message-ID: <3B563A83.3D4A7773@candelatech.com>
Date: Wed, 18 Jul 2001 18:40:19 -0700
From: Ben Greear
Organization: Candela Technologies
X-Mailer: Mozilla 4.76 [en] (X11; U; Linux 2.4.2-2 i686)
X-Accept-Language: en
MIME-Version: 1.0
To: eepro list , "netdev@oss.sgi.com"
Subject: PCMCIA adapter suggestions (need good performance & stability)
Content-Type: text/plain; charset=us-ascii
Content-Transfer-Encoding: 7bit
Sender: owner-netdev@oss.sgi.com
Precedence: bulk
Content-Length: 609
Lines: 15

Does anyone have any suggestions for a good PCMCIA 10/100 full-duplex
adapter for the Linux 2.4 kernel? I need to fit two into a machine, so
they have to be the dongle kind....

I bought two cheap D-LINK ones, but they seem to crap out after
about 5 minutes of continuous 1.54Mbps traffic.

I remember seeing some Netgear ones too, anyone have any experience
with them?
Thanks for your time,
Ben

--
Ben Greear      President of Candela Technologies Inc      http://www.candelatech.com
ScryMUD: http://scry.wanfear.com      http://scry.wanfear.com/~greear

From owner-netdev@oss.sgi.com Thu Jul 19 05:31:11 2001
Received: (from majordomo@localhost)
  by oss.sgi.com (8.11.2/8.11.3) id f6JCVB200665
  for netdev-outgoing; Thu, 19 Jul 2001 05:31:11 -0700
Received: from igw3.watson.ibm.com (igw3.watson.ibm.com [198.81.209.18])
  by oss.sgi.com (8.11.2/8.11.3) with SMTP id f6JCV7V00658
  for ; Thu, 19 Jul 2001 05:31:07 -0700
Received: from sp1n189at0.watson.ibm.com (sp1n189at0.watson.ibm.com [9.2.104.62])
  by igw3.watson.ibm.com (8.11.4/8.11.4) with ESMTP id f6JCUB708636;
  Thu, 19 Jul 2001 08:30:11 -0400
Received: from kitch0.watson.ibm.com (kitch0.watson.ibm.com [9.2.251.57])
  by sp1n189at0.watson.ibm.com (8.9.3/Feb-20-98) with ESMTP id IAA19322;
  Thu, 19 Jul 2001 08:30:11 -0400
Received: from slug.watson.ibm.com (slug.watson.ibm.com [9.2.244.235])
  by kitch0.watson.ibm.com (AIX4.3/8.9.3/8.9.3/01-10-2000) with ESMTP id IAA80240;
  Thu, 19 Jul 2001 08:30:08 -0400
Reply-To: mostrows@speakeasy.net
To: "David S. Miller"
Cc: Andrew Friedley , linux-kernel@vger.kernel.org,
  linux-net@vger.kernel.org, netdev@oss.sgi.com, prefect_@gmx.net,
  moritz@chaosdorf.de, egger@suse.de, srwalter@yahoo.com,
  kuznet@ms2.inr.ac.ru, rusty@rustcorp.com.au
Subject: Re: [PATCH] PPPOE can kfree SKB twice (was Re: kernel panic problem. (smp, iptables?))
References: <005f01c10e69$28273e60$0200a8c0@loki> <15189.2408.59953.395204@pizda.ninka.net>
From: Michal Ostrowski
Date: 19 Jul 2001 08:30:37 -0400
In-Reply-To: <15189.2408.59953.395204@pizda.ninka.net>
Message-ID:
User-Agent: Gnus/5.0808 (Gnus v5.8.8) XEmacs/21.4 (Copyleft)
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Sender: owner-netdev@oss.sgi.com
Precedence: bulk
Content-Length: 3893
Lines: 131

After sleeping on it a bit I've come to the realization that there are
still some issues outstanding.
Essentially, if __pppoe_xmit has been forced to make a copy of the skb (because netfilter did not leave enough room for PPPoE headers), and dev_queue_xmit fails, the copy of the skb is not freed and we have a memory leak. The generic PPP layer assumes that if a PPP-channel's xmit routine fails then the skb is still available to it (for retransmission) and otherwise the skb is gone -- handled by the channel. These are the semantics that must be implemented by __pppoe_xmit. The patch below (which requires David Miller's patch from 17/07/01) implements these semantics. Michal Ostrowski mostrows@speakeasy.net --- drivers/net/pppoe.c~ Wed Jul 18 10:24:25 2001 +++ drivers/net/pppoe.c Thu Jul 19 08:28:56 2001 @@ -5,7 +5,7 @@ * PPPoE --- PPP over Ethernet (RFC 2516) * * - * Version: 0.6.7 + * Version: 0.6.8 * * 030700 : Fixed connect logic to allow for disconnect. * 270700 : Fixed potential SMP problems; we must protect against @@ -25,8 +25,12 @@ * locked. (DaveM) * Ignore return value of dev_queue_xmit in __pppoe_xmit * or else we may kfree an SKB twice. (DaveM) + * 190701 : When doing copies of skb's in __pppoe_xmit, always delete + * the original skb that was passed in on success, never on + * failure. Delete the copy of the skb on failure to avoid + * a memory leak. * - * Author: Michal Ostrowski + * Author: Michal Ostrowski * Contributors: * Arnaldo Carvalho de Melo * David S. Miller (davem@redhat.com) @@ -849,6 +853,7 @@ struct pppoe_hdr *ph; int headroom = skb_headroom(skb); int data_len = skb->len; + struct sk_buff *old_skb = NULL; if (sk->dead || !(sk->state & PPPOX_CONNECTED)) goto abort; @@ -876,17 +881,12 @@ skb_reserve(skb2, dev->hard_header_len + sizeof(struct pppoe_hdr)); memcpy(skb_put(skb2, skb->len), skb->data, skb->len); - skb_unlink(skb); - kfree_skb(skb); + /* Keep a reference to the original */ + old_skb = skb; + skb = skb2; } - /* We may not return error beyond this point. 
Potentially this - * is a new SKB and in such a case we've already freed the - * original SKB. So if we were to return error, our caller would - * double free that original SKB. -DaveM - */ - ph = (struct pppoe_hdr *) skb_push(skb, sizeof(struct pppoe_hdr)); memcpy(ph, &hdr, sizeof(struct pppoe_hdr)); skb->protocol = __constant_htons(ETH_P_PPP_SES); @@ -899,7 +899,34 @@ sk->protinfo.pppox->pppoe_pa.remote, NULL, data_len); - dev_queue_xmit(skb); + /* The skb we are to transmit may be a copy (see above). If + * this fails, then the caller is responsible for the original + * skb, otherwise we must free it. Also if this fails we must + * free the copy that we made. + */ + + if (dev_queue_xmit(skb)<0) { + if (old_skb) { + /* The skb we tried to send was a copy. We + * have to free it (the copy) and let the + * caller deal with the original one. + */ + skb_unlink(skb); + kfree_skb(skb); + } + goto abort; + } + + /* Free original skb now that we know dev_queue_xmit + * succeeded. We free only "old_skb" because dev_queue_xmit + * actually sent a copy, not the original. 
+ */ + + if (old_skb) { + skb_unlink(old_skb); + kfree_skb(old_skb); + } + return 1; abort: @@ -1049,7 +1076,6 @@ int err = register_pppox_proto(PX_PROTO_OE, &pppoe_proto); if (err == 0) { - printk(KERN_INFO "Registered PPPoE v0.6.5\n"); dev_add_pack(&pppoes_ptype); dev_add_pack(&pppoed_ptype); --- drivers/net/pppox.c~ Tue Feb 13 16:15:05 2001 +++ drivers/net/pppox.c Wed Jul 18 10:27:25 2001 @@ -148,10 +148,6 @@ err = sock_register(&pppox_proto_family); - if (err == 0) { - printk(KERN_INFO "Registered PPPoX v0.5\n"); - } - return err; } From owner-netdev@oss.sgi.com Thu Jul 19 10:28:16 2001 Received: (from majordomo@localhost) by oss.sgi.com (8.11.2/8.11.3) id f6JHSGa15665 for netdev-outgoing; Thu, 19 Jul 2001 10:28:16 -0700 Received: from ms2.inr.ac.ru (minus.inr.ac.ru [193.233.7.97]) by oss.sgi.com (8.11.2/8.11.3) with SMTP id f6JHRrV15560 for ; Thu, 19 Jul 2001 10:27:53 -0700 Received: (from kuznet@localhost) by ms2.inr.ac.ru (8.6.13/ANK) id VAA30738; Thu, 19 Jul 2001 21:27:05 +0400 From: kuznet@ms2.inr.ac.ru Message-Id: <200107191727.VAA30738@ms2.inr.ac.ru> Subject: Re: [PATCH] PPPOE can kfree SKB twice (was Re: kernel panic problem. (smp, iptables?)) To: mostrows@speakeasy.net Date: Thu, 19 Jul 2001 21:27:05 +0400 (MSK DST) Cc: davem@redhat.com, saai@swbell.net, linux-kernel@vger.kernel.org, linux-net@vger.kernel.org, netdev@oss.sgi.com, prefect_@gmx.net, moritz@chaosdorf.de, egger@suse.de, srwalter@yahoo.com, rusty@rustcorp.com.au In-Reply-To: from "Michal Ostrowski" at Jul 19, 1 08:30:37 am X-Mailer: ELM [version 2.4 PL24] MIME-Version: 1.0 Sender: owner-netdev@oss.sgi.com Precedence: bulk Content-Length: 830 Lines: 31 Hello! SOme short comment on the patch: > - dev_queue_xmit(skb); > + /* The skb we are to transmit may be a copy (see above). If > + * this fails, then the caller is responsible for the original > + * skb, otherwise we must free it. Also if this fails we must > + * free the copy that we made. 
> + */ > + > + if (dev_queue_xmit(skb)<0) { dev_queue_xmit _frees_ frame, not depending on return value. Return value is not a criterium to assume anything. > + if (old_skb) { > + /* The skb we tried to send was a copy. We > + * have to free it (the copy) and let the > + * caller deal with the original one. > + */ > + skb_unlink(skb); So, do you pass to dev_queue_xmit some skb, which is on some list? Not a good idea. Please, clone it and submit clone, if you need to hold original in some list. Alexey From owner-netdev@oss.sgi.com Thu Jul 19 11:00:40 2001 Received: (from majordomo@localhost) by oss.sgi.com (8.11.2/8.11.3) id f6JI0eb25177 for netdev-outgoing; Thu, 19 Jul 2001 11:00:40 -0700 Received: from igw3.watson.ibm.com (igw3.watson.ibm.com [198.81.209.18]) by oss.sgi.com (8.11.2/8.11.3) with SMTP id f6JI0cV25166 for ; Thu, 19 Jul 2001 11:00:38 -0700 Received: from sp1n189at0.watson.ibm.com (sp1n189at0.watson.ibm.com [9.2.104.62]) by igw3.watson.ibm.com (8.11.4/8.11.4) with ESMTP id f6JI0W709158; Thu, 19 Jul 2001 14:00:32 -0400 Received: from kitch0.watson.ibm.com (kitch0.watson.ibm.com [9.2.251.57]) by sp1n189at0.watson.ibm.com (8.9.3/Feb-20-98) with ESMTP id OAA10448; Thu, 19 Jul 2001 14:00:32 -0400 Received: from slug.watson.ibm.com (slug.watson.ibm.com [9.2.244.235]) by kitch0.watson.ibm.com (AIX4.3/8.9.3/8.9.3/01-10-2000) with ESMTP id OAA53926; Thu, 19 Jul 2001 14:00:30 -0400 Reply-To: mostrows@speakeasy.net To: kuznet@ms2.inr.ac.ru Cc: davem@redhat.com, linux-kernel@vger.kernel.org, linux-net@vger.kernel.org, netdev@oss.sgi.com Subject: Re: [PATCH] PPPOE can kfree SKB twice (was Re: kernel panic problem. 
(smp, iptables?)) References: <200107191727.VAA30738@ms2.inr.ac.ru> From: Michal Ostrowski Date: 19 Jul 2001 14:00:54 -0400 In-Reply-To: <200107191727.VAA30738@ms2.inr.ac.ru> Message-ID: User-Agent: Gnus/5.0808 (Gnus v5.8.8) XEmacs/21.4 (Copyleft) MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Sender: owner-netdev@oss.sgi.com Precedence: bulk Content-Length: 1378 Lines: 39 kuznet@ms2.inr.ac.ru writes: > Hello! > > SOme short comment on the patch: > > > > - dev_queue_xmit(skb); > > + /* The skb we are to transmit may be a copy (see above). If > > + * this fails, then the caller is responsible for the original > > + * skb, otherwise we must free it. Also if this fails we must > > + * free the copy that we made. > > + */ > > + > > + if (dev_queue_xmit(skb)<0) { > > dev_queue_xmit _frees_ frame, not depending on return value. > Return value is not a criterium to assume anything. > My mistake. It seemed perfectly reasonable at 6:00 am. :-) However, could we not have dev_queue_xmit behave as such (not free frame on failure)? That is, could we extend dev_queue_xmit to tell it (optionally) that we want the skb back in case of failure? As it stands, dev_queue_xmit unconditionally frees the skb in any failure mode. I would venture to say that we could do this. The reason why I'm proposing this is that ppp_generic.c assumes that the skb is still available after a transmission failure via pppoe. To support the semantics of dev_queue_xmit and ppp_generic we would have to always copy skb's inside pppoe_xmit. Then, if dev_queue_xmit fails the original is deleted. In the common case dev_queue_xmit will not fail, and so in that case I'd like to avoid making a copy of the skb.
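The ownership question in this exchange can be sketched in userspace. This is only a toy model under stated assumptions, not kernel code: `toy_buf`, `toy_data`, `toy_clone`, `toy_queue_xmit` and `toy_channel_xmit` are hypothetical names invented for illustration. The sketch assumes a lower layer that consumes whatever buffer it is handed regardless of its return value (the rule Alexey states for dev_queue_xmit), and a caller that wants to keep the original on failure (what ppp_generic expects). It follows Alexey's earlier suggestion to submit a clone: the clone is a second header sharing the same reference-counted data. The 0-on-failure return value is an assumption; the patches in this thread only show `return 1` on success.

```c
#include <assert.h>
#include <stdlib.h>

/* Toy stand-ins; these are NOT struct sk_buff or any kernel type. */
struct toy_data { int refcnt; };             /* shared payload */
struct toy_buf  { struct toy_data *data; };  /* per-owner header */

static int toy_live_bufs;   /* headers currently allocated */
static int toy_live_data;   /* payloads currently allocated */

static struct toy_buf *toy_alloc(void)
{
    struct toy_buf *b = malloc(sizeof(*b));
    struct toy_data *d = malloc(sizeof(*d));

    if (!b || !d) {
        free(b);
        free(d);
        return NULL;
    }
    d->refcnt = 1;
    b->data = d;
    toy_live_bufs++;
    toy_live_data++;
    return b;
}

/* New header, same payload: the payload is shared, not copied. */
static struct toy_buf *toy_clone(const struct toy_buf *orig)
{
    struct toy_buf *c = malloc(sizeof(*c));

    if (!c)
        return NULL;
    c->data = orig->data;
    c->data->refcnt++;
    toy_live_bufs++;
    return c;
}

/* Drop one header; the payload goes away with its last header. */
static void toy_free(struct toy_buf *b)
{
    if (--b->data->refcnt == 0) {
        free(b->data);
        toy_live_data--;
    }
    free(b);
    toy_live_bufs--;
}

/* Lower layer: consumes the buffer on success and on failure alike. */
static int toy_queue_xmit(struct toy_buf *b, int fail)
{
    toy_free(b);
    return fail ? -1 : 0;
}

/* Channel xmit: returns 1 if the original was consumed, 0 if the
 * caller still owns it (assumed convention for this sketch). */
static int toy_channel_xmit(struct toy_buf *orig, int fail)
{
    struct toy_buf *c = toy_clone(orig);

    if (!c)
        return 0;               /* nothing sent, caller keeps orig */
    if (toy_queue_xmit(c, fail) < 0)
        return 0;               /* clone is gone, orig is untouched */
    toy_free(orig);             /* success: we now dispose of orig */
    return 1;
}
```

In this model the failure path never frees anything twice and never leaks: the lower layer always disposes of the clone, and the original is freed by exactly one party depending on the return value.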
Michal Ostrowski mostrows@speakeasy.net From owner-netdev@oss.sgi.com Thu Jul 19 11:39:35 2001 Received: (from majordomo@localhost) by oss.sgi.com (8.11.2/8.11.3) id f6JIdZR05540 for netdev-outgoing; Thu, 19 Jul 2001 11:39:35 -0700 Received: from sgi.com (sgi.SGI.COM [192.48.153.1]) by oss.sgi.com (8.11.2/8.11.3) with SMTP id f6JIdYV05533 for ; Thu, 19 Jul 2001 11:39:34 -0700 Received: from ms2.inr.ac.ru (minus.inr.ac.ru [193.233.7.97]) by sgi.com (980327.SGI.8.8.8-aspam/980304.SGI-aspam: SGI does not authorize the use of its proprietary systems or networks for unsolicited or bulk email from the Internet.) via SMTP id LAA06883 for ; Thu, 19 Jul 2001 11:39:17 -0700 (PDT) mail_from (kuznet@ms2.inr.ac.ru) From: kuznet@ms2.inr.ac.ru Received: (from kuznet@localhost) by ms2.inr.ac.ru (8.6.13/ANK) id WAA32355; Thu, 19 Jul 2001 22:17:52 +0400 Message-Id: <200107191817.WAA32355@ms2.inr.ac.ru> Subject: Re: [PATCH] PPPOE can kfree SKB twice (was Re: kernel panic problem. (smp, iptables?)) To: mostrows@speakeasy.net Date: Thu, 19 Jul 2001 22:17:52 +0400 (MSK DST) Cc: davem@redhat.com, linux-kernel@vger.kernel.org, linux-net@vger.kernel.org, netdev@oss.sgi.com In-Reply-To: from "Michal Ostrowski" at Jul 19, 1 02:00:54 pm X-Mailer: ELM [version 2.4 PL24] MIME-Version: 1.0 Sender: owner-netdev@oss.sgi.com Precedence: bulk Content-Length: 829 Lines: 24 Hello! > However, could we not have dev_queue_xmit behave as such (not free > frame on failure)? If you need to hold original skb, you may hold its refcnt. However, this feature inevitably results in big troubles: dev_queue_xmit() is allowed to change skb and you cannot assume anything about this. So reusing an skb after dev_queue_xmit() is a fatal bug, not depending on anything. > The reason why I'm proposing this is that ppp_generic.c assumes that > the skb is still available after a transmission failure via pppoe. To > support the semantics of dev_queue_xmit and ppp_generic we would have > to always copy skb's inside pppoe_xmit.
Then, if dev_queue_xmit fails > the original is deleted. You need not copy it. I said "clone". Nobody is allowed to touch _data_ part of skb, so that you need not to copy data. Alexey From owner-netdev@oss.sgi.com Thu Jul 19 11:57:36 2001 Received: (from majordomo@localhost) by oss.sgi.com (8.11.2/8.11.3) id f6JIvae11423 for netdev-outgoing; Thu, 19 Jul 2001 11:57:36 -0700 Received: from igw3.watson.ibm.com (igw3.watson.ibm.com [198.81.209.18]) by oss.sgi.com (8.11.2/8.11.3) with SMTP id f6JIvXV11403 for ; Thu, 19 Jul 2001 11:57:33 -0700 Received: from sp1n189at0.watson.ibm.com (sp1n189at0.watson.ibm.com [9.2.104.62]) by igw3.watson.ibm.com (8.11.4/8.11.4) with ESMTP id f6JIus709210; Thu, 19 Jul 2001 14:56:55 -0400 Received: from kitch0.watson.ibm.com (kitch0.watson.ibm.com [9.2.251.57]) by sp1n189at0.watson.ibm.com (8.9.3/Feb-20-98) with ESMTP id OAA18280; Thu, 19 Jul 2001 14:56:55 -0400 Received: from slug.watson.ibm.com (slug.watson.ibm.com [9.2.244.235]) by kitch0.watson.ibm.com (AIX4.3/8.9.3/8.9.3/01-10-2000) with ESMTP id OAA36364; Thu, 19 Jul 2001 14:56:53 -0400 Reply-To: mostrows@us.ibm.com To: "David S. Miller" Cc: linux-kernel@vger.kernel.org, linux-net@vger.kernel.org, netdev@oss.sgi.com, kuznet@ms2.inr.ac.ru Subject: Re: [PATCH] PPPOE can kfree SKB twice (was Re: kernel panic problem. (smp, iptables?)) References: <005f01c10e69$28273e60$0200a8c0@loki> <15189.2408.59953.395204@pizda.ninka.net> From: Michal Ostrowski Date: 19 Jul 2001 14:57:21 -0400 In-Reply-To: <15189.2408.59953.395204@pizda.ninka.net> Message-ID: User-Agent: Gnus/5.0808 (Gnus v5.8.8) XEmacs/21.4 (Copyleft) MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Sender: owner-netdev@oss.sgi.com Precedence: bulk Content-Length: 4101 Lines: 137 Alexey replied to my last post with some valuable comments and in response I have a new patch (that goes on top of David Miller's patch from yesterday). 
The approach here is that in case we don't have room in the skb for PPPoE headers, we create a new one (skb2) and copy the entire thing. If we do have space, we clone it. We always give dev_queue_xmit the copy/clone pointer and let it free it. We then kfree_skb the original skb depending on the return value of dev_queue_xmit (to conform to the expectations of ppp_generic). Michal Ostrowski mostrows@speakeasy.net --- drivers/net/pppoe.c~ Wed Jul 18 10:24:25 2001 +++ drivers/net/pppoe.c Thu Jul 19 14:49:36 2001 @@ -5,7 +5,7 @@ * PPPoE --- PPP over Ethernet (RFC 2516) * * - * Version: 0.6.7 + * Version: 0.6.8 * * 030700 : Fixed connect logic to allow for disconnect. * 270700 : Fixed potential SMP problems; we must protect against @@ -25,8 +25,12 @@ * locked. (DaveM) * Ignore return value of dev_queue_xmit in __pppoe_xmit * or else we may kfree an SKB twice. (DaveM) + * 190701 : When doing copies of skb's in __pppoe_xmit, always delete + * the original skb that was passed in on success, never on + * failure. Delete the copy of the skb on failure to avoid + * a memory leak. * - * Author: Michal Ostrowski + * Author: Michal Ostrowski * Contributors: * Arnaldo Carvalho de Melo * David S. Miller (davem@redhat.com) @@ -837,6 +841,7 @@ return error; } + /************************************************************************ * * xmit function for internal use. @@ -849,6 +854,7 @@ struct pppoe_hdr *ph; int headroom = skb_headroom(skb); int data_len = skb->len; + struct sk_buff *skb2; if (sk->dead || !(sk->state & PPPOX_CONNECTED)) goto abort; @@ -864,7 +870,6 @@ /* Copy the skb if there is no space for the header. 
*/ if (headroom < (sizeof(struct pppoe_hdr) + dev->hard_header_len)) { - struct sk_buff *skb2; skb2 = dev_alloc_skb(32+skb->len + sizeof(struct pppoe_hdr) + @@ -876,30 +881,36 @@ skb_reserve(skb2, dev->hard_header_len + sizeof(struct pppoe_hdr)); memcpy(skb_put(skb2, skb->len), skb->data, skb->len); - skb_unlink(skb); - kfree_skb(skb); - skb = skb2; + } else { + /* Make a clone so as to not disturb the original skb, + * give dev_queue_xmit something it can free. + */ + skb2 = skb_clone(skb, GFP_ATOMIC); } - /* We may not return error beyond this point. Potentially this - * is a new SKB and in such a case we've already freed the - * original SKB. So if we were to return error, our caller would - * double free that original SKB. -DaveM - */ - - ph = (struct pppoe_hdr *) skb_push(skb, sizeof(struct pppoe_hdr)); + ph = (struct pppoe_hdr *) skb_push(skb2, sizeof(struct pppoe_hdr)); memcpy(ph, &hdr, sizeof(struct pppoe_hdr)); - skb->protocol = __constant_htons(ETH_P_PPP_SES); + skb2->protocol = __constant_htons(ETH_P_PPP_SES); - skb->nh.raw = skb->data; + skb2->nh.raw = skb2->data; - skb->dev = dev; + skb2->dev = dev; - dev->hard_header(skb, dev, ETH_P_PPP_SES, + dev->hard_header(skb2, dev, ETH_P_PPP_SES, sk->protinfo.pppox->pppoe_pa.remote, NULL, data_len); - dev_queue_xmit(skb); + /* We're transmitting skb2, and assuming that dev_queue_xmit + * will free it. The generic ppp layer however, is expecting + * that we give back the skb in case of failure, + * but free it in case of success. 
+ */ + + if (dev_queue_xmit(skb2)<0) { + goto abort; + } + + kfree_skb(skb); return 1; abort: @@ -1049,7 +1060,6 @@ int err = register_pppox_proto(PX_PROTO_OE, &pppoe_proto); if (err == 0) { - printk(KERN_INFO "Registered PPPoE v0.6.5\n"); dev_add_pack(&pppoes_ptype); dev_add_pack(&pppoed_ptype); --- drivers/net/pppox.c~ Tue Feb 13 16:15:05 2001 +++ drivers/net/pppox.c Wed Jul 18 10:27:25 2001 @@ -148,10 +148,6 @@ err = sock_register(&pppox_proto_family); - if (err == 0) { - printk(KERN_INFO "Registered PPPoX v0.5\n"); - } - return err; } From owner-netdev@oss.sgi.com Thu Jul 19 16:13:56 2001 Received: (from majordomo@localhost) by oss.sgi.com (8.11.2/8.11.3) id f6JNDuM27796 for netdev-outgoing; Thu, 19 Jul 2001 16:13:56 -0700 Received: from pizda.ninka.net (IDENT:root@pizda.ninka.net [216.101.162.242]) by oss.sgi.com (8.11.2/8.11.3) with SMTP id f6JNDsV27793 for ; Thu, 19 Jul 2001 16:13:55 -0700 Received: (from davem@localhost) by pizda.ninka.net (8.9.3/8.9.3) id QAA26244; Thu, 19 Jul 2001 16:13:03 -0700 From: "David S. Miller" MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Transfer-Encoding: 7bit Message-ID: <15191.27007.837441.266822@pizda.ninka.net> Date: Thu, 19 Jul 2001 16:13:03 -0700 (PDT) To: mostrows@us.ibm.com Cc: linux-kernel@vger.kernel.org, linux-net@vger.kernel.org, netdev@oss.sgi.com, kuznet@ms2.inr.ac.ru Subject: Re: [PATCH] PPPOE can kfree SKB twice (was Re: kernel panic problem. (smp, iptables?)) In-Reply-To: References: <005f01c10e69$28273e60$0200a8c0@loki> <15189.2408.59953.395204@pizda.ninka.net> X-Mailer: VM 6.75 under 21.1 (patch 13) "Crater Lake" XEmacs Lucid Sender: owner-netdev@oss.sgi.com Precedence: bulk Content-Length: 258 Lines: 11 Michal Ostrowski writes: > Alexey replied to my last post with some valuable comments and in > response I have a new patch (that goes on top of David Miller's patch > from yesterday). Applied to my tree, thanks. Later, David S. 
Miller davem@redhat.com From owner-netdev@oss.sgi.com Fri Jul 20 04:30:28 2001 Received: (from majordomo@localhost) by oss.sgi.com (8.11.2/8.11.3) id f6KBUSF17426 for netdev-outgoing; Fri, 20 Jul 2001 04:30:28 -0700 Received: from c0mailgw10.prontomail.com (mailgw.prontomail.com [216.163.180.10]) by oss.sgi.com (8.11.2/8.11.3) with SMTP id f6KBURV17415 for ; Fri, 20 Jul 2001 04:30:27 -0700 Received: from c6web109 (216.163.178.10) by c0mailgw10.prontomail.com (NPlex 5.5.029) id 3B56F87D00024970; Fri, 20 Jul 2001 04:24:18 -0700 X-Version: beer 6.3.3353.0 From: "william fitzgerald" Message-Id: <27403D6AABC75D115AE50005B83D1402@william.fitzgerald.beer.com> Date: Fri, 20 Jul 2001 12:33:15 +0200 X-Priority: Normal Content-Type: text/plain; charset=iso-8859-1 To: gparrott@lucent.com Subject: urgent advice on my net ip forwarding diagram please CC: netdev@oss.sgi.com X-Mailer: Web Based Pronto Mime-Version: 1.0 Content-Transfer-Encoding: 7bit Sender: owner-netdev@oss.sgi.com Precedence: bulk Content-Length: 799 Lines: 20 hi gregory, it's me again. i need your advice on a diagram i have drawn up in regard to the network ip forwarding stack for the linux 2.4.0-test9 kernel. does it make logical sence? (especially where i placed the dev_queue_xmit function).our network is being shut down for a few hours (hope not too long) aprox 1:40 pm today so i you get a chance could you reply before then.don't mean to rush you i know your busy. many thanks in advance,and thanks for previous help.i hope from the info i recieved last and the reading of the code i have learnt something!!!!! i had to add the gif as alink to my web site because attachments don't work. http://www.cs.may.ie/~williamf/linux2.4_net.gif regards william A beer.com Beer Mail fanatic Beer Mail, brought to you by your friends at beer.com. 
From owner-netdev@oss.sgi.com Fri Jul 20 09:41:26 2001 Received: (from majordomo@localhost) by oss.sgi.com (8.11.2/8.11.3) id f6KGfQA15037 for netdev-outgoing; Fri, 20 Jul 2001 09:41:26 -0700 Received: from c0mailgw05.prontomail.com (mailgw.prontomail.com [216.163.180.10]) by oss.sgi.com (8.11.2/8.11.3) with SMTP id f6KGfQV15034 for ; Fri, 20 Jul 2001 09:41:26 -0700 Received: from c6web103 (216.163.178.10) by c0mailgw05.prontomail.com (NPlex 5.5.029) id 3B56EF720002E57F for netdev@oss.sgi.com; Fri, 20 Jul 2001 09:36:10 -0700 X-Version: beer 6.3.3353.0 From: "william fitzgerald" Message-Id: Date: Fri, 20 Jul 2001 17:45:07 +0200 X-Priority: Normal Content-Type: text/plain; charset=iso-8859-1 To: netdev@oss.sgi.com Subject: 2.4 routing diagram to be verified please X-Mailer: Web Based Pronto Mime-Version: 1.0 Content-Transfer-Encoding: 7bit Sender: owner-netdev@oss.sgi.com Precedence: bulk Content-Length: 448 Lines: 15 hi all, is there anyone who can verify if my diagram for the linux 2.4.0-test9 kernel in regard to routing is correct or at least half right.sorry about this i did mail this earlier but our network crashed so i'm not sure if it got to you. please find the diagram at: http://www.cs.may.ie/~williamf/linux2.4_net.gif many thanks in advance, regards william. A beer.com Beer Mail fanatic Beer Mail, brought to you by your friends at beer.com. 
From owner-netdev@oss.sgi.com Fri Jul 20 19:00:21 2001 Received: (from majordomo@localhost) by oss.sgi.com (8.11.2/8.11.3) id f6L20LN13177 for netdev-outgoing; Fri, 20 Jul 2001 19:00:21 -0700 Received: from horus.its.uow.edu.au (horus.its.uow.edu.au [130.130.68.25]) by oss.sgi.com (8.11.2/8.11.3) with SMTP id f6L20GV13173 for ; Fri, 20 Jul 2001 19:00:16 -0700 Received: from uow.edu.au (wumpus.its.uow.edu.au [130.130.68.12]) by horus.its.uow.edu.au (8.9.3/8.9.3) with ESMTP id LAA08946; Sat, 21 Jul 2001 11:59:28 +1000 (EST) Message-ID: <3B58E262.29833960@uow.edu.au> Date: Sat, 21 Jul 2001 12:01:06 +1000 From: Andrew Morton X-Mailer: Mozilla 4.76 [en] (X11; U; Linux 2.4.7-pre6 i686) X-Accept-Language: en MIME-Version: 1.0 To: alex@foogod.com CC: Jeff Garzik , netdev@oss.sgi.com, Andrey Savochkin Subject: Re: [PATCH] eepro100 and IFF_RUNNING under 2.4 References: <20010720143247.A6596@draco.foogod.com> Content-Type: text/plain; charset=us-ascii Content-Transfer-Encoding: 7bit Sender: owner-netdev@oss.sgi.com Precedence: bulk Content-Length: 1767 Lines: 48 alex@foogod.com wrote: > > Hiho.. While working on some code which tries to monitor physical link status > for ethernet interfaces, I noticed that apparently under the 2.4 kernel, the > eepro100 driver does not reflect link status with the IFF_RUNNING flag as it > used to under 2.2.x. > > It looks like a bit of the code in eepro100.c didn't get updated to reflect > some interface changes that happened somewhere in 2.3, so here is a patch which > should fix things (It also adds a bit of code to set things properly on startup as well, patch is against kernel 2.4.6). > > (I'm not quite sure who to send this to, so I'm sending it to the list) Interesting. I don't think the driver should have been manipulating IFF_RUNNING in response to link status changes in the first place. You were missing an `& 0x1f' when using the PHY index. Not sure it's necessary, but... 
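For readers unfamiliar with the magic numbers involved, here is a small standalone sketch of the link check, written as a userspace model rather than driver code. The names `toy_link_is_up`, `toy_mdio_read_fn`, `toy_phy` and `toy_fake_read` are hypothetical and exist only for illustration. It assumes the usual MII conventions: register 1 is the basic status register, bit 0x0004 is link status, that bit latches low (so the first read may report a stale link-down), and PHY addresses are five bits wide, which is what the `& 0x1f` strips flag bits such as 0x8000 down to.

```c
#include <assert.h>
#include <stdint.h>

#define TOY_MII_BMSR       1       /* basic mode status register */
#define TOY_BMSR_LSTATUS   0x0004  /* link status bit in that register */
#define TOY_PHY_ADDR_MASK  0x1f    /* PHY addresses are 5 bits wide */

/* Hypothetical register-read callback standing in for mdio_read(). */
typedef uint16_t (*toy_mdio_read_fn)(void *ctx, int phy_addr, int reg);

/* Returns 1 if the PHY currently reports link, 0 otherwise. */
static int toy_link_is_up(toy_mdio_read_fn rd, void *ctx, int phy)
{
    int addr = phy & TOY_PHY_ADDR_MASK;  /* drop any flag bits */

    /* First read discards a possibly latched link-down indication. */
    (void)rd(ctx, addr, TOY_MII_BMSR);
    return (rd(ctx, addr, TOY_MII_BMSR) & TOY_BMSR_LSTATUS) != 0;
}

/* Fake PHY for experimenting: returns `first` on the first read and
 * `second` on every later read, and records the address it was
 * asked for. */
struct toy_phy {
    uint16_t first;
    uint16_t second;
    int last_addr;
    int reads;
};

static uint16_t toy_fake_read(void *ctx, int phy_addr, int reg)
{
    struct toy_phy *p = ctx;

    (void)reg;
    p->last_addr = phy_addr;
    return p->reads++ == 0 ? p->first : p->second;
}
```

A driver would then translate the result into netif_carrier_on() or netif_carrier_off() rather than touching IFF_RUNNING directly.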
--- linux-2.4.7-pre6/drivers/net/eepro100.c Wed Jul 4 18:21:26 2001 +++ lk-ext3/drivers/net/eepro100.c Sat Jul 21 12:04:28 2001 @@ -976,6 +976,11 @@ speedo_open(struct net_device *dev) if ((sp->phy[0] & 0x8000) == 0) sp->advertising = mdio_read(ioaddr, sp->phy[0] & 0x1f, 4); + if (mdio_read(ioaddr, sp->phy[0] & 0x1f, 1) & 0x0004) + netif_carrier_on(dev); + else + netif_carrier_off(dev); + if (speedo_debug > 2) { printk(KERN_DEBUG "%s: Done speedo_open(), status %8.8x.\n", dev->name, inw(ioaddr + SCBStatus)); @@ -1088,9 +1093,9 @@ static void speedo_timer(unsigned long d mdio_read(ioaddr, phy_num, 1); /* If link beat has returned... */ if (mdio_read(ioaddr, phy_num, 1) & 0x0004) - dev->flags |= IFF_RUNNING; + netif_carrier_on(dev); else - dev->flags &= ~IFF_RUNNING; + netif_carrier_off(dev); } } if (speedo_debug > 3) { From owner-netdev@oss.sgi.com Sat Jul 21 08:12:09 2001 Received: (from majordomo@localhost) by oss.sgi.com (8.11.2/8.11.3) id f6LFC9c22073 for netdev-outgoing; Sat, 21 Jul 2001 08:12:09 -0700 Received: from wiprom2mx1.wipro.com (wiprom2mx1.wipro.com [203.197.164.41]) by oss.sgi.com (8.11.2/8.11.3) with SMTP id f6LFC6V22070 for ; Sat, 21 Jul 2001 08:12:07 -0700 Received: from m2vwall3.wipro.com (m2vwall3.wipro.com [192.168.2.23]) by wiprom2mx1.wipro.com (8.10.2+Sun/8.11.3) with SMTP id f6LKouu15980 for ; Sat, 21 Jul 2001 20:50:56 GMT Received: from Wipro.com ([192.168.191.219]) by lvlmail.mail.wipro.com (Netscape Messaging Server 4.15) with ESMTP id GGTXC800.CDW; Sat, 21 Jul 2001 20:51:44 +0530 Message-ID: <3B599DB7.7050602@Wipro.com> Date: Sat, 21 Jul 2001 20:50:23 +0530 From: "Parag Warudkar" User-Agent: Mozilla/5.0 (X11; U; Linux i686; en-US; rv:0.9.2) Gecko/20010701 X-Accept-Language: en-us MIME-Version: 1.0 To: andrewm@uow.edu.au, netdev@oss.sgi.com, linux-kernel@vger.kernel.org Subject: 3c59x Problems Content-Type: multipart/mixed; boundary="------------InterScan_NT_MIME_Boundary" Sender: owner-netdev@oss.sgi.com Precedence: bulk Content-Length: 
3334 Lines: 81 This is a multi-part message in MIME format. --------------InterScan_NT_MIME_Boundary Content-Type: text/plain; charset=us-ascii; format=flowed Content-Transfer-Encoding: 7bit Hello, I have a 3c905C and get following errors under medium network loads. The hub, network, and all other stuff is quite good. Also on same machine Win2K gave no problems. When the error occurs, I get transfer speeds of 0.43 Kbps. I never experienced the same with Win2K. DMESG OUTPUT 3c59x.c:LK1.1.15 6 June 2001 Donald Becker and others. http://www.scyld.com/network/vortex.html See Documentation/networking/vortex.txt 01:0c.0: 3Com PCI 3c905C Tornado at 0xec80, 00:b0:d0:69:77:71, IRQ 5 product code 0000 rev 00.14 date 07-16-104 8K byte-wide RAM 5:3 Rx:Tx split, autoselect/Autonegotiate interface. MII transceiver found at address 24, status 782d. Enabling bus-master transmits and whole-frame receives. 01:0c.0: scatter/gather disabled. h/w checksums enabled eth0: using NWAY device table, not 8 eth0: Transmit error, Tx status register 82. Probably a duplex mismatch. See Documentation/networking/vortex.txt Flags; bus-master 1, dirty 343441(1) current 343441(1) Transmit list 00000000 vs. dedfe240. 0: @dedfe200 length 800002f2 status 800102f2 1: @dedfe240 length 80000042 status 00010042 2: @dedfe280 length 80000042 status 00010042 3: @dedfe2c0 length 80000042 status 00010042 4: @dedfe300 length 80000042 status 00010042 5: @dedfe340 length 80000042 status 00010042 6: @dedfe380 length 80000042 status 00010042 7: @dedfe3c0 length 8000004a status 0001004a 8: @dedfe400 length 80000042 status 00010042 9: @dedfe440 length 800002d0 status 000102d0 10: @dedfe480 length 80000042 status 00010042 11: @dedfe4c0 length 80000042 status 00010042 12: @dedfe500 length 80000042 status 00010042 13: @dedfe540 length 80000042 status 00010042 14: @dedfe580 length 8000004a status 0001004a 15: @dedfe5c0 length 80000042 status 00010042 *******************Keeps repeating.. 
After some time the network becomes unusable. The problem is always reproducible. I run Kernel-2.4.6-2 on RH7.1 (From Rawhide). Is there any update for the 3c59x driver for 2.4.x kernels? Parag --------------InterScan_NT_MIME_Boundary Content-Type: text/plain; name="InterScan_Disclaimer.txt" Content-Transfer-Encoding: 7bit Content-Disposition: attachment; filename="InterScan_Disclaimer.txt" The Information contained and transmitted by this E-MAIL is proprietary to Wipro Limited and is intended for use only by the individual or entity to which it is addressed, and may contain information that is privileged, confidential or exempt from disclosure under applicable law. If this is a forwarded message, the content of this E-MAIL may not have been sent with the authority of the Company. If you are not the intended recipient, an agent of the intended recipient or a person responsible for delivering the information to the named recipient, you are notified that any use, distribution, transmission, printing, copying or dissemination of this information in any way or in any manner is strictly prohibited. 
If you have received this communication in error, please delete this mail & notify us immediately at mailadmin@wipro.com --------------InterScan_NT_MIME_Boundary-- From owner-netdev@oss.sgi.com Sat Jul 21 08:32:27 2001 Received: (from majordomo@localhost) by oss.sgi.com (8.11.2/8.11.3) id f6LFWR222989 for netdev-outgoing; Sat, 21 Jul 2001 08:32:27 -0700 Received: from isis.its.uow.edu.au (isis.its.uow.edu.au [130.130.68.21]) by oss.sgi.com (8.11.2/8.11.3) with SMTP id f6LFWGV22977 for ; Sat, 21 Jul 2001 08:32:17 -0700 Received: from uow.edu.au (wumpus.its.uow.edu.au [130.130.68.12]) by isis.its.uow.edu.au (8.9.3/8.9.3) with ESMTP id BAA15960; Sun, 22 Jul 2001 01:31:59 +1000 (EST) Message-ID: <3B59A0D7.29CA7768@uow.edu.au> Date: Sun, 22 Jul 2001 01:33:43 +1000 From: Andrew Morton X-Mailer: Mozilla 4.76 [en] (X11; U; Linux 2.4.7-pre6 i686) X-Accept-Language: en MIME-Version: 1.0 To: Parag Warudkar CC: netdev@oss.sgi.com, linux-kernel@vger.kernel.org Subject: Re: 3c59x Problems References: <3B599DB7.7050602@Wipro.com> Content-Type: text/plain; charset=us-ascii Content-Transfer-Encoding: 7bit Sender: owner-netdev@oss.sgi.com Precedence: bulk Content-Length: 24964 Lines: 714 Parag Warudkar wrote: > > Hello, > I have a 3c905C and get following errors under medium > network loads. > The hub, network, and all other stuff is quite good. Also on same > machine Win2K gave > no problems. When the error occurs, I get transfer speeds of 0.43 Kbps. > I never experienced the > same with Win2K. You have a duplex mismatch. Perhaps win2k is running at half duplex. > DMESG OUTPUT > > 3c59x.c:LK1.1.15 6 June 2001 Donald Becker and others. > http://www.scyld.com/network/vortex.html > See Documentation/networking/vortex.txt > 01:0c.0: 3Com PCI 3c905C Tornado at 0xec80, 00:b0:d0:69:77:71, IRQ 5 > product code 0000 rev 00.14 date 07-16-104 > 8K byte-wide RAM 5:3 Rx:Tx split, autoselect/Autonegotiate interface. > MII transceiver found at address 24, status 782d. 
> Enabling bus-master transmits and whole-frame receives. > 01:0c.0: scatter/gather disabled. h/w checksums enabled > eth0: using NWAY device table, not 8 > > eth0: Transmit error, Tx status register 82. > Probably a duplex mismatch. See Documentation/networking/vortex.txt Was this not helpful? > Is there any update for the 3c59x driver for 2.4.x kernels? Yes, there is. It's below. It could just conceivably help - it changes the driver so that we no longer reset the transceiver at open/close time. This resetting has the potential to confuse autonegotiation, and it can happen quite frequently as the dhcp client software likes to open and close the interface rapidly. The concept was borrowed from Donald Becker's latest 3c59x.c. Please test - if it helps, please let me know. If it doesn't, please send me a full report as per the final section of Documentation/networking/vortex.txt. Thanks. --- linux-2.4.7/drivers/net/3c59x.c Wed Jul 4 18:21:26 2001 +++ linux-akpm/drivers/net/3c59x.c Sun Jul 22 01:36:21 2001 @@ -9,9 +9,13 @@ Members of the series include Fast EtherLink 3c590/3c592/3c595/3c597 and the EtherLink XL 3c900 and 3c905 cards. + Problem reports and questions should be directed to + vortex@scyld.com + The author may be reached as becker@scyld.com, or C/O - Center of Excellence in Space Data and Information Sciences - Code 930.5, Goddard Space Flight Center, Greenbelt MD 20771 + Scyld Computing Corporation + 410 Severn Ave., Suite 210 + Annapolis MD 21403 Linux Kernel Additions: @@ -50,7 +54,7 @@ - Put vortex_info_tbl into __devinitdata - In the vortex_error StatsFull HACK, disable stats in vp->intr_enable as well as in the hardware. - - Increased the loop counter in wait_for_completion from 2,000 to 4,000. + - Increased the loop counter in issue_and_wait from 2,000 to 4,000. LK1.1.5 28 April 2000, andrewm - Added powerpc defines (John Daniel said these work...) 
@@ -121,7 +125,7 @@ LK1.1.12 1 Jan 2001 andrewm (2.4.0-pre1) - Call pci_enable_device before we request our IRQ (Tobias Ringstrom) - Add 3c590 PCI latency timer hack to vortex_probe1 (from 0.99Ra) - - Added extended wait_for_completion for the 3c905CX. + - Added extended issue_and_wait for the 3c905CX. - Look for an MII on PHY index 24 first (3c905CX oddity). - Add HAS_NWAY to 3cSOHO100-TX (Brett Frankenberger) - Don't free skbs we don't own on oom path in vortex_open(). @@ -149,6 +153,19 @@ - Implemented alloc_etherdev() API - Special-case the 'Tx error 82' message. + LK1.1.16 18 July 2001 akpm + - Make NETIF_F_SG dependent upon nr_free_highpages(), not on CONFIG_HIGHMEM + - Lessen verbosity of bootup messages + - Fix WOL - use new PM API functions. + - Use netif_running() instead of vp->open in suspend/resume. + - Don't reset the interface logic on open/close/rmmod. It upsets + autonegotiation, and hence DHCP (from 0.99T). + - Back out EEPROM_NORESET flag because of the above (we do it for all + NICs). + - Correct 3c982 identification string + - Rename wait_for_completion() to issue_and_wait() to avoid completion.h + clash. + - See http://www.uow.edu.au/~andrewm/linux/#3c59x-2.3 for more details. - Also see Documentation/networking/vortex.txt */ @@ -164,8 +181,8 @@ #define DRV_NAME "3c59x" -#define DRV_VERSION "LK1.1.15" -#define DRV_RELDATE "6 June 2001" +#define DRV_VERSION "LK1.1.16" +#define DRV_RELDATE "19 July 2001" @@ -230,6 +247,7 @@ static int vortex_debug = 1; #include #include #include +#include #include /* For NR_IRQS only. */ #include #include @@ -244,10 +262,11 @@ static int vortex_debug = 1; static char version[] __devinitdata = -DRV_NAME ".c:" DRV_VERSION " " DRV_RELDATE " Donald Becker and others. http://www.scyld.com/network/vortex.html\n"; +DRV_NAME ": Donald Becker and others. 
www.scyld.com/network/vortex.html\n"; MODULE_AUTHOR("Donald Becker "); -MODULE_DESCRIPTION("3Com 3c59x/3c90x/3c575 series Vortex/Boomerang/Cyclone driver"); +MODULE_DESCRIPTION("3Com 3c59x/3c9xx ethernet driver " + DRV_VERSION " " DRV_RELDATE); MODULE_PARM(debug, "i"); MODULE_PARM(options, "1-" __MODULE_STRING(8) "i"); MODULE_PARM(full_duplex, "1-" __MODULE_STRING(8) "i"); @@ -379,7 +398,7 @@ enum { IS_VORTEX=1, IS_BOOMERANG=2, IS_C EEPROM_8BIT=0x10, /* AKPM: Uses 0x230 as the base bitmaps for EEPROM reads */ HAS_PWR_CTRL=0x20, HAS_MII=0x40, HAS_NWAY=0x80, HAS_CB_FNS=0x100, INVERT_MII_PWR=0x200, INVERT_LED_PWR=0x400, MAX_COLLISION_RESET=0x800, - EEPROM_OFFSET=0x1000, EEPROM_NORESET=0x2000, HAS_HWCKSM=0x4000 }; + EEPROM_OFFSET=0x1000, HAS_HWCKSM=0x2000 }; enum vortex_chips { CH_3C590 = 0, @@ -475,7 +494,7 @@ static struct vortex_chip_info { PCI_USES_IO|PCI_USES_MASTER, IS_TORNADO|HAS_NWAY|HAS_HWCKSM, 128, }, {"3c980 Cyclone", PCI_USES_IO|PCI_USES_MASTER, IS_CYCLONE|HAS_HWCKSM, 128, }, - {"3c980 10/100 Base-TX NIC(Python-T)", + {"3c982 Dual Port Server Cyclone", PCI_USES_IO|PCI_USES_MASTER, IS_CYCLONE|HAS_HWCKSM, 128, }, {"3cSOHO100-TX Hurricane", @@ -487,7 +506,7 @@ static struct vortex_chip_info { HAS_HWCKSM, 128, }, {"3c556B Laptop Hurricane", PCI_USES_IO|PCI_USES_MASTER, IS_TORNADO|HAS_NWAY|EEPROM_OFFSET|HAS_CB_FNS|INVERT_MII_PWR| - EEPROM_NORESET|HAS_HWCKSM, 128, }, + HAS_HWCKSM, 128, }, {"3c575 [Megahertz] 10/100 LAN CardBus", PCI_USES_IO|PCI_USES_MASTER, IS_BOOMERANG|HAS_MII|EEPROM_8BIT, 128, }, @@ -752,6 +771,7 @@ struct vortex_private { partner_flow_ctrl:1, /* Partner supports flow control */ has_nway:1, enable_wol:1, /* Wake-on-LAN is enabled */ + pm_state_valid:1, /* power_state[] has sane contents */ open:1, medialock:1, must_free_region:1; /* Flag: if zero, Cardbus owns the I/O region */ @@ -767,6 +787,7 @@ struct vortex_private { u16 io_size; /* Size of PCI region (for release_region) */ spinlock_t lock; /* Serialise access to device & its 
vortex_private */ spinlock_t mdio_lock; /* Serialise access to mdio hardware */ + u32 power_state[16]; }; /* The action to take with a media selection timer tick. @@ -847,11 +868,8 @@ static int vortex_suspend (struct pci_de { struct net_device *dev = pdev->driver_data; - printk(KERN_DEBUG "vortex_suspend(%s)\n", dev->name); - if (dev && dev->priv) { - struct vortex_private *vp = (struct vortex_private *)dev->priv; - if (vp->open) { + if (netif_running(dev)) { netif_device_detach(dev); vortex_down(dev); } @@ -863,11 +881,8 @@ static int vortex_resume (struct pci_dev { struct net_device *dev = pdev->driver_data; - printk(KERN_DEBUG "vortex_resume(%s)\n", dev->name); - if (dev && dev->priv) { - struct vortex_private *vp = (struct vortex_private *)dev->priv; - if (vp->open) { + if (netif_running(dev)) { vortex_up(dev); netif_device_attach(dev); } @@ -958,13 +973,12 @@ static int __devinit vortex_probe1(struc int i, step; struct net_device *dev; static int printed_version; - int retval; + int retval, print_info; struct vortex_chip_info * const vci = &vortex_info_tbl[chip_idx]; char *print_name; if (!printed_version) { printk (KERN_INFO "%s", version); - printk (KERN_INFO "See Documentation/networking/vortex.txt\n"); printed_version = 1; } @@ -977,14 +991,40 @@ static int __devinit vortex_probe1(struc goto out; } SET_MODULE_OWNER(dev); + vp = dev->priv; + + /* The lower four bits are the media type. 
*/ + if (dev->mem_start) { + /* + * The 'options' param is passed in as the third arg to the + * LILO 'ether=' argument for non-modular use + */ + option = dev->mem_start; + } + else if (card_idx < MAX_UNITS) + option = options[card_idx]; + else + option = -1; - printk(KERN_INFO "%s: 3Com %s %s at 0x%lx, ", + if (option > 0) { + if (option & 0x8000) + vortex_debug = 7; + if (option & 0x4000) + vortex_debug = 2; + if (option & 0x0400) + vp->enable_wol = 1; + } + + print_info = (vortex_debug > 1); + if (print_info) + printk (KERN_INFO "See Documentation/networking/vortex.txt\n"); + + printk(KERN_INFO "%s: 3Com %s %s at 0x%lx. Vers " DRV_VERSION "\n", print_name, pdev ? "PCI" : "EISA", vci->name, ioaddr); - vp = dev->priv; dev->base_addr = ioaddr; dev->irq = irq; dev->mtu = mtu; @@ -1048,19 +1088,6 @@ static int __devinit vortex_probe1(struc if (pdev) pci_set_drvdata(pdev, dev); - /* The lower four bits are the media type. */ - if (dev->mem_start) { - /* - * AKPM: ewww.. The 'options' param is passed in as the third arg to the - * LILO 'ether=' argument for non-modular use - */ - option = dev->mem_start; - } - else if (card_idx < MAX_UNITS) - option = options[card_idx]; - else - option = -1; - vp->media_override = 7; if (option >= 0) { vp->media_override = ((option & 7) == 2) ? 0 : option & 15; @@ -1117,27 +1144,33 @@ static int __devinit vortex_probe1(struc printk(" ***INVALID CHECKSUM %4.4x*** ", checksum); for (i = 0; i < 3; i++) ((u16 *)dev->dev_addr)[i] = htons(eeprom[i + 10]); - for (i = 0; i < 6; i++) - printk("%c%2.2x", i ? ':' : ' ', dev->dev_addr[i]); + if (print_info) { + for (i = 0; i < 6; i++) + printk("%c%2.2x", i ? 
':' : ' ', dev->dev_addr[i]); + } EL3WINDOW(2); for (i = 0; i < 6; i++) outb(dev->dev_addr[i], ioaddr + i); #ifdef __sparc__ - printk(", IRQ %s\n", __irq_itoa(dev->irq)); + if (print_info) + printk(", IRQ %s\n", __irq_itoa(dev->irq)); #else - printk(", IRQ %d\n", dev->irq); + if (print_info) + printk(", IRQ %d\n", dev->irq); /* Tell them about an invalid IRQ. */ - if (vortex_debug && (dev->irq <= 0 || dev->irq >= NR_IRQS)) + if (dev->irq <= 0 || dev->irq >= NR_IRQS) printk(KERN_WARNING " *** Warning: IRQ %d is unlikely to work! ***\n", dev->irq); #endif EL3WINDOW(4); step = (inb(ioaddr + Wn4_NetDiag) & 0x1e) >> 1; - printk(KERN_INFO " product code %02x%02x rev %02x.%d date %02d-" - "%02d-%02d\n", eeprom[6]&0xff, eeprom[6]>>8, eeprom[0x14], - step, (eeprom[4]>>5) & 15, eeprom[4] & 31, eeprom[4]>>9); + if (print_info) { + printk(KERN_INFO " product code %02x%02x rev %02x.%d date %02d-" + "%02d-%02d\n", eeprom[6]&0xff, eeprom[6]>>8, eeprom[0x14], + step, (eeprom[4]>>5) & 15, eeprom[4] & 31, eeprom[4]>>9); + } if (pdev && vci->drv_flags & HAS_CB_FNS) { @@ -1151,8 +1184,10 @@ static int __devinit vortex_probe1(struc if (!vp->cb_fn_base) goto free_ring; } - printk(KERN_INFO "%s: CardBus functions mapped %8.8lx->%p\n", - print_name, fn_st_addr, vp->cb_fn_base); + if (print_info) { + printk(KERN_INFO "%s: CardBus functions mapped %8.8lx->%p\n", + print_name, fn_st_addr, vp->cb_fn_base); + } EL3WINDOW(2); n = inw(ioaddr + Wn2_ResetOptions) & ~0x4010; @@ -1170,7 +1205,8 @@ static int __devinit vortex_probe1(struc if (vp->info1 & 0x8000) { vp->full_duplex = 1; - printk(KERN_INFO "Full duplex capable\n"); + if (print_info) + printk(KERN_INFO "Full duplex capable\n"); } { @@ -1181,16 +1217,17 @@ static int __devinit vortex_probe1(struc if ((vp->available_media & 0xff) == 0) /* Broken 3c916 */ vp->available_media = 0x40; config = inl(ioaddr + Wn3_Config); - if (vortex_debug > 1) + if (print_info) { printk(KERN_DEBUG " Internal config register is %4.4x, " "transceivers %#x.\n", 
config, inw(ioaddr + Wn3_Options)); - printk(KERN_INFO " %dK %s-wide RAM %s Rx:Tx split, %s%s interface.\n", - 8 << RAM_SIZE(config), - RAM_WIDTH(config) ? "word" : "byte", - ram_split[RAM_SPLIT(config)], - AUTOSELECT(config) ? "autoselect/" : "", - XCVR(config) > XCVR_ExtMII ? "" : - media_tbl[XCVR(config)].name); + printk(KERN_INFO " %dK %s-wide RAM %s Rx:Tx split, %s%s interface.\n", + 8 << RAM_SIZE(config), + RAM_WIDTH(config) ? "word" : "byte", + ram_split[RAM_SPLIT(config)], + AUTOSELECT(config) ? "autoselect/" : "", + XCVR(config) > XCVR_ExtMII ? "" : + media_tbl[XCVR(config)].name); + } vp->default_media = XCVR(config); if (vp->default_media == XCVR_NWAY) vp->has_nway = 1; @@ -1198,8 +1235,9 @@ static int __devinit vortex_probe1(struc } if (vp->media_override != 7) { - printk(KERN_INFO " Media override to transceiver type %d (%s).\n", - vp->media_override, media_tbl[vp->media_override].name); + printk(KERN_INFO "%s: Media override to transceiver type %d (%s).\n", + print_name, vp->media_override, + media_tbl[vp->media_override].name); dev->if_port = vp->media_override; } else dev->if_port = vp->default_media; @@ -1226,8 +1264,10 @@ static int __devinit vortex_probe1(struc mii_status = mdio_read(dev, phyx, 1); if (mii_status && mii_status != 0xffff) { vp->phys[phy_idx++] = phyx; - printk(KERN_INFO " MII transceiver found at address %d," - " status %4x.\n", phyx, mii_status); + if (print_info) { + printk(KERN_INFO " MII transceiver found at address %d," + " status %4x.\n", phyx, mii_status); + } if ((mii_status & 0x0040) == 0) mii_preamble_required++; } @@ -1246,13 +1286,12 @@ static int __devinit vortex_probe1(struc } } - if (pdev && vp->enable_wol && (vp->capabilities & CapPwrMgmt)) - acpi_set_WOL(dev); - if (vp->capabilities & CapBusMaster) { vp->full_bus_master_tx = 1; - printk(KERN_INFO" Enabling bus-master transmits and %s receives.\n", - (vp->info2 & 1) ? 
"early" : "whole-frame" ); + if (print_info) { + printk(KERN_INFO " Enabling bus-master transmits and %s receives.\n", + (vp->info2 & 1) ? "early" : "whole-frame" ); + } vp->full_bus_master_rx = (vp->info2 & 1) ? 1 : 2; vp->bus_master = 0; /* AKPM: vortex only */ } @@ -1261,10 +1300,10 @@ static int __devinit vortex_probe1(struc dev->open = vortex_open; if (vp->full_bus_master_tx) { dev->hard_start_xmit = boomerang_start_xmit; -#ifndef CONFIG_HIGHMEM - /* Actually, it still should work with iommu. */ - dev->features |= NETIF_F_SG; -#endif + if (nr_free_highpages() == 0) { + /* Actually, it still should work with iommu. */ + dev->features |= NETIF_F_SG; + } if (((hw_checksums[card_idx] == -1) && (vp->drv_flags & HAS_HWCKSM)) || (hw_checksums[card_idx] == 1)) { dev->features |= NETIF_F_IP_CSUM; @@ -1273,7 +1312,7 @@ static int __devinit vortex_probe1(struc dev->hard_start_xmit = vortex_start_xmit; } - if (vortex_debug > 0) { + if (print_info) { printk(KERN_INFO "%s: scatter/gather %sabled. h/w checksums %sabled\n", print_name, (dev->features & NETIF_F_SG) ? "en":"dis", @@ -1286,6 +1325,11 @@ static int __devinit vortex_probe1(struc dev->set_multicast_list = set_rx_mode; dev->tx_timeout = vortex_tx_timeout; dev->watchdog_timeo = (watchdog * HZ) / 1000; + if (pdev && vp->enable_wol) { + vp->pm_state_valid = 1; + pci_save_state(vp->pdev, vp->power_state); + acpi_set_WOL(dev); + } retval = register_netdev(dev); if (retval == 0) return 0; @@ -1306,7 +1350,7 @@ out: } static void -wait_for_completion(struct net_device *dev, int cmd) +issue_and_wait(struct net_device *dev, int cmd) { int i; @@ -1338,8 +1382,10 @@ vortex_up(struct net_device *dev) unsigned int config; int i; - if (vp->pdev && vp->enable_wol) /* AKPM: test not needed? */ + if (vp->pdev && vp->enable_wol) { pci_set_power_state(vp->pdev, 0); /* Go active */ + pci_restore_state(vp->pdev, vp->power_state); + } /* Before initializing select the active media port. 
*/ EL3WINDOW(3); @@ -1352,19 +1398,23 @@ vortex_up(struct net_device *dev) dev->if_port = vp->media_override; } else if (vp->autoselect) { if (vp->has_nway) { - printk(KERN_INFO "%s: using NWAY device table, not %d\n", dev->name, dev->if_port); + if (vortex_debug > 1) + printk(KERN_INFO "%s: using NWAY device table, not %d\n", + dev->name, dev->if_port); dev->if_port = XCVR_NWAY; } else { /* Find first available media type, starting with 100baseTx. */ dev->if_port = XCVR_100baseTx; while (! (vp->available_media & media_tbl[dev->if_port].mask)) dev->if_port = media_tbl[dev->if_port].next; - printk(KERN_INFO "%s: first available media type: %s\n", + if (vortex_debug > 1) + printk(KERN_INFO "%s: first available media type: %s\n", dev->name, media_tbl[dev->if_port].name); } } else { dev->if_port = vp->default_media; - printk(KERN_INFO "%s: using default media %s\n", + if (vortex_debug > 1) + printk(KERN_INFO "%s: using default media %s\n", dev->name, media_tbl[dev->if_port].name); } @@ -1420,8 +1470,11 @@ vortex_up(struct net_device *dev) dev->name, config); } - wait_for_completion(dev, TxReset); - wait_for_completion(dev, RxReset); + issue_and_wait(dev, TxReset); + /* + * Don't reset the PHY - that upsets autonegotiation during DHCP operations. + */ + issue_and_wait(dev, RxReset|0x04); outw(SetStatusEnb | 0x00, ioaddr + EL3_CMD); @@ -1494,7 +1547,7 @@ vortex_up(struct net_device *dev) set_rx_mode(dev); outw(StatsEnable, ioaddr + EL3_CMD); /* Turn on statistics. */ -// wait_for_completion(dev, SetTxStart|0x07ff); +// issue_and_wait(dev, SetTxStart|0x07ff); outw(RxEnable, ioaddr + EL3_CMD); /* Enable the receiver. */ outw(TxEnable, ioaddr + EL3_CMD); /* Enable transmitter. */ /* Allow status bits to be seen. */ @@ -1523,9 +1576,6 @@ vortex_open(struct net_device *dev) int i; int retval; - if (vp->pdev && vp->enable_wol) /* AKPM: test not needed? */ - pci_set_power_state(vp->pdev, 0); /* Go active */ - /* Use the now-standard shared IRQ implementation. 
*/ if ((retval = request_irq(dev->irq, vp->full_bus_master_rx ? &boomerang_interrupt : &vortex_interrupt, SA_SHIRQ, dev->name, dev))) { @@ -1566,7 +1616,6 @@ vortex_open(struct net_device *dev) } vortex_up(dev); - vp->open = 1; return 0; out_free_irq: @@ -1732,7 +1781,7 @@ static void vortex_tx_timeout(struct net if (vortex_debug > 0) dump_tx_ring(dev); - wait_for_completion(dev, TxReset); + issue_and_wait(dev, TxReset); vp->stats.tx_errors++; if (vp->full_bus_master_tx) { @@ -1842,12 +1891,13 @@ vortex_error(struct net_device *dev, int /* In this case, blow the card away */ vortex_down(dev); - wait_for_completion(dev, TotalReset | 0xff); + issue_and_wait(dev, TotalReset | 0xff); vortex_up(dev); /* AKPM: bug. vortex_up() assumes that the rx ring is full. It may not be. */ } else if (fifo_diag & 0x0400) do_tx_reset = 1; if (fifo_diag & 0x3000) { - wait_for_completion(dev, RxReset); + /* Reset Rx fifo and upload logic */ + issue_and_wait(dev, RxReset|0x07); /* Set the Rx filter to the current state. */ set_rx_mode(dev); outw(RxEnable, ioaddr + EL3_CMD); /* Re-enable the receiver. */ @@ -1856,7 +1906,7 @@ vortex_error(struct net_device *dev, int } if (do_tx_reset) { - wait_for_completion(dev, TxReset|reset_mask); + issue_and_wait(dev, TxReset|reset_mask); outw(TxEnable, ioaddr + EL3_CMD); if (!vp->full_bus_master_tx) netif_wake_queue(dev); @@ -1908,7 +1958,7 @@ vortex_start_xmit(struct sk_buff *skb, s if (tx_status & 0x04) vp->stats.tx_fifo_errors++; if (tx_status & 0x38) vp->stats.tx_aborted_errors++; if (tx_status & 0x30) { - wait_for_completion(dev, TxReset); + issue_and_wait(dev, TxReset); } outw(TxEnable, ioaddr + EL3_CMD); } @@ -1985,7 +2035,7 @@ boomerang_start_xmit(struct sk_buff *skb spin_lock_irqsave(&vp->lock, flags); /* Wait for the stall to complete. 
*/ - wait_for_completion(dev, DownStall); + issue_and_wait(dev, DownStall); prev_entry->next = cpu_to_le32(vp->tx_ring_dma + entry * sizeof(struct boom_tx_desc)); if (inl(ioaddr + DownListPtr) == 0) { outl(vp->tx_ring_dma + entry * sizeof(struct boom_tx_desc), ioaddr + DownListPtr); @@ -2306,7 +2356,7 @@ static int vortex_rx(struct net_device * "size %d.\n", dev->name, pkt_len); } vp->stats.rx_dropped++; - wait_for_completion(dev, RxDiscard); + issue_and_wait(dev, RxDiscard); } return 0; @@ -2459,8 +2509,10 @@ vortex_down(struct net_device *dev) if (vp->full_bus_master_tx) outl(0, ioaddr + DownListPtr); - if (vp->pdev && vp->enable_wol && (vp->capabilities & CapPwrMgmt)) + if (vp->pdev && vp->enable_wol) { + pci_save_state(vp->pdev, vp->power_state); acpi_set_WOL(dev); + } } static int @@ -2522,7 +2574,6 @@ vortex_close(struct net_device *dev) } } - vp->open = 0; return 0; } @@ -2544,7 +2595,7 @@ dump_tx_ring(struct net_device *dev) printk(KERN_ERR " Transmit list %8.8x vs. %p.\n", inl(ioaddr + DownListPtr), &vp->tx_ring[vp->dirty_tx % TX_RING_SIZE]); - wait_for_completion(dev, DownStall); + issue_and_wait(dev, DownStall); for (i = 0; i < TX_RING_SIZE; i++) { printk(KERN_ERR " %d: @%p length %8.8x status %8.8x\n", i, &vp->tx_ring[i], @@ -2821,8 +2872,10 @@ static void acpi_set_WOL(struct net_devi /* The RxFilter must accept the WOL frames. */ outw(SetRxFilter|RxStation|RxMulticast|RxBroadcast, ioaddr + EL3_CMD); outw(RxEnable, ioaddr + EL3_CMD); + /* Change the power state to D3; RxEnable doesn't take effect. */ - pci_set_power_state(vp->pdev, 0x8103); + pci_enable_wake(vp->pdev, 0, 1); + pci_set_power_state(vp->pdev, 3); } @@ -2843,8 +2896,15 @@ static void __devexit vortex_remove_one * here */ unregister_netdev(dev); - /* Should really use wait_for_completion() here */ - outw((vp->drv_flags & EEPROM_NORESET) ? 
(TotalReset|0x10) : TotalReset, dev->base_addr + EL3_CMD); + /* Should really use issue_and_wait() here */ + outw(TotalReset|0x14, dev->base_addr + EL3_CMD); + + if (vp->pdev && vp->enable_wol) { + pci_set_power_state(vp->pdev, 0); /* Go active */ + if (vp->pm_state_valid) + pci_restore_state(vp->pdev, vp->power_state); + } + pci_free_consistent(pdev, sizeof(struct boom_rx_desc) * RX_RING_SIZE + sizeof(struct boom_tx_desc) * TX_RING_SIZE, --- linux-2.4.7/drivers/pci/pci.ids Wed Jul 4 18:21:27 2001 +++ linux-akpm/drivers/pci/pci.ids Sun Jul 22 01:36:21 2001 @@ -1561,8 +1561,8 @@ 10b7 1000 3C905C-TX Fast Etherlink for PC Management NIC 9800 3c980-TX [Fast Etherlink XL Server Adapter] 10b7 9800 3c980-TX Fast Etherlink XL Server Adapter - 9805 3c980-TX 10/100baseTX NIC [Python-T] - 10b7 9805 3c980 10/100baseTX NIC [Python-T] + 9805 3c982 Dual Port Server Cyclone + 10b7 9805 3c982 Dual Port Server Cyclone 10b8 Standard Microsystems Corp [SMC] 0005 83C170QF 1055 e000 LANEPIC --- linux-2.4.7/Documentation/networking/vortex.txt Wed Jul 4 18:21:24 2001 +++ linux-akpm/Documentation/networking/vortex.txt Sun Jul 22 01:37:40 2001 @@ -88,7 +88,7 @@ options=N1,N2,N3,... The individual options are composed of a number of bitfields which have the following meanings: - ssible media type settings + Possible media type settings 0 10baseT 1 10Mbs AUI 2 undefined @@ -104,8 +104,11 @@ options=N1,N2,N3,... When generating a value for the 'options' setting, the above media selection values may be OR'ed (or added to) the following: - 512 (0x200) Force full duplex mode. - 16 (0x10) Bus-master enable bit (Old Vortex cards only) + 0x8000 Set driver debugging level to 7 + 0x4000 Set driver debugging level to 2 + 0x0400 Enable Wake-on-LAN + 0x0200 Force full duplex mode. + 0x0010 Bus-master enable bit (Old Vortex cards only) For example: @@ -359,6 +362,11 @@ steps you should take: 8K byte-wide RAM 5:3 Rx:Tx split, autoselect/Autonegotiate interface. 
MII transceiver found at address 24, status 782d. Enabling bus-master transmits and whole-frame receives. + + NOTE: You must provide the `debug=2' modprobe option to generate + a full detection message. Please do this: + + modprobe 3c59x debug=2 o If it is a PCI device, the relevant output from 'lspci -vx', eg: From owner-netdev@oss.sgi.com Mon Jul 23 04:28:22 2001 Received: (from majordomo@localhost) by oss.sgi.com (8.11.2/8.11.3) id f6NBSMw22885 for netdev-outgoing; Mon, 23 Jul 2001 04:28:22 -0700 Received: from shell.cyberus.ca (shell.cyberus.ca [209.195.95.7]) by oss.sgi.com (8.11.2/8.11.3) with SMTP id f6NBSKV22882 for ; Mon, 23 Jul 2001 04:28:20 -0700 Received: from localhost (hadi@localhost) by shell.cyberus.ca (8.9.3/666/Cyberus Online Inc.) with ESMTP id HAA05409; Mon, 23 Jul 2001 07:26:08 -0400 (EDT) X-Authentication-Warning: shell.cyberus.ca: hadi owned process doing -bs Date: Mon, 23 Jul 2001 07:26:08 -0400 (EDT) From: jamal To: Andi Kleen cc: Yann Dupont , Subject: Re: dst cache overflow on 2.2.16 kernel. In-Reply-To: <20010717173005.A23851@fred.local> Message-ID: MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII Sender: owner-netdev@oss.sgi.com Precedence: bulk Content-Length: 1193 Lines: 40 He might be having problems with route table not getting enough nh entries because of small neigh tables. obviously FW1 is doing something weird: Yann, try to increment the sizes of the arp tables, example: echo 8192 > /proc/sys/net/ipv4/neigh/default/gc_thresh3 echo 4096 > /proc/sys/net/ipv4/neigh/default/gc_thresh2 echo 1024 > /proc/sys/net/ipv4/neigh/default/gc_thresh1 Or use higher values if you want cheers, jamal On Tue, 17 Jul 2001, Andi Kleen wrote: > On Mon, Jul 16, 2001 at 01:59:56PM +0200, Yann Dupont wrote: > > > > Hello. We have a firewall here (Checkpoint FW 1), installed on a RH 6.2 > > > > Every week or so, the FW logs this error : dst cache overflow > > and the routing stops. 
> > > > Is changing the value of /proc/sys/net/ipv4/route (actually set to 4096) > > a way to prevent this ? OR is this a kernel bug with this 2.2.16 realease ? > > > > I CAN'T change the kernel, nor the distro, as the whole is under > > contract ... and validated for this exact combination :( > > You can increase the /proc/sys/net/ipv4/route/gc_thresh sysctl trying to > work around it, but likely it's a bug in the FW-1 kernel module. > I would talk to Checkpoint. > > > -Andi > From owner-netdev@oss.sgi.com Mon Jul 23 05:23:27 2001 Received: (from majordomo@localhost) by oss.sgi.com (8.11.2/8.11.3) id f6NCNRf26155 for netdev-outgoing; Mon, 23 Jul 2001 05:23:27 -0700 Received: from dea.waldorf-gmbh.de (u-123-10.karlsruhe.ipdial.viaginterkom.de [62.180.10.123]) by oss.sgi.com (8.11.2/8.11.3) with SMTP id f6NCNPV26149 for ; Mon, 23 Jul 2001 05:23:25 -0700 Received: (from ralf@localhost) by dea.waldorf-gmbh.de (8.11.1/8.11.1) id f6NBrEj01458; Mon, 23 Jul 2001 13:53:14 +0200 Date: Mon, 23 Jul 2001 13:53:14 +0200 From: Ralf Baechle To: netdev@oss.sgi.com Cc: mm@bofh.de Subject: ping bug Message-ID: <20010723135313.A1439@bacchus.dhis.org> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline User-Agent: Mutt/1.2.5i X-Accept-Language: de,en,fr Sender: owner-netdev@oss.sgi.com Precedence: bulk Content-Length: 459 Lines: 11 Forwarding this from sombody who reported to me in private. The following happened with the ping from Redhat (unknown version). It only happens under rather extreme circumstance on a network simulator: > 1458 packets transmitted, 1000 packets received, 31% packet loss > round-trip min/avg/max/mdev = 4668.765/379.140/4683.875/4658.706 ms ^^^^^^^^ ^^^^^^^^ Seems the computation is a bit on crack. 
Ralf From owner-netdev@oss.sgi.com Mon Jul 23 06:44:53 2001 Received: (from majordomo@localhost) by oss.sgi.com (8.11.2/8.11.3) id f6NDirA31352 for netdev-outgoing; Mon, 23 Jul 2001 06:44:53 -0700 Received: from muscan (muscan.mahidol.ac.th [202.28.162.15]) by oss.sgi.com (8.11.2/8.11.3) with SMTP id f6NDipV31348 for ; Mon, 23 Jul 2001 06:44:52 -0700 Message-Id: <200107231344.f6NDipV31348@oss.sgi.com> Date: Mon, 23 Jul 2001 20:33:32 +0100 From: postmaster@oss.sgi.com To: Subject: InterScan NT Alert Sender: owner-netdev@oss.sgi.com Precedence: bulk Content-Length: 260 Lines: 9 Receiver, InterScan has detected virus(es) in the e-mail attachment. Date: Mon, 23 Jul 2001 20:33:32 +0100 Method: Mail From: To: File: TABLAS.doc.com Action: clean failed - deleted Virus: TROJ_SIRCAM.A From owner-netdev@oss.sgi.com Mon Jul 23 07:01:46 2001 Received: (from majordomo@localhost) by oss.sgi.com (8.11.2/8.11.3) id f6NE1kx00545 for netdev-outgoing; Mon, 23 Jul 2001 07:01:46 -0700 Received: from mx3out.umbc.edu (mx3out.umbc.edu [130.85.253.53]) by oss.sgi.com (8.11.2/8.11.3) with SMTP id f6NE1jV00542 for ; Mon, 23 Jul 2001 07:01:46 -0700 Received: from porty (andy@proxima.cs.umbc.edu [130.85.100.17]) by mx3out.umbc.edu (8.11.3/8.11.3) with SMTP id f6NE1hb10799 for ; Mon, 23 Jul 2001 10:01:44 -0400 (EDT) Date: Mon, 23 Jul 2001 10:01:43 -0400 From: "Lego Andy" To: netdev@oss.sgi.com Subject: Re: InterScan NT Alert Message-Id: <20010723100143.45c80434.me@andy.cx> In-Reply-To: <200107231344.f6NDipV31348@oss.sgi.com> References: <200107231344.f6NDipV31348@oss.sgi.com> Organization: X0 X-Mailer: Sylpheed version 0.5.1 (GTK+ 1.2.10; i386-debian-linux-gnu) Mime-Version: 1.0 Content-Type: text/plain; charset=US-ASCII Content-Transfer-Encoding: 7bit Sender: owner-netdev@oss.sgi.com Precedence: bulk Content-Length: 520 Lines: 14 > Receiver, InterScan has detected virus(es) in the e-mail attachment. > File: TABLAS.doc.com > Action: clean failed - deleted He... Do not use Windows, I guess... 
Andy -- /\ | | |~\ \ / ------------------------------------------------ / \ |\ | | | \/ / e-mail: andy@x0.org )\._.,--....,'``. |--| | \| | | / / ICQ: 27889915 /, _.. \ _\ (`._ ,. | | | | |_/ / / http://andy.x0.org `._.-(,_..'--(,_..'`-.;.' ---------------------- From owner-netdev@oss.sgi.com Mon Jul 23 10:12:13 2001 Received: (from majordomo@localhost) by oss.sgi.com (8.11.2/8.11.3) id f6NHCDv11670 for netdev-outgoing; Mon, 23 Jul 2001 10:12:13 -0700 Received: from ms2.inr.ac.ru (minus.inr.ac.ru [193.233.7.97]) by oss.sgi.com (8.11.2/8.11.3) with SMTP id f6NHBuV11659 for ; Mon, 23 Jul 2001 10:11:57 -0700 Received: from mops.inr.ac.ru (mops.inr.ac.ru [193.233.7.60]) by ms2.inr.ac.ru (8.6.13/ANK) with ESMTP id VAA06292; Mon, 23 Jul 2001 21:11:47 +0400 Received: (from kuznet@localhost) by mops.inr.ac.ru (8.9.3/8.9.3) id BAA07086; Mon, 23 Jul 2001 01:15:39 +0400 Message-Id: <200107222115.BAA07086@mops.inr.ac.ru> Subject: Re: static routes and dead gateway detection To: ja@ssi.bg (Julian Anastasov) Date: Mon, 23 Jul 2001 01:15:39 +0400 (MSD) Cc: netdev@oss.sgi.com In-Reply-To: from "Julian Anastasov" at Jun 27, 1 03:03:52 pm From: Alexey Kuznetsov X-Mailer: ELM [version 2.4 PL24] MIME-Version: 1.0 Sender: owner-netdev@oss.sgi.com Precedence: bulk Content-Length: 1535 Lines: 53 Hello! > In the current kernels I see that all routes are deleted when > a device goes down (!IFF_UP -> NETDEV_DOWN). Yes. > I can't find a place > where the proto static routes are used. proto is not used by kernel (except for marking routes created by itself with proto "kernel"), it is used by routing daemons, namely, gated. > So, I implemented a way to make the proto static routes permanent. Not so bad idea. Only pretty useless one, gated and brothers do this nicely. Implementation is wrong, but you will get this effect using code under #ifdef CONFIG_IP_ROUTE_MULTIPATH for normal routes. > kernel(s). It is for 2.2 and can be ported to 2.4 too. 
How these RTPROT > codes are really used in the routing daemons and do they use static > routes too? Look into gated manual, it explains difference of routes with RTF_STATIC (BSD term). > What I see as problem even in the plain 2.2.19 kernel is that > when one device for one of the nexthops (when the prefsrc is not from > this device) is removed and Sorry, you did something wrong here. On unregister you must destroy all the references to this device. Being unregistered, the device disappears forever and cannot return. > added again it can receive another dev index Full non-sense. "This" device cannot get another index, index is the only thing distinguishing devices. > The patch contains a fix in fib_sync_up() about similar problem, i.e. > not to touch nh_dev for DEAD routes. Do not leave undefined crap there, that's answer. :-) Alexey From owner-netdev@oss.sgi.com Mon Jul 23 13:17:58 2001 Received: (from majordomo@localhost) by oss.sgi.com (8.11.2/8.11.3) id f6NKHwa01378 for netdev-outgoing; Mon, 23 Jul 2001 13:17:58 -0700 Received: from sgi.com (sgi.SGI.COM [192.48.153.1]) by oss.sgi.com (8.11.2/8.11.3) with SMTP id f6NKHqX01372 for ; Mon, 23 Jul 2001 13:17:52 -0700 Received: from nimue.bos.bindview.com ([192.233.133.108]) by sgi.com (980327.SGI.8.8.8-aspam/980304.SGI-aspam: SGI does not authorize the use of its proprietary systems or networks for unsolicited or bulk email from the Internet.) 
via ESMTP id MAA08300 for ; Mon, 23 Jul 2001 12:40:17 -0700 (PDT) mail_from (lcamtuf@gis.net) Received: from localhost (lcamtuf@localhost) by nimue.bos.bindview.com (8.11.0/8.11.0) with ESMTP id f6NJawo02428; Mon, 23 Jul 2001 15:36:59 -0400 X-Authentication-Warning: nimue.bos.bindview.com: lcamtuf owned process doing -bs Date: Mon, 23 Jul 2001 15:36:58 -0400 (EDT) From: Michal Zalewski X-Sender: lcamtuf@nimue.bos.bindview.com To: jjciarla@raiz.uncu.edu.ar, netdev@oss.sgi.com, davem@redhat.com Subject: Linux IP masquerading helper vulnerability Message-ID: X-Nmymbofr: Nir Orb Buk MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII Sender: owner-netdev@oss.sgi.com Precedence: bulk Content-Length: 5394 Lines: 122 Hi folks, [ sorry for multiple recipients, just wanted to reach anybody ] [ doing Linux networking stuff as soon as possible... ] It seems to me there's a vulnerability in Linux 2.2 (2.0 as well, I presume) IP masquerading IRC DCC helper, which does not check for acceptable port ranges or IPs when examining DCC command packet. Other helpers do (e.g. ip_masq_ftp), which raises the bar for potential attacks a little bit. Short reminder why it is Bad Thing, plus an advisory draft addressing this issue (yes, I actually do it for living ;) is attached below. I'd appreciate your acknowledgement of this problem, and information on eventual fixes or, khem, what we consider, "vendor response statement" (basically, it is good to go for it if you disagree with some points). This is only a draft, and it isn't public for now, so no need to hurry and release it tonight, if confirmed :) I'd feel better if you can ping me before, just to have the advisory ready to go and in shape. Thanks in advance, -- _____________________________________________________ Michal Zalewski [lcamtuf@bos.bindview.com] [security] [http://lcamtuf.coredump.cx] <=-=> bash$ :(){ :|:&};: =-=> Did you know that clones never use mirrors? 
<=-=

Title: Linux kernel IP masquerading vulnerability
Author: Michal Zalewski
Date: July 23, 2001
Affected platforms: Linux 2.0, Linux 2.2, possibly other systems

Brief description:

Remotely exploitable IP masquerading vulnerability in Linux kernel can be used to penetrate protected private network.

Details:

Last year, there was a discussion on exploiting NAT packet inspection mechanisms on Linux and other operating systems in order to open an inbound TCP port on the firewall, by forcing HTTP client or MUA software to send specific data pattern without user's knowledge and will (see http://www.securityfocus.com/archive/82/50226). The original advisory by Mikael Olsson discussed FTP masquerading helper vulnerability.

This specific pattern sent by client software, when found in outbound traffic, is interpreted by the firewall as legitimate, user-initiated transfer request, and certain external machine is temporarily allowed to initiate inbound connections to the location specified in malicious packet by using the firewall as packet forwarder.

Appropriate (but not necessarily sufficient - see later explanations) workarounds were incorporated in Linux kernels released later and are present in numerous firewall operating systems.

Unfortunately, protocols other than mentioned in original discussions seem to be vulnerable, as well. We found that IRC DCC helper (the Linux 2.2 ip_masq_irc module, and modules shipped with some other operating systems / firewalling software) can be exploited with or similar pattern, depending on helper implementation details ("addr" is the internal machine's IP address as a decimal integer). There is no port or IP matching.

This sequence can be crafted in HTML e-mail or on visited webpage. The attacker should listen on remote host on port 6667, and generate valid FTP protocol responses. Then, he will receive information about port number on the firewall that will be forwarded into protected network.
See discussion in original advisory for more details on this attack.

Workarounds:

This new NAT server vulnerability related to DCC simply adds to the collection of similar vulnerabilities in other protocols, none of which have been fixed in a comprehensive way. In general, the following five types of workarounds might be used:

1) Have the NAT server allow only a certain range of ports in processed requests. This workaround (only ports above 1024 are allowed) is currently implemented by Linux and other vendors. Unfortunately, this does not stop attacks or scans against thousands of high-port services - most popular of them are msrdp, iad1-3, lotusnotes, jetdirect, afs3-bos, ms-sql-s, venus-se, instl_bootc, nfa, sun-answerbook, "undocumented" rpc on Solaris, xaudio, nfs, lockd, X11, dtspc, font-service, callbook and many others - database engines, management subsystems, etc.

2) Have the firewall do more careful inspection of protocol traffic. This could identify and block noncompliant IRC client behavior, such as the behavior of an HTML e-mail client when accessing an ftp URL. Unfortunately, this requires very careful protocol tracking, and can be fooled by careful URL construction (e.g. passing the following string as ftp server username: "evilhacker%20+iw%20evilhacker%20evilhacker%0d" "%0anick%20hacker") and response fragmentation, or using a Java applet.

3) Use a personal firewall (e.g., ZoneAlarm) on the internal machine that asks the user before connecting to an unusual port (6667) or before accepting suspected forwarded connection. Suitable personal firewalls may not be available for every OS.

4) Research, design, and develop some way for the NAT server to ask the internal user whether he really requested an inbound port (e.g. one-time challenge-response authentication).

5) Don't have any helper modules on your NAT server. For any protocol that needs a helper, require users to deploy a tunnel instead.
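
The port and IP matching that the advisory says the IRC DCC helper lacks, combined with the port floor from workaround 1, might look roughly like the C sketch below. This is an illustration only and not the ip_masq_irc code: the function name dcc_target_allowed(), the constant DCC_PORT_FLOOR, and the assumption that the address and port have already been parsed out of the DCC request into host byte order are all hypothetical.

```c
#include <stdint.h>

/* Assumed policy value: the advisory says "only ports above 1024 are allowed". */
#define DCC_PORT_FLOOR 1024u

/*
 * Hypothetical check a masquerading helper could apply before opening
 * an inbound forward for a DCC request.
 *
 *   dcc_addr      - address announced inside the DCC request (host order)
 *   dcc_port      - port announced inside the DCC request (host order)
 *   masq_src_addr - source address of the internal host that sent the packet
 *
 * Returns 1 if the forward may be created, 0 if it should be refused.
 */
static int dcc_target_allowed(uint32_t dcc_addr, uint16_t dcc_port,
                              uint32_t masq_src_addr)
{
    /* IP matching: only forward to the machine that actually asked. */
    if (dcc_addr != masq_src_addr)
        return 0;

    /* Port matching: refuse well-known ports at or below the floor. */
    if (dcc_port <= DCC_PORT_FLOOR)
        return 0;

    return 1;
}
```

As workaround 1 already points out, a check of this kind only raises the bar: services listening on high ports of the requesting machine remain reachable.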
From owner-netdev@oss.sgi.com Mon Jul 23 13:30:00 2001 Received: (from majordomo@localhost) by oss.sgi.com (8.11.2/8.11.3) id f6NKU0602058 for netdev-outgoing; Mon, 23 Jul 2001 13:30:00 -0700 Received: from sgi.com (sgi.SGI.COM [192.48.153.1]) by oss.sgi.com (8.11.2/8.11.3) with SMTP id f6NKTxX02047 for ; Mon, 23 Jul 2001 13:30:00 -0700 Received: from dea.waldorf-gmbh.de (u-223-19.karlsruhe.ipdial.viaginterkom.de [62.180.19.223]) by sgi.com (980327.SGI.8.8.8-aspam/980304.SGI-aspam: SGI does not authorize the use of its proprietary systems or networks for unsolicited or bulk email from the Internet.) via ESMTP id LAA08323 for ; Mon, 23 Jul 2001 11:07:03 -0700 (PDT) mail_from (ralf@dea.waldorf-gmbh.de) Received: (from ralf@localhost) by dea.waldorf-gmbh.de (8.11.1/8.11.1) id f6NI6ik02910; Mon, 23 Jul 2001 20:06:44 +0200 Date: Mon, 23 Jul 2001 20:06:44 +0200 From: Ralf Baechle To: Lego Andy Cc: netdev@oss.sgi.com Subject: Re: InterScan NT Alert Message-ID: <20010723200644.A2878@bacchus.dhis.org> References: <200107231344.f6NDipV31348@oss.sgi.com> <20010723100143.45c80434.me@andy.cx> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline User-Agent: Mutt/1.2.5i In-Reply-To: <20010723100143.45c80434.me@andy.cx>; from me@andy.cx on Mon, Jul 23, 2001 at 10:01:43AM -0400 X-Accept-Language: de,en,fr Sender: owner-netdev@oss.sgi.com Precedence: bulk Content-Length: 469 Lines: 13 On Mon, Jul 23, 2001 at 10:01:43AM -0400, Lego Andy wrote: > > Receiver, InterScan has detected virus(es) in the e-mail attachment. > > File: TABLAS.doc.com > > Action: clean failed - deleted > > He... Do not use Windows, I guess... Sigh... That's what you get for not babysitting the list for a few hours. There was a huge number of copies of this virus still in the queue. I've zapped them so hopefully many subscribers shouldn't see all of this junk. 
Ralf From owner-netdev@oss.sgi.com Mon Jul 23 15:12:23 2001 Received: (from majordomo@localhost) by oss.sgi.com (8.11.2/8.11.3) id f6NMCNM13404 for netdev-outgoing; Mon, 23 Jul 2001 15:12:23 -0700 Received: from titan.bieringer.de (mail.bieringer.de [195.226.187.51]) by oss.sgi.com (8.11.2/8.11.3) with SMTP id f6NMCL713401 for ; Mon, 23 Jul 2001 15:12:21 -0700 Received: (qmail 10401 invoked from network); 23 Jul 2001 22:12:14 -0000 Received: from pd950f522.dip.t-dialin.net (HELO worker.muc.bieringer.de) (217.80.245.34) by mail.bieringer.de with SMTP; 23 Jul 2001 22:12:14 -0000 Date: Tue, 24 Jul 2001 00:13:03 +0200 From: Peter Bieringer To: netdev@oss.sgi.com Subject: Re: Octava Clase Message-ID: <140100000.995926383@localhost> In-Reply-To: <200107232059.f6NKxOT17303@ultra5.unlm.edu.ar> References: <200107232059.f6NKxOT17303@ultra5.unlm.edu.ar> X-Mailer: Mulberry/2.1.0b2 (Linux/x86) MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Transfer-Encoding: 7bit Content-Disposition: inline Sender: owner-netdev@oss.sgi.com Precedence: bulk Content-Length: 589 Lines: 20 Hi, please set "ultra5.unlm.edu.ar" immediatly on your incoming blocklist. Received: from ultra5.unlm.edu.ar (ultra5.unlm.edu.ar [170.210.32.160]) by oss.sgi.com (8.11.2/8.11.3) with SMTP id f6NL25b03645 for ; Mon, 23 Jul 2001 14:02:05 -0700 --On Monday, July 23, 2001 06:00:53 PM -0300 T wrote: This server is used as outgoing relay (two stage) during this day not only on this list, also on my personal e-mailserver. It's currently only listed on following blocklist: 553 Open relay. 
Please see http://orbz.org/?170.210.32.160 Peter From owner-netdev@oss.sgi.com Mon Jul 23 16:09:40 2001 Received: (from root@localhost) by oss.sgi.com (8.11.2/8.11.3) id f6NN9eq22618 for netdev-outgoing@oss.sgi.com; Mon, 23 Jul 2001 16:09:40 -0700 Resent-Message-Id: <200107232309.f6NN9eq22618@oss.sgi.com> Received: (from majordomo@localhost) by oss.sgi.com (8.11.2/8.11.3) id f6NMQxj15424 for netdev-outgoing; Mon, 23 Jul 2001 15:26:59 -0700 Received: from sgi.com (sgi.SGI.COM [192.48.153.1]) by oss.sgi.com (8.11.2/8.11.3) with SMTP id f6NMQw715420 for ; Mon, 23 Jul 2001 15:26:58 -0700 Received: from u.domain.uli (ja.ssi.bg [212.95.166.64]) by sgi.com (980327.SGI.8.8.8-aspam/980304.SGI-aspam: SGI does not authorize the use of its proprietary systems or networks for unsolicited or bulk email from the Internet.) via ESMTP id NAA01269 for ; Mon, 23 Jul 2001 13:42:49 -0700 (PDT) mail_from (ja@ssi.bg) Received: from localhost (IDENT:ja@localhost [127.0.0.1]) by u.domain.uli (8.11.0/8.11.0) with ESMTP id f6NNKr501250; Mon, 23 Jul 2001 23:20:53 GMT Date: Mon, 23 Jul 2001 23:20:53 +0000 (GMT) From: Julian Anastasov X-X-Sender: To: Alexey Kuznetsov cc: Subject: Re: static routes and dead gateway detection In-Reply-To: <200107222115.BAA07086@mops.inr.ac.ru> Message-ID: MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII Sender: owner-netdev@oss.sgi.com Precedence: bulk Resent-From: root@oss.sgi.com Resent-Date: Mon, 23 Jul 2001 16:09:40 -0700 Resent-To: netdev-outgoing@oss.sgi.com Content-Length: 4331 Lines: 107 Hello Alexey, On Mon, 23 Jul 2001, Alexey Kuznetsov wrote: > > I can't find a place > > where the proto static routes are used. > > proto is not used by kernel (except for marking routes created > by itself with proto "kernel"), it is used by routing daemons, > namely, gated. Yes, while writing the patch I found it in gated, I looked in zebra too and I see that the static routes are not changed from these daemons. 
> > So, I implemented a way to make the proto static routes permanent.
>
> Not so bad idea. Only pretty useless one, gated and brothers do this nicely.

I have setups with many rules and routes, in different tables, so I have to check whether all known daemons support multiple tables... But with such a patch the kernel becomes a nice healthchecking daemon :))

> Implementation is wrong, but you will get this effect using code under
> #ifdef CONFIG_IP_ROUTE_MULTIPATH for normal routes.

This patch works nicely but maybe I'm missing something ... When I delete devices the routes are flushed successfully (particularly the dead ones). So far, I haven't found any problems (a month or so).

> > kernel(s). It is for 2.2 and can be ported to 2.4 too. How these RTPROT
> > codes are really used in the routing daemons and do they use static
> > routes too?
>
> Look into gated manual, it explains difference of routes with
> RTF_STATIC (BSD term).

Yes, I have read some docs on this issue.

> > What I see as problem even in the plain 2.2.19 kernel is that
> > when one device for one of the nexthops (when the prefsrc is not from
> > this device) is removed and
> >
> Sorry, you did something wrong here. On unregister you must destroy all the
> references to this device. Being unregistered, the device disappears
> forever and cannot return.

In this patch the dead gateways are flushed at exactly the same time as all other (non-static) dead routes, i.e. when the device is removed (fib_num_down_nh_devs, sorry for the bad func name, it is incorrect, maybe the comments too). This ugly function tries to distinguish whether a prefsrc was deleted or the device was marked down. We try to preserve the dead routes only when the device is marked down. And not only for multipath routes.

The real problem is that multipath routes can contain dead paths. This is not mine, I see it as a 2.2 (maybe 2.4 too) problem with non-permanent devices.
When the device is removed I can see that the route contains dead nexthop looking like: default nexthop via A.A.A.A dev ifXXX weight 1 dead nexthop via B.B.B.B dev eth0 weight 1 these "ifXXX" device names printed from the iproute's ll_idx_n2a function mean the device does not exist and this is true. Later (after adding the new device) tcpdump shows these ifXXX names :) This is funny and I have to check the reason. May be the old dev index is inherited from the route. In any case, currently, the multipath route recreating is mandatory after a device from this route is removed. I do this in my setup: the routes are recreated after the devices are recreated. > > added again it can receive another dev index > > Full non-sense. "This" device cannot get another index, index > is the only thing distinguishing devices. Yes, the device is deleted but the nexthops remain because the multipath route is autodeleted when all nexthops are dead. I don't claim that when the device is created again, the "ifXXX" is replaced with the previous name. It remains "ifXXX". This is the reason I'm talking about nh_ifname or similar solution. Then the new device can replace the old one, by name. Then the route not need to be recreated. But this can open another discussion, may be. > > The patch contains a fix in fib_sync_up() about similar problem, i.e. > > not to touch nh_dev for DEAD routes. > > Do not leave undefined crap there, that's answer. :-) You'll correct me if I'm wrong :) For such devs, nh_dev points to a crap (when the device is removed). This is a multipath route. If the crap nh_dev->flags&IFF_UP is true we reach to nh->nh_dev != dev. nh_dev != dev can fail only when a new device allocates the same space. 
Can't happen but 0.0001% is possible :) This is the reason I'm comparing the if indexes but you can argue that ifindex can wrap :) > Alexey Regards -- Julian Anastasov From owner-netdev@oss.sgi.com Tue Jul 24 01:46:27 2001 Received: (from majordomo@localhost) by oss.sgi.com (8.11.2/8.11.3) id f6O8kRM25834 for netdev-outgoing; Tue, 24 Jul 2001 01:46:27 -0700 Received: from popeye.ipv6.univ-nantes.fr (postfix@popeye.ipv6.univ-nantes.fr [193.52.101.20]) by oss.sgi.com (8.11.2/8.11.3) with SMTP id f6O8kPO25830 for ; Tue, 24 Jul 2001 01:46:25 -0700 Received: from olive.ipv6.univ-nantes.fr (olive.ipv6.univ-nantes.fr [193.52.101.22]) by popeye.ipv6.univ-nantes.fr (Postfix) with ESMTP id DD556668; Tue, 24 Jul 2001 10:46:23 +0200 (CEST) Subject: Re: dst cache overflow on 2.2.16 kernel. From: Yann Dupont To: jamal Cc: Andi Kleen , Yann Dupont , netdev@oss.sgi.com In-Reply-To: References: Content-Type: text/plain Content-Transfer-Encoding: 7bit X-Mailer: Evolution/0.11 (Beta Release) Date: 24 Jul 2001 10:46:23 +0200 Message-Id: <995964383.31608.42.camel@olive> Mime-Version: 1.0 Sender: owner-netdev@oss.sgi.com Precedence: bulk On 23 Jul 2001 07:26:08 -0400, jamal wrote: > Thanks to all for all the advices. > He might be having problems with route table not getting enough nh entries > because of small neigh tables. > > obviously FW1 is doing something weird: > Yann, try to increment the sizes of the arp tables, example: > > echo 8192 > /proc/sys/net/ipv4/neigh/default/gc_thresh3 > echo 4096 > /proc/sys/net/ipv4/neigh/default/gc_thresh2 > echo 1024 > /proc/sys/net/ipv4/neigh/default/gc_thresh1 > I'd like to understand the meaning of gc_thresh1,2,3 ... gc_thresh is something like garbage collection threshold ? So what's the meaning of 1,2,3 ? > Or use higher values if you want > For the moment, the increase of /proc/sys/net/ipv4/route/max_size and /proc/sys/net/ipv4/route/gc_thresh seems to work OK. 
But as we're on holiday now, there are not a lot of students to stress the firewall.

Anyway, strictly speaking of the kernel: what's annoying me is that I don't really understand the meanings of the values I manipulate; I'm not sure those modifications are of any help.

route --cache shows a very large number of entries - 95% of those entries show the same destination network - in fact, a lot of subnets of a class C network. Is there any other command that can show me the values/saturation of the network tables?

This leads me to believe there's something broken in the firewall configuration (and I can't do anything about that).

Anyway, the 3rd party who installed the firewall saw that the checkpoint module is not up to date ... So now I just have to wait for them to take action...

> cheers,

Thanks again...

Yann Dupont.
--
 \|/ ____ \|/  Fac. des sciences de Nantes-Linux-Python-IPv6-ATM-BONOM....
 "@'/ ,. \@"   Tel :(+33) [0]251125865 [0]251125868(Fax)
 /_| \__/ |_\  Yann.Dupont@sciences.univ-nantes.fr
    \__U_/     http://www.unantes.univ-nantes.fr/~dupont

From owner-netdev@oss.sgi.com Tue Jul 24 10:46:24 2001 Received: (from majordomo@localhost) by oss.sgi.com (8.11.2/8.11.3) id f6OHkOX26779 for netdev-outgoing; Tue, 24 Jul 2001 10:46:24 -0700 Received: from ms2.inr.ac.ru (minus.inr.ac.ru [193.233.7.97]) by oss.sgi.com (8.11.2/8.11.3) with SMTP id f6OHkLO26776 for ; Tue, 24 Jul 2001 10:46:21 -0700 Received: (from kuznet@localhost) by ms2.inr.ac.ru (8.6.13/ANK) id VAA29796; Tue, 24 Jul 2001 21:43:40 +0400 From: kuznet@ms2.inr.ac.ru Message-Id: <200107241743.VAA29796@ms2.inr.ac.ru> Subject: Re: static routes and dead gateway detection To: ja@ssi.bg (Julian Anastasov) Date: Tue, 24 Jul 2001 21:43:39 +0400 (MSK DST) Cc: netdev@oss.sgi.com In-Reply-To: from "Julian Anastasov" at Jul 23, 1 11:20:53 pm X-Mailer: ELM [version 2.4 PL24] MIME-Version: 1.0 Sender: owner-netdev@oss.sgi.com Precedence: bulk

Hello!
> But with such patch the kernel becomes a nice healthchecking daemon :))

Moving gated to kernel is not a nice idea.

> In this patch the dead gateways are flushed exactly at the same
> time when all other (non-static) dead routes are flushed, i.e. when the
> device is removed (fib_num_down_nh_devs, sorry for the bad func name, it is
> incorrect, may be the comments too). This ugly function tries to
> distinguish whether a prefsrc was deleted or the device was marked down.
> We try to preserve the dead routes only when the device is marked down.
> And not only for multipath routes.

fib_sync_down() does exactly the same thing. Just remove the check for multipath, uncomment fib_sync_up() and live in peace.

> The real problem is that the multipath routes can contain dead
> paths. This is not mine, I see it as a 2.2 (may be 2.4 too)

I see, this is a bug and this bug is fatal for 2.4. Sigh... need to fix.

> about nh_ifname or similar solution. Then the new device can replace the
> old one, by name.

"name" is not something distinguishing devices. Index is.

> You'll correct me if I'm wrong :)

You are right, unfortunately. nh_dev must be cleared on device unregister, otherwise the kernel will lock up at module unload.
Alexey From owner-netdev@oss.sgi.com Tue Jul 24 13:10:35 2001 Received: (from majordomo@localhost) by oss.sgi.com (8.11.2/8.11.3) id f6OKAZ000429 for netdev-outgoing; Tue, 24 Jul 2001 13:10:35 -0700 Received: from sgi.com (sgi.SGI.COM [192.48.153.1]) by oss.sgi.com (8.11.2/8.11.3) with SMTP id f6OKAXO00423 for ; Tue, 24 Jul 2001 13:10:34 -0700 Received: from u.domain.uli (ja.ssi.bg [212.95.166.64]) by sgi.com (980327.SGI.8.8.8-aspam/980304.SGI-aspam: SGI does not authorize the use of its proprietary systems or networks for unsolicited or bulk email from the Internet.)
via ESMTP id NAA00473 for ; Tue, 24 Jul 2001 13:10:14 -0700 (PDT) mail_from (ja@ssi.bg) Received: from localhost (IDENT:ja@localhost [127.0.0.1]) by u.domain.uli (8.11.0/8.11.0) with ESMTP id f6ON1KK01567; Tue, 24 Jul 2001 23:01:20 GMT Date: Tue, 24 Jul 2001 23:01:20 +0000 (GMT) From: Julian Anastasov X-X-Sender: To: cc: Subject: Re: static routes and dead gateway detection In-Reply-To: <200107241743.VAA29796@ms2.inr.ac.ru> Message-ID: MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII Sender: owner-netdev@oss.sgi.com Precedence: bulk Hello Alexey, On Tue, 24 Jul 2001 kuznet@ms2.inr.ac.ru wrote: > Hello! > > > But with such patch the kernel becomes a nice healthchecking daemon :)) > > Moving gated to kernel is not a nice idea. Agreed. The patch size is different, though :) > > In this patch the dead gateways are flushed exactly at the same > > time when all other (non-static) dead routes are flushed, i.e. when the > > device is removed (fib_num_down_nh_devs, sorry for the bad func name, it is > > incorrect, may be the comments too). This ugly function tries to > > distinguish whether a prefsrc was deleted or the device was marked down. > > We try to preserve the dead routes only when the device is marked down. > > And not only for multipath routes. > > fib_sync_down() makes exactly the same thing. Just remove check > for multipath, uncomment fib_sync_up() and live in peace. Hm, I can't understand this trick. fib_flush always follows fib_sync_down (where the DEAD flag is correctly set). But fib_flush makes its decisions based on the same flag (fn_flush_list). I can't see how fib_sync_up will detect such routes, they are already flushed. The patch only adds checks in fn_flush_list, I don't see another solution, for now. It seems only in fn_flush_list the dead routes can be preserved if they are marked as result of device state change. > > about nh_ifname or similar solution. Then the new device can replace the > > old one, by name. 
> > "name" is not something distinguishing devices. Index is. I understand, I only note that similar trick works for ip rules :) I understand also that the implementation for routes (if reasonable) can lead to big changes ... > Alexey Regards -- Julian Anastasov From owner-netdev@oss.sgi.com Tue Jul 24 15:00:18 2001 Received: (from majordomo@localhost) by oss.sgi.com (8.11.2/8.11.3) id f6OM0ID06584 for netdev-outgoing; Tue, 24 Jul 2001 15:00:18 -0700 Received: from sj-msg-core-1.cisco.com (sj-msg-core-1.cisco.com [171.71.163.11]) by oss.sgi.com (8.11.2/8.11.3) with SMTP id f6OM0GO06581 for ; Tue, 24 Jul 2001 15:00:17 -0700 Received: from kaspit.cisco.com (kaspit.cisco.com [144.254.91.49]) by sj-msg-core-1.cisco.com (8.11.3/8.9.1) with ESMTP id f6OM0Ag14126 for ; Tue, 24 Jul 2001 15:00:10 -0700 (PDT) Received: from jacoba1nt (jacoba-isdn-home.cisco.com [10.49.113.226]) by kaspit.cisco.com (Mirapoint) with SMTP id AKI03882; Wed, 25 Jul 2001 01:00:07 +0300 (GMT-3) From: "Jacob Avraham" To: "Network Development List" Subject: conflicting alignment requirements Date: Wed, 25 Jul 2001 00:57:37 +0200 Message-ID: MIME-Version: 1.0 Content-Type: text/plain; charset="windows-1255" Content-Transfer-Encoding: 7bit X-Priority: 3 (Normal) X-MSMail-Priority: Normal X-Mailer: Microsoft Outlook IMO, Build 9.0.2416 (9.0.2910.0) X-MimeOLE: Produced By Microsoft MimeOLE V5.50.4522.1200 Importance: Normal Sender: owner-netdev@oss.sgi.com Precedence: bulk I noticed that some ethernet drivers, like the tulip, require that the receive buffers be 4 byte align (I believe due to h/w constrains). On the other hand, some upper layer code, like tc (for non x86/68k), checks if the IP header is 4 byte align, and if not, doesn't handle the packet. So it looks like tc and tulip can not be used on other architectures. Has this been discussed before and if yes what was the outcome? In the short term, would that be OK to remove this restriction from tc? 
Thanks, Jacob From owner-netdev@oss.sgi.com Tue Jul 24 23:43:11 2001 Received: (from majordomo@localhost) by oss.sgi.com (8.11.2/8.11.3) id f6P6hBt31419 for netdev-outgoing; Tue, 24 Jul 2001 23:43:11 -0700 Received: from u.domain.uli (ja.ssi.bg [212.95.166.64]) by oss.sgi.com (8.11.2/8.11.3) with SMTP id f6P6h8O31411 for ; Tue, 24 Jul 2001 23:43:09 -0700 Received: from localhost (IDENT:ja@localhost [127.0.0.1]) by u.domain.uli (8.11.0/8.11.0) with ESMTP id f6P9OUu00738; Wed, 25 Jul 2001 09:24:32 GMT Date: Wed, 25 Jul 2001 09:24:30 +0000 (GMT) From: Julian Anastasov X-X-Sender: To: cc: Subject: Re: static routes and dead gateway detection In-Reply-To: <200107241743.VAA29796@ms2.inr.ac.ru> Message-ID: MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII Sender: owner-netdev@oss.sgi.com Precedence: bulk Hello Alexey, On Tue, 24 Jul 2001 kuznet@ms2.inr.ac.ru wrote: > fib_sync_down() makes exactly the same thing. Just remove check > for multipath, uncomment fib_sync_up() and live in peace. Oh, yes, the multipath check must be removed. I forgot it. Then fib_sync_up must be called for single-path routes too. 
> Alexey Regards -- Julian Anastasov From owner-netdev@oss.sgi.com Wed Jul 25 06:17:49 2001 Received: (from majordomo@localhost) by oss.sgi.com (8.11.2/8.11.3) id f6PDHnA02065 for netdev-outgoing; Wed, 25 Jul 2001 06:17:49 -0700 Received: from colin.muc.de (root@colin.muc.de [193.149.48.1]) by oss.sgi.com (8.11.2/8.11.3) with SMTP id f6PDHlO02060 for ; Wed, 25 Jul 2001 06:17:47 -0700 Received: by colin.muc.de id <140628-3>; Wed, 25 Jul 2001 15:18:05 +0200 Message-ID: <20010725151707.44709@colin.muc.de> Date: Wed, 25 Jul 2001 15:17:08 +0200 From: Andi Kleen To: Jacob Avraham Cc: Network Development List Subject: Re: conflicting alignment requirements References: Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii X-Mailer: Mutt 0.88e In-Reply-To: ; from Jacob Avraham on Wed, Jul 25, 2001 at 12:57:37AM +0200 Sender: owner-netdev@oss.sgi.com Precedence: bulk On Wed, Jul 25, 2001 at 12:57:37AM +0200, Jacob Avraham wrote: > I noticed that some ethernet drivers, like the tulip, > require that the receive buffers be 4 byte align > (I believe due to h/w constrains). > On the other hand, some upper layer code, like tc > (for non x86/68k), checks if the IP header is 4 byte align, > and if not, doesn't handle the packet. > So it looks like tc and tulip can not be used on other architectures. > > Has this been discussed before and if yes what was the outcome? > In the short term, would that be OK to remove this restriction > from tc? If some network stack code checks the alignment and reject unaligned packets it is a bug. Could you tell exactly which code does that? 
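The conflict described in this thread comes down to simple arithmetic: an Ethernet header is 14 bytes, so when the frame starts on a 4-byte boundary the IP header that follows sits at an address that is 2 modulo 4. A minimal sketch of that arithmetic, plus one portable way to read a 32-bit word from such an address instead of rejecting the packet, follows. The function names are hypothetical and this is plain userspace C, not the kernel's own helper.

```c
#include <stdint.h>
#include <string.h>

#define ETH_HEADER_LEN 14u /* dst MAC (6) + src MAC (6) + EtherType (2) */

/* Misalignment (0..3) of the IP header when the frame starts at `frame`. */
unsigned int ip_header_misalignment(const unsigned char *frame)
{
    return (unsigned int)(((uintptr_t)frame + ETH_HEADER_LEN) & 3u);
}

/*
 * Load a 32-bit word (in whatever byte order it is stored) from any
 * address, aligned or not. memcpy lets the compiler pick instructions
 * that are safe on architectures that fault on unaligned loads.
 */
uint32_t load_u32_unaligned(const unsigned char *p)
{
    uint32_t v;

    memcpy(&v, p, sizeof(v));
    return v;
}
```

With a frame buffer aligned to 4 bytes, ip_header_misalignment() reports 2; starting the frame 2 bytes into the buffer brings the IP header back to a 4-byte boundary, at the cost of misaligning the frame start that the hardware sees.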
-Andi From owner-netdev@oss.sgi.com Wed Jul 25 06:39:10 2001 Received: (from majordomo@localhost) by oss.sgi.com (8.11.2/8.11.3) id f6PDdAb04802 for netdev-outgoing; Wed, 25 Jul 2001 06:39:10 -0700 Received: from lust.cs.ohiou.edu (adsl-64-109-147-41.cleveland.oh.ameritech.net [64.109.147.41]) by oss.sgi.com (8.11.2/8.11.3) with SMTP id f6PDd8O04798 for ; Wed, 25 Jul 2001 06:39:08 -0700 Received: (from elb@localhost) by lust.cs.ohiou.edu (8.11.2/8.11.2) id f6PDd2Z02918; Wed, 25 Jul 2001 09:39:02 -0400 Date: Wed, 25 Jul 2001 09:39:02 -0400 From: Ethan Blanton To: linuxppc-dev@lists.linuxppc.org, netdev@oss.sgi.com Subject: airport reset on iBook2 Message-ID: <20010725093902.E2864@localhost.localdomain> Mail-Followup-To: linuxppc-dev@lists.linuxppc.org, netdev@oss.sgi.com Mime-Version: 1.0 Content-Type: multipart/signed; micalg=pgp-sha1; protocol="application/pgp-signature"; boundary="11Y7aswkeuHtSBEs" Content-Disposition: inline User-Agent: Mutt/1.2.5i X-Operating-System: Linux X-GnuPG-Fingerprint: A290 14A8 C682 5C88 AE51 4787 AFD9 00F4 883C 1C14 Sender: owner-netdev@oss.sgi.com Precedence: bulk --11Y7aswkeuHtSBEs Content-Type: text/plain; charset=us-ascii Content-Disposition: inline Content-Transfer-Encoding: quoted-printable Hey, I don't know if this is a known problem (I haven't seen it mentioned) or not, but I'm getting the following error from my Airport card reasonably often: Jul 25 09:27:50 localhost kernel: NETDEV WATCHDOG: eth1: transmit timed out Jul 25 09:27:50 localhost kernel: eth1: Tx timeout! Resetting card. It carries with it the undesirable consequence that sometimes all network traffic stalls for anywhere from 1-10 (subjectively) seconds. This is probably happening on the order of every 3-5 hours, but when it happens it often happens several times in the span of a few minutes. I am sending this to linuxppc-dev and netdev, as I don't know whether this is airport-specific, or orinoco/hermes related. 
Ethan --=20 If I've told you once, I've told you once And once is all that you needed. -- The Refreshments, "Carefree" --11Y7aswkeuHtSBEs Content-Type: application/pgp-signature Content-Disposition: inline -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.0.6 (GNU/Linux) Comment: For info see http://www.gnupg.org iD8DBQE7Xsv2r9kA9Ig8HBQRAhAEAJ9UXrNfNr2OBGexj7dgRxO28QuK8ACcDnof KdUiYTu/5RO86Wi587s5nGg= =JPw2 -----END PGP SIGNATURE----- --11Y7aswkeuHtSBEs-- From owner-netdev@oss.sgi.com Wed Jul 25 07:39:23 2001 Received: (from majordomo@localhost) by oss.sgi.com (8.11.2/8.11.3) id f6PEdNx09877 for netdev-outgoing; Wed, 25 Jul 2001 07:39:23 -0700 Received: from sj-msg-core-1.cisco.com (sj-msg-core-1.cisco.com [171.71.163.11]) by oss.sgi.com (8.11.2/8.11.3) with SMTP id f6PEdMO09874 for ; Wed, 25 Jul 2001 07:39:22 -0700 Received: from kaspit.cisco.com (kaspit.cisco.com [144.254.91.49]) by sj-msg-core-1.cisco.com (8.11.3/8.9.1) with ESMTP id f6PEdKg15631 for ; Wed, 25 Jul 2001 07:39:20 -0700 (PDT) Received: from lab200w2k (dhcp-64-103-121-189.cisco.com [64.103.121.189]) by kaspit.cisco.com (Mirapoint) with SMTP id AKI09627; Wed, 25 Jul 2001 17:39:14 +0300 (GMT-3) From: "Jacob Avraham" To: "Network Development List" Cc: "Network Development List" Subject: RE: conflicting alignment requirements Date: Wed, 25 Jul 2001 17:40:17 +0200 Message-ID: MIME-Version: 1.0 Content-Type: text/plain; charset="us-ascii" Content-Transfer-Encoding: 7bit X-Priority: 3 (Normal) X-MSMail-Priority: Normal X-Mailer: Microsoft Outlook IMO, Build 9.0.2416 (9.0.2910.0) In-Reply-To: <20010725151707.44709@colin.muc.de> X-MimeOLE: Produced By Microsoft MimeOLE V5.50.4133.2400 Importance: Normal Sender: owner-netdev@oss.sgi.com Precedence: bulk > -----Original Message----- > From: Andi Kleen [mailto:ak@muc.de] > Sent: Wednesday, July 25, 2001 3:17 PM > To: Jacob Avraham > Cc: Network Development List > Subject: Re: conflicting alignment requirements > > > On Wed, Jul 25, 2001 at 12:57:37AM +0200, Jacob 
Avraham wrote: > > I noticed that some ethernet drivers, like the tulip, > > require that the receive buffers be 4 byte align > > (I believe due to h/w constrains). > > On the other hand, some upper layer code, like tc > > (for non x86/68k), checks if the IP header is 4 byte align, > > and if not, doesn't handle the packet. > > So it looks like tc and tulip can not be used on other architectures. > > > > Has this been discussed before and if yes what was the outcome? > > In the short term, would that be OK to remove this restriction > > from tc? > > If some network stack code checks the alignment and reject unaligned > packets it is a bug. Could you tell exactly which code does that? > > -Andi > > From owner-netdev@oss.sgi.com Wed Jul 25 08:39:38 2001 Received: (from majordomo@localhost) by oss.sgi.com (8.11.2/8.11.3) id f6PFdcJ14241 for netdev-outgoing; Wed, 25 Jul 2001 08:39:38 -0700 Received: from sj-msg-core-1.cisco.com (sj-msg-core-1.cisco.com [171.71.163.11]) by oss.sgi.com (8.11.2/8.11.3) with SMTP id f6PFdbO14238 for ; Wed, 25 Jul 2001 08:39:37 -0700 Received: from kaspit.cisco.com (kaspit.cisco.com [144.254.91.49]) by sj-msg-core-1.cisco.com (8.11.3/8.9.1) with ESMTP id f6PFdVg26603; Wed, 25 Jul 2001 08:39:32 -0700 (PDT) Received: from lab200w2k (dhcp-64-103-121-189.cisco.com [64.103.121.189]) by kaspit.cisco.com (Mirapoint) with SMTP id AKI10271; Wed, 25 Jul 2001 18:39:24 +0300 (GMT-3) From: "Jacob Avraham" To: "Andi Kleen" Cc: "Network Development List" Subject: RE: conflicting alignment requirements Date: Wed, 25 Jul 2001 18:40:27 +0200 Message-ID: MIME-Version: 1.0 Content-Type: text/plain; charset="us-ascii" Content-Transfer-Encoding: 7bit X-Priority: 3 (Normal) X-MSMail-Priority: Normal X-Mailer: Microsoft Outlook IMO, Build 9.0.2416 (9.0.2910.0) In-Reply-To: <20010725151707.44709@colin.muc.de> X-MimeOLE: Produced By Microsoft MimeOLE V5.50.4133.2400 Importance: Normal Sender: owner-netdev@oss.sgi.com Precedence: bulk > On Wed, Jul 25, 2001 at 
12:57:37AM +0200, Jacob Avraham wrote: > > I noticed that some ethernet drivers, like the tulip, > > require that the receive buffers be 4 byte align > > (I believe due to h/w constrains). > > On the other hand, some upper layer code, like tc > > (for non x86/68k), checks if the IP header is 4 byte align, > > and if not, doesn't handle the packet. > > So it looks like tc and tulip can not be used on other architectures. > > > > Has this been discussed before and if yes what was the outcome? > > In the short term, would that be OK to remove this restriction > > from tc? > > If some network stack code checks the alignment and reject unaligned > packets it is a bug. Could you tell exactly which code does that? > > -Andi > > This is taken from net/sched/cls_u32.c: static int u32_classify(struct sk_buff *skb, struct tcf_proto *tp, struct tcf_result *res) { struct { struct tc_u_knode *knode; u8 *ptr; } stack[TC_U32_MAXDEPTH]; struct tc_u_hnode *ht = (struct tc_u_hnode*)tp->root; u8 *ptr = skb->nh.raw; struct tc_u_knode *n; int sdepth = 0; int off2 = 0; int sel = 0; int i; #if !defined(__i386__) && !defined(__mc68000__) if ((unsigned long)ptr & 3) return -1; #endif From owner-netdev@oss.sgi.com Wed Jul 25 10:18:43 2001 Received: (from majordomo@localhost) by oss.sgi.com (8.11.2/8.11.3) id f6PHIhB17945 for netdev-outgoing; Wed, 25 Jul 2001 10:18:43 -0700 Received: from ms2.inr.ac.ru (minus.inr.ac.ru [193.233.7.97]) by oss.sgi.com (8.11.2/8.11.3) with SMTP id f6PHIdO17941 for ; Wed, 25 Jul 2001 10:18:39 -0700 Received: (from kuznet@localhost) by ms2.inr.ac.ru (8.6.13/ANK) id VAA08166; Wed, 25 Jul 2001 21:18:09 +0400 From: kuznet@ms2.inr.ac.ru Message-Id: <200107251718.VAA08166@ms2.inr.ac.ru> Subject: Re: static routes and dead gateway detection To: ja@ssi.bg (Julian Anastasov) Date: Wed, 25 Jul 2001 21:18:09 +0400 (MSK DST) Cc: netdev@oss.sgi.com In-Reply-To: from "Julian Anastasov" at Jul 24, 1 11:01:20 pm X-Mailer: ELM [version 2.4 PL24] MIME-Version: 1.0 Sender: 
owner-netdev@oss.sgi.com Precedence: bulk

Hello!

> Hm, I can't understand this trick. fib_flush always follows
> fib_sync_down (where the DEAD flag is correctly set). But fib_flush
> makes its decisions based on the same flag (fn_flush_list).

We mark only the nexthop as dead; the route itself is not marked dead. Currently this is used only for multipath, but it is easy to change, allowing routes without alive hops if they are marked with proto static.

Note, I am ready to recommend a patch doing this for 2.4 just because it is the best tool to fix the bug you noticed earlier. And that bug is very bad, a showstopper in fact...

> I understand, I only note that similar trick works for
> ip rules :)

Yes, policy rules are bound to names, like firewall rules. This is bad, but unavoidable, because no automatic system to establish policy exists and it is not even clear how it should work, taking into account that "policy" in the presence of a dynamically changing picture is mostly impossible, so using names remains the only way: bad, unreliable, but better than all the rest. As soon as such a system is invented, using names becomes a bug. Routes passed through this stage long ago, before Linux appeared.
:-) Alexey From owner-netdev@oss.sgi.com Wed Jul 25 10:53:42 2001 Received: (from majordomo@localhost) by oss.sgi.com (8.11.2/8.11.3) id f6PHrg618444 for netdev-outgoing; Wed, 25 Jul 2001 10:53:42 -0700 Received: from l.himel.bg (IDENT:root@unamed.infotel.bg [212.39.68.18] (may be forged)) by oss.sgi.com (8.11.2/8.11.3) with SMTP id f6PHreO18441 for ; Wed, 25 Jul 2001 10:53:40 -0700 Received: from linux.himel.bg (IDENT:ja@linux.himel.bg [127.0.0.1]) by l.himel.bg (8.9.3/8.9.3) with ESMTP id UAA14833; Wed, 25 Jul 2001 20:52:31 +0300 Date: Wed, 25 Jul 2001 20:52:31 +0300 (EEST) From: Julian Anastasov X-Sender: ja@l To: kuznet@ms2.inr.ac.ru cc: netdev@oss.sgi.com Subject: Re: static routes and dead gateway detection In-Reply-To: <200107251718.VAA08166@ms2.inr.ac.ru> Message-ID: MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII Sender: owner-netdev@oss.sgi.com Precedence: bulk Hello Alexey, On Wed, 25 Jul 2001 kuznet@ms2.inr.ac.ru wrote: > Hello! > > > Hm, I can't understand this trick. fib_flush always follows > > fib_sync_down (where the DEAD flag is correctly set). But fib_flush > > makes its decisions based on the same flag (fn_flush_list). > > We mark only nexthop as dead, the route itself is not marked dead. > Currently this is used only for multipath, but it is easy to change, > allowing routes without alive hops, if they are marked > with proto static. Aha, you want: - fib_sync_down not to set DEAD in fib_flags on dev down for RTPROT_STATIC (change required) - fib_semantic_match currently checks nh_flags for DEAD (no change here) - fn_flush_list will flush only routes with DEAD flag set in fib_flags (no change here) - undef fib_sync_up, i.e. use it for single-path routes too - Do we need changes in fn_hash_select_default? check for all nh_flags&DEAD ? Like in my patch but now fib_flags&DEAD is not enough? 
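The plan in the list above can be modelled in a few lines of standalone C. This is a simplified sketch, not kernel code: the structs, flag names and functions are hypothetical stand-ins for the fib structures, PROTO_STATIC stands in for RTPROT_STATIC, and only the "device marked down" case is covered (device removal, where every reference must go, is not modelled).

```c
#include <stdbool.h>

#define NH_DEAD      0x1u /* hypothetical: this nexthop is unusable */
#define ROUTE_DEAD   0x1u /* hypothetical: the whole route may be flushed */
#define PROTO_STATIC 4    /* stand-in for RTPROT_STATIC */
#define MAX_NH       4

struct nexthop {
    int ifindex;
    unsigned int flags;
};

struct route {
    int proto;
    unsigned int flags;
    int nhs; /* number of nexthops in use */
    struct nexthop nh[MAX_NH];
};

static int alive_nexthops(const struct route *r)
{
    int i, alive = 0;

    for (i = 0; i < r->nhs; i++)
        if (!(r->nh[i].flags & NH_DEAD))
            alive++;
    return alive;
}

/* Device marked down: kill its nexthops, but keep static routes around. */
void route_sync_down(struct route *r, int ifindex)
{
    int i;

    for (i = 0; i < r->nhs; i++)
        if (r->nh[i].ifindex == ifindex)
            r->nh[i].flags |= NH_DEAD;

    if (alive_nexthops(r) == 0 && r->proto != PROTO_STATIC)
        r->flags |= ROUTE_DEAD;
}

/* Device back up: revive its nexthops, single-path routes included. */
void route_sync_up(struct route *r, int ifindex)
{
    int i;

    for (i = 0; i < r->nhs; i++)
        if (r->nh[i].ifindex == ifindex)
            r->nh[i].flags &= ~NH_DEAD;

    if (alive_nexthops(r) > 0)
        r->flags &= ~ROUTE_DEAD;
}

/* Flush decision looks only at the route-level flag. */
bool route_should_flush(const struct route *r)
{
    return (r->flags & ROUTE_DEAD) != 0;
}

/* Lookup must also check nexthop state, since a kept route may have none alive. */
bool route_usable(const struct route *r)
{
    return !(r->flags & ROUTE_DEAD) && alive_nexthops(r) > 0;
}
```

The last function reflects the open question in the list: once static routes can survive with no live nexthop, the route-level flag alone no longer says whether a route can carry traffic.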
(sorry that I don't have test setup, the tested devices are in production) > Note, I am ready to recommend patch doing this for 2.4 just because > it is the best tool to fix bug, noticed by you earlier. And that bug > is vaery bad, showstopper in fact... > > > > I understand, I only note that similar trick works for > > ip rules :) > > Yes, policy rules are bound to names, like firewall rules. > This is bad, but unavoidable, because no automatic systems to establish > policy exist and it is even not clear how should it work, taking > into account that "policy" in presence of dynamically changing picture > is mostly impossible, so that using names remains the only way: > bad, unreliable, but better than all the rest. > As soon as such one is invented, using names becomes bug. > Routes passed through this stage long ago, before linux appeared. :-) I understand ... > Alexey Regards -- Julian Anastasov From owner-netdev@oss.sgi.com Wed Jul 25 12:15:34 2001 Received: (from majordomo@localhost) by oss.sgi.com (8.11.2/8.11.3) id f6PJFY120258 for netdev-outgoing; Wed, 25 Jul 2001 12:15:34 -0700 Received: from colin.muc.de (root@colin.muc.de [193.149.48.1]) by oss.sgi.com (8.11.2/8.11.3) with SMTP id f6PJFWO20255 for ; Wed, 25 Jul 2001 12:15:32 -0700 Received: by colin.muc.de id <140682-3>; Wed, 25 Jul 2001 21:15:54 +0200 Message-ID: <20010725211552.55576@colin.muc.de> Date: Wed, 25 Jul 2001 21:15:52 +0200 From: Andi Kleen To: Jacob Avraham Cc: Andi Kleen , Network Development List Subject: Re: conflicting alignment requirements References: <20010725151707.44709@colin.muc.de> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii X-Mailer: Mutt 0.88e In-Reply-To: ; from Jacob Avraham on Wed, Jul 25, 2001 at 06:40:27PM +0200 Sender: owner-netdev@oss.sgi.com Precedence: bulk On Wed, Jul 25, 2001 at 06:40:27PM +0200, Jacob Avraham wrote: > #if !defined(__i386__) && !defined(__mc68000__) > if ((unsigned long)ptr & 3) > return -1; > #endif As far as I can see this code is 
wrong: all architectures should be able to handle unaligned accesses in kernel, otherwise they're remotely exploitable anyways. I guess you can just drop the ifdef, and if it should break anything complain to the maintainer of the architecture. -Andi > From owner-netdev@oss.sgi.com Wed Jul 25 12:44:04 2001 Received: (from majordomo@localhost) by oss.sgi.com (8.11.2/8.11.3) id f6PJi4p21755 for netdev-outgoing; Wed, 25 Jul 2001 12:44:04 -0700 Received: from perninha.conectiva.com.br (perninha.conectiva.com.br [200.250.58.156]) by oss.sgi.com (8.11.2/8.11.3) with SMTP id f6PJhxO21746 for ; Wed, 25 Jul 2001 12:44:00 -0700 Received: from burns.conectiva (burns.conectiva [10.0.0.4]) by perninha.conectiva.com.br (Postfix) with SMTP id 9B7F238C15 for ; Wed, 25 Jul 2001 16:43:54 -0300 (EST) Received: (qmail 6381 invoked by uid 0); 25 Jul 2001 19:42:54 -0000 Received: from duckman.distro.conectiva (HELO duckman.conectiva.com.br) (root@10.0.17.2) by burns.conectiva with SMTP; 25 Jul 2001 19:42:54 -0000 Received: from localhost (riel@localhost) by duckman.conectiva.com.br (8.11.4/8.11.3) with ESMTP id f6PJhX907455; Wed, 25 Jul 2001 16:43:33 -0300 X-Authentication-Warning: duckman.distro.conectiva: riel owned process doing -bs Date: Wed, 25 Jul 2001 16:43:33 -0300 (BRST) From: Rik van Riel X-X-Sender: To: Ralf Baechle Cc: Lego Andy , Subject: Re: InterScan NT Alert In-Reply-To: <20010723200644.A2878@bacchus.dhis.org> Message-ID: MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII Sender: owner-netdev@oss.sgi.com Precedence: bulk On Mon, 23 Jul 2001, Ralf Baechle wrote: > On Mon, Jul 23, 2001 at 10:01:43AM -0400, Lego Andy wrote: > > > > Receiver, InterScan has detected virus(es) in the e-mail attachment. > > > File: TABLAS.doc.com > > > Action: clean failed - deleted > > > > He... Do not use Windows, I guess... > > Sigh... That's what you get for not babysitting the list for a few hours. > There was a huge number of copies of this virus still in the queue. 
I've > zapped them so hopefully many subscribers shouldn't see all of this junk. There's a simple solution to this. You just do the babysitting with a small group of people, who have write access to the set of regexps in CVS: http://spamfilter.nl.linux.org/ I managed to keep NL.linux.org fairly clean while I was away for 3 weeks ;) cheers, Rik -- Executive summary of a recent Microsoft press release: "we are concerned about the GNU General Public License (GPL)" http://www.surriel.com/ http://www.conectiva.com/ http://distro.conectiva.com/ From owner-netdev@oss.sgi.com Wed Jul 25 13:00:52 2001 Received: (from majordomo@localhost) by oss.sgi.com (8.11.2/8.11.3) id f6PK0qO23498 for netdev-outgoing; Wed, 25 Jul 2001 13:00:52 -0700 Received: from ghanima.endorphin.org ([62.116.8.197]) by oss.sgi.com (8.11.2/8.11.3) with SMTP id f6PK0jO23479 for ; Wed, 25 Jul 2001 13:00:47 -0700 Received: (qmail 11499 invoked by uid 1000); 25 Jul 2001 20:00:29 -0000 Date: Wed, 25 Jul 2001 22:00:29 +0200 From: clemens To: netdev@oss.sgi.com Subject: missing icmp errors for udp packets Message-ID: <20010725220029.A11360@ghanima.endorphin.org> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline User-Agent: Mutt/1.3.18i Sender: owner-netdev@oss.sgi.com Precedence: bulk hi folks! assertation: 2.4.7 omits icmp port unreachable errors for unbound udp ports on non-local interfaces. sounds strange? here are my observations: linux host guardian linux host ghanima [session 1] ghanima:~# tcpdump -i eth0 host guardian and port not ssh tcpdump: listening on eth0 .. [session 2] guardian:/home/guardian/therapy# nmap -sU -p 1-10 ghanima.endorphin.org (nmap is asked to udp scan ghanima port 1-10) Starting nmap V. 
2.54BETA22 ( www.insecure.org/nmap/ )
Interesting ports on (62.116.8.197):
Port       State       Service
1/udp      open        tcpmux
2/udp      open        compressnet
3/udp      open        compressnet
4/udp      open        unknown
5/udp      open        rje
6/udp      open        unknown
7/udp      open        echo
8/udp      open        unknown
9/udp      open        discard
10/udp     open        unknown

all ports open? can't be! let's look at the tcpdump trace on ghanima for
its eth0 output:

[session 1]
21:35:57.781649 guardian > ghanima.endorphin.org: icmp: echo request
21:35:57.781683 ghanima.endorphin.org > guardian: icmp: echo reply
21:35:58.096727 guardian.51277 > ghanima.endorphin.org.echo: udp 0
21:35:58.096871 guardian.51277 > ghanima.endorphin.org.8: udp 0
21:35:58.097673 guardian.51277 > ghanima.endorphin.org.discard: udp 0
21:35:58.098479 guardian.51277 > ghanima.endorphin.org.1: udp 0
21:35:58.099285 guardian.51277 > ghanima.endorphin.org.2: udp 0
21:35:58.100029 guardian.51277 > ghanima.endorphin.org.3: udp 0
21:35:58.100721 guardian.51277 > ghanima.endorphin.org.6: udp 0
..and so on.

there really are no icmp destination/port unreachable errors sent, so nmap
assumes all ports are open. some output filters in iptables, you might
think?

ghanima:~# iptables -L
Chain INPUT (policy ACCEPT)
target     prot opt source               destination

Chain FORWARD (policy ACCEPT)
target     prot opt source               destination

Chain OUTPUT (policy ACCEPT)
target     prot opt source               destination

definitely not. let's try something else:

[session 1]
ghanima:~# tcpdump -i lo
tcpdump: listening on lo

[session 2]
ghanima:~# nmap -sU -p 1-10 localhost

Starting nmap V. 2.54BETA22 ( www.insecure.org/nmap/ )
All 10 scanned ports on localhost (127.0.0.1) are: closed

Nmap run completed -- 1 IP address (1 host up) scanned in 1 second
ghanima:~$

[session 1]
21:43:05.255608 localhost > localhost: icmp: echo request
21:43:05.255648 localhost > localhost: icmp: echo reply
21:43:05.256945 localhost.41478 > localhost.www: .
ack 4120283222 win 4096
21:43:05.256978 localhost.www > localhost.41478: R 4120283222:4120283222(0) win 0 (DF)
21:43:05.558663 localhost.41458 > localhost.10: udp 0
21:43:05.558719 localhost > localhost: icmp: localhost udp port 10 unreachable [tos 0xc0]
21:43:05.559668 localhost.41458 > localhost.echo: udp 0
21:43:05.559688 localhost > localhost: icmp: localhost udp port echo unreachable [tos 0xc0]
21:43:05.560165 localhost.41458 > localhost.discard: udp 0
21:43:05.560183 localhost > localhost: icmp: localhost udp port discard unreachable [tos 0xc0]
.. and so on.

huh, now icmp errors are sent in reply to the udp probe packets.

this problem has been confirmed for at least 2.4.7-pre8 in a totally
different network setup.

my debugging attempts so far:
for both scans, local and non-local, the udp "No Port" counter in
/proc/net/snmp is increased, which means that for both scans, as far as
net/ipv4/udp.c is concerned, the kernel tries to transmit an icmp packet.

[from udp.c - line 917]
	UDP_INC_STATS_BH(UdpNoPorts);
	icmp_send(skb, ICMP_DEST_UNREACH, ICMP_PORT_UNREACH, 0);

i tried to use kdb to trace the icmp_send, but got stuck somewhere
after ip_output in dev_queue_xmit. so apparently 2.4.7 really tries to send
something out to eth0, but fails somewhere in the low-level routines.

since i'm not that good at kernel debugging, i'm handing this issue over.

clemens

p.s.: i'm using 8139too for eth0, but that really shouldn't matter.
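
The counter check described above (udp "No Port" going up while no icmp errors leave the box) can be repeated with a small script. This is a minimal sketch, assuming the usual /proc/net/snmp layout of a header line followed by a value line per protocol; the function names and the sample snapshots are made up for illustration and are not measurements from this report.

```python
def parse_proc_net_snmp(text):
    """Parse header/value line pairs of /proc/net/snmp into nested dicts."""
    stats = {}
    lines = [ln for ln in text.splitlines() if ln.strip()]
    # Lines come in pairs: "Proto: name name ..." then "Proto: value value ..."
    for header, values in zip(lines[0::2], lines[1::2]):
        proto, names = header.split(":", 1)
        value_proto, numbers = values.split(":", 1)
        if proto != value_proto:
            raise ValueError("header/value lines out of step at %r" % proto)
        stats[proto] = dict(zip(names.split(), (int(n) for n in numbers.split())))
    return stats


def counter_delta(before, after, proto, name):
    """How much one counter moved between two snapshots."""
    return after[proto][name] - before[proto][name]


# Hypothetical, trimmed snapshots taken before and after a 10-port UDP scan.
SAMPLE_BEFORE = """\
Icmp: InMsgs OutMsgs OutDestUnreachs
Icmp: 10 10 0
Udp: InDatagrams NoPorts InErrors OutDatagrams
Udp: 100 0 0 90
"""

SAMPLE_AFTER = """\
Icmp: InMsgs OutMsgs OutDestUnreachs
Icmp: 11 11 0
Udp: InDatagrams NoPorts InErrors OutDatagrams
Udp: 100 10 0 90
"""
```

On a live system the snapshots would come from `open("/proc/net/snmp").read()` taken before and after the scan. A NoPorts delta equal to the number of probed ports together with an OutDestUnreachs delta of zero would point at the icmp side rather than at udp.c.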
From owner-netdev@oss.sgi.com Wed Jul 25 13:11:11 2001 Received: (from majordomo@localhost) by oss.sgi.com (8.11.2/8.11.3) id f6PKBB524613 for netdev-outgoing; Wed, 25 Jul 2001 13:11:11 -0700 Received: from dea.waldorf-gmbh.de (u-46-20.karlsruhe.ipdial.viaginterkom.de [62.180.20.46]) by oss.sgi.com (8.11.2/8.11.3) with SMTP id f6PKAoO24586; Wed, 25 Jul 2001 13:10:51 -0700 Received: (from ralf@localhost) by dea.waldorf-gmbh.de (8.11.1/8.11.1) id f6PK9MD06836; Wed, 25 Jul 2001 22:09:22 +0200 Date: Wed, 25 Jul 2001 22:09:22 +0200 From: Ralf Baechle To: Rik van Riel Cc: Ralf Baechle , Lego Andy , netdev@oss.sgi.com Subject: Re: InterScan NT Alert Message-ID: <20010725220922.A6109@bacchus.dhis.org> References: <20010723200644.A2878@bacchus.dhis.org> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline User-Agent: Mutt/1.2.5i In-Reply-To: ; from riel@conectiva.com.br on Wed, Jul 25, 2001 at 04:43:33PM -0300 X-Accept-Language: de,en,fr Sender: owner-netdev@oss.sgi.com Precedence: bulk On Wed, Jul 25, 2001 at 04:43:33PM -0300, Rik van Riel wrote: > Date: Wed, 25 Jul 2001 16:43:33 -0300 (BRST) > From: Rik van Riel > To: Ralf Baechle > Cc: Lego Andy , > Subject: Re: InterScan NT Alert > > On Mon, 23 Jul 2001, Ralf Baechle wrote: > > On Mon, Jul 23, 2001 at 10:01:43AM -0400, Lego Andy wrote: > > > > > > Receiver, InterScan has detected virus(es) in the e-mail attachment. > > > > File: TABLAS.doc.com > > > > Action: clean failed - deleted > > > > > > He... Do not use Windows, I guess... > > > > Sigh... That's what you get for not babysitting the list for a few hours. > > There was a huge number of copies of this virus still in the queue. I've > > zapped them so hopefully many subscribers shouldn't see all of this junk. > > There's a simple solution to this. 
You just do the > babysitting with a small group of people, who have > write access to the set of regexps in CVS: > > http://spamfilter.nl.linux.org/ > > I managed to keep NL.linux.org fairly clean while > I was away for 3 weeks ;) The problem was caused by a bug in majordomo.cf which disable most of the filtering. Spamfilter couldn't have made a difference. Ralf -- "Embrace, Enhance, Eliminate" - it worked for the pope, it'll work for Bill. From owner-netdev@oss.sgi.com Thu Jul 26 02:51:18 2001 Received: (from majordomo@localhost) by oss.sgi.com (8.11.2/8.11.3) id f6Q9pIa11812 for netdev-outgoing; Thu, 26 Jul 2001 02:51:18 -0700 Received: from SCAN (scan.nist.gov [129.6.94.100]) by oss.sgi.com (8.11.2/8.11.3) with SMTP id f6Q9pHV11809 for ; Thu, 26 Jul 2001 02:51:17 -0700 Message-Id: <200107260951.f6Q9pHV11809@oss.sgi.com> InterScan-Notification: yes From: InterScan@scan.nist.gov To: "netdev@oss.sgi.com"@oss.sgi.com Subject: Attachment Stripped in Transaction Date: Thu, 26 Jul 2001 05:51:43 -0500 MIME-Version: 1.0 Content-Type: multipart/mixed; boundary="----=_NextPart_000_996141103_B78506032.R82506026" Sender: owner-netdev@oss.sgi.com Precedence: bulk Content-Length: 2297 Lines: 53 This is a multi-part message in MIME format. ------=_NextPart_000_996141103_B78506032.R82506026 Content-Type: text/plain; charset="iso-8859-1" Content-Transfer-Encoding: 7bit ************* eManager Notification ************** This is an auto-generated message. There was an attachment to this email message. The NIST firewall has stripped it out due to the policy change that went into effect on May 5, 2000. 
For further information, please refer to http://www-i.nist.gov/itl/div895/895.02/nist_firewall_email.htm Source mailbox: "kriskotzer@hotmail.com" Destination mailbox(es): "netdev@oss.sgi.com" ******************* End of message ******************* ------=_NextPart_000_996141103_B78506032.R82506026 Content-Type: text/plain; charset="iso-8859-1" Content-Transfer-Encoding: 7bit Received: from oss.sgi.com (oss.sgi.com [216.32.174.27]) by ridley.nist.gov (8.11.1/8.11.1) with ESMTP id f6Q9p5Q23872; Thu, 26 Jul 2001 05:51:05 -0400 (EDT) Received: from localhost (mail@localhost) by oss.sgi.com (8.11.2/8.11.3) with SMTP id f6Q9p0B11765; Thu, 26 Jul 2001 02:51:00 -0700 X-Authentication-Warning: oss.sgi.com: mail owned process doing -bs Received: by oss.sgi.com (bulk_mailer v1.13); Thu, 26 Jul 2001 02:45:22 -0700 Received: (from majordomo@localhost) by oss.sgi.com (8.11.2/8.11.3) id f6Q9jLD11169 for netdev-outgoing; Thu, 26 Jul 2001 02:45:21 -0700 Received: from eagle.sasktel.net ([142.165.19.3]) by oss.sgi.com (8.11.2/8.11.3) with SMTP id f6Q9h2V10932 for ; Thu, 26 Jul 2001 02:43:02 -0700 Received: from KotzerKris.sk.sypatico.ca (hsdb-yktn-57-38.sasknet.sk.ca [142.165.57.38]) by eagle.sasktel.net (iPlanet Messaging Server 5.1 (built May 7 2001)) with SMTP id <0GH20023DQX2D9@eagle.sasktel.net> for netdev@oss.sgi.com; Thu, 26 Jul 2001 03:42:21 -0600 (GMT) Date: Thu, 26 Jul 2001 03:47:10 -0600 From: Kris Kotzer Subject: ltouin23 To: netdev@oss.sgi.com Message-id: <0GH20023EQX2D9@eagle.sasktel.net> MIME-version: 1.0 X-MIMEOLE: Produced By Microsoft MimeOLE V5.50.4133.2400 X-Mailer: Microsoft Outlook Express 5.50.4133.2400 Content-type: multipart/mixed; boundary="Boundary_(ID_LxqBEjAgCvIJqSgeneGlLw)" Sender: owner-netdev@oss.sgi.com Precedence: bulk ------=_NextPart_000_996141103_B78506032.R82506026-- From owner-netdev@oss.sgi.com Thu Jul 26 04:29:32 2001 Received: (from majordomo@localhost) by oss.sgi.com (8.11.2/8.11.3) id f6QBTWD25124 for netdev-outgoing; Thu, 26 Jul 2001 
04:29:32 -0700 Received: from sgi.com (sgi.SGI.COM [192.48.153.1]) by oss.sgi.com (8.11.2/8.11.3) with SMTP id f6QBTVV25121 for ; Thu, 26 Jul 2001 04:29:31 -0700 Received: from galileo5.galileo.co.il ([199.203.130.130]) by sgi.com (980327.SGI.8.8.8-aspam/980304.SGI-aspam: SGI does not authorize the use of its proprietary systems or networks for unsolicited or bulk email from the Internet.) via ESMTP id EAA06208 for ; Thu, 26 Jul 2001 04:29:03 -0700 (PDT) mail_from (rabeeh@galileo.co.il) Received: from galileo.co.il (linux2.galileo.co.il [10.2.40.2]) by galileo.co.il (8.8.5/8.8.5) with ESMTP id OAA04511; Thu, 26 Jul 2001 14:23:33 +0200 (GMT-2) Message-ID: <3B5FFCD7.1070602@galileo.co.il> Date: Thu, 26 Jul 2001 14:19:51 +0300 From: Rabeeh Khoury Organization: Galileo Technology User-Agent: Mozilla/5.0 (X11; U; Linux i686; en-US; rv:0.9.2) Gecko/20010628 X-Accept-Language: en-us MIME-Version: 1.0 To: netdev@oss.sgi.com, ramic@galileo.co.il Subject: Linux TCP/IP stack Content-Type: text/plain; charset=us-ascii; format=flowed Content-Transfer-Encoding: 7bit Sender: owner-netdev@oss.sgi.com Precedence: bulk Content-Length: 534 Lines: 19 Hi All, I have noticed (using tcpdump and simple TCP client-server application) that TCP frame are always sent with ACK flag even if the sender doesn't receive new frames (i.e. it send the same Acknowledgment Number). RFC 2581 (section 4.1) states that: "A TCP receiver MUST NOT generate more than one ACK for every incoming segment, other than to update the offered window as the receiving application consumes new data" According to the above statement it seems that LINUX TCP / IP violates this rule. 
Thanks, Rami From owner-netdev@oss.sgi.com Thu Jul 26 04:55:34 2001 Received: (from majordomo@localhost) by oss.sgi.com (8.11.2/8.11.3) id f6QBtYP26776 for netdev-outgoing; Thu, 26 Jul 2001 04:55:34 -0700 Received: from yue.hongo.wide.ad.jp ([203.178.139.94]) by oss.sgi.com (8.11.2/8.11.3) with SMTP id f6QBtWV26771 for ; Thu, 26 Jul 2001 04:55:32 -0700 Received: from localhost (localhost [127.0.0.1]) by yue.hongo.wide.ad.jp (8.9.3+3.2W/8.9.3/Debian 8.9.3-21) with ESMTP id UAA31895; Thu, 26 Jul 2001 20:57:33 +0900 To: rabeeh@galileo.co.il Cc: netdev@oss.sgi.com Subject: Re: Linux TCP/IP stack In-Reply-To: <3B5FFCD7.1070602@galileo.co.il> References: <3B5FFCD7.1070602@galileo.co.il> X-Mailer: Mew version 1.94.2 on Emacs 20.7 / Mule 4.1 (AOI) X-URL: http://www.hongo.wide.ad.jp/%7Eyoshfuji/ X-Fingerprint: F7 31 65 99 5E B2 BB A7 15 15 13 23 18 06 A9 6F 57 00 6B 25 X-Pgp5-Key-Url: http://cerberus.nemoto.ecei.tohoku.ac.jp/%7Eyoshfuji/yoshfuji@ecei.tohoku.ac.jp.asc Mime-Version: 1.0 Content-Type: Text/Plain; charset=us-ascii Content-Transfer-Encoding: 7bit Message-Id: <20010726205733F.yoshfuji@wide.ad.jp> Date: Thu, 26 Jul 2001 20:57:33 +0900 From: YOSHIFUJI Hideaki / =?iso-2022-jp?B?GyRCNUhGIzFRTEAbKEI=?= X-Dispatcher: imput version 991025(IM133) Sender: owner-netdev@oss.sgi.com Precedence: bulk Content-Length: 487 Lines: 13 In article <3B5FFCD7.1070602@galileo.co.il> (at Thu, 26 Jul 2001 14:19:51 +0300), Rabeeh Khoury says: > RFC 2581 (section 4.1) states that: > "A TCP receiver MUST NOT generate more than one ACK for every > incoming > segment, other than to update the offered window as the receiving > application consumes new data" > > According to the above statement it seems that LINUX TCP / IP violates this rule. What version of kernel are you using? 
--yoshfuji From owner-netdev@oss.sgi.com Thu Jul 26 05:02:39 2001 Received: (from majordomo@localhost) by oss.sgi.com (8.11.2/8.11.3) id f6QC2d227775 for netdev-outgoing; Thu, 26 Jul 2001 05:02:39 -0700 Received: from galileo5.galileo.co.il (pop3.galileo.co.il [199.203.130.130]) by oss.sgi.com (8.11.2/8.11.3) with SMTP id f6QC2bV27762 for ; Thu, 26 Jul 2001 05:02:37 -0700 Received: from galileo.co.il (linux2.galileo.co.il [10.2.40.2]) by galileo.co.il (8.8.5/8.8.5) with ESMTP id PAA12333; Thu, 26 Jul 2001 15:01:48 +0200 (GMT-2) Message-ID: <3B6005CE.2060306@galileo.co.il> Date: Thu, 26 Jul 2001 14:58:06 +0300 From: Rabeeh Khoury Organization: Galileo Technology User-Agent: Mozilla/5.0 (X11; U; Linux i686; en-US; rv:0.9.2) Gecko/20010628 X-Accept-Language: en-us MIME-Version: 1.0 To: netdev@oss.sgi.com CC: Rabeeh Khoury , ramic@galileo.co.il Subject: Re: Linux TCP/IP stack References: <3B5FFCD7.1070602@galileo.co.il> Content-Type: text/plain; charset=us-ascii; format=flowed Content-Transfer-Encoding: 7bit Sender: owner-netdev@oss.sgi.com Precedence: bulk Content-Length: 621 Lines: 30 I'm using kernel 2.2.14 Rabeeh Khoury wrote: > Hi All, > > I have noticed (using tcpdump and simple TCP client-server application) > that TCP frame are always sent with ACK flag even if the sender doesn't > receive new frames (i.e. it send the same Acknowledgment Number). > > RFC 2581 (section 4.1) states that: > "A TCP receiver MUST NOT generate more than one ACK for every > incoming > segment, other than to update the offered window as the receiving > application consumes new data" > > According to the above statement it seems that LINUX TCP / IP violates > this rule. > > Thanks, Rami > > > > > . 
> From owner-netdev@oss.sgi.com Thu Jul 26 05:40:26 2001 Received: (from majordomo@localhost) by oss.sgi.com (8.11.2/8.11.3) id f6QCeQU31871 for netdev-outgoing; Thu, 26 Jul 2001 05:40:26 -0700 Received: from colin.muc.de (root@colin.muc.de [193.149.48.1]) by oss.sgi.com (8.11.2/8.11.3) with SMTP id f6QCePV31868 for ; Thu, 26 Jul 2001 05:40:25 -0700 Received: by colin.muc.de id <140574-2>; Thu, 26 Jul 2001 14:40:47 +0200 Message-ID: <20010726144045.56795@colin.muc.de> Date: Thu, 26 Jul 2001 14:40:45 +0200 From: Andi Kleen To: Rabeeh Khoury Cc: netdev@oss.sgi.com, ramic@galileo.co.il Subject: Re: Linux TCP/IP stack References: <3B5FFCD7.1070602@galileo.co.il> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii X-Mailer: Mutt 0.88e In-Reply-To: <3B5FFCD7.1070602@galileo.co.il>; from Rabeeh Khoury on Thu, Jul 26, 2001 at 01:19:51PM +0200 Sender: owner-netdev@oss.sgi.com Precedence: bulk Content-Length: 872 Lines: 23 On Thu, Jul 26, 2001 at 01:19:51PM +0200, Rabeeh Khoury wrote: > Hi All, > > I have noticed (using tcpdump and simple TCP client-server application) > that TCP frame are always sent with ACK flag even if the sender doesn't > receive new frames (i.e. it send the same Acknowledgment Number). > > RFC 2581 (section 4.1) states that: > "A TCP receiver MUST NOT generate more than one ACK for every > incoming > segment, other than to update the offered window as the receiving > application consumes new data" > > According to the above statement it seems that LINUX TCP / IP violates this rule. ACK in this context means a new packet that only contains the ACK. If there are packets send for other reason it is ok to contain an ACK, in fact RFC793 (that's the real TCP standard) even states that every packet in Established should carry an ACK flag. 
-Andi From owner-netdev@oss.sgi.com Thu Jul 26 06:03:03 2001 Received: (from majordomo@localhost) by oss.sgi.com (8.11.2/8.11.3) id f6QD33m01702 for netdev-outgoing; Thu, 26 Jul 2001 06:03:03 -0700 Received: from galileo5.galileo.co.il (pop3.galileo.co.il [199.203.130.130]) by oss.sgi.com (8.11.2/8.11.3) with SMTP id f6QD31V01692 for ; Thu, 26 Jul 2001 06:03:01 -0700 Received: from galileo.co.il ([10.2.1.5]) by galileo.co.il (8.8.5/8.8.5) with ESMTP id QAA09928; Thu, 26 Jul 2001 16:02:05 +0200 (GMT-2) Message-ID: <3B602365.AAF773CF@galileo.co.il> Date: Thu, 26 Jul 2001 16:04:21 +0200 From: Rami Cohen X-Mailer: Mozilla 4.7 [en] (WinNT; I) X-Accept-Language: en MIME-Version: 1.0 To: Andi Kleen CC: Rabeeh Khoury , netdev@oss.sgi.com Subject: Re: Linux TCP/IP stack References: <3B5FFCD7.1070602@galileo.co.il> <20010726144045.56795@colin.muc.de> Content-Type: text/plain; charset=us-ascii Content-Transfer-Encoding: 7bit Sender: owner-netdev@oss.sgi.com Precedence: bulk Content-Length: 1655 Lines: 44 Hi Andi, Thanks for your comments and your help. I'm aware to RFC 793 statement regarding to ACK. My problem is that these piggyback ACK can cause duplicate ACK and therefor invoke the Fast Retransmit mechanism. Consider the following situation when the sender send TCP segment and the receiver send ACK. Then the sender send another 3 (or more) segments, and meanwhile it receives 3 piggyback ACKs that have been sent by the receiver (i.e. the receive send 3 TCP frame before it receives the 3 packets that has been sent by the Sender). The sender can assume that these ACKs are duplicate ACK and therefor it enters into Fast Retransmit although it shouldn't. Can you help me with this issue. Thanks for your effort, RamiC. 
Andi Kleen wrote: > On Thu, Jul 26, 2001 at 01:19:51PM +0200, Rabeeh Khoury wrote: > > Hi All, > > > > I have noticed (using tcpdump and simple TCP client-server application) > > that TCP frame are always sent with ACK flag even if the sender doesn't > > receive new frames (i.e. it send the same Acknowledgment Number). > > > > RFC 2581 (section 4.1) states that: > > "A TCP receiver MUST NOT generate more than one ACK for every > > incoming > > segment, other than to update the offered window as the receiving > > application consumes new data" > > > > According to the above statement it seems that LINUX TCP / IP violates this rule. > > ACK in this context means a new packet that only contains the ACK. > If there are packets send for other reason it is ok to contain an ACK, > in fact RFC793 (that's the real TCP standard) even states that every > packet in Established should carry an ACK flag. > > -Andi From owner-netdev@oss.sgi.com Thu Jul 26 06:07:47 2001 Received: (from majordomo@localhost) by oss.sgi.com (8.11.2/8.11.3) id f6QD7lJ02357 for netdev-outgoing; Thu, 26 Jul 2001 06:07:47 -0700 Received: from DarkStar.sns.it (root@flat-pool16-mdm140.aruba.it [62.149.155.142]) by oss.sgi.com (8.11.2/8.11.3) with SMTP id f6QD7iV02348 for ; Thu, 26 Jul 2001 06:07:44 -0700 Received: from localhost (venom@localhost) by DarkStar.sns.it (8.11.4/8.11.4) with ESMTP id f6QD7YY00619 for ; Thu, 26 Jul 2001 15:07:35 +0200 Date: Thu, 26 Jul 2001 15:07:34 +0200 (CEST) From: Luigi Genoni To: Subject: Fwd: arpd troubles (fwd) Message-ID: MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII Sender: owner-netdev@oss.sgi.com Precedence: bulk Content-Length: 2145 Lines: 59 HI, I am mantaining an arpd daemon that take advantages from arpd support in linux kernel. My daemon is working properly with linux 2.2.16, but then with 2.4 kernels it does not work anymore. 
The arpd code inside the kernel has not changed (I checked), but
something else has changed so that arpd no longer works.

The really big problem is that even with arpd support enabled, the arp
cache grows to more than 256 entries, which should not be allowed.
As a result, even on a dual-processor PIII 1 GHz I start to see
performance problems with around 500 arp entries, and two arpd users
reported to me that the cache overflows at 1024 entries.

Alan Cox suggested that I write to you about this issue, since it is
becoming a serious problem. I have reports of many networks where it is
usual to have more than 512 arp entries. For example, the .sns.it domain,
where I work, spans two class C networks, and I usually have to manage
around 350 arp entries.

Thank you for your patience and help

Luigi Genoni
Unix System Manager
Scuola Normale Superiore di Pisa

---------- Forwarded Message ----------
Subject: arpd troubles
Date: Wed, 25 Jul 2001 02:22:45 -0500
From: H D Moore
To: genoni@cibslogin.sns.it

Hi,

I have been trying to get your 1.0.4 version of arpd to work on the 2.4
kernels. The problem I am having is that the kernel seems to ignore the
CONFIG_ARPD (and netlink, etc.) option and never writes data to /dev/arpd.
The result is that the arp table overflows after 1024 or so entries. The
kernel option states that even if arpd is not running, it will make the arp
cache a LIFO and limit it to 256 entries. This isn't happening at all; the
kernel keeps filling up the arp table until it runs out of space. I have
created the devices (36,0 / 36,8), configured arpd and netlink, and still
no go. I have tried 2.4.5, 2.4.6, and 2.4.7. Any idea what I am doing
wrong?

-HD

btw, stracing arpd shows it trying to read from /dev/arpd and never
getting anything...
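
A quick way to watch the cache size against the two figures mentioned in these reports is to count the rows of /proc/net/arp. This is a minimal sketch: the thresholds 256 and 1024 are taken from the messages above, while the function names, status strings and sample table are hypothetical.

```python
def count_arp_entries(text):
    """Count entries in /proc/net/arp text: one header line, then one row each."""
    lines = text.splitlines()
    return sum(1 for ln in lines[1:] if ln.strip())


def cache_status(entries, expected_limit=256, overflow_point=1024):
    """Classify the cache size against the limits quoted in the thread."""
    if entries >= overflow_point:
        return "at reported overflow point"
    if entries > expected_limit:
        return "above expected limit"
    return "within limit"


# Hypothetical two-entry table in the /proc/net/arp layout.
SAMPLE_ARP = """\
IP address       HW type     Flags       HW address            Mask     Device
192.168.0.1      0x1         0x2         00:11:22:33:44:55     *        eth0
192.168.0.2      0x1         0x2         00:11:22:33:44:66     *        eth0
"""
```

On a live system, `count_arp_entries(open("/proc/net/arp").read())` gives the current number of entries.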
------------------------------------------------------- From owner-netdev@oss.sgi.com Thu Jul 26 06:41:47 2001 Received: (from majordomo@localhost) by oss.sgi.com (8.11.2/8.11.3) id f6QDflY05291 for netdev-outgoing; Thu, 26 Jul 2001 06:41:47 -0700 Received: from nic.nigdzie (qmailr@pa3.gliwice.sdi.tpnet.pl [213.25.220.3]) by oss.sgi.com (8.11.2/8.11.3) with SMTP id f6QDeWV05244 for ; Thu, 26 Jul 2001 06:40:34 -0700 Received: (qmail 24167 invoked by uid 500); 26 Jul 2001 13:38:53 -0000 Date: Thu, 26 Jul 2001 15:38:53 +0200 From: Jacek Konieczny To: netdev@oss.sgi.com Subject: Re: Fwd: arpd troubles (fwd) Message-ID: <20010726153852.A24151@nic.gliwice.sdi.tpnet.pl> Mail-Followup-To: netdev@oss.sgi.com Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: User-Agent: Mutt/1.3.18i Sender: owner-netdev@oss.sgi.com Precedence: bulk Content-Length: 769 Lines: 19 On Thu, Jul 26, 2001 at 03:07:34PM +0200, Luigi Genoni wrote: > > HI, > I am mantaining an arpd daemon that take advantages from arpd support in > linux kernel. [..] > This way, in fact, i saw that also on a double processor PIII 1 Ghz with > around 500 arp entries i start to have performance problems, and two > arpd users reported to me that with 1024 entries inside of arp cache, then > cache overflows. I had the same problem on 2.2.19 kernel. The 1025th entry in arp (mostly static-arp entries) killed whole networking on the host. Nearly every network call got "not enough buffer space available". I managed to solve the problem using blackhole routes instead of fake static-arp entries (leaving about 360 arp entries, which work fine). 
Greets,
Jacek

From owner-netdev@oss.sgi.com Thu Jul 26 07:14:13 2001
Received: (from majordomo@localhost) by oss.sgi.com (8.11.2/8.11.3) id f6QEEDQ07956 for netdev-outgoing; Thu, 26 Jul 2001 07:14:13 -0700
Received: from mail2.lsil.com (mail2.lsil.com [147.145.40.22]) by oss.sgi.com (8.11.2/8.11.3) with SMTP id f6QEECV07951 for ; Thu, 26 Jul 2001 07:14:12 -0700
Received: from mhbs.lsil.com (mhbs [147.145.31.100]) by mail2.lsil.com (8.9.3+Sun/8.9.1) with ESMTP id HAA08265 for ; Thu, 26 Jul 2001 07:14:11 -0700 (PDT)
Received: from [153.79.12.11] by mhbs.lsil.com with ESMTP; Thu, 26 Jul 2001 07:14:01 -0700
Received: from lsil.com (nromernt.ks.lsil.com [153.79.8.107]) by exw-ks.ks.lsil.com with SMTP (Microsoft Exchange Internet Mail Service Version 5.5.2653.13) id PPP32J52; Thu, 26 Jul 2001 09:13:05 -0500
Message-Id: <3B60259F.670E6B10@lsil.com>
Date: Thu, 26 Jul 2001 09:13:51 -0500
From: Noah Romer
Organization: LSI Logic
X-Mailer: Mozilla 4.76 [en] (X11; U; Linux 2.4.2-2 i686)
X-Accept-Language: en
MIME-Version: 1.0
To: netdev
CC: "Romer, Noah"
Subject: 'NETDEV WATCHDOG: xxx: transmit timed out' question
Content-Type: text/plain; charset=us-ascii
Content-Transfer-Encoding: 7bit
Sender: owner-netdev@oss.sgi.com
Precedence: bulk
Content-Length: 1405
Lines: 35

Over the past few weeks, I've seen what looks to me like "odd" behavior
from the netdev watchdog. Specifically, if the fusion mpt driver does a
"netif_stop_queue(dev);" for a device that has no outgoing packets, I
still immediately start getting 'NETDEV WATCHDOG: xxx: transmit timed
out' messages on the console. It seems that the netdev watchdog should
only start issuing such messages if there are actual packets waiting to
be transmitted.

FYI, I'm running the 2.4.5 kernel with the ikd patches, but I've seen
this with 2.4.0, 2.4.1 and 2.4.2 (stock and ikd/kdb patched).
Here's the most common scenario:

1) boot system
2) load fusion mpt driver modules (say, mptbase and mptlan)
3) `ifup fc0`
4) `mptreset 0`
5) watch the netdev watchdog messages stream across the console while
   the reset completes

Just as a point of information, `mptreset` is an in-house tool (not
distributed with the drivers) that forces a diagnostic reset of the
adapter, at the start of which the mptlan module does a netif_stop_queue
for any of its active network devices.

The fusion mpt drivers finally showed up in the stock kernel in the
2.4.7 release (drivers/message/fusion), if anyone is wondering where
they are. The driver version is a bit out of date, but that will
hopefully be fixed RSN.

Thanks.

--
Noah Romer
Driver Developer, CM gopher and Linux Whipping Boy
Storage Components Firmware
LSI Logic Corp.

From owner-netdev@oss.sgi.com Thu Jul 26 09:40:42 2001
Received: (from majordomo@localhost) by oss.sgi.com (8.11.2/8.11.3) id f6QGegC15340 for netdev-outgoing; Thu, 26 Jul 2001 09:40:42 -0700
Received: from e22.nc.us.ibm.com (e22.nc.us.ibm.com [32.97.136.228]) by oss.sgi.com (8.11.2/8.11.3) with SMTP id f6QGeeV15337 for ; Thu, 26 Jul 2001 09:40:40 -0700
Received: from southrelay03.raleigh.ibm.com (southrelay03.raleigh.ibm.com [9.37.3.210]) by e22.nc.us.ibm.com (8.9.3/8.9.3) with ESMTP id LAA40464; Thu, 26 Jul 2001 11:38:23 -0500
Received: from gateway.sequent.com (gateway.sequent.com [138.95.180.1]) by southrelay03.raleigh.ibm.com (8.11.1m3/NCO v4.97) with ESMTP id f6QGeM893320; Thu, 26 Jul 2001 12:40:22 -0400
Received: from eng4.sequent.com (eng4.sequent.com [138.95.7.64]) by gateway.sequent.com (8.10.0.Beta10/8.8.5) with ESMTP id f6QGeKe13177; Thu, 26 Jul 2001 09:40:20 -0700 (PDT)
Received: (from nivedita@localhost) by eng4.sequent.com (8.8.5/8.8.5/token.aware-1.2) id JAA00799; Thu, 26 Jul 2001 09:40:19 -0700 (PDT)
From: Nivedita Singhvi
Message-Id: <200107261640.JAA00799@eng4.sequent.com>
Subject: Re: Linux TCP/IP stack (fwd)
To: rabeeh@galileo.co.il,
netdev@oss.sgi.com
Date: Thu, 26 Jul 2001 09:40:18 -0700 (PDT)
X-Mailer: ELM [version 2.5 PL3]
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Transfer-Encoding: 7bit
Sender: owner-netdev@oss.sgi.com
Precedence: bulk
Content-Length: 1448
Lines: 34

> Hi Andi,
> Thanks for your comments and your help.
> I'm aware to RFC 793 statement regarding to ACK.
> My problem is that these piggyback ACK can cause duplicate ACK and therefor invoke the
> Fast Retransmit mechanism.
> Consider the following situation when the sender send TCP segment and the receiver send
> ACK.
> Then the sender send another 3 (or more) segments, and meanwhile it receives 3
> piggyback ACKs that have been sent by the receiver (i.e. the receive send 3 TCP frame
> before it receives the 3 packets that has been sent by the Sender).
> The sender can assume that these ACKs are duplicate ACK and therefor it enters into
> Fast Retransmit although it shouldn't

Fast retransmit is not invoked in that case, since we don't consider an ack a duplicate if it was piggybacked on data.

When we process an incoming ack (regardless of fast path or slow), we end up in tcp_ack(). If the incoming frame contained data, or if the ack was a window update, and of course if the ack acknowledged new data or a SYN, we wouldn't consider it a dubious ack and wouldn't fall into the fairly complex processing that leads to fast retransmit...

Are you seeing retransmissions and fast retransmits? You can look at /proc/net/netstat and see the TCPFastRetrans counter, for example, and determine if that's happening. There are some other useful counters in /proc/net/netstat that might be interesting..
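A minimal sketch of checking that counter from a program rather than by eye. It assumes the usual /proc/net/netstat layout, in which each group is printed as one line of names followed by one line of values with the same prefix (for example `TcpExt:`). The helper names `netstat_lookup` and `read_netstat_counter` are made up for illustration and are not part of any kernel or libc interface.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: look up `name` in one names/values line pair, e.g.
 *   "TcpExt: SyncookiesSent TCPFastRetrans"
 *   "TcpExt: 0 17"
 * Returns 0 and stores the value in *out, or -1 if the name is absent. */
int netstat_lookup(const char *names, const char *values,
                   const char *name, unsigned long long *out)
{
    char nbuf[8192], vbuf[8192];
    char *nsave, *vsave, *n, *v;

    if (strlen(names) >= sizeof(nbuf) || strlen(values) >= sizeof(vbuf))
        return -1;
    strcpy(nbuf, names);   /* strtok_r modifies its input, so work on copies */
    strcpy(vbuf, values);

    /* Walk both lines in lockstep: the i-th name owns the i-th value. */
    n = strtok_r(nbuf, " \n", &nsave);
    v = strtok_r(vbuf, " \n", &vsave);
    while (n && v) {
        if (strcmp(n, name) == 0) {
            *out = strtoull(v, NULL, 10);
            return 0;
        }
        n = strtok_r(NULL, " \n", &nsave);
        v = strtok_r(NULL, " \n", &vsave);
    }
    return -1;
}

/* Hypothetical helper: scan a netstat-style file two lines at a time. */
int read_netstat_counter(const char *path, const char *name,
                         unsigned long long *out)
{
    char names[8192], values[8192];
    FILE *f = fopen(path, "r");
    int ret = -1;

    if (!f)
        return -1;
    while (fgets(names, sizeof(names), f) &&
           fgets(values, sizeof(values), f)) {
        if (netstat_lookup(names, values, name, out) == 0) {
            ret = 0;
            break;
        }
    }
    fclose(f);
    return ret;
}
```

Calling `read_netstat_counter("/proc/net/netstat", "TCPFastRetrans", &v)` before and after a transfer and comparing the two values is one way to tell whether fast retransmit fired at all.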
Hope that helps, thanks, Nivedita From owner-netdev@oss.sgi.com Thu Jul 26 10:03:48 2001 Received: (from majordomo@localhost) by oss.sgi.com (8.11.2/8.11.3) id f6QH3mA15967 for netdev-outgoing; Thu, 26 Jul 2001 10:03:48 -0700 Received: from ms2.inr.ac.ru (minus.inr.ac.ru [193.233.7.97]) by oss.sgi.com (8.11.2/8.11.3) with SMTP id f6QH3kV15964 for ; Thu, 26 Jul 2001 10:03:46 -0700 Received: from mops.inr.ac.ru (mops.inr.ac.ru [193.233.7.60]) by ms2.inr.ac.ru (8.6.13/ANK) with ESMTP id VAA30153; Thu, 26 Jul 2001 21:03:31 +0400 Received: (from kuznet@localhost) by mops.inr.ac.ru (8.9.3/8.9.3) id KAA00350; Thu, 26 Jul 2001 10:47:01 +0400 Message-Id: <200107260647.KAA00350@mops.inr.ac.ru> Subject: Re: conflicting alignment requirements To: jacoba@cisco.COM (Jacob Avraham) Date: Thu, 26 Jul 2001 10:47:01 +0400 (MSD) Cc: netdev@oss.sgi.com In-Reply-To: from "Jacob Avraham" at Jul 25, 1 08:15:00 pm From: Alexey Kuznetsov X-Mailer: ELM [version 2.4 PL24] MIME-Version: 1.0 Sender: owner-netdev@oss.sgi.com Precedence: bulk Content-Length: 295 Lines: 10 Hello! > > > On the other hand, some upper layer code, like tc > > > (for non x86/68k), checks if the IP header is 4 byte align, Do you know any architecture except for these ones which are able to generate unaligned packets? I do not. And if you do, please, enlighten me about them. 
Alexey From owner-netdev@oss.sgi.com Thu Jul 26 10:03:49 2001 Received: (from majordomo@localhost) by oss.sgi.com (8.11.2/8.11.3) id f6QH3nL15982 for netdev-outgoing; Thu, 26 Jul 2001 10:03:49 -0700 Received: from ms2.inr.ac.ru (minus.inr.ac.ru [193.233.7.97]) by oss.sgi.com (8.11.2/8.11.3) with SMTP id f6QH3kV15963 for ; Thu, 26 Jul 2001 10:03:46 -0700 Received: from mops.inr.ac.ru (mops.inr.ac.ru [193.233.7.60]) by ms2.inr.ac.ru (8.6.13/ANK) with ESMTP id VAA30159; Thu, 26 Jul 2001 21:03:32 +0400 Received: (from kuznet@localhost) by mops.inr.ac.ru (8.9.3/8.9.3) id SAA00576; Thu, 26 Jul 2001 18:57:33 +0400 Message-Id: <200107261457.SAA00576@mops.inr.ac.ru> Subject: Re: static routes and dead gateway detection To: ja@ssi.bg (Julian Anastasov) Date: Thu, 26 Jul 2001 18:57:33 +0400 (MSD) Cc: netdev@oss.sgi.com In-Reply-To: from "Julian Anastasov" at Jul 25, 1 08:52:31 pm From: Alexey Kuznetsov X-Mailer: ELM [version 2.4 PL24] MIME-Version: 1.0 Sender: owner-netdev@oss.sgi.com Precedence: bulk Content-Length: 546 Lines: 20 Hello! > - fib_sync_down not to set DEAD in fib_flags on dev down for > RTPROT_STATIC (change required) Yes. > - Do we need changes in fn_hash_select_default? check for all nh_flags&DEAD ? Seems, yes. Grrr... Now it is not checked at all! Bug when the route happens to be multipath. Ha! No, this place is unreachable, being blocking in route.c. Not a bug, but looks ugly. Heh... The first thing which we need is to fix the bug with holding references to dead devices. All the rest is easy and have to be carefully audited anyway. 
Alexey From owner-netdev@oss.sgi.com Thu Jul 26 10:47:26 2001 Received: (from majordomo@localhost) by oss.sgi.com (8.11.2/8.11.3) id f6QHlQd17181 for netdev-outgoing; Thu, 26 Jul 2001 10:47:26 -0700 Received: from colin.muc.de (root@colin.muc.de [193.149.48.1]) by oss.sgi.com (8.11.2/8.11.3) with SMTP id f6QHlPV17176 for ; Thu, 26 Jul 2001 10:47:25 -0700 Received: by colin.muc.de id <140648-1>; Thu, 26 Jul 2001 19:47:44 +0200 Message-ID: <20010726194735.05017@colin.muc.de> Date: Thu, 26 Jul 2001 19:47:35 +0200 From: Andi Kleen To: Rami Cohen Cc: Andi Kleen , Rabeeh Khoury , netdev@oss.sgi.com Subject: Re: Linux TCP/IP stack References: <3B5FFCD7.1070602@galileo.co.il> <20010726144045.56795@colin.muc.de> <3B602365.AAF773CF@galileo.co.il> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii X-Mailer: Mutt 0.88e In-Reply-To: <3B602365.AAF773CF@galileo.co.il>; from Rami Cohen on Thu, Jul 26, 2001 at 04:04:21PM +0200 Sender: owner-netdev@oss.sgi.com Precedence: bulk Content-Length: 870 Lines: 20 On Thu, Jul 26, 2001 at 04:04:21PM +0200, Rami Cohen wrote: > Hi Andi, > Thanks for your comments and your help. > I'm aware to RFC 793 statement regarding to ACK. > My problem is that these piggyback ACK can cause duplicate ACK and therefor invoke the > Fast Retransmit mechanism. > Consider the following situation when the sender send TCP segment and the receiver send > ACK. > Then the sender send another 3 (or more) segments, and meanwhile it receives 3 > piggyback ACKs that have been sent by the receiver (i.e. the receive send 3 TCP frame > before it receives the 3 packets that has been sent by the Sender). > The sender can assume that these ACKs are duplicate ACK and therefor it enters into > Fast Retransmit although it shouldn't. > > Can you help me with this issue. 
The dupack counter only counts acks which do not carry any data ("pure acks") -Andi From owner-netdev@oss.sgi.com Thu Jul 26 13:54:21 2001 Received: (from majordomo@localhost) by oss.sgi.com (8.11.2/8.11.3) id f6QKsLi16935 for netdev-outgoing; Thu, 26 Jul 2001 13:54:21 -0700 Received: from coruscant.gnumonks.org (mail@coruscant.franken.de [193.174.159.226]) by oss.sgi.com (8.11.2/8.11.3) with SMTP id f6QKsIV16930 for ; Thu, 26 Jul 2001 13:54:19 -0700 Received: from uucp by coruscant.gnumonks.org with local-bsmtp (Exim 3.22 #1) id 15Ps9T-0005WJ-00 for netdev@oss.sgi.com; Thu, 26 Jul 2001 22:54:35 +0200 Received: from laforge by obroa-skai.gnumonks.org with local (Exim 3.22 #1) id 15Pgtq-0000J9-00 for netdev@oss.sgi.com; Thu, 26 Jul 2001 05:53:42 -0300 Date: Thu, 26 Jul 2001 05:53:42 -0300 From: Harald Welte To: netdev@oss.sgi.com Subject: multicast on local host without real networking Message-ID: <20010726055342.B1033@obroa-skai.gnumonks.org> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline User-Agent: Mutt/1.3.17i X-Operating-System: Linux obroa-skai.gnumonks.org 2.4.7 X-Date: Today is Boomtime, the 61st day of Confusion in the YOLD 3167 Sender: owner-netdev@oss.sgi.com Precedence: bulk Content-Length: 1335 Lines: 31 Hi! While developing some multicast code (related to user-mode-linux) I hit a sort-of strange behaviour of the linux ipv4 multicast code. Let's assume you have a box without any real network devices, but still want to do ipv4 multicast between local processes on that host. Yes, I know, this is not the most intelligent way of doing local IPC between processes, but still it seems valid to me. I mean, you can write the same code which would run distributed over the multicast network or only locally on the same box. The problem is, that if you don't have any multicast capable network device, IP_ADD_MEMBERSHIP returns with ENODEV. 
'ifconfig lo multicast' and adding a route to the all-multicast network to loopback doesn't work either. As soon as you have an ethernet device on the system, multicast between local processes starts to work. The question is, if this is desired behaviour. If yes, please explain why. If not, I will have a look on how to solve the problem and send a patch. Thanks. -- Live long and prosper - Harald Welte / laforge@gnumonks.org http://www.gnumonks.org ============================================================================ GCS/E/IT d- s-: a-- C+++ UL++++$ P+++ L++++$ E--- W- N++ o? K- w--- O- M- V-- PS+ PE-- Y+ PGP++ t++ 5-- !X !R tv-- b+++ DI? !D G+ e* h+ r% y+(*) From owner-netdev@oss.sgi.com Thu Jul 26 15:21:33 2001 Received: (from majordomo@localhost) by oss.sgi.com (8.11.2/8.11.3) id f6QMLXd31562 for netdev-outgoing; Thu, 26 Jul 2001 15:21:33 -0700 Received: from mail.playnet.com ([208.134.143.150]) by oss.sgi.com (8.11.2/8.11.3) with SMTP id f6QMLVV31556 for ; Thu, 26 Jul 2001 15:21:31 -0700 Received: from vandal ([208.134.143.158]) by mail.playnet.com (8.9.1/8.9.1) with SMTP id RAA12134 for ; Thu, 26 Jul 2001 17:21:14 -0500 (CDT) Message-ID: <01ad01c11621$7ce50440$0b32a8c0@playnet.com> From: "Marty Poulin" To: Subject: Fw: oops/bug in tcp, SACK doesn't work? Date: Thu, 26 Jul 2001 17:22:13 -0500 MIME-Version: 1.0 Content-Type: text/plain; charset="iso-8859-1" Content-Transfer-Encoding: 7bit X-Priority: 3 X-MSMail-Priority: Normal X-Mailer: Microsoft Outlook Express 5.50.4133.2400 X-MimeOLE: Produced By Microsoft MimeOLE V5.50.4133.2400 Sender: owner-netdev@oss.sgi.com Precedence: bulk Content-Length: 1771 Lines: 50 Alan Cox suggested that I forward this here. ----- Original Message ----- From: "Marty Poulin" To: "Linux-Kernel" Sent: Thursday, July 26, 2001 3:19 PM Subject: oops/bug in tcp, SACK doesn't work? > Perhaps this has been covered somewhere before, but for some reason it > doesn't look like the 2.4.7 (and previous 2.4.x?) 
kernels respond to SACK
> correctly. Instead of just resending the missing packet Linux resends the
> entire packet stream as if it never received the SACK.
>
> Only reason I noticed this was that I was debugging connection problems with
> our servers that were running 2.4.5. I didn't figure the problem out for
> several days, when I exhausted all else I decided it must be the checksum of
> the retransmitted packets. With that in hand a simple google search turned
> up that there was already a patch for this included in the 2.4.7 kernel.
> Doh!
>
> Hence I am now scanning through 100-200 emails a day with the rest of you
> just trying to keep up on the issues and bugs that affect me. There must be
> a place to look for current and fixed bugs without poring over change logs
> and the entire mailing list?
>
> In any case both of these problems were easily duplicated with three
> machines. One of the machines was used as a router running NIST net emulator
> ( http://snad.ncsl.nist.gov/itg/nistnet/ ) that allows you to set packet
> delay, bandwidth and loss. This is a free implementation for Linux that is
> currently in useable alpha (yup sometimes it crashes the router when
> loaded), but hey it works reliably enough to get some testing done.
> > > Marty Poulin > vandal@playnet.com > Lead Programmer > Host/Client Communications > Playnet Inc./Cornered Rat Software > From owner-netdev@oss.sgi.com Thu Jul 26 15:44:36 2001 Received: (from majordomo@localhost) by oss.sgi.com (8.11.2/8.11.3) id f6QMiaD03296 for netdev-outgoing; Thu, 26 Jul 2001 15:44:36 -0700 Received: from u.domain.uli (ja.ssi.bg [212.95.166.64]) by oss.sgi.com (8.11.2/8.11.3) with SMTP id f6QMiVV03278 for ; Thu, 26 Jul 2001 15:44:32 -0700 Received: from localhost (IDENT:ja@localhost [127.0.0.1]) by u.domain.uli (8.11.0/8.11.0) with ESMTP id f6R1jO608815; Fri, 27 Jul 2001 01:45:25 GMT Date: Fri, 27 Jul 2001 01:45:24 +0000 (GMT) From: Julian Anastasov X-X-Sender: To: Alexey Kuznetsov cc: Subject: Re: static routes and dead gateway detection In-Reply-To: <200107261457.SAA00576@mops.inr.ac.ru> Message-ID: MIME-Version: 1.0 Content-Type: MULTIPART/MIXED; BOUNDARY="1607745702-820476079-996198324=:8771" Sender: owner-netdev@oss.sgi.com Precedence: bulk Content-Length: 4708 Lines: 96 This message is in MIME format. The first part should be readable text, while the remaining parts are likely unreadable without MIME-aware tools. Send mail to mime@docserver.cac.washington.edu for more info. --1607745702-820476079-996198324=:8771 Content-Type: TEXT/PLAIN; charset=US-ASCII Hello Alexey, On Thu, 26 Jul 2001, Alexey Kuznetsov wrote: > Heh... The first thing which we need is to fix the bug with holding > references to dead devices. All the rest is easy and have to be carefully > audited anyway. OK, I'm attaching a draft version (against 2.4.7) for your backlog :) It works, I tested it with permanent devices (eth0) and with dummy module. nh_dev is now cleared. Now it survives dummy module unload with dead nexthop in multipath route without printing emergency messages from unregister_netdevice. 
Additionally I touched something for RTN_NAT in fib_semantic_match The funny part: iproute now does not show the "dead" text for single-path routes because the nexthop flag is not checked. Someone have to change the user space if that is fatal :))) > Alexey Regards -- Julian Anastasov --1607745702-820476079-996198324=:8771 Content-Type: TEXT/PLAIN; charset=US-ASCII; name="dead-2.4.7-1.diff" Content-Transfer-Encoding: BASE64 Content-ID: Content-Description: the static routes now survive device state change Content-Disposition: attachment; filename="dead-2.4.7-1.diff" LS0tIHYyLjQuNy9saW51eC9uZXQvaXB2NC9maWJfZnJvbnRlbmQuYwlTYXQg SnVsICA3IDEyOjIwOjEwIDIwMDENCisrKyBsaW51eC9uZXQvaXB2NC9maWJf ZnJvbnRlbmQuYwlUaHUgSnVsIDI2IDIyOjEyOjI0IDIwMDENCkBAIC02MTMs OSArNjEzLDcgQEANCiAJCWZvcl9pZmEoaW5fZGV2KSB7DQogCQkJZmliX2Fk ZF9pZmFkZHIoaWZhKTsNCiAJCX0gZW5kZm9yX2lmYShpbl9kZXYpOw0KLSNp ZmRlZiBDT05GSUdfSVBfUk9VVEVfTVVMVElQQVRIDQogCQlmaWJfc3luY191 cChkZXYpOw0KLSNlbmRpZg0KIAkJcnRfY2FjaGVfZmx1c2goLTEpOw0KIAkJ YnJlYWs7DQogCWNhc2UgTkVUREVWX0RPV046DQotLS0gdjIuNC43L2xpbnV4 L25ldC9pcHY0L2ZpYl9zZW1hbnRpY3MuYwlUdWUgT2N0IDE3IDIwOjQzOjE0 IDIwMDANCisrKyBsaW51eC9uZXQvaXB2NC9maWJfc2VtYW50aWNzLmMJRnJp IEp1bCAyNyAwMToyMzoyNCAyMDAxDQpAQCAtNTgxLDYgKzU4MSwxMCBAQA0K ICNpZmRlZiBDT05GSUdfSVBfUk9VVEVfTkFUDQogCQljYXNlIFJUTl9OQVQ6 DQogCQkJRklCX1JFU19SRVNFVCgqcmVzKTsNCisJCQlpZiAoRklCX1JFU19O SCgqcmVzKS5uaF9mbGFncyAmIFJUTkhfRl9ERUFEKSB7DQorCQkJCXJlcy0+ ZmkgPSBOVUxMOw0KKwkJCQlyZXR1cm4gMTsNCisJCQl9DQogCQkJYXRvbWlj X2luYygmZmktPmZpYl9jbG50cmVmKTsNCiAJCQlyZXR1cm4gMDsNCiAjZW5k aWYNCkBAIC04NjcsMTYgKzg3MSwyOCBAQA0KIAkJCWludCBkZWFkID0gMDsN CiANCiAJCQljaGFuZ2VfbmV4dGhvcHMoZmkpIHsNCi0JCQkJaWYgKG5oLT5u aF9mbGFncyZSVE5IX0ZfREVBRCkNCi0JCQkJCWRlYWQrKzsNCi0JCQkJZWxz ZSBpZiAobmgtPm5oX2RldiA9PSBkZXYgJiYNCisJCQkJaWYgKG5oLT5uaF9m bGFncyZSVE5IX0ZfREVBRCkgew0KKwkJCQkJaWYgKGZpLT5maWJfcHJvdG9j b2whPVJUUFJPVF9TVEFUSUMgfHwNCisJCQkJCQluaC0+bmhfZGV2ID09IE5V TEwgfHwNCisJCQkJCQkhX19pbl9kZXZfZ2V0KG5oLT5uaF9kZXYpIHx8DQor 
CQkJCQkJbmgtPm5oX2Rldi0+ZmxhZ3MmSUZGX1VQKQ0KKwkJCQkJCWRlYWQr KzsNCisJCQkJfSBlbHNlIGlmIChuaC0+bmhfZGV2ID09IGRldiAmJg0KIAkJ CQkJIG5oLT5uaF9zY29wZSAhPSBzY29wZSkgew0KIAkJCQkJbmgtPm5oX2Zs YWdzIHw9IFJUTkhfRl9ERUFEOw0KICNpZmRlZiBDT05GSUdfSVBfUk9VVEVf TVVMVElQQVRIDQogCQkJCQlmaS0+ZmliX3Bvd2VyIC09IG5oLT5uaF9wb3dl cjsNCiAJCQkJCW5oLT5uaF9wb3dlciA9IDA7DQogI2VuZGlmDQotCQkJCQlk ZWFkKys7DQorCQkJCQlpZiAoZmktPmZpYl9wcm90b2NvbCE9UlRQUk9UX1NU QVRJQyB8fA0KKwkJCQkJCWZvcmNlIHx8DQorCQkJCQkJX19pbl9kZXZfZ2V0 KGRldikgPT0gTlVMTCkNCisJCQkJCQlkZWFkKys7DQorCQkJCX0NCisJCQkJ aWYgKG5oLT5uaF9mbGFncyZSVE5IX0ZfREVBRCAmJiBmb3JjZSAmJg0KKwkJ CQkJbmgtPm5oX2RldiA9PSBkZXYpIHsNCisJCQkJCWRldl9wdXQobmgtPm5o X2Rldik7DQorCQkJCQluaC0+bmhfZGV2ID0gTlVMTDsNCiAJCQkJfQ0KIAkJ CX0gZW5kZm9yX25leHRob3BzKGZpKQ0KIAkJCWlmIChkZWFkID09IGZpLT5m aWJfbmhzKSB7DQpAQCAtODg4LDExICs5MDQsOCBAQA0KIAlyZXR1cm4gcmV0 Ow0KIH0NCiANCi0jaWZkZWYgQ09ORklHX0lQX1JPVVRFX01VTFRJUEFUSA0K LQ0KIC8qDQogICAgRGVhZCBkZXZpY2UgZ29lcyB1cC4gV2Ugd2FrZSB1cCBk ZWFkIG5leHRob3BzLg0KLSAgIEl0IHRha2VzIHNlbnNlIG9ubHkgb24gbXVs dGlwYXRoIHJvdXRlcy4NCiAgKi8NCiANCiBpbnQgZmliX3N5bmNfdXAoc3Ry dWN0IG5ldF9kZXZpY2UgKmRldikNCkBAIC05MDYsMTAgKzkxOSw4IEBADQog CQlpbnQgYWxpdmUgPSAwOw0KIA0KIAkJY2hhbmdlX25leHRob3BzKGZpKSB7 DQotCQkJaWYgKCEobmgtPm5oX2ZsYWdzJlJUTkhfRl9ERUFEKSkgew0KLQkJ CQlhbGl2ZSsrOw0KKwkJCWlmICghKG5oLT5uaF9mbGFncyZSVE5IX0ZfREVB RCkpDQogCQkJCWNvbnRpbnVlOw0KLQkJCX0NCiAJCQlpZiAobmgtPm5oX2Rl diA9PSBOVUxMIHx8ICEobmgtPm5oX2Rldi0+ZmxhZ3MmSUZGX1VQKSkNCiAJ CQkJY29udGludWU7DQogCQkJaWYgKG5oLT5uaF9kZXYgIT0gZGV2IHx8IF9f aW5fZGV2X2dldChkZXYpID09IE5VTEwpDQpAQCAtOTI2LDYgKzkzNyw4IEBA DQogCX0gZW5kZm9yX2ZpYl9pbmZvKCk7DQogCXJldHVybiByZXQ7DQogfQ0K Kw0KKyNpZmRlZiBDT05GSUdfSVBfUk9VVEVfTVVMVElQQVRIDQogDQogLyoN CiAgICBUaGUgYWxnb3JpdGhtIGlzIHN1Ym9wdGltYWwsIGJ1dCBpdCBwcm92 aWRlcyByZWFsbHkNCg== --1607745702-820476079-996198324=:8771-- From owner-netdev@oss.sgi.com Fri Jul 27 08:18:42 2001 Received: (from majordomo@localhost) by oss.sgi.com (8.11.2/8.11.3) id f6RFIgx13573 for netdev-outgoing; Fri, 27 
Jul 2001 08:18:42 -0700 Received: from ms2.inr.ac.ru (minus.inr.ac.ru [193.233.7.97]) by oss.sgi.com (8.11.2/8.11.3) with SMTP id f6RFIcV13569; Fri, 27 Jul 2001 08:18:38 -0700 Received: from mops.inr.ac.ru (mops.inr.ac.ru [193.233.7.60]) by ms2.inr.ac.ru (8.6.13/ANK) with ESMTP id TAA13742; Fri, 27 Jul 2001 19:18:30 +0400 Received: (from kuznet@localhost) by mops.inr.ac.ru (8.9.3/8.9.3) id LAA01031; Fri, 27 Jul 2001 11:57:43 +0400 Message-Id: <200107270757.LAA01031@mops.inr.ac.ru> Subject: Re: ping bug To: ralf@oss.sgi.com (Ralf Baechle) Date: Fri, 27 Jul 2001 11:57:43 +0400 (MSD) Cc: netdev@oss.sgi.com In-Reply-To: <20010723135313.A1439@bacchus.dhis.org> from "Ralf Baechle" at Jul 23, 1 04:45:01 pm From: Alexey Kuznetsov X-Mailer: ELM [version 2.4 PL24] MIME-Version: 1.0 Sender: owner-netdev@oss.sgi.com Precedence: bulk Content-Length: 374 Lines: 13 Hello! > under rather extreme circumstance on a network simulator: > > > 1458 packets transmitted, 1000 packets received, 31% packet loss > > round-trip min/avg/max/mdev = 4668.765/379.140/4683.875/4658.706 ms This happens after sum of rtts exceed ~4000 seconds. I.e. with rtt of 4 seconds after ~1000 samples. Seems, have to use long long to extend this. Sigh. 
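The arithmetic can be checked with a small sketch. It assumes, as the ~4000 second figure suggests, that the RTT sum is kept in microseconds in a 32-bit unsigned accumulator (2^32 us is about 4295 s). The function names are hypothetical and this is not the ping source.

```c
#include <assert.h>
#include <stdint.h>

/* Average of n identical RTT samples (in microseconds) using a 32-bit
 * accumulator. Unsigned overflow is well defined in C: the sum wraps
 * modulo 2^32, which silently corrupts the average. */
uint32_t avg_rtt_us_32(uint32_t rtt_us, uint32_t n)
{
    uint32_t sum = 0;

    for (uint32_t i = 0; i < n; i++)
        sum += rtt_us;
    return sum / n;
}

/* Same computation with a 64-bit accumulator ("long long"), which has
 * ample headroom for any realistic ping run. */
uint64_t avg_rtt_us_64(uint32_t rtt_us, uint32_t n)
{
    uint64_t sum = 0;

    for (uint32_t i = 0; i < n; i++)
        sum += rtt_us;
    return sum / n;
}
```

With 1000 samples of 4.67 s each, the 32-bit version yields about 375 ms, the same order as the 379.140 ms average in the report and likewise below the reported minimum, while the 64-bit version returns the expected 4.67 s.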
Alexey From owner-netdev@oss.sgi.com Fri Jul 27 08:18:50 2001 Received: (from majordomo@localhost) by oss.sgi.com (8.11.2/8.11.3) id f6RFIo113584 for netdev-outgoing; Fri, 27 Jul 2001 08:18:50 -0700 Received: from ms2.inr.ac.ru (minus.inr.ac.ru [193.233.7.97]) by oss.sgi.com (8.11.2/8.11.3) with SMTP id f6RFImV13581 for ; Fri, 27 Jul 2001 08:18:48 -0700 Received: from mops.inr.ac.ru (mops.inr.ac.ru [193.233.7.60]) by ms2.inr.ac.ru (8.6.13/ANK) with ESMTP id TAA13756; Fri, 27 Jul 2001 19:18:33 +0400 Received: (from kuznet@localhost) by mops.inr.ac.ru (8.9.3/8.9.3) id JAA00742; Fri, 27 Jul 2001 09:28:44 +0400 Message-Id: <200107270528.JAA00742@mops.inr.ac.ru> Subject: Re: missing icmp errors for udp packets To: therapy@endorphin.ORG (clemens) Date: Fri, 27 Jul 2001 09:28:43 +0400 (MSD) Cc: netdev@oss.sgi.com, linux-kernel@vger.rutgers.edu In-Reply-To: <20010725220029.A11360@ghanima.endorphin.org> from "clemens" at Jul 26, 1 00:15:01 am From: Alexey Kuznetsov X-Mailer: ELM [version 2.4 PL24] MIME-Version: 1.0 Sender: owner-netdev@oss.sgi.com Precedence: bulk Content-Length: 581 Lines: 14 Hello! > 21:35:58.096727 guardian.51277 > ghanima.endorphin.org.echo: udp 0 > 21:35:58.096871 guardian.51277 > ghanima.endorphin.org.8: udp 0 > 21:35:58.097673 guardian.51277 > ghanima.endorphin.org.discard: udp 0 > 21:35:58.098479 guardian.51277 > ghanima.endorphin.org.1: udp 0 > 21:35:58.099285 guardian.51277 > ghanima.endorphin.org.2: udp 0 > 21:35:58.100029 guardian.51277 > ghanima.endorphin.org.3: udp 0 > 21:35:58.100721 guardian.51277 > ghanima.endorphin.org.6: udp 0 > ..and so on. 
Check ICMP error rate limits: /proc/sys/net/ipv4/icmp_destunreach_rate

Alexey

From owner-netdev@oss.sgi.com Fri Jul 27 08:24:32 2001
Received: (from majordomo@localhost) by oss.sgi.com (8.11.2/8.11.3) id f6RFOWo13954 for netdev-outgoing; Fri, 27 Jul 2001 08:24:32 -0700
Received: from weta.f00f.org (weta.f00f.org [203.167.249.89]) by oss.sgi.com (8.11.2/8.11.3) with SMTP id f6RFOTV13951; Fri, 27 Jul 2001 08:24:29 -0700
Received: by weta.f00f.org (Postfix, from userid 1000) id 80CCB51B7; Sat, 28 Jul 2001 03:24:53 +1200 (NZST)
Date: Sat, 28 Jul 2001 03:24:53 +1200
From: Chris Wedgwood
To: Alexey Kuznetsov
Cc: Ralf Baechle , netdev@oss.sgi.com
Subject: Re: ping bug
Message-ID: <20010728032453.C852@weta.f00f.org>
References: <200107270757.LAA01031@mops.inr.ac.ru>
Mime-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
In-Reply-To: <200107270757.LAA01031@mops.inr.ac.ru>
User-Agent: Mutt/1.3.18i
X-No-Archive: Yes
Sender: owner-netdev@oss.sgi.com
Precedence: bulk
Content-Length: 325
Lines: 12

On Fri, Jul 27, 2001 at 11:57:43AM +0400, Alexey Kuznetsov wrote:

    This happens after sum of rtts exceed ~4000 seconds.
    I.e. with rtt of 4 seconds after ~1000 samples.

    Seems, have to use long long to extend this. Sigh.

Is float really that bad? It's not like you're doing 50K of these per second...
--cw

From owner-netdev@oss.sgi.com Fri Jul 27 08:42:03 2001
Received: (from majordomo@localhost) by oss.sgi.com (8.11.2/8.11.3) id f6RFg3x14520 for netdev-outgoing; Fri, 27 Jul 2001 08:42:03 -0700
Received: from ms2.inr.ac.ru (minus.inr.ac.ru [193.233.7.97]) by oss.sgi.com (8.11.2/8.11.3) with SMTP id f6RFfxV14511; Fri, 27 Jul 2001 08:41:59 -0700
Received: (from kuznet@localhost) by ms2.inr.ac.ru (8.6.13/ANK) id TAA15375; Fri, 27 Jul 2001 19:41:46 +0400
From: kuznet@ms2.inr.ac.ru
Message-Id: <200107271541.TAA15375@ms2.inr.ac.ru>
Subject: Re: ping bug
To: cw@f00f.org (Chris Wedgwood)
Date: Fri, 27 Jul 2001 19:41:46 +0400 (MSK DST)
Cc: ralf@oss.sgi.com, netdev@oss.sgi.com
In-Reply-To: <20010728032453.C852@weta.f00f.org> from "Chris Wedgwood" at Jul 28, 1 03:24:53 am
X-Mailer: ELM [version 2.4 PL24]
MIME-Version: 1.0
Sender: owner-netdev@oss.sgi.com
Precedence: bulk
Content-Length: 285
Lines: 11

Hello!

> Is float really that bad? It's not like you're doing 50K of these per
> second...

:-) Their number is not important at all, they are not slower nowadays.

Pardon, I wash hands twice each time when have to write "double".
Need to visit psychoanalyst to cure this.
:-)

Alexey

From owner-netdev@oss.sgi.com Fri Jul 27 13:08:08 2001
Received: (from majordomo@localhost) by oss.sgi.com (8.11.2/8.11.3) id f6RK88r23371 for netdev-outgoing; Fri, 27 Jul 2001 13:08:08 -0700
Received: from dea.waldorf-gmbh.de (u-131-18.karlsruhe.ipdial.viaginterkom.de [62.180.18.131]) by oss.sgi.com (8.11.2/8.11.3) with SMTP id f6RK82V23361 for ; Fri, 27 Jul 2001 13:08:03 -0700
Received: (from ralf@localhost) by dea.waldorf-gmbh.de (8.11.1/8.11.1) id f6RK7AA03194; Fri, 27 Jul 2001 22:07:10 +0200
Date: Fri, 27 Jul 2001 22:07:10 +0200
From: Ralf Baechle
To: kuznet@ms2.inr.ac.ru
Cc: Chris Wedgwood , netdev@oss.sgi.com
Subject: Re: ping bug
Message-ID: <20010727220710.A3011@bacchus.dhis.org>
References: <20010728032453.C852@weta.f00f.org> <200107271541.TAA15375@ms2.inr.ac.ru>
Mime-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
User-Agent: Mutt/1.2.5i
In-Reply-To: <200107271541.TAA15375@ms2.inr.ac.ru>; from kuznet@ms2.inr.ac.ru on Fri, Jul 27, 2001 at 07:41:46PM +0400
X-Accept-Language: de,en,fr
Sender: owner-netdev@oss.sgi.com
Precedence: bulk
Content-Length: 441
Lines: 13

On Fri, Jul 27, 2001 at 07:41:46PM +0400, kuznet@ms2.inr.ac.ru wrote:

> > Is float really that bad? It's not like you're doing 50K of these per
> > second...
>
> :-) Their number is not important at all, they are not slower nowadays.
>
> Pardon, I wash hands twice each time when have to write "double".
> Need to visit psychoanalyst to cure this. :-)

Dr.
Kuznetsov, I always have this itching in those two unused FP pipelines ;-)

  Ralf

From owner-netdev@oss.sgi.com Fri Jul 27 18:39:31 2001
Received: (from majordomo@localhost) by oss.sgi.com (8.11.2/8.11.3) id f6S1dVs29873 for netdev-outgoing; Fri, 27 Jul 2001 18:39:31 -0700
Received: from mail.ocs.com.au (ppp0.ocs.com.au [203.34.97.3]) by oss.sgi.com (8.11.2/8.11.3) with SMTP id f6S1dSV29863 for ; Fri, 27 Jul 2001 18:39:29 -0700
Received: (qmail 2193 invoked from network); 28 Jul 2001 01:39:25 -0000
Received: from ocs3.ocs-net (192.168.255.3) by mail.ocs.com.au with SMTP; 28 Jul 2001 01:39:25 -0000
X-Mailer: exmh version 2.1.1 10/15/1999
From: Keith Owens
To: netdev@oss.sgi.com
Subject: Incorrect drivers/net/Makefile for compressed slip
Mime-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Date: Sat, 28 Jul 2001 11:39:24 +1000
Message-ID: <1561.996284364@ocs3.ocs-net>
Sender: owner-netdev@oss.sgi.com
Precedence: bulk
Content-Length: 1195
Lines: 36

drivers/net/Makefile 2.4.7 contains this.

obj-$(CONFIG_SLIP) += slip.o
ifeq ($(CONFIG_SLIP),y)
  obj-$(CONFIG_SLIP_COMPRESSED) += slhc.o
else
  ifeq ($(CONFIG_SLIP),m)
    obj-$(CONFIG_SLIP_COMPRESSED) += slhc.o
  endif
endif

CONFIG_SLIP is tristate, CONFIG_SLIP_COMPRESSED is bool. The effect of CONFIG_SLIP=m, CONFIG_SLIP_COMPRESSED=y is to make slip.o a module but slhc.o is linked into vmlinux.o. Since slhc.c has init_module routines, it was obviously intended to be a module.
IMHO the Makefile should be

Index: 7.1/drivers/net/Makefile
--- 7.1/drivers/net/Makefile Tue, 10 Jul 2001 10:31:22 +1000 kaos (linux-2.4/l/c/26_Makefile 1.1.1.1.3.3.1.1.1.2.1.2 644)
+++ 7.1(w)/drivers/net/Makefile Sat, 28 Jul 2001 11:37:22 +1000 kaos (linux-2.4/l/c/26_Makefile 1.1.1.1.3.3.1.1.1.2.1.2 644)
@@ -137,12 +137,8 @@
 obj-$(CONFIG_PPP_BSDCOMP) += bsd_comp.o
 obj-$(CONFIG_PPPOE) += pppox.o pppoe.o
 obj-$(CONFIG_SLIP) += slip.o
-ifeq ($(CONFIG_SLIP),y)
-  obj-$(CONFIG_SLIP_COMPRESSED) += slhc.o
-else
-  ifeq ($(CONFIG_SLIP),m)
-    obj-$(CONFIG_SLIP_COMPRESSED) += slhc.o
-  endif
+ifeq ($(CONFIG_SLIP_COMPRESSED),y)
+  obj-$(CONFIG_SLIP) += slhc.o
 endif
 
 obj-$(CONFIG_STRIP) += strip.o

From owner-netdev@oss.sgi.com Sat Jul 28 07:10:55 2001
Received: (from majordomo@localhost) by oss.sgi.com (8.11.2/8.11.3) id f6SEAtk03627 for netdev-outgoing; Sat, 28 Jul 2001 07:10:55 -0700
Received: from mail-verteiler.de ([212.63.136.97]) by oss.sgi.com (8.11.2/8.11.3) with SMTP id f6SEAoV03621 for ; Sat, 28 Jul 2001 07:10:51 -0700
Received: from werner-elektrotechnik.com [212.63.156.36] by mail-verteiler.de with ESMTP (SMTPD32-6.06) id A8158410010E; Sat, 28 Jul 2001 16:11:33 +0200
Received: from tux.we.de [217.0.52.137] by werner-elektrotechnik.com with ESMTP (SMTPD32-6.06) id A7F3394302C2; Sat, 28 Jul 2001 16:10:59 +0200
Received: from werner-elektrotechnik.com (pcbwe.we.de [192.168.0.50]) by tux.we.de (8.11.2/8.8.8) with ESMTP id f6SFmrq14409 for ; Sat, 28 Jul 2001 17:48:53 +0200
Message-ID: <3B62C443.82A327C@werner-elektrotechnik.com>
Date: Sat, 28 Jul 2001 15:55:15 +0200
From: Bernhard Werner
X-Mailer: Mozilla 4.75 [en] (WinNT; U)
X-Accept-Language: en
MIME-Version: 1.0
To: netdev@oss.sgi.com
Subject: TCP 4way handshake
Content-Type: multipart/mixed; boundary="------------B60521D6963ED9D9EE25D945"
Sender: owner-netdev@oss.sgi.com
Precedence: bulk
Content-Length: 8093
Lines: 148

This is a multi-part message in MIME format.
--------------B60521D6963ED9D9EE25D945
Content-Type: text/plain; charset=iso-8859-1
Content-Transfer-Encoding: 8bit

Hello Alan,

Thank you for your quick answer. Sorry again for contacting you directly; unfortunately I read the linux-kernel FAQ only last night.

Here's the tcpdump output taken with kernel 2.4.4 and kernel 2.2.18, both from the SuSE distribution, versions 7.2 and 7.1 respectively. 2.2.18 didn't show the double ack, 2.4.4 does.

I include some C source that I used to track that down. The program only connect()s, sleep(10)s and close()s. Hope that helps to isolate the bug. Port 7 (echo) appeared to be quite nice for this trace, since no application uses it, so it's easy to grep with tcpdump.

Kind regards from Germany. Linux is used here too - I bet you knew already. Thanks to you and Linus and the whole community for that great OPERATING SYSTEM.

Bernhard.

--
Werner Elektrotechnik GmbH    Tel: +49-6123-9076-21
Bernhard Werner               Fax: +49-6123-9076-31
Erbacher Straße 17
D-65343 Eltville/Rheingau     http://www.werner-elektrotechnik.com
Germany                       bernhard.werner@werner-elektrotechnik.com

--------------B60521D6963ED9D9EE25D945
Content-Type: application/x-unknown-content-type-txtfile; name="tcp_2218.log"
Content-Transfer-Encoding: base64
Content-Disposition: inline; filename="tcp_2218.log"

MTc6MTc6MTcuNzM4MTM2IDE5Mi4xNjguMC4yNTQuMTk0MyA+IDE5Mi4xNjguMC4yNTIuZWNo
bzogUyAxOTc0NjMwOTM6MTk3NDYzMDkzKDApIHdpbiA1ODQwIDxtc3MgMTQ2MCxzYWNrT0ss
dGltZXN0YW1wIDEwODM3NzczIDAsbm9wLHdzY2FsZSAwPiAoREYpCjE3OjE3OjE3LjczODQy
OCAxOTIuMTY4LjAuMjUyLmVjaG8gPiAxOTIuMTY4LjAuMjU0LjE5NDM6IFMgMzMyMzkwMjQ2
OTozMzIzOTAyNDY5KDApIGFjayAxOTc0NjMwOTQgd2luIDU3OTIgPG1zcyAxNDYwLHNhY2tP
Syx0aW1lc3RhbXAgNDM0NTYxMjkgMTA4Mzc3NzMsbm9wLHdzY2FsZSAwPiAoREYpCjE3OjE3
OjE3LjczODU3NiAxOTIuMTY4LjAuMjU0LjE5NDMgPiAxOTIuMTY4LjAuMjUyLmVjaG86IC4g
MToxKDApIGFjayAxIHdpbiA1ODQwIDxub3Asbm9wLHRpbWVzdGFtcCAxMDgzNzc3MyA0MzQ1
NjEyOT4gKERGKQoxNzoxNzoyNy43NDAyMDMgMTkyLjE2OC4wLjI1NC4xOTQzID4gMTkyLjE2
OC4wLjI1Mi5lY2hvOiBGIDE6MSgwKSBhY2sgMSB3aW4gNTg0MCA8bm9wLG5vcCx0aW1lc3Rh bXAgMTA4Mzg3NzQgNDM0NTYxMjk+IChERikKMTc6MTc6MjcuNzQwNzgyIDE5Mi4xNjguMC4y NTIuZWNobyA+IDE5Mi4xNjguMC4yNTQuMTk0MzogRiAxOjEoMCkgYWNrIDIgd2luIDU3OTIg PG5vcCxub3AsdGltZXN0YW1wIDQzNDU3MTI5IDEwODM4Nzc0PiAoREYpCjE3OjE3OjI3Ljc0 MDkzMiAxOTIuMTY4LjAuMjU0LjE5NDMgPiAxOTIuMTY4LjAuMjUyLmVjaG86IC4gMjoyKDAp IGFjayAyIHdpbiA1ODQwIDxub3Asbm9wLHRpbWVzdGFtcCAxMDgzODc3NCA0MzQ1NzEyOT4g KERGKQo= --------------B60521D6963ED9D9EE25D945 Content-Type: application/x-unknown-content-type-cfile; name="main.c" Content-Transfer-Encoding: base64 Content-Disposition: inline; filename="main.c" LyoqKioqKioqKioqKioqKioqKioqKioqKioqKioqKioqKioqKioqKioqKioqKioqKioqKioq KioqKioqKioqKioqKioqKioqKioqKgogICAgICAgICAgICAgICAgICAgICAgICAgIHRjcF9j b25uZWN0LmNwcCAgLSAgZGVzY3JpcHRpb24KICAgICAgICAgICAgICAgICAgICAgICAgICAg ICAtLS0tLS0tLS0tLS0tLS0tLS0tCiAgICBiZWdpbiAgICAgICAgICAgICAgICA6IE1vbiBN YXIgMTIgMjAwMQogICAgY29weXJpZ2h0ICAgICAgICAgICAgOiAoQykgMjAwMSBieSBid2UK ICAgCiAqKioqKioqKioqKioqKioqKioqKioqKioqKioqKioqKioqKioqKioqKioqKioqKioq KioqKioqKioqKioqKioqKioqKioqKioqKiovCgojaW5jbHVkZSA8dW5pc3RkLmg+CiNpbmNs dWRlIDxzdGRpby5oPgojaW5jbHVkZSA8c3RyaW5nLmg+CiNpbmNsdWRlIDxzdGRsaWIuaD4K I2luY2x1ZGUgPGVycm5vLmg+CgojaW5jbHVkZSA8YXJwYS9pbmV0Lmg+CiNpbmNsdWRlIDxu ZXRpbmV0L2luLmg+CiNpbmNsdWRlIDxzeXMvc29ja2V0Lmg+CiNpbmNsdWRlIDxuZXRkYi5o PgoKI2luY2x1ZGUgPHRpbWUuaD4KI2luY2x1ZGUgPHN5cy90aW1lLmg+CgpjaGFyKiAgdGFy Z2V0X3NlcnZlciA9ICIiOwpzaG9ydCAgdGFyZ2V0X3BvcnQ9MDsKCmludCBzb2NrZXRfaGFu ZGxlID0gMDsKc3RydWN0IGluX2FkZHIgdGNwNGFkZHI7CgoKaW50IGdldF9ob3N0X2luZm8o Y29uc3QgY2hhciogc2VydmVyKQp7CiAgICBzdHJ1Y3QgaW5fYWRkcioqIGlwcDsKICAgIHN0 cnVjdCBob3N0ZW50KiBoZXA7CgogICAgaGVwID0gZ2V0aG9zdGJ5bmFtZShzZXJ2ZXIpOwoK ICAgIGlmKCBoZXAgIT0gMCkgewoKCS8qIGlwIGFkZHJlc3MgbGlzdCAqLwoKCWlmKCAoaXBw ID0gKHN0cnVjdCBpbl9hZGRyKiopIGhlcC0+aF9hZGRyX2xpc3QpICE9IDAgKSB7CgkgICAg bWVtY3B5KCAgJnRjcDRhZGRyLCAqaXBwLCAgc2l6ZW9mKCB0Y3A0YWRkcikgKTsKCX0KCWVs 
c2UgewoJICAgIGZwcmludGYoc3RkZXJyLCAiZ2V0aG9zdGJ5bmFtZSByZXR1cm4gbnVsbC1w b2ludGVyXG4iKTsKCSAgICBleGl0KDEpOwoJfQogICAgfQogICAgZWxzZSB7CglmcHJpbnRm KHN0ZGVyciwgImNvdWxkbnQgcmVzb2x2ZSBzZXJ2ZXIgbmFtZSAlc1xuIiwgc2VydmVyKTsK CXJldHVybiAtaF9lcnJubzsKICAgIH0KICAgIHJldHVybiAwOwp9CgoKCgovKiAtLS0tLS0t LS0tLS0tLS0tLS0tLS0tLS0tLS0tLS0tLS0tLS0tLS0tLS0tLS0tLS0tLS0tLS0tLS0tLS0t LS0tLS0tLS0tICovCgoKaW50IG1haW4oaW50IGFyZ2MsIGNoYXIqIGFyZ3ZbXSkKewogICAg dHlwZWRlZiBzdHJ1Y3Qgc29ja2FkZHIgU0E7CiAgICBzdHJ1Y3Qgc29ja2FkZHJfaW4gc2Vy dmFkZHI7CiAgICBzb2NrbGVuX3Qgc29ja2xlbiA9IHNpemVvZihzZXJ2YWRkcik7CiAKICAg IGlmKCBhcmdjIDwzKSB7CglmcHJpbnRmKHN0ZGVyciwgInVzZSAlcyB0YXJnZXRfbWFjaGlu ZSB0YXJnZXRfcG9ydFxuIiwgYXJndlswXSk7CglleGl0KDEpOwogICAgfQogICAgdGFyZ2V0 X3NlcnZlciA9IGFyZ3ZbMV07CgogICAgaWYoICggdGFyZ2V0X3BvcnQgPSBzdHJ0b2woYXJn dlsyXSwgMCwgMCkpID09IDApIHsKCWZwcmludGYoc3RkZXJyLCAib29wcywgcG9ydCBpcyAw XG4iKTsKCWV4aXQoMik7CiAgICB9CgoKICAgIC8qCiAgICAgKiByZXNvbHZlIGhvc3RuYW1l IGludG8gaXAgYWRkcmVzcwogICAgICovCiAgICBpZiggZ2V0X2hvc3RfaW5mbyh0YXJnZXRf c2VydmVyKSA8IDApIHsKCWZwcmludGYoc3RkZXJyLCAiY291bGQgbm90IGdldGhvc3RieW5h bWUoKSBmb3IgJXNcbiIsCgkJdGFyZ2V0X3NlcnZlcik7CglleGl0KDMpOwogICAgfQoKCiAg ICAvKgogICAgICogY3JlYXRlIGEgc29ja2V0CiAgICAgKi8KICAgIGlmKCAoc29ja2V0X2hh bmRsZSA9IHNvY2tldChBRl9JTkVULCBTT0NLX1NUUkVBTSwgMCkpIDwgMCkgewoJZnByaW50 ZihzdGRlcnIsICJlcnJvciBjcmVhdGluZyBzb2NrZXRcbiIpOwoJZXhpdCg0KTsKICAgIH0K CgogICAgLyoKICAgICAqIHNldHVwIHRhcmdldCBhZGRyZXNzIHN0cnVjdHVyZQogICAgICov CiAgICBiemVybyggJnNlcnZhZGRyLCBzb2NrbGVuKTsKICAgIHNlcnZhZGRyLnNpbl9mYW1p bHkgPSBBRl9JTkVUOwogICAgc2VydmFkZHIuc2luX3BvcnQgICA9IGh0b25zKCB0YXJnZXRf cG9ydCk7CiAgICBtZW1jcHkoICZzZXJ2YWRkci5zaW5fYWRkci5zX2FkZHIsICZ0Y3A0YWRk ciwgc2l6ZW9mKHN0cnVjdCBpbl9hZGRyKSApOwoKICAgIC8qCiAgICAgKiBjb25uZWN0IHRv IHRoZSB0YXJnZXQKICAgICAqLwoKICAgIGlmKCBjb25uZWN0KCBzb2NrZXRfaGFuZGxlLCAo U0EqKSAmc2VydmFkZHIsIHNvY2tsZW4pIDwgMCkgewoJZnByaW50ZihzdGRlcnIsICJjb25u ZWN0KCkgdG8gJXMgcG9ydCAlZCBmYWlsZWRcbiIsCgkJdGFyZ2V0X3NlcnZlciwgdGFyZ2V0 
X3BvcnQgKTsKCWV4aXQoNSk7CiAgICB9CgogICAgLyoKICAgICAqIHllcywgd2UgZmluYWxs eSBtYWRlIGl0CiAgICAgKi8KCiAgICBmcHJpbnRmKHN0ZGVyciwgImNvbm5lY3RlZCB0byAl cyBwb3J0PSVkLCBzdGF5IHR1bmVkIVxuIiwKCSAgICB0YXJnZXRfc2VydmVyLCB0YXJnZXRf cG9ydCk7CgogICAgc2xlZXAoMTApOwoKICAgIGlmKCBjbG9zZShzb2NrZXRfaGFuZGxlKSAh PSAwKQoJZnByaW50ZihzdGRlcnIsICJjbG9zZSBmYWlsZWRcbiIpOwogICAgZWxzZQoJZnBy aW50ZihzdGRlcnIsICJzb2NrZXQgY2xvc2VkIHN1Y2Nlc3NmdWxseSwgZXhpdGluZyBub3dc biIpOwoKICAgIHJldHVybiAwOwp9CgoKCg== --------------B60521D6963ED9D9EE25D945 Content-Type: application/x-unknown-content-type-txtfile; name="tcp_244.log" Content-Transfer-Encoding: base64 Content-Disposition: inline; filename="tcp_244.log" MjE6NTY6NDUuNDE5NTg4IDEwLjIwMS4zMy4xMy5hbHRhLWFuYS1sbSA+IDEwLjIwMS4zMy40 MS5lY2hvOiBTIDcyMDE2MjkwNzo3MjAxNjI5MDcoMCkgd2luIDU4NDAgPG1zcyAxNDYwLHNh Y2tPSyx0aW1lc3RhbXAgODAzMjAzOSAwLG5vcCx3c2NhbGUgMD4gKERGKQoyMTo1Njo0NS40 MTk1ODggMTAuMjAxLjMzLjQxLmVjaG8gPiAxMC4yMDEuMzMuMTMuYWx0YS1hbmEtbG06IFMg NjA5MTI2NTQ0OjYwOTEyNjU0NCgwKSBhY2sgNzIwMTYyOTA4IHdpbiA1NzkyIDxtc3MgMTQ2 MCxzYWNrT0ssdGltZXN0YW1wIDQzNjA2Mzc1IDgwMzIwMzksbm9wLHdzY2FsZSAwPiAoREYp CjIxOjU2OjQ1LjQxOTU4OCAxMC4yMDEuMzMuMTMuYWx0YS1hbmEtbG0gPiAxMC4yMDEuMzMu NDEuZWNobzogLiAxOjEoMCkgYWNrIDEgd2luIDU4NDAgPG5vcCxub3AsdGltZXN0YW1wIDgw MzIwMzkgNDM2MDYzNzU+IChERikKMjE6NTY6NDUuNDE5NTg4IDEwLjIwMS4zMy4xMy5hbHRh LWFuYS1sbSA+IDEwLjIwMS4zMy40MS5lY2hvOiAuIDE6MSgwKSBhY2sgMSB3aW4gNTg0MCA8 bm9wLG5vcCx0aW1lc3RhbXAgODAzMjAzOSA0MzYwNjM3NSxub3Asbm9wLCBzYWNrIDEgezA6 MX0gPiAoREYpCjIxOjU2OjU1LjQyOTU4OCAxMC4yMDEuMzMuMTMuYWx0YS1hbmEtbG0gPiAx MC4yMDEuMzMuNDEuZWNobzogRiAxOjEoMCkgYWNrIDEgd2luIDU4NDAgPG5vcCxub3AsdGlt ZXN0YW1wIDgwMzMwNDAgNDM2MDYzNzU+IChERikKMjE6NTY6NTUuNDI5NTg4IDEwLjIwMS4z My40MS5lY2hvID4gMTAuMjAxLjMzLjEzLmFsdGEtYW5hLWxtOiBGIDE6MSgwKSBhY2sgMiB3 aW4gNTc5MiA8bm9wLG5vcCx0aW1lc3RhbXAgNDM2MDczNzUgODAzMzA0MD4gKERGKQoyMTo1 Njo1NS40Mjk1ODggMTAuMjAxLjMzLjEzLmFsdGEtYW5hLWxtID4gMTAuMjAxLjMzLjQxLmVj aG86IC4gMjoyKDApIGFjayAyIHdpbiA1ODQwIDxub3Asbm9wLHRpbWVzdGFtcCA4MDMzMDQw 
IDQzNjA3Mzc1PiAoREYpCg== --------------B60521D6963ED9D9EE25D945-- From owner-netdev@oss.sgi.com Sat Jul 28 11:46:01 2001 Received: (from majordomo@localhost) by oss.sgi.com (8.11.2/8.11.3) id f6SIk1d13295 for netdev-outgoing; Sat, 28 Jul 2001 11:46:01 -0700 Received: from coruscant.gnumonks.org (mail@coruscant.franken.de [193.174.159.226]) by oss.sgi.com (8.11.2/8.11.3) with SMTP id f6SIjxV13290 for ; Sat, 28 Jul 2001 11:45:59 -0700 Received: from uucp by coruscant.gnumonks.org with local-bsmtp (Exim 3.22 #1) id 15QZ6O-00017v-00 for netdev@oss.sgi.com; Sat, 28 Jul 2001 20:46:16 +0200 Received: from laforge by obroa-skai.gnumonks.org with local (Exim 3.22 #1) id 15QL1z-0000Nl-00; Sat, 28 Jul 2001 00:44:47 -0300 Date: Sat, 28 Jul 2001 00:44:47 -0300 From: Harald Welte To: "Marty Poulin" Cc: Subject: Re: Fw: oops/bug in tcp, SACK doesn't work? Message-ID: <20010728004447.I1240@obroa-skai.gnumonks.org> References: <01ad01c11621$7ce50440$0b32a8c0@playnet.com> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline User-Agent: Mutt/1.3.17i In-Reply-To: <01ad01c11621$7ce50440$0b32a8c0@playnet.com>; from mpoulin@playnet.com on Thu, Jul 26, 2001 at 05:22:13PM -0500 X-Operating-System: Linux obroa-skai.gnumonks.org 2.4.5-9cl X-Date: Today is Pungenday, the 62nd day of Confusion in the YOLD 3167 Sender: owner-netdev@oss.sgi.com Precedence: bulk Content-Length: 1134 Lines: 26 On Thu, Jul 26, 2001 at 05:22:13PM -0500, Marty Poulin wrote: > > Perhaps this has been covered somewhere before, but for some reason it > > doesn't look like the 2.4.7 (and previous 2.4.x?) kernels responds to > > SACK correctly. Instead of just resending the missing packet Linux resends > > the entire packet stream as if it never received the SACK. Please note that the netfilter nat protocol helpers for ftp (and irc as well as other protocols in patch-o-matic) delete the SACKPERM option on-the-fly from all packets. 
It has to, as you run in neverending complications as soon as the nat helper has to alter the tcp sequence numbers, etc. So make sure that your tests are not done with ftp control connections when you do nat and have the helper running. > > Marty Poulin > > vandal@playnet.com -- Live long and prosper - Harald Welte / laforge@gnumonks.org http://www.gnumonks.org ============================================================================ GCS/E/IT d- s-: a-- C+++ UL++++$ P+++ L++++$ E--- W- N++ o? K- w--- O- M- V-- PS+ PE-- Y+ PGP++ t++ 5-- !X !R tv-- b+++ DI? !D G+ e* h+ r% y+(*) From owner-netdev@oss.sgi.com Sat Jul 28 11:46:02 2001 Received: (from majordomo@localhost) by oss.sgi.com (8.11.2/8.11.3) id f6SIk2113302 for netdev-outgoing; Sat, 28 Jul 2001 11:46:02 -0700 Received: from coruscant.gnumonks.org (mail@coruscant.franken.de [193.174.159.226]) by oss.sgi.com (8.11.2/8.11.3) with SMTP id f6SIjxV13289 for ; Sat, 28 Jul 2001 11:45:59 -0700 Received: from uucp by coruscant.gnumonks.org with local-bsmtp (Exim 3.22 #1) id 15QZ6O-00017r-00 for netdev@oss.sgi.com; Sat, 28 Jul 2001 20:46:16 +0200 Received: from laforge by obroa-skai.gnumonks.org with local (Exim 3.22 #1) id 15QKuU-0000NH-00; Sat, 28 Jul 2001 00:37:02 -0300 Date: Sat, 28 Jul 2001 00:37:02 -0300 From: Harald Welte To: Ethan Blanton Cc: linuxppc-dev@lists.linuxppc.org, netdev@oss.sgi.com Subject: Re: airport reset on iBook2 Message-ID: <20010728003701.H1240@obroa-skai.gnumonks.org> References: <20010725093902.E2864@localhost.localdomain> Mime-Version: 1.0 Content-Type: multipart/signed; micalg=pgp-sha1; protocol="application/pgp-signature"; boundary="Q68bSM7Ycu6FN28Q" Content-Disposition: inline User-Agent: Mutt/1.3.17i In-Reply-To: <20010725093902.E2864@localhost.localdomain>; from eblanton@cs.ohiou.edu on Wed, Jul 25, 2001 at 09:39:02AM -0400 X-Operating-System: Linux obroa-skai.gnumonks.org 2.4.5-9cl X-Date: Today is Pungenday, the 62nd day of Confusion in the YOLD 3167 Sender: 
owner-netdev@oss.sgi.com Precedence: bulk Content-Length: 1812 Lines: 57 --Q68bSM7Ycu6FN28Q Content-Type: text/plain; charset=us-ascii Content-Disposition: inline Content-Transfer-Encoding: quoted-printable On Wed, Jul 25, 2001 at 09:39:02AM -0400, Ethan Blanton wrote: > I don't know if this is a known problem (I haven't seen it mentioned) > or not, but I'm getting the following error from my Airport card > reasonably often: > > Jul 25 09:27:50 localhost kernel: NETDEV WATCHDOG: eth1: transmit timed out > Jul 25 09:27:50 localhost kernel: eth1: Tx timeout! Resetting card. I can confirm that this happens from time to time with my Sony VAIO picturebook C1XD and a Lucent WaveLAN/IEEE board. unloading the whole pcmcia subsystem (incl. yenta_socket) and re-loading everything helps. (kernel 2.4.7) > I am sending this to linuxppc-dev and netdev, as I don't know whether > this is airport-specific, or orinoco/hermes related. well, my platform is non-ppc, and non-airport, so this seems to be a hint of it being a generic problem. > Ethan -- Live long and prosper - Harald Welte / laforge@gnumonks.org http://www.gnumonks.org ============================================================================ GCS/E/IT d- s-: a-- C+++ UL++++$ P+++ L++++$ E--- W- N++ o? K- w--- O- M- V-- PS+ PE-- Y+ PGP++ t++ 5-- !X !R tv-- b+++ DI?
!D G+ e* h+ r% y+(*) --Q68bSM7Ycu6FN28Q Content-Type: application/pgp-signature Content-Disposition: inline -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.0.6 (GNU/Linux) Comment: For info see http://www.gnupg.org iD8DBQE7YjNcXaXGVTD0i/8RAvvsAJ9jA91Y/8V2BeOKorAPq4676J9sbwCgl96/ bN09Hd5BKH4gJxzQMTIM25g= =vHL7 -----END PGP SIGNATURE----- --Q68bSM7Ycu6FN28Q-- From owner-netdev@oss.sgi.com Sat Jul 28 14:33:05 2001 Received: (from majordomo@localhost) by oss.sgi.com (8.11.2/8.11.3) id f6SLX5i03548 for netdev-outgoing; Sat, 28 Jul 2001 14:33:05 -0700 Received: from ghanima.endorphin.org ([62.116.8.197]) by oss.sgi.com (8.11.2/8.11.3) with SMTP id f6SLX3V03542 for ; Sat, 28 Jul 2001 14:33:03 -0700 Received: (qmail 3043 invoked by uid 1000); 28 Jul 2001 21:13:30 -0000 Date: Sat, 28 Jul 2001 23:13:30 +0200 From: clemens To: Alexey Kuznetsov Cc: netdev@oss.sgi.com, linux-kernel@vger.rutgers.edu Subject: Re: missing icmp errors for udp packets Message-ID: <20010728231330.A2941@ghanima.endorphin.org> References: <200107270528.JAA00742@mops.inr.ac.ru> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <200107270528.JAA00742@mops.inr.ac.ru> User-Agent: Mutt/1.3.18i Sender: owner-netdev@oss.sgi.com Precedence: bulk Content-Length: 2262 Lines: 58 On Fri, Jul 27, 2001 at 09:28:43AM +0400, Alexey Kuznetsov wrote: > > 21:35:58.096727 guardian.51277 > ghanima.endorphin.org.echo: udp 0 > > 21:35:58.096871 guardian.51277 > ghanima.endorphin.org.8: udp 0 > > 21:35:58.097673 guardian.51277 > ghanima.endorphin.org.discard: udp 0 > > 21:35:58.098479 guardian.51277 > ghanima.endorphin.org.1: udp 0 > > 21:35:58.099285 guardian.51277 > ghanima.endorphin.org.2: udp 0 > > 21:35:58.100029 guardian.51277 > ghanima.endorphin.org.3: udp 0 > > 21:35:58.100721 guardian.51277 > ghanima.endorphin.org.6: udp 0 > > ..and so on. 
> > Check ICMP error rate limits: /proc/sys/net/ipv4/icmp_destunreach_rate ghanima:~$ cat /proc/sys/net/ipv4/icmp_destunreach_rate 100 i want to thank you for being the first one to acknowledge this bug report at all, but please do read my description a little more carefully. i tried to use kdb to trace the icmp_send, but got stuck somewhere after ip_output in dev_queue_xmit. so obviously 2.4.7 really tries to send something out to eth0, but fails somewhere somehow in low-level routines. anyway, i found out something new: for some udp packets a correct icmp error packet _is_ sent out of eth0. look: /usr/bin/host: 20:19:26.410213 guardian.2335 > ghanima.endorphin.org.domain: 19140+ A? blah.htu.tuwien.ac.at. (39) 20:19:26.410264 ghanima.endorphin.org > guardian: icmp: ghanima.endorphin.org udp port domain unreachable [tos 0xc0] apsend: (arbitrary udp packet sender) 22:45:04.663056 guardian.14214 > ghanima.endorphin.org.echo: udp 0 (DF) [tos 0x10] 22:45:04.663118 ghanima.endorphin.org > guardian: icmp: ghanima.endorphin.org udp port echo unreachable [tos 0xd0] whether the packet is constructed by host or by apsend, an icmp error is returned. but not for nmap. if a udp packet is sent by nmap, an icmp error is generated only on lo, not on eth0. please note that there is no real difference between apsend and nmap packets and that the kernel is willing to send an icmp error for an nmap packet since i've followed icmp_send down to dev_queue_xmit with kdb. to anyone who is not convinced, try out yourself: udp scan host A from host B with 'nmap -sU -p 1-10' and 'tcpdump -i eth0' on host A before you do this.
clemens From owner-netdev@oss.sgi.com Sat Jul 28 21:58:02 2001 Received: (from majordomo@localhost) by oss.sgi.com (8.11.2/8.11.3) id f6T4w2A00383 for netdev-outgoing; Sat, 28 Jul 2001 21:58:02 -0700 Received: from netcore.fi (netcore.fi [193.94.160.1]) by oss.sgi.com (8.11.2/8.11.3) with SMTP id f6T4w0V00380 for ; Sat, 28 Jul 2001 21:58:00 -0700 Received: from localhost (pekkas@localhost) by netcore.fi (8.11.1/8.11.1) with ESMTP id f6T4vnA02202; Sun, 29 Jul 2001 07:57:49 +0300 Date: Sun, 29 Jul 2001 07:57:48 +0300 (EEST) From: Pekka Savola To: clemens cc: Alexey Kuznetsov , , Subject: Re: missing icmp errors for udp packets In-Reply-To: <20010728231330.A2941@ghanima.endorphin.org> Message-ID: MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII Sender: owner-netdev@oss.sgi.com Precedence: bulk Content-Length: 2667 Lines: 73 On Sat, 28 Jul 2001, clemens wrote: > to anyone who is not convinced, try out yourself: > udp scan host A from host B with 'nmap -sU -p 1-10' and 'tcpdump -i eth0' on > host A before you do this. I did a little bit of investigation, and I think the reason for this can be seen. Not from the dump though: [problem; generated by nmap but no response sent]: 07:34:52.126018 193.94.160.1.5000 > 193.166.3.23.1025: [udp sum ok] udp 0 (ttl 37, id 49936, len 28) 0x0000 4500 001c c310 0000 2511 aca3 c15e a001 E.......%....^.. 0x0010 c1a6 0317 1388 0401 0008 c237 0000 0000 ...........7.... 0x0020 0000 0000 0000 0000 0000 0000 0000 .............. [no problem; generated by nmap]: 07:34:36.426851 193.94.160.1.5000 > 193.166.3.23.1025: [udp sum ok] udp 0 (ttl 35, id 13201, len 28) 0x0000 4500 001c 3391 0000 2311 3e23 c15e a001 E...3...#.>#.^.. 0x0010 c1a6 0317 1388 0401 0008 c237 0000 0000 ...........7.... 0x0020 0000 0000 0000 0000 0000 0000 0000 .............. I've also tried different kinds of payload lengths, DF bits etc. No effect. HOWEVER, I noticed that 'nmap -P0' (ie. don't ping first) always works without problems!
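The two hex dumps above can be cross-checked mechanically. The C sketch below decodes the first 28 bytes of each dump (IPv4 header plus UDP header) and verifies the IPv4 header checksum. Under that reading, the answered and the unanswered probe differ only in TTL, IP ID and the header checksum that follows from them, which fits the observation that the packet contents are not what decides whether an ICMP error comes back. The names struct probe, parse_probe, probe_unanswered and probe_answered are made up for this illustration and do not come from the thread or from any kernel source.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Fields pulled out of an IPv4 + UDP probe (hypothetical helper type). */
struct probe {
    uint8_t  ttl;
    uint8_t  proto;
    uint16_t total_len;
    uint16_t id;
    uint16_t sport;
    uint16_t dport;
    uint32_t saddr;
    uint32_t daddr;
    int      ip_sum_ok;   /* 1 if the IPv4 header checksum verifies */
};

/* First 28 bytes of the unanswered probe, copied from the dump. */
static const uint8_t probe_unanswered[28] = {
    0x45, 0x00, 0x00, 0x1c, 0xc3, 0x10, 0x00, 0x00,
    0x25, 0x11, 0xac, 0xa3, 0xc1, 0x5e, 0xa0, 0x01,
    0xc1, 0xa6, 0x03, 0x17, 0x13, 0x88, 0x04, 0x01,
    0x00, 0x08, 0xc2, 0x37,
};

/* First 28 bytes of the answered probe, copied from the dump. */
static const uint8_t probe_answered[28] = {
    0x45, 0x00, 0x00, 0x1c, 0x33, 0x91, 0x00, 0x00,
    0x23, 0x11, 0x3e, 0x23, 0xc1, 0x5e, 0xa0, 0x01,
    0xc1, 0xa6, 0x03, 0x17, 0x13, 0x88, 0x04, 0x01,
    0x00, 0x08, 0xc2, 0x37,
};

/* Read a 16-bit big-endian (network order) value. */
static uint16_t be16(const uint8_t *p)
{
    return (uint16_t)((p[0] << 8) | p[1]);
}

/* Read a 32-bit big-endian (network order) value. */
static uint32_t be32(const uint8_t *p)
{
    return ((uint32_t)be16(p) << 16) | be16(p + 2);
}

/* One's-complement sum of 16-bit big-endian words; len must be even. */
static uint16_t ones_sum(const uint8_t *p, size_t len)
{
    uint32_t sum = 0;
    for (size_t i = 0; i + 1 < len; i += 2)
        sum += be16(p + i);
    while (sum >> 16)                 /* fold carries back into 16 bits */
        sum = (sum & 0xffff) + (sum >> 16);
    return (uint16_t)sum;
}

/* Decode an IPv4 header followed by a UDP header; returns 0 on success. */
static int parse_probe(const uint8_t *pkt, size_t len, struct probe *out)
{
    assert(pkt != NULL && out != NULL);
    if (len < 20 || (pkt[0] >> 4) != 4)
        return -1;                    /* too short, or not IPv4 */
    size_t ihl = (size_t)(pkt[0] & 0x0f) * 4;
    if (ihl < 20 || len < ihl + 8)
        return -1;                    /* bad IHL, or no room for UDP header */
    out->total_len = be16(pkt + 2);
    out->id        = be16(pkt + 4);
    out->ttl       = pkt[8];
    out->proto     = pkt[9];
    out->saddr     = be32(pkt + 12);
    out->daddr     = be32(pkt + 16);
    /* A header whose checksum field is correct sums to 0xffff. */
    out->ip_sum_ok = (ones_sum(pkt, ihl) == 0xffff);
    out->sport     = be16(pkt + ihl);
    out->dport     = be16(pkt + ihl + 2);
    return 0;
}
```

Calling parse_probe(probe_unanswered, sizeof probe_unanswered, &p) and the same for probe_answered, then comparing the two structs field by field, is enough to see which header fields changed between the two probes.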
Problems occur (if you ping first) if you have net.ipv4.icmp_echoreply_rate=0 (the default). Setting: # sysctl -w net.ipv4.icmp_echoreply_rate=100 (other rates also being 100) will work around the problem. So in conclusion: with net.ipv4.icmp_echoreply_rate=0: 07:46:13.619681 193.94.160.1.5000 > 193.166.3.23.1025: udp 0 07:46:13.619681 193.166.3.23 > 193.94.160.1: icmp: 193.166.3.23 udp port 1025 unreachable [tos 0xc0] 07:46:32.828636 193.94.160.1 > 193.166.3.23: icmp: echo request 07:46:32.828636 193.166.3.23 > 193.94.160.1: icmp: echo reply 07:46:33.138619 193.94.160.1.5000 > 193.166.3.23.1025: udp 0 07:46:33.438603 193.94.160.1.5000 > 193.166.3.23.1025: udp 0 with net.ipv4.icmp_echoreply_rate=100: 07:54:23.543076 193.94.160.1.5000 > 193.166.3.23.1025: udp 0 07:54:23.543076 193.166.3.23 > 193.94.160.1: icmp: 193.166.3.23 udp port 1025 unreachable [tos 0xc0] 07:54:28.832790 193.94.160.1 > 193.166.3.23: icmp: echo request 07:54:28.832790 193.166.3.23 > 193.94.160.1: icmp: echo reply 07:54:29.292765 193.94.160.1.5000 > 193.166.3.23.1025: udp 0 07:54:29.292765 193.166.3.23 > 193.94.160.1: icmp: 193.166.3.23 udp port 1025 unreachable [tos 0xc0] So there does appear to be a rather elusive bug here. Same behaviour with 2.2 series. -- Pekka Savola "Tell me of difficulties surmounted, Netcore Oy not those you stumble over and fall" Systems. Networks. Security. 
-- Robert Jordan: A Crown of Swords From owner-netdev@oss.sgi.com Sun Jul 29 04:40:03 2001 Received: (from majordomo@localhost) by oss.sgi.com (8.11.2/8.11.3) id f6TBe3B07642 for netdev-outgoing; Sun, 29 Jul 2001 04:40:03 -0700 Received: from ghanima.endorphin.org ([62.116.8.197]) by oss.sgi.com (8.11.2/8.11.3) with SMTP id f6TBe1V07638 for ; Sun, 29 Jul 2001 04:40:01 -0700 Received: (qmail 514 invoked by uid 1000); 29 Jul 2001 11:16:15 -0000 Date: Sun, 29 Jul 2001 13:16:15 +0200 From: clemens To: Pekka Savola Cc: netdev@oss.sgi.com Subject: Re: missing icmp errors for udp packets Message-ID: <20010729131615.A382@ghanima.endorphin.org> References: Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: User-Agent: Mutt/1.3.18i Sender: owner-netdev@oss.sgi.com Precedence: bulk Content-Length: 584 Lines: 19 On Sun, Jul 29, 2001 at 07:57:48AM +0300, Pekka Savola wrote: > HOWEVER, I noticed that 'nmap -P0' (ie. don't ping first) always works > without problems! probably there is a bug in the icmp reply throttling code? without pinging first, there are icmp replies. with pinging first, there are no icmp replies. with pinging and heavy udp scanning, there are a few (throttled) icmp replies. (try it yourself) seems like the icmp sending rate isn't reset below the threshold properly? clemens p.s.: does this somehow explain why this whole issue doesn't apply to the loopback devices?
From owner-netdev@oss.sgi.com Sun Jul 29 04:49:50 2001 Received: (from majordomo@localhost) by oss.sgi.com (8.11.2/8.11.3) id f6TBnoH09317 for netdev-outgoing; Sun, 29 Jul 2001 04:49:50 -0700 Received: from netcore.fi (netcore.fi [193.94.160.1]) by oss.sgi.com (8.11.2/8.11.3) with SMTP id f6TBnnV09310 for ; Sun, 29 Jul 2001 04:49:49 -0700 Received: from localhost (pekkas@localhost) by netcore.fi (8.11.1/8.11.1) with ESMTP id f6TBngL03948; Sun, 29 Jul 2001 14:49:42 +0300 Date: Sun, 29 Jul 2001 14:49:42 +0300 (EEST) From: Pekka Savola To: clemens cc: Subject: Re: missing icmp errors for udp packets In-Reply-To: <20010729131615.A382@ghanima.endorphin.org> Message-ID: MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII Sender: owner-netdev@oss.sgi.com Precedence: bulk Content-Length: 540 Lines: 18 On Sun, 29 Jul 2001, clemens wrote: > does this somehow explain why this whole issue doesn't apply to the loopback > devices? Loopback is not subject to these restrictions, net/ipv4/icmp.c:icmpv4_xrlim_allow : /* No rate limit on loopback */ if (dst->dev && (dst->dev->flags&IFF_LOOPBACK)) return 1; -- Pekka Savola "Tell me of difficulties surmounted, Netcore Oy not those you stumble over and fall" Systems. Networks. Security. 
-- Robert Jordan: A Crown of Swords From owner-netdev@oss.sgi.com Sun Jul 29 08:27:33 2001 Received: (from majordomo@localhost) by oss.sgi.com (8.11.2/8.11.3) id f6TFRXd31851 for netdev-outgoing; Sun, 29 Jul 2001 08:27:33 -0700 Received: from ms2.inr.ac.ru (minus.inr.ac.ru [193.233.7.97]) by oss.sgi.com (8.11.2/8.11.3) with SMTP id f6TFRVV31847 for ; Sun, 29 Jul 2001 08:27:31 -0700 Received: (from kuznet@localhost) by ms2.inr.ac.ru (8.6.13/ANK) id TAA14668; Sun, 29 Jul 2001 19:27:05 +0400 From: kuznet@ms2.inr.ac.ru Message-Id: <200107291527.TAA14668@ms2.inr.ac.ru> Subject: Re: missing icmp errors for udp packets To: therapy@endorphin.org (clemens) Date: Sun, 29 Jul 2001 19:27:05 +0400 (MSK DST) Cc: netdev@oss.sgi.com, linux-kernel@vger.rutgers.edu In-Reply-To: <20010728231330.A2941@ghanima.endorphin.org> from "clemens" at Jul 28, 1 11:13:30 pm X-Mailer: ELM [version 2.4 PL24] MIME-Version: 1.0 Sender: owner-netdev@oss.sgi.com Precedence: bulk Content-Length: 474 Lines: 16 Hello! > i tried to use kdb to trace Alas, I do not believe to any results, obtained with kdb. I believe to plain logic more. :-) > the icmp_send, but got stuck somewhere after > ip_output in dev_queue_xmit. so obviously 2.4.7 really tries to send > something out to eth0, but fails somewhere somehow in low-level routines. Particularly, I do not believe to this at all. 99% - the problem is with rate limiting. Loopback is special only in this respect. 
:-) Alexey From owner-netdev@oss.sgi.com Sun Jul 29 08:59:43 2001 Received: (from majordomo@localhost) by oss.sgi.com (8.11.2/8.11.3) id f6TFxhM02223 for netdev-outgoing; Sun, 29 Jul 2001 08:59:43 -0700 Received: from ms2.inr.ac.ru (minus.inr.ac.ru [193.233.7.97]) by oss.sgi.com (8.11.2/8.11.3) with SMTP id f6TFxeV02220 for ; Sun, 29 Jul 2001 08:59:40 -0700 Received: (from kuznet@localhost) by ms2.inr.ac.ru (8.6.13/ANK) id TAA15413; Sun, 29 Jul 2001 19:59:11 +0400 From: kuznet@ms2.inr.ac.ru Message-Id: <200107291559.TAA15413@ms2.inr.ac.ru> Subject: Re: missing icmp errors for udp packets To: pekkas@netcore.fi (Pekka Savola) Date: Sun, 29 Jul 2001 19:59:11 +0400 (MSK DST) Cc: therapy@endorphin.org, netdev@oss.sgi.com, linux-kernel@vger.redhat.com, davem@redhat.com (Dave Miller) In-Reply-To: from "Pekka Savola" at Jul 29, 1 07:57:48 am X-Mailer: ELM [version 2.4 PL24] MIME-Version: 1.0 Sender: owner-netdev@oss.sgi.com Precedence: bulk Content-Length: 835 Lines: 33 Hello! > So in conclusion: > > with net.ipv4.icmp_echoreply_rate=0: Congratulations! That's why I do not see this, forgot to ping before. :-) The patch is enclosed. 
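For readers following the thread, the limiter that the patch below touches is a per-destination token bucket, and its interaction with a zero rate can be replayed in user space. The C sketch that follows models the pre-patch and post-patch bodies of xrlim_allow() as they appear in the diff and feeds them the timings from Pekka's trace. It is a model, not kernel code: struct dst_model, struct event, replay() and trace[] are invented for illustration, jiffies is passed in as a parameter, HZ is assumed to be 100 so that a rate of 100 means one second, XRLIM_BURST_FACTOR is assumed to be 6, and a fresh destination entry is assumed to start with both fields zero.

```c
#include <assert.h>
#include <stddef.h>

#define XRLIM_BURST_FACTOR 6UL   /* assumed value, see lead-in */

/* Stand-in for the two rate-limit fields of struct dst_entry. */
struct dst_model {
    unsigned long rate_tokens;
    unsigned long rate_last;
};

/* Pre-patch logic: the cap on saved tokens depends on this call's timeout. */
static int xrlim_allow_old(struct dst_model *dst, unsigned long now,
                           unsigned long timeout)
{
    dst->rate_tokens += now - dst->rate_last;
    dst->rate_last = now;
    if (dst->rate_tokens > XRLIM_BURST_FACTOR * timeout)
        dst->rate_tokens = XRLIM_BURST_FACTOR * timeout;  /* timeout 0 wipes the bucket */
    if (dst->rate_tokens >= timeout) {
        dst->rate_tokens -= timeout;
        return 1;
    }
    return 0;
}

/* Post-patch logic: the cap is the largest burst seen so far. */
static int xrlim_allow_new(struct dst_model *dst, unsigned long now,
                           unsigned long timeout)
{
    static unsigned long burst;

    dst->rate_tokens += now - dst->rate_last;
    dst->rate_last = now;
    if (burst < XRLIM_BURST_FACTOR * timeout)
        burst = XRLIM_BURST_FACTOR * timeout;
    if (dst->rate_tokens > burst)
        dst->rate_tokens = burst;
    if (dst->rate_tokens >= timeout) {
        dst->rate_tokens -= timeout;
        return 1;
    }
    return 0;
}

/* One ICMP message the kernel wants to send: when, and with which rate. */
struct event {
    unsigned long now;       /* modelled jiffies */
    unsigned long timeout;   /* sysctl rate for this ICMP type */
};

/* Timings taken from the echoreply_rate=0 trace, at an assumed HZ of 100. */
static const struct event trace[] = {
    { 100000, 100 },   /* UDP probe, port unreachable wanted */
    { 101921,   0 },   /* ping 19.21 s later, echo reply, rate 0 */
    { 101952, 100 },   /* UDP probe 0.31 s after the ping */
    { 101982, 100 },   /* UDP probe 0.30 s after that */
};

/* Replay events against one destination; bit i is set if event i was allowed. */
static unsigned replay(int (*allow)(struct dst_model *, unsigned long, unsigned long),
                       const struct event *ev, size_t n)
{
    struct dst_model dst = { 0, 0 };
    unsigned mask = 0;

    assert(n <= 8 * sizeof mask);
    for (size_t i = 0; i < n; i++)
        if (allow(&dst, ev[i].now, ev[i].timeout))
            mask |= 1u << i;
    return mask;
}
```

With these assumptions, replay(xrlim_allow_old, trace, 4) allows the first probe and the echo reply but refuses the two probes that follow the ping, because the rate-0 call clamps the bucket to zero and only about 30 tokens accumulate before each probe. replay(xrlim_allow_new, trace, 4) allows all four.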
Alexey --- ../dust/vger3-010728/linux/net/ipv4/icmp.c Thu Jun 14 22:49:44 2001 +++ linux/net/ipv4/icmp.c Sun Jul 29 19:52:55 2001 @@ -240,12 +240,15 @@ int xrlim_allow(struct dst_entry *dst, int timeout) { unsigned long now; + static int burst; now = jiffies; dst->rate_tokens += now - dst->rate_last; dst->rate_last = now; - if (dst->rate_tokens > XRLIM_BURST_FACTOR*timeout) - dst->rate_tokens = XRLIM_BURST_FACTOR*timeout; + if (burst < XRLIM_BURST_FACTOR*timeout) + burst = XRLIM_BURST_FACTOR*timeout; + if (dst->rate_tokens > burst) + dst->rate_tokens = burst; if (dst->rate_tokens >= timeout) { dst->rate_tokens -= timeout; return 1; From owner-netdev@oss.sgi.com Sun Jul 29 09:29:34 2001 Received: (from majordomo@localhost) by oss.sgi.com (8.11.2/8.11.3) id f6TGTYG02932 for netdev-outgoing; Sun, 29 Jul 2001 09:29:34 -0700 Received: from ms2.inr.ac.ru (minus.inr.ac.ru [193.233.7.97]) by oss.sgi.com (8.11.2/8.11.3) with SMTP id f6TGTXV02929 for ; Sun, 29 Jul 2001 09:29:33 -0700 Received: from mops.inr.ac.ru (mops.inr.ac.ru [193.233.7.60]) by ms2.inr.ac.ru (8.6.13/ANK) with ESMTP id UAA16811; Sun, 29 Jul 2001 20:29:27 +0400 Received: (from kuznet@localhost) by mops.inr.ac.ru (8.9.3/8.9.3) id HAA00326; Sat, 28 Jul 2001 07:02:54 +0400 Message-Id: <200107280302.HAA00326@mops.inr.ac.ru> Subject: Re: Fw: oops/bug in tcp, SACK doesn't work? To: mpoulin@playnet.COM (Marty Poulin) Date: Sat, 28 Jul 2001 07:02:54 +0400 (MSD) Cc: netdev@oss.sgi.com In-Reply-To: <01ad01c11621$7ce50440$0b32a8c0@playnet.com> from "Marty Poulin" at Jul 27, 1 02:45:02 am From: Alexey Kuznetsov X-Mailer: ELM [version 2.4 PL24] MIME-Version: 1.0 Sender: owner-netdev@oss.sgi.com Precedence: bulk Content-Length: 292 Lines: 13 Hello! > > Perhaps this has been covered somewhere before, but for some reason it > > doesn't look like the 2.4.7 (and previous 2.4.x?) kernels responds to > SACK > > correctly. tcpdump -s 128 -w result port xxx gzip -9 result And send result.gz to me using your lovely mailer... 
Alexey From owner-netdev@oss.sgi.com Sun Jul 29 09:36:30 2001 Received: (from majordomo@localhost) by oss.sgi.com (8.11.2/8.11.3) id f6TGaU103061 for netdev-outgoing; Sun, 29 Jul 2001 09:36:30 -0700 Received: from ms2.inr.ac.ru (minus.inr.ac.ru [193.233.7.97]) by oss.sgi.com (8.11.2/8.11.3) with SMTP id f6TGaSV03058 for ; Sun, 29 Jul 2001 09:36:28 -0700 Received: from mops.inr.ac.ru (mops.inr.ac.ru [193.233.7.60]) by ms2.inr.ac.ru (8.6.13/ANK) with ESMTP id UAA17181; Sun, 29 Jul 2001 20:36:22 +0400 Received: (from kuznet@localhost) by mops.inr.ac.ru (8.9.3/8.9.3) id CAA00441; Sun, 29 Jul 2001 02:06:24 +0400 Message-Id: <200107282206.CAA00441@mops.inr.ac.ru> Subject: Re: TCP 4way handshake To: bernhard.werner@werner-elektrotechnik.COM (Bernhard Werner) Date: Sun, 29 Jul 2001 02:06:24 +0400 (MSD) Cc: netdev@oss.sgi.com In-Reply-To: <3B62C443.82A327C@werner-elektrotechnik.com> from "Bernhard Werner" at Jul 28, 1 06:45:12 pm From: Alexey Kuznetsov X-Mailer: ELM [version 2.4 PL24] MIME-Version: 1.0 Sender: owner-netdev@oss.sgi.com Precedence: bulk Content-Length: 153 Lines: 9 Hello! > 2.2.18 didn't show the double ack, 2.4.4 does. Yes, it is known bug in a few of 2.4s. It is supposed to be fixed in kernels > 2.4.5. Alexey From owner-netdev@oss.sgi.com Sun Jul 29 09:53:53 2001 Received: (from majordomo@localhost) by oss.sgi.com (8.11.2/8.11.3) id f6TGrrB03260 for netdev-outgoing; Sun, 29 Jul 2001 09:53:53 -0700 Received: from ms2.inr.ac.ru (minus.inr.ac.ru [193.233.7.97]) by oss.sgi.com (8.11.2/8.11.3) with SMTP id f6TGrpV03256 for ; Sun, 29 Jul 2001 09:53:51 -0700 Received: (from kuznet@localhost) by ms2.inr.ac.ru (8.6.13/ANK) id UAA18260; Sun, 29 Jul 2001 20:53:36 +0400 From: kuznet@ms2.inr.ac.ru Message-Id: <200107291653.UAA18260@ms2.inr.ac.ru> Subject: Re: Fw: oops/bug in tcp, SACK doesn't work? 
To: laforge@gnumonks.ORG (Harald Welte) Date: Sun, 29 Jul 2001 20:53:36 +0400 (MSK DST) Cc: netdev@oss.sgi.com In-Reply-To: <20010728004447.I1240@obroa-skai.gnumonks.org> from "Harald Welte" at Jul 28, 1 11:15:01 pm X-Mailer: ELM [version 2.4 PL24] MIME-Version: 1.0 Sender: owner-netdev@oss.sgi.com Precedence: bulk Content-Length: 664 Lines: 19 Hello! > Please note that the netfilter nat protocol helpers for ftp (and irc as well as > other protocols in patch-o-matic) delete the SACKPERM option on-the-fly > from all packets. Then Marty would not see any sacks at all. > It has to, as you run in neverending complications as soon as the nat helper > has to alter the tcp sequence numbers, etc. It is not a valid justification. It is difficult to rewrite sequence numbers. As soon as nat does this, rewriting sacks is easy. Even not easy, trivial. Sad and not expected behaviour. I used to ridicule commercial firewall vendors, sometimes doing shit of this kind without any clear reasons. :-) Alexey From owner-netdev@oss.sgi.com Sun Jul 29 10:16:49 2001 Received: (from majordomo@localhost) by oss.sgi.com (8.11.2/8.11.3) id f6THGn703730 for netdev-outgoing; Sun, 29 Jul 2001 10:16:49 -0700 Received: from sj-msg-core-1.cisco.com (sj-msg-core-1.cisco.com [171.71.163.11]) by oss.sgi.com (8.11.2/8.11.3) with SMTP id f6THGmV03727 for ; Sun, 29 Jul 2001 10:16:48 -0700 Received: from kaspit.cisco.com (kaspit.cisco.com [144.254.91.49]) by sj-msg-core-1.cisco.com (8.11.3/8.9.1) with ESMTP id f6THGkg18611; Sun, 29 Jul 2001 10:16:46 -0700 (PDT) Received: from lab200w2k (dhcp-64-103-121-189.cisco.com [64.103.121.189]) by kaspit.cisco.com (Mirapoint) with SMTP id AKK01025; Sun, 29 Jul 2001 20:16:40 +0300 (GMT-3) From: "Jacob Avraham" To: "Alexey Kuznetsov" Cc: Subject: RE: conflicting alignment requirements Date: Sun, 29 Jul 2001 20:17:41 +0200 Message-ID: MIME-Version: 1.0 Content-Type: text/plain; charset="iso-8859-1" Content-Transfer-Encoding: 7bit X-Priority: 3 (Normal) 
X-MSMail-Priority: Normal X-Mailer: Microsoft Outlook IMO, Build 9.0.2416 (9.0.2910.0) In-Reply-To: <200107260647.KAA00350@mops.inr.ac.ru> X-MimeOLE: Produced By Microsoft MimeOLE V5.50.4133.2400 Importance: Normal Sender: owner-netdev@oss.sgi.com Precedence: bulk Content-Length: 589 Lines: 19 > > Hello! > > > > > On the other hand, some upper layer code, like tc > > > > (for non x86/68k), checks if the IP header is 4 byte align, > > Do you know any architecture except for these ones which are able > to generate unaligned packets? I do not. And if you do, please, > enlighten me about them. > I was referring to packets coming from the Tulip h/w (not generated by the CPU), on which the IP header is always unaligned. If the driver is configured NOT to copy the packet to a fresh skb (rx_copybreak = 0), the packet will traverse the net layer with an unaligned IP header. -Jacob From owner-netdev@oss.sgi.com Sun Jul 29 11:41:21 2001 Received: (from majordomo@localhost) by oss.sgi.com (8.11.2/8.11.3) id f6TIfLH15122 for netdev-outgoing; Sun, 29 Jul 2001 11:41:21 -0700 Received: from ghanima.endorphin.org ([62.116.8.197]) by oss.sgi.com (8.11.2/8.11.3) with SMTP id f6TIfJV15118 for ; Sun, 29 Jul 2001 11:41:20 -0700 Received: (qmail 406 invoked by uid 1000); 29 Jul 2001 18:17:33 -0000 Date: Sun, 29 Jul 2001 20:17:33 +0200 From: clemens To: netdev@oss.sgi.com Subject: Re: missing icmp errors for udp packets Message-ID: <20010729201733.A369@ghanima.endorphin.org> References: <200107291527.TAA14668@ms2.inr.ac.ru> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <200107291527.TAA14668@ms2.inr.ac.ru> User-Agent: Mutt/1.3.18i Sender: owner-netdev@oss.sgi.com Precedence: bulk Content-Length: 824 Lines: 22 On Sun, Jul 29, 2001 at 07:27:05PM +0400, kuznet@ms2.inr.ac.ru wrote: > > i tried to use kdb to trace > Alas, I do not believe to any results, obtained with kdb. > I believe to plain logic more. :-) don't be polemic.
> > the icmp_send, but got stuck somewhere after > > ip_output in dev_queue_xmit. so obviously 2.4.7 really tries to send > > something out to eth0, but fails somewhere somehow in low-level routines. > Particularly, I do not believe to this at all. 99% - the problem is > with rate limiting. Loopback is special only in this respect. :-) wrong, i've found 3 lo/!lo conditionals at first look. anyway, it is the rate limiting. i disabled icmpv4_xrlim_allow, and now every udp packet gets its proper icmp error. now, we need someone to figure out why icmpv4_xrlim_allow is overlimiting. clemens From owner-netdev@oss.sgi.com Sun Jul 29 18:57:04 2001 Received: (from majordomo@localhost) by oss.sgi.com (8.11.2/8.11.3) id f6U1v4G17242 for netdev-outgoing; Sun, 29 Jul 2001 18:57:04 -0700 Received: from hawk.mail.pas.earthlink.net (hawk.mail.pas.earthlink.net [207.217.120.22]) by oss.sgi.com (8.11.2/8.11.3) with SMTP id f6U1v0V17233 for ; Sun, 29 Jul 2001 18:57:00 -0700 Received: from earthlink.net (dialup-63.208.190.230.Dial1.Baltimore1.Level3.net [63.208.190.230]) by hawk.mail.pas.earthlink.net (EL-8_9_3_3/8.9.3) with ESMTP id SAA22972 for ; Sun, 29 Jul 2001 18:56:57 -0700 (PDT) Message-ID: <3B64B076.6090709@earthlink.net> Date: Sun, 29 Jul 2001 20:55:18 -0400 From: Brad Chapman User-Agent: Mozilla/5.0 (X11; U; Linux 2.4.7 i586; en-US; C-UPD: MaxLinux0301) Gecko/20001107 Netscape6/6.0 X-Accept-Language: en MIME-Version: 1.0 To: netdev@oss.sgi.com Subject: IPv6 fragmentation and IPv6 header parsing Content-Type: text/plain; charset=us-ascii; format=flowed Content-Transfer-Encoding: 7bit Sender: owner-netdev@oss.sgi.com Precedence: bulk Content-Length: 7189 Lines: 257 Everyone, I am currently completing a port of the Netfilter connection tracking subsystem from IPv4 to IPv6. Most of the features in this port are complete, except for fragment handling, which is non-existent.
I am also not entirely sure about how to properly parse header chains and extract various extension and layer-4 headers for use by the connection tracking subsystem. Enclosed with this message are my current efforts regarding IPv6 fragmentation and IPv6 header chain parsing. I would appreciate any feedback at all regarding this. Thanks, Brad IPv6 fragmentation support /* Sun Jul 29 2001 - Created ip6_conntrack_fragment.c */ #include #include #include #include #include #include #include #include #include #include #include #include #include #define ASSERT_READ_LOCK(x) MUST_BE_READ_LOCKED(&ip6_conntrack_lock) #define ASSERT_WRITE_LOCK(x) MUST_BE_WRITE_LOCKED(&ip6_conntrack_lock) #include #include #include #include #if 0 #define DEBUGP printk #else #define DEBUGP(format, args...) #endif /* Initialize the flowlabel */ static void init_fl(struct sk_buff *skb, struct flowi *flow) { flow->proto = (u_int8_t)get_ipv6_header((*pskb)->nh.ipv6h, 0); ipv6_addr_copy(flow->fl6_dst, (struct in6_addr *)(*pskb)->nh.ipv6h->daddr); ipv6_addr_copy(flow->fl6_src, (struct in6_addr *)(*pskb)->nh.ipv6h->saddr); flow->flowlabel = (*pskb)->nh.ipv6h->flow_lbl[0] + (*pskb)->nh.ipv6h->flow_lbl[1] + (*pskb)->nh.ipv6h->flow_lbl[2]; switch (flow->proto) case IPPROTO_TCP : { flow->uli_u.ports.sport = (*pskb)->h.tcph->sport; flow->uli_u.ports.dport = (*pskb)->h.tcph->dport; } case IPPROTO_UDP : { flow->uli_u.ports.sport = (*pskb)->h.udph->sport; flow->uli_u.ports.dport = (*pskb)->h.udph->dport; } case IPPROTO_ICMPV6 : { struct icmp6hdr *icmpv6h = (struct icmp6hdr *)get_ipv6_header((*pskb)->nh.ipv6h, IPPROTO_ICMPV6); flow->uli_u.icmpt.code = icmpv6h->type; flow->uli_u.icmpt.type = icmpv6h->code; } flow->oif = 0; flow->uli_u.data = 0; return; } /* Initialize the transmission options struct */ static void init_opt(struct sk_buff *skb, struct ipv6_txoptions *opt) { struct frag_hdr *fhdr = (struct frag_hdr *)get_ipv6_header((*pskb)->nh.ipv6h, NEXTHDR_FRAGMENT); u_int32_t proto = 
get_ipv6_header((*pskb)->nh.ipv6h, 0); opt->hopopt = (struct ipv6_opt_hdr *)get_ipv6_header((*pskb)->nh.ipv6h, NEXTHDR_HOP); opt->dst0opt = (struct ipv6_opt_hdr *)get_ipv6_header((*pskb)->nh.ipv6h, NEXTHDR_DEST); opt->srcrt = (struct ipv6_opt_hdr *)get_ipv6_header((*pskb)->nh.ipv6h, NEXTHDR_ROUTING); opt->auth = (struct ipv6_opt_hdr *)get_ipv6_header((*pskb)->nh.ipv6h, NEXTHDR_AUTH); opt->dst1opt = (struct ipv6_opt_hdr *)get_ipv6_header((*pskb)->nh.ipv6h, NEXTHDR_DEST); /* Hmmmm.... we have to find all headers before the fragment header and all after it; a lot of this is duplicated from get_ipv6_header() - kisza _has_ to see this and tell me what's wrong -- BC */ opt->opt_flen += (u_int16_t)(sizeof(proto)); opt->opt_nflen += (u_int16_t)(sizeof((*pskb)->nh.ipv6h); if (fhdr) { u_int32_t hdrptr = (*pskb)->nh.ipv6h->nexthdr; int hdrlen = 0; while (*hdrptr != NEXTHDR_FRAGMENT) { opt->opt_nflen += *hdrptr; hdrlen = sizeof(*hdrptr); hdrptr = (u_int32_t)(hdrptr + hdrlen); } hdrlen = sizeof(fhdr); hdrptr = (u_int32_t)(hdrptr + hdrlen); while (*hdrptr != proto) { opt->opt_flen += hdrptr; hdrlen = sizeof(*hdrptr); hdrptr = (u_int32_t)(hdrptr + hdrlen); } } opt->tot_len = sizeof(opt); return; } static int ip6_refrag_getfrag(const void *data, struct in6_addr *saddr, char *buffer, unsigned int offset, unsigned int len) { /* As far as I can tell, this function is designed to calculate various checksums for various portions of the header and payload of a packet. 
I may have to copy&paste or duplicate code */ return 0; } unsigned int ip6_refrag(unsigned int hooknum, struct sk_buff **pskb, const struct net_device *in, const struct net_device *out, int (*okfn)(struct sk_buff *)) { struct rtable *rt = (struct rtable *)(*pskb)->dst; struct flowi *flow; struct ipv6_txoptions *opt; int ret, flags = 0; /* We've seen it coming out the other side: confirm it */ if (ip6_conntrack_confirm(*pskb) != NF_ACCEPT) return NF_DROP; if ((*pskb)->len > rt->u.dst.pmtu) { /* Let's get busy */ init_fl(*pskb, flow); init_opt(*pskb, opt); /* Send the packet! */ ret = ip6_build_xmit((*pskb)->sk, ip6_refrag_getfrag, get_ipv6_header((*pskb)->nh.ipv6h, 0), flow, (*pskb)->len, opt, (*pskb)->nh.ipv6h->hoplimit, flags); if (ret < 0) { printk(KERN_WARNING "KABOOM! can't send fragged packet!\n"); return NF_DROP; } return NF_STOLEN; } return NF_ACCEPT; } /* Returns new sk_buff, or NULL */ struct sk_buff * ip6_ct_gather_frags(struct sk_buff *skb) { struct sock *sk = skb->sk; struct frag_hdr *fhdr = (struct frag_hdr *)get_ipv6_header(skb->nh.ipv6h, NEXTHDR_FRAGMENT); #ifdef CONFIG_NETFILTER_DEBUG unsigned int olddebug = skb->nf_debug; #endif if (sk) { sock_hold(sk); skb_orphan(skb); } /* Disable the local bh, reassemble the packet, and reenable the local bh */ local_bh_disable(); if (!ipv6_reassembly(skb, fhdr)) { if (sk) sock_put(sk); local_bh_enable(); return NULL; } local_bh_enable(); if (skb_is_nonlinear(skb) && skb_linearize(skb, GFP_ATOMIC) != 0) { kfree_skb(skb); if (sk) sock_put(sk); return NULL; } if (sk) { skb_set_owner_w(skb, sk); sock_put(sk); } skb->nfcache |= NFC_ALTERED; #ifdef CONFIG_NETFILTER_DEBUG /* Packet path as if nothing had happened. 
*/ skb->nf_debug = olddebug; #endif return skb; } IPv6 header chain parsing /* Locate any available header in an IPv6 header */ u_int32_t get_ipv6_header(const struct ipv6hdr *ipv6h, u_int8_t header) { /* Get the bitmasked list of available headers */ u_int32_t *hdrptr = (u_int32_t *)&ipv6h->nexthdr; int hdrlen, length; u_int32_t headers = 0; length = sizeof(ipv6h); hdrlen = 0; do { headers |= *hdrptr; hdrlen = sizeof((u_int32_t *)hdrptr); hdrptr = (u_int32_t *)(hdrptr + hdrlen); length -= hdrlen; } while (!(length < hdrlen)); if (header == 0) { /* This is a special thing - gets a protocol for an unknown packet type - it uses the protocol_bitmask variable - initialized by default from the builtins, and (hopefully) later initialized by 3rd-party protocol modules */ short int count = 0; for (count = 0; count < MAX_BUILTINS + 1; count++) if (headers & protocol_bitmask[count]) return protocol_bitmask[count]; } /* Is the one we want there? */ if (headers & header) return header; /* NOTREACHED */ return 0; } From owner-netdev@oss.sgi.com Sun Jul 29 23:35:07 2001 Received: (from majordomo@localhost) by oss.sgi.com (8.11.2/8.11.3) id f6U6Z7u27521 for netdev-outgoing; Sun, 29 Jul 2001 23:35:07 -0700 Received: from netcore.fi (netcore.fi [193.94.160.1]) by oss.sgi.com (8.11.2/8.11.3) with SMTP id f6U6Z5V27518 for ; Sun, 29 Jul 2001 23:35:05 -0700 Received: from localhost (pekkas@localhost) by netcore.fi (8.11.1/8.11.1) with ESMTP id f6U6Yk108606; Mon, 30 Jul 2001 09:34:46 +0300 Date: Mon, 30 Jul 2001 09:34:46 +0300 (EEST) From: Pekka Savola To: Brad Chapman cc: Subject: Re: IPv6 fragmentation and IPv6 header parsing In-Reply-To: <3B64B076.6090709@earthlink.net> Message-ID: MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII Sender: owner-netdev@oss.sgi.com Precedence: bulk Content-Length: 1394 Lines: 28 On Sun, 29 Jul 2001, Brad Chapman wrote: > I am currently completing a port of the Netfilter connection > tracking subsystem from IPv4 to IPv6. 
Most of the features in this
> port are complete, except for fragment handling, which is non-
> existent. I am also not entirely sure about how to properly parse
> header chains and extract various extension and layer-4 headers
> for use by the connection tracking subsystem. Enclosed with this
> message are my current efforts regarding IPv6 fragmentation and
> IPv6 header chain parsing.
>
> I would appreciate any feedback at all regarding this.

A comment: it appears some code from IPv4 is not applicable; ip6_refrag
etc. look a little dubious, for example, as IPv6 fragmentation is always
end-to-end with fragmentation header (or just avoiding it with PMTU), and
no (de)fragmentation should happen in the routers. If you haven't
already, I recommend checking out RFC2460 chapters 4.5 and 5.

Hope this helps, I'll leave it to the others to comment on header chaining
issues. (This may be complex as the extension headers' order is not
fixed; if macros or functions do not exist for handling these, perhaps
they should be created.)

--
Pekka Savola                 "Tell me of difficulties surmounted,
Netcore Oy                   not those you stumble over and fall"
Systems. Networks. Security.
-- Robert Jordan: A Crown of Swords From owner-netdev@oss.sgi.com Mon Jul 30 01:01:58 2001 Received: (from majordomo@localhost) by oss.sgi.com (8.11.2/8.11.3) id f6U81wu30649 for netdev-outgoing; Mon, 30 Jul 2001 01:01:58 -0700 Received: from sabre-wulf.nvg.ntnu.no (IDENT:root@sabre-wulf.nvg.ntnu.no [129.241.210.67]) by oss.sgi.com (8.11.2/8.11.3) with SMTP id f6U81vV30646 for ; Mon, 30 Jul 2001 01:01:57 -0700 Received: from tyrell.nvg.ntnu.no ([IPv6:::ffff:129.241.210.70]:25356 "EHLO tyrell.nvg.ntnu.no" ident: "root" whoson: "-unregistered-") by sabre-wulf.nvg.ntnu.no with ESMTP id ; Mon, 30 Jul 2001 10:01:42 +0200 Received: (from venaas@localhost) by tyrell.nvg.ntnu.no (8.9.3/8.8.4) id KAA03803; Mon, 30 Jul 2001 10:01:42 +0200 Date: Mon, 30 Jul 2001 10:01:42 +0200 From: Stig Venaas To: Pekka Savola Cc: Brad Chapman , netdev@oss.sgi.com Subject: Re: IPv6 fragmentation and IPv6 header parsing Message-ID: <20010730100142.A5064@nvg.ntnu.no> References: <3B64B076.6090709@earthlink.net> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline User-Agent: Mutt/1.2.5i In-Reply-To: ; from pekkas@netcore.fi on Mon, Jul 30, 2001 at 09:34:46AM +0300 Sender: owner-netdev@oss.sgi.com Precedence: bulk Content-Length: 1476 Lines: 29 On Mon, Jul 30, 2001 at 09:34:46AM +0300, Pekka Savola wrote: > On Sun, 29 Jul 2001, Brad Chapman wrote: > > I am currently completing a port of the Netfilter connection > > tracking subsystem from IPv4 to IPv6. Most of the features in this > > port are complete, except for fragment handling, which is non- > > existent. I am also not entirely sure about how to properly parse > > header chains and extract various extension and layer-4 headers > > for use by the connection tracking subsystem. Enclosed with this > > message are my current efforts regarding IPv6 fragmentation and > > IPv6 header chain parsing. > > > > I would appreciate any feedback at all regarding this. 
> > A comment: it appears some code from IPv4 is not applicable; ip6_refrag > etc. look a little dubious, for example, as IPv6 fragmentation is always > end-to-end with fragmentation header (or just avoiding it with PMTU), and > no (de)fragmentation should happen in the routers. If you haven't > already, I recomment checking out RFC2460 chapters 4.5 and 5. > > Hope this helps, I'll leave it to the others to comment on header chaining > issues. (This may be complex as the extension headers' order is not > fixed; if macros or functions do not exist for handling these, perhaps > they should be created.) Also note that headers that are interesting to end-points mostly are placed after the fragmentation header and are parsed after reassembly. That is also listed in RFC 2460. Stig From owner-netdev@oss.sgi.com Mon Jul 30 03:33:26 2001 Received: (from majordomo@localhost) by oss.sgi.com (8.11.2/8.11.3) id f6UAXQk01372 for netdev-outgoing; Mon, 30 Jul 2001 03:33:26 -0700 Received: from weta.f00f.org (weta.f00f.org [203.167.249.89]) by oss.sgi.com (8.11.2/8.11.3) with SMTP id f6UAXHV01366; Mon, 30 Jul 2001 03:33:17 -0700 Received: by weta.f00f.org (Postfix, from userid 1000) id 8A42FB4A2; Mon, 30 Jul 2001 22:33:41 +1200 (NZST) Date: Mon, 30 Jul 2001 22:33:41 +1200 From: Chris Wedgwood To: Alexey Kuznetsov Cc: Ralf Baechle , netdev@oss.sgi.com Subject: Re: ping bug Message-ID: <20010730223341.B5289@weta.f00f.org> References: <200107282351.DAA00187@mops.inr.ac.ru> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <200107282351.DAA00187@mops.inr.ac.ru> User-Agent: Mutt/1.3.18i X-No-Archive: Yes Sender: owner-netdev@oss.sgi.com Precedence: bulk Content-Length: 625 Lines: 22 On Sun, Jul 29, 2001 at 03:51:10AM +0400, Alexey Kuznetsov wrote: Are you about kernel? Well, if such thoughts visit you, then help me to answer simpler question: what is the simplest method to use FPU from irqs? Run away, run away!!!! 
Alas, softmodem drivers need this. That which have this part of code open just saves fpu context at each irq. Even not doing clts. :-) But what to change in core to make this correctly (and faster)? How else can you do this? You can think of various lazy save/restore schemes but I can't see how to make any of them work on SMP systems cleanly. --cw From owner-netdev@oss.sgi.com Mon Jul 30 04:44:45 2001 Received: (from majordomo@localhost) by oss.sgi.com (8.11.2/8.11.3) id f6UBij002330 for netdev-outgoing; Mon, 30 Jul 2001 04:44:45 -0700 Received: from zmailer.org (mail.zmailer.org [194.252.70.162]) by oss.sgi.com (8.11.2/8.11.3) with SMTP id f6UBiZV02327; Mon, 30 Jul 2001 04:44:35 -0700 Received: (mea@zmailer.org) by mail.zmailer.org id ; Mon, 30 Jul 2001 14:44:17 +0300 Date: Mon, 30 Jul 2001 14:44:17 +0300 From: Matti Aarnio To: Chris Wedgwood Cc: Alexey Kuznetsov , Ralf Baechle , netdev@oss.sgi.com Subject: Re: ping bug Message-ID: <20010730144417.I2650@mea-ext.zmailer.org> References: <200107270757.LAA01031@mops.inr.ac.ru> <20010728032453.C852@weta.f00f.org> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <20010728032453.C852@weta.f00f.org>; from cw@f00f.org on Sat, Jul 28, 2001 at 03:24:53AM +1200 Sender: owner-netdev@oss.sgi.com Precedence: bulk Content-Length: 2112 Lines: 51 On Sat, Jul 28, 2001 at 03:24:53AM +1200, Chris Wedgwood wrote: > On Fri, Jul 27, 2001 at 11:57:43AM +0400, Alexey Kuznetsov wrote: > > This happens after sum of rtts exceed ~4000 seconds. > I.e. with rtt of 4 seconds after ~1000 samples. > > Seems, have to use long long to extend this. Sigh. > > Is float really that bad? It's not like your doing 50K or these per > second... Like others have noted, doing FP/MMX stuff in kernel needs saving, and restoring a rather massive state dump - preserving lazy-FPU-save/restore! 
If the cost of that work is worth the effort for the amount of FP/MMX
instructions done for whatever purpose, then by all means, do it.
Examples of this include, but are not limited to, MD RAID5 parity
calculation, and some copy-to/from-userspace routines, I think.

In normal operations the penalty of also having to save the FPU state
for an interrupt is so severe (IRQ called things should be small and
fast!) that it makes no sense. In "soft interrupts" a.k.a. "BHs"
things are somewhat different.

Also, in most cases what one needs are scaled integers, or "fractions"
with a constant (and power of two) factor. That gives you, in essence
trivially, things like 0.01 and 100.00 which you multiply, and get
(roughly) 1.0.

And by the way, softmodems don't need to do DSP work in the interrupt.
The softmodems are (to my knowledge) just glorified full-duplex A/D-D/A
codecs with suitable accessories for connecting into the phone line.
All you need is to move the PCM samples to a userspace softmodem
program, and you are done. The "softmodem" driver shrinks a lot, and
most of it moves into a daemon server process.

I do think that having *soundmodem* stuff in the kernel is an
abomination, and it should be pushed out into userspace. It will be
trivial to push decoded frames from "modem" to kernel (and reverse),
of course. Running the userspace DSP program in pre-emptive RT-priority
and doing it all locked into memory is a performance issue, and possibly
necessary, but definitely one won't need to do the DSP stuff in kernel.
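The scaled-integer idea above can be sketched in plain user-space C. This is only a minimal illustration, assuming a 16-bit fractional part; the names fix16_t, FIX_SHIFT, fix_from_ratio() and fix_mul() are hypothetical and do not come from the thread or from any kernel API.

```c
#include <assert.h>
#include <stdint.h>

#define FIX_SHIFT 16                        /* assumed scale: 2^16 */
#define FIX_ONE   ((int32_t)1 << FIX_SHIFT) /* fixed-point value of 1.0 */

typedef int32_t fix16_t;                    /* hypothetical Q16.16 type */

/* Convert the fraction num/den to fixed point (truncating). */
static fix16_t fix_from_ratio(int32_t num, int32_t den)
{
    assert(den != 0);
    return (fix16_t)(((int64_t)num * FIX_ONE) / den);
}

/* Multiply two fixed-point values; the 64-bit intermediate avoids overflow. */
static fix16_t fix_mul(fix16_t a, fix16_t b)
{
    return (fix16_t)(((int64_t)a * b) / FIX_ONE);
}
```

With this scale, 0.01 is stored as 655 and 100.00 as 6553600; their product comes out as 65500, which is within about 0.06% of FIX_ONE (65536), i.e. "roughly 1.0" without touching the FPU.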
> --cw

/Matti Aarnio

From owner-netdev@oss.sgi.com Mon Jul 30 05:37:48 2001
Received: (from majordomo@localhost) by oss.sgi.com (8.11.2/8.11.3) id f6UCbmq03005 for netdev-outgoing; Mon, 30 Jul 2001 05:37:48 -0700
Received: from ghanima.endorphin.org ([62.116.8.197]) by oss.sgi.com (8.11.2/8.11.3) with SMTP id f6UCbjV03002 for ; Mon, 30 Jul 2001 05:37:45 -0700
Received: (qmail 757 invoked by uid 1000); 30 Jul 2001 12:13:59 -0000
Date: Mon, 30 Jul 2001 14:13:59 +0200
From: clemens
To: netdev@oss.sgi.com
Cc: alan@lxorguk.ukuu.org.uk, linux-kernel@vger.kernel.org
Subject: final words on udp/ICMP dest unreach issue [+PATCH]
Message-ID: <20010730141359.A450@ghanima.endorphin.org>
Mime-Version: 1.0
Content-Type: multipart/mixed; boundary="lrZ03NoBR/3+SXJZ"
Content-Disposition: inline
User-Agent: Mutt/1.3.18i
Sender: owner-netdev@oss.sgi.com
Precedence: bulk
Content-Length: 2932
Lines: 92

--lrZ03NoBR/3+SXJZ
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline

hi!

concerning the bug discussed in the "missing icmp errors for udp
packets"-thread on netdev a solution has been found.

here comes the bug (see net/ipv4/icmp.c):

    #define XRLIM_BURST_FACTOR 6
    int xrlim_allow(struct dst_entry *dst, int timeout)
    {
            unsigned long now;

            now = jiffies;
            dst->rate_tokens += now - dst->rate_last;
            dst->rate_last = now;
    #1:     if (dst->rate_tokens > XRLIM_BURST_FACTOR*timeout)
    #2:             dst->rate_tokens = XRLIM_BURST_FACTOR*timeout;
    #3:     if (dst->rate_tokens >= timeout) {
                    dst->rate_tokens -= timeout;
                    return 1;
            }
            return 0;
    }

for timeout=0 rate_tokens will be reset to 0 tokens (#2), since #1
always holds. (icmp ping does have timeout=0, for instance)
this doesn't cause the packet to be filtered, since #3 holds (0 >= 0),
but will cause the following packet to be filtered, if it is sent while
(now - dst->rate_last) < timeout. (note: timeout is not 0 in this
inequality, since it's the timeout of the icmp type of the following
packet)

a patch is attached.
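The failure mode described above can be reproduced outside the kernel with a small simulation. This is a sketch only: struct bucket is a hypothetical stand-in for the two dst_entry fields, the clock value is passed in as a parameter instead of reading jiffies, and xrlim_allow_patched() mirrors the early return proposed in the attached patch rather than any code that was actually merged.

```c
#include <assert.h>

#define XRLIM_BURST_FACTOR 6

/* Hypothetical stand-in for the rate-limit fields of struct dst_entry. */
struct bucket {
    unsigned long rate_tokens;
    unsigned long rate_last;
};

/* Same logic as the quoted xrlim_allow(), with `now` supplied by the caller. */
static int xrlim_allow_orig(struct bucket *b, int timeout, unsigned long now)
{
    assert(timeout >= 0);
    b->rate_tokens += now - b->rate_last;
    b->rate_last = now;
    if (b->rate_tokens > (unsigned long)(XRLIM_BURST_FACTOR * timeout))
        b->rate_tokens = (unsigned long)(XRLIM_BURST_FACTOR * timeout);
    if (b->rate_tokens >= (unsigned long)timeout) {
        b->rate_tokens -= (unsigned long)timeout;
        return 1;
    }
    return 0;
}

/* Variant with the early return: timeout == 0 never touches the bucket. */
static int xrlim_allow_patched(struct bucket *b, int timeout, unsigned long now)
{
    if (timeout == 0)
        return 1;
    return xrlim_allow_orig(b, timeout, now);
}
```

Starting from a bucket holding 600 tokens, a timeout=0 message at tick 1000 followed by a timeout=100 message at tick 1010 is enough to show the difference: the original logic empties the bucket on the first call and refuses the second, while the patched variant allows both and leaves 500 tokens.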
thanks to all contributors, especially pekka savola, for discovering that
pinging before udp scanning will cause the troubles, and alexey for
supplying an alternative patch (for those interested:
http://therapy.endorphin.org/alexey.patch)

alan, please take care of that.

greets, clemens

--lrZ03NoBR/3+SXJZ
Content-Type: text/plain; charset=us-ascii
Content-Disposition: attachment; filename="icmp-xrlim_allow.patch"

--- linux/net/ipv4/icmp.c~ Thu Jun 21 06:00:55 2001
+++ linux/net/ipv4/icmp.c Mon Jul 30 13:32:56 2001
@@ -16,6 +16,9 @@
  * Other than that this module is a complete rewrite.
  *
  * Fixes:
+ * Clemens Fruhwirth : Fix incorrect limiting in xrlim_allow
+ * for a packet succeeding a packet with
+ * timeout==0.
  * Mike Shaver : RFC1122 checks.
  * Alan Cox : Multicast ping reply as self.
  * Alan Cox : Fix atomicity lockup in ip_build_xmit
@@ -223,11 +226,6 @@
  * Note that the same dst_entry fields are modified by functions in
  * route.c too, but these work for packet destinations while xrlim_allow
  * works for icmp destinations. This means the rate limiting information
- * for one "ip object" is shared.
- *
- * Note that the same dst_entry fields are modified by functions in
- * route.c too, but these work for packet destinations while xrlim_allow
- * works for icmp destinations. This means the rate limiting information
  * for one "ip object" is shared - and these ICMPs are twice limited:
  * by source and by destination.
* @@ -241,9 +239,12 @@ { unsigned long now; + if (0 == timeout) return 1; + now = jiffies; dst->rate_tokens += now - dst->rate_last; dst->rate_last = now; + if (dst->rate_tokens > XRLIM_BURST_FACTOR*timeout) dst->rate_tokens = XRLIM_BURST_FACTOR*timeout; if (dst->rate_tokens >= timeout) { --lrZ03NoBR/3+SXJZ-- From owner-netdev@oss.sgi.com Mon Jul 30 06:03:58 2001 Received: (from majordomo@localhost) by oss.sgi.com (8.11.2/8.11.3) id f6UD3wv03478 for netdev-outgoing; Mon, 30 Jul 2001 06:03:58 -0700 Received: from netcore.fi (netcore.fi [193.94.160.1]) by oss.sgi.com (8.11.2/8.11.3) with SMTP id f6UD3sV03472 for ; Mon, 30 Jul 2001 06:03:55 -0700 Received: from localhost (pekkas@localhost) by netcore.fi (8.11.1/8.11.1) with ESMTP id f6UD3er10414; Mon, 30 Jul 2001 16:03:41 +0300 Date: Mon, 30 Jul 2001 16:03:40 +0300 (EEST) From: Pekka Savola To: cc: , , , Dave Miller Subject: Re: missing icmp errors for udp packets In-Reply-To: <200107291559.TAA15413@ms2.inr.ac.ru> Message-ID: MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII Sender: owner-netdev@oss.sgi.com Precedence: bulk Content-Length: 1419 Lines: 48 On Sun, 29 Jul 2001 kuznet@ms2.inr.ac.ru wrote: > Hello! > > > So in conclusion: > > > > with net.ipv4.icmp_echoreply_rate=0: > > Congratulations! That's why I do not see this, forgot to ping before. :-) > > The patch is enclosed. Alexey, there is a tiny problem with your patch. If you reboot the computer, the _first_ ping/scan attempt will not return icmp dest unreachable. All of the rest do. If the network was quiet enough, I guess there might be some circumstances where this could be applicable again.. 
> --- ../dust/vger3-010728/linux/net/ipv4/icmp.c Thu Jun 14 22:49:44 2001 > +++ linux/net/ipv4/icmp.c Sun Jul 29 19:52:55 2001 > @@ -240,12 +240,15 @@ > int xrlim_allow(struct dst_entry *dst, int timeout) > { > unsigned long now; > + static int burst; > > now = jiffies; > dst->rate_tokens += now - dst->rate_last; > dst->rate_last = now; > - if (dst->rate_tokens > XRLIM_BURST_FACTOR*timeout) > - dst->rate_tokens = XRLIM_BURST_FACTOR*timeout; > + if (burst < XRLIM_BURST_FACTOR*timeout) > + burst = XRLIM_BURST_FACTOR*timeout; > + if (dst->rate_tokens > burst) > + dst->rate_tokens = burst; > if (dst->rate_tokens >= timeout) { > dst->rate_tokens -= timeout; > return 1; > -- Pekka Savola "Tell me of difficulties surmounted, Netcore Oy not those you stumble over and fall" Systems. Networks. Security. -- Robert Jordan: A Crown of Swords From owner-netdev@oss.sgi.com Mon Jul 30 06:17:17 2001 Received: (from majordomo@localhost) by oss.sgi.com (8.11.2/8.11.3) id f6UDHHn03732 for netdev-outgoing; Mon, 30 Jul 2001 06:17:17 -0700 Received: from weta.f00f.org (weta.f00f.org [203.167.249.89]) by oss.sgi.com (8.11.2/8.11.3) with SMTP id f6UDHBV03729; Mon, 30 Jul 2001 06:17:12 -0700 Received: by weta.f00f.org (Postfix, from userid 1000) id F114DB589; Tue, 31 Jul 2001 01:17:42 +1200 (NZST) Date: Tue, 31 Jul 2001 01:17:42 +1200 From: Chris Wedgwood To: Matti Aarnio Cc: Alexey Kuznetsov , Ralf Baechle , netdev@oss.sgi.com Subject: Re: ping bug Message-ID: <20010731011742.A5499@weta.f00f.org> References: <200107270757.LAA01031@mops.inr.ac.ru> <20010728032453.C852@weta.f00f.org> <20010730144417.I2650@mea-ext.zmailer.org> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <20010730144417.I2650@mea-ext.zmailer.org> User-Agent: Mutt/1.3.18i X-No-Archive: Yes Sender: owner-netdev@oss.sgi.com Precedence: bulk Content-Length: 501 Lines: 16 On Mon, Jul 30, 2001 at 02:44:17PM +0300, Matti Aarnio wrote: Like others have noted, doing FP/MMX stuff 
in kernel needs saving, and restoring a rather massive state dump - preserving lazy-FPU-save/restore! If the cost of that work is worth the effort for the amount of FP/MMX instructions done for whatever purpose, then by all means, do it. ping is a userland program for softmodems, yes, making this all userland would be very nice, better still death to all modems --cw From owner-netdev@oss.sgi.com Mon Jul 30 07:33:06 2001 Received: (from majordomo@localhost) by oss.sgi.com (8.11.2/8.11.3) id f6UEX6805649 for netdev-outgoing; Mon, 30 Jul 2001 07:33:06 -0700 Received: from anagyris.wanadoo.fr (smtp-rt-1.wanadoo.fr [193.252.19.151]) by oss.sgi.com (8.11.2/8.11.3) with SMTP id f6UEX3V05646 for ; Mon, 30 Jul 2001 07:33:05 -0700 Received: from amyris.wanadoo.fr (193.252.19.150) by anagyris.wanadoo.fr; 30 Jul 2001 16:32:56 +0200 Received: from [10.0.1.53] (193.253.250.33) by amyris.wanadoo.fr; 30 Jul 2001 16:32:40 +0200 From: Benjamin Herrenschmidt To: Harald Welte , , Subject: Re: airport reset on iBook2 Date: Mon, 30 Jul 2001 16:32:36 +0200 Message-Id: <20010730143236.16877@smtp.wanadoo.fr> In-Reply-To: <20010728003701.H1240@obroa-skai.gnumonks.org> References: <20010728003701.H1240@obroa-skai.gnumonks.org> X-Mailer: CTM PowerMail 3.0.8 MIME-Version: 1.0 Content-Type: text/plain; charset=US-ASCII Content-Transfer-Encoding: 7bit Sender: owner-netdev@oss.sgi.com Precedence: bulk Content-Length: 486 Lines: 14 >I can confirm that happen from time to time with my Sony VAIO picturebook >C1XD and a Lucent WaveLAN/IEEE board. > >unloading the whole pcmcia subsystem (incl. yenta_socket) and re-loading >everything helps. What is your card's firmware version ? (displayed by the driver during boot). You may want to go to MacOS, upgrade to Apple's latest airport software, and back to linux. This will upgrade your card's firmware to the latest version provided by Apple, which might help. Ben. 
From owner-netdev@oss.sgi.com Mon Jul 30 08:24:47 2001 Received: (from majordomo@localhost) by oss.sgi.com (8.11.2/8.11.3) id f6UFOlX06842 for netdev-outgoing; Mon, 30 Jul 2001 08:24:47 -0700 Received: from coruscant.gnumonks.org (mail@coruscant.franken.de [193.174.159.226]) by oss.sgi.com (8.11.2/8.11.3) with SMTP id f6UFOjV06839 for ; Mon, 30 Jul 2001 08:24:46 -0700 Received: from uucp by coruscant.gnumonks.org with local-bsmtp (Exim 3.22 #1) id 15REui-0007iZ-00 for netdev@oss.sgi.com; Mon, 30 Jul 2001 17:25:00 +0200 Received: from laforge by obroa-skai.gnumonks.org with local (Exim 3.22 #1) id 15R0q6-0000eC-00; Sun, 29 Jul 2001 21:23:18 -0300 Date: Sun, 29 Jul 2001 21:23:17 -0300 From: Harald Welte To: Brad Chapman Cc: netdev@oss.sgi.com Subject: Re: IPv6 fragmentation and IPv6 header parsing Message-ID: <20010729212317.I1486@obroa-skai.gnumonks.org> References: <3B64B076.6090709@earthlink.net> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline User-Agent: Mutt/1.3.17i In-Reply-To: <3B64B076.6090709@earthlink.net>; from kakadu@earthlink.net on Sun, Jul 29, 2001 at 08:55:18PM -0400 X-Operating-System: Linux obroa-skai.gnumonks.org 2.4.7 X-Date: Today is Setting Orange, the 64th day of Confusion in the YOLD 3167 Sender: owner-netdev@oss.sgi.com Precedence: bulk Content-Length: 1338 Lines: 31 On Sun, Jul 29, 2001 at 08:55:18PM -0400, Brad Chapman wrote: > Everyone, > > I am currently completing a port of the Netfilter connection > tracking subsystem from IPv4 to IPv6. Most of the features in this > port are complete, except for fragment handling, which is non- > existent. I am also not entirely sure about how to properly parse > header chains and extract various extension and layer-4 headers > for use by the connection tracking subsystem. Enclosed with this > message are my current efforts regarding IPv6 fragmentation and > IPv6 header chain parsing. I'm not sure if your 1:1 attempt of a port is a good idea. 
In IPv6, routers do not fragment packets at all. This clashes with the current way how connection tracking for IPv4 is implemented in netfilter (defrag at input, refrag at output). so, don't try to add fragmentation support to the core (nobody will include it anyway, i guess), but try to implement a connection tracking which works without that defrag-refrag need. > Brad -- Live long and prosper - Harald Welte / laforge@gnumonks.org http://www.gnumonks.org ============================================================================ GCS/E/IT d- s-: a-- C+++ UL++++$ P+++ L++++$ E--- W- N++ o? K- w--- O- M- V-- PS+ PE-- Y+ PGP++ t++ 5-- !X !R tv-- b+++ DI? !D G+ e* h+ r% y+(*) From owner-netdev@oss.sgi.com Mon Jul 30 11:23:59 2001 Received: (from majordomo@localhost) by oss.sgi.com (8.11.2/8.11.3) id f6UINxB09258 for netdev-outgoing; Mon, 30 Jul 2001 11:23:59 -0700 Received: from falcon.mail.pas.earthlink.net (falcon.mail.pas.earthlink.net [207.217.120.74]) by oss.sgi.com (8.11.2/8.11.3) with SMTP id f6UINwV09255 for ; Mon, 30 Jul 2001 11:23:58 -0700 Received: from earthlink.net (dialup-63.208.190.210.Dial1.Baltimore1.Level3.net [63.208.190.210]) by falcon.mail.pas.earthlink.net (EL-8_9_3_3/8.9.3) with ESMTP id LAA14306 for ; Mon, 30 Jul 2001 11:23:01 -0700 (PDT) Message-ID: <3B65977C.8080103@earthlink.net> Date: Mon, 30 Jul 2001 13:21:00 -0400 From: Brad Chapman User-Agent: Mozilla/5.0 (X11; U; Linux 2.4.7 i586; en-US; C-UPD: MaxLinux0301) Gecko/20001107 Netscape6/6.0 X-Accept-Language: en MIME-Version: 1.0 To: netdev@oss.sgi.com Subject: Re: IPv6 fragmentation and IPv6 header parsing References: <3B64B076.6090709@earthlink.net> <20010729212317.I1486@obroa-skai.gnumonks.org> Content-Type: text/plain; charset=us-ascii; format=flowed Content-Transfer-Encoding: 7bit Sender: owner-netdev@oss.sgi.com Precedence: bulk Content-Length: 1997 Lines: 59 Harald Welte wrote: > On Sun, Jul 29, 2001 at 08:55:18PM -0400, Brad Chapman wrote: > >> Everyone, >> >> I am currently 
completing a port of the Netfilter connection
>> tracking subsystem from IPv4 to IPv6. Most of the features in this
>> port are complete, except for fragment handling, which is non-
>> existent. I am also not entirely sure about how to properly parse
>> header chains and extract various extension and layer-4 headers
>> for use by the connection tracking subsystem. Enclosed with this
>> message are my current efforts regarding IPv6 fragmentation and
>> IPv6 header chain parsing.
>
>
> I'm not sure if your 1:1 attempt of a port is a good idea.
>
> In IPv6, routers do not fragment packets at all.
>
> This clashes with the current way how connection tracking for IPv4 is
> implemented in netfilter (defrag at input, refrag at output).
>
> so, don't try to add fragmentation support to the core (nobody will include
> it anyway, i guess), but try to implement a connection tracking which works
> without that defrag-refrag need.
>
>> Brad
>

Mr. Harald,

(if you get this, Mr. Harald, it's because I misspelled `netdev' and
deleted the original message)

Well, okay. I thought about the possibility of violating the RFCs, and
at first I stayed away from attempting to add IPv4-style fragment
support. But, TBH, there are really three different things that can be
done with this issue:

1. Ignore all fragments, and NF_DROP fragmented packets. Period. This
   one is probably the most RFC-compliant.
2. Copy packets, hold originals, and push copies into connection
   tracking system. This one sounds kludgy and bloaty and violates RFCs.
3. Rewrite _everything_ to deal with fragmented packets. TBH, that's
   scary.

If given a choice, and told that defragging/refragging packets on the
fly violated the RFCs, I would probably choose the first option above.

BTW, what about header chain parsing? Am I doing that right?
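For comparison with the question above, here is one way the extension header chain can be walked, following the length rules in RFC 2460 (and the AH length encoding, which counts 4-byte units minus 2). It is a user-space sketch over a raw byte buffer, not kernel code: ipv6_find_upper() and the NH_* constants are hypothetical names, and it does not handle the case Stig mentions, where a non-first fragment carries no upper-layer header at the returned offset.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Next Header values used by the walk (numeric values as assigned for IPv6). */
enum {
    NH_HOP = 0, NH_ROUTING = 43, NH_FRAGMENT = 44, NH_AUTH = 51, NH_DEST = 60
};

#define IPV6_FIXED_HDR_LEN 40

/*
 * Walk the extension header chain of the packet at pkt (len bytes).
 * Returns the first Next Header value that is not a known extension
 * header and stores the offset of that header in *offset.
 * Returns -1 if the packet is too short for the chain it announces.
 */
static int ipv6_find_upper(const uint8_t *pkt, size_t len, size_t *offset)
{
    size_t off = IPV6_FIXED_HDR_LEN;
    uint8_t nexthdr;

    assert(pkt != NULL && offset != NULL);
    if (len < IPV6_FIXED_HDR_LEN)
        return -1;
    nexthdr = pkt[6];                 /* Next Header field of the fixed header */

    for (;;) {
        size_t hdrlen;

        switch (nexthdr) {
        case NH_HOP:
        case NH_ROUTING:
        case NH_DEST:
            if (len - off < 2)
                return -1;
            hdrlen = ((size_t)pkt[off + 1] + 1) * 8;  /* 8-byte units, minus first */
            break;
        case NH_FRAGMENT:
            hdrlen = 8;                               /* fixed size */
            break;
        case NH_AUTH:
            if (len - off < 2)
                return -1;
            hdrlen = ((size_t)pkt[off + 1] + 2) * 4;  /* 4-byte units, minus 2 */
            break;
        default:
            *offset = off;
            return nexthdr;
        }
        if (len - off < hdrlen)
            return -1;
        nexthdr = pkt[off];           /* each extension header starts with Next Header */
        off += hdrlen;
    }
}
```

The point of the sketch is that each header is located by following the Next Header and length fields in sequence, since the order of extension headers is not fixed; OR-ing header words into a bitmask, as in the posted get_ipv6_header(), cannot recover that order.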
Brad From owner-netdev@oss.sgi.com Tue Jul 31 09:43:09 2001 Received: (from majordomo@localhost) by oss.sgi.com (8.11.2/8.11.3) id f6VGh9B03481 for netdev-outgoing; Tue, 31 Jul 2001 09:43:09 -0700 Received: from opium.mbsi.ca (opium.mbsi.ca [198.168.101.1]) by oss.sgi.com (8.11.2/8.11.3) with SMTP id f6VGh5V03477 for ; Tue, 31 Jul 2001 09:43:05 -0700 Received: (from marc@localhost) by opium.mbsi.ca (8.11.3/8.11.3) id f6VGgxm28788; Tue, 31 Jul 2001 12:42:59 -0400 (EDT) Date: Tue, 31 Jul 2001 12:42:59 -0400 From: Marc Boucher To: netfilter@lists.samba.org, netfilter-devel@lists.samba.org, netdev@oss.sgi.com Cc: mostrows@us.ibm.com, mostrows@speakeasy.net Subject: [PATCH] fix for netfilter/nat/pppoe crashes (hopefully) Message-ID: <20010731124259.A28736@opium.mbsi.ca> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline User-Agent: Mutt/1.3.19i Sender: owner-netdev@oss.sgi.com Precedence: bulk Content-Length: 4099 Lines: 134 Hi folks, Enclosed is a patch which should eliminate the crashes involving netfilter/iptables/nat (and often pppoe) that several people have been experiencing under kernels >= 2.4.4. Basically the nat manip_pkt handlers were corrupting skb_shinfo(skb)->frag_list (thus causing a crash in skb_drop_fraglist()) by stupidly writing beyond skb->end when attempting to update fields (like tcp->check) in truncated inner-ICMP headers. Cheers Marc --- linux-2.4.7-mb/net/ipv4/netfilter/ip_nat_core.c 2001/07/31 15:31:51 1.1 +++ linux-2.4.7-mb/net/ipv4/netfilter/ip_nat_core.c 2001/07/31 16:12:45 @@ -824,6 +824,16 @@ ? 
"DST" : "SRC", NIPQUAD(info->manips[i].manip.ip), ntohs(info->manips[i].manip.u.udp.port)); + + if((skb->len <= ((void *)inner - (void *)iph)) || + (skb->len < ((void *)( + (u_int32_t *)inner + inner->ihl + ) - (void *)iph))) { + DEBUGP("icmp_reply_translation: skb too small for inner IP header\n"); + READ_UNLOCK(&ip_nat_lock); + return NF_DROP; + } + manip_pkt(inner->protocol, inner, skb->len - ((void *)inner - (void *)iph), &info->manips[i].manip, --- linux-2.4.7-mb/net/ipv4/netfilter/ip_nat_proto_icmp.c 2001/07/31 15:37:44 1.1 +++ linux-2.4.7-mb/net/ipv4/netfilter/ip_nat_proto_icmp.c 2001/07/31 15:55:15 @@ -9,6 +9,12 @@ #include #include +#if 0 +#define DEBUGP printk +#else +#define DEBUGP(format, args...) +#endif + static int icmp_in_range(const struct ip_conntrack_tuple *tuple, enum ip_nat_manip_type maniptype, @@ -48,6 +54,11 @@ enum ip_nat_manip_type maniptype) { struct icmphdr *hdr = (struct icmphdr *)((u_int32_t *)iph + iph->ihl); + + if(((void *)(hdr+1) - (void *)iph) < len) { + DEBUGP("icmp_manip_pkt: too small\n"); + return; + } hdr->checksum = ip_nat_cheat_check(hdr->un.echo.id ^ 0xFFFF, manip->u.icmp.id, --- linux-2.4.7-mb/net/ipv4/netfilter/ip_nat_proto_tcp.c 2001/07/31 15:37:45 1.1 +++ linux-2.4.7-mb/net/ipv4/netfilter/ip_nat_proto_tcp.c 2001/07/31 16:19:40 @@ -9,6 +9,12 @@ #include #include +#if 0 +#define DEBUGP printk +#else +#define DEBUGP(format, args...) 
+#endif + static int tcp_in_range(const struct ip_conntrack_tuple *tuple, enum ip_nat_manip_type maniptype, @@ -83,6 +89,12 @@ u_int32_t oldip; u_int16_t *portptr; + + if(((void *)&hdr->dest + sizeof(hdr->dest) - (void *)iph) < len) { + DEBUGP("tcp_manip_pkt: too small\n"); + return; + } + if (maniptype == IP_NAT_MANIP_SRC) { /* Get rid of src ip and src pt */ oldip = iph->saddr; @@ -92,10 +104,17 @@ oldip = iph->daddr; portptr = &hdr->dest; } - hdr->check = ip_nat_cheat_check(~oldip, manip->ip, + + /* this could be a inner header returned in icmp packet; in such + cases we cannot update the checksum field since it is outside of + the 64 bits of transport layer headers typically included */ + if(((void *)&hdr->check + sizeof(hdr->check) - (void *)iph) >= len) { + hdr->check = ip_nat_cheat_check(~oldip, manip->ip, ip_nat_cheat_check(*portptr ^ 0xFFFF, manip->u.tcp.port, hdr->check)); + } + *portptr = manip->u.tcp.port; } --- linux-2.4.7-mb/net/ipv4/netfilter/ip_nat_proto_udp.c 2001/07/31 15:37:45 1.1 +++ linux-2.4.7-mb/net/ipv4/netfilter/ip_nat_proto_udp.c 2001/07/31 16:19:23 @@ -9,6 +9,12 @@ #include #include +#if 0 +#define DEBUGP printk +#else +#define DEBUGP(format, args...) 
+#endif + static int udp_in_range(const struct ip_conntrack_tuple *tuple, enum ip_nat_manip_type maniptype, @@ -80,6 +86,11 @@ struct udphdr *hdr = (struct udphdr *)((u_int32_t *)iph + iph->ihl); u_int32_t oldip; u_int16_t *portptr; + + if(((void *)&hdr->check + sizeof(hdr->check) - (void *)iph) < len) { + DEBUGP("udp_manip_pkt: too small\n"); + return; + } if (maniptype == IP_NAT_MANIP_SRC) { /* Get rid of src ip and src pt */ From owner-netdev@oss.sgi.com Tue Jul 31 10:12:48 2001 Received: (from majordomo@localhost) by oss.sgi.com (8.11.2/8.11.3) id f6VHCmY04025 for netdev-outgoing; Tue, 31 Jul 2001 10:12:48 -0700 Received: from ms2.inr.ac.ru (minus.inr.ac.ru [193.233.7.97]) by oss.sgi.com (8.11.2/8.11.3) with SMTP id f6VHCkV04022 for ; Tue, 31 Jul 2001 10:12:46 -0700 Received: (from kuznet@localhost) by ms2.inr.ac.ru (8.6.13/ANK) id VAA04463; Tue, 31 Jul 2001 21:12:22 +0400 From: kuznet@ms2.inr.ac.ru Message-Id: <200107311712.VAA04463@ms2.inr.ac.ru> Subject: Re: conflicting alignment requirements To: jacoba@cisco.com (Jacob Avraham) Date: Tue, 31 Jul 2001 21:12:22 +0400 (MSK DST) Cc: netdev@oss.sgi.com In-Reply-To: from "Jacob Avraham" at Jul 29, 1 08:17:41 pm X-Mailer: ELM [version 2.4 PL24] MIME-Version: 1.0 Sender: owner-netdev@oss.sgi.com Precedence: bulk Content-Length: 317 Lines: 11 Hello! > copy the packet to a fresh skb (rx_copybreak = 0), the packet will > traverse the net layer with unalinged IP header. Doing this for an arch which traps wrong alignment, you can expect everything (except for crash, which could be bug). Particularly, u32 rules are not going to match such packets. 
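The alignment problem mentioned above arises because a 14-byte Ethernet header leaves the IP header on a 2-byte boundary unless the buffer is offset. A minimal user-space sketch of reading a 32-bit network-order field without assuming alignment follows; load_be32() is a hypothetical helper name, not something taken from the thread or the kernel.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/*
 * Read a 32-bit big-endian (network order) value from any address.
 * Copying byte-wise avoids a direct 32-bit load from a misaligned
 * pointer, which traps on some architectures.
 */
static uint32_t load_be32(const void *p)
{
    uint8_t b[4];

    assert(p != NULL);
    memcpy(b, p, sizeof(b));
    return ((uint32_t)b[0] << 24) | ((uint32_t)b[1] << 16) |
           ((uint32_t)b[2] << 8)  |  (uint32_t)b[3];
}
```

For a frame whose IP header starts at byte 14, the source address sits at byte 26, which is not a multiple of 4; load_be32(frame + 26) still returns the address correctly, whereas code that casts the pointer to a 32-bit type and dereferences it relies on the header being aligned.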
Alexey From owner-netdev@oss.sgi.com Tue Jul 31 10:43:43 2001 Received: (from majordomo@localhost) by oss.sgi.com (8.11.2/8.11.3) id f6VHhhi04879 for netdev-outgoing; Tue, 31 Jul 2001 10:43:43 -0700 Received: from ms2.inr.ac.ru (minus.inr.ac.ru [193.233.7.97]) by oss.sgi.com (8.11.2/8.11.3) with SMTP id f6VHhcV04871 for ; Tue, 31 Jul 2001 10:43:39 -0700 Received: from mops.inr.ac.ru (mops.inr.ac.ru [193.233.7.60]) by ms2.inr.ac.ru (8.6.13/ANK) with ESMTP id VAA07188; Tue, 31 Jul 2001 21:43:31 +0400 Received: (from kuznet@localhost) by mops.inr.ac.ru (8.9.3/8.9.3) id CAA00325; Mon, 30 Jul 2001 02:04:41 +0400 Message-Id: <200107292204.CAA00325@mops.inr.ac.ru> Subject: Re: Linux 2.4 networking/routing slowdown To: rusty@rustcorp.COM.AU (Rusty Russell) Date: Mon, 30 Jul 2001 02:04:41 +0400 (MSD) Cc: netdev@oss.sgi.com In-Reply-To: from "Rusty Russell" at Jul 29, 1 06:45:00 am From: Alexey Kuznetsov X-Mailer: ELM [version 2.4 PL24] MIME-Version: 1.0 Sender: owner-netdev@oss.sgi.com Precedence: bulk Content-Length: 414 Lines: 16 Hello! > Yes, you're paying for full connection tracking with the compatibility > stuff. If you just want filtering, switch to iptables (should be > pretty easy for you). Paul, but he said "several seconds"! This has nothing to do with performance and surely cannot be a payment for using an obsolete interface... It is some loss or something sort of this. > Hmmm... this I don't know. Here too. 
:-) Alexey From owner-netdev@oss.sgi.com Tue Jul 31 10:43:44 2001 Received: (from majordomo@localhost) by oss.sgi.com (8.11.2/8.11.3) id f6VHhiE04887 for netdev-outgoing; Tue, 31 Jul 2001 10:43:44 -0700 Received: from ms2.inr.ac.ru (minus.inr.ac.ru [193.233.7.97]) by oss.sgi.com (8.11.2/8.11.3) with SMTP id f6VHhdV04872 for ; Tue, 31 Jul 2001 10:43:40 -0700 Received: from mops.inr.ac.ru (mops.inr.ac.ru [193.233.7.60]) by ms2.inr.ac.ru (8.6.13/ANK) with ESMTP id VAA07193; Tue, 31 Jul 2001 21:43:32 +0400 Received: (from kuznet@localhost) by mops.inr.ac.ru (8.9.3/8.9.3) id CAA00332; Mon, 30 Jul 2001 02:20:27 +0400 Message-Id: <200107292220.CAA00332@mops.inr.ac.ru> Subject: Re: missing icmp errors for udp packets To: therapy@endorphin.ORG (clemens) Date: Mon, 30 Jul 2001 02:20:27 +0400 (MSD) Cc: netdev@oss.sgi.com In-Reply-To: <20010729131615.A382@ghanima.endorphin.org> from "clemens" at Jul 29, 1 04:15:00 pm From: Alexey Kuznetsov X-Mailer: ELM [version 2.4 PL24] MIME-Version: 1.0 Sender: owner-netdev@oss.sgi.com Precedence: bulk Content-Length: 795 Lines: 19 Hello! > does this somehow explain why this whole issue doesn't apply to the loopback > devices? Ratelimit checks are simply skipped for it, they apply only to icmps, which are going to be sent to network. Source of the problem was that icmp holds single variable for rate, but still pretends to allow setting different rates for different types of messages. Algo solves this assigning different costs to different types, but it breaks when costs are strongly different, so that low cost one (echo reply in this case) suppresses high cost (icmp errors) too strongly for some short time. nmap sends tight burst of udp messages (which is crazy anyway, icmp errors except for a few will be dropped in any case), after echo and all the icmp errors inevitably fall to this dead interval. 
Alexey From owner-netdev@oss.sgi.com Tue Jul 31 10:44:13 2001 Received: (from majordomo@localhost) by oss.sgi.com (8.11.2/8.11.3) id f6VHiD705001 for netdev-outgoing; Tue, 31 Jul 2001 10:44:13 -0700 Received: from opium.mbsi.ca (opium.mbsi.ca [198.168.101.1]) by oss.sgi.com (8.11.2/8.11.3) with SMTP id f6VHiAV04997 for ; Tue, 31 Jul 2001 10:44:10 -0700 Received: (from marc@localhost) by opium.mbsi.ca (8.11.3/8.11.3) id f6VHi5J29218; Tue, 31 Jul 2001 13:44:05 -0400 (EDT) Date: Tue, 31 Jul 2001 13:44:05 -0400 From: Marc Boucher To: netfilter@lists.samba.org, netfilter-devel@lists.samba.org, netdev@oss.sgi.com Cc: mostrows@us.ibm.com, mostrows@speakeasy.net Subject: ERRATA Re: [PATCH] fix for netfilter/nat/pppoe crashes (hopefully) Message-ID: <20010731134405.A29180@opium.mbsi.ca> References: <20010731124259.A28736@opium.mbsi.ca> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <20010731124259.A28736@opium.mbsi.ca> User-Agent: Mutt/1.3.19i Sender: owner-netdev@oss.sgi.com Precedence: bulk Content-Length: 4363 Lines: 146 Due to a stupid mistake the patch I just posted was an early broken version (some length checks inversed). Please replace it with the one below. Sorry about that! Marc On Tue, Jul 31, 2001 at 12:42:59PM -0400, Marc Boucher wrote: > Hi folks, > > Enclosed is a patch which should eliminate the crashes involving > netfilter/iptables/nat (and often pppoe) that several people have been > experiencing under kernels >= 2.4.4. > > Basically the nat manip_pkt handlers were corrupting skb_shinfo(skb)->frag_list > (thus causing a crash in skb_drop_fraglist()) by stupidly writing beyond > skb->end when attempting to update fields (like tcp->check) in truncated > inner-ICMP headers. > > Cheers > Marc > --- linux-2.4.7-mb/net/ipv4/netfilter/ip_nat_core.c 2001/07/31 15:31:51 1.1 +++ linux-2.4.7-mb/net/ipv4/netfilter/ip_nat_core.c 2001/07/31 17:27:06 @@ -824,6 +824,16 @@ ? 
"DST" : "SRC", NIPQUAD(info->manips[i].manip.ip), ntohs(info->manips[i].manip.u.udp.port)); + + if((skb->len <= ((void *)inner - (void *)iph)) || + (skb->len < ((void *)( + (u_int32_t *)inner + inner->ihl + ) - (void *)iph))) { + DEBUGP("icmp_reply_translation: skb too small for inner IP header\n"); + READ_UNLOCK(&ip_nat_lock); + return NF_DROP; + } + manip_pkt(inner->protocol, inner, skb->len - ((void *)inner - (void *)iph), &info->manips[i].manip, --- linux-2.4.7-mb/net/ipv4/netfilter/ip_nat_proto_icmp.c 2001/07/31 15:37:44 1.1 +++ linux-2.4.7-mb/net/ipv4/netfilter/ip_nat_proto_icmp.c 2001/07/31 17:27:08 @@ -9,6 +9,12 @@ #include #include +#if 0 +#define DEBUGP printk +#else +#define DEBUGP(format, args...) +#endif + static int icmp_in_range(const struct ip_conntrack_tuple *tuple, enum ip_nat_manip_type maniptype, @@ -48,6 +54,11 @@ enum ip_nat_manip_type maniptype) { struct icmphdr *hdr = (struct icmphdr *)((u_int32_t *)iph + iph->ihl); + + if(((void *)(hdr+1) - (void *)iph) > len) { + DEBUGP("icmp_manip_pkt: too small\n"); + return; + } hdr->checksum = ip_nat_cheat_check(hdr->un.echo.id ^ 0xFFFF, manip->u.icmp.id, --- linux-2.4.7-mb/net/ipv4/netfilter/ip_nat_proto_tcp.c 2001/07/31 15:37:45 1.1 +++ linux-2.4.7-mb/net/ipv4/netfilter/ip_nat_proto_tcp.c 2001/07/31 17:35:20 @@ -9,6 +9,12 @@ #include #include +#if 0 +#define DEBUGP printk +#else +#define DEBUGP(format, args...) 
+#endif + static int tcp_in_range(const struct ip_conntrack_tuple *tuple, enum ip_nat_manip_type maniptype, @@ -83,6 +89,12 @@ u_int32_t oldip; u_int16_t *portptr; + + if(((void *)&hdr->dest + sizeof(hdr->dest) - (void *)iph) > len) { + DEBUGP("tcp_manip_pkt: too small\n"); + return; + } + if (maniptype == IP_NAT_MANIP_SRC) { /* Get rid of src ip and src pt */ oldip = iph->saddr; @@ -92,10 +104,17 @@ oldip = iph->daddr; portptr = &hdr->dest; } - hdr->check = ip_nat_cheat_check(~oldip, manip->ip, + + /* this could be a inner header returned in icmp packet; in such + cases we cannot update the checksum field since it is outside of + the 64 bits of transport layer headers typically included */ + if(((void *)&hdr->check + sizeof(hdr->check) - (void *)iph) <= len) { + hdr->check = ip_nat_cheat_check(~oldip, manip->ip, ip_nat_cheat_check(*portptr ^ 0xFFFF, manip->u.tcp.port, hdr->check)); + } + *portptr = manip->u.tcp.port; } --- linux-2.4.7-mb/net/ipv4/netfilter/ip_nat_proto_udp.c 2001/07/31 15:37:45 1.1 +++ linux-2.4.7-mb/net/ipv4/netfilter/ip_nat_proto_udp.c 2001/07/31 17:27:15 @@ -9,6 +9,12 @@ #include #include +#if 0 +#define DEBUGP printk +#else +#define DEBUGP(format, args...) 
+#endif + static int udp_in_range(const struct ip_conntrack_tuple *tuple, enum ip_nat_manip_type maniptype, @@ -80,6 +86,11 @@ struct udphdr *hdr = (struct udphdr *)((u_int32_t *)iph + iph->ihl); u_int32_t oldip; u_int16_t *portptr; + + if(((void *)&hdr->check + sizeof(hdr->check) - (void *)iph) > len) { + DEBUGP("udp_manip_pkt: too small\n"); + return; + } if (maniptype == IP_NAT_MANIP_SRC) { /* Get rid of src ip and src pt */ From owner-netdev@oss.sgi.com Tue Jul 31 11:34:32 2001 Received: (from majordomo@localhost) by oss.sgi.com (8.11.2/8.11.3) id f6VIYWu06037 for netdev-outgoing; Tue, 31 Jul 2001 11:34:32 -0700 Received: from ms2.inr.ac.ru (minus.inr.ac.ru [193.233.7.97]) by oss.sgi.com (8.11.2/8.11.3) with SMTP id f6VIYTV06031 for ; Tue, 31 Jul 2001 11:34:30 -0700 Received: (from kuznet@localhost) by ms2.inr.ac.ru (8.6.13/ANK) id WAA09598; Tue, 31 Jul 2001 22:33:56 +0400 From: kuznet@ms2.inr.ac.ru Message-Id: <200107311833.WAA09598@ms2.inr.ac.ru> Subject: Re: missing icmp errors for udp packets To: pekkas@netcore.fi (Pekka Savola) Date: Tue, 31 Jul 2001 22:33:56 +0400 (MSK DST) Cc: therapy@endorphin.org, netdev@oss.sgi.com, linux-kernel@vger.redhat.com, davem@redhat.com In-Reply-To: from "Pekka Savola" at Jul 30, 1 04:03:40 pm X-Mailer: ELM [version 2.4 PL24] MIME-Version: 1.0 Sender: owner-netdev@oss.sgi.com Precedence: bulk Content-Length: 148 Lines: 8 Hello! > If you reboot the computer, the _first_ ping/scan attempt will not return > icmp dest unreachable. Hmm... how fast after reboot? 
Alexey From owner-netdev@oss.sgi.com Tue Jul 31 11:42:57 2001 Received: (from majordomo@localhost) by oss.sgi.com (8.11.2/8.11.3) id f6VIgvJ06227 for netdev-outgoing; Tue, 31 Jul 2001 11:42:57 -0700 Received: from ms2.inr.ac.ru (minus.inr.ac.ru [193.233.7.97]) by oss.sgi.com (8.11.2/8.11.3) with SMTP id f6VIgrV06224; Tue, 31 Jul 2001 11:42:53 -0700 Received: (from kuznet@localhost) by ms2.inr.ac.ru (8.6.13/ANK) id WAA09801; Tue, 31 Jul 2001 22:42:13 +0400 From: kuznet@ms2.inr.ac.ru Message-Id: <200107311842.WAA09801@ms2.inr.ac.ru> Subject: Re: ping bug To: matti.aarnio@zmailer.org (Matti Aarnio) Date: Tue, 31 Jul 2001 22:42:13 +0400 (MSK DST) Cc: cw@f00f.org, ralf@oss.sgi.com, netdev@oss.sgi.com In-Reply-To: <20010730144417.I2650@mea-ext.zmailer.org> from "Matti Aarnio" at Jul 30, 1 02:44:17 pm X-Mailer: ELM [version 2.4 PL24] MIME-Version: 1.0 Sender: owner-netdev@oss.sgi.com Precedence: bulk Content-Length: 196 Lines: 10 Hello! > And by the way, softmodems don't need to do DSP work in the interrupt. Really? :-) Do they need, or they do not need, this does not matter at all. v* protocol stuff is binary. 
Alexey From owner-netdev@oss.sgi.com Tue Jul 31 11:44:15 2001 Received: (from majordomo@localhost) by oss.sgi.com (8.11.2/8.11.3) id f6VIiF306306 for netdev-outgoing; Tue, 31 Jul 2001 11:44:15 -0700 Received: from ghanima.endorphin.org ([62.116.8.197]) by oss.sgi.com (8.11.2/8.11.3) with SMTP id f6VIiDV06301 for ; Tue, 31 Jul 2001 11:44:13 -0700 Received: (qmail 8337 invoked by uid 1000); 31 Jul 2001 18:44:09 -0000 Date: Tue, 31 Jul 2001 20:44:09 +0200 From: clemens To: Alexey Kuznetsov Cc: netdev@oss.sgi.com Subject: Re: missing icmp errors for udp packets Message-ID: <20010731204409.A8211@ghanima.endorphin.org> References: <200107292220.CAA00332@mops.inr.ac.ru> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <200107292220.CAA00332@mops.inr.ac.ru> User-Agent: Mutt/1.3.18i Sender: owner-netdev@oss.sgi.com Precedence: bulk Content-Length: 1464 Lines: 31 On Mon, Jul 30, 2001 at 02:20:27AM +0400, Alexey Kuznetsov wrote: > > does this somehow explain why this whole issue doesn't apply to the loopback > > devices? > > Ratelimit checks are simply skipped for it, they apply only to icmps, > which are going to be sent to network. > > Source of the problem was that icmp holds single variable for rate, but still > pretends to allow setting different rates for different types of messages. > Algo solves this assigning different costs to different types, but > it breaks when costs are strongly different, so that low cost one (echo reply > in this case) suppresses high cost (icmp errors) too strongly > for some short time. nmap sends tight burst of udp messages (which is crazy > anyway, icmp errors except for a few will be dropped in any case), > after echo and all the icmp errors inevitably fall to this dead interval. in some way this "pretending to be a feature" issue should be cleaned up. consequently since there is only one token bucket, there can only be one icmp rate limit. 
we can add an icmp type mask to enable/disable rate limiting for certain types. or we could add a bunch of token buckets to dst_entry, which would make the whole thing overbloated. using lazy instantiation would be the third option i could think of. change rate_last+rate_token to a token bucket reference only used if needed. in the latter case, one should discipline route.c to keep its hands off rate_token+rate_last. clemens From owner-netdev@oss.sgi.com Tue Jul 31 11:45:28 2001 Received: (from majordomo@localhost) by oss.sgi.com (8.11.2/8.11.3) id f6VIjSC06351 for netdev-outgoing; Tue, 31 Jul 2001 11:45:28 -0700 Received: from weta.f00f.org (weta.f00f.org [203.167.249.89]) by oss.sgi.com (8.11.2/8.11.3) with SMTP id f6VIjNV06346; Tue, 31 Jul 2001 11:45:24 -0700 Received: by weta.f00f.org (Postfix, from userid 1000) id 7E19515852; Wed, 1 Aug 2001 06:45:56 +1200 (NZST) Date: Wed, 1 Aug 2001 06:45:56 +1200 From: Chris Wedgwood To: kuznet@ms2.inr.ac.ru Cc: Matti Aarnio , ralf@oss.sgi.com, netdev@oss.sgi.com Subject: Re: ping bug Message-ID: <20010801064556.A8101@weta.f00f.org> References: <200107311842.WAA09801@ms2.inr.ac.ru> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <200107311842.WAA09801@ms2.inr.ac.ru> User-Agent: Mutt/1.3.18i X-No-Archive: Yes Sender: owner-netdev@oss.sgi.com Precedence: bulk Content-Length: 247 Lines: 10 On Tue, Jul 31, 2001 at 10:42:13PM +0400, kuznet@ms2.inr.ac.ru wrote: Do they need, or they do not need, this does not matter at all. v* protocol stuff is binary. for the uneducated such as myself, WTF is "v* protocol stuff" ?
--cw From owner-netdev@oss.sgi.com Tue Jul 31 11:47:00 2001 Received: (from majordomo@localhost) by oss.sgi.com (8.11.2/8.11.3) id f6VIl0g06530 for netdev-outgoing; Tue, 31 Jul 2001 11:47:00 -0700 Received: from ms2.inr.ac.ru (minus.inr.ac.ru [193.233.7.97]) by oss.sgi.com (8.11.2/8.11.3) with SMTP id f6VIktV06527; Tue, 31 Jul 2001 11:46:56 -0700 Received: (from kuznet@localhost) by ms2.inr.ac.ru (8.6.13/ANK) id WAA09964; Tue, 31 Jul 2001 22:46:42 +0400 From: kuznet@ms2.inr.ac.ru Message-Id: <200107311846.WAA09964@ms2.inr.ac.ru> Subject: Re: ping bug To: cw@f00f.org (Chris Wedgwood) Date: Tue, 31 Jul 2001 22:46:42 +0400 (MSK DST) Cc: ralf@oss.sgi.com, netdev@oss.sgi.com In-Reply-To: <20010730223341.B5289@weta.f00f.org> from "Chris Wedgwood" at Jul 30, 1 10:33:41 pm X-Mailer: ELM [version 2.4 PL24] MIME-Version: 1.0 Sender: owner-netdev@oss.sgi.com Precedence: bulk Content-Length: 224 Lines: 13 Hello! > Run away, run away!!!! I would be glad, but one such beast dwells here. :-) > How else can you do this? Well, I assume that man remembering about unused pipes can have some ideas, better than my hacks. Alexey From owner-netdev@oss.sgi.com Tue Jul 31 11:48:17 2001 Received: (from majordomo@localhost) by oss.sgi.com (8.11.2/8.11.3) id f6VImHF06631 for netdev-outgoing; Tue, 31 Jul 2001 11:48:17 -0700 Received: from netcore.fi (netcore.fi [193.94.160.1]) by oss.sgi.com (8.11.2/8.11.3) with SMTP id f6VImFV06626 for ; Tue, 31 Jul 2001 11:48:15 -0700 Received: from localhost (pekkas@localhost) by netcore.fi (8.11.1/8.11.1) with ESMTP id f6VIlv620472; Tue, 31 Jul 2001 21:47:57 +0300 Date: Tue, 31 Jul 2001 21:47:57 +0300 (EEST) From: Pekka Savola To: cc: , , , Subject: Re: missing icmp errors for udp packets In-Reply-To: <200107311833.WAA09598@ms2.inr.ac.ru> Message-ID: MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII Sender: owner-netdev@oss.sgi.com Precedence: bulk Content-Length: 624 Lines: 20 On Tue, 31 Jul 2001 kuznet@ms2.inr.ac.ru wrote: > Hello! 
> > > If you reboot the computer, the _first_ ping/scan attempt will not return > > icmp dest unreachable. > > Hmm... how fast after reboot? Can be quite a long time. Previously, I tested it immediately after reboot. Now I tried it about 6 minutes after I had typed 'reboot' with success. I believe it may be the first ICMP message to be sent after reboot.. -- Pekka Savola "Tell me of difficulties surmounted, Netcore Oy not those you stumble over and fall" Systems. Networks. Security. -- Robert Jordan: A Crown of Swords From owner-netdev@oss.sgi.com Tue Jul 31 11:51:03 2001 Received: (from majordomo@localhost) by oss.sgi.com (8.11.2/8.11.3) id f6VIp3106798 for netdev-outgoing; Tue, 31 Jul 2001 11:51:03 -0700 Received: from ghanima.endorphin.org ([62.116.8.197]) by oss.sgi.com (8.11.2/8.11.3) with SMTP id f6VIp2V06789 for ; Tue, 31 Jul 2001 11:51:02 -0700 Received: (qmail 8467 invoked by uid 1000); 31 Jul 2001 18:51:01 -0000 Date: Tue, 31 Jul 2001 20:51:01 +0200 From: clemens To: kuznet@ms2.inr.ac.ru Cc: Pekka Savola , therapy@endorphin.org, netdev@oss.sgi.com, linux-kernel@vger.redhat.com, davem@redhat.com Subject: Re: missing icmp errors for udp packets Message-ID: <20010731205101.B8211@ghanima.endorphin.org> References: <200107311833.WAA09598@ms2.inr.ac.ru> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <200107311833.WAA09598@ms2.inr.ac.ru> User-Agent: Mutt/1.3.18i Sender: owner-netdev@oss.sgi.com Precedence: bulk Content-Length: 549 Lines: 16 On Tue, Jul 31, 2001 at 10:33:56PM +0400, kuznet@ms2.inr.ac.ru wrote: > Hello! > > > If you reboot the computer, the _first_ ping/scan attempt will not return > > icmp dest unreachable. > Hmm... how fast after reboot? 
your patch will not prevent the first ping to empty the token bucket, because burst is still 0, which is larger than dst->rate_token, and since XRLIM_BURST_FACTOR times the timeout (which is 6*0=0 in that case) is the token maximum, it will be truncated to 0, causing the following packets (if in time) to be dropped. clemens From owner-netdev@oss.sgi.com Tue Jul 31 11:57:51 2001 Received: (from majordomo@localhost) by oss.sgi.com (8.11.2/8.11.3) id f6VIvp107086 for netdev-outgoing; Tue, 31 Jul 2001 11:57:51 -0700 Received: from ms2.inr.ac.ru (minus.inr.ac.ru [193.233.7.97]) by oss.sgi.com (8.11.2/8.11.3) with SMTP id f6VIvmV07083 for ; Tue, 31 Jul 2001 11:57:48 -0700 Received: (from kuznet@localhost) by ms2.inr.ac.ru (8.6.13/ANK) id WAA10162; Tue, 31 Jul 2001 22:57:34 +0400 From: kuznet@ms2.inr.ac.ru Message-Id: <200107311857.WAA10162@ms2.inr.ac.ru> Subject: Re: missing icmp errors for udp packets To: therapy@endorphin.org (clemens) Date: Tue, 31 Jul 2001 22:57:34 +0400 (MSK DST) Cc: netdev@oss.sgi.com In-Reply-To: <20010731204409.A8211@ghanima.endorphin.org> from "clemens" at Jul 31, 1 08:44:09 pm X-Mailer: ELM [version 2.4 PL24] MIME-Version: 1.0 Sender: owner-netdev@oss.sgi.com Precedence: bulk Content-Length: 879 Lines: 34 Hello! > in some way this "pretending to be a feature" issue should be cleaned up. Did you receive the patch? It is supposed to fix this. Cost of the patch is that high cost messages open way to send bursts of low cost ones. Probably, it is even reasonable. (plus it has buglet, misbehaving when relax rate limit is relaxed. Not a big deal too) > consequently since there is only one token bucket, there can only be one > icmp rate limit. we can add a icmp type mask to enable/disable rate limiting > for certain types. Yes. Logically this is 100% right. Also, see below. > whole thing overbloaed. Yes. > using lazy instantation would be the third option i could think Yes, only this is surely overbloat. 
:-) Actually, I would prefer to limit only icmp errors (not echo) and all in one pool. Leaving all the rest as an option, which can be made with CBQ. Alexey From owner-netdev@oss.sgi.com Tue Jul 31 12:03:44 2001 Received: (from majordomo@localhost) by oss.sgi.com (8.11.2/8.11.3) id f6VJ3iV07264 for netdev-outgoing; Tue, 31 Jul 2001 12:03:44 -0700 Received: from netcore.fi (netcore.fi [193.94.160.1]) by oss.sgi.com (8.11.2/8.11.3) with SMTP id f6VJ3gV07261 for ; Tue, 31 Jul 2001 12:03:42 -0700 Received: from localhost (pekkas@localhost) by netcore.fi (8.11.1/8.11.1) with ESMTP id f6VJ3Rk20606; Tue, 31 Jul 2001 22:03:27 +0300 Date: Tue, 31 Jul 2001 22:03:27 +0300 (EEST) From: Pekka Savola To: cc: clemens , Subject: Re: missing icmp errors for udp packets In-Reply-To: <200107311857.WAA10162@ms2.inr.ac.ru> Message-ID: MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII Sender: owner-netdev@oss.sgi.com Precedence: bulk Content-Length: 609 Lines: 14 On Tue, 31 Jul 2001 kuznet@ms2.inr.ac.ru wrote: > Actually, I would prefer to limit only icmp errors (not echo) > and all in one pool. Leaving all the rest as an option, which can > be made with CBQ. Having two pools, one for error and the other for informational message types (umm.. I guess there isn't a clear, easily obtainable distinction in IPv4..) would probably be what most people might expect. -- Pekka Savola "Tell me of difficulties surmounted, Netcore Oy not those you stumble over and fall" Systems. Networks. Security. 
-- Robert Jordan: A Crown of Swords From owner-netdev@oss.sgi.com Tue Jul 31 12:04:41 2001 Received: (from majordomo@localhost) by oss.sgi.com (8.11.2/8.11.3) id f6VJ4fi07368 for netdev-outgoing; Tue, 31 Jul 2001 12:04:41 -0700 Received: from ms2.inr.ac.ru (minus.inr.ac.ru [193.233.7.97]) by oss.sgi.com (8.11.2/8.11.3) with SMTP id f6VJ4cV07365 for ; Tue, 31 Jul 2001 12:04:38 -0700 Received: (from kuznet@localhost) by ms2.inr.ac.ru (8.6.13/ANK) id XAA10312; Tue, 31 Jul 2001 23:04:06 +0400 From: kuznet@ms2.inr.ac.ru Message-Id: <200107311904.XAA10312@ms2.inr.ac.ru> Subject: Re: missing icmp errors for udp packets To: therapy@endorphin.org (clemens) Date: Tue, 31 Jul 2001 23:04:06 +0400 (MSK DST) Cc: pekkas@netcore.fi, therapy@endorphin.org, netdev@oss.sgi.com, linux-kernel@vger.redhat.com, davem@redhat.com In-Reply-To: <20010731205101.B8211@ghanima.endorphin.org> from "clemens" at Jul 31, 1 08:51:01 pm X-Mailer: ELM [version 2.4 PL24] MIME-Version: 1.0 Sender: owner-netdev@oss.sgi.com Precedence: bulk Content-Length: 621 Lines: 18 Hello! > your patch will not prevent the first ping to empty the token bucket, > because burst is still 0, which is larger than dst->rate_token, and since > XRLIM_BURST_FACTOR times the timeout (which is 6*0=0 in that case) is the > token maximum, it will be truncated to 0, > causing the following packets (if in time) to be dropped. Argh... I see, gap is too short and not enough of tokens are accumulated. Thank you. Damn, I see two ways: 1. to make sysctl active function and recalculate max/sum of rates over classes and fill bucket. Or to remove limiting distinguishing types, which is ideal logically. 
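An illustrative sketch of the mechanism discussed above (not code from the thread or from the kernel): a user-space model of one token bucket shared by message types with different costs. It assumes, as clemens describes, that credit accumulates with elapsed time, that the token maximum is the burst factor (6) times the timeout of the message being checked, and that a message is allowed when the credit covers its timeout. The names `struct bucket`, `bucket_allow` and `BURST_FACTOR` are hypothetical.

```c
/* Hypothetical user-space model of a shared ICMP rate-limit bucket.
 * Names and layout are illustrative only; they are not the kernel's. */

#define BURST_FACTOR 6UL  /* thread: token maximum is 6 * timeout */

struct bucket {
    unsigned long tokens; /* accumulated credit, in ticks */
    unsigned long last;   /* tick at which the bucket was last checked */
};

/* Return 1 if a message costing `timeout` ticks may be sent at tick `now`,
 * 0 if it must be dropped. */
int bucket_allow(struct bucket *b, unsigned long now, unsigned long timeout)
{
    /* Credit grows with the time elapsed since the previous check. */
    b->tokens += now - b->last;
    b->last = now;

    /* The cap depends on the cost of the *current* message, so a
     * zero-cost message truncates the shared credit to zero. */
    if (b->tokens > BURST_FACTOR * timeout)
        b->tokens = BURST_FACTOR * timeout;

    if (b->tokens >= timeout) {
        b->tokens -= timeout;
        return 1;
    }
    return 0;
}
```

In this model, with errors costing 100 ticks and echo replies costing 0, a single echo reply is always allowed but wipes out the credit, so an error checked one tick later is refused and errors stay blocked until 100 ticks of credit have built up again.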
Alexey From owner-netdev@oss.sgi.com Tue Jul 31 12:06:28 2001 Received: (from majordomo@localhost) by oss.sgi.com (8.11.2/8.11.3) id f6VJ6SW07481 for netdev-outgoing; Tue, 31 Jul 2001 12:06:28 -0700 Received: from ms2.inr.ac.ru (minus.inr.ac.ru [193.233.7.97]) by oss.sgi.com (8.11.2/8.11.3) with SMTP id f6VJ6QV07478 for ; Tue, 31 Jul 2001 12:06:26 -0700 Received: (from kuznet@localhost) by ms2.inr.ac.ru (8.6.13/ANK) id XAA10448; Tue, 31 Jul 2001 23:06:01 +0400 From: kuznet@ms2.inr.ac.ru Message-Id: <200107311906.XAA10448@ms2.inr.ac.ru> Subject: Re: missing icmp errors for udp packets To: pekkas@netcore.fi (Pekka Savola) Date: Tue, 31 Jul 2001 23:06:01 +0400 (MSK DST) Cc: therapy@endorphin.org, netdev@oss.sgi.com In-Reply-To: from "Pekka Savola" at Jul 31, 1 10:03:27 pm X-Mailer: ELM [version 2.4 PL24] MIME-Version: 1.0 Sender: owner-netdev@oss.sgi.com Precedence: bulk Content-Length: 268 Lines: 10 Hello! > Having two pools, one for error and the other for informational message I am "people" too. And I do not expect that echos are limited at all. :-) And assume that default configuration is no limit. Hence, adding it to a common structure is mistake. 
Alexey From owner-netdev@oss.sgi.com Tue Jul 31 12:12:37 2001 Received: (from majordomo@localhost) by oss.sgi.com (8.11.2/8.11.3) id f6VJCbU07653 for netdev-outgoing; Tue, 31 Jul 2001 12:12:37 -0700 Received: from netcore.fi (netcore.fi [193.94.160.1]) by oss.sgi.com (8.11.2/8.11.3) with SMTP id f6VJCYV07649 for ; Tue, 31 Jul 2001 12:12:35 -0700 Received: from localhost (pekkas@localhost) by netcore.fi (8.11.1/8.11.1) with ESMTP id f6VJCCt20677; Tue, 31 Jul 2001 22:12:12 +0300 Date: Tue, 31 Jul 2001 22:12:12 +0300 (EEST) From: Pekka Savola To: cc: , Subject: Re: missing icmp errors for udp packets In-Reply-To: <200107311906.XAA10448@ms2.inr.ac.ru> Message-ID: MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII Sender: owner-netdev@oss.sgi.com Precedence: bulk Content-Length: 946 Lines: 23 On Tue, 31 Jul 2001 kuznet@ms2.inr.ac.ru wrote: > > Having two pools, one for error and the other for informational message > > I am "people" too. And I do not expect that echos are limited at all. :-) Me neither. It would make 'ping -f' testing of your ISP's connections rather inconvenient ;-) ... > And assume that default configuration is no limit. Hence, adding > it to a common structure is mistake. What I meant (to say) is, for people who _want_ to limit pings too, they want to it to be separate from other ICMP(/other protocol) limiting capabilities. Some might be experiencing excessive pinging, but want to be able to respond with ICMP errors when necessary, for example (some of this is already there, but isn't separate). -- Pekka Savola "Tell me of difficulties surmounted, Netcore Oy not those you stumble over and fall" Systems. Networks. Security. 
-- Robert Jordan: A Crown of Swords From owner-netdev@oss.sgi.com Tue Jul 31 12:14:07 2001 Received: (from majordomo@localhost) by oss.sgi.com (8.11.2/8.11.3) id f6VJE7G07771 for netdev-outgoing; Tue, 31 Jul 2001 12:14:07 -0700 Received: from ms2.inr.ac.ru (minus.inr.ac.ru [193.233.7.97]) by oss.sgi.com (8.11.2/8.11.3) with SMTP id f6VJE3V07768; Tue, 31 Jul 2001 12:14:03 -0700 Received: (from kuznet@localhost) by ms2.inr.ac.ru (8.6.13/ANK) id XAA10783; Tue, 31 Jul 2001 23:13:47 +0400 From: kuznet@ms2.inr.ac.ru Message-Id: <200107311913.XAA10783@ms2.inr.ac.ru> Subject: Re: ping bug To: cw@f00f.org (Chris Wedgwood) Date: Tue, 31 Jul 2001 23:13:47 +0400 (MSK DST) Cc: matti.aarnio@zmailer.org, ralf@oss.sgi.com, netdev@oss.sgi.com In-Reply-To: <20010801064556.A8101@weta.f00f.org> from "Chris Wedgwood" at Aug 1, 1 06:45:56 am X-Mailer: ELM [version 2.4 PL24] MIME-Version: 1.0 Sender: owner-netdev@oss.sgi.com Precedence: bulk Content-Length: 145 Lines: 9 Hello! > v* protocol stuff is binary. > > for the uneducated such as myself, WTF is "v* protocol stuff" ? Well, V.22, V.32 et al.
Alexey From owner-netdev@oss.sgi.com Tue Jul 31 12:18:56 2001 Received: (from majordomo@localhost) by oss.sgi.com (8.11.2/8.11.3) id f6VJIuA07988 for netdev-outgoing; Tue, 31 Jul 2001 12:18:56 -0700 Received: from ms2.inr.ac.ru (minus.inr.ac.ru [193.233.7.97]) by oss.sgi.com (8.11.2/8.11.3) with SMTP id f6VJItV07985 for ; Tue, 31 Jul 2001 12:18:55 -0700 Received: (from kuznet@localhost) by ms2.inr.ac.ru (8.6.13/ANK) id XAA10862; Tue, 31 Jul 2001 23:17:55 +0400 From: kuznet@ms2.inr.ac.ru Message-Id: <200107311917.XAA10862@ms2.inr.ac.ru> Subject: Re: missing icmp errors for udp packets To: pekkas@netcore.fi (Pekka Savola) Date: Tue, 31 Jul 2001 23:17:55 +0400 (MSK DST) Cc: therapy@endorphin.org, netdev@oss.sgi.com, davem@redhat.com (Dave Miller), ak@muc.de (Andi Kleen) In-Reply-To: from "Pekka Savola" at Jul 31, 1 10:12:12 pm X-Mailer: ELM [version 2.4 PL24] MIME-Version: 1.0 Sender: owner-netdev@oss.sgi.com Precedence: bulk Content-Length: 281 Lines: 12 Hello! > What I meant (to say) is, for people who _want_ to limit pings too, CBQ can do this in any way, which is possible to imagine. In any case, I need to get some verdict from Andi and Dave to move in either way. [ For Dave and Andi: should I resume the problem? 
] Alexey From owner-netdev@oss.sgi.com Tue Jul 31 12:20:19 2001 Received: (from majordomo@localhost) by oss.sgi.com (8.11.2/8.11.3) id f6VJKJ008131 for netdev-outgoing; Tue, 31 Jul 2001 12:20:19 -0700 Received: from weta.f00f.org (weta.f00f.org [203.167.249.89]) by oss.sgi.com (8.11.2/8.11.3) with SMTP id f6VJKGV08127; Tue, 31 Jul 2001 12:20:16 -0700 Received: by weta.f00f.org (Postfix, from userid 1000) id 35DF815871; Wed, 1 Aug 2001 07:20:45 +1200 (NZST) Date: Wed, 1 Aug 2001 07:20:45 +1200 From: Chris Wedgwood To: kuznet@ms2.inr.ac.ru Cc: matti.aarnio@zmailer.org, ralf@oss.sgi.com, netdev@oss.sgi.com Subject: Re: ping bug Message-ID: <20010801072045.A8228@weta.f00f.org> References: <200107311913.XAA10783@ms2.inr.ac.ru> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <200107311913.XAA10783@ms2.inr.ac.ru> User-Agent: Mutt/1.3.18i X-No-Archive: Yes Sender: owner-netdev@oss.sgi.com Precedence: bulk Content-Length: 674 Lines: 20 On Tue, Jul 31, 2001 at 11:13:47PM +0400, kuznet@ms2.inr.ac.ru wrote: Well, V.22, V.32 et al. Oh, I see what you mean... yeah, I think the vendors are plagued by patents and consortium rules which would mean an open source implementation of such things would be hard. That said, I don't see why the v* stuff can't be a simple library for use in either the kernel or userland, the latter being a target to move towards (i.e. make the kernel driver as dumb as possible and use a userland program to frob the dsp). What modem and source are you working from anyhow? --cw (who wonders why they don't have DSL in Russia and hence all this would be moot!)
From owner-netdev@oss.sgi.com Tue Jul 31 12:23:14 2001 Received: (from majordomo@localhost) by oss.sgi.com (8.11.2/8.11.3) id f6VJNET08286 for netdev-outgoing; Tue, 31 Jul 2001 12:23:14 -0700 Received: from weta.f00f.org (weta.f00f.org [203.167.249.89]) by oss.sgi.com (8.11.2/8.11.3) with SMTP id f6VJNDV08283 for ; Tue, 31 Jul 2001 12:23:13 -0700 Received: by weta.f00f.org (Postfix, from userid 1000) id 3B24815876; Wed, 1 Aug 2001 07:23:47 +1200 (NZST) Date: Wed, 1 Aug 2001 07:23:47 +1200 From: Chris Wedgwood To: kuznet@ms2.inr.ac.ru Cc: clemens , pekkas@netcore.fi, netdev@oss.sgi.com, linux-kernel@vger.kernel.org, davem@redhat.com Subject: Re: missing icmp errors for udp packets Message-ID: <20010801072347.C8228@weta.f00f.org> References: <200107311904.XAA10312@ms2.inr.ac.ru> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <200107311904.XAA10312@ms2.inr.ac.ru> User-Agent: Mutt/1.3.18i X-No-Archive: Yes Sender: owner-netdev@oss.sgi.com Precedence: bulk Content-Length: 186 Lines: 10 On Tue, Jul 31, 2001 at 11:04:06PM +0400, kuznet@ms2.inr.ac.ru wrote: Or to remove limiting distinguishing types, which is ideal logically. Why do we do this anyhow? 
--cw From owner-netdev@oss.sgi.com Tue Jul 31 12:26:21 2001 Received: (from majordomo@localhost) by oss.sgi.com (8.11.2/8.11.3) id f6VJQL808403 for netdev-outgoing; Tue, 31 Jul 2001 12:26:21 -0700 Received: from ms2.inr.ac.ru (minus.inr.ac.ru [193.233.7.97]) by oss.sgi.com (8.11.2/8.11.3) with SMTP id f6VJQKV08400 for ; Tue, 31 Jul 2001 12:26:20 -0700 Received: (from kuznet@localhost) by ms2.inr.ac.ru (8.6.13/ANK) id XAA11038; Tue, 31 Jul 2001 23:25:50 +0400 From: kuznet@ms2.inr.ac.ru Message-Id: <200107311925.XAA11038@ms2.inr.ac.ru> Subject: Re: missing icmp errors for udp packets To: cw@f00f.org (Chris Wedgwood) Date: Tue, 31 Jul 2001 23:25:50 +0400 (MSK DST) Cc: therapy@endorphin.org, pekkas@netcore.fi, netdev@oss.sgi.com, linux-kernel@vger.kernel.org, davem@redhat.com In-Reply-To: <20010801072347.C8228@weta.f00f.org> from "Chris Wedgwood" at Aug 1, 1 07:23:47 am X-Mailer: ELM [version 2.4 PL24] MIME-Version: 1.0 Sender: owner-netdev@oss.sgi.com Precedence: bulk Content-Length: 177 Lines: 9 Hello! > Why do we do this anyhow? I have no idea. This is too old facility to be remembered. Anyway, it is clear that echos are to be limited differently of errors. 
Alexey From owner-netdev@oss.sgi.com Tue Jul 31 12:33:45 2001 Received: (from majordomo@localhost) by oss.sgi.com (8.11.2/8.11.3) id f6VJXj808719 for netdev-outgoing; Tue, 31 Jul 2001 12:33:45 -0700 Received: from ms2.inr.ac.ru (minus.inr.ac.ru [193.233.7.97]) by oss.sgi.com (8.11.2/8.11.3) with SMTP id f6VJXfV08716; Tue, 31 Jul 2001 12:33:41 -0700 Received: (from kuznet@localhost) by ms2.inr.ac.ru (8.6.13/ANK) id XAA11175; Tue, 31 Jul 2001 23:33:20 +0400 From: kuznet@ms2.inr.ac.ru Message-Id: <200107311933.XAA11175@ms2.inr.ac.ru> Subject: Re: ping bug To: cw@f00f.org (Chris Wedgwood) Date: Tue, 31 Jul 2001 23:33:20 +0400 (MSK DST) Cc: matti.aarnio@zmailer.org, ralf@oss.sgi.com, netdev@oss.sgi.com In-Reply-To: <20010801072045.A8228@weta.f00f.org> from "Chris Wedgwood" at Aug 1, 1 07:20:45 am X-Mailer: ELM [version 2.4 PL24] MIME-Version: 1.0 Sender: owner-netdev@oss.sgi.com Precedence: bulk Content-Length: 398 Lines: 16 Hello! > That said, I don't see wht the v* stuff can't be a simple library Do you think they do not know this? :-) Before all, it is wonderful that some drivers for linux exist at all. :-) > What modem and source Notebook, some builtin pctel card. NB DSL, Russia etc. have nothing to do with this. I have one, which must work, otherwise I should blame on myself that bought this. 
:-) Alexey From owner-netdev@oss.sgi.com Tue Jul 31 12:34:19 2001 Received: (from majordomo@localhost) by oss.sgi.com (8.11.2/8.11.3) id f6VJYJ308816 for netdev-outgoing; Tue, 31 Jul 2001 12:34:19 -0700 Received: from weta.f00f.org (weta.f00f.org [203.167.249.89]) by oss.sgi.com (8.11.2/8.11.3) with SMTP id f6VJYIV08813 for ; Tue, 31 Jul 2001 12:34:18 -0700 Received: by weta.f00f.org (Postfix, from userid 1000) id AEAE215882; Wed, 1 Aug 2001 07:34:41 +1200 (NZST) Date: Wed, 1 Aug 2001 07:34:41 +1200 From: Chris Wedgwood To: kuznet@ms2.inr.ac.ru Cc: therapy@endorphin.org, pekkas@netcore.fi, netdev@oss.sgi.com, linux-kernel@vger.kernel.org, davem@redhat.com Subject: Re: missing icmp errors for udp packets Message-ID: <20010801073441.E8228@weta.f00f.org> References: <200107311925.XAA11038@ms2.inr.ac.ru> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <200107311925.XAA11038@ms2.inr.ac.ru> User-Agent: Mutt/1.3.18i X-No-Archive: Yes Sender: owner-netdev@oss.sgi.com Precedence: bulk Content-Length: 414 Lines: 14 On Tue, Jul 31, 2001 at 11:25:50PM +0400, kuznet@ms2.inr.ac.ru wrote: Anyway, it is clear that echos are to be limited differently of errors. Even then I wonder if it is worth the code. If you are rate-limiting, who cares if drop the odd echo/reply? 
ICMP echo/reply is a useful diagnostic tool --- but on the internet as we have it today, its limitations need to be understood by the user :) --cw From owner-netdev@oss.sgi.com Tue Jul 31 12:37:39 2001 Received: (from majordomo@localhost) by oss.sgi.com (8.11.2/8.11.3) id f6VJbdY08932 for netdev-outgoing; Tue, 31 Jul 2001 12:37:39 -0700 Received: from weta.f00f.org (weta.f00f.org [203.167.249.89]) by oss.sgi.com (8.11.2/8.11.3) with SMTP id f6VJbaV08929; Tue, 31 Jul 2001 12:37:36 -0700 Received: by weta.f00f.org (Postfix, from userid 1000) id D92DB15885; Wed, 1 Aug 2001 07:38:09 +1200 (NZST) Date: Wed, 1 Aug 2001 07:38:09 +1200 From: Chris Wedgwood To: kuznet@ms2.inr.ac.ru Cc: matti.aarnio@zmailer.org, ralf@oss.sgi.com, netdev@oss.sgi.com Subject: Re: ping bug Message-ID: <20010801073809.F8228@weta.f00f.org> References: <200107311933.XAA11175@ms2.inr.ac.ru> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <200107311933.XAA11175@ms2.inr.ac.ru> User-Agent: Mutt/1.3.18i X-No-Archive: Yes Sender: owner-netdev@oss.sgi.com Precedence: bulk Content-Length: 927 Lines: 31 On Tue, Jul 31, 2001 at 11:33:20PM +0400, kuznet@ms2.inr.ac.ru wrote: > That said, I don't see wht the v* stuff can't be a simple library Do you think they do not know this? :-) I've not seen the code, so I know very little. The only one I worked with was the Lucent chips, which required hackery to serial.c and other evil things. Before all, it is wonderful that some drivers for linux exist at all. :-) Indeed :) Linux has support for IBM (mwave), Lucent and PC-Tel in various forms, which is rather nice. But none of these AFAICT need to bloat the kernel much at all. NB DSL, Russia etc. have nothing to do with this. I have one, which must work, otherwise I should blame on myself that bought this.
:-) Ah, this is why I got my Lucent modem going --- but I found it wasn't as reliable as I had wished so I used a PCCard anyhow :( I hope yours is more reliable than mine! --cw From owner-netdev@oss.sgi.com Tue Jul 31 12:37:47 2001 Received: (from majordomo@localhost) by oss.sgi.com (8.11.2/8.11.3) id f6VJblQ08972 for netdev-outgoing; Tue, 31 Jul 2001 12:37:47 -0700 Received: from ms2.inr.ac.ru (minus.inr.ac.ru [193.233.7.97]) by oss.sgi.com (8.11.2/8.11.3) with SMTP id f6VJbjV08969 for ; Tue, 31 Jul 2001 12:37:45 -0700 Received: (from kuznet@localhost) by ms2.inr.ac.ru (8.6.13/ANK) id XAA11313; Tue, 31 Jul 2001 23:37:07 +0400 From: kuznet@ms2.inr.ac.ru Message-Id: <200107311937.XAA11313@ms2.inr.ac.ru> Subject: Re: missing icmp errors for udp packets To: cw@f00f.org (Chris Wedgwood) Date: Tue, 31 Jul 2001 23:37:06 +0400 (MSK DST) Cc: therapy@endorphin.org, pekkas@netcore.fi, netdev@oss.sgi.com, linux-kernel@vger.kernel.org, davem@redhat.com In-Reply-To: <20010801073441.E8228@weta.f00f.org> from "Chris Wedgwood" at Aug 1, 1 07:34:41 am X-Mailer: ELM [version 2.4 PL24] MIME-Version: 1.0 Sender: owner-netdev@oss.sgi.com Precedence: bulk Content-Length: 324 Lines: 11 Hello! > ICMP echo/reply is a useful diagnostic tool --- but on the internet as > we have it today, its limitations need to be understood by the user :) But what do you propose eventually? :-) To bind all of them together? Then kernel must be shipped out without rate-limiting enabled by default, that's problem.
Alexey From owner-netdev@oss.sgi.com Tue Jul 31 12:41:00 2001 Received: (from majordomo@localhost) by oss.sgi.com (8.11.2/8.11.3) id f6VJf0709152 for netdev-outgoing; Tue, 31 Jul 2001 12:41:00 -0700 Received: from weta.f00f.org (weta.f00f.org [203.167.249.89]) by oss.sgi.com (8.11.2/8.11.3) with SMTP id f6VJexV09149 for ; Tue, 31 Jul 2001 12:40:59 -0700 Received: by weta.f00f.org (Postfix, from userid 1000) id 7CB1E1588C; Wed, 1 Aug 2001 07:41:32 +1200 (NZST) Date: Wed, 1 Aug 2001 07:41:32 +1200 From: Chris Wedgwood To: kuznet@ms2.inr.ac.ru Cc: therapy@endorphin.org, pekkas@netcore.fi, netdev@oss.sgi.com, linux-kernel@vger.kernel.org, davem@redhat.com Subject: Re: missing icmp errors for udp packets Message-ID: <20010801074132.G8228@weta.f00f.org> References: <200107311937.XAA11313@ms2.inr.ac.ru> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <200107311937.XAA11313@ms2.inr.ac.ru> User-Agent: Mutt/1.3.18i X-No-Archive: Yes Sender: owner-netdev@oss.sgi.com Precedence: bulk Content-Length: 662 Lines: 28 On Tue, Jul 31, 2001 at 11:37:06PM +0400, kuznet@ms2.inr.ac.ru wrote: To bind all of them together? Sure... why not? The kernel normally does one of two things --- multiplex hardware resources for applications or --- cheap router thing "really good ping responder" is a pointless purpose. Then kernel must be shipped out without rate-limiting enabled by default, that's problem. I guess I missed something. That doesn't seem like a problem to me... and if you need to ship with a rate by default, then ship with a very-high rate. I've never managed to respond to more than 60,000 ICMP packets/second, so I suggest 60,001. 
--cw From owner-netdev@oss.sgi.com Tue Jul 31 12:45:15 2001 Received: (from majordomo@localhost) by oss.sgi.com (8.11.2/8.11.3) id f6VJjFR09360 for netdev-outgoing; Tue, 31 Jul 2001 12:45:15 -0700 Received: from weta.f00f.org (weta.f00f.org [203.167.249.89]) by oss.sgi.com (8.11.2/8.11.3) with SMTP id f6VJjEV09357 for ; Tue, 31 Jul 2001 12:45:14 -0700 Received: by weta.f00f.org (Postfix, from userid 1000) id 9335A15890; Wed, 1 Aug 2001 07:45:47 +1200 (NZST) Date: Wed, 1 Aug 2001 07:45:47 +1200 From: Chris Wedgwood To: Pekka Savola Cc: kuznet@ms2.inr.ac.ru, therapy@endorphin.org, netdev@oss.sgi.com Subject: Re: missing icmp errors for udp packets Message-ID: <20010801074547.H8228@weta.f00f.org> References: Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: User-Agent: Mutt/1.3.18i X-No-Archive: Yes Sender: owner-netdev@oss.sgi.com Precedence: bulk Content-Length: 739 Lines: 22 On Tue, Jul 31, 2001 at 10:12:12PM +0300, Pekka Savola wrote: Me neither. It would make 'ping -f' testing of your ISP's connections rather inconvenient ;-) ... As someone who until recently was involved in architecture and planning for a large ISP/carrier whose network spanned 3 continents (I just like saying that, it sounds better than it really is!) I can tell you plenty of people use similar tests. They are bogus. As is traceroute. ping & traceroute are very useful, but their results can often be misleading. For example, Cisco routers, of which sadly there are a few still in use, do not respond to ICMP packets terribly reliably when they are busy, which is pretty reasonable (they route packets instead).
--cw From owner-netdev@oss.sgi.com Tue Jul 31 12:49:15 2001 Received: (from majordomo@localhost) by oss.sgi.com (8.11.2/8.11.3) id f6VJnFQ09502 for netdev-outgoing; Tue, 31 Jul 2001 12:49:15 -0700 Received: from netcore.fi (netcore.fi [193.94.160.1]) by oss.sgi.com (8.11.2/8.11.3) with SMTP id f6VJnBV09499 for ; Tue, 31 Jul 2001 12:49:12 -0700 Received: from localhost (pekkas@localhost) by netcore.fi (8.11.1/8.11.1) with ESMTP id f6VJmbN20970; Tue, 31 Jul 2001 22:48:38 +0300 Date: Tue, 31 Jul 2001 22:48:37 +0300 (EEST) From: Pekka Savola To: Chris Wedgwood cc: , , Subject: Re: missing icmp errors for udp packets In-Reply-To: <20010801074547.H8228@weta.f00f.org> Message-ID: MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII Sender: owner-netdev@oss.sgi.com Precedence: bulk Content-Length: 1381 Lines: 33 On Wed, 1 Aug 2001, Chris Wedgwood wrote: > On Tue, Jul 31, 2001 at 10:12:12PM +0300, Pekka Savola wrote: > > Me neither. It would make 'ping -f' testing of your ISP's > connections rather inconvenient ;-) ... > > As someone who until recently was involved in architecture and > planning for a large ISP/carrier who's network spanned 3 continents (I > just like saying that, it sounds better than it really is!) I can > tell you plenty of people use similar tests. > > They are bogus. As is traceroute. > > ping & traceroute are very useful, but there results can often be > misleading. > > For example, cisco routers, of which sadly there are a few still in > use, do no respond to ICMP packets terribly reliably when they are > busy, which is pretty reasonable (the route packets instead). Who said I was pinging Cisco routers? If I have two servers 100 ms off each other, I make them 'ping -f' each other. This does test the infrastructure and forwarding capabilities a bit. Traceroute isn't optimal as you noted, as the routers have to pull the packet with expiring TTL off the "fast path", and this is often subject to the rate-limiting considerations also. 
-- Pekka Savola "Tell me of difficulties surmounted, Netcore Oy not those you stumble over and fall" Systems. Networks. Security. -- Robert Jordan: A Crown of Swords From owner-netdev@oss.sgi.com Tue Jul 31 13:00:06 2001 Received: (from majordomo@localhost) by oss.sgi.com (8.11.2/8.11.3) id f6VK06p09831 for netdev-outgoing; Tue, 31 Jul 2001 13:00:06 -0700 Received: from netcore.fi (netcore.fi [193.94.160.1]) by oss.sgi.com (8.11.2/8.11.3) with SMTP id f6VK03V09827 for ; Tue, 31 Jul 2001 13:00:03 -0700 Received: from localhost (pekkas@localhost) by netcore.fi (8.11.1/8.11.1) with ESMTP id f6VJxdk21032; Tue, 31 Jul 2001 22:59:39 +0300 Date: Tue, 31 Jul 2001 22:59:39 +0300 (EEST) From: Pekka Savola To: Chris Wedgwood cc: , , , , Subject: Re: missing icmp errors for udp packets In-Reply-To: <20010801074132.G8228@weta.f00f.org> Message-ID: MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII Sender: owner-netdev@oss.sgi.com Precedence: bulk Content-Length: 1581 Lines: 39 On Wed, 1 Aug 2001, Chris Wedgwood wrote: > --- cheap router thing > > "really good ping responder" is a pointless purpose. bad ping responder == bad PR ;-) And anyway, who is anyone to judge what the system should be used for? I want a system to respond to ping without limitations; it's good for debugging, diagnostics, etc. If I want, I can just filter the requests out, or rate-limit the responses. However, ICMP error messages cannot be effectively filtered; they may happen due to TTL=0 when forwarding, legit or illegit UDP connection etc.; only way to effectively limit them is by rate-limiting. If rate-limiting with informational and error types are the same, we have an inflexible situation here. > Then kernel must be shipped out without rate-limiting enabled by > default, that's problem. > > I guess I missed something. That doesn't seem like a problem to > me... and if you need to ship with a rate by default, then ship with a > very-high rate. 
I've never managed to respond to more than 60,000 > ICMP packets/second, so I suggest 60,001. Yes you did. 60,000 responses/sec is effectively no protection at all, and most people would appreciate protection for the error messages, which are crucial to the working of TCP/IP; not so with informational ICMP messages. And by the way, rate-limiting ICMP error messages is a MUST item for IPv6. -- Pekka Savola "Tell me of difficulties surmounted, Netcore Oy not those you stumble over and fall" Systems. Networks. Security. -- Robert Jordan: A Crown of Swords From owner-netdev@oss.sgi.com Tue Jul 31 13:11:11 2001 Received: (from majordomo@localhost) by oss.sgi.com (8.11.2/8.11.3) id f6VKBBv10135 for netdev-outgoing; Tue, 31 Jul 2001 13:11:11 -0700 Received: from zmailer.org (mail.zmailer.org [194.252.70.162]) by oss.sgi.com (8.11.2/8.11.3) with SMTP id f6VKB6V10132; Tue, 31 Jul 2001 13:11:07 -0700 Received: (mea@zmailer.org) by mail.zmailer.org id ; Tue, 31 Jul 2001 23:10:57 +0300 Date: Tue, 31 Jul 2001 23:10:57 +0300 From: Matti Aarnio To: kuznet@ms2.inr.ac.ru Cc: Chris Wedgwood , matti.aarnio@zmailer.org, ralf@oss.sgi.com, netdev@oss.sgi.com Subject: Re: ping bug Message-ID: <20010731231057.A2650@mea-ext.zmailer.org> References: <20010801064556.A8101@weta.f00f.org> <200107311913.XAA10783@ms2.inr.ac.ru> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <200107311913.XAA10783@ms2.inr.ac.ru>; from kuznet@ms2.inr.ac.ru on Tue, Jul 31, 2001 at 11:13:47PM +0400 Sender: owner-netdev@oss.sgi.com Precedence: bulk Content-Length: 682 Lines: 22 On Tue, Jul 31, 2001 at 11:13:47PM +0400, kuznet@ms2.inr.ac.ru wrote: > Hello! > > v* protocol stuff is binary. > > for the uneducated such as myself, WTF is "v* protocol stuff" ? > > Well, V.22, V.32 et at. The V.22, V.32, V.90, ... are DSP (modulation) stuff, although in the case of V.90 that borders on protocol stuff.
V.42/V.42bis are definitely protocol material (error correction, and compression). Nevertheless, that binary stuff should sit in userspace, and only raw hardware driving in kernel. ... but probably there won't be generic device API because there are so many vendors ... ... and now we are way off-ping ... > Alexey /Matti Aarnio From owner-netdev@oss.sgi.com Tue Jul 31 13:16:44 2001 Received: (from majordomo@localhost) by oss.sgi.com (8.11.2/8.11.3) id f6VKGiI10314 for netdev-outgoing; Tue, 31 Jul 2001 13:16:44 -0700 Received: from ghanima.endorphin.org ([62.116.8.197]) by oss.sgi.com (8.11.2/8.11.3) with SMTP id f6VKGcV10304 for ; Tue, 31 Jul 2001 13:16:38 -0700 Received: (qmail 490 invoked by uid 1000); 31 Jul 2001 20:16:35 -0000 Date: Tue, 31 Jul 2001 22:16:35 +0200 From: clemens To: kuznet@ms2.inr.ac.ru Cc: pekkas@netcore.fi, cw@f00f.org, netdev@oss.sgi.com, linux-kernel@vger.kernel.org Subject: Re: missing icmp errors for udp packets Message-ID: <20010731221635.A471@ghanima.endorphin.org> References: <200107311857.WAA10162@ms2.inr.ac.ru> Mime-Version: 1.0 Content-Type: multipart/mixed; boundary="2oS5YaxWCcQjTEyO" Content-Disposition: inline In-Reply-To: <200107311857.WAA10162@ms2.inr.ac.ru> User-Agent: Mutt/1.3.18i Sender: owner-netdev@oss.sgi.com Precedence: bulk Content-Length: 9082 Lines: 205 --2oS5YaxWCcQjTEyO Content-Type: text/plain; charset=us-ascii Content-Disposition: inline On Tue, Jul 31, 2001 at 10:57:34PM +0400, kuznet@ms2.inr.ac.ru wrote: > > consequently since there is only one token bucket, there can only be one > > icmp rate limit. we can add a icmp type mask to enable/disable rate limiting > > for certain types. > Yes. Logically this is 100% right. Also, see below. > please give this draft-like patch a try. here at my box it does correct limiting and omits limiting for unfiltered types like echo-reply. 
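To make the "one token bucket plus an ICMP type mask" idea from this message concrete, here is a minimal user-space sketch in C. It is not the attached patch and not kernel code: `struct icmp_bucket`, `bucket_allow()`, `icmp_allow()` and `ICMP_RATEMASK_DEFAULT` are hypothetical names chosen for illustration, and time is an abstract tick counter rather than jiffies. The mask has bit n set when ICMP type n is to be limited, so types 3, 4, 11 and 12 give 0x1818.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* ICMP type numbers from RFC 792 (only the ones used here). */
enum {
    ICMP_ECHOREPLY     = 0,
    ICMP_DEST_UNREACH  = 3,
    ICMP_SOURCE_QUENCH = 4,
    ICMP_TIME_EXCEEDED = 11,
    ICMP_PARAMETERPROB = 12,
    ICMP_MAX_TYPE      = 18
};

/* Bit n set means "rate limit ICMP type n"; this evaluates to 0x1818. */
#define ICMP_RATEMASK_DEFAULT                                   \
    ((1u << ICMP_DEST_UNREACH) | (1u << ICMP_SOURCE_QUENCH) |   \
     (1u << ICMP_TIME_EXCEEDED) | (1u << ICMP_PARAMETERPROB))

/* Hypothetical single bucket shared by all limited types.
 * Invariant: tokens <= burst * cost. */
struct icmp_bucket {
    uint32_t tokens; /* accumulated credit, in ticks */
    uint32_t last;   /* tick at which credit was last added */
    uint32_t cost;   /* credit consumed by one message */
    uint32_t burst;  /* credit is capped at burst * cost */
};

/* Add one tick of credit per elapsed tick, then try to spend `cost`. */
static int bucket_allow(struct icmp_bucket *b, uint32_t now)
{
    assert(b != NULL);
    uint32_t cap = b->burst * b->cost;
    uint32_t elapsed = now - b->last;

    b->last = now;
    if (elapsed >= cap - b->tokens)
        b->tokens = cap;            /* saturate instead of overflowing */
    else
        b->tokens += elapsed;

    if (b->tokens >= b->cost) {
        b->tokens -= b->cost;
        return 1;                   /* message may be sent */
    }
    return 0;                       /* over the limit: drop */
}

/* Only types selected by the mask ever touch the bucket. */
static int icmp_allow(struct icmp_bucket *b, uint32_t mask,
                      unsigned type, uint32_t now)
{
    if (type > ICMP_MAX_TYPE)
        return 1;                   /* unknown type: not limited */
    if (!(mask & (1u << type)))
        return 1;                   /* e.g. echo reply: not limited */
    return bucket_allow(b, now);
}
```

With `cost = 100` and `burst = 6`, a host that has been idle for a long time may send six limited messages back to back and then one per 100 ticks, while echo replies never consume credit. Making the mask a run-time tunable instead of a compile-time constant would be a small further step.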
clemens --2oS5YaxWCcQjTEyO Content-Type: text/plain; charset=us-ascii Content-Disposition: attachment; filename="icmp-global-rate.patch" diff -u linux-sane/net/ipv4/icmp.c linux/net/ipv4/icmp.c --- linux-sane/net/ipv4/icmp.c Thu Jun 21 06:00:55 2001 +++ linux/net/ipv4/icmp.c Tue Jul 31 22:01:13 2001 @@ -16,6 +16,9 @@ * Other than that this module is a complete rewrite. * * Fixes: + * Clemens Fruhwirth : introduce global icmp rate limiting + * with filter mask ability instead + * of unclean mixed icmp timeouts. * Mike Shaver : RFC1122 checks. * Alan Cox : Multicast ping reply as self. * Alan Cox : Fix atomicity lockup in ip_build_xmit @@ -145,6 +148,20 @@ /* Control parameter - ignore bogus broadcast responses? */ int sysctl_icmp_ignore_bogus_error_responses; +/* + * Configurable rate limits. + * Someone should check if these default values are correct. + * Note that these values interact with the routing cache GC timeout. + * If you chose them too high they won't take effect, because the + * dst_entry gets expired too early. The same should happen when + * the cache grows too big. + */ +//int sysctl_icmp_destunreach_time = 1*HZ; +//int sysctl_icmp_timeexceed_time = 1*HZ; +//int sysctl_icmp_paramprob_time = 1*HZ; +//int sysctl_icmp_echoreply_time; /* don't limit it per default. */ +int sysctl_icmp_ratelimit = 1*HZ; + /* * ICMP control array. This specifies what to do with each ICMP. */ @@ -155,7 +172,7 @@ unsigned long *input; /* Address to increment on input */ void (*handler)(struct sk_buff *skb); short error; /* This ICMP is classed as an error message */ - int *timeout; /* Rate limit */ +// int *timeout; /* Rate limit */ }; static struct icmp_control icmp_pointers[NR_ICMP_TYPES+1]; @@ -257,7 +270,7 @@ { struct dst_entry *dst = &rt->u.dst; - if (type > NR_ICMP_TYPES || !icmp_pointers[type].timeout) + if (type > NR_ICMP_TYPES) return 1; /* Don't limit PMTU discovery. 
*/ @@ -272,7 +285,15 @@ if (dst->dev && (dst->dev->flags&IFF_LOOPBACK)) return 1; - return xrlim_allow(dst, *(icmp_pointers[type].timeout)); +#define sysctl_icmp_filtermask 0x1818 + +// filters destunreach (0x03), source quench (0x04) +// time exceed (0x0b), paraprob (0x0c) + + if((1 << type) & sysctl_icmp_filtermask) + return xrlim_allow(dst,sysctl_icmp_ratelimit); + else + return 1; } /* @@ -929,18 +950,7 @@ } -/* - * Configurable rate limits. - * Someone should check if these default values are correct. - * Note that these values interact with the routing cache GC timeout. - * If you chose them too high they won't take effect, because the - * dst_entry gets expired too early. The same should happen when - * the cache grows too big. - */ -int sysctl_icmp_destunreach_time = 1*HZ; -int sysctl_icmp_timeexceed_time = 1*HZ; -int sysctl_icmp_paramprob_time = 1*HZ; -int sysctl_icmp_echoreply_time; /* don't limit it per default. */ + /* * This table is the definition of how we handle ICMP. @@ -948,37 +958,37 @@ static struct icmp_control icmp_pointers[NR_ICMP_TYPES+1] = { /* ECHO REPLY (0) */ - { &icmp_statistics[0].IcmpOutEchoReps, &icmp_statistics[0].IcmpInEchoReps, icmp_discard, 0, &sysctl_icmp_echoreply_time}, - { &icmp_statistics[0].dummy, &icmp_statistics[0].IcmpInErrors, icmp_discard, 1, }, - { &icmp_statistics[0].dummy, &icmp_statistics[0].IcmpInErrors, icmp_discard, 1, }, + { &icmp_statistics[0].IcmpOutEchoReps, &icmp_statistics[0].IcmpInEchoReps, icmp_discard, 0 }, + { &icmp_statistics[0].dummy, &icmp_statistics[0].IcmpInErrors, icmp_discard, 1 }, + { &icmp_statistics[0].dummy, &icmp_statistics[0].IcmpInErrors, icmp_discard, 1 }, /* DEST UNREACH (3) */ - { &icmp_statistics[0].IcmpOutDestUnreachs, &icmp_statistics[0].IcmpInDestUnreachs, icmp_unreach, 1, &sysctl_icmp_destunreach_time }, + { &icmp_statistics[0].IcmpOutDestUnreachs, &icmp_statistics[0].IcmpInDestUnreachs, icmp_unreach, 1 }, /* SOURCE QUENCH (4) */ - { &icmp_statistics[0].IcmpOutSrcQuenchs,
&icmp_statistics[0].IcmpInSrcQuenchs, icmp_unreach, 1, }, + { &icmp_statistics[0].IcmpOutSrcQuenchs, &icmp_statistics[0].IcmpInSrcQuenchs, icmp_unreach, 1 }, /* REDIRECT (5) */ - { &icmp_statistics[0].IcmpOutRedirects, &icmp_statistics[0].IcmpInRedirects, icmp_redirect, 1, }, - { &icmp_statistics[0].dummy, &icmp_statistics[0].IcmpInErrors, icmp_discard, 1, }, - { &icmp_statistics[0].dummy, &icmp_statistics[0].IcmpInErrors, icmp_discard, 1, }, + { &icmp_statistics[0].IcmpOutRedirects, &icmp_statistics[0].IcmpInRedirects, icmp_redirect, 1 }, + { &icmp_statistics[0].dummy, &icmp_statistics[0].IcmpInErrors, icmp_discard, 1 }, + { &icmp_statistics[0].dummy, &icmp_statistics[0].IcmpInErrors, icmp_discard, 1 }, /* ECHO (8) */ - { &icmp_statistics[0].IcmpOutEchos, &icmp_statistics[0].IcmpInEchos, icmp_echo, 0, }, - { &icmp_statistics[0].dummy, &icmp_statistics[0].IcmpInErrors, icmp_discard, 1, }, - { &icmp_statistics[0].dummy, &icmp_statistics[0].IcmpInErrors, icmp_discard, 1, }, + { &icmp_statistics[0].IcmpOutEchos, &icmp_statistics[0].IcmpInEchos, icmp_echo, 0 }, + { &icmp_statistics[0].dummy, &icmp_statistics[0].IcmpInErrors, icmp_discard, 1 }, + { &icmp_statistics[0].dummy, &icmp_statistics[0].IcmpInErrors, icmp_discard, 1 }, /* TIME EXCEEDED (11) */ - { &icmp_statistics[0].IcmpOutTimeExcds, &icmp_statistics[0].IcmpInTimeExcds, icmp_unreach, 1, &sysctl_icmp_timeexceed_time }, + { &icmp_statistics[0].IcmpOutTimeExcds, &icmp_statistics[0].IcmpInTimeExcds, icmp_unreach, 1 }, /* PARAMETER PROBLEM (12) */ - { &icmp_statistics[0].IcmpOutParmProbs, &icmp_statistics[0].IcmpInParmProbs, icmp_unreach, 1, &sysctl_icmp_paramprob_time }, + { &icmp_statistics[0].IcmpOutParmProbs, &icmp_statistics[0].IcmpInParmProbs, icmp_unreach, 1 }, /* TIMESTAMP (13) */ - { &icmp_statistics[0].IcmpOutTimestamps, &icmp_statistics[0].IcmpInTimestamps, icmp_timestamp, 0, }, + { &icmp_statistics[0].IcmpOutTimestamps, &icmp_statistics[0].IcmpInTimestamps, icmp_timestamp, 0 }, /* TIMESTAMP REPLY (14) */ 
- { &icmp_statistics[0].IcmpOutTimestampReps, &icmp_statistics[0].IcmpInTimestampReps, icmp_discard, 0, }, + { &icmp_statistics[0].IcmpOutTimestampReps, &icmp_statistics[0].IcmpInTimestampReps, icmp_discard, 0 }, /* INFO (15) */ - { &icmp_statistics[0].dummy, &icmp_statistics[0].dummy, icmp_discard, 0, }, + { &icmp_statistics[0].dummy, &icmp_statistics[0].dummy, icmp_discard, 0 }, /* INFO REPLY (16) */ - { &icmp_statistics[0].dummy, &icmp_statistics[0].dummy, icmp_discard, 0, }, + { &icmp_statistics[0].dummy, &icmp_statistics[0].dummy, icmp_discard, 0 }, /* ADDR MASK (17) */ - { &icmp_statistics[0].IcmpOutAddrMasks, &icmp_statistics[0].IcmpInAddrMasks, icmp_address, 0, }, + { &icmp_statistics[0].IcmpOutAddrMasks, &icmp_statistics[0].IcmpInAddrMasks, icmp_address, 0 }, /* ADDR MASK REPLY (18) */ - { &icmp_statistics[0].IcmpOutAddrMaskReps, &icmp_statistics[0].IcmpInAddrMaskReps, icmp_address_reply, 0, } + { &icmp_statistics[0].IcmpOutAddrMaskReps, &icmp_statistics[0].IcmpInAddrMaskReps, icmp_address_reply, 0 } }; void __init icmp_init(struct net_proto_family *ops) Common subdirectories: linux-sane/net/ipv4/netfilter and linux/net/ipv4/netfilter diff -u linux-sane/net/ipv4/sysctl_net_ipv4.c linux/net/ipv4/sysctl_net_ipv4.c --- linux-sane/net/ipv4/sysctl_net_ipv4.c Mon Mar 26 04:14:25 2001 +++ linux/net/ipv4/sysctl_net_ipv4.c Tue Jul 31 21:44:56 2001 @@ -32,10 +32,14 @@ extern int sysctl_ip_dynaddr; /* From icmp.c */ +/* extern int sysctl_icmp_destunreach_time; extern int sysctl_icmp_timeexceed_time; extern int sysctl_icmp_paramprob_time; extern int sysctl_icmp_echoreply_time; +*/ +extern int sysctl_icmp_ratelimit; +extern int sysctl_icmp_filtermask; /* From igmp.c */ extern int sysctl_igmp_max_memberships; @@ -178,6 +182,7 @@ {NET_IPV4_ICMP_IGNORE_BOGUS_ERROR_RESPONSES, "icmp_ignore_bogus_error_responses", &sysctl_icmp_ignore_bogus_error_responses, sizeof(int), 0644, NULL, &proc_dointvec}, +/* {NET_IPV4_ICMP_DESTUNREACH_RATE, "icmp_destunreach_rate", 
&sysctl_icmp_destunreach_time, sizeof(int), 0644, NULL, &proc_dointvec}, {NET_IPV4_ICMP_TIMEEXCEED_RATE, "icmp_timeexceed_rate", @@ -187,6 +192,7 @@ {NET_IPV4_ICMP_ECHOREPLY_RATE, "icmp_echoreply_rate", &sysctl_icmp_echoreply_time, sizeof(int), 0644, NULL, &proc_dointvec}, {NET_IPV4_ROUTE, "route", NULL, 0, 0555, ipv4_route_table}, +*/ #ifdef CONFIG_IP_MULTICAST {NET_IPV4_IGMP_MAX_MEMBERSHIPS, "igmp_max_memberships", &sysctl_igmp_max_memberships, sizeof(int), 0644, NULL, &proc_dointvec}, --2oS5YaxWCcQjTEyO-- From owner-netdev@oss.sgi.com Tue Jul 31 13:48:30 2001 Received: (from majordomo@localhost) by oss.sgi.com (8.11.2/8.11.3) id f6VKmUi10840 for netdev-outgoing; Tue, 31 Jul 2001 13:48:30 -0700 Received: from weta.f00f.org (weta.f00f.org [203.167.249.89]) by oss.sgi.com (8.11.2/8.11.3) with SMTP id f6VKmQV10837 for ; Tue, 31 Jul 2001 13:48:28 -0700 Received: by weta.f00f.org (Postfix, from userid 1000) id CA0C0158B3; Wed, 1 Aug 2001 08:48:59 +1200 (NZST) Date: Wed, 1 Aug 2001 08:48:59 +1200 From: Chris Wedgwood To: Pekka Savola Cc: kuznet@ms2.inr.ac.ru, therapy@endorphin.org, netdev@oss.sgi.com Subject: Re: missing icmp errors for udp packets Message-ID: <20010801084859.B8400@weta.f00f.org> References: Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: User-Agent: Mutt/1.3.18i X-No-Archive: Yes Sender: owner-netdev@oss.sgi.com Precedence: bulk Content-Length: 1133 Lines: 31 On Tue, Jul 31, 2001 at 10:48:37PM +0300, Pekka Savola wrote: Who said I was pinging Cisco routers? If I have two servers 100 ms off each other, I make them 'ping -f' each other. This does test the infrastructure and forwarding capabilities a bit. Well... in some instances, it's a very good approximation. The point I was making was that ping (and ICMP echo/reply) should be treated as indicative only. For this reason, I see no harm in Linux dropping the odd ping packet if it has something better to do.
Traceroute isn't optimal as you noted, as the routers have to pull the packet with expiring TTL off the "fast path", and this is often subject to the rate-limiting considerations also. Well... it depends on the router. Decent modern routers use ASICs for most of their functionality (even tricky stuff like fragmentation is largely handled in silicon) and have processors above the ASIC switching layer for even more complex stuff and can sustain incredibly high rates of crud going through them. Anyhow, I think this thread has been hammered enough so follows off the list perhaps. --cw From owner-netdev@oss.sgi.com Tue Jul 31 13:53:09 2001 Received: (from majordomo@localhost) by oss.sgi.com (8.11.2/8.11.3) id f6VKr9r10992 for netdev-outgoing; Tue, 31 Jul 2001 13:53:09 -0700 Received: from weta.f00f.org (weta.f00f.org [203.167.249.89]) by oss.sgi.com (8.11.2/8.11.3) with SMTP id f6VKr8V10989 for ; Tue, 31 Jul 2001 13:53:08 -0700 Received: by weta.f00f.org (Postfix, from userid 1000) id 16410158BB; Wed, 1 Aug 2001 08:53:36 +1200 (NZST) Date: Wed, 1 Aug 2001 08:53:36 +1200 From: Chris Wedgwood To: Pekka Savola Cc: kuznet@ms2.inr.ac.ru, therapy@endorphin.org, netdev@oss.sgi.com, linux-kernel@vger.kernel.org, davem@redhat.com Subject: Re: missing icmp errors for udp packets Message-ID: <20010801085336.C8400@weta.f00f.org> References: Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: User-Agent: Mutt/1.3.18i X-No-Archive: Yes Sender: owner-netdev@oss.sgi.com Precedence: bulk Content-Length: 1147 Lines: 34 On Tue, Jul 31, 2001 at 10:59:39PM +0300, Pekka Savola wrote: bad ping responder == bad PR ;-) And anyway, who is anyone to judge what the system should be used for? I want a system to respond to ping without limitations; it's good for debugging, diagnostics, etc. If I want, I can just filter the requests out, or rate-limit the responses. People who want to do strange stuff can tweak via sysctl. 
However, ICMP error messages cannot be effectively filtered; they may happen due to TTL=0 when forwarding, legit or illegit UDP connection etc.; only way to effectively limit them is by rate-limiting. If rate-limiting with informational and error types are the same, we have an inflexible situation here. Networks are lossy, you can spill the odd packet anyhow. It was just a suggestion that we merge all ICMP rate-limiting for simplicity, I don't see it being an issue for the majority of users. Perhaps I am wrong, in which case DaveM and Alexey will ignore me :) I really don't see the need to continue to discuss this further on the list, but by all means flame me in private! --cw From owner-netdev@oss.sgi.com Tue Jul 31 13:58:08 2001 Received: (from majordomo@localhost) by oss.sgi.com (8.11.2/8.11.3) id f6VKw8c11254 for netdev-outgoing; Tue, 31 Jul 2001 13:58:08 -0700 Received: from netcore.fi (netcore.fi [193.94.160.1]) by oss.sgi.com (8.11.2/8.11.3) with SMTP id f6VKw5V11251 for ; Tue, 31 Jul 2001 13:58:05 -0700 Received: from localhost (pekkas@localhost) by netcore.fi (8.11.1/8.11.1) with ESMTP id f6VKvjj21470; Tue, 31 Jul 2001 23:57:46 +0300 Date: Tue, 31 Jul 2001 23:57:45 +0300 (EEST) From: Pekka Savola To: Chris Wedgwood cc: , Subject: Re: missing icmp errors for udp packets In-Reply-To: <20010801085336.C8400@weta.f00f.org> Message-ID: MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII Sender: owner-netdev@oss.sgi.com Precedence: bulk Content-Length: 422 Lines: 11 On Wed, 1 Aug 2001, Chris Wedgwood wrote: > I really don't see the need to continue to discuss this further on the > list, but by all means flame me in private! I can perform the flaming if you bring the bananas. :-) -- Pekka Savola "Tell me of difficulties surmounted, Netcore Oy not those you stumble over and fall" Systems. Networks. Security. 
-- Robert Jordan: A Crown of Swords From owner-netdev@oss.sgi.com Tue Jul 31 14:38:27 2001 Received: (from majordomo@localhost) by oss.sgi.com (8.11.2/8.11.3) id f6VLcRr12583 for netdev-outgoing; Tue, 31 Jul 2001 14:38:27 -0700 Received: from coruscant.gnumonks.org (mail@coruscant.franken.de [193.174.159.226]) by oss.sgi.com (8.11.2/8.11.3) with SMTP id f6VLcPV12577 for ; Tue, 31 Jul 2001 14:38:25 -0700 Received: from uucp by coruscant.gnumonks.org with local-bsmtp (Exim 3.22 #1) id 15RhDt-0005vU-00 for netdev@oss.sgi.com; Tue, 31 Jul 2001 23:38:41 +0200 Received: from laforge by obroa-skai.gnumonks.org with local (Exim 3.22 #1) id 15RSrC-0005bX-00; Tue, 31 Jul 2001 03:18:18 -0300 Date: Tue, 31 Jul 2001 03:18:18 -0300 From: Harald Welte To: Benjamin Herrenschmidt Cc: , Subject: Re: airport reset on iBook2 Message-ID: <20010731031818.K1486@obroa-skai.gnumonks.org> References: <20010728003701.H1240@obroa-skai.gnumonks.org> <20010730143236.16877@smtp.wanadoo.fr> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline User-Agent: Mutt/1.3.17i In-Reply-To: <20010730143236.16877@smtp.wanadoo.fr>; from benh@kernel.crashing.org on Mon, Jul 30, 2001 at 04:32:36PM +0200 X-Operating-System: Linux obroa-skai.gnumonks.org 2.4.7 X-Date: Today is Setting Orange, the 64th day of Confusion in the YOLD 3167 Sender: owner-netdev@oss.sgi.com Precedence: bulk Content-Length: 1043 Lines: 23 On Mon, Jul 30, 2001 at 04:32:36PM +0200, Benjamin Herrenschmidt wrote: > >I can confirm that happen from time to time with my Sony VAIO picturebook > >C1XD and a Lucent WaveLAN/IEEE board. > > > >unloading the whole pcmcia subsystem (incl. yenta_socket) and re-loading > >everything helps. > > What is your card's firmware version ? (displayed by the driver during > boot). You may want to go to MacOS, upgrade to Apple's latest airport > software, and back to linux. This will upgrade your card's firmware to > the latest version provided by Apple, which might help. 
erm, as I said, I have a Lucent card and a x86 box. There is no MacOS and I don't want to write an AirPort firmware to the Lucent card. > Ben. -- Live long and prosper - Harald Welte / laforge@gnumonks.org http://www.gnumonks.org ============================================================================ GCS/E/IT d- s-: a-- C+++ UL++++$ P+++ L++++$ E--- W- N++ o? K- w--- O- M- V-- PS+ PE-- Y+ PGP++ t++ 5-- !X !R tv-- b+++ DI? !D G+ e* h+ r% y+(*) From owner-netdev@oss.sgi.com Tue Jul 31 14:38:35 2001 Received: (from majordomo@localhost) by oss.sgi.com (8.11.2/8.11.3) id f6VLcZ712637 for netdev-outgoing; Tue, 31 Jul 2001 14:38:35 -0700 Received: from coruscant.gnumonks.org (mail@coruscant.franken.de [193.174.159.226]) by oss.sgi.com (8.11.2/8.11.3) with SMTP id f6VLcPV12578 for ; Tue, 31 Jul 2001 14:38:25 -0700 Received: from uucp by coruscant.gnumonks.org with local-bsmtp (Exim 3.22 #1) id 15RhDu-0005vt-00 for netdev@oss.sgi.com; Tue, 31 Jul 2001 23:38:42 +0200 Received: from laforge by obroa-skai.gnumonks.org with local (Exim 3.22 #1) id 15RTAH-0005c8-00; Tue, 31 Jul 2001 03:38:01 -0300 Date: Tue, 31 Jul 2001 03:38:01 -0300 From: Harald Welte To: kuznet@ms2.inr.ac.ru Cc: netdev@oss.sgi.com Subject: Re: Fw: oops/bug in tcp, SACK doesn't work? Message-ID: <20010731033801.M1486@obroa-skai.gnumonks.org> References: <20010728004447.I1240@obroa-skai.gnumonks.org> <200107291653.UAA18260@ms2.inr.ac.ru> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline User-Agent: Mutt/1.3.17i In-Reply-To: <200107291653.UAA18260@ms2.inr.ac.ru>; from kuznet@ms2.inr.ac.ru on Sun, Jul 29, 2001 at 08:53:36PM +0400 X-Operating-System: Linux obroa-skai.gnumonks.org 2.4.7 X-Date: Today is Setting Orange, the 64th day of Confusion in the YOLD 3167 Sender: owner-netdev@oss.sgi.com Precedence: bulk Content-Length: 1901 Lines: 46 On Sun, Jul 29, 2001 at 08:53:36PM +0400, Alexey Kuznetsov wrote: > Hello! 
> > > Please note that the netfilter nat protocol helpers for ftp (and irc as well as > > other protocols in patch-o-matic) delete the SACKPERM option on-the-fly > > from all packets. > > Then Marty would not see any sacks at all. > > > > It has to, as you run in neverending complications as soon as the nat helper > > has to alter the tcp sequence numbers, etc. > > It is not a valid justification. It is difficult to rewrite sequence numbers. > As soon as nat does this, rewriting sacks is easy. Even not easy, trivial. Not really. The issue is that we only keep track of the last time a tcp sequence number was rewritten. Yes, that means that current netfilter NAT code does not cope correctly with all cases where you have more than one packet size alteration per window. So I'm not sure if enabling selective acknowledgements could make the situation worse than it is (given this precondition). At least after giving it some thought, I cannot see how. I have written some improved conntrack/nat code (called multirel/newnat), which is currently in testing. This improved code will remember all packet size alterations and the exact tcp sequence number at which each of them occurred. > Sad and not expected behaviour. I used to ridicule commercial firewall > vendors, sometimes doing shit of this kind without any clear reasons. :-) Ok, I am willing to extend netfilter conntrack/nat in order to deal with SACK. It is really not about being too lazy to do it. > Alexey -- Live long and prosper - Harald Welte / laforge@gnumonks.org http://www.gnumonks.org ============================================================================ GCS/E/IT d- s-: a-- C+++ UL++++$ P+++ L++++$ E--- W- N++ o? K- w--- O- M- V-- PS+ PE-- Y+ PGP++ t++ 5-- !X !R tv-- b+++ DI?
!D G+ e* h+ r% y+(*) From owner-netdev@oss.sgi.com Tue Jul 31 15:11:03 2001 Received: (from majordomo@localhost) by oss.sgi.com (8.11.2/8.11.3) id f6VMB3l13422 for netdev-outgoing; Tue, 31 Jul 2001 15:11:03 -0700 Received: from caroubier.wanadoo.fr (smtp-rt-6.wanadoo.fr [193.252.19.160]) by oss.sgi.com (8.11.2/8.11.3) with SMTP id f6VMB0V13419 for ; Tue, 31 Jul 2001 15:11:00 -0700 Received: from andira.wanadoo.fr (193.252.19.152) by caroubier.wanadoo.fr; 1 Aug 2001 00:10:54 +0200 Received: from [10.0.1.59] (217.128.74.18) by andira.wanadoo.fr; 1 Aug 2001 00:10:23 +0200 From: Benjamin Herrenschmidt To: Harald Welte , , Subject: Re: airport reset on iBook2 Date: Wed, 1 Aug 2001 01:10:32 +0200 Message-Id: <20010731231032.19768@smtp.wanadoo.fr> In-Reply-To: <20010731031818.K1486@obroa-skai.gnumonks.org> References: <20010731031818.K1486@obroa-skai.gnumonks.org> X-Mailer: CTM PowerMail 3.0.8 MIME-Version: 1.0 Content-Type: text/plain; charset=US-ASCII Content-Transfer-Encoding: 7bit Sender: owner-netdev@oss.sgi.com Precedence: bulk Content-Length: 740 Lines: 18 >> What is your card's firmware version ? (displayed by the driver during >> boot). You may want to go to MacOS, upgrade to Apple's latest airport >> software, and back to linux. This will upgrade your card's firmware to >> the latest version provided by Apple, which might help. > >erm, as I said, I have a Lucent card and a x86 box. There is no MacOS >and I don't want to write an AirPort firmware to the Lucent card. Oops, sorry about that. Well, what is your firmware version anyway ? I know some have problems, and it may be wise to update using Lucent x86 based updaters. (Or maybe it's a too-new firmware you are running). You should really ask Jean Tourrihles or Daniel Gibson as they know much more about these issues. Ben. 
From owner-netdev@oss.sgi.com Tue Jul 31 18:29:04 2001 Received: (from majordomo@localhost) by oss.sgi.com (8.11.2/8.11.3) id f711T4615750 for netdev-outgoing; Tue, 31 Jul 2001 18:29:04 -0700 Received: from gull.mail.pas.earthlink.net (gull.mail.pas.earthlink.net [207.217.121.85]) by oss.sgi.com (8.11.2/8.11.3) with SMTP id f711T1V15744 for ; Tue, 31 Jul 2001 18:29:01 -0700 Received: from earthlink.net (dialup-63.208.185.104.Dial1.Baltimore1.Level3.net [63.208.185.104]) by gull.mail.pas.earthlink.net (EL-8_9_3_3/8.9.3) with ESMTP id SAA03845; Tue, 31 Jul 2001 18:28:54 -0700 (PDT) Message-ID: <3B674CE8.30909@earthlink.net> Date: Tue, 31 Jul 2001 20:27:20 -0400 From: Brad Chapman User-Agent: Mozilla/5.0 (X11; U; Linux 2.4.7 i586; en-US; C-UPD: MaxLinux0301) Gecko/20001107 Netscape6/6.0 X-Accept-Language: en MIME-Version: 1.0 To: Harald Welte CC: netfilter-devel@lists.samba.org, netdev@oss.sgi.com Subject: Re: IPv6 fragmentation and IPv6 header parsing References: <3B64B076.6090709@earthlink.net> <20010729212317.I1486@obroa-skai.gnumonks.org> <3B65914B.3070403@earthlink.net> <20010731031710.J1486@obroa-skai.gnumonks.org> Content-Type: text/plain; charset=us-ascii; format=flowed Content-Transfer-Encoding: 7bit Sender: owner-netdev@oss.sgi.com Precedence: bulk Content-Length: 5600 Lines: 149 Mr. Harald, Harald Welte wrote: > On Mon, Jul 30, 2001 at 12:54:35PM -0400, Brad Chapman wrote: > >> That's exactly what I was afraid of. The problem is that in order to >> implement connection tracking _without_ fragment handling for IPv6, one would >> have to implement something like the options below: >> >> - No fragment handling; just configure your PMTU to a proper link size and >> NF_DROP fragmented packets; they belong to a broken implementation. >> This one is probably the most compliant with RFC2460, but if you must >> have fragmenting, you would need to run ip6_conntrack on an internal >> gateway and configure an external one to do relevant source fragging. 
>
>
> I guess you misunderstood something about IPv6. Having fragments is totally
> legal, just routers don't fragment on-the-fly. But if one of the end nodes
> of the connections want to fragment, they are allowed (but discouraged)
> to do so.

Well, okay. I was under the impression that if you configured your low-level
hardware to a decent PMTU size and somebody was doing fragmenting, that it was
broken and didn't deserve to be tracked. What do you mean by "discouraged" ?

>
>
>> - skb_clone() packets. Hold original; defrag copy, push into
>> connection tracking subsystem. If packet makes it through, delete copy
>> and release original. This one is not as bad as defragging original
>> packets, but it is in violation of RFC2460 and sounds kludgy and bloaty.
>
>
> well... it is a hack, yes. If it really is in violation of the RFC - I'm not
> sure.
>
> As long as the conntrack does not send the defragmented copy of the packet,
> but just delays the delivery of the fragmented original packets it could be
> ok, IMHO. (The defragmented packet would only be used for conntrack
> internally).

That could be a major problem. How much of a delay would be reasonable?
Something like (time to pass through conntrack) + (time to skb_clone() packet) ?

>
>
> Another idea would be to defragment the packet internally while forwarding
> all fragments which don't have the 'final fragment' bit in the IPv6 header
> set. After we receive the last fragment of the packet, we send the
> internally defragged copy of the whole packet through conntrack. If the
> decision of some policy says the packet is to be DROPped, we just never send
> the last fragment, in which case the receiver has to drop all fragments after
> 60 seconds (or something like that, RFC2460 is more precise).

Ahhh. That makes sense. So you just grab the fragment header, and look for the
final fragment bit.
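For reference, here is a minimal user-space sketch of the fragment-header check being discussed. It is not kernel code and not part of ip6_conntrack. One detail worth keeping in mind: RFC 2460 does not define a dedicated `final fragment' bit. The Fragment header carries an M ("more fragments") flag, and the last fragment is simply the one with M clear. The names parse_frag_hdr(), is_final_fragment() and struct frag_info are hypothetical, made up for this illustration only.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical result type for this sketch; not a kernel structure. */
struct frag_info {
	uint8_t  next_header;    /* protocol following the Fragment header */
	uint16_t offset_bytes;   /* fragment offset, converted to bytes */
	int      more_fragments; /* M flag: 1 = more follow, 0 = last one */
	uint32_t id;             /* identification shared by all fragments */
};

/*
 * Parse the 8-byte IPv6 Fragment header (RFC 2460, section 4.5) at p.
 * Returns 0 on success, -1 if fewer than 8 bytes are available.
 */
static int parse_frag_hdr(const uint8_t *p, size_t len, struct frag_info *out)
{
	uint16_t off_flags;

	assert(p != NULL && out != NULL);
	if (len < 8)
		return -1;

	/* Bytes 2-3: 13-bit offset in 8-octet units, 2 reserved bits, M flag. */
	off_flags = (uint16_t)((p[2] << 8) | p[3]);

	out->next_header    = p[0];
	out->offset_bytes   = off_flags & 0xFFF8; /* same as (off_flags >> 3) * 8 */
	out->more_fragments = off_flags & 0x0001;
	out->id = ((uint32_t)p[4] << 24) | ((uint32_t)p[5] << 16) |
		  ((uint32_t)p[6] << 8)  |  (uint32_t)p[7];
	return 0;
}

/* The "final fragment" is the one whose M flag is clear. */
static int is_final_fragment(const struct frag_info *fi)
{
	return !fi->more_fragments;
}
```

A caller would first walk the extension-header chain until it reaches next-header value 44 (Fragment), then hand the following 8 bytes to parse_frag_hdr(). Whether a conntrack hook should hold back the fragment for which is_final_fragment() returns true is exactly the policy question in this thread; the sketch only shows how the bit is read.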
Thus, the code path would be:

- fragmented packet arrives at ip6_conntrack_in()
- ip6_conntrack_in() scans frag header, looking for `final fragment' bit
- not there: send it onward
  there: stop forwarding
- ip6_conntrack_in() calls ip6_ct_gather_frags(), which calls ip6_reassembly()
- we send the defragmented packet through conntrack
- NF_ACCEPT: send the final fragment onward
  NF_DROP: drop the final fragment

Is this correct? If not, then please point me in the correct direction ;-)

>
>
> I don't think that this is a good solution either, but just needs to be
> taken into consideration.
>
>> - Rewrite the entire codebase to work properly with fragmented packets.
>> Add functions like ip6_frag_scan() to scan fragmented packets, and
>> change all packet-related stuff to use fragment functions to locate
>> relevant packet data. TBH, that sounds frightening and personally, I
>> wouldn't want to implement it.
>
>
> Yes. This is what I would like to see in a 2.5.x rework of the whole
> conntrack, while also lifting it to a layer-3-protocol independent layer
> (which would be needed for IPv4 <-> IPv6 NAT needed by lots of transition
> scenarios).

Well, that's another thing entirely, and would have to wait until 2.5 is
mandated. OT1: does anybody know if Linus thinks it's time to open a 2.5 tree?

>
>
>> If given a choice, and told that doing constant defrag/refrag of IPv6
>> packets is slow, inefficient, and in violation of the RFCs (as I have been
>> told), I would choose the first option.
>
>
> I would most likely go for the 'defragment internally, delay fragments'
> approach. You have to take care of possible DoS attacks, be aware.

I agree. Anything like what we've just discussed would have to wait until you
(or someone on the list) figured out a good way to select packets for trackage.
OT2: any ideas? I can think of one already, but Henrik Nordstrom told me it
would add some overhead.

>
>
>> BTW, what about header parsing? Am I at least doing that correctly?
> > > sorry, didn't have the time to go through that. Please have a look. Kis-Szabo Andras helped me out a lot on this, but I would also like to ask someone who lives, breathes, and eats IPv6 ;-) > > >> Brad >> >> P.S.: Mr. Harald, I didn't know you subscribed to netdev.... > > > ;) well, I guess everybody related to linux networking development is > subscribed here. And certainly all netfilter core team members fit within > that group... Maybe I should subscribe as well..... ;-) Anyway, AOTB is great discussion for now, but I have started to play with the code and have found some more bugs in either ip6_conntrack, the ethertap setup, or the testsuite packet generators. Plus, the ip6t_state module is giving me fits about `unrecognized arguments' >:-( I'll have to compile with -D0 and pore over the logs for a while...... Brad > From owner-netdev@oss.sgi.com Tue Jul 31 19:37:17 2001 Received: (from majordomo@localhost) by oss.sgi.com (8.11.2/8.11.3) id f712bHk18101 for netdev-outgoing; Tue, 31 Jul 2001 19:37:17 -0700 Received: from dea.waldorf-gmbh.de (u-246-21.karlsruhe.ipdial.viaginterkom.de [62.180.21.246]) by oss.sgi.com (8.11.2/8.11.3) with SMTP id f712bBV18098 for ; Tue, 31 Jul 2001 19:37:12 -0700 Received: (from ralf@localhost) by dea.waldorf-gmbh.de (8.11.1/8.11.1) id f712adh17422; Wed, 1 Aug 2001 04:36:39 +0200 Date: Wed, 1 Aug 2001 04:36:38 +0200 From: Ralf Baechle To: kuznet@ms2.inr.ac.ru Cc: Jacob Avraham , netdev@oss.sgi.com Subject: Re: conflicting alignment requirements Message-ID: <20010801043638.A17397@bacchus.dhis.org> References: <200107311712.VAA04463@ms2.inr.ac.ru> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline User-Agent: Mutt/1.2.5i In-Reply-To: <200107311712.VAA04463@ms2.inr.ac.ru>; from kuznet@ms2.inr.ac.ru on Tue, Jul 31, 2001 at 09:12:22PM +0400 X-Accept-Language: de,en,fr Sender: owner-netdev@oss.sgi.com Precedence: bulk Content-Length: 563 Lines: 14 On Tue, Jul 31, 2001 at 09:12:22PM +0400, 
kuznet@ms2.inr.ac.ru wrote: > > copy the packet to a fresh skb (rx_copybreak = 0), the packet will > > traverse the net layer with unalinged IP header. > > Doing this for an arch which traps wrong alignment, you can expect > everything (except for crash, which could be bug). Afaik all such architectures have exception handlers to complete the access transparently in software. Such an access is very slow so where more frequent unaligned accesses are expected there are get_unaligned() and put_unaligned(). Ralf From owner-netdev@oss.sgi.com Tue Jul 31 21:42:03 2001 Received: (from majordomo@localhost) by oss.sgi.com (8.11.2/8.11.3) id f714g3421984 for netdev-outgoing; Tue, 31 Jul 2001 21:42:03 -0700 Received: from sgi.com (sgi.SGI.COM [192.48.153.1]) by oss.sgi.com (8.11.2/8.11.3) with SMTP id f714g0V21979 for ; Tue, 31 Jul 2001 21:42:00 -0700 Received: from wiprom2mx1.wipro.com (wiprom2mx1.wipro.com [203.197.164.41]) by sgi.com (980327.SGI.8.8.8-aspam/980304.SGI-aspam: SGI does not authorize the use of its proprietary systems or networks for unsolicited or bulk email from the Internet.) via ESMTP id VAA09593 for ; Tue, 31 Jul 2001 21:41:37 -0700 (PDT) mail_from (kamaldeep.dham@wipro.com) Received: from m2vwall2.wipro.com (m2vwall2.wipro.com [192.168.235.4]) by wiprom2mx1.wipro.com (8.10.2+Sun/8.11.3) with SMTP id f71ACl901574 for ; Wed, 1 Aug 2001 10:12:47 GMT Received: from wipro.com ([10.113.1.104]) by ggnmail.mail.wipro.com (Netscape Messaging Server 4.15) with ESMTP id GHDGQQ00.V1W for ; Wed, 1 Aug 2001 10:05:14 +0530 Message-ID: <3B67816B.89049948@wipro.com> Date: Wed, 01 Aug 2001 09:41:23 +0530 From: Kamal Deep Dham Reply-To: kamaldeep.dham@wipro.com Organization: Wipro Technologies X-Mailer: Mozilla 4.75 [en] (X11; U; Linux 2.4.2-IPv6 i686) X-Accept-Language: en MIME-Version: 1.0 To: netdev@oss.sgi.com Subject: kernel module problem. 
Content-Type: multipart/mixed; boundary="------------InterScan_NT_MIME_Boundary"
Sender: owner-netdev@oss.sgi.com
Precedence: bulk
Content-Length: 4646
Lines: 179

This is a multi-part message in MIME format.

--------------InterScan_NT_MIME_Boundary
Content-Type: text/plain; charset=us-ascii
Content-Transfer-Encoding: 7bit

Hi,

I have written a small kernel module to learn the basics of it.

=======================================================
/* Header names were lost in the archive; these two are assumed. */
#include <linux/module.h>
#include <linux/netdevice.h>

int my_ioctl(struct net_device *dev, struct ifreq *ifr, int cmd);
int my_open(struct net_device *dev);

int my_init(struct net_device *dev)
{
	if (dev == NULL) {
		printk("dev = null\n");
		return -ENODEV;	/* do not dereference a NULL device */
	}
	dev->open = my_open;
	dev->do_ioctl = my_ioctl;
	printk("my_init() called\n");
	return 0;
}

int my_ioctl(struct net_device *dev, struct ifreq *ifr, int cmd)
{
	printk("my_ioctl() called with cmd = %d\n", cmd);
	return 0;
}

int my_open(struct net_device *dev)
{
	printk("my_open() called\n");
	return 0;	/* 0 means success; a nonzero return makes the open fail */
}

static struct net_device myDev = {
	"kamal\0 ", 0, 0, 0, 0, 0, 0, 0, 0, 0, NULL, my_init
};

int init_module(void)
{
	int err;

	printk("init_module() called\n");
	if (dev_get_by_name(myDev.name) == NULL) {
		err = register_netdev(&myDev);
		if (err)
			printk("Error Message ::%d\n", err);
	} else {
		printk("Dev Already Exist\n");
	}
	return 0;
}

void cleanup_module(void)
{
	unregister_netdev(&myDev);
	printk("cleanup_module() called.\n");
}
====================================================

I am trying to call the my_ioctl() function through the SIOCDEVPRIVATE ioctl
command from user space.
My user space program looks like this:

=====================================================
/* Header names were lost in the archive; these eight are assumed
   from the symbols the program uses. */
#include <stdio.h>
#include <string.h>
#include <errno.h>
#include <sys/types.h>
#include <sys/socket.h>
#include <sys/ioctl.h>
#include <net/if.h>
#include <linux/sockios.h>

#define LA_COMMAND (SIOCDEVPRIVATE + 2)

int main()
{
	int s, err;
	int Arg = 100;
	struct ifreq IfReq;

	s = socket(AF_INET, SOCK_STREAM, 0);
	/* s = socket(AF_INET, SOCK_RAW, 255); */
	if (s < 0) {
		printf("Unable To Open The Socket\n");
		return 0;
	}
	strcpy(IfReq.ifr_name, "kamal");
	IfReq.ifr_data = (void *)&Arg;
	err = ioctl(s, LA_COMMAND, &IfReq);
	if (err < 0) {
		printf("Error Occured :: %d\n", errno);
		perror("");
		return 0;
	}
	return 0;
}
====================================================

But I am getting the following error after its execution:

# Error Occured :: 95
Operation not supported

In file /usr/src/linux/net/core/dev.c

static int dev_ifsioc(struct ifreq *ifr, unsigned int cmd)
{
	. . . . . . .
	default:
		if ((cmd >= SIOCDEVPRIVATE &&
		     cmd <= SIOCDEVPRIVATE + 15) ||
		    cmd == SIOCETHTOOL) {
			if (dev->do_ioctl) {
				if (!netif_device_present(dev))
					return -ENODEV;
				return dev->do_ioctl(dev, ifr, cmd);
			}
			return -EOPNOTSUPP;	==>> error is returned from here
		}
	. .
}

I found that in the default case, dev->do_ioctl is coming up NULL and hence the
EOPNOTSUPP error is returned.

Does anyone have an idea why this error occurs?

-Kamal

--------------InterScan_NT_MIME_Boundary
Content-Type: text/plain; name="Wipro_Disclaimer.txt"
Content-Transfer-Encoding: 7bit
Content-Disposition: attachment; filename="Wipro_Disclaimer.txt"

The Information contained and transmitted by this E-MAIL is proprietary to Wipro Limited and is intended for use only by the individual or entity to which it is addressed, and may contain information that is privileged, confidential or exempt from disclosure under applicable law. If this is a forwarded message, the content of this E-MAIL may not have been sent with the authority of the Company.
If you are not the intended recipient, an agent of the intended recipient or a person responsible for delivering the information to the named recipient, you are notified that any use, distribution, transmission, printing, copying or dissemination of this information in any way or in any manner is strictly prohibited. If you have received this communication in error, please delete this mail & notify us immediately at mailadmin@wipro.com --------------InterScan_NT_MIME_Boundary--