netdev
[Top] [All Lists]

Re: [RFR] gianfar ethernet driver

To: Andy Fleming <afleming@xxxxxxxxxxxxx>
Subject: Re: [RFR] gianfar ethernet driver
From: Jeff Garzik <jgarzik@xxxxxxxxx>
Date: Wed, 07 Jul 2004 01:27:40 -0400
Cc: Kumar Gala <kumar.gala@xxxxxxxxxxxxx>, netdev@xxxxxxxxxxx, dwmw2@xxxxxxxxxxxxx, hadi@xxxxxxxxxx
In-reply-to: <89563A5C-CFAE-11D8-BA44-000393C30512@xxxxxxxxxxxxx>
References: <C681B01E-CEA9-11D8-931F-000393DBC2E8@xxxxxxxxxxxxx> <89563A5C-CFAE-11D8-BA44-000393C30512@xxxxxxxxxxxxx>
Sender: netdev-bounce@xxxxxxxxxxx
User-agent: Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.6) Gecko/20040510
Andy Fleming wrote:
7) ethtool_ops should not be kmalloc'd.
+       dev_ethtool_ops =
+ (struct ethtool_ops *)kmalloc(sizeof(struct ethtool_ops),
+                                             GFP_KERNEL);


Is there a particular reason? The reason I did it this way is because the driver supports a 10/100 controller which does not have the RMON statistics, and therefore should not read from that register space. So I needed to use different functions to fill in the statistics lists.

Just switch out one static pointer for another, once you know the hardware you're dealing with. You _must_ have that knowledge anyway, before calling register_netdev(), otherwise you're got a race in initialization versus interface-up.


8) I recommend enabling _both_ hardware interrupt coalescing and NAPI,
at the same time.  Several other Linux net drivers need to be changed to
do this, as well


Ok... but this is possible with the driver as it is. Interrupt Coalescing is runtime configurable, and NAPI is a compile-time option. NAPI can sometimes hurt performance, and so we'd like to have the ability to disable it for some deployments. I guess I'm not sure what change this suggestion was meant to cause.

Default hardware mitigation to on in all cases, and preferably NAPI as well.

If you see cases that hurt performance, that wants investigating.


14) surely gfar_set_mac_address() needs some sort of synchronization?


I'm not sure why. It only gets called during gfar_open(), and startup_gfar() gets called after it.

Incorrect. Set-mac-address can be called when the interface is up and active, such as by the bonding driver.


15) gfar_change_mtu() synchronization appears absent?


I'm not exactly sure what kind of synchronization is expected here. stop_gfar() and startup_gfar() do modify the appropriate hardware values.

Same conditions (and response) as set-mac-address. These can be called while you're actively DMAing packets.


23) gfar_set_multi() does not appear to take into account huge
multi-cast lists.  Drivers usually handle large dev->mc_count by falling
back to ALLMULTI behavior.


Actually, it does. The bits which are set represent hash table values. Essentially, the MAC addr is converted to an 8-bit CRC. This value then serves as an index to the 256-bit hash table. If the list is large, then certain bits may be written twice, but if all the bits are set, then the behavior is the same as in the ALLMULTI behavior -- every multicast packet arrives. It would be a mistake, IMHO, to cut it off after N entries, as several of those entries could overlap in the bitmap. Clearly, the less bits which are set, the more effective the hardware filter, so checking each address will, I think, always be a performance win or tie (on the filtering side) over ALLMULTI

ok


24) setting IFF_MULTICAST is suspicious at best, possibly wrong:
+       dev->mtu = 1500;
+       dev->set_multicast_list = gfar_set_multi;
+       dev->flags |= IFF_MULTICAST;
+


I'll believe you, but is there a reason for this? I guess it's already set, by default, so easy fix.

follow the other drivers, and behave predictably :)


28) I see you support register dump (good!), now please send the
register-pretty-print patch to the ethtool package :)


Heh. Well, I haven't written that, yet. There's actually a slight problem, in that the 85xx registers are 32-bit, and refuse to be accessed as bytes. I'll be sure to look at that in the near...ish future.

Nothing wrong with accessing registers as 32-bit quantities. That attribute is common to a lot of NICs.


31) infinite loops, particularly on 1-bits, are discouraged:
+       /* Wait for the transaction to finish */
+       while (gfar_read(&regbase->miimind) & (MIIMIND_NOTVALID |
MIIMIND_BUSY))
+               cpu_relax();


What is the suggested method for waiting for the bus to be free? Should I timeout after some time, and bring the driver down?

it's really up to you, as it depends on the implementation platforms.

The most simple is to include a counter that counts down to zero, and starts some absurdly large number like 100000.


33) merge-stopper:  mii_parse_sr().  never wait for autonegotiation to
complete.  it is an asynchronous operation that could exceed 30 seconds.


Hmm...

34) merge-stopper:  dm9161_wait().  same thing as #33.


This may be a problem. That function is there to work around an issue in the PHY, wherein if you try to configure it before it has come up all the way, it refuses to bring the link up. We've sworn at this code many times, but there has, as of yet, not been a good suggestion as to how we can ensure that the 9161 is ready before we configure it.

I interpret that as a driver bug :)

As common sense, regardless of phy bugs, you should not be trying to configure the MAC _or_ the phy in the middle of autonegotiation. Presumeably you are using a combination of netif_stop_queue(), netif_carrier_off(), and netif_device_detach() to achieve this.


35) liberally borrow code and ideas from Ben H's sungem_phy.c.
Eventually we want to move to a generic phy module.


Heh. ironically, I stole liberally from the ibm_emac driver, which now looks exactly like the sungem_phy code for phy handling. A generic phy module would be a good thing, and I'm even interested in helping with code and/or suggestions. If nothing else, I'll jump on the new generic phy module bandwagon!

Cool. This item #35 is more of a long term "think about it" type of request. Please do not hesitate to think of ways that you could share code with sungem_phy.c, and/or make them both use the same API, and submit patches along those lines :)

It's not enough to just write a driver, help change Linux for the better :)


 From Jamal:


1) The check (in gfar_start_xmit()):

Should happen much sooner - i.e before the skb is touched.


I'm not sure I agree here. What I am doing is detecting the full state before an skb needs to be rejected. I am testing the NEXT descriptor to see if it is ready (if not, dirty_tx will be pointing to it). This way, I always handle any skb that is passed to gfar_start_xmit()

See my comments to jamal as well: if you guarantee that you always have room on the DMA ring for an additional skb, that check should never be needed.

Some extremely cautious/paranoid programmers will add a check, with a printk noting it's a BUG (i.e. impossible) condition, such as

        if (unlikely(... no more descriptors ...)) {
                printk(KERN_ERR "%s: BUG: no room for TX\n", dev->name);
                netif_stop_queue();
                spin_unlock_irqrestore();
                return 1;
        }

        ... queue TX to hardware ...

        if (no more descriptors)
                netif_stop_queue()

        spin_unlock_irqrestore()


2) Also its pretty scary if you are doing:
txbdp->status |= TXBD_INTERRUPT for every packet.
Look at other drivers, they try to do this every few packets;
or enable tx mitigation to slow the rate  of those interupts.


I don't believe this is as dire as you think. This bit only indicates that, if the conditions are right, an interrupt will be generated once that descriptor is processed, and its data sent. Conditions which mitigate that are: 1) coalescing is on, and the timer and counter have not triggered the interrupt yet 2) NAPI is enabled, and so interrupts are disabled after the first packet arrives 3) NAPI is disabled, but the driver is currently handling a previous interrupt, so the interrupts are disabled for now.

Even at 10/100 speeds, you really don't want to be generating one interrupt per Tx...

        Jeff



<Prev in Thread] Current Thread [Next in Thread>