Re: [vortex] can't unload module

To: Andrew Morton <andrewm@xxxxxxxxxx>
Subject: Re: [vortex] can't unload module
From: David Fries <dfries@xxxxxxx>
Date: Sun, 24 Sep 2000 00:46:17 -0500
Cc: vortex@xxxxxxxxx, netdev@xxxxxxxxxxx
In-reply-to: <39CCA283.74E63C9A@uow.edu.au>; from andrewm@uow.edu.au on Sat, Sep 23, 2000 at 11:30:59PM +1100
References: <20000921002752.A10927@d-131-151-189-65.dynamic.umr.edu> <39CB524E.2EB250E0@uow.edu.au>, <39CB524E.2EB250E0@uow.edu.au>; <20000922153900.A24813@d-131-151-189-65.dynamic.umr.edu> <39CCA283.74E63C9A@uow.edu.au>
Sender: owner-netdev@xxxxxxxxxxx
User-agent: Mutt/1.2.5i
I've included my latest patch.  I've narrowed it down to
boomerang_rx: if the entry before vp->cur_rx has a non-zero status
field, I start at vp->cur_rx, skip the entries whose status says they
have completed uploading, and set the status fields of all the
remaining entries to zero.
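
A minimal user-space sketch of the workaround described above may make
the ring walk easier to follow.  It is only a guess at the shape of the
logic, not the attached patch.  The ring size of 32, the 0x8000
"upload complete" bit, and the names rx_ring_state and resync_rx_ring
are assumptions made for illustration; the real driver also deals with
DMA and byte order, which are left out here.

```c
#include <stdint.h>

#define RX_RING_SIZE   32           /* assumed ring size */
#define RX_D_COMPLETE  0x00008000u  /* assumed "upload complete" status bit */

/* Simplified descriptor: only the status word matters for this sketch. */
struct rx_desc {
    uint32_t status;
};

/* Hypothetical stand-in for the driver's private state. */
struct rx_ring_state {
    struct rx_desc rx_ring[RX_RING_SIZE];
    unsigned int cur_rx;            /* next entry the driver expects to read */
};

/*
 * If the entry just before cur_rx has a non-zero status, walk the ring
 * starting at cur_rx, skip the leading entries marked complete, and
 * zero the status of every remaining entry (including the stale one).
 * Returns how many non-zero status words were thrown away.
 */
static unsigned int resync_rx_ring(struct rx_ring_state *vp)
{
    unsigned int prev = (vp->cur_rx + RX_RING_SIZE - 1) % RX_RING_SIZE;
    unsigned int off = 0;
    unsigned int cleared = 0;

    if (vp->rx_ring[prev].status == 0)
        return 0;                   /* ring looks consistent, nothing to do */

    /* Skip entries that have finished uploading; they still hold packets. */
    while (off < RX_RING_SIZE &&
           (vp->rx_ring[(vp->cur_rx + off) % RX_RING_SIZE].status &
            RX_D_COMPLETE))
        off++;

    /* Clear everything else, wrapping around to the entry before cur_rx. */
    for (; off < RX_RING_SIZE; off++) {
        struct rx_desc *d = &vp->rx_ring[(vp->cur_rx + off) % RX_RING_SIZE];

        if (d->status != 0)
            cleared++;
        d->status = 0;
    }
    return cleared;
}
```

Note that whatever was in the cleared entries is dropped, which is the
"just throwing it away" trade-off mentioned below.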

I don't know enough to tell if there is data in the entry that has a
status field, so yes I'm just throwing it away.

For me this works: I haven't had networking go down with this patch.
I am now getting 'NETDEV WATCHDOG: eth0: transmit timed out', which I
hadn't gotten before, but it's possible that networking used to go
down before it got to that point.

Here are some of my kernel log messages with this patch.

Sep 23 22:52:14 spacedout kernel: bucket 0, status 0x0
Sep 23 22:52:14 spacedout kernel: bucket 1, status 0x600085ea
Sep 23 22:52:14 spacedout kernel: vp->cur_rx 2
Sep 23 22:52:14 spacedout kernel: bucket 0, status 0x0
Sep 23 22:52:14 spacedout kernel: bucket 1, status 0x600085ea
Sep 23 22:52:14 spacedout kernel: vp->cur_rx 2
Sep 23 22:52:14 spacedout kernel: bucket 22, status 0x0
Sep 23 22:52:14 spacedout kernel: bucket 23, status 0x600085ea
Sep 23 22:52:14 spacedout kernel: vp->cur_rx 24
Sep 23 22:52:14 spacedout kernel: bucket 10, status 0x0
Sep 23 22:52:14 spacedout kernel: bucket 11, status 0x600085ea
Sep 23 22:52:14 spacedout kernel: vp->cur_rx 12
Sep 23 22:52:15 spacedout kernel: bucket 16, status 0x0
Sep 23 22:52:15 spacedout kernel: bucket 17, status 0x600085ea
Sep 23 22:52:15 spacedout kernel: vp->cur_rx 18
Sep 23 22:52:17 spacedout kernel: bucket 2, status 0x0
Sep 23 22:52:17 spacedout kernel: bucket 3, status 0x600085ea
Sep 23 22:52:17 spacedout kernel: vp->cur_rx 4
Sep 23 22:52:17 spacedout kernel: bucket 0, status 0x0
Sep 23 22:52:17 spacedout kernel: bucket 1, status 0x600085ea
Sep 23 22:52:17 spacedout kernel: vp->cur_rx 2
Sep 23 22:52:19 spacedout kernel: bucket 2, status 0x0
Sep 23 22:52:19 spacedout kernel: bucket 3, status 0x600085ea
Sep 23 22:52:19 spacedout kernel: vp->cur_rx 4
Sep 23 22:52:19 spacedout kernel: bucket 12, status 0x0
Sep 23 22:52:19 spacedout kernel: bucket 13, status 0x600085ea
Sep 23 22:52:19 spacedout kernel: vp->cur_rx 14
Sep 23 22:52:19 spacedout kernel: bucket 0, status 0x0
Sep 23 22:52:19 spacedout kernel: bucket 1, status 0x600085ea
Sep 23 22:52:19 spacedout kernel: vp->cur_rx 2
Sep 23 22:52:20 spacedout kernel: bucket 10, status 0x0
Sep 23 22:52:20 spacedout kernel: bucket 11, status 0x600085ea
Sep 23 22:52:20 spacedout kernel: vp->cur_rx 12
Sep 23 22:52:20 spacedout kernel: bucket 16, status 0x0
Sep 23 22:52:20 spacedout kernel: bucket 17, status 0x600083d2
Sep 23 22:52:20 spacedout kernel: vp->cur_rx 18

Does that give you any clue where I should look more for the real
cause?  Looks like an off-by-one error, but I wouldn't know how that
would happen.
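
For reading the status words in the log, here is a small decoding
sketch.  The bit layout is an assumption recalled from the 3c59x
driver source (completion in bit 15, error in bit 14, packet length in
the low 13 bits, IP and TCP checksum-valid flags in bits 29 and 30) and
should be checked against the driver and the NIC documentation.  The
names rx_status_info and decode_rx_status are made up for illustration.

```c
#include <stdint.h>

/* Assumed bit layout of the upload descriptor status word. */
#define RX_D_COMPLETE   0x00008000u  /* upload finished */
#define RX_D_ERROR      0x00004000u  /* receive error */
#define RX_LEN_MASK     0x00001fffu  /* packet length in bytes */
#define IP_CSUM_VALID   (1u << 29)
#define TCP_CSUM_VALID  (1u << 30)

/* Hypothetical decoded view of one status word. */
struct rx_status_info {
    int complete;
    int error;
    unsigned int length;
    int ip_csum_valid;
    int tcp_csum_valid;
};

/* Split a raw status word into the fields assumed above. */
static struct rx_status_info decode_rx_status(uint32_t status)
{
    struct rx_status_info info;

    info.complete = (status & RX_D_COMPLETE) != 0;
    info.error = (status & RX_D_ERROR) != 0;
    info.length = status & RX_LEN_MASK;
    info.ip_csum_valid = (status & IP_CSUM_VALID) != 0;
    info.tcp_csum_valid = (status & TCP_CSUM_VALID) != 0;
    return info;
}
```

Under those assumptions, 0x600085ea decodes as a completed, error-free
upload of 1514 bytes with both checksum flags set, and 0x600083d2 as a
completed 978-byte one.  If that reading is right, the entry behind
vp->cur_rx would hold a full received frame rather than garbage.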

Instead of clearing the entries I tried the following, but something
didn't like it, because my machine froze in X as soon as I was
pounding the video card and network card at the same time.

/* it doesn't like this
entry--;
vp->cur_rx--;
*/

I'm wondering if the chipset just can't handle multiple devices trying
to do something at once.

On Sat, Sep 23, 2000 at 11:30:59PM +1100, Andrew Morton wrote:
> David Fries wrote:
> > 
> > 'net drop out' problem,
> > There are two stages, reduced network and no network.  For example
> > when I do a `ping -s 15000 aerospace` ping from spacedout (troubled
> > computer) to aerospace (another one), I'll get response times of
> > either 4ms or 3000ms.
> > 
> > When networking stops I don't get any packets received or interrupts,
> > but I am showing RX overruns incrementing.  When I ping from
> > spacedout, spacedout shows an arp request going out, aerospace sees
> > the arp request, but spacedout never sees the reply.
> 
> This is consistent with an interrupt controller failure.  However if
> this was the case you should be seeing "NETDEV WATCHDOG: eth0: transmit
> timed out" messages and "interrupt posted but not delivered" messages. 
> Are you sure you're not?

I'm not getting those messages.  Which interrupt controller are you
thinking of?  I think it was only interrupting when I sent packets.

> Another test: when spacedout is in this state, go to its console and
> ping another machine.  Watch /proc/interrupts to see if you're getting
> Tx interrupts.
> 
> If you are getting tx interrupts then perhaps the NIC is getting its
> registers unprogrammed, or perhaps the multicast filter has gone silly. 
> Try `ifconfig eth0 promisc'.
> 
> Or try a new PCI slot.

Did that.

> Or a new power supply.

I have changed power supplies because the fan was getting noisy, but I
don't remember when.

> Or a new computer.

I'd rather not.

> BTW, I'm currently typing on a K6-2 machine (wildly overclocked - this
> is my main workstation/router/firewall/server :)).  It's running
> 2.4.0-test8-pre1 with a 3c905B.  Solid as a rock.  Different motherboard
> manufacturer: Gigabyte.

I really don't see it being related to the processor; the motherboard
chipset, probably.

> > I'm not sure; I think it should work, but it would depend on your
> > mount options.
> 
> OK, I was asking because this problem is related to IP fragmentation,
> and I assume (perhaps wrongly) that if rsize and wsize are larger than
> your MTU, there will be a lot of fragmented packets.
> 
> > > Are you able to provide a set of steps with which others can reproduce
> > > this?
> > 
> > 'net drop out'
> > I'll just say no.  AeroSpace is running SMP, spacedout is not SMP.
> > AeroSpace is a dual Pentium MMX, Spacedout is a K6-2.  They have
> > basically identical network cards in them (3c905B), and I have swapped the
> > network cards in the past and the problems follow the computer not the
> > card.
> > 
> > I would suggest try getting a FIC VA 503+ motherboard, K6-2 processor,
> > 3c905B network card, go in X, have something rapidly updating the
> > video card (rxvt doing `locate \*` worked fine), and send a ton of
> > network data to the system at 100BaseT.
> 
> I just did that here:
> 
>       ping -q -f -s 64 -l 100000 bix
> 
> This caused `bix' to take a short trip to an alternate universe, but it
> recovered fine when I killed the ping.
> 
> > If you REALLY pulled my leg you might get me to put one of my Pentium
> > processors in the system, but I would rather not do that.
> 
> Sorry, I think you need to start swapping hardware in spacedout.  It's
> sick.

I would assume that means the motherboard, or a network card that
gets along with the motherboard.

> > I did,
> > insmod 3c59x
> > modprobe ne io=0x300 irq=111
> > ifconfig eth0 ...
> > ifconfig eth1 ...
> > ifconfig eth0 down
> > rmmod 3c59x
> > and it kept giving the 'unregister_netdevice' message over and over until
> > I rebooted.
> 
> I tried many combinations of this with an eepro100 and a 3c905C.
> Everything worked fine.  Sigh.
> 
> [ In reply to a later email ]
> 
> > What does rmmod and insmod do to the network card that vortex_down,
> > vortex_up doesn't?  Something is different.
> 
> All the stuff in vortex_probe1() is run at insmod-time only.  It's
> mainly driver data structure initialisation, but there's some hardware
> initialisation as well.
> 
> David, If this problem is purely exhibited on `spacedout' then it's
> quite possible that there are no software problems, although that
> unregister_netdevice problem sure looks like software to me...   My
> recommendation is to start swapping out hardware.  You get some
> amazingly weird stuff happening if the hardware is dodgy.

-- 
                +---------------------------------+
                |      David Fries                |
                |      dfries@xxxxxxx             |
                +---------------------------------+

Attachment: differences.patch.gz
Description: Binary data
