netdev
[Top] [All Lists]

Re: ksoftirqd uses 99% CPU triggered by network traffic (maybe RLT-8139

To: Robert Olsson <Robert.Olsson@xxxxxxxxxxx>
Subject: Re: ksoftirqd uses 99% CPU triggered by network traffic (maybe RLT-8139 related)
From: Pasi Sjoholm <ptsjohol@xxxxxxxxx>
Date: Fri, 30 Jul 2004 21:40:28 +0300 (EEST)
Cc: Francois Romieu <romieu@xxxxxxxxxxxxx>, H?ctor Mart?n <hector@xxxxxxxxxxxxxx>, Linux-Kernel <linux-kernel@xxxxxxxxxxxxxxx>, <akpm@xxxxxxxx>, <netdev@xxxxxxxxxxx>, <brad@xxxxxxxxxx>, <shemminger@xxxxxxxx>
In-reply-to: <16650.21955.869485.332365@robur.slu.se>
Sender: netdev-bounce@xxxxxxxxxxx
>> I don't remember if I have said this but when the ksoftirqd has started to 
>> take all the cpu-time there is no way to stop it excluding booting 
>> computer. You can kill or stop all the processes which are taking your 
>> cpu-time (ie. source compiling) but network wont start to work or neither 
>> there is no free cpu-time for use because ksoftirqd won't stop eating it.
>  No you didn't then it seems you are hit by some bug. Have the bug happen.
>  Kill userland processes like as you did. Try find out what's running. I 
> would 
>  look in /proc/interrupt /proc/net/softnet_stat /proc/net/dev or maybe best 
> to 
>  take a profile if possible or Magic SysRq to find out what's looping. Also 
> try
>  ifconfig down.

Ok, now I know how to go back in the normal state of kernel operation. I 
just have to do "ifconfig eth0 down" and ksoftirqd stop taking all the 
cpu-time, after that I can do "ifconfig eth0 up" and everything is working 
fine.

So I guess it is 8139too-driver which has a bug.

First "SysRq : Show Regs" when the kernel is ok:

--
Pid: 0, comm:              swapper
EIP: 0060:[<c0101ed3>] CPU: 0
EIP is at default_idle+0x23/0x30
 EFLAGS: 00000246    Not tainted  (2.6.7-mm7)
EAX: 00000000 EBX: c03cc000 ECX: 00000000 EDX: 0006a546
ESI: 00099100 EDI: c0427120 EBP: 0046e007 DS: 007b ES: 007b
CR0: 8005003b CR2: 08101000 CR3: 1e9b7000 CR4: 000006d0
 [<c0101f3c>] cpu_idle+0x2c/0x40
 [<c03ce832>] start_kernel+0x162/0x180
 [<c03ce460>] unknown_bootoption+0x0/0x160
--

Second "SysRq : Show Regs" when the ksoftirqd has started to take all the 
cpu-time:

--
Pid: 2, comm:          ksoftirqd/0
EIP: 0060:[<e0871224>] CPU: 0
EIP is at rtl8139_poll+0xb4/0x100 [8139too]
 EFLAGS: 00000247    Not tainted  (2.6.7-mm7)
EAX: ffffe000 EBX: 00000040 ECX: dfa654f8 EDX: c0441978
ESI: dfa65400 EDI: e0868000 EBP: dff85f84 DS: 007b ES: 007b
CR0: 8005003b CR2: 080e9024 CR3: 1e9b7000 CR4: 000006d0
 [<c01194b0>] ksoftirqd+0x0/0xc0
 [<c02c5e3a>] net_rx_action+0x6a/0x110
 [<c011917d>] __do_softirq+0x7d/0x80
 [<c01191a6>] do_softirq+0x26/0x30
 [<c0119518>] ksoftirqd+0x68/0xc0
 [<c01276e5>] kthread+0xa5/0xb0
 [<c0127640>] kthread+0x0/0xb0
 [<c0102111>] kernel_thread_helper+0x5/0x14
--

Third one after I have done ifconfig eth0 down/up.

--
Pid: 0, comm:              swapper
EIP: 0060:[<c0101ed3>] CPU: 0
EIP is at default_idle+0x23/0x30
 EFLAGS: 00000246    Not tainted  (2.6.7-mm7)
EAX: 00000000 EBX: c03cc000 ECX: 00000000 EDX: 000cf65e
ESI: 00099100 EDI: c0427120 EBP: 0046e007 DS: 007b ES: 007b
CR0: 8005003b CR2: b7c5a000 CR3: 1e9b7000 CR4: 000006d0
 [<c0101f3c>] cpu_idle+0x2c/0x40
 [<c03ce832>] start_kernel+0x162/0x180
 [<c03ce460>] unknown_bootoption+0x0/0x160
--

I also compiled 8139too-driver with debub-options on and the logfile is 
included as attachment. 

Everything is going ok until end of the 20:41:29, then it happens:

--
20:41:29  kernel: rtl8139_rx: eth0: In rtl8139_rx(), current 004c BufAddr 
0c80, free to 003c, Cmd 0c.
20:41:29  kernel: rtl8139_interrupt: eth0: exiting interrupt, 
intr_status=0x0001.
20:41:29  kernel: rtl8139_rx: eth0: In rtl8139_rx(), current 11bc BufAddr 
22e0, free to 11ac, Cmd 0c.
20:41:29  kernel: rtl8139_interrupt: eth0: exiting interrupt, 
intr_status=0x0000.
20:41:29  last message repeated 4 times
20:41:29  kernel: rtl8139_rx: eth0: In rtl8139_rx(), current 2438 BufAddr 
242c, free to 2428, Cmd 0d.
20:41:29  kernel: rtl8139_interrupt: eth0: exiting interrupt, 
intr_status=0x0010.
20:41:29  kernel: rtl8139_rx: eth0: In rtl8139_rx(), current 2438 BufAddr 
242c, free to 2428, Cmd 0d.
20:41:29  kernel: rtl8139_interrupt: eth0: exiting interrupt, 
intr_status=0x0010.
20:41:29  kernel: rtl8139_rx: eth0: In rtl8139_rx(), current 2438 BufAddr 
242c, free to 2428, Cmd 0d.
20:41:29  kernel: rtl8139_interrupt: eth0: exiting interrupt, 
intr_status=0x0010.
20:41:29  kernel: rtl8139_rx: eth0: In rtl8139_rx(), current 2438 BufAddr 
242c, free to 2428, Cmd 0d.
--

The hardest part is to tell where the problem is but I think that 
rtl8139_poll-function would be good place to start looking for the bug?

readprofile didn't tell much.. Where to go next? I'm not so good 
programmer that I could find the right place to fix..

--
Pasi Sjöholm

Attachment: rtl8139-debug.gz
Description: GNU Zip compressed data

<Prev in Thread] Current Thread [Next in Thread>