Received: with ECARTIS (v1.0.0; list netdev); Mon, 26 Jul 2004 17:17:18 -0700 (PDT) Received: from posti6.jyu.fi (posti6.jyu.fi [130.234.4.43]) by oss.sgi.com (8.13.0/8.13.0) with ESMTP id i6R0H9im028755 for ; Mon, 26 Jul 2004 17:17:12 -0700 Received: from silmu.st.jyu.fi (IDENT:Tmbksh7Ns+OYZjlJXL5vQOzOavu8e9vu@silmu.st.jyu.fi [130.234.4.64]) by posti6.jyu.fi (8.12.8/8.12.8/antispam) with ESMTP id i6R0Gqob005839; Tue, 27 Jul 2004 03:16:52 +0300 Date: Tue, 27 Jul 2004 03:16:50 +0300 (EEST) From: Pasi Sjoholm X-X-Sender: ptsjohol@silmu.st.jyu.fi To: Robert Olsson cc: Francois Romieu , H?ctor Mart?n , Linux-Kernel , , , , Subject: Re: ksoftirqd uses 99% CPU triggered by network traffic (maybe RLT-8139 related) In-Reply-To: <16645.34772.760879.146784@robur.slu.se> Message-ID: MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=iso-8859-1 Content-Transfer-Encoding: 8BIT X-Virus-Scanned: by amavisd-milter (http://www.amavis.org/) at posti6.jyu.fi; Tue, 27 Jul 2004 03:16:53 +0300 X-archive-position: 7179 X-ecartis-version: Ecartis v1.0.0 Sender: netdev-bounce@oss.sgi.com Errors-to: netdev-bounce@oss.sgi.com X-original-sender: ptsjohol@cc.jyu.fi Precedence: bulk X-list: netdev Content-Length: 2902 Lines: 88 On Tue, 27 Jul 2004, Robert Olsson wrote: >>> In summary: High softirq loads can totally kill userland. The reason is that >>> do_softirq() is run from many places hard interrupts, local_bh_enable etc >>> and bypasses the ksoftirqd protection. It just been discussed at OLS with >>> Andrea and Dipankar and others. Current RCU suffers from this problem as well. >> Ok, this explanation makes sense and my point of view I think this is >> quite critical problem if you can "crash" linux kernel just sending enough >> packets to network interface for an example. > > Well the packets also has to create hard softirq loads in practice this means route > lookup or something else for normal traffic the RX_SOFIRQ is very well behaved > and schedules itself to give other softirq's a chance to run also I'll guess you > have softirq's not only from the network. If you decrease your load a bit you come > to more nomal operation? The system will be more responsive but won't ever come back to normal operation and if I use my computer just for some chatting/writing/browsing the ksoftirqd-problem won't ever happen. It is also very hard to reproduce with only high network load. I ran some test with 100Mbps load (RX and TX both) for 24hours without any problems but soon as I started to compile some stuff or doing something cpu-intensive the system would do crash-boom-bang. >> I would be more than glad to help you in testing if you want to publish >> some patches. > Well maybe we should start to verify that this problem. Alexey did a litte program > doing gettimeofday to run to see how long user mode could be starved. Also note we > add new source of softirq's. First run: -- timestamp diff = 1, maxlat = 50963 timestamp diff = 0, maxlat = 50963 timestamp diff = 0, maxlat = 50963 timestamp diff = 1, maxlat = 50963 timestamp diff = 0, maxlat = 50963 timestamp diff = 0, maxlat = 50963 new maxlat = 124428 new maxlat = 511817 timestamp diff = 1, maxlat = 511817 new maxlat = 749913 new maxlat = 775142 **4581159 new maxlat = 4581159 **1704065 **1464153 timestamp diff = 1, maxlat = 4581159 **1448939 **1464013 **1021339 **1568588 **1861657 timestamp diff = 0, maxlat = 4581159 -- Second run: -- timestamp diff = 0, maxlat = 832003 timestamp diff = 1, maxlat = 832003 timestamp diff = 0, maxlat = 832003 timestamp diff = 0, maxlat = 832003 timestamp diff = 0, maxlat = 832003 timestamp diff = 1, maxlat = 832003 **1793951 new maxlat = 1793951 timestamp diff = 0, maxlat = 1793951 **2579893 new maxlat = 2579893 timestamp diff = 0, maxlat = 2579893 timestamp diff = 0, maxlat = 2579893 timestamp diff = 0, maxlat = 2579893 **1004449 **3199625 new maxlat = 3199625 timestamp diff = 0, maxlat = 3199625 timestamp diff = 1, maxlat = 3199625 -- When the system has gone to this state you will have to wait quite a long time to receive a new line. -- Pasi Sjöholm