
[PATCH] "lockless loopback" patch for 2.6.6

To: "David S. Miller" <davem@xxxxxxxxxx>
Subject: [PATCH] "lockless loopback" patch for 2.6.6
From: Arthur Kepner <akepner@xxxxxxx>
Date: Fri, 21 May 2004 14:04:09 -0700
Cc: netdev@xxxxxxxxxxx
In-reply-to: <Pine.SGI.4.56.0405121256510.7328714@xxxxxxxxxxxxxxxxxxx>
References: <Pine.SGI.4.56.0405111251080.7038576@xxxxxxxxxxxxxxxxxxx> <20040512120810.464aaee6.davem@xxxxxxxxxx> <Pine.SGI.4.56.0405121256510.7328714@xxxxxxxxxxxxxxxxxxx>
Sender: netdev-bounce@xxxxxxxxxxx
Lock contention on the loopback device can lead to poor
performance, or even an essentially hung system, on machines
with many processors.

For the loopback device, the only purpose that locking serves
is to protect the device statistics. The attached patch
keeps per-cpu statistics for the loopback device and removes
all locking. The patch is against 2.6.6.
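
Since the patch itself is attached rather than inlined, here is a
rough sketch of the idea against the 2.6-era driver API. This is
illustrative only; names like loopback_stats and the exact driver
hooks are stand-ins for whatever the attached patch actually uses:

#include <linux/netdevice.h>
#include <linux/percpu.h>
#include <linux/skbuff.h>
#include <linux/smp.h>
#include <linux/cpumask.h>
#include <linux/string.h>

/* One statistics block per cpu. Each cpu only ever updates its
 * own copy, so no lock is needed on the transmit path. */
static DEFINE_PER_CPU(struct net_device_stats, loopback_stats);

static int loopback_xmit(struct sk_buff *skb, struct net_device *dev)
{
	struct net_device_stats *stats;

	/* get_cpu() disables preemption, so we cannot migrate to
	 * another cpu while updating this cpu's private counters. */
	stats = &per_cpu(loopback_stats, get_cpu());
	stats->tx_packets++;
	stats->tx_bytes += skb->len;
	stats->rx_packets++;
	stats->rx_bytes += skb->len;
	put_cpu();

	/* (skb fixup and checksum handling omitted) */
	netif_rx(skb);
	return 0;
}

/* Readers pay instead of writers: sum the per-cpu counters each
 * time statistics are requested. A static aggregate is enough
 * for a sketch. */
static struct net_device_stats *loopback_get_stats(struct net_device *dev)
{
	static struct net_device_stats stats;
	int i;

	memset(&stats, 0, sizeof(stats));
	for (i = 0; i < NR_CPUS; i++) {
		struct net_device_stats *s;

		if (!cpu_possible(i))
			continue;
		s = &per_cpu(loopback_stats, i);
		stats.tx_packets += s->tx_packets;
		stats.tx_bytes   += s->tx_bytes;
		stats.rx_packets += s->rx_packets;
		stats.rx_bytes   += s->rx_bytes;
	}
	return &stats;
}

The trade-off is that reading the statistics (e.g. via
/proc/net/dev) now walks every possible cpu to sum the counters,
but reads are rare compared to packet transmits, which now touch
only cpu-local memory and take no lock at all.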


Worst-case loopback lock contention (2.4 kernel)
================================================

The following 'lockstat' data was taken with a 2.4 kernel
(though I expect similar results with 2.6).

The test scenario is that N transmitters simultaneously send
to N receivers over the loopback device, on an N-cpu system.
(This was an attempt to simulate a hang seen at a customer site
on a large system; a rough sketch of such a test follows.)
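
The original test program isn't included in this mail, so the
following is a hypothetical stand-in, plain sockets only, which
forks N sender/receiver pairs blasting data across 127.0.0.1
(the base port, buffer size, and structure are all arbitrary):

/* hypothetical reproducer: N senders -> N receivers over loopback */
#include <arpa/inet.h>
#include <netinet/in.h>
#include <stdlib.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/wait.h>
#include <unistd.h>

#define BASE_PORT 9000			/* arbitrary */

static void receiver(int port)
{
	struct sockaddr_in sa;
	char buf[65536];
	int lfd, fd;

	lfd = socket(AF_INET, SOCK_STREAM, 0);
	memset(&sa, 0, sizeof(sa));
	sa.sin_family = AF_INET;
	sa.sin_addr.s_addr = htonl(INADDR_LOOPBACK);
	sa.sin_port = htons(port);
	bind(lfd, (struct sockaddr *)&sa, sizeof(sa));
	listen(lfd, 1);
	fd = accept(lfd, NULL, NULL);
	while (read(fd, buf, sizeof(buf)) > 0)
		;			/* sink everything */
}

static void sender(int port)
{
	struct sockaddr_in sa;
	char buf[65536];
	int fd;

	memset(buf, 0, sizeof(buf));
	memset(&sa, 0, sizeof(sa));
	sa.sin_family = AF_INET;
	sa.sin_addr.s_addr = htonl(INADDR_LOOPBACK);
	sa.sin_port = htons(port);
	for (;;) {			/* retry until receiver listens */
		fd = socket(AF_INET, SOCK_STREAM, 0);
		if (connect(fd, (struct sockaddr *)&sa, sizeof(sa)) == 0)
			break;
		close(fd);
		usleep(10000);
	}
	for (;;)			/* blast data until interrupted */
		write(fd, buf, sizeof(buf));
}

int main(int argc, char **argv)
{
	int i, n = (argc > 1) ? atoi(argv[1]) : 2;	/* N pairs */

	for (i = 0; i < n; i++) {
		if (fork() == 0) {
			receiver(BASE_PORT + i);
			exit(0);
		}
		if (fork() == 0) {
			sender(BASE_PORT + i);
			exit(0);
		}
	}
	while (wait(NULL) > 0)		/* run until ^C kills the group */
		;
	return 0;
}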

 N    SPINLOCKS        HOLD              WAIT
      UTIL   CON    MEAN(   MAX )    MEAN(   MAX )(% CPU)  NOWAIT  SPIN   NAME
------------------------------------------------------------------------------
 2    6.0%   3.9%    48us(1397us)    74us(1172us)(0.18%)   96.1%   3.9%   loopback_dev+0x1c0
 8   17.0%  19.3%   137us( 117ms)   841us( 118ms)( 2.5%)   80.7%  19.3%   loopback_dev+0x1c0
31*  72.2%  88.4%  1262us(  73ms)    25ms( 814ms)(39.7%)   11.6%  88.4%   loopback_dev+0x1c0
64   74.7%  89.9%  2814us(1068ms)    85ms(4925ms)(31.5%)   10.1%  89.9%   loopback_dev+0x1c0

(* yes, 31, one processor was down on a 32p system)

In the 64p case, the system is essentially unusable for
interactive processes when running this test case.

Throughput on the loopback device also scales very poorly.
In fact, if more than approximately 16 processors are used,
throughput decreases as more processors are added.

With (the 2.4 version of) the patch, the system remains quite
usable during the same test scenario and throughput scales
much better.

--

Arthur

Attachment: lockless_loopback.patch
Description: lockless loopback patch
