
Re: [Lse-tech] fwd: Process Pinning

To: Gerrit Huizenga <gerrit@xxxxxxxxxx>
Subject: Re: [Lse-tech] fwd: Process Pinning
From: Andrew Morton <andrewm@xxxxxxxxxx>
Date: Sun, 24 Dec 2000 15:21:58 +1100
Cc: timw@xxxxxxxxx, Andi Kleen <ak@xxxxxxx>, Tim Hockin <thockin@xxxxxxxxxx>, npollitt@xxxxxxx, lse-tech@xxxxxxxxxxxxxxxxxxxxx, slinx@xxxxxxxxxxxx, netdev@xxxxxxxxxxx
References: Your message of Thu, 21 Dec 2000 16:48:48 PST. <20001221164848.A1091@scutter.internal.splhi.com> <200012230052.eBN0q6E07065@eng2.sequent.com>
Sender: owner-netdev@xxxxxxxxxxx
Gerrit Huizenga wrote:
> 
> Andi Kleen wrote:
> 
> > You think it wouldn't help for database servers ?  (e.g. with a NIC and a
> > SCSI controller per CPU)
> >
> > -Andi
> 
> I think NIC to CPU binding would simply increase the latency problem
> for interrupt delivery in this case.  Allowing the APIC to direct an
> interrupt to the first available CPU decreases the average interrupt
> delivery latency.  And, I'd guess that the interrupt latency more
> likely governs the throughput than the sharing of a few cache lines
> (1/nprocessors) of the time on most modern SMP systems.
> 
> Of course, this depends a lot on how long a lock is held (lock_irq()).
> I had heard that the number of instructions for which a lock is
> generally held in Linux was *very* small, although some code I've
> looked at doesn't seem to bear that out (at least not any more).
> 


I had a few wild thoughts on this topic earlier in the year.  I haven't
had a chance to do anything with them because people keep on putting
bugs in the kernel :)

* presume that interrupts are wickedly expensive and we want to
  minimise them.  This is more relevant to low-end (100mbit) NICs.

* presume that cross-CPU traffic and cache misses are expensive, and
  we want to optimise for these.

Some avenues for investigation:

* Disable the NIC's interrupts at the hardware level when we're doing
  receive processing.

  This would be a big performance win on uniprocessor - there's no
  *point* in taking the Rx interrupt when we're doing protocol
  processing - we're just going to queue the packet and go back to
  protocol processing.

  I think it's also a performance win on SMP.  If we're using
  NIC->CPU bonding then it's basically a UP problem anyway.

  So it's better to disable the Rx interrupts at the end of the Rx
  ISR if we have sent something to netif_rx().  At the end of
  net_rx_action() processing we call back into the driver to see if it
  has more Rx frames available.  If there are, well, we just process
  them as well, still with hardware interrupts disabled.  This is
  super-quick.  If there aren't any Rx packets available, turn on
  Rx interrupts.

  Note that this magically fixes the SMP packets-out-of-order problem
  as well, independent of any NIC<->CPU bonding.

  We lose the ability to deliver an incoming packet to a different
  socket on a different CPU while we're doing protocol processing, but
  is that capability valuable, or a net loss?  A rough sketch of the
  Rx side follows.
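
  To make that concrete, here's roughly what the driver side might
  look like.  Everything prefixed hyp_ is invented hardware glue, and
  the poll callback which net_rx_action() would make into the driver
  doesn't exist yet - that is the piece of core surgery this scheme
  needs:

#include <linux/netdevice.h>
#include <linux/etherdevice.h>
#include <linux/skbuff.h>

struct hyp_priv;                        /* driver-private state (opaque) */
extern int  hyp_rx_frame_ready(struct hyp_priv *hp);
extern struct sk_buff *hyp_take_rx_frame(struct hyp_priv *hp);
extern void hyp_mask_rx_irq(struct hyp_priv *hp);   /* touches NIC regs */
extern void hyp_unmask_rx_irq(struct hyp_priv *hp);

/* Drain the Rx ring into netif_rx().  Returns nonzero if anything
 * was queued. */
static int hyp_rx_drain(struct net_device *dev)
{
        struct hyp_priv *hp = dev->priv;
        int queued = 0;

        while (hyp_rx_frame_ready(hp)) {
                struct sk_buff *skb = hyp_take_rx_frame(hp);

                skb->dev = dev;
                skb->protocol = eth_type_trans(skb, dev);
                netif_rx(skb);          /* queue for protocol processing */
                queued = 1;
        }
        return queued;
}

/* The Rx half of the interrupt handler: once we have fed the stack,
 * shut the Rx interrupt up. */
static void hyp_rx_isr(struct net_device *dev)
{
        if (hyp_rx_drain(dev))
                hyp_mask_rx_irq(dev->priv);
}

/* Hypothetical hook, called from the end of net_rx_action(): either
 * find more frames or re-arm the Rx interrupt. */
static int hyp_poll_rx(struct net_device *dev)
{
        int queued = hyp_rx_drain(dev);

        if (!queued)
                hyp_unmask_rx_irq(dev->priv);
        return queued;
}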

* Disable Tx interrupt altogether.  Gone. Dead.

  Instead, do the tx descriptor reaping within the driver's
  start_xmit method.  Also within the (now very occasional) Rx
  interrupt.

  This would have to be backed up with a timer of some sort.  I
  expect that a one millisecond timer would be sufficiently short to
  avoid screwing up TCP.  You'd keep pushing it back in time each time
  you reaped some Tx descriptors, so under heavy load it would never
  fire.

  If the timer _does_ fire then you can assume that there isn't much
  network load, and it may be best to re-enable Tx interrupts just so
  you can turn the 1 kHz timer interrupt off.  A sketch of the Tx side
  follows.
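
  A sketch of the Tx side, using the same invented hyp_ glue.  One
  caveat: with the stock HZ=100 the shortest kernel timer is 10ms
  (jiffies + 1), so a true one-millisecond timer would need a finer
  clock source:

#include <linux/sched.h>        /* jiffies */
#include <linux/timer.h>

extern int  hyp_tx_desc_done(struct hyp_priv *hp);
extern void hyp_free_tx_desc(struct hyp_priv *hp);  /* kfree_skb() etc */
extern void hyp_queue_tx_frame(struct hyp_priv *hp, struct sk_buff *skb);
extern void hyp_mask_tx_irq(struct hyp_priv *hp);
extern void hyp_unmask_tx_irq(struct hyp_priv *hp);

static struct timer_list hyp_tx_timer;  /* init_timer() done at open */

static void hyp_reap_tx(struct hyp_priv *hp)
{
        while (hyp_tx_desc_done(hp))
                hyp_free_tx_desc(hp);

        /* Push the safety net back; under load it never fires. */
        mod_timer(&hyp_tx_timer, jiffies + 1);
}

static void hyp_tx_timer_fn(unsigned long data)
{
        struct hyp_priv *hp = (struct hyp_priv *)data;

        while (hyp_tx_desc_done(hp))
                hyp_free_tx_desc(hp);

        /* The timer fired, so load is light: fall back to Tx
         * interrupts and leave the timer off. */
        hyp_unmask_tx_irq(hp);
}

static int hyp_start_xmit(struct sk_buff *skb, struct net_device *dev)
{
        struct hyp_priv *hp = dev->priv;

        hyp_mask_tx_irq(hp);    /* under load again: no Tx interrupts */
        hyp_reap_tx(hp);        /* reap while we're here anyway */
        hyp_queue_tx_frame(hp, skb);
        return 0;
}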

* Poll for Tx descriptor reaping in the Rx interrupt.  Poll for Rx
  packets in the start_xmit method.  Save interrupts.  With the above
  two tricks we get *zero* interrupts per packet under heavy load;
  the glue is sketched below.
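
  The glue for that is just a couple of extra calls in the paths
  sketched above, reusing the same hypothetical names:

/* Tx reaping piggybacks on the (now rare) Rx interrupt, and the
 * transmit path picks up any waiting Rx frames. */
static void hyp_isr(struct net_device *dev)
{
        hyp_reap_tx(dev->priv);
        hyp_rx_isr(dev);
}

static int hyp_start_xmit_polling(struct sk_buff *skb,
                                  struct net_device *dev)
{
        hyp_poll_rx(dev);
        return hyp_start_xmit(skb, dev);
}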

"Ah-ha!", you say, "what about latency?".  Well, yes, this scheme
introduces up to one millisecond latency in the very specific case
where traffic is falling from a high level to a low one, which may
make it inappropriate for some classes of LAN application, but I suspect
that the effects will be low.  Plus there are a number of things here
which *decrease* latency, such as reducing the interrupt count under
load.

* Dynamic interrupt bonding.

  Some very brief testing on a 2-way indicates that TCP is a little
  more efficient when you hardwire the NIC to the CPU.

  I was thinking of a simple heuristic where you simply keep track of
  which CPU sends the most packets in a one-second time period.  At the
  end of that period, subject to some hysteresis and thresholding, bond
  the NIC's interrupt to that CPU.  Repeat each second.

  This assumes that a preponderance of the Tx packet count correlates
  with a preponderance of the Rx packet count, which seems fairly sane
  to me.

  Note that this scheme (and many other bonding schemes) will come
  horridly unstuck if multiple NICs are sharing the same interrupt!
  Don't do that.  A sketch of the heuristic follows.
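
  The heuristic might look something like this, run once a second from
  a timer.  The per-CPU counters and hyp_set_irq_affinity() are
  invented - the latter is exactly the arch-neutral API I'm arguing
  for below:

#include <linux/smp.h>
#include <linux/string.h>

#define REBOND_MIN_PKTS 1000    /* below this rate, don't bother moving */

extern int hyp_set_irq_affinity(unsigned int irq, unsigned long cpumask);

static unsigned long hyp_tx_pkts[NR_CPUS];      /* bumped in xmit path */
static int hyp_bound_cpu = -1;

static void hyp_rebond(unsigned int irq)
{
        int cpu, best = 0;

        for (cpu = 1; cpu < smp_num_cpus; cpu++)
                if (hyp_tx_pkts[cpu] > hyp_tx_pkts[best])
                        best = cpu;

        /* Threshold plus a crude 2:1 hysteresis so we don't flap. */
        if (best != hyp_bound_cpu &&
            hyp_tx_pkts[best] > REBOND_MIN_PKTS &&
            (hyp_bound_cpu < 0 ||
             hyp_tx_pkts[best] > 2 * hyp_tx_pkts[hyp_bound_cpu])) {
                hyp_set_irq_affinity(irq, 1UL << best);
                hyp_bound_cpu = best;
        }

        memset(hyp_tx_pkts, 0, sizeof(hyp_tx_pkts));    /* next period */
}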

One thing which concerns me about _any_ scheme which involves dynamic
APIC reprogramming is that weird things are likely to happen if we
reprogram APICs when we're under load.  PCs are crap, and we're
already subject to a worrisome number of strange APIC problems.
Trying to give the APIC a brain transplant while it is handling 5,000
interrupts per second seems like a recipe for problems.

Last time I looked, Alphas didn't have APICs.  We need to design a
sensible architecture-neutral interrupt bonding API (or at least a
queryable one) before we run off making x86-specific changes.  A
sketch of what that might look like follows.
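
Something as small as this would do for a first cut - again, these
declarations are invented for illustration, nothing like them exists
today:

/* Bind irq to the CPUs set in cpumask.  Returns 0, or -ENOSYS on
 * architectures (Alpha, today) which cannot steer interrupts. */
int hyp_set_irq_affinity(unsigned int irq, unsigned long cpumask);

/* Query the current binding, so callers can discover up front whether
 * steering is available at all before building policy on top of it. */
int hyp_get_irq_affinity(unsigned int irq, unsigned long *cpumask);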


As a footnote, and I know this won't be a popular view on lse-tech -
philosophically speaking I believe that 2.4 has given enough to the
big-end guys.  I hope that in 2.5, more emphasis and kernel developer
talent will be devoted to the other 99.99% of Linux users.  Better
device support, plug-n-play, manageability, upgradability, etc.  Linux
seems to be becoming more and more a server OS lately and I'd like to
see that turned around.

Of course the three-letter corps need the scalability.  Good luck to
them and thanks for supporting Linux.  For the privateers, yes, it's
*fun* to make Linux faster and it is gratifying, but we need to be
aware that it is also *easy*.  Solving the problems which are faced by
the wider community of Linux users is going to be dull, and hard.


