netdev
[Top] [All Lists]

Re: [announce] [patch] limiting IRQ load, irq-rewrite-2.4.11-B5

To: jamal <hadi@xxxxxxxxxx>
Subject: Re: [announce] [patch] limiting IRQ load, irq-rewrite-2.4.11-B5
From: Benjamin LaHaise <bcrl@xxxxxxxxxx>
Date: Mon, 1 Oct 2001 21:04:45 -0400
Cc: linux-kernel@xxxxxxxxxxxxxxx, kuznet@xxxxxxxxxxxxx, Robert Olsson <Robert.Olsson@xxxxxxxxxxx>, Ingo Molnar <mingo@xxxxxxx>, netdev@xxxxxxxxxxx
In-reply-to: <Pine.GSO.4.30.0110012018430.27922-100000@xxxxxxxxxxxxxxxx>; from hadi@xxxxxxxxxx on Mon, Oct 01, 2001 at 08:41:20PM -0400
References: <Pine.GSO.4.30.0110012018430.27922-100000@xxxxxxxxxxxxxxxx>
Sender: owner-netdev@xxxxxxxxxxx
User-agent: Mutt/1.2.5i
On Mon, Oct 01, 2001 at 08:41:20PM -0400, jamal wrote:
> 
> >The new mechanizm:
> >
> >- the irq handling code has been extended to support 'soft mitigation',
> >  ie. to mitigate the rate of hardware interrupts, without support from
> >  the actual hardware. There is a reasonable default, but the value can
> >  also be decreased/increased on a per-irq basis via
> > /proc/irq/NR/max_rate.
> 
> I am sorry, but this is bogus. There is no _reasonable value_. Reasonable
> value is dependent on system load and has never been and never
> will be measured by interupt rates. Even in non-work conserving schemes

It is not dependant on system load, but rather on the performance of the 
CPU and the number of interrupt sources in the system.

> There is already a feedback system that is built into 2.4 that
> measures system load by the rate at which the system processes the backlog
> queue. Look at netif_rx return values. The only driver that utilizes this
> is currently the tulip. Look at the tulip code.
> This in conjuction with h/ware flow control should give you sustainable
> system.

Not quite.  You're still ignoring the effect of interrupts on the users' 
ability to execute instructions during their timeslice.

> [Granted that mitigation is a hardware specific solution; the scheme we
> presented at the kernel summit is the next level to this and will be
> non-dependednt on h/ware.]
> 
> >(note that in case of shared interrupts, another 'innocent' device might
> >stay disabled for some short amount of time as well - but this is not an
> >issue because this mitigation does not make that device inoperable, it
> >just delays its interrupt by up to 10 msecs. Plus, modern systems have
> >properly distributed interrupts.)
> 
> This is a _really bad_ idea. not just because you are punishing other
> devices.

I'm afraid I have to disagree with you on this statement.  What I will 
agree with is that 10msec is too much.

> Lets take network devices as examples: we dont want to disable interupts;
> we want to disable offending actions within the device. For example, it is
> ok to disable/mitigate receive interupts because they are overloading the
> system but not transmit completion because that will add to the overall
> latency.

Wrong.  Let me introduce you to my 486DX/33.  It has PCI.  I'm putting my 
gige card into the poor beast.  transmitting full out, it can receive a 
sufficiently high number of tx done interrupts that it has no CPU cycles left 
to run, say, gated in userspace.

Falling back to polled operation is a well known technique in realtime and 
reliable systems.  By limiting the interrupt rate to a known safe limit, 
the system will remain responsive to non-interrupt tasks even under heavy 
interrupt loads.  This is the point at which a thruput graph on a slow 
machine shows a complete breakdown in performance, which is always possible 
on a slow enough CPU with a high performance device that takes input from 
a remotely controlled user.  This is *required*, and is not optional, and 
there is no way that a system can avoid it without making every interrupt 
a task, but that's a mess nobody wants to see in Linux.

                -ben


<Prev in Thread] Current Thread [Next in Thread>