On Thu, 30 Mar 2000 kuznet@xxxxxxxxxxxxx wrote:
> > We can now get rid of Space.c, since no current documentation advises people
..
> I tried to kill it while 2.1 and failed miserably by the same reason. 8)
> Too much of hacks... It was difficult to clean this without harm.
Look at the note in net_init.c, which I created to replace Space.c. For
those that don't get the reference "pl14" refers to kernel 0.99pl14, i.e.
0.0.99-pre14 !
Once an bad interface is put into place, it's very difficult to change.
For instance, it's far easier to update a bug-free library function than to
fix buggy version because someone might be depending on the bugs. Further,
the cruftier the interface the more people rely on out-of-interface hooks.
> Yes, Space.c must die.
We should lead a chant.. "Space.c must die. Space.c must die.." Yet people
want to change non-broken things first.
[[ PCI access functions. ]]
> Yes. It was true in 2.2 and it should be still true in 2.3.
> In current 2.3 only calls made from networking layer are made
> not under big kernel lock.
This is an important detail: no PCI config modifications in a running
driver, only during initialization and after shutdown.
> > Not recommended? But look at what the only guide, the driver conversions,
> > have as a de facto recommendation -- a horrible spinning waste of time, the
> > cost of which is hidden because it's difficult to measure.
>
> No, Donald. It does not... Well, it was not supposed to tell this at least.
> Probably, it is badly written.
Yes. The de facto guide, the existing SMP changes, are inefficient.
(And "it's not my fault". I know I'm sounding like a broken record.)
> > The existing practice makes it appear that the driver must always protect
> > itself against simultaneous transmission attempts from multiple processors.
...
> So, driver is deemed either to serialize itself via tbusy (as tulip did),
> or to use TX lock (as eepro100 did).
Historical note: The dev->tbusy flag turned into a lock because of a bug.
Timer-based retransmission were not added to the Tx queue, but instead went
through the normal queueing path during the timer IRQ. So if there was no
Tx packet in the queue, it was sent to the driver immediately, even if the
timer IRQ was interrupting the queue output routine!
The cost of this extra lock in each driver was mitigated (but not overcome)
by permitting less expensive, non-strict queue locking and using the Tx
attempts that slipped through as a watchdog timer.
> The first is cheaper from performance viewpoint, but it is really
> complicated. It requires understanding CPU ordering, memory barriers etc.
> We cannot force people to use this, actually the only known example
> of flawlessly working driver using this approach was your tulip.
> But I am afraid even it will not work on smp alpha or ultra,
> due to absence of cpu ordering there.
The tricky part for avoiding further locking is making certain that
descriptor index entries are integers that may be atomically incremented WRT
to the Tx queuing.
In the Tx packet routine
entry = tp->cur_tx % TX_RING_SIZE;
... queue a packet to the 'entry' slot
tp->cur_tx++;
In the interrupt (Tx clean-up) routine
for (dirty_tx = tp->dirty_tx; tp->cur_tx - dirty_tx > 0; dirty_tx++) {
This should *not* require locking on any architecture.
What may require locking is when the Tx queue is shared by the Rx filter
setup logic, as on the Tulip. The set_rx_mode() code must either be able to
spin on the Tx queue lock or otherwise have a way to be Tx-serialized.
Setting the Rx filter mode is very rare compared to sending packets, so
adding a new Tx lock for this is a bad design.
> > > RX locking is made automatically by IRQ serialization logic,
> > > so that the question occurs only when driver needs to make something
> > > serialized with RX interrupts itself. No help from top level
> > > is possible here, it is an internal driver problem and driver
> > > have to create its private locks, if it is necessary,
> > > or, alternatively, to disable device irqs, while doing dangerous job.
Disabling the device's IRQ generation is not valid in a shared IRQ
environment. Doing this used to be a common technique in other OSes, and
even some types of Linux device drivers, but I always avoided it.
The only reasonable approach is disabling all IRQs or selectively disabling
one IRQ chain. The cli() approach used to be very cheap, and is now
expensive only on SMPs. Disabling an IRQ chain used to be very expensive,
and is only slightly less expensive now.
> > For instance, there was a bug in (IIRC) early 2.2 kernels where an interrupt
> > handler would be re-entered on SMP systems. It took a long time to track
> > this down. The reaction I got (which was later corrected) was that it was
> > perfectly reasonable for two processors to be running the interrupt handler
> > simultaneously, and that driver should add locks to make certain that this
> > worked. Yes, that could be made to work, but it would add horrible locking
> > overhead for the very rare case where more than one processor could do work.
>
> It was _my_ statement. I was beated and recognized the defeat. 8)
I wasn't naming names.. (And I don't think that you were responsible for
the original bug, just for not seeing that the semantics that it created
were unreasonable.)
> [ Though to be honest, I still suspect sometimes that it was right way.
> Look at current IRQ engine, it is _very_ complicated due to this
> service. It was not easy to recognize that such overhead
> at the deepest level is good. But all this is history in any case.
> ]
That way creates continuously costly overhead for the rare case that two
processors would try to handle the same interrupt. And in that rare case
you would end up with two processors vying for a single unit of work, or
both handling work while pounding on each others cache lines. It sounds
so elegant, having multiple processors leaping up to handle an interrupt
like men leaping to open a door for a girl, but you get the same low
efficiency.
Donald Becker
Scyld Computing Corporation, becker@xxxxxxxxx
|