On Sun, 2004-10-31 at 04:13, Evgeniy Polyakov wrote:
> On 30 Oct 2004 19:41:27 -0400
> jamal <hadi@xxxxxxxxxx> wrote:
> > Can you explain the "rate" or "speed" parameter ?
>
> The driver writer can set the "rate" parameter to any number from 0 to 64k -
> it indicates the speed of this driver in the current mode/type/operation.
[..]
>
> That means this driver performs DES 6 times faster than AES, but
> these should be fair numbers, somehow measured and comparable to other
> drivers.
>
So you have some init code that does testing? Or is this factor of six
part of the spec provided by the chip vendor?
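If it is init code, I would expect something along these lines - time one
operation of known size at probe and derive the rate from that. A rough
userspace sketch, every name in it is mine:

#include <stdint.h>
#include <time.h>

/* Hypothetical probe-time self-test: time one operation of a known
 * size and scale the result into the 0..64k "rate" units.
 * Plain userspace C, names invented, just to illustrate. */
static uint32_t measure_rate(int (*do_op)(void *buf, int len),
                             void *buf, int len)
{
        struct timespec start, end;
        long ns, rate;

        clock_gettime(CLOCK_MONOTONIC, &start);
        do_op(buf, len);                /* one encrypt of 'len' bytes */
        clock_gettime(CLOCK_MONOTONIC, &end);

        ns = (end.tv_sec - start.tv_sec) * 1000000000L
             + (end.tv_nsec - start.tv_nsec);
        if (ns <= 0)
                ns = 1;

        rate = (long)len * 1000 / ns;   /* bytes per microsecond */
        return rate > 65535 ? 65535 : (uint32_t)rate;
}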
> This can also be achieved by the qlen parameter - if the driver writer sets
> it to a bigger value, then in this mode the driver/hardware works faster.
> But a driver writer can set qlen to too big a value just because
> they want it to be so, without any regard for driver/hardware capabilities.
> It is not forbidden.
>
It is no different than, say, the way you would do ethernet drivers:
DMA ring sizes and link speeds. It is harder for ethernet drivers if link
speeds change (Linux net scheduling assumes a fixed speed ;->)
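i.e. the driver advertises both when it registers, something like this
(field names are mine, just to pin down the idea):

#include <stdint.h>

/* Hypothetical per-device capabilities the driver fills in when it
 * registers with the core. */
struct crypto_dev_caps {
        uint16_t rate;  /* 0..64k relative speed in the current mode */
        uint32_t qlen;  /* how many sessions the queue may hold      */
};

The core could then sanity-check one field against the other instead of
trusting a driver writer's optimism.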
> And the last and the most interesting one is the following:
> we create a per-session initialiser parameter "session_processing_time",
> which will be the sum of the time slices during which the driver performed
> operations on the session; since we already have the "scompleted" parameter,
> which is equal to the number of completed (processed) sessions, we then
> have the _fair_ speed of the driver/hardware in a given mode/operation/type.
>
> Of course the load balancer should select the device with the lowest
> session_processing_time/scompleted.
>
> I think the third variant is what we want. I will think about it some more
> and implement it soon.
>
I think you should be able to have multiple, configurable LB algos.
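i.e. an ops vector the core dispatches through, so your third variant
becomes just one selectable algorithm. A minimal sketch of that variant
(names invented):

#include <stdint.h>

/* Hypothetical per-device stats kept by the core. */
struct crypto_dev_stats {
        uint64_t session_processing_time; /* summed time slices */
        uint64_t scompleted;              /* sessions completed */
};

/* Pluggable load balancer: pick one of ndev devices. */
struct crypto_lb_ops {
        const char *name;
        int (*select)(struct crypto_dev_stats *devs, int ndev);
};

/* The "third variant": lowest processing_time / scompleted wins. */
static int lb_fair_speed(struct crypto_dev_stats *devs, int ndev)
{
        int i, best = 0;
        uint64_t best_cost = UINT64_MAX;

        for (i = 0; i < ndev; i++) {
                /* an untouched device has both stats at 0, so its
                 * cost is 0 and it gets tried first */
                uint64_t done = devs[i].scompleted ? devs[i].scompleted : 1;
                uint64_t cost = devs[i].session_processing_time / done;

                if (cost < best_cost) {
                        best_cost = cost;
                        best = i;
                }
        }
        return best;
}

static struct crypto_lb_ops lb_fair = {
        .name   = "fair-speed",
        .select = lb_fair_speed,
};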
> > I haven't studied your code; however, what Eugene is pointing out is
> > valuable detail/feedback.
> >
> > You should have, in your queuing towards the crypto chip, the ability to
> > batch, i.e. a sort of nagle-like "wait until we have 10 packets/20KB or 20
> > jiffies" before you send everything in the queue to the chip.
>
> That is exactly how a crypto driver should be written.
> The driver has its queue and the number of sessions in it, so it and only it
> can decide when to begin processing them in the most effective way.
>
This should be above driver, really.
You should have one or more queues that the scheduler feeds off of and
shoves to the hardware. Perhaps several levels:
->LB-+
     |
     +-> device1 scheduler --> driver queues/rings.
     |
     +-> device2 scheduler --> driver queues/rings.
     |
     .
     .
     +-> devicen scheduler --> driver queues/rings.
If you look at Linux traffic control, the stuff below the LB is how it
behaves. That should be generic enough to _not_ sit in the driver.
This allows for adding smart algorithms to it, e.g. QoS, rate limiting,
feedback to the LB so it can make smarter decisions, etc.
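To make the batching concrete: each device scheduler flushes its queue to
the chip once any threshold trips, nagle-style. A rough sketch using the
10 packets/20KB/20 jiffies numbers from above (names are mine):

/* Illustrative thresholds from the "10 packets/20KB/20 jiffies" example */
#define BATCH_MAX_SESSIONS 10
#define BATCH_MAX_BYTES    (20 * 1024)
#define BATCH_MAX_JIFFIES  20

struct dev_sched {
        unsigned int queued;       /* sessions sitting in the queue   */
        unsigned int bytes;        /* payload bytes queued            */
        unsigned long first_jiffy; /* when the first one was enqueued */
};

/* Called on enqueue and from a timer: flush the whole queue to the
 * chip once any one of the thresholds is crossed. */
static int batch_should_flush(struct dev_sched *s, unsigned long now)
{
        if (!s->queued)
                return 0;
        return s->queued >= BATCH_MAX_SESSIONS ||
               s->bytes  >= BATCH_MAX_BYTES    ||
               now - s->first_jiffy >= BATCH_MAX_JIFFIES;
}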
> > As he points out (and I am sure he can back it with data ;->), given
> > the setup cost, packet size, algo, and CPU and bus speed, it may
> > not make sense to use the chip at all ;->
>
> Michal has numbers - pure hardware beats soft in certain setups
> in a fully synchronous scheme; let's have SW and HW work in parallel.
>
> Of course SW can encrypt 64 bytes faster than they can be transferred
> to an old ISA crypto card, but it is worth doing for compressing a
> 9000-byte jumbo frame with LZW.
>
Would be interesting. I have seen the numbers from Eugene and they are
quite intriguing - but they are for the sync mode.
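The SW-vs-HW crossover could even be a measured decision instead of a
guess - a sketch of the cost model (pure illustration; the setup and
per-byte numbers would come from init-time measurement, not be hardcoded):

/* Illustrative cost model: HW pays a fixed setup + bus transfer,
 * SW pays mostly per-byte CPU time.  All rates would be measured
 * per device/algo at init. */
struct path_cost {
        unsigned long setup_ns;     /* session/DMA setup overhead */
        unsigned long ns_per_byte;  /* transfer + processing      */
};

static int prefer_hw(unsigned int len,
                     const struct path_cost *hw,
                     const struct path_cost *sw)
{
        unsigned long hw_ns = hw->setup_ns + (unsigned long)len * hw->ns_per_byte;
        unsigned long sw_ns = sw->setup_ns + (unsigned long)len * sw->ns_per_byte;

        return hw_ns < sw_ns;  /* 64-byte packets -> SW, jumbo -> HW */
}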
cheers,
jamal