On Wed, Jan 31 2001, Rajagopal Ananthanarayanan wrote:
> Another datapoint. On a 2CPU 64MB system dbench (48 clients) would
> yield about 3.5-4.5 MB/sec on 2.4.0 ... with 2.4.1pre9 the same
> setup would yield about 5+ MB/sec. The tests were run on ext2.
>
> I too noticed that the individual threads were completing at about
> the same time in 2.4.1pre9, as opposed to some threads finishing
> really early in 2.4.0.

This just goes to show that the free request batching done with
blk-14 really helps, so even though we are much more latency-oriented
there are still setups where 2.4.1-xx beats the pants off 2.4.0.
Yet another small optimization, which helps (at least here) over
the approach that Linus wanted (...), is this:
--- /opt/kernel/linux-2.4.1/drivers/block/ll_rw_blk.c	Tue Jan 30 13:32:10 2001
+++ drivers/block/ll_rw_blk.c	Wed Jan 31 18:09:59 2001
@@ -628,11 +628,19 @@
 		    && atomic_read(&queued_sectors) < low_queued_sectors)
 			wake_up(&blk_buffers_wait);
 
+		if (!list_empty(&q->request_freelist[rw])) {
+			blk_refill_freelist(q, rw);
+			list_add(&req->table, &q->request_freelist[rw]);
+			if (waitqueue_active(&q->wait_for_request))
+				wake_up_nr(&q->wait_for_request, 2);
+			return;
+		}
+
 		/*
-		 * Add to pending free list and batch wakeups
+		 * free list is empty, add to pending free list and
+		 * batch wakeups
 		 */
 		list_add(&req->table, &q->pending_freelist[rw]);
-
 		if (++q->pending_free[rw] >= batch_requests) {
 			int wake_up = q->pending_free[rw];
 			blk_refill_freelist(q, rw);
which makes sure that we observe the batch count all the time, but
trickles wakeups if X sleepers didn't eat all the requests.
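
To make the two release paths easier to follow, here is a rough
user-space model of the logic, not kernel code. Everything named
toy_* is made up for illustration: plain counters stand in for the
request lists, the batch size of 4 is arbitrary, and the model
assumes that blk_refill_freelist() moves everything pending onto the
free list and that the batched path wakes as many sleepers as there
were pending requests. It also counts a wakeup request even when
nobody is sleeping, which the patch avoids via waitqueue_active().

```c
#define TOY_BATCH_REQUESTS 4	/* assumed batch size, illustration only */

/* Hypothetical model: counters stand in for the kernel's lists. */
struct toy_queue {
	int free;	/* requests on the free list */
	int pending;	/* released requests held back for batching */
	int wakeups;	/* sleepers we have asked to wake so far */
};

/* Assumption: a refill moves everything pending onto the free list. */
static void toy_refill(struct toy_queue *q)
{
	q->free += q->pending;
	q->pending = 0;
}

/* Take one request off the free list; 1 on success, 0 if it is empty. */
static int toy_alloc(struct toy_queue *q)
{
	if (q->free == 0)
		return 0;
	q->free--;
	return 1;
}

/* Release one request, following the two paths in the patch. */
static void toy_release(struct toy_queue *q)
{
	if (q->free > 0) {
		/*
		 * Sleepers left requests behind: put this one straight
		 * back and trickle a small wakeup (2, as in the patch).
		 */
		toy_refill(q);
		q->free++;
		q->wakeups += 2;
		return;
	}

	/* Free list is empty: hold the request until a batch is full. */
	q->pending++;
	if (q->pending >= TOY_BATCH_REQUESTS) {
		int wake_up = q->pending;

		toy_refill(q);
		q->wakeups += wake_up;	/* assumed: wake one per request */
	}
}
```

Starting from an empty queue, the first three toy_release() calls
only build up the pending count; the fourth fills the batch and
issues all four wakeups at once. If a single toy_alloc() then takes
one request, the next toy_release() sees a non-empty free list and
takes the trickle path instead.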
--
Jens Axboe