netdev
[Top] [All Lists]

Csum and csum copyroutines benchmark

To: Russell King <rmk@xxxxxxxxxxxxxxxx>, Roy Sigurd Karlsbakk <roy@xxxxxxxxxxxxx>, netdev@xxxxxxxxxxx, Kernel mailing list <linux-kernel@xxxxxxxxxxxxxxx>
Subject: Csum and csum copyroutines benchmark
From: Denis Vlasenko <vda@xxxxxxxxxxxxxxxxxxxxxxxxxxxxxx>
Date: Fri, 25 Oct 2002 09:36:22 -0200
Cc: libc-alpha@xxxxxxxxxxxxxxxxxx
In-reply-to: <200210241249.g9OCnOp09750@Port.imtp.ilyichevsk.odessa.ua>
References: <200210231218.18733.roy@karlsbakk.net> <20021024125030.A7529@flint.arm.linux.org.uk> <200210241249.g9OCnOp09750@Port.imtp.ilyichevsk.odessa.ua>
Reply-to: vda@xxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
Sender: netdev-bounce@xxxxxxxxxxx
/me said:
> I'm experimenting with different csum_ routines in userspace now.

Short conclusion: 
1. It is possible to speed up csum routines for AMD processors by 30%.
2. It is possible to speed up csum_copy routines for both AMD and Intel
   three times or more. Roy, do you like that? ;)

Tests: they checksum 4MB block and csum_copy 2MB into second 2MB.
POISON=0/1 controls whether to perform correctness tests or not.
That slows down test very noticeably. What does glibc use for
memset/memcmp? for() loop?!!

With POISON=1 ntqpf2_copy bugs out, see its source. I left it in
to save repeating my work by others. BTW, i do NOT understand why
it does not work. ;) Anyone with cluebat?

IMHO the only way to make it optimal for all CPUs is to make these
functions race at kernel init and pick the best one.

tests on Celeron 1200 (100 MHz, x12 core)
=========================================
Csum benchmark program
buffer size: 4 Mb
Each test tried 16 times, max and min CPU cycles are reported.
Please disregard max values. They are due to system interference only.
csum tests:
                     kernel_csum - took   717 max,  704 min cycles per kb. 
sum=0x44000077
                     kernel_csum - took  4760 max,  704 min cycles per kb. 
sum=0x44000077
                     kernel_csum - took   722 max,  704 min cycles per kb. 
sum=0x44000077
                  kernelpii_csum - took   539 max,  528 min cycles per kb. 
sum=0x44000077
                kernelpiipf_csum - took   573 max,  529 min cycles per kb. 
sum=0x44000077
                        pfm_csum - took  1411 max, 1306 min cycles per kb. 
sum=0x44000077
                       pfm2_csum - took   875 max,  762 min cycles per kb. 
sum=0x44000077
copy tests:
                     kernel_copy - took  5738 max, 3423 min cycles per kb. 
sum=0x99aaaacc
                     kernel_copy - took  3517 max, 3431 min cycles per kb. 
sum=0x99aaaacc
                     kernel_copy - took  4385 max, 3432 min cycles per kb. 
sum=0x99aaaacc
                  kernelpii_copy - took  2912 max, 2752 min cycles per kb. 
sum=0x99aaaacc
                      ntqpf_copy - took  2010 max, 1700 min cycles per kb. 
sum=0x99aaaacc
                     ntqpfm_copy - took  1749 max, 1701 min cycles per kb. 
sum=0x99aaaacc
                        ntq_copy - took  2218 max, 2141 min cycles per kb. 
sum=0x99aaaacc
BAD copy! <-- ntqpf2_copy is buggy :) see its source
'copy tests' above are with POISON=1
These are with POISON=0:
                     kernel_copy - took  2009 max, 1935 min cycles per kb. 
sum=0x44000077
                     kernel_copy - took  2240 max, 1959 min cycles per kb. 
sum=0x44000077
                     kernel_copy - took  2197 max, 1936 min cycles per kb. 
sum=0x44000077
                  kernelpii_copy - took  2121 max, 1939 min cycles per kb. 
sum=0x44000077
                      ntqpf_copy - took   667 max,  548 min cycles per kb. 
sum=0x44000077
                     ntqpfm_copy - took   651 max,  546 min cycles per kb. 
sum=0x44000077
                        ntq_copy - took   660 max,  545 min cycles per kb. 
sum=0x44000077
                     ntqpf2_copy - took   644 max,  548 min cycles per kb. 
sum=0x44000077
Done

Tests on Duron 650 (100 MHz, x6,5 core)
=======================================
Csum benchmark program
buffer size: 4 Mb
Each test tried 16 times, max and min CPU cycles are reported.
Please disregard max values. They are due to system interference only.
csum tests:
                     kernel_csum - took  1090 max, 1051 min cycles per kb. 
sum=0x44000077
                     kernel_csum - took  1080 max, 1052 min cycles per kb. 
sum=0x44000077
                     kernel_csum - took  1178 max, 1058 min cycles per kb. 
sum=0x44000077
                  kernelpii_csum - took  1614 max, 1052 min cycles per kb. 
sum=0x44000077
                kernelpiipf_csum - took   976 max,  962 min cycles per kb. 
sum=0x44000077
                        pfm_csum - took   755 max,  746 min cycles per kb. 
sum=0x44000077
                       pfm2_csum - took   749 max,  745 min cycles per kb. 
sum=0x44000077
copy tests:
                     kernel_copy - took  1251 max, 1072 min cycles per kb. 
sum=0x99aaaacc
                     kernel_copy - took  1363 max, 1072 min cycles per kb. 
sum=0x99aaaacc
                     kernel_copy - took  1352 max, 1072 min cycles per kb. 
sum=0x99aaaacc
                  kernelpii_copy - took  1132 max, 1014 min cycles per kb. 
sum=0x99aaaacc
                      ntqpf_copy - took   514 max,  480 min cycles per kb. 
sum=0x99aaaacc
                     ntqpfm_copy - took   495 max,  482 min cycles per kb. 
sum=0x99aaaacc
                        ntq_copy - took  1153 max,  948 min cycles per kb. 
sum=0x99aaaacc
BAD copy! <-- ntqpf2_copy is buggy :) see its source
'copy tests' above are with POISON=1
These are with POISON=0:
                     kernel_copy - took  1145 max,  871 min cycles per kb. 
sum=0x44000077
                     kernel_copy - took   879 max,  871 min cycles per kb. 
sum=0x44000077
                     kernel_copy - took   876 max,  871 min cycles per kb. 
sum=0x44000077
                  kernelpii_copy - took  1019 max,  845 min cycles per kb. 
sum=0x44000077
                      ntqpf_copy - took  2972 max,  229 min cycles per kb. 
sum=0x44000077
                     ntqpfm_copy - took   248 max,  245 min cycles per kb. 
sum=0x44000077
                        ntq_copy - took   460 max,  452 min cycles per kb. 
sum=0x44000077
                     ntqpf2_copy - took   390 max,  340 min cycles per kb. 
sum=0x44000077
Done
--
vda

Attachment: timing_csum_copy.tar.bz2
Description: BZip2 compressed data

<Prev in Thread] Current Thread [Next in Thread>