New performance tests.
These tests were run on a 32-processor Altix system.
It has 1300 MHz CPUs. Note that the individual CPUs are slower than my
earlier testing with the 2p box. fork-bomb-like tests are slower on
big NUMA machines like this unless you restrict them to running CPUs
within the same node (and this is why the 2p box does better in that
respect).
I know it was requested that I run this on a behemoth - I'm just hoping
32p is big enough. The bigger the machine, the harder it is to reserve
time...
As seen in the thread, my first pass at an RCU implementation isn't good
enough to use for real. I'm hoping it is good enough to get us some numbers.
Conclusions:
1. My feeling is that RCU isn't buying us much here. In fact, it
restricts what kernel modules can do in that you can't sleep in many
interesting places. This affects things such as kmalloc's (unless you
say you aren't willing to wait)... It also makes it so you can't use
rwsems in all the places you might need to.
2. In the data below, unlike the 2p test, there are some differences.
My conclusion here, and the reason I added an additional test, is
that the Job infrastructure is the cause for a small slow down.
So I added a kernel test where pnotify still has two kernel module
subscribers per task, but Job isn't one of them. Job is replaced by
a tiny kernel module that just simply counts the number of times
the fork, exit, and exec hooks fire. This takes Job out of the
equation. That kernel closely tracks the stock kernel. That means
pnotify isn't at fault, Job is. In other words, if Job implemented its
own hooks and was configured on, we'd probably see the same small dip.
3. The fork/exit tests (jump to the end for a summary) show stock performing
a tiny bit better than pnotify with two subscribers and no job (keyrings
and a test module). I suspect if I did a test where pnotify only had
keyrings as a subscriber, the numbers would be nearly the same. The AIM7
data shows these two kernels as very similar.
4. So I feel pnotify isn't really costing us but the data does show we need
to keep an eye on pnotify users. We need to treat new pnotify users in a
similar way as new callouts in exit, copy_process, and exec when the
pnotify user plans to associate with all or many running tasks (at least
for performance reasons)
I'm going to integrate other comments I got on the pnotify patch now and
send a new version of the non-rcu pnotify patch soon.
Some info on the tests:
jobtest: a mini job test suite that includes forking enough processes
to make the PID numbers roll. Output trimmed.
forkexit: just forks and exits the specified number of times.
fork-wait-exit: is the same, but the parent waits for the child.
AIM7: The version of aim7 we use is not tracking the current community
release.
All kernels had kdb patches applied and kdb was enabled.
4 kernels tested:
2.6.14-rc2 with pnotify, job, and pnotify-aware keyrings, original NON-RCU
2.6.14-rc2 with pnotify, job, and pnotify-aware keyrings, RCU Testing
2.6.14-rc2 stock (no pnotify, no job, non-modified keyrings enabled)
2.6.14-rc2 non-rcu pnotify, NO Job, pnotify-aware keyrings, tiny pnotify module
In the last kernel, I'm showing a pnotify user that only does atomic adds
and subtracts on a few variables and nothing else. The test module, like
keyrings, is a subscriber to all processes. The purpose of this test is to
take the Job infrastructure out of the performance picture to just focus
on pnotify. This tiny test module counts the number of times fork, exit,
and exec fires. Output is provided.
2.6.14-rc2 pnotify, job, pnotify-aware keyrings implementation NOT using rcu
------------------------------------------------------------------------------
=== jobtest ===
belay:~ # time jobtest
[snip]
Great. All tests completed, no errors
real 0m26.222s
user 0m1.736s
sys 0m21.380s
belay:~ # time jobtest
[snip]
Great. All tests completed, no errors
real 0m35.574s
user 0m2.512s
sys 0m31.356s
belay:~ # time jobtest
[snip]
Great. All tests completed, no errors
real 0m35.696s
user 0m2.556s
sys 0m31.128s
=== forkexit ===
belay:~ # time ./forkexit 40000
Fork returned an error: 7720 times
real 0m14.665s
user 0m0.072s
sys 0m14.464s
belay:~ # time ./forkexit 40000
Fork returned an error: 7720 times
real 0m15.439s
user 0m0.104s
sys 0m15.212s
belay:~ # time ./forkexit 40000
Fork returned an error: 7720 times
real 0m15.115s
user 0m0.068s
sys 0m14.924s
=== fork-wait-exit ===
belay:~ # time ./fork-wait-exit 40000
Fork returned an error: 0 times
real 0m25.494s
user 0m2.136s
sys 0m31.588s
belay:~ # time ./fork-wait-exit 40000
Fork returned an error: 0 times
real 0m25.208s
user 0m1.760s
sys 0m28.208s
=== AIM7 ===
Linux version 2.6.14-rc2 (erikj@attica) (gcc version 3.3.3 (SuSE Linux)) #2 SMP
PREEMPT Wed Sep 28 15:01:33 CDT 2005
HOST = belay
CPUS = 32
DIRS = 1
DISKS= 0
FS = xfs
SCSI = non-xscsi
ID = pnotify+job+new-keyring, NON-RCU
Run 1 of 1
Benchmark Version Machine Run Date
AIM Multiuser Benchmark - Suite VII "1.1" AIM7 Run Sep 29 16:03:37
2005
Tasks Jobs/Min JTI Real CPU Jobs/sec/task
1 1907.6 100 3.1 0.6 31.7929
2 3694.1 96 3.2 1.2 30.7839
3 5643.2 96 3.1 1.5 31.3510
4 7994.5 99 2.9 1.8 33.3104
5 9417.5 97 3.1 2.4 31.3916
10 18976.2 97 3.1 4.7 31.6270
20 37439.7 98 3.1 9.1 31.1997
50 74348.5 94 3.9 23.6 24.7828
100 101694.9 90 5.7 46.5 16.9492
150 124643.1 87 7.0 67.8 13.8492
200 130669.1 86 8.9 90.3 10.8891
500 154877.9 83 18.8 227.9 5.1626
1000 150566.6 80 38.7 453.3 2.5094
2000 154665.9 78 75.3 937.6 1.2889
2.6.14-rc2 pnotify, job, pnotify-aware keyrings implementation with RCU
------------------------------------------------------------------------------
=== jobtest ===
belay:~ # time jobtest
[snip]
Great. All tests completed, no errors
real 0m26.501s
user 0m1.608s
sys 0m20.596s
belay:~ # time jobtest
[snip]
Great. All tests completed, no errors
real 0m35.501s
user 0m2.320s
sys 0m30.148s
belay:~ # time jobtest
[snip]
Great. All tests completed, no errors
real 0m35.563s
user 0m2.576s
sys 0m30.632s
=== forkexit ===
--> Note: The first attempt took 1 minutes 23 seconds. This only
--> happened once. With the stock kernel test below, one attempt was
--> an outlier taking more than 5 minutes. Therefore, I don't think it
--> is anything I changed that caused this.
belay:~ # time ./forkexit 40000
Fork returned an error: 7701 times
real 1m22.814s
user 0m0.064s
sys 1m22.740s
belay:~ # time ./forkexit 40000
Fork returned an error: 7693 times
real 0m14.557s
user 0m0.076s
sys 0m14.368s
belay:~ # time ./forkexit 40000
Fork returned an error: 7693 times
real 0m14.774s
user 0m0.076s
sys 0m14.580s
belay:~ # time ./forkexit 40000
Fork returned an error: 7693 times
real 0m15.218s
user 0m0.112s
sys 0m14.992s
=== fork-wait-exit ===
belay:~ # time ./fork-wait-exit 40000
Fork returned an error: 0 times
real 0m24.922s
user 0m1.560s
sys 0m28.060s
belay:~ # time ./fork-wait-exit 40000
Fork returned an error: 0 times
real 0m25.178s
user 0m1.680s
sys 0m27.988s
=== AIM7 ===
nux version 2.6.14-rc2 (erikj@attica) (gcc version 3.3.3 (SuSE Linux)) #5 SMP
PREEMPT Thu Sep 29 15:00:00 CDT 2005
HOST = belay
CPUS = 32
DIRS = 1
DISKS= 0
FS = xfs
SCSI = non-xscsi
ID = RCU pnotify+job+new-keyring
Run 1 of 1
Benchmark Version Machine Run Date
AIM Multiuser Benchmark - Suite VII "1.1" AIM7 Run Sep 29 16:42:08
2005
Tasks Jobs/Min JTI Real CPU Jobs/sec/task
1 1903.8 100 3.1 0.6 31.7305
2 3655.8 96 3.2 1.2 30.4648
3 5628.6 96 3.1 1.6 31.2701
4 7553.5 96 3.1 2.1 31.4731
5 10003.4 99 2.9 2.2 33.3448
10 18945.3 97 3.1 4.7 31.5755
20 36823.8 97 3.2 9.2 30.6865
50 72532.4 94 4.0 23.4 24.1775
100 97552.8 90 6.0 46.7 16.2588
150 120981.2 87 7.2 68.4 13.4424
200 130581.1 86 8.9 88.7 10.8818
500 151665.2 83 19.2 228.0 5.0555
1000 153586.3 81 37.9 467.5 2.5598
2000 153454.7 79 75.9 941.7 1.2788
2.6.14-rc2 stock+kdb, no pnotify, no job, unmodified keyrings enabled
------------------------------------------------------------------------------
=== jobtest ===
None: kernel doesn't have pnotify or job
=== forkexit ===
--> My first attempt on a stock kernel+kdb took more than 5 minutes. The
--> data isn't useful because I escaped in to kdb to check on some things.
--> There was a similar outlier for my RCU pnotify kernel above that took
--> 1 minute 23 above. Later runs were normal.
belay:~ # time ./forkexit 40000
Fork returned an error: 7698 times
real 0m14.421s
user 0m0.088s
sys 0m14.220s
belay:~ # time ./forkexit 40000
Fork returned an error: 7699 times
real 0m14.282s
user 0m0.064s
sys 0m14.100s
belay:~ # time ./forkexit 40000
Fork returned an error: 7699 times
real 0m15.736s
user 0m0.072s
sys 0m15.648s
=== fork-wait-exit ===
--> The 16.838 was an outlier I was never able to duplicate.
--> I tried many times. Of course, if I resrict it to just two processors
--> on a single node, it's faster. Perhaps we got lucky once.
belay:~ # time ./fork-wait-exit 40000
Fork returned an error: 0 times
real 0m16.838s
user 0m1.080s
sys 0m17.092s
belay:~ # time ./fork-wait-exit 40000
Fork returned an error: 0 times
real 0m24.599s
user 0m1.728s
sys 0m28.248s
belay:~ # time ./fork-wait-exit 40000
Fork returned an error: 0 times
real 0m24.742s
user 0m1.668s
sys 0m27.332s
belay:~ # time ./fork-wait-exit 40000
Fork returned an error: 0 times
real 0m24.777s
user 0m1.720s
sys 0m26.548s
belay:~ # time ./fork-wait-exit 40000
Fork returned an error: 0 times
real 0m24.829s
user 0m1.644s
sys 0m27.168s
=== AIM7 ===
nux version 2.6.14-rc2 (erikj@attica) (gcc version 3.3.3 (SuSE Linux)) #2 SMP
PREEMPT Wed Sep 28 14:53:02 CDT 2005
HOST = belay
CPUS = 32
DIRS = 1
DISKS= 0
FS = xfs
SCSI = non-xscsi
ID = stock, non-modified-keyrings enabled
Run 1 of 1
Benchmark Version Machine Run Date
AIM Multiuser Benchmark - Suite VII "1.1" AIM7 Run Sep 29 17:38:42
2005
Tasks Jobs/Min JTI Real CPU Jobs/sec/task
1 1909.4 100 3.0 0.6 31.8241
2 3670.8 96 3.2 1.2 30.5897
3 5597.9 96 3.1 1.6 31.0997
4 7543.7 96 3.1 2.1 31.4323
5 9445.0 97 3.1 2.4 31.4833
10 19088.2 97 3.0 4.5 31.8137
20 37977.2 98 3.1 9.0 31.6476
50 73559.2 95 4.0 23.6 24.5197
100 101147.0 90 5.8 47.1 16.8578
150 120496.9 86 7.2 68.5 13.3885
200 129003.7 86 9.0 90.5 10.7503
500 149737.6 83 19.4 222.8 4.9913
1000 158010.5 80 36.8 439.7 2.6335
2000 156453.7 77 74.4 885.3 1.3038
2.6.14-rc2 non-rcu pnotify, NO Job, pnotify-aware keyrings, tiny pnotify module
------------------------------------------------------------------------------
=== jobtest ===
None: kernel doesn't have job
=== forkexit ===
--> Like the other tests, the first run of this took longer than the other
--> runs of it.
belay:~ # time ./forkexit 40000
Fork returned an error: 7686 times
real 1m35.260s
user 0m0.076s
sys 1m35.064s
belay:~ # time ./forkexit 40000
Fork returned an error: 7686 times
real 0m15.843s
user 0m0.068s
sys 0m15.652s
belay:~ # time ./forkexit 40000
Fork returned an error: 7687 times
real 0m14.404s
user 0m0.064s
sys 0m14.220s
belay:~ # time ./forkexit 40000
Fork returned an error: 7687 times
real 0m14.487s
user 0m0.060s
sys 0m14.304s
=== fork-wait-exit ===
belay:~ # time ./fork-wait-exit 40000
Fork returned an error: 0 times
real 0m25.205s
user 0m2.112s
sys 0m30.684s
belay:~ # time ./fork-wait-exit 40000
Fork returned an error: 0 times
real 0m25.045s
user 0m1.872s
sys 0m30.208s
belay:~ # time ./fork-wait-exit 40000
Fork returned an error: 0 times
real 0m24.927s
user 0m1.964s
sys 0m29.200s
=== AIM7 ===
Linux version 2.6.14-rc2 (erikj@attica) (gcc version 3.3.3 (SuSE Linux)) #3 SMP
PREEMPT Fri Sep 30 09:21:15 CDT 2005
HOST = belay
CPUS = 32
DIRS = 1
DISKS= 0
FS = xfs
SCSI = non-xscsi
ID = non-rcu pnotify, NO Job, pnotify keyrings, tiny pnotify test module
Run 1 of 1
Benchmark Version Machine Run Date
AIM Multiuser Benchmark - Suite VII "1.1" AIM7 Run Sep 30 10:19:29
2005
Tasks Jobs/Min JTI Real CPU Jobs/sec/task
1 1909.4 100 3.0 0.6 31.8241
2 3694.1 96 3.2 1.2 30.7839
3 5679.9 97 3.1 1.5 31.5550
4 7538.9 96 3.1 2.2 31.4119
5 10031.0 99 2.9 2.2 33.4367
10 19088.2 97 3.0 4.5 31.8137
20 36964.1 97 3.1 9.2 30.8034
50 72424.1 95 4.0 23.9 24.1414
100 102410.7 90 5.7 46.1 17.0684
150 120148.6 86 7.3 67.4 13.3498
200 131927.9 87 8.8 89.5 10.9940
500 154229.4 83 18.9 220.7 5.1410
1000 158552.9 80 36.7 440.5 2.6425
2000 152884.3 78 76.1 882.8 1.2740
=== Special note for this kernel ===
After all the testing, I removed my tiny test kernel module that did
atomic increments to count the number of times various hooks happened.
Here is the info from that (from dmesg):
Unregistering pnotify support for (name=pnotify-test)
exit called 797136 times...
fork called 796785 times...
init called 351 times...
exec called 97675 times ...
Good - fork count + init count equals exit count.
------------------------------------------------------------------------------
I really would need more trials to see if these converge but I think we're
covered on data with the AIM runs. Let me know if more data is requested.
It appears stock is performing best but pnotify with two subscribers
(keyrings, which is present in stock and my test module that atomic increments
counters for the callbacks) is very close. I suspect if I ran another
test where only one subscriber was present to match what was in stock,
the numbers would be nearly the same.
forkexit summary (real time average minus outliers):
pnotify+job+pnotify keyrings, NON-RCU: 15.07
pnotify+job+pnotify keyrings, RCU: 14.85
stock: 14.813
pnotify+pnotify keyrings+test mod, non-rcu: 14.911
fork-wait-exit summary (real time average minus outlier):
pnotify+job+pnotify keyrings, NON-RCU: 25.351
pnotify+job+pnotify keyrings, RCU: 25.05
stock: 24.703
pnotify+pnotify keyrings+test mod, non-rcu: 25.059
|