> The changes you have done for atomic_dec_and_lock() in Lockmeter v1.4.10
> does not seem to give correct statistics.
I'm quite willing to believe my 1.4.10 is wrong, but I think the better
terminology is "does not seem to give identical statistics", not "does
not seem to give correct statistics."
> If you want to see what's the difference just look at the following
> numbers
>
> With Lockmeter v1.4.10
> ----------------------
> 6.3% 9.2% 0.4us(1659us) 3.4us(1648us)( 1.3%) 23182304 90.8% 9.2% 0% dcache_lock
> 2.3% 6.4% 0.3us( 120us) 3.3us(1379us)(0.48%) 12336769 93.6% 6.4% 0% dput+0x18
>
> Making actual atomic_dec_and_lock inline
> ----------------------------------------
> 4.0% 8.6% 0.5us(1333us) 1.9us(1309us)(0.35%) 11940384 91.4% 8.6% 0% dcache_lock
> 0.3% 7.0% 0.4us( 191us) 1.9us(1197us)(0.03%) 1096658 93.0% 7.0% 0% dput+0x30
>
> There is a difference of more than 10 times in number of times
> dcache_lock is taken from dput. I did not face any problem in making
> the actual atomic_dec_and_lock() inline and could not understand why
> you chose the less efficient version.
I'm guessing that you're looking at i386, which I believe means that the
code that executes is found in arch/i386/lib/dec_and_lock.c, right?
That implementation has a "fast path" that uses a direct asm sequence
using cmpxchgl, and a "slow path" that invokes spin_lock() and
spin_unlock(). Lockmeter, of course, will only instrument this "slow
path". You simply won't see any 1.4.9 statistics for the "fast path".
What I did in 1.4.10 was to force everything through the "slow path",
which means that all those formerly invisible "fast path" spinlock
acquisitions are now showing up as metered spin_lock/spin_unlock pairs.
So I would argue that 1.4.9 has two problems. First, it aggregates lots
of spinlock/unlock activity into appearing as atomic_dec_and_lock()
calls; 1.4.10 fixes that by making the activity visible in the actual
spinlock acquisition routine, such as dput(). Second, 1.4.9 completely
misses quite a few spinlock acquisitions that were going through that
"fast path". In other words, 1.4.10 actually produces correct
statistics, not simply *different* statistics, vs. 1.4.9.
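
To make the fast-path/slow-path distinction concrete, here is a minimal
userspace sketch. It is not the kernel source: it assumes C11 atomics in
place of the cmpxchgl asm sequence and of spin_lock()/spin_unlock(), it
is driven from a single thread, and every name in it (struct
metered_lock, dec_and_lock_fast, dec_and_lock_slow,
count_lock_acquisitions) is hypothetical. The "acquisitions" counter
stands in for what a lock meter would be able to record.

```c
#include <assert.h>
#include <stdatomic.h>

/* Hypothetical stand-in for a metered spinlock. */
struct metered_lock {
    atomic_flag held;
    long acquisitions;          /* what a lock meter could observe */
};

void metered_lock_acquire(struct metered_lock *l)
{
    while (atomic_flag_test_and_set_explicit(&l->held, memory_order_acquire))
        ;                       /* spin until the flag is free */
    l->acquisitions++;          /* updated only while the lock is held */
}

void metered_lock_release(struct metered_lock *l)
{
    atomic_flag_clear_explicit(&l->held, memory_order_release);
}

/* Fast-path variant: decrement with compare-and-swap and only fall
 * back to the lock when the count is about to reach zero. */
int dec_and_lock_fast(atomic_int *count, struct metered_lock *l)
{
    int old = atomic_load(count);

    while (old != 1) {
        /* On failure, 'old' is refreshed with the current value. */
        if (atomic_compare_exchange_weak(count, &old, old - 1))
            return 0;           /* decremented without touching the lock */
    }

    metered_lock_acquire(l);
    if (atomic_fetch_sub(count, 1) == 1)
        return 1;               /* count hit zero; caller holds the lock */
    metered_lock_release(l);
    return 0;
}

/* Forced slow-path variant: every call goes through the lock. */
int dec_and_lock_slow(atomic_int *count, struct metered_lock *l)
{
    metered_lock_acquire(l);
    if (atomic_fetch_sub(count, 1) == 1)
        return 1;               /* count hit zero; caller holds the lock */
    metered_lock_release(l);
    return 0;
}

typedef int (*dec_and_lock_fn)(atomic_int *, struct metered_lock *);

/* Drop a count from 'start' to zero and report how many lock
 * acquisitions the meter saw along the way. */
long count_lock_acquisitions(dec_and_lock_fn dec, int start)
{
    struct metered_lock lock = { ATOMIC_FLAG_INIT, 0 };
    atomic_int count;

    assert(start > 0);
    atomic_init(&count, start);

    for (int i = 0; i < start; i++) {
        if (dec(&count, &lock))
            metered_lock_release(&lock);  /* caller's unlock */
    }
    return lock.acquisitions;
}
```

In this sketch, dropping a count from 10 to 0 through dec_and_lock_fast
records one acquisition, while dec_and_lock_slow records ten. The
decrements that take the fast path leave no trace in the counter at all,
which is the kind of gap between the two sets of dput numbers that is
being discussed here.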
John Hawkes