It seems that the atomic_dec_and_test for the child dst in net/core/dst.c is really not needed since dst_destroy cannot be called simultanously for the same dst (frees dst unconditionally via kmem_ca
Otimizing big SGI NUM systems again, are you? :-) atomic_dec() has no memory ordering guarentees, only the atomic routines returning values do the proper SMP memory barriers. So, based upon this alon
Correct that applies in general. But what could go wrong if the atomic_dec is separated from the atomic_read in this specific location? I fail to see what the point of having a single instance of ato
If this is true, what performance improvement could you possibly be seeing from this change? I know you are making this change for performance reasons, yet you aren't mentioning any details about thi
We could make refcnt into an array of pointers that point to node specific memory. This avoids cache line bouncing. However, you cannot atomically dec_and_test an array. This is the only location whe
The dst object is already too large. You have to show a serious performance improvement to justify bloating it up further. Then we lose the optimizations of those memory barriers that platforms do in
This is racy (albeit very unlikely) because dst may be freed by dst_run_gc after the atomic_dec. When we are not the real parent of the dst (e.g., when we're xfrm_dst and the child is an rtentry), it
If that is so then it is also possible that the gc frees after atomic_dec_and_test: cpu0 dst_destroy cpu1 dst_run_gc atomic_dec_and_test(refcnt) if (atomic_read(refcnt)) ... .. dst_destroy(dst) kmem_
No this is prevented by the nohash check. -- Visit Openswan at http://www.openswan.org/ Email: Herbert Xu ~{PmV>HI~} <herbert@xxxxxxxxxxxxxxxxxxx> Home Page: http://gondor.apana.org.au/~herbert/ PGP
Ok then the purpose of this whole thing is to decrement the usage counter and at the same time figure out if this is the last child on an unhashed entry. That unhashed entry is not handled by the gar
Yep you're totally correct. In fact, the atomic_dec_and_test is really only needed for unhashed entries (i.e., IPsec). So we could do something like this so that all hashed entries undergo atomic_dec
See his other emails in this thread. If it can be converted to atomic_dec() then he wants to change the counter into an array of counters on NUMA systems. But his trick only works if the atomic_dec_a
Correct. Some changes to the way locking is done between the garbage collector and dst_destroy would be necessary. Lets see if the author of the patch can come up with a solution to this.
OK I've thought more about this and indeed it is possible to get rid of atomic_dec_and_test. As you can see from my previous patch, it is possible to restrict the use of atomic_dec_and_test to the NO
In an earlier email you said that there was a slight chance that child entries (typically NOHASH from what I can see in the code) could be on the gc list. If its not on the hash table and not on the
Only child entries with NOHASH unset can be on the GC list. For NOHASH entries refcnt can be greater than one due to dst_pop. Of course concurrent access is possible. However, the important thing is
So in case f.e. the refcnt was 2 for a child NOHASH entry, then we decrement the refcnt for the child but do not free it. However after dst_destroy the parent is gone and presumably some skb->dst are
If the refcnt is non-zero after the atomic_dec and the entry is NOHASH, we return it and the caller will add it to the GC list. See the relevant section in dst_run_gc and dst_free. Cheers, -- Visit O
That was just some C code to explain what I meant. Here is a patch against 2.6.12-rc2 with explanations as to what exactly is going on with __refcnt so that we can avoid future confusion about this p