Hi,
I've got a bucketload of acenic adapters in a ppc64 box. I get random
tx timeouts; I suspect there is a missing memory barrier (power4 is
good at catching those). Still looking.
I did manage to lock a card up in ace_start_xmit:
restart:
	...
	if (tx_ring_full(ap, ap->tx_ret_csm, idx))
		goto overflow;
	...
overflow:
	/*
	 * This race condition is unavoidable with lock-free drivers.
	 * We wake up the queue _before_ tx_prd is advanced, so that we can
	 * enter hard_start_xmit too early, while tx ring still looks closed.
	 * This happens ~1-4 times per 100000 packets, so that we can allow
	 * to loop syncing to other CPU. Probably, we need an additional
	 * wmb() in ace_tx_intr as well.
	 *
	 * Note that this race is relieved by reserving one more entry
	 * in tx ring than it is necessary (see original non-SG driver).
	 * However, with SG we need to reserve 2*MAX_SKB_FRAGS+1, which
	 * is already overkill.
	 *
	 * Alternative is to return with 1 not throttling queue. In this
	 * case loop becomes longer, no more useful effects.
	 */
	barrier();
	goto restart;
It's stuck there and never coming out. Alexey: I have a feeling you
wrote this code, is that correct? :)
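
To make the failure mode concrete, here is a minimal user-space sketch of
the loop above. It is not the acenic code: the struct, the function names,
RING_ENTRIES and RESERVED are all hypothetical, and the fullness test is
only assumed to resemble what tx_ring_full does. It pairs a release store
on the completion side with an acquire load on the transmit side, and it
bounds the retry so that a consumer index that never moves produces an
error return instead of an endless spin.

```c
#include <stdatomic.h>
#include <stdint.h>

#define RING_ENTRIES 256u /* hypothetical ring size, must be a power of two */
#define RESERVED     4u   /* hypothetical slack, stands in for the reserved entries */

struct model_ring {
	_Atomic uint32_t ret_csm; /* consumer index, written by the completion path */
	uint32_t prd;             /* producer index, owned by the transmit path */
};

/* Free entries between producer and consumer, modulo the ring size. */
static uint32_t ring_space(uint32_t csm, uint32_t prd)
{
	return (csm - prd - 1u) & (RING_ENTRIES - 1u);
}

static int ring_full(uint32_t csm, uint32_t prd)
{
	return ring_space(csm, prd) <= RESERVED;
}

/* Completion side: publish the new consumer index with release ordering. */
static void model_complete(struct model_ring *r, uint32_t new_csm)
{
	atomic_store_explicit(&r->ret_csm, new_csm, memory_order_release);
}

/*
 * Transmit side: re-read the consumer index with acquire ordering on each
 * pass. Returns 0 once an entry has been claimed, or -1 if the ring still
 * looks full after max_spins retries (the case that spins forever above).
 */
static int model_try_xmit(struct model_ring *r, unsigned max_spins)
{
	for (unsigned i = 0; i <= max_spins; i++) {
		uint32_t csm = atomic_load_explicit(&r->ret_csm,
						    memory_order_acquire);
		if (!ring_full(csm, r->prd)) {
			r->prd = (r->prd + 1u) & (RING_ENTRIES - 1u);
			return 0;
		}
	}
	return -1;
}
```

With ret_csm at 0 and prd at 251 the model ring looks full and
model_try_xmit gives up; once model_complete moves ret_csm forward, the
next call succeeds. Whether a bounded retry is acceptable in the real
driver is a separate question; the sketch only shows where the ordering
and the exit condition sit.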
Anton