Hi!
When we have ip_queue being used from LOCAL_IN, then we end up with a
situation where the verdicts coming back from userspace traverse the TCP
input path from syscall context. While this seems to work most of the
time, there's an ungly deadlock:
syscall context is interrupted by the timer interrupt. When the timer
interrupt leaves, the timer softirq get's scheduled and calls
tcp_delack_timer() and alike. They themselves do bh_lock_sock(sk),
which is already held from somewhere else[1] -> boom.
The patch below adds an additional queue for ip_queue verdicts. They
come up from userspace, are appended to a queue which is then processed
by a tasklet. The tasklet itself runs in softirq context, so when timer
hardirq leaves, no tcp_delack_timer() will be executed until
nf_reinject() has finished.
Dave: Please don't apply yet, I want to receive feedback from the
netfilter developers first. I'm just Cc'ing netdev in case somebody
wants an intermediate fix to fix the problem.
[1] i didn't actually find from where that lock was held. In any case,
it didn't seem nice to traverse the IP input codepath from syscall
context. There might be a number of places which just assume to run in
softirq.
Comments welcome!
--
- Harald Welte <laforge@xxxxxxxxxxxxx> http://netfilter.org/
============================================================================
"Fragmentation is like classful addressing -- an interesting early
architectural error that shows how much experimentation was going
on while IP was being designed." -- Paul Vixie
2.6.12-rc5-ip_queue-local-reinject-smp-deadlock-fix.patch
Description: Text document
pgpx6Wy0Sp7HT.pgp
Description: PGP signature
|