netdev
[Top] [All Lists]

[PATCH 2.6] fix deadlock with ip_queue and tcp local input path

To: Netfilter Development Mailinglist <netfilter-devel@xxxxxxxxxxxxxxxxxxx>
Subject: [PATCH 2.6] fix deadlock with ip_queue and tcp local input path
From: Harald Welte <laforge@xxxxxxxxxxxxx>
Date: Thu, 26 May 2005 16:24:21 +0200
Cc: Linux Netdev List <netdev@xxxxxxxxxxx>, James Morris <jmorris@xxxxxxxxxx>
Mail-followup-to: Harald Welte <laforge@xxxxxxxxxxxxx>, Netfilter Development Mailinglist <netfilter-devel@xxxxxxxxxxxxxxxxxxx>, Linux Netdev List <netdev@xxxxxxxxxxx>, James Morris <jmorris@xxxxxxxxxx>
Sender: netdev-bounce@xxxxxxxxxxx
User-agent: mutt-ng 1.5.8-r168i (Debian)
Hi!

When we have ip_queue being used from LOCAL_IN, then we end up with a
situation where the verdicts coming back from userspace traverse the TCP
input path from syscall context.  While this seems to work most of the
time, there's an ungly deadlock:

syscall context is interrupted by the timer interrupt.  When the timer
interrupt leaves, the timer softirq get's scheduled and calls
tcp_delack_timer() and alike.  They themselves do bh_lock_sock(sk),
which is already held from somewhere else[1] -> boom.

The patch below adds an additional queue for ip_queue verdicts.  They
come up from userspace, are appended to a queue which is then processed
by a tasklet.  The tasklet itself runs in softirq context, so when timer
hardirq leaves, no tcp_delack_timer() will be executed until
nf_reinject() has finished.

Dave: Please don't apply yet, I want to receive feedback from the
netfilter developers first.  I'm just Cc'ing netdev in case somebody
wants an intermediate fix to fix the problem.

[1] i didn't actually find from where that lock was held.  In any case,
it didn't seem nice to traverse the IP input codepath from syscall
context.  There might be a number of places which just assume to run in
softirq.

Comments welcome!

-- 
- Harald Welte <laforge@xxxxxxxxxxxxx>                 http://netfilter.org/
============================================================================
  "Fragmentation is like classful addressing -- an interesting early
   architectural error that shows how much experimentation was going
   on while IP was being designed."                    -- Paul Vixie

Attachment: 2.6.12-rc5-ip_queue-local-reinject-smp-deadlock-fix.patch
Description: Text document

Attachment: pgpx6Wy0Sp7HT.pgp
Description: PGP signature

<Prev in Thread] Current Thread [Next in Thread>