From: Daniele Orlandi
To: netdev@oss.sgi.com
Subject: Possible race/deadlock in netdev_unregister
Date: Fri, 28 Jan 2005 23:13:29 +0100

Hello,

First of all, please excuse me if this turns out to be a bug in my code :)

Unfortunately I'm still a newbie with netdev and its interactions with netlink, hotplug, etc. Please help me understand what's happening and who is wrong.

The scenario is this:

- A device driver module (written by me) has two netdevices registered.
- Another module (also written by me) provides the socket implementation for the protocol spoken by the device.
- An application has one socket bound to one netdevice.
- I rmmod the device driver module.
- The exit function in the module calls netdev_unregister.
- The event dispatcher notifies the socket layer that a device is going down.
- The socket is marked errored, but the application keeps it open for a while.
- The application ends, the socket is destroyed, and the remaining reference to the netdevice is released, but netdev_unregister keeps sleeping forever with this backtrace:

Jan 28 19:02:58 bastard kernel: Call Trace:
Jan 28 19:02:58 bastard kernel: [] __down+0x6e/0xd0
Jan 28 19:02:58 bastard kernel: [] default_wake_function+0x0/0x10
Jan 28 19:02:58 bastard kernel: [] netlink_dump+0x66/0x180
Jan 28 19:02:58 bastard kernel: [] __wake_up_common+0x35/0x60
Jan 28 19:02:58 bastard kernel: [] __down_failed+0x8/0xc
Jan 28 19:02:58 bastard kernel: [] .text.lock.dev+0x91/0xb9
Jan 28 19:02:58 bastard kernel: [] rtnetlink_dump_ifinfo+0x0/0x70
Jan 28 19:02:58 bastard kernel: [] rtnetlink_rcv+0x1d8/0x3f0
Jan 28 19:02:58 bastard kernel: [] rtnetlink_rcv+0x0/0x3f0
Jan 28 19:02:58 bastard kernel: [] netlink_data_ready+0x28/0x50
Jan 28 19:02:58 bastard kernel: [] netdev_wait_allrefs+0xf1/0x100
Jan 28 19:02:58 bastard kernel: [] kobject_release+0x0/0x10
Jan 28 19:02:58 bastard kernel: [] netdev_run_todo+0xfc/0x1c0
Jan 28 19:02:58 bastard kernel: [] rtnetlink_dump_ifinfo+0x0/0x70
Jan 28 19:02:58 bastard kernel: [] rtnetlink_rcv+0x1d8/0x3f0
Jan 28 19:02:58 bastard kernel: [] printk+0xf/0x20
Jan 28 19:02:58 bastard kernel: [] wakeme_after_rcu+0x0/0x10
Jan 28 19:02:58 bastard kernel: [] rtnetlink_rcv+0x0/0x3f0
Jan 28 19:02:58 bastard kernel: [] netlink_data_ready+0x28/0x50
Jan 28 19:02:58 bastard kernel: [] rtnl_unlock+0x31/0x40
Jan 28 19:02:58 bastard kernel: [] fake_module_exit+0x2e/0x7e [fake_isdn]
Jan 28 19:02:58 bastard kernel: [] sys_delete_module+0x15a/0x170
Jan 28 19:02:58 bastard kernel: [] unmap_vma_list+0xe/0x20
Jan 28 19:02:58 bastard kernel: [] do_munmap+0xd8/0x120
Jan 28 19:02:58 bastard kernel: [] sys_munmap+0x3c/0x60
Jan 28 19:02:58 bastard kernel: [] sysenter_past_esp+0x52/0x79

Note that disabling hotplug avoids the problem.

Bye,

--
Daniele Orlandi