From: Andi Kleen
To: Matt Mackall
Cc: netdev@oss.sgi.com, davem@redhat.com
Subject: Re: [PATCH] Fix deadlock in netconsole with no carrier
Date: Tue, 26 Apr 2005 15:47:07 +0200
References: <20050419135350.GH7715@wotan.suse.de> <20050419170650.GW21897@waste.org>
In-Reply-To: <20050419170650.GW21897@waste.org> (Matt Mackall's message of "Tue, 19 Apr 2005 10:06:50 -0700")
X-list: netdev

Matt Mackall writes:

[Sorry for the late answer, but you don't seem to have CCed the answer to me, so I lost it until now.]

> On Tue, Apr 19, 2005 at 03:53:50PM +0200, Andi Kleen wrote:
>>
>> I got a deadlock at boot with netconsole when the network card
>> did not have a cable connected. This patch fixes this by limiting
>> the number of retries.
>
> It should be waiting for carrier detect before proceeding. What NIC is that?

e1000

> I'm sure five retries is not enough.

Well, infinite is definitely too many. And the early netconsole code already waits for carrier up, so waiting even longer in the actual write does not make much sense to me.

The problem with spinning longer here is that when you boot a system with no carrier but with netconsole configured, it wastes a lot of time uselessly spinning/polling here. It is better to end this early. In theory you could do a more clever backoff scheme and note when a device is always down, but I think the short retry combined with the long wait at early netconsole init is nearly equivalent.

Without this patch my setup doesn't even boot, so I would appreciate it if the patch could be applied.

>
>> Also when we run into the device spinlock don't poll all the time,
>> just spin.
>
> Two patches? Again, I don't think we should give up so easily.

For the device spinlock, polling is useless because the NIC is not actually out of resources; all you need to do is spin. Polling there too is a waste of CPU time. If polling is really needed (in case of a race), it will be retried once the spinlock is free.

-Andi
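
To illustrate the two behaviours argued for above (a bounded number of retries when the device cannot transmit, and plain spinning without polling when only the device lock is contended), here is a minimal user-space sketch. It is not the patch itself and does not use the kernel netpoll API: `fake_nic`, `fake_xmit`, `fake_poll`, `send_with_bounded_retry`, and the limit `MAX_SEND_RETRIES` are all hypothetical names chosen for illustration, with the limit set to five only because that is the figure mentioned in the thread.

```c
#include <stdbool.h>

#define MAX_SEND_RETRIES 5   /* hypothetical bound, taken from the thread */

enum xmit_status { XMIT_OK, XMIT_BUSY, XMIT_LOCKED };

/* Hypothetical stand-in for a network device. */
struct fake_nic {
    bool carrier;        /* true when the link is up */
    int lock_held_for;   /* attempts during which the tx lock is contended */
    unsigned polls;      /* how often the device was polled */
    unsigned spins;      /* how often we spun on the lock */
};

/* Simulated transmit: fails while the lock is held or the carrier is down. */
static enum xmit_status fake_xmit(struct fake_nic *nic)
{
    if (nic->lock_held_for > 0) {
        nic->lock_held_for--;
        return XMIT_LOCKED;
    }
    return nic->carrier ? XMIT_OK : XMIT_BUSY;
}

/* Simulated poll: a real driver would reap completed tx descriptors here. */
static void fake_poll(struct fake_nic *nic)
{
    nic->polls++;
}

/* Returns true if the packet went out, false if it was dropped. */
bool send_with_bounded_retry(struct fake_nic *nic)
{
    int tries = MAX_SEND_RETRIES;

    for (;;) {
        switch (fake_xmit(nic)) {
        case XMIT_OK:
            return true;
        case XMIT_LOCKED:
            /* Lock contention only: the NIC is not out of resources,
             * so just spin and try again without polling. */
            nic->spins++;
            continue;
        case XMIT_BUSY:
            /* Device cannot transmit: poll and retry, but give up
             * after a bounded number of attempts. */
            if (--tries <= 0)
                return false;
            fake_poll(nic);
            break;
        }
    }
}
```

In this sketch a device with no carrier causes the packet to be dropped after five transmit attempts instead of looping forever, while a contended lock costs only spins and no polls. The spin on the lock is unbounded here, on the assumption that the lock holder eventually releases it.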