From: Bernd Schubert
Reply-To: TC-ADMIN@listserv.uni-heidelberg.de
To: netdev@oss.sgi.com
Cc: linux-kernel@vger.kernel.org
Subject: Re: 2.613: network write socket problems
Date: Tue, 13 Sep 2005 11:22:52 +0200

On Monday 12 September 2005 17:39, Bernd Schubert wrote:
> Hello,
>
> on last Friday we switched on our server to 2.6.13 and today we are
> experiencing problems with our nfs clients.
> In particular I'm talking about the unfs3 daemon, not the kernel nfs
> daemon. Both are running on the server but on different ports, of course.
> Both are also serving to the same clients, but different directories.
>
> Today it already several times happend that the unfs3 daemon stalled.
> Ethereal showed no network packages on the unfs3 daemon port during this
> time. A strace to the proc-id of the daemon clearly shows that *some*
> writes to some network sockets will take ages to finish
>
> write(37, "\200\0\0x\203\326(\5\0\0\0\1\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 124)
> = 124

Sorry for the noise, it's not a kernel problem. Switching back to 2.6.11
didn't help, so we investigated further. It turned out that one of our
clients was in a kind of zombie state: it kept asking for filehandles but
did not answer requests from the server. Since unfs3 is single-threaded,
all other clients had to wait for timeouts.

Bernd
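
For anyone hitting the same symptom, here is a minimal sketch of the mechanism, not unfs3's actual code. It assumes the stall comes from a peer that stops reading: once the kernel send buffer for that peer is full, a blocking write() does not return, and a single-threaded server cannot serve anyone else in the meantime. The names try_send, fill_send_buffer and REPLY are made up for illustration, and socket.socketpair() stands in for the real client connections.

```python
import socket

REPLY = b"x" * 4096  # stand-in for a server reply; size chosen arbitrarily


def try_send(sock, data):
    """Attempt a non-blocking send.

    Returns the number of bytes the kernel accepted, or None if the
    send buffer is full and a blocking write() would have stalled.
    """
    sock.setblocking(False)
    try:
        return sock.send(data)
    except BlockingIOError:
        return None


def fill_send_buffer(sock):
    """Keep sending until the kernel refuses more data; return bytes queued."""
    queued = 0
    while True:
        n = try_send(sock, REPLY)
        if n is None:
            return queued
        queued += n


# Two hypothetical clients, each modelled as a connected socket pair.
to_stuck, stuck_client = socket.socketpair()
to_healthy, healthy_client = socket.socketpair()

# The stuck client never calls recv(), so replies pile up in the buffer.
queued = fill_send_buffer(to_stuck)

# With a blocking socket, this is the write that would hang the server.
stuck_result = try_send(to_stuck, REPLY)

# The healthy client's socket is unaffected and still accepts data.
healthy_result = try_send(to_healthy, REPLY)

print("queued for stuck client:", queued)
print("next send to stuck client:", stuck_result)
print("send to healthy client:", healthy_result)

for s in (to_stuck, stuck_client, to_healthy, healthy_client):
    s.close()
```

The point of the sketch is only that the healthy connection is fine the whole time; the other clients wait because the one process serving them is stuck in a write to the dead peer.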