Received: (from majordomo@localhost) by oss.sgi.com (8.11.2/8.11.3) id f5E7dCW11590 for linux-xfs-outgoing; Thu, 14 Jun 2001 00:39:12 -0700
Received: from mail.coltex.nl (IDENT:root@edge.coltex.nl [194.151.97.115]) by oss.sgi.com (8.11.2/8.11.3) with SMTP id f5E7dAP11581 for ; Thu, 14 Jun 2001 00:39:10 -0700
Received: from auto-nb1.xs4all.nl (auto-nb1.coltex.nl [10.0.1.171]) by mail.coltex.nl (8.11.2/8.11.2) with ESMTP id f5E7ceV32059; Thu, 14 Jun 2001 09:38:56 +0200
Message-Id: <4.3.2.7.2.20010614091958.030b5aa0@pop.xs4all.nl>
X-Sender: knuffie@pop.xs4all.nl
X-Mailer: QUALCOMM Windows Eudora Version 4.3.2
Date: Thu, 14 Jun 2001 09:38:39 +0200
To: Gerald Weber , linux-xfs@oss.sgi.com
From: Seth Mos
Subject: Re: linux 2.4.5 + xfs lockup
In-Reply-To: <3B284D99.908121D2@teleworld.at>
Mime-Version: 1.0
Content-Type: text/plain; charset="us-ascii"; format=flowed
Sender: owner-linux-xfs@oss.sgi.com
Precedence: bulk

At 07:37 14-6-2001 +0200, Gerald Weber wrote:
>hi there,
>
>following kernel-oops occurred yesterday night:
>(ksymoops output)
>
>Jun 14 00:14:17 twasrv1 kernel: Unable to handle kernel paging request at virtual address 41a9b2a6
>Jun 14 00:14:17 twasrv1 kernel: 41a9b2a6
>Jun 14 00:14:17 twasrv1 kernel: *pde = 00000000
>Jun 14 00:14:17 twasrv1 kernel: Oops: 0000
>Jun 14 00:14:17 twasrv1 kernel: CPU: 1
>Jun 14 00:14:17 twasrv1 kernel: EIP: 0010:[pagebuf_locking_terminate+1101639818/-1072693788]
>Jun 14 00:14:17 twasrv1 kernel: EFLAGS: 00010206
>Jun 14 00:14:17 twasrv1 kernel: eax: 00000016 ebx: 08090fc0 ecx: 488d0000 edx: 00000000
>Jun 14 00:14:17 twasrv1 kernel: esi: 5cbf8000 edi: 00000000 ebp: cbcd0d2c esp: ca6e9f04
>Jun 14 00:14:17 twasrv1 kernel: ds: 0018 es: 0018 ss: 0018
>Jun 14 00:14:17 twasrv1 kernel: Process gzip (pid: 21791, stackpage=ca6e9000)
>Jun 14 00:14:17 twasrv1 kernel: Stack: cbcd0e44 5cbf8000 00000000 d67015e0 00000000 00004000 08090fc0 da268520
>Jun 14 00:14:17 twasrv1 kernel: 00000000 d67016ec d06ef820 00000000 c0280f0a ca6e9f88 d67015e0 c27d3260
>Jun 14 00:14:17 twasrv1 kernel: c01e378e cbcd0d44 ca6e9f7c 00000000 00000000 00000000 d67015e0 c27d3260
>Jun 14 00:14:17 twasrv1 kernel: Call Trace: [ip_rcv+786/920] [linvfs_write+266/324] [sys_write+143/196] [system_call+51/56] [startup_32+43/203]
>Jun 14 00:14:17 twasrv1 kernel: Code: Bad EIP value.
>
>system is a dell poweredge 2450,2 mylex AcceleRAID 352
>one of them is serving a raid5 with 300gb.
>the machine locks up (no net,no sysreq,no console input accepted)
>kernel is from cvs (2.4.5)

You mean serving NFS? 2.4.5 is missing an NFS patch, and without it the server barfs after prolonged use. I suggest checking out the newer CVS tree, which is based on 2.4.6-pre2. If you don't want to run a pre-release kernel, pick up the patches for 2.4.5 from the oss.sgi.com FTP server; they include the NFS fix.

Good luck.

I've written a shell script that, well, tortures the NFS server I use for testing. It starts a Bonnie run on a 100MB file, waits 30 seconds and starts another, so the number of concurrent Bonnies slowly rises.

The server and clients all run kernels based on 2.4.6-pre1. The server is a Dell OptiPlex with a PIII 450, 256MB RAM and a 40GB IDE disk. The clients were two Dell PowerEdge 2450 servers with dual 733 MHz processors, also with 256MB RAM. Everything was connected to a 100Mbit switch, and both clients ran the test simultaneously.

It did not crash the server, but the server had difficulty keeping up (load average 2) once the number of Bonnies on each client reached 15. I stopped the test when each client had 30 simultaneous Bonnies and let the remaining runs complete. The IDE LED on the server was still lit more than 2 hours after I stopped the scripts, so it seems I was hitting the limits of the server's IDE disk. The traffic stats on the 3Com switch started dropping off when the number of Bonnies on each client went above 15, which also suggests an I/O bottleneck on the server side.
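The script itself isn't included in the post. As a rough sketch of the ramp-up idea only (written in Python rather than shell, not the actual script, and with hypothetical helper names `ramp_up` and `wait_all`), this starts a new instance of a command every `interval_s` seconds without waiting for earlier ones, so concurrency keeps climbing as long as each run outlasts the interval:

```python
import subprocess
import sys
import time
from typing import List, Sequence


def ramp_up(cmd: Sequence[str], interval_s: float,
            max_instances: int) -> List[subprocess.Popen]:
    """Hypothetical helper: start `cmd` every `interval_s` seconds,
    never waiting for earlier instances, until `max_instances` have
    been launched. Concurrency rises whenever a run outlasts the interval."""
    procs: List[subprocess.Popen] = []
    for i in range(max_instances):
        procs.append(subprocess.Popen(list(cmd)))
        # Count instances that have not exited yet (poll() returns None).
        running = sum(1 for p in procs if p.poll() is None)
        print(f"started #{i + 1}, {running} running", file=sys.stderr, flush=True)
        if i + 1 < max_instances:
            time.sleep(interval_s)  # the post used 30 s between launches
    return procs


def wait_all(procs: List[subprocess.Popen]) -> List[int]:
    """Hypothetical helper: stop launching and let the remaining runs
    finish, returning each exit code in launch order."""
    return [p.wait() for p in procs]
```

To approximate the test described here on each client, something like `procs = ramp_up(["Bonnie", "-d", "/mnt/nfs", "-s", "100"], 30, 30)` followed by `wait_all(procs)` should be close; the binary name, its `-d`/`-s` flags and the `/mnt/nfs` mount point are assumptions, so check them against your Bonnie build and NFS setup.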
While the remaining Bonnies were completing, the load average stayed at 6-7. I'm afraid I'll need heavier equipment to push harder over NFS.

>ps: ...not bad.. xfs check takes only 4 secs for 300gb.i'm impressed...

That's the cool part. I accidentally pulled a 2450 a few centimeters out of the rack, which made the power cords fall out. XFS to the rescue ;-)

It's weird: every plug on a modern PC has clips to hold it in place except the Euro power plugs. Damn. I think I'll use some tie-wraps to fix them in place permanently. Superglue makes a good second choice.

Bye

--
Seth
Every program has two purposes, one for which it was written and another for which it wasn't. I use the last kind.