
Re: linux 2.4.5 + xfs lockup

To: Gerald Weber <gerald.weber@xxxxxxxxxxxx>, linux-xfs@xxxxxxxxxxx
Subject: Re: linux 2.4.5 + xfs lockup
From: Seth Mos <knuffie@xxxxxxxxx>
Date: Thu, 14 Jun 2001 09:38:39 +0200
In-reply-to: <3B284D99.908121D2@xxxxxxxxxxxx>
Sender: owner-linux-xfs@xxxxxxxxxxx
At 07:37 14-6-2001 +0200, Gerald Weber wrote:
hi there,

the following kernel oops occurred last night:
(ksymoops output)

Jun 14 00:14:17 twasrv1 kernel: Unable to handle kernel paging request at virtual address 41a9b2a6
Jun 14 00:14:17 twasrv1 kernel: 41a9b2a6
Jun 14 00:14:17 twasrv1 kernel: *pde = 00000000
Jun 14 00:14:17 twasrv1 kernel: Oops: 0000
Jun 14 00:14:17 twasrv1 kernel: CPU:    1
Jun 14 00:14:17 twasrv1 kernel: EIP: 0010:[pagebuf_locking_terminate+1101639818/-1072693788]
Jun 14 00:14:17 twasrv1 kernel: EFLAGS: 00010206
Jun 14 00:14:17 twasrv1 kernel: eax: 00000016 ebx: 08090fc0 ecx: 488d0000 edx: 00000000
Jun 14 00:14:17 twasrv1 kernel: esi: 5cbf8000 edi: 00000000 ebp: cbcd0d2c esp: ca6e9f04
Jun 14 00:14:17 twasrv1 kernel: ds: 0018   es: 0018   ss: 0018
Jun 14 00:14:17 twasrv1 kernel: Process gzip (pid: 21791, stackpage=ca6e9000)
Jun 14 00:14:17 twasrv1 kernel: Stack: cbcd0e44 5cbf8000 00000000 d67015e0 00000000 00004000 08090fc0 da268520
Jun 14 00:14:17 twasrv1 kernel:        00000000 d67016ec d06ef820 00000000 c0280f0a ca6e9f88 d67015e0 c27d3260
Jun 14 00:14:17 twasrv1 kernel:        c01e378e cbcd0d44 ca6e9f7c 00000000 00000000 00000000 d67015e0 c27d3260
Jun 14 00:14:17 twasrv1 kernel: Call Trace: [ip_rcv+786/920] [linvfs_write+266/324] [sys_write+143/196] [system_call+51/56] [startup_32+43/203]
Jun 14 00:14:17 twasrv1 kernel: Code:  Bad EIP value.

system is a dell poweredge 2450,2 mylex AcceleRAID 352
one of them is serving a raid5 with 300gb.
the machine locks up (no net,no sysreq,no console input accepted)
kernel is from cvs (2.4.5)

You mean serving NFS? 2.4.5 is missing a patch, and without it the kernel barfs after longer use. I suggest checking out the newer CVS tree, which is based on 2.4.6-pre2. If you don't want to run the pre kernel, pick up the patches for 2.4.5 from the oss.sgi.com ftp server. They include the NFS fix.

Good luck.
I've made a torture shell script that, well, tortures the NFS server I use for testing. It starts a Bonnie run on a 100MB file, waits 30 seconds and starts another, so the number of concurrent Bonnies slowly rises.
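
For anyone who wants to try something similar, here is a rough sketch of that ramp-up loop in Python. It is not the actual script. The function names are made up for illustration, and the `bonnie` binary name and its `-d` (scratch directory) and `-s` (file size in MB) flags are assumptions based on classic Bonnie, so check them against your own build.

```python
import subprocess
import time


def bonnie_command(scratch_dir, size_mb=100):
    # Assumption: binary is called "bonnie" and takes classic Bonnie flags,
    # -d for the scratch directory and -s for the test file size in MB.
    return ["bonnie", "-d", scratch_dir, "-s", str(size_mb)]


def ramp_up(command, max_procs, interval_s=30.0, sleep=time.sleep):
    """Start one more copy of `command` every `interval_s` seconds.

    Earlier copies are left running, so the number of concurrent
    processes rises until `max_procs` copies have been started.
    Returns the list of subprocess.Popen handles.
    """
    procs = []
    for i in range(max_procs):
        procs.append(subprocess.Popen(command))
        # poll() returns None while a process is still running.
        running = sum(1 for p in procs if p.poll() is None)
        print(f"started #{i + 1}, {running} still running")
        if i + 1 < max_procs:
            sleep(interval_s)
    return procs


def wait_all(procs):
    """Let the remaining processes complete and return their exit codes."""
    return [p.wait() for p in procs]
```

On a client you would call something like `wait_all(ramp_up(bonnie_command("/mnt/nfs"), max_procs=30))`, where `/mnt/nfs` is a placeholder for the directory on which the export is mounted.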

The server and clients are 2.4.6-pre1 based. The server is a Dell Optiplex PIII 450 with 256MB RAM and a 40GB IDE disk. The clients were 2 Dell Poweredge 2450 servers with dual 733 and also 256MB RAM. Everything was connected to a 100Mbit switch.

Both clients ran this test simultaneously. It did not crash the server machine, but the server was having difficulty keeping up (load avg 2) when the number of Bonnies on each client reached 15. I stopped the test after 30 simultaneous Bonnies on each client and let the remaining Bonnies complete. The IDE LED of the "server" machine was still on more than 2 hours after I stopped the scripts.

It seems I was running into IDE limitations on the server side. The traffic stats on the 3Com switch were dropping off when the number of Bonnies on each client went above 15, which means the server had hit an IO bottleneck rather than a network one. The load average during those hours when the Bonnies were completing remained at 6-7.

I'm afraid I'll need some heavier equipment to push harder over NFS.

ps: ...not bad.. xfs check takes only 4 secs for 300gb.i'm impressed...

That's the cool part. By accident I pulled a 2450 a few centimeters out of the rack, and that made the power cords fall out. XFS to the rescue ;-) It's weird, every plug on modern PCs has clips to hold it except the euro power plugs. Damn. I think I'll take some tie-wraps and connect it permanently. Superglue makes a good second.

Bye
--
Seth
Every program has two purposes one for which
it was written and another for which it wasn't
I use the last kind.

