
Re: 2.4.18 XFS 1.1 : Gave up on XFS - too many Oops

To: Poul Petersen <petersp@xxxxxxxxxxxxx>
Subject: Re: 2.4.18 XFS 1.1 : Gave up on XFS - too many Oops
From: Benito Venegas <venevene@xxxxxxxxxxxxxx>
Date: Fri, 19 Jul 2002 18:15:43 -0400 (EDT)
Cc: <linux-xfs@xxxxxxxxxxx>
In-reply-to: <F888C30C3021D411B9DA00B0D0209BE8038F95E6@xxxxxxxxxxxxxxxxxxxxxxxxxxxxxx>
Sender: owner-linux-xfs@xxxxxxxxxxx
Poul:

I have similar hardware, but using a 650F and a 630F (one of them is actually
without the F :/)

I had some minor problems with 2.4.9-13 and the release before it, but since
installing 2.4.9-31 I haven't experienced any problems.

Suggestion (as Austin and Andi said): send to the list or the developers the
output of ksymoops run on the kernel Oops text (it's probably in
/var/log/messages or /var/log/kernel, depending on how you configured
syslog.conf).

The system has been running very well, without any problems, and I can tell you
my box is lit up like a Christmas tree. It's sharing file systems over NFS to 5
different systems, each with a different I/O rate.

Give us more details.

Keith, Eric, and the rest of team can help you.

Feel free to contact me if you want more details.

Good luck..


 On Fri, 19 Jul 2002, Poul Petersen wrote:

>       Since we started using XFS over a year ago, we have had periodic
> problems. At the worst, our server would die perhaps twice a week with a
> kswapd oops, or other Null Pointer errors. These problems persisted, though
> changed slightly (I don't remember the whole history now - but I've posted
> here several times) as we went from XFS 1.01, to 1.02, to 1.1 and from
> kernel 2.4.5, 2.4.14, to 2.4.18 (I think that is right - the only ones I
> remember for sure are 2.4.14+XFS-1.0.2 and 2.4.18+XFS-1.1). The problems
> definitely seemed to intensify as if the file-systems had become hopelessly
> corrupt. Indeed, running xfs_repair after each crash would extend the uptime
> between crashes to a week, perhaps 10 days.
>
>       We also tried different hardware, in fact a total of three different
> machines. We also tried to duplicate the problem: we kept the original
> file-server OS disk with 2.4.5+XFS-1.01 which had been horribly unstable and
> installed it into identical hardware: cpu, SCSI controllers, SAN controller,
> SAN disks, network cards, etc. I then wrote a script which tried to
> duplicate our typical usage (a continuous build) and stressed the machine
> from 12 nodes for well over a week (pushing about 30MB/s: yup gig ether). I
> also added a continuous backup and I even threw in some memory stress tests
> as well as Bonnie. This configuration *never* crashed. I can only conclude
> that there is something about the real usage of our file systems that
> exposes a flaw in XFS, but I have no idea what it could be.
>
>       I do feel that it is *not* a hardware problem. I state this because
> we have migrated all of our file-systems from XFS to ext3 (about 1TB of
> data). While I was going through the hassle of moving data, I decided to add
> LVM as well (much recommended to anyone considering LVM - way cool). This
> configuration, though more complicated, has been up solidly for about a
> month now - a time period which would have seen our XFS based file server
> crash perhaps 3 or 4 times.
>
>       I debated not sending this message since it seemed like I might be
> taken as complaining. I'm not. We've given up on XFS simply because we can't
> duplicate the problem in a controlled environment, so we don't feel like we
> are going to be able to fix it. As this is a production machine, we simply
> can't have it crash each week. I felt it was important to let everyone know
> that there may be a significant bug, though obviously obscure.
>
> -poul
>
> Last XFS config:
>
> Dell 2550 Dual P-III (Coppermine) 933 MHz
> 1 GB Ram
> 1 9GB internal disk, aic7xxx driver (kernel)
> RedHat 7.2
> kernel 2.4.18 + XFS 1.1 (and xfs tools)
> nfsutils-0.3.3
> Qlogic 2200 SAN Adaptor firmware 2.02.01 running Qlogic driver 6.0b20
> Intel PRO/1000 Gig-Fiber running Intel e1000 driver 4.0.7
> Zzyzx RocketStor 2000 Raid Head
> Qlogic SAN box (Isolated Hard segment between file-server and Zzyzx)
>

-- 
Benito A. Venegas
System Engineer, Technology
225 Park Ave. South (6th floor ISI)
New York, NY 10003
A Euromoney Institutional Investor company.
