Re: 2.4.18 XFS 1.1 : Gave up on XFS - too many Oops

To: linux-xfs@xxxxxxxxxxx
Subject: Re: 2.4.18 XFS 1.1 : Gave up on XFS - too many Oops
From: Walter R Fletcher <fletcher@xxxxxxxx>
Date: Wed, 24 Jul 2002 11:03:31 -0600 (MDT)
Sender: owner-linux-xfs@xxxxxxxxxxx

Dear People,

     I take care of a modest number of SGI, Sun, and Linux-based
systems in the department labs and offices.  The one resource that
gets continually exhausted is disk space.  Some time ago I read an
article on the web describing a way to build a large RAID at meager
cost.  Some further searching revealed a modest body of information
about this subject.

     I got approval to set up a test system.  The filesystem type I
chose was XFS because I had used it for over 5 years on my Challenge-S
fileserver (15 different filesystems, about 350 gigabytes) without a
single snag.  The test system consisted of a collection of 23- and
70-gigabyte disks.  Tests were flawless, so I bought a good motherboard
(Tyan Thunder K7), a good RAID card (3ware 7850), eight 120-gigabyte
Maxtors (plus some spares), and a small disk for the system.  I went
with hardware RAID because I didn't trust the Linux software RAID at
the time.  I ended up with an 860-gigabyte filesystem.

     Initially I used the SGI XFS ISO (RH7.1, I think) to set the
system up.  The system ran OK except for frequent RPC timeout problems
reported by the NFS clients, but only during write operations.  The
problem could be reproduced by starting processes on each client that
wrote data to the NFS server at the fastest rate possible, e.g.:

    dd  if=/dev/zero  count=<bignumber>  bs=<various>  of=/big/file

The dd parameters were chosen to mimic the record sizes and quantities
typical of the processing being done by the users, i.e.:

       count = whatever would result in an approximately 100-gigabyte
               file given the block size chosen.
       bs    = anywhere from 6000 to 16000 bytes.  Size didn't seem
               to matter.
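
As a concrete illustration (the 8000-byte block size is just one value
in the range above, not the exact figure used), a 100-gigabyte target
works out to 12,500,000 records:

    dd  if=/dev/zero  count=12500000  bs=8000  of=/big/file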

I would set this in motion on 1, 2, or 3 systems at once, depending on
how many systems with 100baseT interfaces were free at the moment.  This
would also mimic a normal load at any given time.
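
A minimal sketch of how that load might be launched from one machine,
assuming passwordless ssh to each client (the hostnames here are
hypothetical, and each client writes its own file so they don't
clobber one another):

    for host in client1 client2 client3 ; do
        ssh $host "dd if=/dev/zero count=12500000 bs=8000 of=/big/file-$host" &
    done
    wait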

     Over time I tried tweaking various parameters that tune NFS and
other aspects of the system to improve performance and reliability,
but I never totally got rid of the failures, so I've never been sure
I was tweaking anything really meaningful.  I wish there were better
documentation of the /proc filesystem and NFS parameters.
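
(For reference, the kind of knobs in question on a 2.4 kernel looked
roughly like the sketch below.  The values, hostname, and paths are
illustrative ones in the spirit of the NFS-HOWTO of the era, not the
exact settings tried here.)

    # enlarge the UDP input buffers that the nfsd threads inherit
    echo 262144 > /proc/sys/net/core/rmem_default
    echo 262144 > /proc/sys/net/core/rmem_max
    rpc.nfsd 16     # run more nfsd threads than the usual default of 8
    # on each client, request the largest transfer sizes 2.4 knfsd allows
    mount -t nfs -o rsize=8192,wsize=8192 server:/big /big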

     Recently I downloaded the 2.4.18/XFS 1.1 ISO image installer from
SGI to do a clean update.  There had been so many tinkerings and
patches applied to the system that I was beginning to worry the server
would degrade or unravel altogether.  The new install has been much
more stable.  I have
only once been able to hang the system.  That failure took nearly 24 hours
of continual onslaught before it finally happened.  The system has been
returned to normal service.  No problems have been reported by the users
for several weeks.  I'm keeping a hopeful eye on it.

    All in all, I now believe XFS-based filesystems exported over NFS
are sufficiently reliable, and still improving, that I'm in the
process of building two more servers like the first.  We need the
space.

--  Reid

  Walter Reid Fletcher, WB7CJO
  Senior Systems Analyst / Unix Systems Administrator
  Department of Geology and Geophysics                  P. O. Box 3006
  University of Wyoming                                 1-307-766-6227
  Laramie,  WY   82071                  Internet:  Fletcher @ UWyo.Edu

    Are we all roadkill on the information highway?  - Jeff Greenfield


