Re: Errors using amanda/xfsdump

To: Justin Tripp <justin@xxxxxxxxx>
Subject: Re: Errors using amanda/xfsdump
From: Steve Lord <lord@xxxxxxx>
Date: Tue, 08 May 2001 08:41:41 -0500
Cc: Federico Sevilla III <jijo@xxxxxxxxxxxxxxx>, Linux XFS Mailing List <linux-xfs@xxxxxxxxxxx>
In-reply-to: Message from Justin Tripp <justin@tripp.org> of "Tue, 08 May 2001 07:35:03 MDT." <Pine.BSF.4.33.0105080720100.51666-100000@rogue.tripp.org>
Sender: owner-linux-xfs@xxxxxxxxxxx

Justin, you are correct: you have found a scenario that XFS cannot always
cope with. In fact, there is a deliberate panic in the code there. It will
only be triggered by the combination of NFS and local access to XFS, and
probably very heavy access at that. This is going to take some design work
to fix.

Steve

> 
> I do not believe this is the problem.  For the RAID 5 failure to occur,
> one must run the raid in degraded mode (e.g. with a disk failed.)  My raid
> has not run in the degraded mode, so according to them, it is not a
> problem.  My own observations seem to indicate that there may be a problem
> with XFS.
> 
> I have already been able to successfully back up the drives before I
> exported them via NFS and started running NNTP spools across the NFS
> exports.  It seems to be the way I am using the drives that causes the
> failure.  If I start the NNTP daemons running across the file system
> (examining articles to see which ones exist and how old they are) while
> a backup is running, it seems to consistently fail.  The oops says that it
> occurs in xfsdump and the software trace seems to indicate it was the XFS
> code that attempted to dereference a NULL pointer.  So, I am led to
> believe that the problem may lie in XFS.  I could be convinced to
> re-arrange the array (since it currently has no real data anyway), but I
> currently do not believe that it will make a difference.
> 
> 
>                               .justin.
> 
> ------------------------------------------------------------------------
> Justin Leonard Tripp                                   justin@xxxxxxxxxx
> Configurable Computing Laboratory Research Assistant      CB 461 x8-7206
> Electrical and Computer Engineering Department  Brigham Young University
> 
> On Tue, 8 May 2001, Federico Sevilla III wrote:
> 
> > Hi, this is a little late and not particularly XFS-specific but ...
> >
> > > The machine is a Dual 500 MHz PIII, and the filesystems run on top of
> > > the 3ware IDE raid card with 4 46G disks running in RAID level 5.
> > > (138G filesystem available...)  The XFS is the 2.4.3 version from
> > > April 5th.
> >
> > 3Ware sent me an e-mail about a problem they found with their Escalade
> > 6400 controllers, RAID 5, and ext2. Their e-mail did not have conclusive
> > information on any other filesystem, but I'm thinking that XFS could be
> > affected, too. A "dirty hack" seems to be to mount the ext2 partition with
> > the sync option. Maybe XFS treats a disk in a similar manner, making XFS
> > "immune"?
> >
> > Anyway, until a patch comes out (expected somewhere around May 15) for
> > their software, RAID 5 looks iffy. Or does it?
> >
> > Here is a copy of the e-mail I got for whatever it's worth. :)
> >
> >  --> Jijo
> >
> > ---------- Forwarded message ----------
> > Date: Wed, 2 May 2001 09:24:15 -0700
> > From: Mike Wentz <mike.wentz@xxxxxxxxx>
> > Subject: 3ware Technical Bulletin concerning RAID 5
> >
> >              Important Technical Bulletin
> >
> >
> > Dear customer,
> >
> > You are receiving this email because you are a registered
> > owner of a 3ware Escalade 6400 Series Storage
> > Switch, or you have opened a Tech Support case with 3ware
> > on a 6400, 6410 or 6800 Series Escalade Storage Switch.
> >
> > 3ware has found a bug in our RAID 5 code that can cause file
> > system errors resulting in possible loss of data. This
> > problem has only been experienced on Linux operating systems
> > running the default ext2 file system; however, other file
> > systems may be affected.
> >
> > Products affected:           Escalade 6400, 6410 and 6800
> > Software Versions affected:  6.5 and 6.6
> > RAID level affected:         RAID 5
> > Operating systems affected:    
> >                   Known:     Linux with default ext2 file
> >                              system
> >                   Possible:  Win98/ME, WinNT, Win2000
> >
> > ************************************************************
> >
> > Symptoms:
> > 1. 3ware BIOS VERIFY command reports VERIFY Failed.
> > 2. Linux fsck -vf command returns inode and/or superblock
> >    errors when run while the array is in degraded mode.
> > 3. Windows chkdsk command returns errors on the MFT
> >    (Master File Table).
> >
> > ************************************************************
> >
> > Problem description:
> > 1. In Linux, the RAID 5 parity data can get corrupted during
> >    writes.
> > 2. On Microsoft operating systems, parity and/or user data
> >    can get corrupted during writes.
> >
> > ************************************************************
> >
> > Is my data corrupted?
> > There are two factors that determine whether or not your
> > data is affected:
> > 1. Whether or not the array has ever degraded
> > 2. Which operating system you are using
> >
> > If the array has never degraded, then the data should be
> > fine.
> >
> > If the array has degraded, then there is a potential that
> > your data is affected. In most cases, the operating system
> > will detect the errors and correct them. It is possible,
> > however, that the operating system cannot detect and correct
> > all the errors, in which case you will need to restore from
> > backup.
> >
> > To date, this problem has only been seen on Linux operating
> > systems running the ext2 default file system.
> >
> > 3ware has not yet found an instance where user data on a
> > Microsoft operating system has been affected; however, it is
> > theoretically possible.
> >
> > ************************************************************
> > I'm using RAID 5, what should I do?
> > Depending on your operating system and the state of the
> > array, the following actions are recommended:
> >
> > Linux:
> >
> > If the RAID 5 has never degraded there are two choices:
> > 1. Reconfigure the array to RAID 1 or RAID 10.
> > 2. Mount the file system in synchronous mode. Synchronous
> >    mode will prevent the parity data from being affected,
> >    thereby eliminating the problem. Please note that there
> >    is a significant performance penalty.
> >  
> >                   Command syntax:
> > mount -t ext2 /dev/device_node /mountpoint -o sync
> >
> > If the array has degraded run fsck to repair damaged data.
> > 1. If the repair completes successfully, you may either
> >    convert the array to RAID 1 or RAID 10, or run the file
> >    system in synchronous mode.
> > 2. If the repair is unsuccessful, you will need to restore
> >    from backup.
> >
> > Microsoft NT, 98, ME or 2000:
> >
> > If the RAID 5 array has never degraded you should
> > reconfigure the array to RAID 1 or RAID 10.
> >
> > If the array has degraded, run chkdsk /R to repair damaged
> > data.
> > 1. If the repair completes successfully, reconfigure the
> >    array to RAID 1 or RAID 10.
> > 2. If the repair is unsuccessful, you will need to restore
> >    from backup.
> >
> > ************************************************************
> >
> > What happens if I stay on RAID 5?
> > 1. 3ware does not recommend using RAID 5 on any Microsoft
> >    operating system at this time.
> > 2. If you are running Linux and wish to remain on RAID 5,
> >    3ware recommends you mount the file system in synchronous
> >    mode.
> >
> > ************************************************************
> >
> > When will a RAID 5 fix be available?
> > 1. 3ware has identified the cause of the problem and
> >    developed a fix. We are in the process of testing to
> >    ensure that it completely corrects the problem. We
> >    anticipate having the fix available on or before
> >    May 15, 2001.
> > 2. If you would like to register to be notified when the
> >    fix is available please click here:   
> >    http://www.3ware.com/support/contact3wareraid5.asp
> >
> > 3ware regrets any inconvenience this problem may cause you.
> >
> > Sincerely
> > Michael L. Wentz
> > Director, Customer Service (650.269.2977)
> >
> >
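
For reference, the recommendations in the 3ware bulletin quoted above can be
read as a small decision procedure. The following is only a sketch of that
reading: the function name, its arguments, and the "linux"/"windows" labels
are hypothetical and do not come from 3ware or from any tool mentioned in
this thread.

```python
def recommended_actions(os_family, ever_degraded, repair_ok=None):
    """Summarize the bulletin's advice for a RAID 5 array (illustrative only).

    os_family     -- "linux" or "windows" (hypothetical labels)
    ever_degraded -- True if the array has ever run in degraded mode
    repair_ok     -- outcome of fsck / chkdsk /R, or None if not yet run;
                     only consulted when ever_degraded is True
    Returns a list of alternative actions (any one of them may be chosen).
    """
    if os_family not in ("linux", "windows"):
        raise ValueError("os_family must be 'linux' or 'windows'")

    reconfigure = "reconfigure the array to RAID 1 or RAID 10"
    sync_mount = "mount the file system in synchronous mode (-o sync)"

    # Options for an array that is, or has been made, consistent.
    # The bulletin offers the sync mount only on Linux.
    if os_family == "linux":
        healthy_options = [reconfigure, sync_mount]
    else:
        healthy_options = [reconfigure]

    if not ever_degraded:
        return healthy_options

    # The array has degraded: a repair pass comes first.
    if repair_ok is None:
        return ["run fsck" if os_family == "linux" else "run chkdsk /R"]
    if not repair_ok:
        return ["restore from backup"]
    return healthy_options


if __name__ == "__main__":
    # The situation described in this thread: Linux, array never degraded.
    for action in recommended_actions("linux", ever_degraded=False):
        print(action)
```

Note that this only covers the controller issue in the bulletin; it says
nothing about the NFS plus local access panic discussed at the top of this
message.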


