
Re: raid5 resync aborted under heavy XFS use

To: Chris Bednar <cjb@xxxxxxxxxxxxxxxxxxxxxxxxx>
Subject: Re: raid5 resync aborted under heavy XFS use
From: Steve Lord <lord@xxxxxxx>
Date: Mon, 30 Jul 2001 08:51:05 -0500
Cc: Seth Mos <knuffie@xxxxxxxxx>, Linux XFS Mailing List <linux-xfs@xxxxxxxxxxx>
In-reply-to: Message from Chris Bednar <cjb@xxxxxxxxxxxxxxxxxxxxxxxxx> of "Sun, 29 Jul 2001 15:32:28 CDT." <Pine.LNX.4.10.10107291519140.11670-100000@xxxxxxxxxxxxxxxxxxxxxxxxx>
Sender: owner-linux-xfs@xxxxxxxxxxx
> > there are some slight problems with xfs over software raid5 that have just 
> > been fixed in the CVS tree. There were also IO stall problems with this 
> > setup when you have an internal log. Basically you also want to make the
> > log an external log, which gives a large performance boost.
> > 
> > Search the archive for discussion about this. It was discussed a week ago,
> > with some benchmarks to back it up.
> 
>     I've seen them. For some reason, I'm suspicious that this
> is a different problem.
> 
>     An external log is scary to me;  I can afford lackluster
> write performance more easily than I can afford:
> 
>   ``You know that $16k RAID setup you bought? it's gone to Hell
>     because the one disk I was using for the log croaked.''
> 
> or:
> 
>   ``You know that big RAID system? Well, I moved it from one 
>     machine to another, and now it won't work.''
> 
> In my opinion, an internal log has to work reasonably well for
> XFS to be viable. It's fine, of course, if an external log works
> better, and I don't mind doing that on my own systems.
> 
>     I'll take a shot at CVS... by the way, is an internal log
> also a performance issue on IRIX systems?


Looks like there has been some 'discussion' over the weekend. There have
been two bugs in the raid5 code: the fix for one is in the 1.0.1 release,
the other bug never appeared in a release.

 o There was a stall problem where the raid would simply grind to a halt;
   this was fixed by a kernel change in the 2.4.7-pre series. The fix was
   in the 1.0.1 release; the symptom was a filesystem that stopped making
   any progress.

 o The second was a problem writing out an internal log. The log was 
   not getting written correctly on raid5 devices, which meant you
   could not always mount the filesystem again without running
   xfs_repair. This problem was introduced and fixed in the cvs tree
   after the 1.0.1 release.
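For anyone who hit the second bug, recovery would look roughly like this
(the md device name is hypothetical; substitute your own raid5 device):

```shell
# Unmount the affected filesystem, then repair the log and metadata.
umount /dev/md0
xfs_repair /dev/md0
mount -t xfs /dev/md0 /mnt
```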

As for internal vs. external logs: XFS has always supported an external
log, but on Irix the volume managers support building a volume with
separate 'sub-volumes', which can have different layout characteristics
from each other. Internally, the XFS code treats this identically to the
external log device on Linux; the only difference is that on Linux
the log device must be specified by the administrator when making and
mounting the filesystem.
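On Linux that specification is done with the logdev option at both mkfs and
mount time. A minimal sketch, with hypothetical device names (a raid1 pair
for the log, the raid5 array for data):

```shell
# Create the filesystem with an external log on a separate device.
mkfs.xfs -l logdev=/dev/md1 /dev/md0

# The same log device must be named every time the filesystem is mounted.
mount -t xfs -o logdev=/dev/md1 /dev/md0 /mnt
```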

As for raid5 log performance: the log is written in chunks of up to 32
Kbytes, but these chunks can be any multiple of 512 bytes long and can
start on any 512-byte boundary. The raid5 code does not appear to handle
this well, and the result is less than optimal performance. A raid1
external log works much better and is still safe against a single failure.
This appears to be an interaction with the Linux raid5 software; Irix does
not have software raid5, it always uses hardware for raid combinations
that include parity.
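The cost of those small, arbitrarily aligned writes can be sketched in a few
lines. This is not XFS or md code, just an illustration assuming a
hypothetical 4-disk raid5 with a 64 KiB per-disk chunk: any write that does
not cover whole stripes forces the raid5 layer to read old data and parity
back in before it can write.

```python
# Hypothetical geometry: 4 disks in raid5 -> 3 data disks per stripe.
CHUNK = 64 * 1024            # per-disk chunk size in bytes
DATA_DISKS = 3
STRIPE = CHUNK * DATA_DISKS  # bytes covered by one full stripe

def needs_read_modify_write(offset, length):
    """True if the write does not cover whole, aligned stripes, so
    raid5 must read old data/parity to recompute the new parity."""
    return offset % STRIPE != 0 or length % STRIPE != 0

# A 32 KiB log write starting on an arbitrary 512-byte boundary is far
# smaller than a stripe, so it always triggers read-modify-write:
print(needs_read_modify_write(512 * 7, 32 * 1024))   # True

# Only a full, stripe-aligned write avoids the extra reads:
print(needs_read_modify_write(0, STRIPE))            # False
```

This is why a raid1 external log fares better: raid1 has no parity, so
small unaligned writes cost only the mirrored write itself.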

You can try the 2.4.7 patch on the ftp site; it has fixes for
both raid5 problems.

ftp://oss.sgi.com/projects/xfs/download/patches/patch-2.4.7-xfs-2001-07-27.bz2

It may be that your problem is still there and is being caused by the
layout of the XFS log writes, in which case the only fix may be to move
to an external log until we can do something about the alignment of the
log writes.

Steve



> 
> 
> ----
> Chris J. Bednar   <http://optics.tamu.edu/~bednar/>
> Director, Distributed Computing Product Group
> http://AdvancedDataSolutions.com/


