
Re: "Corruption of in-memory data"

To: Steve Lord <lord@xxxxxxx>
Subject: Re: "Corruption of in-memory data"
From: Sidik Isani <lksi@xxxxxxxxxxxxxxx>
Date: Mon, 20 May 2002 10:28:54 -1000
Cc: Sidik Isani <lksi@xxxxxxxxxxxxxxx>, linux-xfs@xxxxxxxxxxx
In-reply-to: <1021923461.4832.335.camel@jen.americas.sgi.com>; from lord@sgi.com on Mon, May 20, 2002 at 02:37:41PM -0500
References: <20020520090515.B18897@cfht.hawaii.edu> <1021923461.4832.335.camel@jen.americas.sgi.com>
Sender: owner-linux-xfs@xxxxxxxxxxx
User-agent: Mutt/1.2.5i
On Mon, May 20, 2002 at 02:37:41PM -0500, Steve Lord wrote:
> On Mon, 2002-05-20 at 14:05, Sidik Isani wrote:
> > Hello -
> > 
> >   Thanks again for your help several months ago with xfs_growfs!
> >   Now we have a new problem . . .
> >   I was trying to resolve the slow performance issue of some of our
> >   RAID5+XFS (2.4.16 kernel) by upgrading to 2.4.18 and XFS-1.1, and
> >   reformatting with an external log, as suggested in the FAQ.
> > 
> >   During resyncing, one of the disks failed and the raid 5 went into
> >   degraded mode (no other disks had errors).  After a clean reboot, still
> >   running in degraded mode, (shouldn't matter to XFS, but I thought I'd
> >   mention it) everything seemed OK until I tried to remove a directory:
> > 
> 
> Just from a quick scan, all your corruption is localized to two parts of
> the volume, these are around the headers for allocation groups 14 and
> 17. I do not know if these areas will map onto the failed disk, but it
> looks something like that.
>
  It's a software raid5, and only one of 6 disks failed.  To XFS,
  the device should have been completely functional.
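
  For what it's worth, the degraded-but-active state is easy to double-check
  from /proc/mdstat and the kernel log; a rough sketch, nothing XFS-specific:

    # md0 should show up as active with one failed member, i.e. degraded
    # but still a fully functional block device as far as XFS can tell.
    cat /proc/mdstat

    # And only the single bad-sector event, with no IDE errors after it:
    dmesg | grep -iE 'ide|raid|md0|md1'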

> Destruction in these areas also appears pretty drastic.

  Oh, there's no worry about the contents of this filesystem.  I was
  actually trying to do the benchmarks on performance that Seth
  mentioned had not been done.  If I get them done, I'll post the
  results here.  I'd just really like to understand what happened.
  If raid5 is to blame here, there isn't much point in using it!
  Any suggestions on the best way to narrow it down?
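
  One way I can think of to take XFS out of the picture entirely (just a
  sketch, and it scribbles over md0, which is fine since the contents do not
  matter) would be to write a known pattern straight onto the degraded raid5
  and check that it reads back intact:

    # With the filesystem unmounted, write a known 64MB pattern to the
    # start of the bare raid5 device.  (This destroys the filesystem.)
    dd if=/dev/urandom of=/tmp/pattern bs=1024k count=64
    dd if=/tmp/pattern of=/dev/md0 bs=1024k

    # Get it out of the cache (raidstop/raidstart, or simply reboot as in
    # step 8 of the sequence below), then read it back and compare:
    raidstop /dev/md0
    raidstart /dev/md0
    dd if=/dev/md0 of=/tmp/readback bs=1024k count=64
    md5sum /tmp/pattern /tmp/readback   # different sums => corruption below XFS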

  Maybe it is worth starting over, with the bad disk still in there,
  to see if it happens again.  The sequence was (commands sketched after
  the list):

  1. Partitioned six 13GB IDE disks into a raid5 of six ~12.9GB partitions,
     plus a small raid1 on the first two disks for the log.
  2. Rebooted to be sure partition tables were seen correctly.
  3. mkraid /dev/md0 (the raid5 ... resyncing began)
  4. mkraid /dev/md1 (the raid1 ... resyncing delayed because on same disk)
  5. mkfs.xfs -l logdev=/dev/md1 /dev/md0
  6. mounted it and untarred a kernel tree (notably faster than 2.4.16!)
  7. Resyncing continued until, about half way through, it found a bad sector
     on *one* of the disks and the raid5 switched to degraded mode.  No IDE
     errors after that one.
  8. Rebooted (wanted to see if it came back still in degraded mode. It did.)
  9. mounted again and tried to rm -rf the linux kernel tree... crash.
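
  Roughly, in commands (the mount point and tarball name below are just
  placeholders, and the exact option spellings are from memory):

    # Steps 3-5: build the arrays, then XFS with its log on the raid1
    mkraid /dev/md0                          # raid5 over the six ~12.9GB partitions
    mkraid /dev/md1                          # small raid1 on the first two disks
    mkfs.xfs -l logdev=/dev/md1 /dev/md0     # external log on the raid1

    # Steps 6 and 9: mount with the external log and exercise it
    mount -t xfs -o logdev=/dev/md1 /dev/md0 /mnt/test
    tar xf linux.tar -C /mnt/test            # untar a kernel tree
    rm -rf /mnt/test/linux                   # this rm is where it blew up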

  It should have taken a second disk failure to cause any data loss,
  but only one disk failed here.

Be seeing you,

- Sidik

