
To: Wessel Dankers <wsl@xxxxxxxxxxxx>
Subject: Re: "Corruption of in-memory data"
From: Sidik Isani <lksi@xxxxxxxxxxxxxxx>
Date: Fri, 24 May 2002 09:51:50 -1000
Cc: linux-xfs@xxxxxxxxxxx
In-reply-to: <20020522110606.GI633@fruit.eu.org>; from wsl@fruit.eu.org on Wed, May 22, 2002 at 01:06:06PM +0200
References: <20020520090515.B18897@cfht.hawaii.edu> <1021923461.4832.335.camel@jen.americas.sgi.com> <20020520102853.C18897@cfht.hawaii.edu> <20020522110606.GI633@fruit.eu.org>
Sender: owner-linux-xfs@xxxxxxxxxxx
User-agent: Mutt/1.2.5i
On Wed, May 22, 2002 at 01:06:06PM +0200, Wessel Dankers wrote:
> On 2002-05-20 10:28:54-1000, Sidik Isani wrote:
> >   1. Partitioned 6 13GB IDE disks as a ~12.9GB x 6 raid5, plus a small
> >      raid1 on the first two disks for the log.
> >   2. Rebooted to be sure partition tables were seen correctly.
> >   3. mkraid /dev/md0 (the raid5 ... resyncing began)
> >   4. mkraid /dev/md1 (the raid1 ... resyncing delayed because on same disk)
> >   5. mkfs.xfs -l logdev=/dev/md1 /dev/md0
> >   6. mounted it and untarred a kernel tree (notably faster than 2.4.16!)
> >   7. resyncing continued until, about half way through, it found a bad
> >      sector on *one* of the disks and switched to degraded mode.  No IDE
> >      errors after that one.
> >   8. Rebooted (wanted to see if it came back still in degraded mode.
> >      It did.)
> >   9. mounted again and tried to rm -rf the linux kernel tree... crash.
> 
> If the disk error occurred WHILE syncing, the raid5 driver may not yet
> have had time to write the parity information, and when in degraded mode it
> may have tried to reconstruct data from uninitialized parity. This would
> explain why the disk is so drastically garbled in places.
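
  For anyone reproducing the quoted setup, one detail worth noting: an
  external XFS log has to be named both at mkfs time and at every mount.
  Roughly like this (the mount point is just a placeholder):

      mkfs.xfs -l logdev=/dev/md1 /dev/md0
      mount -t xfs -o logdev=/dev/md1 /dev/md0 /mnt/scratch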

  We have not been able to run 2.4 or 2.2.20+ SMP kernels reliably on our
  uniprocessor machines, while 2.2.16 and 2.2.19 never showed a problem.
  Kernel developers claim the BIOS is at fault.  Anyway, I rebooted with
  "noapic" (which seems to help, though I'm not 100% convinced) and we ran
  some more tests that indicate both raid5 and XFS are working fine, even
  when a disk error is detected during the resync.
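
  In case anyone wants to try the same workaround: one way to make the
  flag stick across reboots is a lilo.conf entry along these lines (the
  kernel image, label, and root device are only examples):

      image=/boot/vmlinuz-2.4.18
          label=linux
          root=/dev/hda1
          append="noapic"

  Remember to re-run /sbin/lilo after editing the file.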
 
  Eventually one of my tests caused the troublesome sector to be
  re-written, which made the drive remap that sector and recover.
  Before that happened, I managed to run a test which wrote a pseudo-
  random data stream to both the raid1 and the raid5 during resyncing
  (and across the point where the raid dropped into degraded mode),
  and it all read back perfectly.
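
  A minimal version of that kind of write-and-verify test looks roughly
  like the following (device, sizes, and temp paths are placeholders,
  and it destroys whatever is on the device):

      # generate a pattern and keep a copy for comparison
      dd if=/dev/urandom of=/tmp/pattern bs=1M count=64
      # write the pattern to the array, then read it back
      dd if=/tmp/pattern of=/dev/md0 bs=1M
      dd if=/dev/md0 of=/tmp/readback bs=1M count=64
      # any difference means data corruption
      cmp /tmp/pattern /tmp/readback && echo "read back OK"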
 
  A repeat of the test with XFS involved showed no problems either.
  So I think there is just general flakiness in running an SMP kernel
  on this hardware, and XFS was simply the first thing to detect it.
  Thanks for the help.  I have the RAID running now, and will post
  some statistics I collected on XFS performance next...

Be seeing you,

- Sidik

