On Wed, May 22, 2002 at 01:06:06PM +0200, Wessel Dankers wrote:
> On 2002-05-20 10:28:54-1000, Sidik Isani wrote:
> > 1. Partitioned 6 13GB IDE disks as ~12.9x6 raid5 plus a small raid1
> > on the first two disks for the log.
> > 2. Rebooted to be sure partition tables were seen correctly.
> > 3. mkraid /dev/md0 (the raid5 ... resyncing began)
> > 4. mkraid /dev/md1 (the raid1 ... resyncing delayed because on same disk)
> > 5. mkfs.xfs -o logdev=/dev/md1 /dev/md0
> > 6. mounted it and untarred a kernel tree (notably faster than 2.4.16!)
> > 7. resyncing continues until about half way through it finds a bad sector
> > on *one* of the disks and switches to degraded mode. No IDE errors
> > after that one.
> > 8. Rebooted (wanted to see if it came back still in degraded mode. It
> > did.)
> > 9. mounted again and tried to rm -rf the linux kernel tree... crash.
>
> If the disk error occurred WHILE syncing, the raid5 drivers may not have
> had time yet to write the parity information, and when in degraded mode may
> have tried to restore data from uninitialized parity data. This would explain
> why the disk is so drastically garbled in places.

We have not been able to run 2.4 or 2.2.20+ SMP kernels reliably on our
uniprocessor machines, while 2.2.16 and 2.2.19 never show a problem.
Kernel developers claim the BIOS is at fault. Anyway, I rebooted with
"noapic" (which seems to help but I'm not 100% convinced) and we ran
some more tests that indicate both raid5 and XFS are working fine, even
when a disk error is detected during the resync.

Eventually one of my tests caused the troublesome sector to be
re-written, which made the drive remap that sector and recover.

Before that happened, I managed to run a test which wrote a
pseudo-random number sequence to both the raid1 and raid5 during resyncing
(and across the time the raid decides to go into degraded mode) and
it all read back perfectly.
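
For anyone who wants to repeat that kind of check, here is a minimal
sketch of a write-then-verify test. It is not the program I actually
ran; the function names, the seed and the chunk size are all made up
for illustration. The idea is only that a seeded generator can
reproduce the same byte stream on the read pass, so nothing has to be
kept in memory to compare against.

```python
import os
import random

CHUNK = 64 * 1024  # bytes per I/O; an arbitrary choice for this sketch


def prng_chunks(seed, total_bytes, chunk=CHUNK):
    """Yield a reproducible pseudo-random byte stream of total_bytes."""
    rng = random.Random(seed)
    remaining = total_bytes
    while remaining > 0:
        n = min(chunk, remaining)
        # getrandbits(8 * n) gives an integer that always fits in n bytes
        yield rng.getrandbits(8 * n).to_bytes(n, "little")
        remaining -= n


def write_pattern(path, seed, total_bytes):
    """Write the seeded stream to path (a file, or a device such as /dev/md0)."""
    # Open existing targets in place; create the file only if it is missing.
    mode = "r+b" if os.path.exists(path) else "wb"
    with open(path, mode) as f:
        for block in prng_chunks(seed, total_bytes):
            f.write(block)
        f.flush()
        os.fsync(f.fileno())  # push the data out of the page cache


def verify_pattern(path, seed, total_bytes):
    """Return the offset of the first mismatching chunk, or None if all match."""
    offset = 0
    with open(path, "rb") as f:
        for expected in prng_chunks(seed, total_bytes):
            actual = f.read(len(expected))
            if actual != expected:
                return offset
            offset += len(expected)
    return None
```

Call write_pattern() while the resync is running and verify_pattern()
afterwards with the same seed and length. Be aware that a read-back done
right after the write may be served from the page cache instead of the
disks, so it is probably worth writing more data than fits in RAM, or
remounting or rebooting in between.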
A repeat of the test with XFS on top of the RAID showed no problems either.

So I think running an SMP kernel on these machines is just generally
flaky, and XFS happened to be the first to detect it. Thanks for the
help. I have the RAID running now, and will post some statistics I got
on XFS performance next...

Be seeing you,
- Sidik