Hi.
First thanks for your answers so far!
Yippie...
And I know that forcing the mkraid command is what finished it off!
But just for the future - because I still like RAIDs - this IBM
hard disc model "DTLA 307045" (or something like that)
is total crap: in half a year I have now had 5 broken hard discs from this
series!
So, just for the future:
1.)
Should I check /proc/mdstat regularly, so I always see whether all hard
discs are running or whether the RAID is in degraded mode (meaning that one
hard disc is no longer in use because of errors)???
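(Just so I know what to look for - the following is only my rough sketch;
the "[UUU]" reading and the mail alert are my own guess at how such a check
could look:)

  cat /proc/mdstat
    # a healthy 3-disc raid5 shows something like "[UUU]";
    # a degraded one shows "[UU_]" and/or "(F)" next to the failed disc

  # run from cron to get warned automatically (pattern untested, address
  # made up):
  grep -Eq '\[[U_]*_[U_]*\]' /proc/mdstat \
    && echo "md0 degraded" | mail -s "RAID warning" root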
2.)
What should I have done at the point when the second disc failed in order
to rescue my data???
Because the raidhotadd thing only works as long as just one hard disc has
failed!
3.)
I want to back up my RAID data even though it is a RAID-5 - but what is a
good, safe and fast method to back up 130 GB of data???
And - because it is my private RAID and I am a poor student - most likely
a cheap method, too...??!
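(Just to make the question concrete - purely hypothetical, the spare disc
/dev/hdc1 and the mount point are made up by me: would something as simple
as rsync to one big, cheap IDE disc be sane?)

  mount /dev/hdc1 /mnt/backup
  rsync -a --delete /mnt/raid/ /mnt/backup/
  umount /mnt/backup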
Greetings,
Knuth Posern.
---
On Tue, 16 Oct 2001, Seth Mos wrote:
> At 02:47 16-10-2001 +0200, Knuth Posern wrote:
> >Hi.
> >
> >I have (or had?!) a software RAID-5 with the following /etc/raidtab
>
> Short answer: You had.
>
> >I built the raid half a year ago. Formatted it with XFS (and a 2.4.5
> >kernel). At the moment I use 2.4.10-xfs.
> >
> >The machine is a debian-unstable linux-server.
> >
> >The following happened to me:
> >
> >While I was playing an mp3 file on a console I got the following kernel
> >message(s) dumped onto the console:
> >___________________________________________________________________________
> >hde: dma_intr: status=0x51 { DriveReady SeekComplete Error }
> >hde: dma_intr: error=0x40 { UncorrectableError }, LBAsect=56410433,
> >sector=56410368
> >end_request: I/O error, dev 21:01 (hde), sector 56410368
> >raid5: Disk failure on hde1, disabling device. Operation continuing on 2
> >devices
> >md: recovery thread got woken up ...
> >md0: no spare disk to reconstruct array! -- continuing in degraded mode
> >md: recovery thread finished ...
> >md: updating md0 RAID superblock on device
> >md: hdi1 [events: 000000de](write) hdi1's sb offset: 45034816
> >md: hdg1 [events: 000000de](write) hdg1's sb offset: 45034816
> >md: (skipping faulty hde1 )
> >XFS: device 0x900- XFS write error in file system meta-data block 0x40 in
> >md(9,0)
> >XFS: device 0x900- XFS write error in file system meta-data block 0x40 in
> >md(9,0)
> >XFS: device 0x900- XFS write error in file system meta-data block 0x40 in
> >md(9,0)
> >XFS: device 0x900- XFS write error in file system meta-data block 0x40 in
> >md(9,0)
> >XFS: device 0x900- XFS write error in file system meta-data block 0x40 in
> >md(9,0)
>
> This should not happen with an md RAID-5. This means corruption. Normally,
> when a disk fails in an md RAID 1/5 set, the OS is unaffected - XFS or not.
>
> This is the first alarming sign.
>
> >^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
> >I then switched to runlevel 0:
> >____________________________________________________________________________
> >Give root password for maintenance
> >(or type Control-D for normal startup):
> >jolie:~# umount /raid
>
> My intuition says you should have had errors in your log before
> unmounting.
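> Something along these lines should show them (only a sketch, assuming a
> standard Debian syslog setup):
>
>   dmesg | grep -iE 'hde|raid5|md0'
>   grep -iE 'dma_intr|raid5' /var/log/syslog | tail -20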
>
> >xfs_unmount: xfs_ibusy says error/16
> >XFS unmount got error 16
> >linvfs_put_super: vfsp/0xdf467520 left dangling!
> >VFS: Busy inodes after unmount. Self-destruct in 5 seconds. Have a nice
> >day...
> >jolie:~# mount
> >/dev/hda3 on / type ext2 (rw,errors=remount-ro,errors=remount-ro)
> >proc on /proc type proc (rw)
> >devpts on /dev/pts type devpts (rw,gid=5,mode=620)
> >/dev/md0 on /mnt/raid type xfs (rw)
> >jolie:~# lsof
> >...
> >^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
> >The lsof did NOT show any open files on /mnt/raid.
> >So I tried again to unmount /mnt/raid:
> >_____________________________________________________________________________
> >jolie:~#
> >jolie:~# umount /mnt/raid
> >umount: /mnt/raid: not mounted
> >^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
> >But now it was unmounted already?!
>
> It was unmounted - with errors, that is. That message you got about inodes
> left dangling was the result of an unclean unmount.
>
> >So I tried to mount it...
>
> Bad idea; I would have run xfs_repair first.
>
> >jolie:~#
> >jolie:~# mount /mnt/raid
> >XFS: SB read failed
> >I/O error in filesystem ("md(9,0)") meta-data dev 0x900 block 0x0
> > ("xfs_readsb") error 5 buf count 512
> >mount: wrong fs type, bad option, bad superblock on /dev/md0,
> > or too many mounted file systems
> >^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
> >I rebooted the computer - and got the following during bootup:
>
> <snip>
>
> >XFS: SB read failed
> >I/O error in filesystem ("md(9,0)") meta-data dev 0x900 block 0x0
> > ("xfs_readsb") error 5 buf count 512
> >mount: wrong fs type, bad option, bad superblock on /dev/md0,
> > or too many mounted file systems
> > (could this be the IDE device where you in fact use
> > ide-scsi so that sr0 or sda or so is needed?)
> >^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
>
> This means it at least needs repair. Sometimes xfs_repair can recover a
> secondary superblock.
>
> >I logged in and I edited the /etc/raidtab to have a SPARE-DISC:
>
> Don't do that.
>
> >I connected an identical harddrive (like the other raid-harddiscs) as
> >/dev/hdc.
>
> Course of action on a failed disk:
> Power off the box.
> Remove the failed disk from hde.
> Insert the new disk as hde.
> Power on the box.
> raidhotadd hde.
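> In commands that would be roughly (a sketch only - it assumes hde1 was the
> failed member and that the new disk gets the same single partition as the
> other members):
>
>   # after powering back on with the new disk in place:
>   fdisk /dev/hde                  # recreate hde1 (type fd, same size)
>   raidhotadd /dev/md0 /dev/hde1   # put it back into the array
>   cat /proc/mdstat                # watch the reconstruction progress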
>
> >And rebooted again - without any changes.
> >
> >I then read in the Software-RAID-Howto (from January 2000) to just remove
> >the faulty drive and instead connect a new drive.
>
> Correct.
>
> >So I connected the hard disc from /dev/hdc as /dev/hde (and edited the
> >/etc/raidtab to be as it was before (without spare-disks)!).
>
> Don't touch the raidtab file when something goes wrong. Unless you really
> know what you are doing, it will make things worse than they were.
>
> >And rebooted.
> >
> >md0 didn't start the array - because /dev/hde is 0K big (or something like
> >that).
> >That was because I had forgotten to build a partition on /dev/hde - so I
> >built the one partition (as on the other raid-drives too).
>
> That is not fatal, it happened to me once as well.
>
> >And rebooted again - but md0 gave a "Failed autostart of /dev/md0" again.
>
> That is normal. It does not rebuild fully automatically. You have to
> instruct it yourself.
>
> >And the Software-RAID-Howto told me to "raidhotadd /dev/md0 /dev/hde1".
>
> Correct.
>
> >Which I tried, but it said something like: "/dev/md0 - no such raid is
> >running".
>
> What did /proc/mdstat say?
>
> >So I tried to get /dev/md0 RUNNING again.
> >
> >In 6.1 of the Software-Raid-Howto there was something with "mkraid
> >/dev/md0 --force".
>
> DON'T DO THIS UNLESS YOU ARE BUILDING THE ARRAY!
>
> >So I tried:
>
> <snip>
>
> >And tried again to hotadd the /dev/hde1:
>
> You just remade the md0 array which means the disks will be syncing.
>
> >___________________________________________________________________________
> >jolie:~# raidhotadd /dev/md0 /dev/hde1
> >md: trying to hot-add hde1 to md0 ...
> >/dev/md0: can not hot-add disk: disk busy!
> >^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
> >Then I checked /proc/mdstat.
> >It said something about reconstructing - which hopefully sounds good...?!
>
> It means your data is gone.
>
> >But:
> >___________________________________________________________________________
> >jolie:~# mount /mnt/raid
> >XFS: bad magic number
> >XFS: SB validate failed
> >mount: wrong fs type, bad option, bad superblock on /dev/md0,
> > or too many mounted file systems
> >^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
> >So I just rebooted again - in the hope that the raid-autostart during boot
> >time would bring some new/other results. But a mount /mnt/raid still gives
> >the same results!!!???
>
> It just synced random parts of the other disks and constructed parity out
> of that.
>
> >What can I do? - Is my data lost? - if so: Is there ANY CHANCE to get at
> >least SOME of it BACK SOMEHOW (it doesn't matter how difficult)!?
>
> No. :-(
>
> >???
> >
> >Help would be VERY, VERY, VERY appreciated!!!
>
> I am very afraid that I cannot help you anymore.
>
> You can try xfs_repair and see if it turns up anything or repairs anything
> at all, but I don't have high hopes.
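> Roughly like this (assuming /dev/md0 is assembled and not mounted; -n is
> xfs_repair's no-modify check mode):
>
>   xfs_repair -n /dev/md0   # report only, change nothing
>   xfs_repair /dev/md0      # real repair attempt; it searches for a
>                            # secondary superblock if the primary is bad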
>
> Cheers
> --
> Seth
> Every program has two purposes: one for which
> it was written and another for which it wasn't.
> I use the last kind.
>