On Tue, 6 Jan 2004, Rainer Krienke wrote:
> I think originally, after I had started the machines (from power off), the
> filesystem on server1 (the one that could later be repaired) was mounted but
> not accessible. When I tried to list the directory, ls reported an I/O error.
Maybe the filesystem had encountered an error and shut down at this
point. Anything in the logs?
> So
> I unexported it, unmounted it, and then tried to run xfs_repair, which reported
> that there was still a log on the filesystem that should be replayed by
> mounting the filesystem again or by using xfs_repair -L. So I tried to mount it
> again, and now mount said that there were either too many mounted filesystems,
> a wrong filesystem type, or an invalid superblock.
At that point you need to look for the specific failure message from
the kernel, either dmesg or /var/log/messages, to know what really
happened.
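As a quick sketch of what to look for (the log path is the common default on
most distros, and grep exits non-zero when nothing matches, hence the
`|| true`):

```shell
# Pull XFS-related lines out of the kernel ring buffer and syslog:
dmesg | grep -i xfs || true
grep -i xfs /var/log/messages 2>/dev/null | tail -20 || true
```

A forced-shutdown message here would confirm that the filesystem shut itself
down before the I/O errors appeared.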
> > Sounds like the filesystem shut down due to some error, can you check
> > your logs? In fact checking your logs in general might be useful
> > here, I wonder if there is anything else going on.
>
> On the first machine (server1) I found a sequence of messages like the log
> attached to this mail. But these messages were generated on startup after the
> power failure, not before. Before the power failure there is nothing
> xfs-related in the logs.
Ok, I'll take a look...
> > Can you convince it to dump a corefile? We could then have a better
> > idea of what's going wrong.
>
> Yes, I ran xfs_repair once again (perhaps for the 5th time) on the corrupted
> filesystem on server2 and lifted the ulimit for core dumps. It produced one.
> You can download it under
>
> http://www.uni-koblenz.de/~krienke/core-xfs_repair.gz
Can you put your xfs_repair binary there as well?
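For anyone following along, the usual recipe for getting such a core file
looks roughly like this (device and binary paths are examples; only re-run the
repair where that is safe):

```shell
# Lift the core-file size limit in this shell, then re-run the repair
# so a crash leaves a core file we can inspect:
ulimit -c unlimited
ulimit -c                       # should now print "unlimited"

if command -v xfs_repair >/dev/null 2>&1; then
    xfs_repair /dev/sdb1 || true        # expected to die on the bad fs
    # Pull a backtrace out of the resulting core:
    gdb /sbin/xfs_repair core -ex bt -ex quit
fi
```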
> Thanks for these explanations. Perhaps the buffers in the hardware RAIDs that
> are used as the basis for data storage are to blame (IFT 7250 RAID (level 5)
> with 12 160GB IDE disks inside). I'll try to find out if this cache is
> read-only or read/write.
Ah, and if the hardware raid does write caching, and it's not battery
backed, then you -could- have consistency problems. If the raid
claims that something is written when in fact it is only cached,
then there is a chance that the fs+log is inconsistent after
a power failure.
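To make that window concrete, here is a rough sketch (device name is an
example, and an external RAID box like the IFT 7250 typically will not pass
`hdparm` through to its member disks): `hdparm -W` queries the write cache on
plain IDE drives, and `sync` marks the point at which the OS, but not
necessarily the hardware, considers the data stable.

```shell
# Ask an IDE drive whether its write cache is enabled (usually fails
# through an external RAID controller, hence the "|| true"):
hdparm -W /dev/hda 2>/dev/null || true

# The window of risk: once sync returns, the OS believes the record
# is on stable storage, but a non-battery-backed controller cache
# could still lose it on power failure.
echo "journal record" > /tmp/record.$$
sync
```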
-Eric