
Re: xfs_force_shutdown

To: linux-xfs@xxxxxxxxxxx
Subject: Re: xfs_force_shutdown
From: Klaus Rein <k.rein@xxxxxxxxx>
Date: Tue, 26 Jun 2001 19:53:51 +0200
Organization: levigo systems gmbh
References: <200106221820.f5MIKEi01932@xxxxxxxxxxxxxxxxxxxx>
Sender: owner-linux-xfs@xxxxxxxxxxx
> Forced shutdown is used when an I/O error is reported by the
> underlying layers. If everything else is working correctly it
> usually means there is a hardware problem.

I had already assumed that, but wanted to know for sure. The 
hardware is ok.

> On Linux there is a chance you have been bitten by a compiler which
> does not like the xfs code.

# gcc -v
Reading specs from /usr/lib/gcc-lib/i386-redhat-linux/2.96/specs
gcc version 2.96 20000731 (Red Hat Linux 7.1 2.96-81)

We never had similar problems without the software RAID, at least not
with CVS snapshots after the 1.0 release. But at that point we only
had one server machine, no shared SCSI and no heartbeat (see below).

> I suppose with Raid 1 you may have
> stumbled across a case where the raid reconstruction and xfs got in each
> other's way. Is there any way you can tell if reconstruction was in
> progress?

It was.
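
(For reference, reconstruction is easy to spot in /proc/mdstat: while
md is resyncing there is a progress/ETA line for the array, e.g.

# cat /proc/mdstat

while a clean raid1 just lists its members with [UU].)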

The setup in a little more detail:

There are two SCSI disks in an external box, connected via SCSI
to two server machines, with heartbeat running on both of them.
The filesystem is XFS on RAID1 (md-tools), exported via NFS.
The heartbeat script starts and stops NFS and the RAID, and
loads/unloads the corresponding kernel modules. Two clients
mount the exported filesystem and try to stress it. The system
disks of the servers are IDE.
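
Roughly, the start/stop part of that script does something like the
sketch below; the device name (/dev/md0), mount point (/export) and
the init script path are placeholders, not copied from the real config:

#!/bin/sh
# sketch of the heartbeat resource script (placeholder names)
case "$1" in
  start)
        modprobe raid1
        modprobe xfs
        raidstart /dev/md0              # raidtools / md-tools
        mount -t xfs /dev/md0 /export
        /etc/rc.d/init.d/nfs start
        ;;
  stop)
        /etc/rc.d/init.d/nfs stop
        umount /export
        raidstop /dev/md0
        rmmod xfs
        rmmod raid1
        ;;
esac

Heartbeat calls it with 'start' on takeover and with 'stop' when it
releases the resources again.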

The power of the primary server (as defined in heartbeat) is
controlled by a timeswitch (2h on, 2h off).

What I do not understand (and I'm not sure whether it is an XFS or
an md/VFS/etc. problem):

- the primary server gets switched off
- when the secondary server takes over the disks, the RAID seems to
  be ok and never gets resynced (a full resync takes about 30 minutes)
- after the primary server is back up again two hours later,
  the RAID always gets resynced [1]

I would have expected it the other way round.

Klaus.

[1] At this point the failure described in my previous posting
occurred, after the 'test cycle' had run for a few days
without any problem.
