On Wed, Jan 13, 2010 at 07:11:27PM -0600, Steve Costaras wrote:
> Ok, I've been seeing a problem here since had to move over to XFS from
> JFS due to file system size issues. I am seeing XFS Data corruption
> under "heavy io". Basically, what happens is that under heavy load
> (i.e. if I'm doing say a xfs_fsr (which nearly always triggers the
> freeze issue) on a volume the system hovers around 90% utilization for
> the dm device for a while (sometimes an hour+, sometimes minutes) the
> subsystem goes into 100% utilization and then freezes solid forcing me
> to do a hard reboot of the box.
xfs_fsr can cause a *large* amount of IO to be done, so it is no
surprise that it can trigger high load bugs in hardware and
software. XFS can trigger high load problems on hardware more
readily than other filesystems because, using direct IO (as xfs_fsr
does), it can push far, far higher throughput to the storage
subsystem than any other linux filesystem can.
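As a rough illustration (not from the thread itself), the buffered vs direct IO difference can be seen with a plain dd comparison; file names and sizes here are arbitrary:

```shell
# Buffered write, flushed at the end so the reported time includes the
# flush to disk:
dd if=/dev/zero of=./dd-buffered.tmp bs=1M count=64 conv=fdatasync
# Direct IO write, bypassing the page cache entirely.  O_DIRECT can
# fail with EINVAL on filesystems that don't support it (e.g. tmpfs):
dd if=/dev/zero of=./dd-direct.tmp bs=1M count=64 oflag=direct \
    || echo "O_DIRECT not supported on this filesystem"
rm -f ./dd-buffered.tmp ./dd-direct.tmp
```

On a storage array with deep queues the direct IO run can sustain much higher rates than the cache-throttled buffered run, which is exactly the load profile xfs_fsr generates.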
The fact that the IO subsystem is freezing at 100% elevator queue
utilisation points to an IO never completing. This immediately makes
me point a finger at either the RAID hardware or the driver - a bug
in XFS is highly unlikely to cause this symptom as those stats are
generated at layers lower than XFS.
Next time you get a freeze, the output of:
# echo w > /proc/sysrq-trigger
will tell us what the system is waiting on (i.e. why it is stuck).
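For completeness, a sketch of capturing that dump for posting (needs root; paths are as on a stock Linux box):

```shell
# The trigger file exists on any Linux with /proc mounted:
ls -l /proc/sysrq-trigger
# As root, dump the stack of every blocked (uninterruptible) task
# into the kernel log:
#   echo w > /proc/sysrq-trigger
# then pull the traces back out of the log to attach to a report:
#   dmesg | tail -n 200
```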
> Since I'm using hardware raid w/ BBU when I reboot and it comes back up
> the raid controller writes out to the drives any outstanding data in
> its cache and from the hardware point of view (as well as lvm's point
> of view) the array is ok. The file system however generally can't be
> mounted (about 4 out of 5 times, some times it does get auto-mounted but
> when I then run an xfs_repair -n -v in those cases there are pages of
> errors (badly aligned inode rec, bad starting inode #'s, dubious inode
> btree block headers among others). When I let a repair actually run
> in one case out of 4,500,000 files it linked about 2,000,000 or so but
> there was no way to identify and verify file integrity. The others were
> just lost.
> This is not limited to large volume sizes; I have seen similar on small
> ~2TiB file systems as well. Also, in a couple of cases when it happened,
> while one file system was taking the I/O (say xfs_fsr -v /home), another
> XFS filesystem on the same system which was NOT taking much if any I/O
> got badly corrupted (say /var/test). Both would be using the same areca
> controllers and same physical discs (same PV's and same VG's but
> different LV's).
These symptoms really point to a problem outside XFS - the only time
I've seen this sort of behaviour is on buggy hardware. The
cross-volume corruption is the smoking gun, but proving it is damn
near impossible without expensive lab equipment and a lot of time.
> Any suggestions on how to isolate or eliminate this would be greatly
> appreciated.
I'd start by not running xfs_fsr as a short term workaround to keep
the load below the problem threshold.
Looking at the iostat output - the volumes sd[f-i] all lock up at
100% utilisation at the same time. Then looking at this:
> LVM is using as its base physical volumes 8 hardware raids
> (MediaVol00-70 inclusive):
> [ 175.320738] ARECA RAID ADAPTER4: FIRMWARE VERSION V1.47 2009-07-16
> [ 175.336238] scsi4 : Areca SAS Host Adapter RAID Controller( RAID6
> [ 175.356231] ARECA RAID ADAPTER5: FIRMWARE VERSION V1.47 2009-10-22
> [ 175.376144] scsi5 : Areca SAS Host Adapter RAID Controller( RAID6
You've got 4 luns on each controller, and it looks like all the luns
on one controller have locked up. Everything is pointing at the
raid controller as being the problem....
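As an aside, the stuck LUNs are easy to pick out of `iostat -x` output mechanically. A sketch, with canned sample data standing in for a live run (the device names and numbers below are made up):

```shell
# Print the name of every device whose %util (last column of
# `iostat -x`) is pegged at or near 100.  On a live box you would pipe
# `iostat -x 5` through the same awk filter instead of this sample.
sample='Device:  r/s   w/s   rkB/s  wkB/s  avgqu-sz  await  %util
sdf      0.00  0.00  0.00   0.00   128.00    0.00   100.00
sdi      0.00  0.00  0.00   0.00   96.00     0.00   99.80
sde      1.00  5.00  8.00   40.00  0.10      1.20   3.00'
echo "$sample" | awk '$NF+0 >= 99 { print $1 }'
# prints: sdf and sdi
```

The `$NF+0` coercion skips the header line, since "%util" converts to 0.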