Re: xfs_force_shutdown called on hardware RAID5+0 XFS filesystem

To: slaton <slaton@xxxxxxxxxxxxxxxx>
Subject: Re: xfs_force_shutdown called on hardware RAID5+0 XFS filesystem
From: Eric Sandeen <sandeen@xxxxxxx>
Date: Thu, 16 Sep 2004 11:39:53 -0500
Cc: linux-xfs@xxxxxxxxxxx
In-reply-to: <Pine.SOL.4.61.0409151613490.12964@conquest.OCF.Berkeley.EDU>
References: <Pine.SOL.4.61.0409151613490.12964@conquest.OCF.Berkeley.EDU>
Sender: linux-xfs-bounce@xxxxxxxxxxx
User-agent: Mozilla Thunderbird 0.7.3 (X11/20040803)

This is a SCSI hardware problem, not an XFS bug. XFS shuts down when the underlying hardware goes screwy, to avoid (further) corruption. XFS is working exactly as intended in this case.

-Eric

slaton wrote:
We noticed that NFS mounts from the fileserver had gone stale this morning. These correspond to two hardware RAID LUNs (info below). I logged into the fileserver and found that the mountpoints were dead as well, even though according to mount they were still there. Checked the kernel log and found a whole slew of SCSI errors had started shortly after 4am (hmm, cron-time) and then continued when a user showed up to work, culminating in an xfs_force_shutdown of the filesystem at 9am. Which of course triggered a whole slew of further I/O errors.

After rebooting (with NFS shares disabled), the two RAID volumes mounted as clean. xfs_check found no errors and exited silently. The data appears to be there, although I haven't run anything to generate much file I/O, and haven't yet re-opened the NFS shares.

Should I upgrade to a new kernel and XFS release before investigating this further? System info and some kernel log excerpts are below; the full kernel log (events related to this) can be downloaded from http://cryoem.berkeley.edu/~slaton/kernel.040915.scsicrash.gz

thanks,
slaton

system info:

hardware: dual 32-bit Xeon system
OS: Red Hat Linux 8.0
kernel: custom 2.4.19 kernel compiled with SGI XFS 1.2pre5
kernel args: max_scsi_luns=255
host adapter: Adaptec 29160
RAID volume: 3.7 TB hardware RAID5+0 box, SATA drives, SCSI system interface, divided into two LUNs of 2.0 and 1.7 TB.


kernel log excerpts:

scsi1:0:3:0: Attempting to queue an ABORT message
scsi1: Dumping Card State while idle, at SEQADDR 0x8
DevQ(0:3:0): 0 waiting
DevQ(0:3:1): 0 waiting
scsi1:A:3: parity error detected in DT Data-in phase. SEQADDR(0x1a2) SCSIRATE(0x0)
^IUnexpected non-DT Data Phase
scsi1:0:3:0: Attempting to queue an ABORT message
scsi1: Dumping Card State in Message-in phase, at SEQADDR 0x168
scsi1:0:3:0: Cmd aborted from QINFIFO
aic7xxx_abort returns 0x2002
scsi: device set offline - not ready or command retry failed after bus reset: host 1 channel 0 id 3 lun 0
SCSI disk error : host 1 channel 0 id 3 lun 0 return code = 70002
I/O error: dev 08:11, sector 671088736
I/O error in filesystem ("sd(8,17)") meta-data dev 0x811 block 0x28000060^I ("xfs_trans_read_buf") error 5 buf count 4096
EFSCORRUPTED returned from file xfs_ialloc.c line 1313
last message repeated 29 times
xfs_btree_check_sblock: Not OK:
magic 0x3a0eb8a5 level 47532 numrecs 50791 leftsib -1188756534 rightsib -1171161293
nfsd: non-standard errno: -990
xfs_force_shutdown(sd(8,17),0x2) called from line 957 of file xfs_log.c.
Return address = 0xf8bc4b2f
Log I/O Error Detected. Shutting down filesystem: sd(8,17)
Please umount the filesystem, and rectify the problem(s)
SCSI disk error : host 1 channel 0 id 3 lun 1 return code = 70002
I/O error: dev 08:21, sector 64
SCSI disk error : host 1 channel 0 id 3 lun 1 return code = 70002
I/O error: dev 08:21, sector 72
I/O error in filesystem ("sd(8,33)") meta-data dev 0x821 block 0x40^I ("xfs_trans_read_buf") error 5 buf count 8192
XFS unmount got error 5
linvfs_put_super: vfsp/0xc28df640 left dangling!
VFS: Busy inodes after unmount. Self-destruct in 5 seconds. Have a nice day...
SCSI disk error : host 1 channel 0 id 3 lun 1 return code = 70002
I/O error: dev 08:21, sector 0
XFS: bad magic number
XFS: SB validate failed
SCSI disk error : host 1 channel 0 id 3 lun 1 return code = 70002
I/O error: dev 08:21, sector 0
SCSI disk error : host 1 channel 0 id 3 lun 1 return code = 70002
I/O error: dev 08:21, sector 1
SCSI disk error : host 1 channel 0 id 3 lun 1 return code = 70002
I/O error: dev 08:21, sector 2
SCSI disk error : host 1 channel 0 id 3 lun 1 return code = 70002
I/O error: dev 08:21, sector 3
SCSI disk error : host 1 channel 0 id 3 lun 1 return code = 70002
I/O error: dev 08:21, sector 4
SCSI disk error : host 1 channel 0 id 3 lun 1 return code = 70002
I/O error: dev 08:21, sector 5
SCSI disk error : host 1 channel 0 id 3 lun 1 return code = 70002
I/O error: dev 08:21, sector 6
SCSI disk error : host 1 channel 0 id 3 lun 1 return code = 70002
I/O error: dev 08:21, sector 7
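
The numeric fields in the excerpts above can be cross-checked with a few lines of Python. This is only a reading aid, not part of the original exchange. The helper names (decode_dev, sd_name, split_scsi_result) are made up for illustration. The sketch assumes the "return code" is printed in hexadecimal and follows the usual Linux SCSI result layout (driver, host, message, and status bytes from high to low), and that major 8 is the SCSI disk driver with 16 minors per disk. The sdb1/sdc1 names are inferred from those assumptions, not taken from the log.

```python
import errno

def decode_dev(dev):
    """Split a 'MAJ:MIN' hex pair such as '08:11' into decimal (major, minor)."""
    major, minor = dev.split(":")
    return int(major, 16), int(minor, 16)

def sd_name(major, minor):
    """Name a disk or partition, assuming SCSI disk major 8 with 16 minors per disk."""
    if major != 8:
        raise ValueError("this sketch only handles SCSI disk major 8")
    disk, part = divmod(minor, 16)
    name = "sd" + chr(ord("a") + disk)
    return name + (str(part) if part else "")

def split_scsi_result(code):
    """Split a 32-bit SCSI result word into its four bytes (assumed layout)."""
    return {
        "driver": (code >> 24) & 0xFF,
        "host": (code >> 16) & 0xFF,
        "msg": (code >> 8) & 0xFF,
        "status": code & 0xFF,
    }

# "dev 08:11" and "sd(8,17)" name the same device; likewise 08:21 and sd(8,33).
print(decode_dev("08:11"), sd_name(*decode_dev("08:11")))
print(decode_dev("08:21"), sd_name(*decode_dev("08:21")))

# "block 0x28000060" is the same number as "sector 671088736".
print(0x28000060 == 671088736)

# "return code = 70002", read as hex: host byte 0x07, status byte 0x02.
print(split_scsi_result(0x70002))

# "error 5" in the XFS messages is the errno value for EIO.
print(errno.errorcode[5])
```

If those assumptions hold, host byte 0x07 corresponds to DID_ERROR and status byte 0x02 to CHECK CONDITION in the Linux SCSI headers, so the failures are being reported by the SCSI layer underneath the filesystem, which fits the diagnosis at the top of this message.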



