Re: XFS behaviour on "bad" disks.
Quoting Vincent Bernat <bernat@free.fr>:
> I am currently using xfs on SCSI disks and I have some problem with the SCSI
> card : my box crashed several time under heavy condition. I am investigating
> the problem and I have a proper backup of all my data.
Just a thought:
I'm using a 3ware Escalade 6400, and at least twice, maybe three times (my brain logs are screwed up), my system running Linux 2.4.7 with XFS froze. Not even the Magic SysRq key combinations would unfreeze the darned thing. The only solution was a cold reboot.
Every time this happened, a single faulty drive among the four in my RAID5 array had hit a bad sector, causing the 3ware card to mark that drive as "offline". For some reason this has been messing things up.
Unfortunately this is a server, and since we are an SME in a third-world country, it is the only machine we have with a controller as "advanced" as 3ware's (we got it because we couldn't afford a SCSI RAID system but needed RAID). Thus I cannot test how the system would fare with ext2, ext3, ReiserFS or JFS, and cannot conclude whether the problem is XFS-specific or an issue with the controller.
Dan Yocum has been doing some work on this, though, running really aggressive tests from what I gather. It looks like he's noticing issues with NFS, in his case user-space NFS. I don't use user-space NFS, but I do have kernel NFS v3 support built into my kernel.
The last time my system hung I had the NFS daemons off (including portmap). I did this hoping I could isolate the issue. Unfortunately I cannot permanently remove NFS, because I am under pressure to provide file sharing for a number of Linux boxes and I do this via NFS. I do not know of any other decent way to share /home complete with proper permissions. Samba's great, but it's just not designed to do such things. I don't know how Coda is, or AFS, or what have you. Maybe someone more authoritative on the list can let us know.
In the meantime I am hoping my hard drives will hold up. I'm also upgrading to 2.4.9 (thanks Steve for that wonderful TAKE) as soon as I can take the server offline for a short while. Hopefully, by keeping up with updates, I'll be able to narrow down which darned part of the system is causing the fatal freezes.
--> Jijo
--
Federico Sevilla III :: jijo@leathercollection.ph
Network Administrator :: The Leather Collection, Inc.
GnuPG key: <http://www.leathercollection.ph/jijo.gpg>