Quoting Vincent Bernat <bernat@xxxxxxx>:
> I am currently using xfs on SCSI disks and I have some problem with the SCSI
> card : my box crashed several time under heavy condition. I am investigating
> the problem and I have a proper backup of all my data.
Just a thought:
I'm using a 3ware Escalade 6400 and not just once but at least twice and maybe
thrice (my brain logs are screwed up) my system running Linux 2.4.7 with XFS
froze. Not even the Magic SysRq key combinations would unfreeze the darned
thing. The only solution was a cold reboot.
Every time this happened, a single faulty drive among the four in my RAID5
array had hit a bad sector, causing the 3ware card to mark that drive as
"offline". For some reason this has been messing things up.
Unfortunately this is a server and, us being an SME in a 3rd world country, it
is the only machine we have with a controller as "advanced" as 3ware's (we got
it because we couldn't afford a SCSI RAID system but needed RAID). Thus I
cannot test how the system would fare if it were using ext2, ext3, ReiserFS or
JFS, and cannot conclude whether it's XFS-specific, or if it's an issue with
the controller.
Dan Yocum has been doing some work, though, basically doing really aggressive
tests from what I gather. It looks like he's noticing issues with NFS, in his
case the user-space variant. I don't use user-space NFS, but have v3 kernel
NFS support built into my kernel.
The last time my system hung up I had the nfs daemons off (including portmap).
I did this hoping I could isolate the issue. Unfortunately I cannot permanently
remove NFS because I am under pressure to get file sharing for a number of
Linux boxes and I do this via NFS. I do not know if there is any other decent
alternative to share /home complete with proper permissions. Samba's great, but
it's just not designed to do such things. I don't know how Coda is, or AFS, or
what have you. Maybe someone else on the list more authoritative can let us
know.
In the meantime I am hoping my hard drives will hold up. I'm also upgrading to
2.4.9 (thanks Steve for that wonderful TAKE) as soon as I can take the server
offline for a short while. Hopefully while keeping up with updates I'll be able
to narrow down the issue to find out which darned part of the system is causing
the fatal freezes.
--> Jijo
--
Federico Sevilla III :: jijo@xxxxxxxxxxxxxxxxxxxx
Network Administrator :: The Leather Collection, Inc.
GnuPG key: <http://www.leathercollection.ph/jijo.gpg>