Well...my hardware is as follows...
Dual Intel Pentium III 500MHz processors
512MB PC133 (2x 256MB DIMMs)
Gigabyte GA-6BX7 motherboard
HighPoint RocketRAID 404 IDE controller card
2x 250GB Maxtor drives
2x 120GB Maxtor drives
(each drive on one of the four channels of the HighPoint card)
12GB drive for Linux, swap, etc.
The problems were occurring when a file was being downloaded from the
machine at roughly 75KB/s, not exactly intense use.
I do have an old 2.4.21 kernel which never gave me any problems. I will
replace my current kernel with it tomorrow to see if the errors go away.
If that kernel does fix the issue, then a hardware fault seems unlikely,
doesn't it?
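If the old image is still in /boot, adding a LILO entry for it should be
enough (the paths, label and root device below are just examples for my
setup):

    # extra stanza for /etc/lilo.conf; rerun /sbin/lilo afterwards
    image=/boot/vmlinuz-2.4.21
        label=linux-2.4.21
        root=/dev/hda1
        read-only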
Failing that, if the kernel doesn't fix it, I'll go through the nasty
process of scanning four large drives with a low-level scan using the
Maxtor utility.
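Before resorting to that, I might try a non-destructive read-write pass
with badblocks, which exercises the drives harder than the read-only test
did (device names here are just examples):

    # -n = non-destructive read-write test, -s = show progress, -v = verbose
    badblocks -nsv /dev/hde
    badblocks -nsv /dev/hdg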
Cheers,
Tom
-----Original Message-----
From: Seth Mos [mailto:seth.mos@xxxxxxxxx]
Sent: 30 June 2004 14:42
To: Tom
Subject: Re: LVM and XFS problems - xfs_force_shutdown
Tom wrote:
> Hi Seth,
>
> Well, I did a badblocks test on all of the drives twice now, and nothing
> showed up, so I'm assuming my drives are OK.
That's not always the case, unfortunately. I've actually got one of 'those'
disks in my drawer, which I use for testing software RAID...
From your story it sounds like you have SCSI disks, which are a bit more
reliable.
What disk controller are you using?
I know that some Adaptec RAID controllers (e.g. the Dell PERC3/Di) can get
their panties in a knot under more than light I/O, which results in the
RAID controller throwing a disk I/O error. That should not happen, but it
is very common; see linux-poweredge@xxxxxxxxx
Sometimes the RAID controller takes longer to recover from a disk error
than the Linux I/O layer is willing to wait, which results in an I/O error.
Some disks remap blocks on the fly without a problem; however, the data
that was originally supposed to be written out is lost. That is often not
noticed until the filesystem driver reads the (now corrupted) data back in.
See if you can get a utility for your disks to check for grown defects.
All disk suppliers have one.
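The drive's SMART data is another place to look for grown defects;
assuming smartmontools is installed, something like (device name is an
example):

    # dump all SMART info; watch the Reallocated_Sector_Ct attribute
    smartctl -a /dev/hde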
Disks now have relatively large caches, up to 8MB, which often run in
write-back mode; that can mean a hefty loss of data on power loss. The
cache can also play a part in bad-block remapping.
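You can query and, if you prefer safety over speed, disable the write
cache with hdparm (device name is an example):

    hdparm -W /dev/hde       # show the current write-caching setting
    hdparm -W0 /dev/hde      # turn write-back caching off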
Conspiracy theory: journaling filesystems wear out the disk faster at the
spot where the journal lives.
> I have run xfs_check and xfs_repair and nothing shows up; they just check
> everything and don't report any changes made, so my assumption is that
> none have been made.
Often the filesystem itself is not damaged, because XFS shuts itself down
early when it trips on errors.
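If you want to be certain, a no-modify check is harmless (the mount point
and device path below are examples):

    # unmount the filesystem first, then check without writing anything
    umount /data
    xfs_repair -n /dev/vg0/lvdata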
> I doubt that my hardware is at fault, given the number of similar cases I
> have seen on Google about this problem, some even matching my report.
Yeah, well, I had a box failing often in the 2.4.0-test days, which in the
end was traced back to a faulty DIMM. Ext2 filesystems gave no problems.
I also fixed someone's problem on the list: he had a K6-2 processor which
kept crashing with Pentium-compiled kernels; using i486-compiled kernels
fixed it.
The history of LVM with respect to XFS is not exactly clean, and I cannot
comment on the use of LVM since I have not used it personally.
Try compiling an i386 kernel, possibly with an earlier compiler, to check
things out.
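With a 2.4 tree that would go roughly like this (assuming the standard 2.4
build steps):

    cd /usr/src/linux
    make menuconfig                 # Processor type and features -> 386
    make dep bzImage modules
    make modules_install install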
> I would happily format the drives and start again, but I have over 700GB
> of data there that is not backed up. It's not the end of the world if I
> have to lose it all, but if there is any other way to try and find the
> cause of the problem and a solution, I'll do it.
Please give me some hardware details.
Disk controller(s)
Disk make
Mainboard etc.
Highmem?
Kind regards,
Seth