xfs
[Top] [All Lists]

Re: Have the velociraptors in a test system now, checkout the errors.

To: Justin Piszcz <jpiszcz@xxxxxxxxxxxxxxx>
Subject: Re: Have the velociraptors in a test system now, checkout the errors.
From: Bill Davidsen <davidsen@xxxxxxx>
Date: Wed, 10 Dec 2008 19:11:00 -0500
Cc: linux-kernel@xxxxxxxxxxxxxxx, linux-raid@xxxxxxxxxxxxxxx, xfs@xxxxxxxxxxx, smartmontools-support@xxxxxxxxxxxxxxxxxxxxx
In-reply-to: <alpine.DEB.1.10.0812060426280.3494@xxxxxxxxxxxxxxxx>
Organization: TMR Associates Inc, Schenectady NY
References: <alpine.DEB.1.10.0812060426280.3494@xxxxxxxxxxxxxxxx>
User-agent: Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.8.1.18) Gecko/20081112 Fedora/1.1.13-1.fc9 SeaMonkey/1.1.13
Justin Piszcz wrote:
Point of thread: Two problems, mentioned in detail below, NCQ in Linux when used in a RAID configuration and two, something with how Linux interacts with the drives causes lots of problems as when I run the WD tools on the disks, they do not show any errors.

If anyone has/would like me to run any debugging/patches/etc on this system feel free to suggest/send me things to try out. After I put the VR's in a test system, I left NCQ enabled and I made a 10 disk raid5 to see how fast I could get it to fail, I ran bonnie++ shown below as a disk benchmark/stress test:

For the next test I will repeat this one but with NCQ disabled, having NCQ enabled makes it fail very easily. Then I want to re-run the test with RAID6.

bonnie++ -d /r1/test -s 1000G -m p63 -n 16:100000:16:64

$ df -h
/dev/md3              2.5T  5.5M  2.5T   1% /r1

And the results? Two disk "failures" according to md/Linux within a few hours as shown below:

Note, the NCQ-related errors are what I talk about all of the time, if you use
NCQ and Linux in a RAID environment with WD drives, well-- good luck.

Two-disks failed out of the RAID5 and I currentlty cannot even 'see' one of the drives with smartctl, will reboot the host and check sde again.

After a reboot, it comes up and has no errors, really makes one wonder where/what the bugs is/are, there are two I can see:
1. NCQ issue on at least WD drives in Linux in SW md/RAID
2. Velociraptor/other disks reporting all kinds of sector errors etc, but when you use the WD 11.x disk tools program and run all of their tests it says the disks have no problems whatsoever! The smart statistics do confirm this. Currently, TLER is on for all disks, for the duration of these tests.

Just a few comments on this, I have several RAID arrays built on Seagate using NCQ, and yet to have a problem. I have NCQ on with my WD drives, non-RAID, and haven't had an issue with them either. The WDs run a lot cooler than the SG, but they are probably getting less use, as well. If the WD are still on sale after the holiday I may grab a few more and run RAID, by then I will have some small sense of trusting them.

--
Bill Davidsen <davidsen@xxxxxxx>
 "Woe unto the statesman who makes war without a reason that will still
be valid when the war is over..." Otto von Bismark

<Prev in Thread] Current Thread [Next in Thread>