
Write Verify and XFS

To: linux-xfs@xxxxxxxxxxx
Subject: Write Verify and XFS
From: "Buzbee, James" <James.Buzbee@xxxxxxxxxxxx>
Date: Thu, 13 Feb 2003 09:03:57 -0700
Sender: linux-xfs-bounce@xxxxxxxxxxx
User-agent: Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.0.0) Gecko/20020529

Sorry for the long post, but I'm wrestling with several issues related to manufacturing flaws that are (evidently) commonly found with IDE drives.


What we are finding is that there are bad sectors on new drives that are not recognized until we try to read previously written data. We've been told by the drive manufacturer that this is "normal". You write data to the drive, and the write succeeds. Sometime later you go to read your data and you get an error such as:

hdc: dma_intr: status=0x51 { DriveReady SeekComplete Error }
hdc: dma_intr: error=0x40 { UncorrectableError }, LBAsect=24093, sector=23984
end_request: I/O error, dev 16:01 (hdc), sector 23984
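
For anyone collecting these errors across many drives, the failing sector can be pulled out of the kernel messages with a short script. This is only a minimal sketch: it assumes the exact `end_request` message format shown above, and the names `END_REQUEST_RE`, `find_failed_sectors`, and `SAMPLE` are hypothetical.

```python
import re

# Matches lines like:
#   end_request: I/O error, dev 16:01 (hdc), sector 23984
# Assumption: the message format is exactly as in the log excerpt above.
END_REQUEST_RE = re.compile(
    r"end_request: I/O error, dev (?P<dev>[0-9a-fA-F]+:[0-9a-fA-F]+) "
    r"\((?P<name>\w+)\), sector (?P<sector>\d+)"
)


def find_failed_sectors(log_text):
    """Return (device name, sector) pairs for every end_request I/O error."""
    return [
        (m.group("name"), int(m.group("sector")))
        for m in END_REQUEST_RE.finditer(log_text)
    ]


SAMPLE = """\
hdc: dma_intr: status=0x51 { DriveReady SeekComplete Error }
hdc: dma_intr: error=0x40 { UncorrectableError }, LBAsect=24093, sector=23984
end_request: I/O error, dev 16:01 (hdc), sector 23984
"""

if __name__ == "__main__":
    print(find_failed_sectors(SAMPLE))  # prints [('hdc', 23984)]
```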


Obviously by this time your data is lost. We're told that if we write to that bad sector again, the sector will be remapped away.

First question, is this -really- normal? I have a hard time believing that it is OK for a brand new drive to silently fail on some writes. Can this error be detected at write time, and if so, what would Linux/XFS do with the error?

Researching a bit, I find that some drives are shipped from the manufacturer with "Write Verify" turned on for the first "N" power cycles. Based on my limited understanding of Write Verify, this implies to me that the manufacturer wants the consumer to use the drive for a while, so that the "bad" sectors will turn up and get remapped. Obviously, this will have an impact on drive performance during this period. Is this because the drive manufacturers don't want to spend the time/money to verify the platter themselves?

Second question, when Write Verify is turned on and a bad sector is detected, will the drive firmware transparently take care of remapping and re-writing the data or does an error get passed up the chain for higher level software to take care of? If the error gets passed up the chain, what will Linux/XFS do with the error?

Last question, if this -really- is normal, and it is something that has to be dealt with, what is the proper solution? I know that leaving Write Verify turned on is not the solution. It cuts drive performance in half. One proposed solution (for us at least) is to turn Write Verify on for "critical" data and off for "non-critical" data. This seems like a hack. <RANT> If I write the data to the drive, and the drive says "OK" shouldn't the data be there? </RANT>
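
To make the "verify only critical data" idea concrete, here is a rough application-level sketch: write the data, force it out with `fsync`, then read it back and compare digests. It is not the drive's Write Verify feature and not anything XFS does; the function name `write_and_verify` is hypothetical. It also rests on an important assumption: `posix_fadvise` is only a hint to drop cached pages, so the read-back may still be served from the page cache or the drive's own cache rather than from the platter.

```python
import hashlib
import os


def write_and_verify(path, data):
    """Write data, flush it to the device, then re-read and compare digests.

    Returns True if the data read back matches what was written, and False
    on a mismatch or if the read-back fails with an I/O error.
    """
    expected = hashlib.sha256(data).digest()

    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o600)
    try:
        view = memoryview(data)
        while view:
            written = os.write(fd, view)  # may write fewer bytes than asked
            view = view[written:]
        os.fsync(fd)  # ask the kernel to push the data to the device
        # Advisory only: hint that cached pages for this file can be dropped
        # so the read below is more likely to go to the device.
        if hasattr(os, "posix_fadvise"):
            os.posix_fadvise(fd, 0, 0, os.POSIX_FADV_DONTNEED)
    finally:
        os.close(fd)

    try:
        with open(path, "rb") as f:
            actual = hashlib.sha256(f.read()).digest()
    except OSError:
        # An unreadable sector surfaces here as an I/O error.
        return False
    return actual == expected
```

A caller could use this for the "critical" files only and rewrite any file for which it returns False. The cost is reading everything back, which is roughly the same performance penalty as Write Verify, just paid selectively.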


Thanks for any and all feedback!

Jim




