Sorry for the long post, but I'm wrestling with several issues related
to manufacturing flaws that are (evidently) commonly found with IDE drives.
What we are finding is that there are bad sectors on new drives that are
not recognized until we try to read previously written data. We've been
told by the drive manufacturer that this is "normal". You write data to
the drive, the write succeeds. Sometime later you go to read your data
and you get an error such as:
hdc: dma_intr: status=0x51 { DriveReady SeekComplete Error }
hdc: dma_intr: error=0x40 { UncorrectableError }, LBAsect=24093, sector=23984
end_request: I/O error, dev 16:01 (hdc), sector 23984
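For anyone reading along, the status and error values in the log above can be decoded by hand. Below is a minimal sketch in Python, assuming the conventional ATA status and error register bit layout and using the labels that appear in the kernel message. The helper name `decode` and the two tables are my own for illustration, and the error table lists only the bits I am sure of.

```python
# Bit masks for the ATA status register, labelled the way the log prints them.
STATUS_BITS = {
    0x80: "Busy",
    0x40: "DriveReady",
    0x20: "DeviceFault",
    0x10: "SeekComplete",
    0x08: "DataRequest",
    0x04: "CorrectedError",
    0x02: "Index",
    0x01: "Error",
}

# Bit masks for the ATA error register (deliberately partial: bits 0x20 and
# 0x08 are left out of this sketch).
ERROR_BITS = {
    0x80: "BadSector",
    0x40: "UncorrectableError",
    0x10: "SectorIdNotFound",
    0x04: "DriveStatusError",
    0x02: "TrackZeroNotFound",
    0x01: "AddrMarkNotFound",
}


def decode(value, table):
    """Return the names of all bits set in value, highest bit first."""
    return [name for bit, name in sorted(table.items(), reverse=True)
            if value & bit]


print(decode(0x51, STATUS_BITS))  # status=0x51 from the log
print(decode(0x40, ERROR_BITS))   # error=0x40 from the log
```

So status=0x51 is simply 0x40 + 0x10 + 0x01, and error=0x40 is the single "uncorrectable data" bit, which matches what the driver printed.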
Obviously by this time your data is lost. We're told that if we write to
that bad sector again, the sector will be remapped away.
First question, is this -really- normal? I have a hard time believing
that it is OK for a brand new drive to silently fail on some writes. Can
this error be detected at write time, and if so, what would Linux/XFS do
with the error?
Researching a bit, I find that some drives are shipped from the
manufacturer with "Write Verify" turned on for the first "N" power
cycles. Based on my limited understanding of Write Verify this implies
to me that the manufacturer wants the consumer to use the drive for a
while, and the "bad" sectors will turn up and get remapped. Obviously,
this will have an impact on drive performance during this period. Is this
because the drive manufacturers don't want to spend the time/money to
verify the platter themselves?
Second question, when Write Verify is turned on and a bad sector is
detected, will the drive firmware transparently take care of remapping
and re-writing the data or does an error get passed up the chain for
higher level software to take care of? If the error gets passed up the
chain, what will Linux/XFS do with the error?
Last question, if this -really- is normal, and it is something that has
to be dealt with, what is the proper solution? I know that leaving
Write Verify turned on is not the solution. It cuts drive performance
in half. One proposed solution (for us at least) is to turn Write Verify
on for "critical" data and off for "non-critical" data. This seems like
a hack. <RANT> If I write the data to the drive, and the drive says
"OK" shouldn't the data be there? </RANT>
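To make the "critical data only" idea concrete, here is a rough application-level sketch in Python: write the file, force it out with fsync, then read it back and compare. This is not the drive's Write Verify feature, only a software stand-in for it. The function name `write_and_verify` is hypothetical. The `posix_fadvise` call is only a hint to drop cached pages, so the read-back may still be served from the page cache or from the drive's own cache rather than from the platter.

```python
import os


def write_and_verify(path, data):
    """Write data to path, flush it, read it back; return True if it matches."""
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o644)
    try:
        view = memoryview(data)
        while view:
            # os.write may write fewer bytes than requested, so loop.
            written = os.write(fd, view)
            view = view[written:]
        # Ask the kernel to push the data to the device.
        os.fsync(fd)
        # Hint (not a guarantee) that cached pages for this file can be dropped.
        if hasattr(os, "posix_fadvise"):
            os.posix_fadvise(fd, 0, 0, os.POSIX_FADV_DONTNEED)
    finally:
        os.close(fd)

    try:
        with open(path, "rb") as f:
            return f.read() == data
    except OSError:
        # A read error like the one in the log would surface here.
        return False
```

A caller would use this only for the "critical" files and fall back to plain writes elsewhere, which is exactly the split that feels like a hack to me. It also roughly doubles the I/O for every verified file.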
Thanks for any and all feedback!
Jim