
To: "Buzbee, James" <James.Buzbee@xxxxxxxxxxxx>, <linux-xfs@xxxxxxxxxxx>
Subject: re: Write Verify and XFS
From: Greg Freemyer <freemyer@xxxxxxxxxxxxxxxxx>
Date: Thu, 13 Feb 2003 11:43:31 -0500
Organization: Norcross Group
Sender: linux-xfs-bounce@xxxxxxxxxxx
James,

That is an interesting post, but everything you describe takes place below the 
XFS level.

Filesystems in general assume that the underlying disks/RAID systems are 
reliable.

And just so you know, even write verify is not enough to truly ensure you won't 
have problems.

During the '80s it was common for drives to have areas with weak magnetic 
characteristics.  The end result was sectors that could hold data long enough 
to pass write/read tests, but if you wrote something to disk and tried to read 
it six months later, you got read errors.

At that time I used to do a read-only disk scan on all new drives.  That way 
any weak magnetic areas could be detected immediately and remapped.  After 
that, I would partition and mkfs.  (Every sector carries a CRC, so the drive 
can tell if even one bit has changed from when it was written.)
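
A minimal sketch of that kind of scan on a current Linux box, assuming the 
new, still-empty drive shows up as /dev/hdb:

    # non-destructive read-only surface scan of the whole drive;
    # -s shows progress, -v reports every bad block it finds
    badblocks -sv /dev/hdb

    # or simply read every sector and watch the kernel log for errors
    dd if=/dev/hdb of=/dev/null bs=64k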

I have not seen that behavior recently, but I always use RAID now if I care 
about the data.

If I were experiencing this problem today, I would at least do a "dd 
if=/dev/hdb of=/dev/null" on all new drives so the read errors surface and 
automatic remapping can take place.  (I've been told that IDE bad-block 
remapping is automatic, but I am not a definitive source on that.)
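
One detail, as a sketch only: plain dd aborts at the first read error, so for 
a full scan you probably want to tell it to keep going:

    # keep reading past bad sectors instead of stopping at the first
    # error; sync pads the failed blocks so the offsets stay aligned
    dd if=/dev/hdb of=/dev/null bs=64k conv=noerror,sync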

I guess you could also write out a data pattern that stresses the magnetic 
fields, leave the drive alone for a day/week/month, then try to read it back 
with the above dd command.  Unfortunately, I don't know what data pattern 
would stress a drive.
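
If you wanted to script something along those lines, the write-mode test in 
badblocks is one option (a sketch only, and destructive, so only for a drive 
with nothing on it yet, again assumed to be /dev/hdb); whether its alternating 
bit patterns really stress the media is another question:

    # destructive write/read surface test: writes the patterns 0xaa,
    # 0x55, 0xff and 0x00 across the whole drive and reads each one back
    badblocks -wsv /dev/hdb

    # for the leave-it-alone-for-a-month variant, run the write pass now,
    # then re-read the drive later with the dd command above and watch
    # the kernel log for UncorrectableError messages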

HTH
Greg Freemyer


 >>  Sorry for the long post, but I'm wrestling with several issues related 
 >>  to manufacturing flaws that are (evidently) commonly found with IDE
 >>  drives.

 >>  What we are finding is that there are bad sectors on new drives that are 
 >>  not recognized until we try to read previously written data. We've been 
 >>  told by the drive manufacturer that this is "normal".  You write data to 
 >>  the drive, the write succeeds.  Sometime later you go to read your data 
 >>  and you get an error such as :

 >>  hdc: dma_intr: status=0x51 { DriveReady SeekComplete Error }
 >>  hdc: dma_intr: error=0x40 { UncorrectableError }, LBAsect=24093, 
 >>  sector=23984
 >>  end_request: I/O error, dev 16:01 (hdc), sector 23984

 >>  Obviously by this time your data is lost. We're told that if we write to 
 >>  that bad sector again, the sector will be remapped away.

 >>  First question, is this -really- normal?  I have a hard time believing 
 >>  that it is OK for a brand new drive to silently fail on some writes. Can 
 >>  this error be detected at write time, and if so, what would Linux/XFS do 
 >>  with the error?

 >>  Researching a bit, I find that some drives are shipped from the 
 >>  manufacturer with "Write Verify" turned on for the first "N" power 
 >>  cycles. Based on my limited understanding of Write Verify this implies 
 >>  to me that the manufacturer wants the consumer to use the drive for a 
 >>  while, and the "bad" sectors will turn up and get remapped.  Obviously, 
 >>  this will have an impact on drive performance during this period. Is this 
 >>  because the drive manufacturers don't want to spend the time/money to 
 >>  verify the platter themselves?

 >>  Second question, when Write Verify is turned on and a bad sector is 
 >>  detected, will the drive firmware transparently take care of remapping 
 >>  and re-writing the data or does an error get passed up the chain for 
 >>  higher level software to take care of?  If the error gets passed up the 
 >>  chain, what will Linux/XFS do with the error?

 >>  Last question, if this -really- is normal, and it is something that has 
 >>  to be dealt with, what is the proper solution?  I know that leaving 
 >>  Write Verify turned on is not the solution.  It cuts drive performance 
 >>  in half. One proposed solution (for us at least) is to turn Write Verify 
 >>  on for "critical" data and off for "non-critical" data. This seems like 
 >>  a hack.  <RANT> If I write the data to the drive, and the drive says 
 >>  "OK" shouldn't the data be there? </RANT>


 >>  Thanks for any and all feedback!

 >>  Jim

