On 2011.06.20 at 14:31 +0200, Markus Trippelsdorf wrote:
> On 2011.06.20 at 13:45 +0200, Michael Monnerie wrote:
> > On Montag, 20. Juni 2011 Markus Trippelsdorf wrote:
> > > Here are two more examples. The time when the hang occurs is marked
> > Could it be that some sectors on the disk are not easy to read for the
> > drive, and that it simply retries several times until it works again?
> > SATA disks can show that behaviour. You could try "dd" with the
> > seek/skip parameters: read 1 GB, skip 1 GB, read 1 GB again, and so
> > on, then compare the throughput across all the 1 GB areas. If one is
> > slower, that might be the problem.
> > Maybe a check with "smartctl" could help, too.
> Thanks for the hint, Michael. I've just checked the SMART status on
> both disks and the 4kb drive does indeed look suspicious:
> 197 Current_Pending_Sector 0x0012 100 100 000 Old_age Always  - 8
> 198 Offline_Uncorrectable  0x0010 100 100 000 Old_age Offline - 8
> The 512 byte drive appears to be fine. But I'm running the long
> SMART self test on both of them right now and will report back
> the result in a few hours.
Hmm, both tests ran fine without any errors. And the two SMART
attributes above are back to zero again (must have been a temporary
As you can see in the data I've posted, the disk workload consists
almost entirely of writes. And I don't think a disk retries writes
several times. On the contrary, a write to a bad sector should fix it,
because the drive can then remap it safely. (Current_Pending_Sector
would decrease and Reallocated_Sector_Ct would increase. But
Reallocated_Sector_Ct is still 0 on both affected drives.)
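
For anyone who wants to repeat the attribute check, here is a rough sketch rather than a finished tool. It assumes the usual ten-column attribute table that "smartctl -A" prints, with the raw value in the last column. The sample rows are the two quoted above; the function names are made up for illustration.

```python
def parse_smart_attributes(text):
    """Map attribute name -> raw value (int) for smartctl -A style rows."""
    attrs = {}
    for line in text.splitlines():
        fields = line.split()
        # Attribute rows start with a numeric ID and have at least 10 columns:
        # ID NAME FLAG VALUE WORST THRESH TYPE UPDATED WHEN_FAILED RAW_VALUE
        if len(fields) < 10 or not fields[0].isdigit():
            continue
        try:
            attrs[fields[1]] = int(fields[9])
        except ValueError:
            # Raw value is not a plain integer; skip this row.
            continue
    return attrs


def pending_without_remap(attrs):
    """True if sectors are pending but none have been reallocated."""
    pending = attrs.get("Current_Pending_Sector", 0)
    remapped = attrs.get("Reallocated_Sector_Ct", 0)
    return pending > 0 and remapped == 0


# The two rows from the SMART output quoted earlier in this mail.
SAMPLE = """\
197 Current_Pending_Sector 0x0012 100 100 000 Old_age Always - 8
198 Offline_Uncorrectable 0x0010 100 100 000 Old_age Offline - 8
"""

if __name__ == "__main__":
    attrs = parse_smart_attributes(SAMPLE)
    print(attrs)
    print("pending without remap:", pending_without_remap(attrs))
```

Running this before and after a write-heavy job would show whether the pending count drops while the reallocated count rises.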
And shouldn't I see these "hangs" in situations other than "rm -fr", if
the disk drive were responsible?
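
For completeness, the read test Michael suggested (read a 1 GB area, skip one, read the next, compare throughput) could be sketched roughly as below. It is an illustration only: the function name, the default sizes and the path are assumptions, and it reads through the page cache, so a second run over the same areas may be served from memory and look much faster than the disk really is.

```python
import time

AREA = 1 << 30    # size of each measured area (1 GiB)
BLOCK = 1 << 20   # size of each individual read (1 MiB)


def area_throughput(path, area_size=AREA, stride=2 * AREA, block=BLOCK):
    """Read area_size bytes at every stride; return [(offset, MiB/s)]."""
    results = []
    with open(path, "rb", buffering=0) as dev:
        offset = 0
        while True:
            try:
                dev.seek(offset)
            except OSError:
                break  # some devices refuse to seek past their end
            start = time.monotonic()
            done = 0
            while done < area_size:
                data = dev.read(min(block, area_size - done))
                if not data:
                    break  # end of file or device
                done += len(data)
            elapsed = time.monotonic() - start
            if done == 0:
                break
            results.append((offset, done / (1 << 20) / max(elapsed, 1e-9)))
            if done < area_size:
                break  # last, partial area
            offset += stride
    return results


if __name__ == "__main__":
    import sys
    # Pass a device or file path (e.g. the suspect disk; needs read access).
    for offset, rate in area_throughput(sys.argv[1]):
        print(f"offset {offset >> 30:6d} GiB: {rate:8.1f} MiB/s")
```

An area that is consistently much slower than its neighbours would point at the drive retrying reads there.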