
Re: long hangs when deleting large directories (3.0-rc3)

To: Michael Monnerie <michael.monnerie@xxxxxxxxxxxxxxxxxxx>
Subject: Re: long hangs when deleting large directories (3.0-rc3)
From: Markus Trippelsdorf <markus@xxxxxxxxxxxxxxx>
Date: Mon, 20 Jun 2011 23:16:07 +0200
Cc: xfs@xxxxxxxxxxx, Dave Chinner <david@xxxxxxxxxxxxx>
In-reply-to: <20110620123132.GA1717@xxxxxxxxxxxxxx>
References: <20110618141950.GA1685@xxxxxxxxxxxxxx> <20110620060351.GC1730@xxxxxxxxxxxxxx> <20110620111359.GA12632@xxxxxxxxxxxxxx> <201106201345.30271@xxxxxx> <20110620123132.GA1717@xxxxxxxxxxxxxx>
On 2011.06.20 at 14:31 +0200, Markus Trippelsdorf wrote:
> On 2011.06.20 at 13:45 +0200, Michael Monnerie wrote:
> > On Monday, 20 June 2011 Markus Trippelsdorf wrote:
> > > Here are two more examples. The time when the hang occurs is marked
> > 
> > Could it be that some sectors on the disk are not easy to read for the 
> > drive, and that it simply retries several times until it works again? 
> > SATA disks can show that behaviour. You could try with "dd" with 
> > seek/skip parameters so you read 1gb at once, then skip 1gb and read 1gb 
> > again etc, and compare the throughput over all 1gb areas. If there's one 
> > slower, that might be the problem.
> > 
> > Maybe a check with "smartctl" could help, too.
> 
> Thanks for the hint, Michael. I've just checked the SMART status on
> both disks and the 4kb drive looks indeed suspicious:
> 
> 197 Current_Pending_Sector  0x0012   100   100   000    Old_age   Always       -       8
> 198 Offline_Uncorrectable   0x0010   100   100   000    Old_age   Offline      -       8
> 
> The 512 byte drive appears to be fine. But I'm running the long
> SMART self test on both of them right now and will report back
> the result in a few hours.

Hmm, both tests ran fine without any errors. And the two SMART
attributes above are back to zero again (must have been a temporary
firmware hiccup). 

As you can see in the data I've posted, the disk workload consists
almost entirely of writes, and I don't think a disk retries writes
several times. On the contrary, a write to a bad sector should fix it,
because the drive can then remap it safely. (Current_Pending_Sector
would decrease and Reallocated_Sector_Ct would increase, but
Reallocated_Sector_Ct is still 0 on both affected drives.)
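
To make the attribute comparison above repeatable, here is a minimal
sketch that pulls the raw values out of saved "smartctl -A" output. It
assumes the usual ten-column attribute table (ID, name, flag, value,
worst, thresh, type, updated, when_failed, raw value); the function
name and the sample text (the two rows quoted earlier) are only
illustrative.

```python
import re

# One row of the attribute table printed by `smartctl -A` (assumed layout):
# ID# ATTRIBUTE_NAME FLAG VALUE WORST THRESH TYPE UPDATED WHEN_FAILED RAW_VALUE
ROW = re.compile(
    r"^\s*(\d+)\s+(\S+)\s+(0x[0-9a-fA-F]+)\s+(\d+)\s+(\d+)\s+(\d+)\s+"
    r"(\S+)\s+(\S+)\s+(\S+)\s+(\d+)"
)


def parse_smart_attributes(text):
    """Return {attribute_name: raw_value} for every table row found in text."""
    attrs = {}
    for line in text.splitlines():
        m = ROW.match(line)
        if m:
            # Group 2 is the attribute name, group 10 the leading integer
            # of the raw value column.
            attrs[m.group(2)] = int(m.group(10))
    return attrs


# The two rows quoted earlier in this thread, joined back onto one line each.
SAMPLE = """\
197 Current_Pending_Sector  0x0012   100   100   000    Old_age   Always       -       8
198 Offline_Uncorrectable   0x0010   100   100   000    Old_age   Offline      -       8
"""

if __name__ == "__main__":
    attrs = parse_smart_attributes(SAMPLE)
    for name in ("Current_Pending_Sector", "Offline_Uncorrectable",
                 "Reallocated_Sector_Ct"):
        # Attributes missing from the input are reported rather than guessed.
        print(name, attrs.get(name, "not present in input"))
```

Running it on snapshots taken before and after a self test would show
whether Current_Pending_Sector dropped while Reallocated_Sector_Ct
stayed at 0.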

And shouldn't I see these "hangs" in situations other than "rm -fr", if
the disk drive were responsible?
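
For completeness, the region-by-region read test Michael suggested
(read 1gb, skip 1gb, compare throughput) could be sketched roughly as
below instead of running "dd" by hand. This is only an illustration:
the function name and parameters are made up for this example, it needs
read permission on the device, and it reads through the page cache, so
a second run over the same regions may look faster than the disk
really is.

```python
import os
import sys
import time


def region_throughput(path, region_size, stride, block_size=1 << 20):
    """Read region_size bytes starting at every multiple of stride.

    Returns a list of (offset, bytes_read, seconds) tuples, one per region.
    """
    results = []
    with open(path, "rb", buffering=0) as f:
        # Seeking to the end reports the size for regular files and
        # for block devices alike.
        size = f.seek(0, os.SEEK_END)
        offset = 0
        while offset < size:
            f.seek(offset)
            remaining = region_size
            done = 0
            start = time.monotonic()
            while remaining > 0:
                data = f.read(min(block_size, remaining))
                if not data:
                    break  # reached the end of the device or file
                done += len(data)
                remaining -= len(data)
            results.append((offset, done, time.monotonic() - start))
            offset += stride
    return results


if __name__ == "__main__" and len(sys.argv) > 1:
    GIB = 1 << 30
    # Read 1 GiB, skip 1 GiB, and so on across the whole device.
    for offset, nbytes, secs in region_throughput(sys.argv[1], GIB, 2 * GIB):
        rate = nbytes / secs / (1 << 20) if secs > 0 else float("inf")
        print(f"offset {offset // GIB:6d} GiB: {rate:8.1f} MiB/s")
```

A region that is much slower than its neighbours would point at the
kind of read retries Michael described.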

-- 
Markus
