On 11-01-26 10:53 PM, Mark Lord wrote:
> On 11-01-26 10:43 PM, Dave Chinner wrote:
>> On Wed, Jan 26, 2011 at 08:43:43PM -0500, Mark Lord wrote:
>>> On 11-01-26 08:22 PM, Mark Lord wrote:
>>> Thinking about it some more: the first problem very much appears as if
>>> it is due to a filesystem check happening on the already-mounted filesystem,
>>> if that makes any kind of sense (?).
>> Not to me. You can check this simply by looking at the output of
>> top while the problem is occurring...
> Top doesn't show anything interesting, since disk I/O uses practically zero
>>> running xfs_check on the unmounted drive takes about the same 30-60 seconds,
>>> with the disk activity light fully "on".
>> Well, yeah - XFS check reads all the metadata in the filesystem, so
>> of course it's going to thrash your disk when it is run. The fact it
>> takes the same length of time as whatever problem you are having is
>> likely to be coincidental.
> I find it interesting that the mount takes zero-time,
> as if it never actually reads much from the filesystem.
> Something has to eventually read the metadata etc.
>>> The other thought that came to mind: this behaviour has only been
>>> noticed recently, probably because I have recently added about
>>> 1000 new files (hundreds of MB each) to the videos/ directory on
>>> that filesystem. Whereas before, it had fewer than 500 (multi-GB)
>>> files in total.
>>> So if it really is doing some kind of internal filesystem check,
>>> then the time required has only recently become 3X larger than
>>> before, so the behaviour may not be new/recent, but now is very
>> Where does that 3x figure come from?
> Well, it used to have about 500 files/subdirs on it,
> and now it has somewhat over 1500 files/subdirs.
> That's a ballpark estimate of 3X the amount of metadata.
> All of these files are at least large (hundreds of MB),
> and a lot are huge (many GB) in size.
I've rebuilt the kernel with the various config options to enable blktrace
and XFS_DEBUG, but in the meantime we have also watched and deleted
a few GB of recordings.
The result is that the mysterious first-write delay has vanished, for now,
so there's nothing to trace.
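To notice when the delay comes back, something like the following rough
sketch could be run right after mounting. It only times a small
create/write/fsync in a given directory. The function name, the probe file
name, and the 4 KiB payload are all made up for illustration, and it says
nothing about what the filesystem is doing during the wait.

```python
import os
import time


def time_first_write(directory, size=4096):
    """Return the seconds taken to create, write, and fsync a small probe file.

    The probe file name is hypothetical; the file is removed afterwards.
    """
    path = os.path.join(directory, ".first_write_probe")
    payload = b"\0" * size

    start = time.monotonic()
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o600)
    try:
        os.write(fd, payload)
        os.fsync(fd)  # make sure the write has really reached the disk
    finally:
        os.close(fd)
    elapsed = time.monotonic() - start

    os.unlink(path)  # clean up; not included in the timing
    return elapsed


# Example use (the mount point here is only a placeholder):
# print("first write took %.1f s" % time_first_write("/path/to/mountpoint"))
```

A result in the tens of seconds on a freshly mounted filesystem would be the
cue to start tracing before anything else touches the disk.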
I think I'll pick up an extra 2TB drive, so that next time
it surfaces I can simply bit-clone the filesystem or something,
to preserve the buggered state for further examination.
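A bit-for-bit copy could be done along these lines. This is only a sketch:
the function name and the 1 MiB chunk size are arbitrary choices, the device
paths are placeholders, and it assumes the source filesystem is unmounted
while copying and that the destination is at least as large as the source.

```python
import os


def clone_device(src_path, dst_path, chunk_size=1024 * 1024):
    """Copy src_path to dst_path byte for byte; return the bytes copied."""
    copied = 0
    with open(src_path, "rb") as src, open(dst_path, "wb") as dst:
        while True:
            chunk = src.read(chunk_size)
            if not chunk:  # end of the source device or file
                break
            dst.write(chunk)
            copied += len(chunk)
        dst.flush()
        os.fsync(dst.fileno())  # push the copy out to the disk before returning
    return copied


# Example use (placeholder device names, to be run as root):
# clone_device("/dev/SOURCE_PARTITION", "/dev/SPARE_PARTITION")
```

The same function works for copying to an image file on another filesystem,
which may be easier to keep around for later examination.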
The second issue is probably still there, and I'll blktrace that instead.
But it will have to wait a spell -- I've run out of time here right now.