On Tue, Oct 02, 2012 at 03:53:08AM +0200, Fredrik Tolf wrote:
> Dear list,
> I'm having some problems with a Linux system using XFS filesystems,
> on top of LVM, on top of mdraid, and I'm lacking ideas for how to
> proceed with debugging it. The problem manifests itself in that
> certain, simple I/O operations sometimes take extremely long to
> complete -- not seldom up to 20-30 seconds!
What is a "simple IO operation"?
> I used to have lesser problems of a similar kind previously, but
> this extremeness only started showing up since I upgraded the system
> from Debian Lenny (using Linux 2.6.26) to Squeeze (using 2.6.32).
> I've since upgraded to 3.2.0, and now to 3.5.4, and they all exhibit
> the same problem.
> The process having the worst problems with it usually sees them when
> it calls upon Berkeley DB, the stack traces in which seem to tell
> me that it's trying to do mmap'ed I/O in its region files, so I can
> only assume that the stop happens when it's pulling in pages from
> disk. I can't say I know for sure, but I'm getting the feeling that
> it happens when some other process calls fdatasync() or somesuch
> operation. I get this feeling because the problems very often seem
> to happen exactly when I fetch a MySQL-backed webpage from the
> system's HTTP server (at which point mysqld syncs its data to disk
> after some session table update or the like).
So it is causing random 4k write IO?
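One way to check whether small synchronous writes alone are enough to trigger the stalls is to generate them on purpose and time each one. The following is only a rough sketch under my own assumptions: the function name, the 4k block size and the file size are illustrative, and it does not reproduce what mysqld actually does.

```python
import os
import random
import time


def random_sync_writes(path, file_size=1 << 20, count=32, block=4096):
    """Write `count` blocks at random block-aligned offsets, syncing after each.

    Returns the latency of each write-plus-sync in seconds.
    This is an illustrative workload, not a model of any real database.
    """
    # fdatasync is missing on some platforms; fall back to fsync there.
    sync = getattr(os, "fdatasync", os.fsync)
    payload = os.urandom(block)
    latencies = []
    fd = os.open(path, os.O_RDWR | os.O_CREAT, 0o600)
    try:
        # Size the file up front so every offset below lands inside it.
        os.ftruncate(fd, file_size)
        blocks = file_size // block
        for _ in range(count):
            offset = random.randrange(blocks) * block
            start = time.perf_counter()
            os.pwrite(fd, payload, offset)
            sync(fd)
            latencies.append(time.perf_counter() - start)
    finally:
        os.close(fd)
    return latencies
```

Running it against a scratch file on the affected filesystem while the slow process is active, and looking at `max(latencies)`, would show whether the sync itself stalls or only the other process does.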
> Does anyone have any clue as to what might cause symptoms like
> these, or, if not, how I can debug the issue further? Admittedly,
> it's not as if I can be sure that the problem belongs with XFS
> proper rather than LVM or mdraid, but I have to begin somewhere. At
> least XFS is the direct interface that my programs call before
> getting stuck. :)
More information about your setup is needed, along with what is
happening during the hangs:
Also: ftrace or latencytop might point you at where the latency
is occurring. Then we might have some idea of what is causing it.
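
To know when to look at the trace output, it can help to record wall-clock timestamps for each stall from userspace. A minimal sketch, assuming you wrap whatever operation you consider "simple" in a callable; `find_stalls` and its parameters are hypothetical names, and this only tells you when a stall happened, not where in the kernel the time went:

```python
import time


def find_stalls(op, samples=100, threshold_s=1.0, interval_s=0.1):
    """Call `op` repeatedly and return (wall_clock_time, elapsed_s) for slow calls.

    `op` is any zero-argument callable, e.g. one that stats or reads a file.
    Note that reads served from the page cache will not reach the disk.
    """
    stalls = []
    for _ in range(samples):
        started_at = time.time()        # wall clock, to match against logs
        start = time.perf_counter()     # monotonic clock, for the duration
        op()
        elapsed = time.perf_counter() - start
        if elapsed >= threshold_s:
            stalls.append((started_at, elapsed))
        time.sleep(interval_s)
    return stalls
```

The timestamps in the result can then be lined up against the ftrace or latencytop data for the same period.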