Extreme I/O latency

To: xfs@xxxxxxxxxxx
Subject: Extreme I/O latency
From: Fredrik Tolf <fredrik@xxxxxxxxxxxxx>
Date: Tue, 2 Oct 2012 03:53:08 +0200 (CEST)
User-agent: Alpine 2.02 (DEB 1266 2009-07-14)

Dear list,

I'm having some problems with a Linux system that uses XFS filesystems on top of LVM on top of mdraid, and I'm running out of ideas for how to debug it. The problem is that certain simple I/O operations sometimes take extremely long to complete -- not infrequently as long as 20-30 seconds!

I had milder problems of a similar kind before, but they only became this severe after I upgraded the system from Debian Lenny (Linux 2.6.26) to Squeeze (Linux 2.6.32). I've since upgraded the kernel to 3.2.0, and now to 3.5.4, and all of them exhibit the same problem.

The process that suffers most usually sees the stalls when it calls into Berkeley DB. The stack traces there seem to tell me that it's doing mmap'ed I/O on its region files, so I can only assume that the stall happens while it's pulling in pages from disk. I can't say for sure, but I suspect that it happens when some other process calls fdatasync() or a similar operation. I suspect this because the problems very often seem to occur exactly when I fetch a MySQL-backed webpage from the system's HTTP server (at which point mysqld syncs its data to disk after a session table update or the like).
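
One way to test this suspicion in isolation is a small probe that touches pages of an mmap'ed file while something else keeps calling fdatasync() on the same filesystem. The following is only a rough sketch under several assumptions: it is written in Python for brevity, the names sync_writer, probe_mmap_latency and run_probe and the file names are made up for illustration, it needs a platform where os.fdatasync() exists (Linux does), and it uses a background thread as a stand-in for the separate process (mysqld) in the real scenario. It does not reproduce Berkeley DB's actual access pattern.

```python
import mmap
import os
import tempfile
import threading
import time


def sync_writer(path, stop, chunk=1 << 20):
    """Rewrite the first `chunk` bytes of `path` and fdatasync() until told to stop."""
    buf = b"\0" * chunk
    with open(path, "wb") as f:
        while not stop.is_set():
            f.seek(0)  # overwrite in place so the file does not grow
            f.write(buf)
            f.flush()
            os.fdatasync(f.fileno())


def probe_mmap_latency(path, size=16 << 20, rounds=3):
    """Touch one byte per page of an mmap'ed file; return the worst delay in seconds."""
    with open(path, "wb") as f:
        f.truncate(size)  # sparse file of the requested size
    worst = 0.0
    with open(path, "r+b") as f, mmap.mmap(f.fileno(), size) as m:
        for _ in range(rounds):
            for off in range(0, size, mmap.PAGESIZE):
                t0 = time.monotonic()
                m[off] = (m[off] + 1) & 0xFF  # read the page, then dirty it
                worst = max(worst, time.monotonic() - t0)
            m.flush()  # msync, so the next round dirties clean pages again
    return worst


def run_probe(directory, size=16 << 20, rounds=3):
    """Run the page-touch probe inside `directory` while a thread keeps syncing."""
    with tempfile.TemporaryDirectory(dir=directory) as d:
        stop = threading.Event()
        writer = threading.Thread(
            target=sync_writer, args=(os.path.join(d, "sync.dat"), stop)
        )
        writer.start()
        try:
            # Thread hand-offs in the interpreter can add a few milliseconds;
            # that is negligible next to stalls of many seconds.
            return probe_mmap_latency(os.path.join(d, "probe.dat"), size, rounds)
        finally:
            stop.set()
            writer.join()
```

Calling run_probe() with a directory on the affected XFS volume, and again with a directory on some other filesystem or device, would show whether multi-second page-touch delays appear only alongside the syncing writer and only on that storage stack. A probe file larger than RAM, or dropping the page cache between runs, would be needed to force real reads from disk.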

Does anyone have any clue as to what might cause symptoms like these, or, if not, how I can debug the issue further? Admittedly, I can't be sure that the problem lies with XFS proper rather than with LVM or mdraid, but I have to begin somewhere. At least XFS is the direct interface that my programs call before getting stuck. :)

Your most obt. St. &c&c,
Fredrik Tolf