Re: Extreme I/O latency

To: Fredrik Tolf <fredrik@xxxxxxxxxxxxx>
Subject: Re: Extreme I/O latency
From: Dave Chinner <david@xxxxxxxxxxxxx>
Date: Tue, 2 Oct 2012 15:08:06 +1000
Cc: xfs@xxxxxxxxxxx
In-reply-to: <alpine.DEB.2.02.1210020422540.3390@xxxxxxxxxxxxxxxxxxx>
References: <alpine.DEB.2.02.1210020338580.3390@xxxxxxxxxxxxxxxxxxx> <20121002022041.GN23520@dastard> <alpine.DEB.2.02.1210020422540.3390@xxxxxxxxxxxxxxxxxxx>
User-agent: Mutt/1.5.21 (2010-09-15)
On Tue, Oct 02, 2012 at 05:25:53AM +0200, Fredrik Tolf wrote:
> On Tue, 2 Oct 2012, Dave Chinner wrote:
> >On Tue, Oct 02, 2012 at 03:53:08AM +0200, Fredrik Tolf wrote:
> >What is a "simple IO operation"?
> 
> Sorry, what I meant by "simple" is mostly on the interface level.
> Like, a single syscall (with far less than a page of data in the
> case of read or write), or, in this case, reading a single mmap'ed
> page.

Which might be "simple" by themselves, but when an application is
doing thousands of such operations a second, the result is far from
simple....

> 
> >>The process having the worst problems with it usually sees them when
> >>it calls upon Berkeley DB, the stack traces in which seems to tell
> >>me that it's trying to do mmap'ed I/O in its region files, so I can
> >>only assume that the stop happens when it's pulling in pages from
> >>disk. I can't say I know for sure, but I'm getting the feeling that
> >>it happens when some other process calls fdatasync() or somesuch
> >>operation. I get this feeling because the problems very often seem
> >>to happen exactly when I fetch a MySQL-backed webpage from the
> >>system's HTTP server (at which point mysqld syncs its data to disk
> >>after some session table update or the like).
> >
> >So is causing random 4k write IO?
> 
> Which one, do you mean? The mmap'ed I/O would be a random 4k read,
> rather than a write. Exactly what happens as a result of the
> fdatasync that MySQL calls is not something I am completely privy
> to.

fdatasync does not cause read operations to occur. It will cause
dirty pages to be written to disk. My assumption is that the pages
are being faulted by mmap access to modify the data within them, and
hence you then get them written back when an fdatasync occurs...
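
To illustrate that pattern, here is a minimal sketch, not taken from
Berkeley DB or MySQL. It assumes Linux behaviour, where pages dirtied
through a shared mapping sit in the page cache and are written back by
fdatasync() on the same file. The function name and the file name are
hypothetical.

```python
import mmap
import os
import tempfile

PAGE = mmap.PAGESIZE
# os.fdatasync is missing on some platforms; fall back to os.fsync there.
_sync = getattr(os, "fdatasync", os.fsync)


def dirty_pages_then_sync(path, npages=4):
    """Dirty one byte in each page of a shared mapping, then sync the file."""
    fd = os.open(path, os.O_RDWR | os.O_CREAT, 0o600)
    try:
        os.ftruncate(fd, npages * PAGE)
        # On Unix, mmap.mmap() creates a MAP_SHARED mapping by default.
        with mmap.mmap(fd, npages * PAGE) as m:
            for i in range(npages):
                # The store faults the page in and marks it dirty.
                m[i * PAGE] = ord("X")
            # Nothing has necessarily reached the disk yet; the sync is
            # what forces the dirty pages out as write IO.
            _sync(fd)
        # Read back through the file descriptor to show the stores landed.
        return [os.pread(fd, 1, i * PAGE) for i in range(npages)]
    finally:
        os.close(fd)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        print(dirty_pages_then_sync(os.path.join(d, "region.demo")))
```

If the touched pages are scattered through a large file, the sync turns
into a burst of small, non-contiguous writes.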

> The point being that the fdatasync operation also seems to cause
> other, otherwise unrelated, processes to stop dead in their tracks
> when they try to do I/O while the fdatasync is running.

Which tends to imply that there is random write IO to a RAID5/6
volume occurring...
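
The reason small random writes hurt on parity RAID can be sketched as
follows. This assumes the textbook read-modify-write path for a write
smaller than a full stripe; the array in question has not been described
yet, so the numbers are illustrative only, and the function names are
made up for the example.

```python
def rmw_cost(parity_disks):
    """Disk IOs needed for one sub-stripe write via read-modify-write.

    parity_disks is 1 for RAID5 and 2 for RAID6. The old data block and
    every old parity block are read, then the new versions are written.
    """
    reads = 1 + parity_disks
    writes = 1 + parity_disks
    return reads, writes


def update_parity(old_parity, old_data, new_data):
    """RAID5-style XOR parity update for a single changed data block."""
    return bytes(p ^ o ^ n for p, o, n in zip(old_parity, old_data, new_data))


if __name__ == "__main__":
    for name, p in (("RAID5", 1), ("RAID6", 2)):
        r, w = rmw_cost(p)
        print(f"{name}: {r} reads + {w} writes per small write")
```

So every 4k write costs several disk IOs, and reads from other processes
have to queue behind them on the same spindles.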

> Though, don't take my gut feeling that fdatasync is the cause too
> seriously. I haven't been able to debug it well enough to say
> conclusively that it only happens while syncs are running.

While writeback is occurring, hence...

> >More information about your setup needed and what is happening
> >during the hangs:
> >
> >http://xfs.org/index.php/XFS_FAQ#Q:_What_information_should_I_include_when_reporting_a_problem.3F
> 
> Oh, sorry. I'll provide that if necessary, but...

Stuff like iostat output will tell you what sort of IO is occurring.
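
If iostat is not to hand, roughly the same figure can be derived from two
samples of /proc/diskstats. The sketch below assumes the standard Linux
field layout (device name in field 3, writes completed in field 8,
sectors written in field 10, sectors always 512 bytes). The helper names
and the device name "sda" are placeholders.

```python
import time

SECTOR = 512  # /proc/diskstats counts in 512-byte sectors


def parse_diskstats(text):
    """Map device name to its cumulative read/write counters."""
    stats = {}
    for line in text.splitlines():
        f = line.split()
        if len(f) < 11:
            continue  # skip blank or truncated lines
        stats[f[2]] = {
            "reads": int(f[3]),
            "rsect": int(f[5]),
            "writes": int(f[7]),
            "wsect": int(f[9]),
        }
    return stats


def avg_write_bytes(before, after, dev):
    """Average size of the writes completed between two samples."""
    dw = after[dev]["writes"] - before[dev]["writes"]
    ds = after[dev]["wsect"] - before[dev]["wsect"]
    return ds * SECTOR / dw if dw else 0.0


if __name__ == "__main__":
    with open("/proc/diskstats") as fh:
        a = parse_diskstats(fh.read())
    time.sleep(5)
    with open("/proc/diskstats") as fh:
        b = parse_diskstats(fh.read())
    print(avg_write_bytes(a, b, "sda"))  # "sda" is a placeholder device
```

An average close to 4096 bytes during a hang would be consistent with
random 4k writes; a much larger value would point elsewhere.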

Cheers,

Dave.
-- 
Dave Chinner
david@xxxxxxxxxxxxx
