Re: xfs_iomap_write_unwritten stuck in congestion_wait?

To: Dave Chinner <david@xxxxxxxxxxxxx>
Subject: Re: xfs_iomap_write_unwritten stuck in congestion_wait?
From: Peter Watkins <treestem@xxxxxxxxx>
Date: Thu, 4 Apr 2013 11:50:15 -0400
Cc: xfs@xxxxxxxxxxx
In-reply-to: <20130404040041.GB12011@dastard>
References: <CAH4wwdF+gQTnbUR89e2KCUUJfS_cT_09wzmS55vys5rfbStW7Q@xxxxxxxxxxxxxx> <20130404040041.GB12011@dastard>
On Thu, Apr 4, 2013 at 12:00 AM, Dave Chinner <david@xxxxxxxxxxxxx> wrote:
> On Wed, Apr 03, 2013 at 03:33:11PM -0400, Peter Watkins wrote:
>> Hello,
>>
>> Wondering if anyone has a suggestion for when
>> xfs_iomap_write_unwritten gets into congestion_wait.
>
> Do less IO?
>
>> In this case the system has almost half of normal zone pages in
>> NR_WRITEBACK with pretty much everybody held up in either
>> congestion_wait or balance_dirty_pages.
>
> Which is excessive - how are you getting to the point of having that
> many pages under IO at once? Writeback depth is limited by the IO
> elevator queue depths, so this shouldn't happen unless you've been
> tweaking block device parameters (i.e. nr_requests/max_sectors_kb)...
>
>> Since there are some free pages, seems like we'd be better off just
>> using a little more memory to finish this IO and in turn reduce pages
>> under write-back and add to free memory, rather than holding up here.
>> So maybe PF_MEMALLOC?
>
> Definitely not. Unwritten extent conversion can require hundreds of
> kilobytes of memory to complete, so all this will do is trigger even
> further exhaustion of memory reserves before we block on IO.
>
>> It also looks like this path allocates log vectors with KM_SLEEP but
>> lv_buf's with KM_SLEEP|KM_NOFS. Why is that?
>
> The transaction commit is copying the changes made into separate
> buffers to insert into the CIL for a later checkpoint to write to
> disk. This is normal behaviour - we can sleep there, but we cannot
> allow memory reclaim to recurse into the filesystem (for obvious
> reasons).
>
> Cheers,
>
> Dave.
> --
> Dave Chinner
> david@xxxxxxxxxxxxx

Thanks for the help.

There are other clues the VM system was rather quickly overwhelmed,
i.e. it couldn't even get bdi flush threads started without sending
kthreadd into congestion_wait.
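
For anyone wanting to watch for this condition, here is a minimal sketch that pulls the dirty, writeback and free page counters out of /proc/vmstat-style text. The function names are made up for illustration, and note that /proc/vmstat is system-wide; per-zone figures such as the normal-zone numbers mentioned above would have to come from /proc/zoneinfo instead.

```python
def parse_vmstat(text):
    """Parse '/proc/vmstat'-style 'name value' lines into a dict of ints."""
    counters = {}
    for line in text.splitlines():
        parts = line.split()
        # Skip anything that is not exactly "<name> <non-negative integer>".
        if len(parts) == 2 and parts[1].isdigit():
            counters[parts[0]] = int(parts[1])
    return counters


def writeback_summary(text):
    """Return the page counters relevant to writeback pressure.

    Counters missing from the input are reported as 0.
    """
    counters = parse_vmstat(text)
    wanted = ("nr_dirty", "nr_writeback", "nr_free_pages")
    return {name: counters.get(name, 0) for name in wanted}
```

On a Linux box this could be fed with open("/proc/vmstat").read() and sampled periodically while the writer is running.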

So indeed there is a big multi-threaded writer which starts all at
once, and that can be smoothed out.

And nr_requests is dialed up from 128 to 1024. Is anyone really able
to resist that temptation?
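
To check what a given box is actually running with, a small sketch along these lines reads the two queue parameters named earlier in the thread from sysfs. The function name and the sysfs_root parameter are illustrative assumptions (the latter only exists so the function can be pointed at a test directory); the /sys/block/<dev>/queue/ layout is the standard one.

```python
from pathlib import Path


def read_queue_params(device, sysfs_root="/sys/block"):
    """Read nr_requests and max_sectors_kb for one block device.

    Returns a dict containing only the parameters that exist and
    parse as integers; a missing device yields an empty dict.
    """
    queue = Path(sysfs_root) / device / "queue"
    params = {}
    for name in ("nr_requests", "max_sectors_kb"):
        path = queue / name
        if path.is_file():
            value = path.read_text().strip()
            if value.isdigit():
                params[name] = int(value)
    return params
```

For example, read_queue_params("sda") on a machine with a device named sda (the device name is a placeholder) would show whether nr_requests is still at its default or has been dialed up.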

-Peter
