xfs

Re: memory reclaim problems on fs usage

To: arekm@xxxxxxxx
Subject: Re: memory reclaim problems on fs usage
From: Tetsuo Handa <penguin-kernel@xxxxxxxxxxxxxxxxxxx>
Date: Sun, 15 Nov 2015 11:35:11 +0900
Cc: htejun@xxxxxxxxx, cl@xxxxxxxxx, mhocko@xxxxxxxx, linux-mm@xxxxxxxxx, xfs@xxxxxxxxxxx
Delivered-to: xfs@xxxxxxxxxxx
In-reply-to: <201511142140.38245.arekm@xxxxxxxx>
References: <201511102313.36685.arekm@xxxxxxxx> <56449E44.7020407@xxxxxxxxxxxxxxxxxxx> <201511122228.26399.arekm@xxxxxxxx> <201511142140.38245.arekm@xxxxxxxx>
Arkadiusz Miskiewicz wrote:
> > > vmstat_update() and submit_flushes() remained pending for about 110
> > > seconds. If xlog_cil_push_work() were spinning inside a GFP_NOFS
> > > allocation, it should have been reported in MemAlloc: traces, but no
> > > such lines were recorded. I don't know why xlog_cil_push_work() did
> > > not call schedule() for so long. Anyway, applying
> > > http://lkml.kernel.org/r/20151111160336.GD1432@xxxxxxxxxxxxxx should
> > > solve the vmstat_update() part.
> > 
> > To apply that patch on top of 4.1.13, I also had to apply the patches
> > listed below.
> > 
> > So, in summary, I applied:
> > http://sprunge.us/GYBb
> > http://sprunge.us/XWUX
> > http://sprunge.us/jZjV
> 
> I've tried harder to trigger the "page allocation failure" with the usual
> actions that triggered it previously, but couldn't reproduce it. With these
> patches applied, it doesn't happen.
> 
> Logs from my tests:
> 
> http://ixion.pld-linux.org/~arekm/log-mm-3.txt.gz
> http://ixion.pld-linux.org/~arekm/log-mm-4.txt.gz (with swap added)
> 
Good.

vmstat_update() and submit_flushes() are no longer pending for long.

log-mm-4.txt:Nov 14 16:40:08 srv kernel: [167753.393960]     pending: vmstat_shepherd, vmpressure_work_fn
log-mm-4.txt:Nov 14 16:40:08 srv kernel: [167753.393984]     pending: submit_flushes [md_mod]
log-mm-4.txt:Nov 14 16:41:08 srv kernel: [167813.439405]     pending: submit_flushes [md_mod]
log-mm-4.txt:Nov 14 17:17:19 srv kernel: [169985.104806]     pending: vmstat_shepherd

I think that the vmstat statistics now have correct values.
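
Judging how long a work item stayed pending comes down to comparing the kernel timestamps of successive "pending:" dumps. A rough sketch of that comparison in C follows. It assumes each syslog line is kept whole (not wrapped as in this mail) and treats a work function listed in several dumps as pending from the first to the last of them, which need not hold if it ran and was requeued in between. pending_timestamp() and pending_span() are hypothetical helpers, not kernel code.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: returns the "[seconds.micros]" kernel timestamp of a
 * show_workqueue_state() "pending:" line if that line lists work function
 * `fn`, or -1.0 otherwise. Assumes the syslog line is not wrapped. */
static double pending_timestamp(const char *line, const char *fn)
{
    assert(line != NULL && fn != NULL);
    const char *stamp = strchr(line, '[');
    const char *list = strstr(line, "pending:");
    if (stamp == NULL || list == NULL || stamp > list)
        return -1.0;

    /* Match `fn` as a whole item of the comma-separated list, so that
     * e.g. "vmstat_update" never matches "vmstat_shepherd". */
    size_t n = strlen(fn);
    const char *p = list + strlen("pending:");
    while ((p = strstr(p, fn)) != NULL) {
        char before = p[-1];
        char after = p[n];
        if ((before == ' ' || before == ',' || before == ':') &&
            (after == '\0' || after == ',' || after == ' ' || after == '\n'))
            return strtod(stamp + 1, NULL);
        p += n;
    }
    return -1.0;
}

/* Hypothetical helper: seconds between the first and the last line that
 * list `fn` as pending; 0.0 if it appears at most once. */
static double pending_span(const char *const lines[], size_t count,
                           const char *fn)
{
    double first = -1.0, last = -1.0;
    for (size_t i = 0; i < count; i++) {
        double t = pending_timestamp(lines[i], fn);
        if (t < 0.0)
            continue;
        if (first < 0.0)
            first = t;
        last = t;
    }
    return first < 0.0 ? 0.0 : last - first;
}
```

For the two submit_flushes lines above this gives about 60 seconds, which is only the distance between two dumps; whether the same work item stayed pending the whole time cannot be told from these lines alone.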

> But are these patches solving the problem or just hiding it?
> 
Sorry, but I can't judge that.

If you are interested in monitoring how the vmstat statistics change
under stalled conditions, you can try the patch below.

diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 35a46b4..3de3a14 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -2794,8 +2794,7 @@ static int kmallocwd(void *unused)
        rcu_read_unlock();
        preempt_enable();
        show_workqueue_state();
-       if (dump_target_pid <= 0)
-               dump_target_pid = -pid;
+       show_mem(0);
        /* Wait until next timeout duration. */
        schedule_timeout_interruptible(kmallocwd_timeout);
        if (memalloc_counter[index])
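
The patch above makes the watchdog call show_mem(0) right after show_workqueue_state() on each timeout. If rebuilding the kernel is inconvenient, a rough user-space complement is to sample /proc/vmstat periodically and compare counters between samples. The C sketch below shows only the parsing part; vmstat_lookup() and vmstat_snapshot() are hypothetical helpers, not part of any existing tool.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: finds counter `key` in text laid out like
 * /proc/vmstat ("name value" per line). Stores the value in *out and
 * returns 1 if found, 0 otherwise. */
static int vmstat_lookup(const char *text, const char *key,
                         unsigned long long *out)
{
    size_t n = strlen(key);
    const char *line = text;

    while (line != NULL && *line != '\0') {
        /* Require a space after the name so "nr_free" does not match
         * "nr_free_pages". */
        if (strncmp(line, key, n) == 0 && line[n] == ' ') {
            *out = strtoull(line + n + 1, NULL, 10);
            return 1;
        }
        line = strchr(line, '\n');
        if (line != NULL)
            line++;
    }
    return 0;
}

/* Hypothetical helper: copies /proc/vmstat into buf as a NUL-terminated
 * string (truncated if it does not fit). Returns 0, or -1 if the file
 * cannot be opened. */
static int vmstat_snapshot(char *buf, size_t size)
{
    assert(size > 0);
    FILE *f = fopen("/proc/vmstat", "r");
    if (f == NULL)
        return -1;
    size_t len = fread(buf, 1, size - 1, f);
    buf[len] = '\0';
    fclose(f);
    return 0;
}
```

For example, take one snapshot, sleep(10), take another, and compare nr_free_pages from both. Counter names differ between kernel versions, so check /proc/vmstat on the running 4.1.13 kernel before choosing which ones to watch. Unlike show_mem(0) called from the in-kernel watchdog, a user-space sampler may itself be delayed while the system is stuck in reclaim.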
