xfs

Re: memory reclaim problems on fs usage

To: Tetsuo Handa <penguin-kernel@xxxxxxxxxxxxxxxxxxx>
Subject: Re: memory reclaim problems on fs usage
From: Arkadiusz Miśkiewicz <arekm@xxxxxxxx>
Date: Sun, 15 Nov 2015 12:29:23 +0100
Cc: htejun@xxxxxxxxx, cl@xxxxxxxxx, mhocko@xxxxxxxx, linux-mm@xxxxxxxxx, xfs@xxxxxxxxxxx
Delivered-to: xfs@xxxxxxxxxxx
In-reply-to: <201511151135.JGD81717.OFOOSMFJFQHVtL@xxxxxxxxxxxxxxxxxxx>
References: <201511102313.36685.arekm@xxxxxxxx> <201511142140.38245.arekm@xxxxxxxx> <201511151135.JGD81717.OFOOSMFJFQHVtL@xxxxxxxxxxxxxxxxxxx>
User-agent: KMail/1.13.7 (Linux/4.3.0; KDE/4.14.13; x86_64; ; )
On Sunday 15 of November 2015, Tetsuo Handa wrote:
> Arkadiusz Miskiewicz wrote:
> > > > vmstat_update() and submit_flushes() remained pending for about 110
> > > > seconds. If xlog_cil_push_work() were spinning inside a GFP_NOFS
> > > > allocation, it should have been reported in the MemAlloc: traces, but
> > > > no such lines were recorded. I don't know why xlog_cil_push_work() did
> > > > not call schedule() for so long. Anyway, applying
> > > > http://lkml.kernel.org/r/20151111160336.GD1432@xxxxxxxxxxxxxx should
> > > > solve the vmstat_update() part.
> > > 
> > > To apply that patch on top of 4.1.13, I also had to apply the patches
> > > listed below.
> > > 
> > > So, in summary, I applied:
> > > http://sprunge.us/GYBb
> > > http://sprunge.us/XWUX
> > > http://sprunge.us/jZjV
> > 
> > I've tried harder to trigger the "page allocation failure" with the usual
> > actions that triggered it previously, but couldn't reproduce it. With these
> > patches applied, it doesn't happen.
> > 
> > Logs from my tests:
> > 
> > http://ixion.pld-linux.org/~arekm/log-mm-3.txt.gz
> > http://ixion.pld-linux.org/~arekm/log-mm-4.txt.gz (with swap added)
> 
> Good.
> 
> vmstat_update() and submit_flushes() are no longer pending for long.
> 
> log-mm-4.txt:Nov 14 16:40:08 srv kernel: [167753.393960]     pending: vmstat_shepherd, vmpressure_work_fn
> log-mm-4.txt:Nov 14 16:40:08 srv kernel: [167753.393984]     pending: submit_flushes [md_mod]
> log-mm-4.txt:Nov 14 16:41:08 srv kernel: [167813.439405]     pending: submit_flushes [md_mod]
> log-mm-4.txt:Nov 14 17:17:19 srv kernel: [169985.104806]     pending: vmstat_shepherd
> 
> I think that the vmstat statistics now have correct values.
> 
> > But are these patches solving the problem or just hiding it?
> 
> Sorry, but I can't judge that.
>
> If you are interested in monitoring how the vmstat statistics change
> under stalled conditions, you can try the patch below.
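Judging whether a work item "remained pending" from the quoted log-mm-4.txt lines comes down to comparing the printk timestamps of successive show_workqueue_state() dumps that list it. Below is a minimal C sketch. It assumes the grep output format quoted above: file name, syslog prefix, a bracketed timestamp in seconds since boot, then "pending:" and a comma-separated list. The helpers printk_timestamp(), line_has_pending() and pending_span() are hypothetical and not part of any kernel tool. The sketch only measures the gap between the first and last lines that mention a work function, so it does not prove that the item stayed pending continuously. For example, the two vmstat_shepherd lines are over half an hour apart and most likely belong to separate stall episodes.

```c
#include <ctype.h>
#include <stdio.h>
#include <string.h>

/* Parse the printk timestamp (seconds since boot) from a line such as
 * "... kernel: [167753.393960]     pending: submit_flushes [md_mod]".
 * Assumption: the first '[' on the line opens the timestamp.
 * Returns 0 on success, -1 if no timestamp is found. */
int printk_timestamp(const char *line, double *secs)
{
    const char *open = strchr(line, '[');

    if (open == NULL)
        return -1;
    return sscanf(open + 1, "%lf", secs) == 1 ? 0 : -1;
}

static int is_ident_char(char c)
{
    return isalnum((unsigned char)c) || c == '_';
}

/* Return 1 if `work` appears as a whole identifier after "pending:",
 * so that "vmstat" does not match "vmstat_shepherd"; 0 otherwise. */
int line_has_pending(const char *line, const char *work)
{
    const char *p = strstr(line, "pending:");
    size_t len = strlen(work);

    if (p == NULL || len == 0)
        return 0;
    p += strlen("pending:");
    while ((p = strstr(p, work)) != NULL) {
        /* p[-1] is safe: any match lies after "pending:". */
        if (!is_ident_char(p[-1]) && !is_ident_char(p[len]))
            return 1;
        p += len;
    }
    return 0;
}

/* Seconds between the first and last lines that list `work` as pending.
 * Returns 0.0 if it appears on exactly one line and -1.0 if on none. */
double pending_span(const char *const *lines, size_t n, const char *work)
{
    double first = -1.0, last = -1.0, t;

    for (size_t i = 0; i < n; i++) {
        if (line_has_pending(lines[i], work) &&
            printk_timestamp(lines[i], &t) == 0) {
            if (first < 0.0)
                first = t;
            last = t;
        }
    }
    return first < 0.0 ? -1.0 : last - first;
}
```

Fed the four quoted lines, pending_span() reports about 60 seconds for "submit_flushes", which matches the one-minute interval between the two dumps that list it.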


Here is the log with this and all previous patches applied:
http://ixion.pld-linux.org/~arekm/log-mm-5.txt.gz


> 
> diff --git a/mm/page_alloc.c b/mm/page_alloc.c
> index 35a46b4..3de3a14 100644
> --- a/mm/page_alloc.c
> +++ b/mm/page_alloc.c
> @@ -2794,8 +2794,7 @@ static int kmallocwd(void *unused)
>       rcu_read_unlock();
>       preempt_enable();
>       show_workqueue_state();
> -     if (dump_target_pid <= 0)
> -             dump_target_pid = -pid;
> +     show_mem(0);
>       /* Wait until next timeout duration. */
>       schedule_timeout_interruptible(kmallocwd_timeout);
>       if (memalloc_counter[index])
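If patching the kernel is not an option, a much cruder way to watch the vmstat counters is to sample /proc/vmstat from user space. Below is a minimal C sketch. It assumes a Linux system where /proc/vmstat is readable and uses its usual format of one "name value" pair per line. The helpers vmstat_parse() and vmstat_read() are hypothetical. The patch calls show_mem() from inside the kernel watchdog, but a user-space sampler can itself be blocked when reclaim stalls. Its samples may therefore have gaps exactly when they matter most.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Look up one counter in text formatted like /proc/vmstat
 * (one "name value" pair per line). Stores the value in *out and
 * returns 0 on success, or returns -1 if the counter is absent. */
int vmstat_parse(const char *text, const char *name, long long *out)
{
    size_t len = strlen(name);
    const char *p = text;

    while (p != NULL && *p != '\0') {
        /* Require the full name followed by a space, so that
         * "nr_free_pages" does not match "nr_free_pages_x". */
        if (strncmp(p, name, len) == 0 && p[len] == ' ') {
            *out = strtoll(p + len + 1, NULL, 10);
            return 0;
        }
        p = strchr(p, '\n');
        if (p != NULL)
            p++;
    }
    return -1;
}

/* Read a vmstat-style file (e.g. "/proc/vmstat") and look up one
 * counter. Assumption: the file fits in 64 KiB; anything beyond that
 * is ignored. Returns -1 on I/O error or if the counter is missing. */
int vmstat_read(const char *path, const char *name, long long *out)
{
    static char buf[64 * 1024];
    FILE *f = fopen(path, "r");
    size_t n;

    if (f == NULL)
        return -1;
    n = fread(buf, 1, sizeof(buf) - 1, f);
    fclose(f);
    buf[n] = '\0';
    return vmstat_parse(buf, name, out);
}
```

A caller could, for instance, invoke vmstat_read("/proc/vmstat", "nr_free_pages", &v) once per second in a loop with sleep(1) and print the value with a timestamp, to line up against the kernel log.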


-- 
Arkadiusz Miśkiewicz, arekm / ( maven.pl | pld-linux.org )
