
Re: memory reclaim problems on fs usage

To: Michal Hocko <mhocko@xxxxxxx>
Subject: Re: memory reclaim problems on fs usage
From: Arkadiusz Miśkiewicz <arekm@xxxxxxxx>
Date: Wed, 18 Nov 2015 23:36:18 +0100
Cc: Tetsuo Handa <penguin-kernel@xxxxxxxxxxxxxxxxxxx>, htejun@xxxxxxxxx, cl@xxxxxxxxx, linux-mm@xxxxxxxxx, xfs@xxxxxxxxxxx
In-reply-to: <20151116161518.GI14116@xxxxxxxxxxxxxx>
References: <201511102313.36685.arekm@xxxxxxxx> <201511151549.35299.arekm@xxxxxxxx> <20151116161518.GI14116@xxxxxxxxxxxxxx>
User-agent: KMail/1.13.7 (Linux/4.3.0; KDE/4.14.13; x86_64; ; )
On Monday 16 of November 2015, Michal Hocko wrote:
> On Sun 15-11-15 15:49:35, Arkadiusz Miśkiewicz wrote:
> > On Sunday 15 of November 2015, Tetsuo Handa wrote:
> > > Arkadiusz Miskiewicz wrote:
> > > > On Sunday 15 of November 2015, Tetsuo Handa wrote:
> > > > > I think that the vmstat statistics now have correct values.
> > > > > 
> > > > > > But are these patches solving the problem or just hiding it?
> > > > > 
> > > > > Excuse me but I can't judge.
> > > > > 
> > > > > If you are interested in monitoring how vmstat statistics are
> > > > > changing under stalled condition, you can try below patch.
> > > > 
> > > > Here is log with this and all previous patches applied:
> > > > http://ixion.pld-linux.org/~arekm/log-mm-5.txt.gz
> > > 
> > > Regarding "Node 0 Normal" (min:7104kB low:8880kB high:10656kB),
> > > all free: values look sane to me. I think that your problem was solved.
> > 
> > Great, thanks!
> > 
> > Will all (or part) of these patches
> > 
> > http://sprunge.us/GYBb
> 
> Migrate reserves are not a stable material I am afraid. "vmstat:
> explicitly schedule per-cpu work on the CPU we need it to run on"
> was not marked for stable either but I am not sure why it should make
> any difference for your load. I understand that testing this is really
> tedious but it would be better to know which of the patches actually
> made a difference.

OK. In the meantime I've tried a 4.3.0 kernel + patches (the same as before + one 
more) on a second server, which runs even more rsnapshot processes and also uses 
xfs on md RAID 6.

Patches:
http://sprunge.us/DfIQ (debug patch from Tetsuo)
http://sprunge.us/LQPF (backport of things from git + one from ml)

The problem now is probably with high-order allocations:
http://ixion.pld-linux.org/~arekm/log-mm-2srv-1.txt.gz
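
One way to check whether high-order free blocks really are scarce is to look at
/proc/buddyinfo, which lists the number of free blocks per order for each zone.
The following is only an illustrative sketch and is not taken from the patches or
logs linked in this thread. The helper names, the sample line and the 4 kB page
size are assumptions.

```python
import re

PAGE_KB = 4  # assumption: 4 kB pages (the usual x86_64 base page size)


def parse_buddyinfo(text):
    """Map (node, zone) to the list of free block counts, indexed by order."""
    zones = {}
    for line in text.splitlines():
        m = re.match(r"Node\s+(\d+),\s+zone\s+(\S+)\s+(.*)", line)
        if m:
            counts = [int(n) for n in m.group(3).split()]
            zones[(int(m.group(1)), m.group(2))] = counts
    return zones


def free_kb_at_or_above(counts, order):
    """Free memory in kB held in blocks of at least the given order."""
    return sum(n * (PAGE_KB << o) for o, n in enumerate(counts) if o >= order)


def report(path="/proc/buddyinfo", order=3):
    """Print, per zone, total free kB and free kB in blocks >= order."""
    with open(path) as f:
        zones = parse_buddyinfo(f.read())
    for (node, zone), counts in sorted(zones.items()):
        total = free_kb_at_or_above(counts, 0)
        high = free_kb_at_or_above(counts, order)
        print(f"Node {node} {zone}: free {total} kB, order>={order}: {high} kB")


# Made-up sample line in the /proc/buddyinfo layout (orders 0..10).
SAMPLE = (
    "Node 0, zone   Normal   5000   1200    300     10"
    "      0      0      0      0      0      0      0\n"
)
```

Calling report() on the affected machine while it is stalled would show whether a
zone has plenty of free memory overall but almost none in blocks of order 3 and
above, which is the pattern to expect if fragmentation is behind the failures.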

The system is making very slow progress; for example, a depmod run took 2 hours:
http://sprunge.us/HGbE
Sometimes I was able to ssh in; dmesg then took 10-15 minutes, but at other times it 
worked fast for a short period.

Ideas?

PS. I also had one problem with a low-order allocation, but only once, and I haven't 
been able to reproduce it so far. I was running a kernel with the backport patches but 
without the debug patch, so I got only this in the logs:
http://sprunge.us/WPXi

-- 
Arkadiusz Miśkiewicz, arekm / ( maven.pl | pld-linux.org )
