Re: Failing XFS filesystem underlying Ceph OSDs

To: Alex Gorbachev <ag@xxxxxxxxxxxxxxxxxxx>
Subject: Re: Failing XFS filesystem underlying Ceph OSDs
From: Dave Chinner <david@xxxxxxxxxxxxx>
Date: Sun, 5 Jul 2015 09:38:02 +1000
Cc: xfs@xxxxxxxxxxx
Delivered-to: xfs@xxxxxxxxxxx
In-reply-to: <CADb9451j0WEeL4m-fEHzrOf3ys+eSHJKNzcN9oHAOs2kDLL_kQ@xxxxxxxxxxxxxx>
References: <CADb9451tB71D3XCqcOkDxzpzbdEHqwj7XCZUpL8yg1DzYbpwBw@xxxxxxxxxxxxxx> <20150703235141.GQ7943@dastard> <CADb9451j0WEeL4m-fEHzrOf3ys+eSHJKNzcN9oHAOs2kDLL_kQ@xxxxxxxxxxxxxx>
User-agent: Mutt/1.5.21 (2010-09-15)
On Sat, Jul 04, 2015 at 10:46:24AM -0400, Alex Gorbachev wrote:
> Hello Dave, thank you for the response.  I got some recommendations on
> the ceph-users list that essentially pointed to a problem with
> vm.swappiness=0 and its new behavior, described here:
> https://www.percona.com/blog/2014/04/28/oom-relation-vm-swappiness0-new-kernel/
> 
> Basically, setting it to 0 creates these OOM conditions because nothing
> is ever swapped out.  So I changed these settings right away:
> 
> sysctl vm.swappiness=20 (can probably be 1, as per the article)
> 
> sysctl vm.min_free_kbytes=262144

That's not an explanation for what looks to be page cache radix
tree corruption. Memory reclaim still occurs with the settings you
have now, and those vm.swappiness changes landed back in 3.5 - some
three years ago - so they aren't really an explanation for a problem
on a recent 4.1 kernel...
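
As an aside, if you do keep those settings, plain sysctl invocations
won't survive a reboot. A minimal sketch for persisting them, assuming
a distro that reads /etc/sysctl.d/ (the file name is illustrative):

  # /etc/sysctl.d/90-vm-tuning.conf
  vm.swappiness = 20
  vm.min_free_kbytes = 262144

Then reload with "sysctl --system" (or reboot) and verify with
"sysctl vm.swappiness vm.min_free_kbytes".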

> So far no issues, but I need to wait a week to see if anything shows up.
> Thank you for reviewing the error codes.

I expect that you'll see the problems again...
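
When it does, a dmesg snapshot from the time of the failure plus a
read-only check of the affected OSD's filesystem will help pin it
down. A minimal sketch (mount point and device name are illustrative;
the filesystem must be unmounted for the check):

  # capture the kernel log from around the failure
  dmesg > /tmp/dmesg-xfs-failure.txt

  # no-modify check of the failed OSD's filesystem
  umount /var/lib/ceph/osd/ceph-0
  xfs_repair -n /dev/sdX1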

Cheers,

Dave.
-- 
Dave Chinner
david@xxxxxxxxxxxxx
