Failing XFS filesystem underlying Ceph OSDs

Alex Gorbachev ag at iss-integration.com
Sat Jul 4 09:46:24 CDT 2015


Hello Dave, thank you for the response.  I got some recommendations on the
ceph-users list that essentially pointed to vm.swappiness=0 and its changed
behavior in newer kernels, described here:
https://www.percona.com/blog/2014/04/28/oom-relation-vm-swappiness0-new-kernel/

Basically, setting it to 0 creates these OOM conditions because the kernel
never swaps anything out.  So I changed these settings right away:

sysctl vm.swappiness=20 (can probably be 1, as per the article)

sysctl vm.min_free_kbytes=262144
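
The two sysctl commands above only change the running kernel, so the values
are lost on reboot.  Below is a minimal sketch of how they might be checked
and made persistent.  The helper name render_vm_tuning and the file name
60-ceph-osd-vm.conf are hypothetical, chosen only for illustration; the
values are the ones quoted above.

```shell
#!/bin/sh
# Sketch only: print a sysctl.d-style fragment for the two VM settings.
# render_vm_tuning is a hypothetical helper, not part of any package.
render_vm_tuning() {
    swappiness="$1"
    min_free_kb="$2"
    printf 'vm.swappiness = %s\n' "$swappiness"
    printf 'vm.min_free_kbytes = %s\n' "$min_free_kb"
}

# Read-only check of what the running kernel currently uses.
for key in swappiness min_free_kbytes; do
    if [ -r "/proc/sys/vm/$key" ]; then
        printf 'current vm.%s = %s\n' "$key" "$(cat "/proc/sys/vm/$key")"
    fi
done

# Print the fragment that would be persisted.
render_vm_tuning 20 262144

# To persist (as root; the file name is an assumption):
#   render_vm_tuning 20 262144 > /etc/sysctl.d/60-ceph-osd-vm.conf
#   sysctl --system
```

Nothing is written unless the commented lines at the end are run by hand,
so the script is safe to try on a live OSD node first.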


So far no issues, but I need to wait a week to see if anything shows up.
Thank you for reviewing the error codes.


Alex

On Fri, Jul 3, 2015 at 7:51 PM, Dave Chinner <david at fromorbit.com> wrote:

> On Fri, Jul 03, 2015 at 05:07:29AM -0400, Alex Gorbachev wrote:
> > Hello, we are seeing this and similar errors on multiple Supermicro nodes
> > running Ceph.  OS is Ubuntu 14.04.2 with kernel 4.1
> >
> > Thank you for any info and troubleshooting advice.
>
> Nothing to suggest that this is an XFS problem. Memory reclaim
> triggered by network stack memory pressure is causing inode
> eviction. While removing the page cache it's falling over in
> the generic truncate code doing a radix tree lookup. That's all
> generic code - XFS never touches the page cache radix tree directly.
>
> I haven't seen this before - is this a new problem since you
> upgraded your kernel to 4.1? Is it repeatable? If yes to both, then
> a bisect may be in order to isolate the problematic commit...
>
> Cheers,
>
> Dave.
> --
> Dave Chinner
> david at fromorbit.com
>
> _______________________________________________
> xfs mailing list
> xfs at oss.sgi.com
> http://oss.sgi.com/mailman/listinfo/xfs
>