Failing XFS filesystem underlying Ceph OSDs
Alex Gorbachev
ag at iss-integration.com
Sat Jul 4 09:46:24 CDT 2015
Hello Dave, thank you for the response. I got some recommendations on the
ceph-users list that essentially pointed to vm.swappiness=0 and its changed
behavior in newer kernels, described here:
https://www.percona.com/blog/2014/04/28/oom-relation-vm-swappiness0-new-kernel/
Basically, setting it to 0 means nothing ever gets swapped out, which
creates these OOM conditions. So I changed these settings right away:
sysctl vm.swappiness=20 (1 would probably also work, per the article)
sysctl vm.min_free_kbytes=262144
So far no issues, but I need to wait a week to see if anything shows up.
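
One caveat on the two commands above: values set with sysctl on the command
line last only until the next reboot. Below is a minimal sketch for checking
what is currently in effect, assuming a POSIX shell. The helper names vm_get
and vm_report and the PROC_SYS_VM override are made up for illustration and
are not part of any tool.

```shell
# Hypothetical helper: print the current value of one vm.* tunable by
# reading procfs. PROC_SYS_VM is an illustrative override (an assumption,
# not a standard variable) so the function can be pointed at a scratch
# directory instead of the live /proc/sys/vm.
vm_get() {
    cat "${PROC_SYS_VM:-/proc/sys/vm}/$1"
}

# Print the two tunables changed above in "key = value" form, which is
# also the syntax accepted by sysctl.conf files.
vm_report() {
    for key in swappiness min_free_kbytes; do
        printf 'vm.%s = %s\n' "$key" "$(vm_get "$key")"
    done
}
```

Running vm_report after a reboot shows whether the values survived. Since the
output is in sysctl.conf syntax, on a system that reads /etc/sysctl.d/ it
could be redirected into a file there (the file name is up to you) so the
values are reapplied at boot.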
Thank you for reviewing the error codes.
Alex
On Fri, Jul 3, 2015 at 7:51 PM, Dave Chinner <david at fromorbit.com> wrote:
> On Fri, Jul 03, 2015 at 05:07:29AM -0400, Alex Gorbachev wrote:
> > Hello, we are seeing this and similar errors on multiple Supermicro nodes
> > running Ceph. OS is Ubuntu 14.04.2 with kernel 4.1
> >
> > Thank you for any info and troubleshooting advice.
>
> Nothing to suggest that this is an XFS problem. Memory reclaim
> triggered by network stack memory pressure is causing inode
> eviction. While removing the page cache it's falling over in
> the generic truncate code doing a radix tree lookup. That's all
> generic code - XFS never touches the page cache radix tree directly.
>
> I haven't seen this before - is this a new problem since you
> upgraded your kernel to 4.1? Is it repeatable? if yes to both, then
> a bisect may be in order to isolate the problematic commit...
>
> Cheers,
>
> Dave.
> --
> Dave Chinner
> david at fromorbit.com