Failing XFS filesystem underlying Ceph OSDs

Alex Gorbachev ag at iss-integration.com
Thu Aug 13 09:25:56 CDT 2015


Good morning,

We have experienced one more failure like the ones originally described.  I
am assuming the vm.min_free_kbytes setting of 256 MB helped (only one hit: the
OSD went down, but the rest of the cluster stayed up, unlike the previous
massive storms).  So I went ahead and increased vm.min_free_kbytes to 1 GB.
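
For reference, here is roughly how we are applying and persisting the
settings; the sysctl.d file name below is just an example:

  # apply at runtime
  sysctl -w vm.swappiness=1
  sysctl -w vm.min_free_kbytes=1048576    # 1048576 kB = 1 GB

  # persist across reboots (example file name)
  printf 'vm.swappiness = 1\nvm.min_free_kbytes = 1048576\n' \
      > /etc/sysctl.d/90-vm-tuning.conf
  sysctl --system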

I do not know of any way to reproduce the problem, or what causes it.
There is no unusual IO pattern at the time of the failures that we are aware of.

Thanks,
Alex

On Wed, Jul 22, 2015 at 8:23 AM, Alex Gorbachev <ag at iss-integration.com>
wrote:

> Hi Dave,
>
> On Mon, Jul 6, 2015 at 8:35 PM, Dave Chinner <david at fromorbit.com> wrote:
>
>> On Mon, Jul 06, 2015 at 03:20:19PM -0400, Alex Gorbachev wrote:
>> > On Sun, Jul 5, 2015 at 7:24 PM, Dave Chinner <david at fromorbit.com>
>> wrote:
>> > > On Sun, Jul 05, 2015 at 12:25:47AM -0400, Alex Gorbachev wrote:
>> > > > > > sysctl vm.swappiness=20 (can probably be 1 as per article)
>> > > > > >
>> > > > > > sysctl vm.min_free_kbytes=262144
>> > > > >
>> > > [...]
>> > > >
>> > > > We have experienced the problem in various guises with kernels 3.14,
>> > > 3.19,
>> > > > 4.1-rc2 and now 4.1, so it's not new to us, just different error
>> stack.
>> > > > Below are some other stack dumps of what manifested as the same
>> error.
>> > > >
>> > > >  [<ffffffff817cf4b9>] schedule+0x29/0x70
>> > > >  [<ffffffffc07caee7>] _xfs_log_force+0x187/0x280 [xfs]
>> > > >  [<ffffffff810a4150>] ? try_to_wake_up+0x2a0/0x2a0
>> > > >  [<ffffffffc07cb019>] xfs_log_force+0x39/0xc0 [xfs]
>> > > >  [<ffffffffc07d6542>] xfsaild_push+0x552/0x5a0 [xfs]
>> > > >  [<ffffffff817d2264>] ? schedule_timeout+0x124/0x210
>> > > >  [<ffffffffc07d662f>] xfsaild+0x9f/0x140 [xfs]
>> > > >  [<ffffffffc07d6590>] ? xfsaild_push+0x5a0/0x5a0 [xfs]
>> > > >  [<ffffffff81095e29>] kthread+0xc9/0xe0
>> > > >  [<ffffffff81095d60>] ? flush_kthread_worker+0x90/0x90
>> > > >  [<ffffffff817d3718>] ret_from_fork+0x58/0x90
>> > > >  [<ffffffff81095d60>] ? flush_kthread_worker+0x90/0x90
>> > > >  INFO: task xfsaild/sdg1:2606 blocked for more than 120 seconds.
>> > > >        Not tainted 3.19.4-031904-generic #201504131440
>> > > >  "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this
>> > > message.
>> > >
>> > > That's indicative of IO completion problems, but not a crash.
>> > >
>> > > >  BUG: unable to handle kernel NULL pointer dereference at
>> > >  (null)
>> > > >  IP: [<ffffffffc04be80f>] xfs_count_page_state+0x3f/0x70 [xfs]
>> > > ....
>> > > >   [<ffffffffc04be880>] xfs_vm_releasepage+0x40/0x120 [xfs]
>> > > >   [<ffffffff8118a7d2>] try_to_release_page+0x32/0x50
>> > > >   [<ffffffff8119fe6d>] shrink_page_list+0x69d/0x720
>> > > >   [<ffffffff811a058d>] shrink_inactive_list+0x1dd/0x5d0
>> > > ....
>> > >
>> > > Again, this is indicative of a page cache issue: a page without
>> > > buffers has been passed to xfs_vm_releasepage(), which implies the
>> > > page flags are not correct, i.e. PAGE_FLAGS_PRIVATE is set but
>> > > page->private is null...
>> > >
>> > > Again, this is unlikely to be an XFS issue.
>> > >
>> >
>> > Sorry for my ignorance, but would this likely come from Ceph code or a
>> > hardware issue of some kind, such as a disk drive?  I have reached out
>> to
>> > RedHat and Ceph community on that as well.
>>
>> More likely a kernel bug somewhere in the page cache or memory
>> reclaim paths. The issue is that we only notice the problem long
>> after it has occurred, i.e. by the time XFS goes to tear down the page
>> it has been handed, the page is already in a bad state, and so it
>> doesn't really tell us anything about the cause of the problem.
>>
>> Realistically, we need a script that reproduces the problem (one that
>> doesn't require a Ceph cluster) to be able to isolate the cause.
>> In the meantime, you can always try running with CONFIG_XFS_WARN=y to
>> see if that catches problems earlier, and you might also want to
>> turn on memory poisoning and other kernel debugging options to try
>> to isolate the cause of the issue....
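>>
>> For example (option names vary a bit between kernel versions, so this
>> is only a rough sketch of the sort of thing meant):
>>
>>   CONFIG_XFS_WARN=y         # extra XFS sanity checks, ASSERTs become warnings
>>   CONFIG_DEBUG_VM=y         # extra sanity checks in the VM
>>   CONFIG_DEBUG_PAGEALLOC=y  # catch use-after-free of freed pages
>>   CONFIG_DEBUG_LIST=y       # catch corrupted list manipulation
>>
>> plus booting with slab poisoning and sanity checks enabled, e.g.:
>>
>>   slub_debug=FZP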
>>
>
> We have been error-free for almost 3 weeks now with these changes:
>
> vm.swappiness=1
> vm.min_free_kbytes=262144
>
> I wonder if this is related to us using high-speed Areca HBAs with RAM
> writeback cache and having had vm.swappiness=0 previously.  Possibly the
> HBA is handing down large chunks of IO very fast and the page cache is
> not able to handle it with swappiness=0.  I will keep monitoring, but
> thank you very much for the analysis and info.
>
> Alex
>
>
>
>>
>> Cheers,
>>
>> Dave.
>> --
>> Dave Chinner
>> david at fromorbit.com
>>
>
>