
Re: deadlock with latest xfs

To: Nick Piggin <nickpiggin@xxxxxxxxxxxx>
Subject: Re: deadlock with latest xfs
From: Dave Chinner <david@xxxxxxxxxxxxx>
Date: Tue, 28 Oct 2008 17:25:24 +1100
Cc: Lachlan McIlroy <lachlan@xxxxxxx>, Christoph Hellwig <hch@xxxxxxxxxxxxx>, xfs-oss <xfs@xxxxxxxxxxx>, linux-mm@xxxxxxxxx
In-reply-to: <200810281702.17135.nickpiggin@xxxxxxxxxxxx>
Mail-followup-to: Nick Piggin <nickpiggin@xxxxxxxxxxxx>, Lachlan McIlroy <lachlan@xxxxxxx>, Christoph Hellwig <hch@xxxxxxxxxxxxx>, xfs-oss <xfs@xxxxxxxxxxx>, linux-mm@xxxxxxxxx
References: <4900412A.2050802@xxxxxxx> <20081026005351.GK18495@disturbed> <20081026025013.GL18495@disturbed> <200810281702.17135.nickpiggin@xxxxxxxxxxxx>
User-agent: Mutt/1.5.18 (2008-05-17)
On Tue, Oct 28, 2008 at 05:02:16PM +1100, Nick Piggin wrote:
> On Sunday 26 October 2008 13:50, Dave Chinner wrote:
> > [1] I don't see how any of the XFS changes we made make this easier to hit.
> > What I suspect is a VM regression w.r.t. memory reclaim because this is
> > the second problem since 2.6.26 that appears to be a result of memory
> > allocation failures in places that we've never, ever seen failures before.
> >
> > The other new failure is this one:
> >
> > http://bugzilla.kernel.org/show_bug.cgi?id=11805
> >
> > which is an alloc_pages(GFP_KERNEL) failure....
> >
> > mm-folk - care to weigh in?
> An order-0 GFP_KERNEL page allocation can fail sometimes: if it is
> called from reclaim or a PF_MEMALLOC thread, if the task has been
> OOM-killed, or through fault injection.
> This is even the case for __GFP_NOFAIL allocations (which basically
> are buggy anyway).
> Not sure why it might have started happening, but which alloc_pages
> call exactly are you talking about? If it is via slab, then maybe
> some parameters have changed (e.g. in SLUB) so that it now uses
> higher-order allocations.

It's in fs/xfs/linux-2.6/xfs_buf.c::xfs_buf_get_noaddr(), which
allocates a single page at a time.
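
For illustration, a minimal userspace sketch of the page-at-a-time pattern and of why a single order-0 failure fails the whole buffer. This is not the XFS code: sketch_alloc_page() is a hypothetical stand-in for alloc_page(GFP_KERNEL) that simulates a low-memory guest by refusing allocations once a page budget is used up, and the 4 KiB page size is an assumption.

```c
#include <stdlib.h>

#define SKETCH_PAGE_SIZE 4096   /* assumption: 4 KiB pages */

/* Hypothetical page budget, simulating a guest with little free memory. */
static size_t pages_left;

/* Hypothetical stand-in for alloc_page(GFP_KERNEL): returns NULL once the
 * budget is exhausted, modelling an order-0 allocation failure. */
static void *sketch_alloc_page(void)
{
    if (pages_left == 0)
        return NULL;
    pages_left--;
    return malloc(SKETCH_PAGE_SIZE);
}

/* Allocate 'npages' pages one at a time into 'pages'. If any single
 * allocation fails, free every page obtained so far and return -1
 * (a kernel caller would typically return -ENOMEM instead). */
static int sketch_buf_get_pages(void **pages, size_t npages)
{
    for (size_t i = 0; i < npages; i++) {
        pages[i] = sketch_alloc_page();
        if (pages[i] == NULL) {
            while (i > 0)
                free(pages[--i]);   /* unwind the partial allocation */
            return -1;
        }
    }
    return 0;
}

/* Release a fully allocated buffer. */
static void sketch_buf_free_pages(void **pages, size_t npages)
{
    for (size_t i = 0; i < npages; i++)
        free(pages[i]);
}
```

With a budget of 64 pages a 64-page buffer succeeds; with a budget of 10 the same request fails and nothing is leaked, which is the kind of failure the bugzilla report shows.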

It may be that this failure is caused by an increase in the kernel's
base memory consumption, as it was reported in an lguest and
reproduced with a simple 'modprobe xfs ; mount /dev/xxx /mnt/xfs'
command. Perhaps the lguest had very little memory available to begin
with, and allocating 2MB of pages for 8x256k log buffers was too much
for it...
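
The arithmetic behind the 2MB figure, as a small sketch: the 8 buffers and 256k per buffer come from the mail above, while the 4 KiB page size is an assumption. Since the buffers are built from single pages, the mount has to succeed at several hundred separate order-0 allocations.

```c
#include <stddef.h>

#define LOG_BUFS       8u               /* from the mail: 8 log buffers */
#define LOG_BUF_BYTES  (256u * 1024u)   /* from the mail: 256k each */
#define PAGE_BYTES     4096u            /* assumption: 4 KiB pages */

/* Total bytes needed for all log buffers. */
static size_t log_buf_total_bytes(void)
{
    return (size_t)LOG_BUFS * LOG_BUF_BYTES;
}

/* Number of order-0 (single page) allocations needed per buffer. */
static size_t log_buf_pages_each(void)
{
    return LOG_BUF_BYTES / PAGE_BYTES;
}

/* Number of order-0 allocations needed for all log buffers. */
static size_t log_buf_total_pages(void)
{
    return log_buf_total_bytes() / PAGE_BYTES;
}
```

That works out to 2,097,152 bytes (2 MiB), or 64 pages per buffer and 512 page allocations in total.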


Dave Chinner
