Re: XFS crash?

To: Austin Schuh <austin@xxxxxxxxxxxxxxxx>
Subject: Re: XFS crash?
From: Dave Chinner <david@xxxxxxxxxxxxx>
Date: Tue, 13 May 2014 19:03:21 +1000
Cc: xfs <xfs@xxxxxxxxxxx>
Delivered-to: xfs@xxxxxxxxxxx
In-reply-to: <CANGgnMYn++1++UyX+D2d9GxPxtytpQJv0ThFwdxM-yX7xDWqiA@xxxxxxxxxxxxxx>
References: <CANGgnMYPLF+8616Rs9eQOXUc9He2NSgFnNrvHvepV-x+pWS6oQ@xxxxxxxxxxxxxx> <20140305233551.GK6851@dastard> <CANGgnMb=2dYGQO4K36pQ9LEb8E4rT6S_VskLF+n=ndd0_kJr_g@xxxxxxxxxxxxxx> <CANGgnMa80WwQ8zSkL52yYegmQURVQeZiBFv41=FQXMZJ_NaEDw@xxxxxxxxxxxxxx> <20140513034647.GA5421@dastard> <CANGgnMZ0q9uE3NHj2i0SBK1d0vdKLx7QBJeFNb+YwP-5EAmejQ@xxxxxxxxxxxxxx> <20140513063943.GQ26353@dastard> <CANGgnMYn++1++UyX+D2d9GxPxtytpQJv0ThFwdxM-yX7xDWqiA@xxxxxxxxxxxxxx>
User-agent: Mutt/1.5.21 (2010-09-15)
On Tue, May 13, 2014 at 12:02:18AM -0700, Austin Schuh wrote:
> On Mon, May 12, 2014 at 11:39 PM, Dave Chinner <david@xxxxxxxxxxxxx> wrote:
> > On Mon, May 12, 2014 at 09:03:48PM -0700, Austin Schuh wrote:
> >> On Mon, May 12, 2014 at 8:46 PM, Dave Chinner <david@xxxxxxxxxxxxx> wrote:
> >> > On Mon, May 12, 2014 at 06:29:28PM -0700, Austin Schuh wrote:
> >> >> On Wed, Mar 5, 2014 at 4:53 PM, Austin Schuh <austin@xxxxxxxxxxxxxxxx> wrote:
> >> >> > Hi Dave,
> >> >> >
> >> >> > On Wed, Mar 5, 2014 at 3:35 PM, Dave Chinner <david@xxxxxxxxxxxxx> wrote:
> >> >> >> On Wed, Mar 05, 2014 at 03:08:16PM -0800, Austin Schuh wrote:
> >> >> >>> Howdy,
> >> >> >>>
> >> >> >>> I'm running a CONFIG_PREEMPT_RT patched version of the 3.10.11
> >> >> >>> kernel, and I'm seeing a couple of lockups and crashes which I
> >> >> >>> think are related to XFS.
> >> >> >>
> >> >> >> I think they are more likely related to RT issues....
> >> >> >>
> >> >> >
> >> >> > That very well may be true.
> >> >> >
> >> >> >> Cheers,
> >> >> >>
> >> >> >> Dave.
> >> >> >> --
> >> >> >> Dave Chinner
> >> >>
> >> >> I had the issue reproduce itself today with just the main SSD
> >> >> installed.  This was on a new machine that was built this morning.
> >> >> There is a lot less going on in this trace than the previous one.
> >> >
> >> > The three blocked threads:
> >> >
> >> >         1. kworker running IO completion waiting on an inode lock,
> >> >            holding locked pages.
> >> >         2. kworker running writeback flusher work waiting for a page lock
> >> >         3. direct flush work waiting for allocation, holding page
> >> >            locks and the inode lock.
> >> >
> >> > What's the kworker thread running the allocation work doing?
> >> >
> >> > You might need to run `echo w > /proc/sysrq-trigger` to get this
> >> > information...
> >>
> >> I was able to reproduce the lockup.  I ran `echo w >
> >> /proc/sysrq-trigger` per your suggestion.  I don't know how to figure
> >> out what the kworker thread is doing, but I'll happily do it if you
> >> can give me some guidance.
> >
> > There isn't a worker thread blocked doing an allocation in that
> > dump, so it doesn't shed any light on the problem at all. Try
> > `echo l > /proc/sysrq-trigger`, followed by `echo t >
> > /proc/sysrq-trigger` so we can see all the processes running on CPUs
> > and all the processes in the system...
> >
> > Cheers,
> >
> > Dave.
> 
> Attached is the output of the two commands you asked for.

Nothing there. There are lots of processes waiting for allocation to
run, and no kworkers running allocation work. This looks more
like an rt-kernel workqueue issue than an XFS problem.

FWIW, it would be really helpful if you compiled your kernels with
frame pointers enabled - the stack traces are much more precise and
readable (i.e. it gets rid of all the false/stale entries) and that
helps immensely in understanding where things are stuck.
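
As a rough sketch of how to check this on an existing build: frame
pointers are controlled by the CONFIG_FRAME_POINTER kernel config
option. The helper name `has_frame_pointers` and the default config
path below are assumptions for illustration; many distributions
install the config as /boot/config-<release>, but not all do.

```shell
#!/bin/sh
# Hypothetical helper: report whether a kernel config file enables
# frame pointers.
# Usage: has_frame_pointers [config-file]
# The default path is an assumption; adjust it for your system.
has_frame_pointers() {
    cfg="${1:-/boot/config-$(uname -r)}"
    if [ ! -r "$cfg" ]; then
        echo "cannot read $cfg" >&2
        return 2
    fi
    # Match the exact "=y" line so that
    # "# CONFIG_FRAME_POINTER is not set" does not count as enabled.
    grep -q '^CONFIG_FRAME_POINTER=y$' "$cfg"
}
```

Something like `has_frame_pointers && echo "frame pointers enabled"`
would then confirm the running kernel's config before collecting
another set of traces.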

Cheers,

Dave.
-- 
Dave Chinner
david@xxxxxxxxxxxxx
