XFS crash?
Austin Schuh
austin at peloton-tech.com
Mon Jun 23 15:05:54 CDT 2014
Hi Dave,
On Tue, May 13, 2014 at 10:11 AM, Austin Schuh <austin at peloton-tech.com>
wrote:
> On Tue, May 13, 2014 at 2:03 AM, Dave Chinner <david at fromorbit.com> wrote:
> > On Tue, May 13, 2014 at 12:02:18AM -0700, Austin Schuh wrote:
> >> On Mon, May 12, 2014 at 11:39 PM, Dave Chinner <david at fromorbit.com>
> wrote:
> >> > On Mon, May 12, 2014 at 09:03:48PM -0700, Austin Schuh wrote:
> >> >> On Mon, May 12, 2014 at 8:46 PM, Dave Chinner <david at fromorbit.com>
> wrote:
> >> >> > On Mon, May 12, 2014 at 06:29:28PM -0700, Austin Schuh wrote:
> >> >> >> On Wed, Mar 5, 2014 at 4:53 PM, Austin Schuh <
> austin at peloton-tech.com> wrote:
> >> >> >> > Hi Dave,
> >> >> >> >
> >> >> >> > On Wed, Mar 5, 2014 at 3:35 PM, Dave Chinner <
> david at fromorbit.com> wrote:
> >> >> >> >> On Wed, Mar 05, 2014 at 03:08:16PM -0800, Austin Schuh wrote:
> >> >> >> >>> Howdy,
> >> >> >> >>>
> >> >> >> >>> I'm running a config_preempt_rt patched version of the
> 3.10.11 kernel,
> >> >> >> >>> and I'm seeing a couple lockups and crashes which I think are
> related
> >> >> >> >>> to XFS.
> >> >> >> >>
> >> >> >> >> I think they ar emore likely related to RT issues....
> >> >> >> >>
> >> >> >> >
> >> >> >> > That very well may be true.
> >> >> >> >
> >> >> >> >> Cheers,
> >> >> >> >>
> >> >> >> >> Dave.
> >> >> >> >> --
> >> >> >> >> Dave Chinner
> >> >> >>
> >> >> >> I had the issue reproduce itself today with just the main SSD
> >> >> >> installed. This was on a new machine that was built this morning.
> >> >> >> There is a lot less going on in this trace than the previous one.
> >> >> >
> >> >> > The three blocked threads:
> >> >> >
> >> >> > 1. kworker running IO completion waiting on an inode lock,
> >> >> > holding locked pages.
> >> >> > 2. kworker running writeback flusher work waiting for a
> page lock
> >> >> > 3. direct flush work waiting for allocation, holding page
> >> >> > locks and the inode lock.
> >> >> >
> >> >> > What's the kworker thread running the allocation work doing?
> >> >> >
> >> >> > You might need to run `echo w > proc-sysrq-trigger` to get this
> >> >> > information...
> >> >>
> >> >> I was able to reproduce the lockup. I ran `echo w >
> >> >> /proc/sysrq-trigger` per your suggestion. I don't know how to figure
> >> >> out what the kworker thread is doing, but I'll happily do it if you
> >> >> can give me some guidance.
> >> >
> >> > There isn't a worker thread blocked doing an allocation in that
> >> > dump, so it doesn't shed any light on the problem at all. try
> >> > `echo l > /proc/sysrq-trigger`, followed by `echo t >
> >> > /proc/sysrq-trigger` so we can see all the processes running on CPUs
> >> > and all the processes in the system...
> >> >
> >> > Cheers,
> >> >
> >> > Dave.
> >>
> >> Attached is the output of the two commands you asked for.
> >
> > Nothing there. There's lots of processes waiting for allocation to
> > run, and no kworkers running allocation work. This looks more
> > like a rt-kernel workqueue issue, not an XFS problem.
> >
> > FWIW, it woul dbe really helpful if you compiled your kernels with
> > frame pointers enabled - the stack traces are much more precise and
> > readable (i.e. gets rid of all the false/stale entrys) and that
> > helps understanding where things are stuck immensely.
> >
> > Cheers,
> >
> > Dave.
>
> Thanks Dave.
>
> I'll go check with the rt-kernel guys and take it from there. Thanks
> for the frame pointers suggestion. I'll make that change the next
> time I build a kernel.
>
> Austin
>
I found 1 bug in XFS which I fixed, and I've uncovered something else that
I'm not completely sure how to fix.
In xfs_bmapi_allocate, you create a completion, and use that to wait until
the work has finished. Occasionally, I'm seeing a case where I get a
context switch after the completion has been completed, but before the
workqueue has finished doing it's internal book-keeping. This results in
the work being deleted before the workqueue is done using it, corrupting
the internal data structures. I fixed it by waiting using flush_work and
removing the completion entirely.
--- a/fs/xfs/xfs_bmap_util.c 2014-06-23 12:59:10.008678410 -0700
+++ b/fs/xfs/xfs_bmap_util.c 2014-06-23 12:59:14.116678239 -0700
@@ -263,7 +263,6 @@
current_set_flags_nested(&pflags, PF_FSTRANS);
args->result = __xfs_bmapi_allocate(args);
- complete(args->done);
current_restore_flags_nested(&pflags, PF_FSTRANS);
}
@@ -277,16 +276,13 @@
xfs_bmapi_allocate(
struct xfs_bmalloca *args)
{
- DECLARE_COMPLETION_ONSTACK(done);
-
if (!args->stack_switch)
return __xfs_bmapi_allocate(args);
- args->done = &done;
INIT_WORK_ONSTACK(&args->work, xfs_bmapi_allocate_worker);
queue_work(xfs_alloc_wq, &args->work);
- wait_for_completion(&done);
+ flush_work(&args->work);
destroy_work_on_stack(&args->work);
return args->result;
}
--- a/fs/xfs/xfs_bmap_util.h 2014-06-23 12:59:10.008678410 -0700
+++ b/fs/xfs/xfs_bmap_util.h 2014-06-23 12:59:11.708678340 -0700
@@ -57,7 +57,6 @@
char conv; /* overwriting unwritten extents */
char stack_switch;
int flags;
- struct completion *done;
struct work_struct work;
int result;
};
I enabled event tracing (and added a new event which lists the number of
workers running in a queue whenever that is changed).
To me, it looks like work is scheduled from irq/44-ahci-273 that will
acquire an inode lock. scp-3986 then acquires the lock, and then goes and
schedules work. That work is then scheduled behind the work from
irq/44-ahci-273 in the same pool. The first work blocks waiting on the
lock, and scp-3986 won't finish and release that lock until the second work
gets run.
Any ideas on how to deal with this? I think we need to create a new pool
to make sure that xfs_bmapi_allocate_worker gets run in a separate thread
to avoid this.
irq/44-ahci-273 [000] ....1.5 76.340300: workqueue_queue_work:
work struct=ffff880406a6b998 function=xfs_end_io workqueue=ffff88040af82000
pool=ffff88042da63fc0 req_cpu=512 cpu=0
irq/44-ahci-273 [000] ....1.5 76.340301: workqueue_activate_work:
work struct ffff880406a6b998
scp-3986 [000] ....1.. 76.342711: xfs_ilock_nowait: dev
8:5 ino 0x9794 lock_addr ffff880409fe0090 flags ILOCK_SHARED caller
xfs_map_blocks
scp-3986 [000] ....1.. 76.342714: xfs_iunlock: dev 8:5 ino
0x9794 lock_addr ffff880409fe0090 flags ILOCK_SHARED caller xfs_map_blocks
scp-3986 [000] ....1.. 76.342722: xfs_ilock: dev 8:5 ino
0x9794 lock_addr ffff880409fe0090 flags ILOCK_EXCL caller
xfs_iomap_write_allocate
scp-3986 [000] ....1.3 76.342729: workqueue_queue_work:
work struct=ffff8800a5b8d900 function=xfs_bmapi_allocate_worker
workqueue=ffff88040bb36000 pool=ffff88042da63fc0 req_cpu=512 cpu=0
scp-3986 [000] ....1.3 76.342730: workqueue_activate_work:
work struct ffff8800a5b8d900
scp-3986 [000] ....1.4 76.342754: workqueue_queue_work:
work struct=ffff88040a6a01c8 function=blk_delay_work
workqueue=ffff88040c9f4a00 pool=ffff88042da644c0 req_cpu=512 cpu=0
scp-3986 [000] ....1.4 76.342755: workqueue_activate_work:
work struct ffff88040a6a01c8
kworker/0:2-757 [000] ....1.1 76.342868: workqueue_nr_running:
pool=ffff88042da63fc0 nr_running=1
kworker/0:2-757 [000] ....1.. 76.342869: workqueue_execute_start:
work struct ffff880406a6b998: function xfs_end_io
kworker/0:2-757 [000] ....1.. 76.342870: xfs_ilock: dev 8:5 ino
0x9794 lock_addr ffff880409fe0090 flags ILOCK_EXCL caller xfs_setfilesize
irq/44-ahci-273 [003] ....1.5 76.419037: workqueue_queue_work:
work struct=ffff8800aebc3588 function=xfs_end_io workqueue=ffff88040af82000
pool=ffff88042dbe3fc0 req_cpu=512 cpu=3
irq/44-ahci-273 [003] ....1.5 76.419038: workqueue_activate_work:
work struct ffff8800aebc3588
kworker/3:1-119 [003] ....1.1 76.419105: workqueue_nr_running:
pool=ffff88042dbe3fc0 nr_running=1
kworker/3:1-119 [003] ....1.. 76.419106: workqueue_execute_start:
work struct ffff8800aebc3588: function xfs_end_io
kworker/3:1-119 [003] ....1.. 76.419107: xfs_ilock: dev 8:5 ino
0x9794 lock_addr ffff880409fe0090 flags ILOCK_EXCL caller xfs_setfilesize
irq/44-ahci-273 [002] ....1.5 76.426120: workqueue_queue_work:
work struct=ffff880406a6b930 function=xfs_end_io workqueue=ffff88040af82000
pool=ffff88042db63fc0 req_cpu=512 cpu=2
irq/44-ahci-273 [002] .N..1.5 76.426127: workqueue_activate_work:
work struct ffff880406a6b930
kworker/2:1-72 [002] ....1.1 76.426242: workqueue_nr_running:
pool=ffff88042db63fc0 nr_running=1
kworker/2:1-72 [002] ....1.. 76.426243: workqueue_execute_start:
work struct ffff880406a6b930: function xfs_end_io
kworker/2:1-72 [002] ....1.. 76.426244: xfs_ilock: dev 8:5 ino
0x9794 lock_addr ffff880409fe0090 flags ILOCK_EXCL caller xfs_setfilesize
Thanks,
Austin
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://oss.sgi.com/pipermail/xfs/attachments/20140623/f0c0a549/attachment.html>
More information about the xfs
mailing list