
Re: XFS crash?

To: Dave Chinner <david@xxxxxxxxxxxxx>
Subject: Re: XFS crash?
From: Austin Schuh <austin@xxxxxxxxxxxxxxxx>
Date: Mon, 23 Jun 2014 13:05:54 -0700
Cc: xfs <xfs@xxxxxxxxxxx>
Hi Dave,

On Tue, May 13, 2014 at 10:11 AM, Austin Schuh <austin@xxxxxxxxxxxxxxxx> wrote:
On Tue, May 13, 2014 at 2:03 AM, Dave Chinner <david@xxxxxxxxxxxxx> wrote:
> On Tue, May 13, 2014 at 12:02:18AM -0700, Austin Schuh wrote:
>> On Mon, May 12, 2014 at 11:39 PM, Dave Chinner <david@xxxxxxxxxxxxx> wrote:
>> > On Mon, May 12, 2014 at 09:03:48PM -0700, Austin Schuh wrote:
>> >> On Mon, May 12, 2014 at 8:46 PM, Dave Chinner <david@xxxxxxxxxxxxx> wrote:
>> >> > On Mon, May 12, 2014 at 06:29:28PM -0700, Austin Schuh wrote:
>> >> >> On Wed, Mar 5, 2014 at 4:53 PM, Austin Schuh <austin@xxxxxxxxxxxxxxxx> wrote:
>> >> >> > Hi Dave,
>> >> >> >
>> >> >> > On Wed, Mar 5, 2014 at 3:35 PM, Dave Chinner <david@xxxxxxxxxxxxx> wrote:
>> >> >> >> On Wed, Mar 05, 2014 at 03:08:16PM -0800, Austin Schuh wrote:
>> >> >> >>> Howdy,
>> >> >> >>>
>> >> >> >>> I'm running a config_preempt_rt patched version of the 3.10.11 kernel,
>> >> >> >>> and I'm seeing a couple lockups and crashes which I think are related
>> >> >> >>> to XFS.
>> >> >> >>
>> >> >> >> I think they are more likely related to RT issues....
>> >> >> >>
>> >> >> >
>> >> >> > That very well may be true.
>> >> >> >
>> >> >> >> Cheers,
>> >> >> >>
>> >> >> >> Dave.
>> >> >> >> --
>> >> >> >> Dave Chinner
>> >> >>
>> >> >> I had the issue reproduce itself today with just the main SSD
>> >> >> installed.  This was on a new machine that was built this morning.
>> >> >> There is a lot less going on in this trace than the previous one.
>> >> >
>> >> > The three blocked threads:
>> >> >
>> >> >         1. kworker running IO completion waiting on an inode lock,
>> >> >            holding locked pages.
>> >> >         2. kworker running writeback flusher work waiting for a page lock
>> >> >         3. direct flush work waiting for allocation, holding page
>> >> >            locks and the inode lock.
>> >> >
>> >> > What's the kworker thread running the allocation work doing?
>> >> >
>> > You might need to run `echo w > /proc/sysrq-trigger` to get this
>> >> > information...
>> >>
>> I was able to reproduce the lockup.  I ran `echo w >
>> /proc/sysrq-trigger` per your suggestion.  I don't know how to figure
>> >> out what the kworker thread is doing, but I'll happily do it if you
>> >> can give me some guidance.
>> >
>> > There isn't a worker thread blocked doing an allocation in that
>> > dump, so it doesn't shed any light on the problem at all. try
>> > `echo l > /proc/sysrq-trigger`, followed by `echo t >
>> > /proc/sysrq-trigger` so we can see all the processes running on CPUs
>> > and all the processes in the system...
>> >
>> > Cheers,
>> >
>> > Dave.
>>
>> Attached is the output of the two commands you asked for.
>
> Nothing there. There's lots of processes waiting for allocation to
> run, and no kworkers running allocation work. This looks more
> like a rt-kernel workqueue issue, not an XFS problem.
>
> FWIW, it would be really helpful if you compiled your kernels with
> frame pointers enabled - the stack traces are much more precise and
> readable (i.e. it gets rid of all the false/stale entries) and that
> helps understanding where things are stuck immensely.
>
> Cheers,
>
> Dave.

Thanks Dave.

I'll go check with the rt-kernel guys and take it from there.  Thanks
for the frame pointers suggestion.  I'll make that change the next
time I build a kernel.

Austin

I found one bug in XFS, which I fixed, and I've uncovered something else that I'm not completely sure how to fix.

In xfs_bmapi_allocate, you create a completion and use it to wait until the work has finished.  Occasionally I'm seeing a case where I get a context switch after the completion has been completed, but before the workqueue has finished doing its internal book-keeping.  This results in the on-stack work item being torn down before the workqueue is done using it, corrupting the workqueue's internal data structures.  I fixed it by waiting with flush_work and removing the completion entirely.

--- a/fs/xfs/xfs_bmap_util.c	2014-06-23 12:59:10.008678410 -0700
+++ b/fs/xfs/xfs_bmap_util.c	2014-06-23 12:59:14.116678239 -0700
@@ -263,7 +263,6 @@
 	current_set_flags_nested(&pflags, PF_FSTRANS);
 
 	args->result = __xfs_bmapi_allocate(args);
-	complete(args->done);
 
 	current_restore_flags_nested(&pflags, PF_FSTRANS);
 }
@@ -277,16 +276,13 @@
 xfs_bmapi_allocate(
 	struct xfs_bmalloca	*args)
 {
-	DECLARE_COMPLETION_ONSTACK(done);
-
 	if (!args->stack_switch)
 		return __xfs_bmapi_allocate(args);
 
 
-	args->done = &done;
 	INIT_WORK_ONSTACK(&args->work, xfs_bmapi_allocate_worker);
 	queue_work(xfs_alloc_wq, &args->work);
-	wait_for_completion(&done);
+	flush_work(&args->work);
 	destroy_work_on_stack(&args->work);
 	return args->result;
 }
--- a/fs/xfs/xfs_bmap_util.h	2014-06-23 12:59:10.008678410 -0700
+++ b/fs/xfs/xfs_bmap_util.h	2014-06-23 12:59:11.708678340 -0700
@@ -57,7 +57,6 @@
 	char			conv;	/* overwriting unwritten extents */
 	char			stack_switch;
 	int			flags;
-	struct completion	*done;
 	struct work_struct	work;
 	int			result;
 };
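For clarity, the resulting wait-for-the-worker pattern looks roughly like this (a stripped-down, self-contained sketch with made-up names like my_args/my_worker, not the actual XFS code):

#include <linux/workqueue.h>

struct my_args {
	struct work_struct	work;
	int			result;
};

static void
my_worker(struct work_struct *work)
{
	struct my_args *args = container_of(work, struct my_args, work);

	args->result = 42;	/* stand-in for the real allocation work */
	/*
	 * Note: no complete() here.  flush_work() in the caller only
	 * returns once the workqueue is completely finished with the
	 * item, so there is no window where the on-stack work can be
	 * torn down while the pool is still touching it.
	 */
}

static int
run_and_wait(struct workqueue_struct *wq)
{
	struct my_args args = { .result = 0 };

	INIT_WORK_ONSTACK(&args.work, my_worker);
	queue_work(wq, &args.work);
	flush_work(&args.work);
	destroy_work_on_stack(&args.work);
	return args.result;
}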

I enabled event tracing (and added a new event which lists the number of workers running in a queue whenever that is changed).

To me, it looks like irq/44-ahci-273 queues work (xfs_end_io) that will need to acquire the inode lock.  scp-3986 then acquires that lock and queues work of its own (xfs_bmapi_allocate_worker), which lands behind the xfs_end_io work in the same pool.  The first work item blocks waiting on the lock, and scp-3986 won't finish and release that lock until its own work item gets run.

Any ideas on how to deal with this?  I think we need to create a new pool to make sure that xfs_bmapi_allocate_worker gets run in a separate thread to avoid this; a rough sketch of what I mean follows the trace below.

   irq/44-ahci-273   [000] ....1.5    76.340300: workqueue_queue_work: work struct=ffff880406a6b998 function=xfs_end_io workqueue=ffff88040af82000 pool=ffff88042da63fc0 req_cpu=512 cpu=0
   irq/44-ahci-273   [000] ....1.5    76.340301: workqueue_activate_work: work struct ffff880406a6b998

          scp-3986   [000] ....1..    76.342711: xfs_ilock_nowait: dev 8:5 ino 0x9794 lock_addr ffff880409fe0090 flags ILOCK_SHARED caller xfs_map_blocks
          scp-3986   [000] ....1..    76.342714: xfs_iunlock: dev 8:5 ino 0x9794 lock_addr ffff880409fe0090 flags ILOCK_SHARED caller xfs_map_blocks
          scp-3986   [000] ....1..    76.342722: xfs_ilock: dev 8:5 ino 0x9794 lock_addr ffff880409fe0090 flags ILOCK_EXCL caller xfs_iomap_write_allocate
          scp-3986   [000] ....1.3    76.342729: workqueue_queue_work: work struct=ffff8800a5b8d900 function=xfs_bmapi_allocate_worker workqueue=ffff88040bb36000 pool=ffff88042da63fc0 req_cpu=512 cpu=0
          scp-3986   [000] ....1.3    76.342730: workqueue_activate_work: work struct ffff8800a5b8d900
          scp-3986   [000] ....1.4    76.342754: workqueue_queue_work: work struct=ffff88040a6a01c8 function=blk_delay_work workqueue=ffff88040c9f4a00 pool=ffff88042da644c0 req_cpu=512 cpu=0
          scp-3986   [000] ....1.4    76.342755: workqueue_activate_work: work struct ffff88040a6a01c8

   kworker/0:2-757   [000] ....1.1    76.342868: workqueue_nr_running: pool=ffff88042da63fc0 nr_running=1
   kworker/0:2-757   [000] ....1..    76.342869: workqueue_execute_start: work struct ffff880406a6b998: function xfs_end_io
   kworker/0:2-757   [000] ....1..    76.342870: xfs_ilock: dev 8:5 ino 0x9794 lock_addr ffff880409fe0090 flags ILOCK_EXCL caller xfs_setfilesize

   irq/44-ahci-273   [003] ....1.5    76.419037: workqueue_queue_work: work struct=ffff8800aebc3588 function=xfs_end_io workqueue=ffff88040af82000 pool=ffff88042dbe3fc0 req_cpu=512 cpu=3
   irq/44-ahci-273   [003] ....1.5    76.419038: workqueue_activate_work: work struct ffff8800aebc3588

   kworker/3:1-119   [003] ....1.1    76.419105: workqueue_nr_running: pool=ffff88042dbe3fc0 nr_running=1
   kworker/3:1-119   [003] ....1..    76.419106: workqueue_execute_start: work struct ffff8800aebc3588: function xfs_end_io
   kworker/3:1-119   [003] ....1..    76.419107: xfs_ilock: dev 8:5 ino 0x9794 lock_addr ffff880409fe0090 flags ILOCK_EXCL caller xfs_setfilesize

   irq/44-ahci-273   [002] ....1.5    76.426120: workqueue_queue_work: work struct=ffff880406a6b930 function=xfs_end_io workqueue=ffff88040af82000 pool=ffff88042db63fc0 req_cpu=512 cpu=2
   irq/44-ahci-273   [002] .N..1.5    76.426127: workqueue_activate_work: work struct ffff880406a6b930
   kworker/2:1-72    [002] ....1.1    76.426242: workqueue_nr_running: pool=ffff88042db63fc0 nr_running=1
   kworker/2:1-72    [002] ....1..    76.426243: workqueue_execute_start: work struct ffff880406a6b930: function xfs_end_io
   kworker/2:1-72    [002] ....1..    76.426244: xfs_ilock: dev 8:5 ino 0x9794 lock_addr ffff880409fe0090 flags ILOCK_EXCL caller xfs_setfilesize
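To make the "new pool" idea concrete, here is a rough sketch of the kind of dedicated workqueue I mean (illustration only, not a tested patch; the queue name "xfsbmapialloc", the init function name, and the WQ_MEM_RECLAIM | WQ_HIGHPRI flags are my guesses at sensible choices):

#include <linux/workqueue.h>

/*
 * Sketch: give the allocation work its own workqueue so that
 * xfs_bmapi_allocate_worker can never end up queued behind xfs_end_io
 * in a shared normal-priority per-CPU pool.  WQ_HIGHPRI selects the
 * separate high-priority pool, and WQ_MEM_RECLAIM provides a rescuer
 * thread for forward progress under memory pressure.
 */
struct workqueue_struct *xfs_bmapi_alloc_wq;	/* hypothetical new queue */

int
xfs_bmapi_alloc_wq_init(void)
{
	xfs_bmapi_alloc_wq = alloc_workqueue("xfsbmapialloc",
					     WQ_MEM_RECLAIM | WQ_HIGHPRI, 0);
	if (!xfs_bmapi_alloc_wq)
		return -ENOMEM;
	return 0;
}

xfs_bmapi_allocate() would then call queue_work(xfs_bmapi_alloc_wq, &args->work) instead of queuing on xfs_alloc_wq.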

Thanks,
 Austin