
Re: crash in xfs in current

To: xfs@xxxxxxxxxxx
Subject: Re: crash in xfs in current
From: Eric Sandeen <sandeen@xxxxxxxxxxx>
Date: Mon, 6 Jun 2016 23:39:59 -0500
Delivered-to: xfs@xxxxxxxxxxx
In-reply-to: <CAAA5faG3Ls0Dh_bx=950db9BV01zoLfmubKbM0UYkWpS0y60BA@xxxxxxxxxxxxxx>
References: <CAAA5faG3Ls0Dh_bx=950db9BV01zoLfmubKbM0UYkWpS0y60BA@xxxxxxxxxxxxxx>
User-agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10.11; rv:45.0) Gecko/20100101 Thunderbird/45.1.1
On 6/6/16 8:43 PM, Reinoud Koornstra wrote:
> Dear Dave and Everyone,
> 
> Today I crashed twice in a row in 4.7-rc1.
> This was the message and trace:
> 
> Jun  6 18:07:34 router-dev kernel: [  134.297442] XFS: Assertion failed: 
> args->op_flags & XFS_DA_OP_OKNOENT, file: fs/xfs/libxfs/xfs_dir2_leaf.c, 
> line: 1307

Ok, that ASSERT has been there since 2005 ...

> Jun  6 18:07:34 router-dev kernel: [  134.297459] ------------[ cut here 
> ]------------
> Jun  6 18:07:34 router-dev kernel: [  134.297474] kernel BUG at 
> fs/xfs/xfs_message.c:113!
> Jun  6 18:07:34 router-dev kernel: [  134.297485] invalid opcode: 0000 [#1] 
> SMP
> Jun  6 18:07:34 router-dev kernel: [  134.297494] Modules linked in:

... unlinewrapping...

> Jun  6 18:07:34 router-dev kernel: [  134.297962] Stack:
> Jun  6 18:07:34 router-dev kernel: [  134.297967]  ffff8803f7ebbc40 
> ffffffffc0322277 ffff8803f7ebbc58 00000000ffffffff
> Jun  6 18:07:34 router-dev kernel: [  134.297986]  ffff8803ffffffff 
> ffff880414673840 ffff8803f610b000 0000000000000000
> Jun  6 18:07:34 router-dev kernel: [  134.298004]  ffff880452bed380 
> 0000000000000000 ffff003c00a7d2f1 00000000b56d2e47
> Jun  6 18:07:34 router-dev kernel: [  134.298022] Call Trace:
> Jun  6 18:07:34 router-dev kernel: [  134.298039] [<ffffffffc0322277>] 
> xfs_dir2_leaf_lookup_int+0x237/0x350 [xfs]
> Jun  6 18:07:34 router-dev kernel: [  134.298064] [<ffffffffc03229e1>] 
> xfs_dir2_leaf_replace+0x41/0x190 [xfs]
> Jun  6 18:07:34 router-dev kernel: [  134.298088] [<ffffffffc031c06c>] 
> xfs_dir_replace+0x18c/0x1b0 [xfs]
> Jun  6 18:07:34 router-dev kernel: [  134.298114] [<ffffffffc035653f>] 
> xfs_rename+0x45f/0x9d0 [xfs]
> Jun  6 18:07:34 router-dev kernel: [  134.298155] [<ffffffffc034f892>] 
> xfs_vn_rename+0xb2/0xe0 [xfs]
> Jun  6 18:07:34 router-dev kernel: [  134.298169] [<ffffffff8122e824>] 
> vfs_rename+0x5a4/0x940
> Jun  6 18:07:34 router-dev kernel: [  134.298195] [<ffffffff81233475>] 
> SyS_rename+0x3d5/0x3f0
> Jun  6 18:07:34 router-dev kernel: [  134.298207] [<ffffffff819cd236>] 
> entry_SYSCALL_64_fastpath+0x1e/0xa8
> Jun  6 18:07:34 router-dev kernel: [  134.298221] Code: 00 66 2e 0f 1f


> I seem to be able to hit this bug rather frequently.
> So I put in some instrumentation to print out the flags next time I
> hit it and save a core file to another fs.
> Is this a known bug to you?

xfs_dir2_leaf_lookup_int() only hits that ASSERT if it was asked to
look up a name on behalf of a rename (or remove) and failed to find
the existing entry, i.e. something that should not happen.

        /*
         * Loop over all the entries with the right hash value
         * looking to match the name.
         */

<do that loop>
<fail to find an entry that matches the name>
<then:>

        ASSERT(args->op_flags & XFS_DA_OP_OKNOENT);
        /*
         * Here, we can only be doing a lookup (not a rename or remove).
         * If a case-insensitive match was found earlier, re-read the
         * appropriate data block if required and return it.
         */

A rename should never fail to find the original name.
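
To make the control flow concrete, here is a minimal userspace sketch of
that logic. It is not the kernel code: every demo_* name, the flag value,
and the toy hash are made up for illustration, and the real function walks
a sorted leaf block rather than a flat array. It only shows why a miss
without the "ok if no entry" flag trips the ASSERT.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for XFS_DA_OP_OKNOENT; the value is illustrative. */
#define DEMO_OP_OKNOENT 0x1u

enum demo_result {
    DEMO_FOUND,          /* an entry with the right hash and name exists */
    DEMO_NOENT,          /* no entry, and the caller said that is fine   */
    DEMO_ASSERT_FAILED   /* no entry, but the caller required one        */
};

struct demo_entry {
    uint32_t    hash;
    const char *name;
};

struct demo_args {
    const char *name;
    uint32_t    hash;
    unsigned    op_flags;
};

/* Toy string hash, NOT the hash XFS uses for directory names. */
static uint32_t demo_hash(const char *s)
{
    uint32_t h = 0;

    while (*s)
        h = h * 31u + (unsigned char)*s++;
    return h;
}

static enum demo_result demo_leaf_lookup(const struct demo_entry *ents,
                                         size_t nents,
                                         const struct demo_args *args)
{
    /* Loop over all entries with the right hash, looking to match the name. */
    for (size_t i = 0; i < nents; i++) {
        if (ents[i].hash != args->hash)
            continue;
        if (strcmp(ents[i].name, args->name) == 0)
            return DEMO_FOUND;
    }

    /*
     * Nothing matched.  Only a plain lookup may get here; a rename or
     * remove does not set the flag, so this is where the ASSERT fires.
     */
    if (!(args->op_flags & DEMO_OP_OKNOENT))
        return DEMO_ASSERT_FAILED;
    return DEMO_NOENT;
}
```

Calling demo_leaf_lookup() for a missing name with DEMO_OP_OKNOENT set
returns DEMO_NOENT; the same call with op_flags == 0, which is how the
rename path in this sketch behaves, returns DEMO_ASSERT_FAILED.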

Did this problem only show up after an update?

Do you have a reproducer?

Have you unmounted and run an "xfs_repair -n" and captured the output
to see if there is any on-disk corruption?

You might gather an xfs_metadump as well, before you do any live
repair that might change the filesystem.

-Eric
