[PATCH v2 00/11] xfs: introduce the free inode btree
Michael L. Semon
mlsemon35 at gmail.com
Sun Nov 17 16:43:17 CST 2013
On 11/13/2013 09:36 AM, Brian Foster wrote:
> Hi all,
>
> The free inode btree adds a new inode btree to XFS with the intent to
> track only inode chunks with at least one free inode. Patches 1-3 add
> the necessary support for the new XFS_BTNUM_FINOBT type and introduce a
> read-only v5 superblock flag. Patch 4 updates the transaction
> reservations for inode allocation operations to account for the finobt.
> Patches 5-9 add support to manage the finobt on inode chunk allocation,
> inode allocation, inode free (and chunk deletion) and growfs. Patch 10
> adds support to report finobt status in the fs geometry. Patch 11 adds
> the feature bit to the associated mask. Thoughts, reviews, flames
> appreciated.
>
> Brian
>
> v2:
> - Rebase to latest xfs tree (minor shifting around of some header bits).
> - Added "xfs: report finobt status in fs geometry" patch to series.
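(Before I get into results, here is my rough mental model of what the
finobt tracks, as a standalone sketch. The struct and field names below
are my own guesses for illustration, patterned on the existing inobt
record; they are not lifted from the patches.)

/*
 * Rough sketch: the finobt is expected to index the same 64-inode chunk
 * records as the existing inode btree (start inode, free count, free
 * bitmask), but only while a chunk still has at least one free inode.
 * Illustration only; names are guesses, not code from this series.
 */
#include <stdint.h>
#include <stdio.h>

struct finobt_rec_sketch {
	uint32_t ir_startino;	/* first inode number in the chunk */
	uint32_t ir_freecount;	/* how many of the 64 inodes are free */
	uint64_t ir_free;	/* bitmask: bit set => inode is free */
};

/* A chunk belongs in the finobt only while it still has a free inode. */
static int finobt_should_track(const struct finobt_rec_sketch *rec)
{
	return rec->ir_freecount > 0;
}

int main(void)
{
	struct finobt_rec_sketch rec = {
		.ir_startino  = 128,
		.ir_freecount = 3,
		.ir_free      = 0x7ULL,	/* inodes 128-130 free */
	};

	printf("chunk at ino %u: %u free, tracked by finobt: %d\n",
	       rec.ir_startino, rec.ir_freecount,
	       finobt_should_track(&rec));
	return 0;
}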
Very nice rebase! There might have been a whitespace issue on patch #6 in
both the kernel and xfsprogs versions, but it was easy going after that.
I'm halfway through testing 4k finobt CRC filesystems on a 2.2-GB, 2-disk
md RAID-0 on an x86 Pentium 4 with 512 MB of RAM. The current nasty setup
is kernel 3.12.0+ minus the 5 most recent AIO commits/merges, plus the few
not-yet-merged Dave Chinner kernel/xfsprogs patches and your patches on
top.
I meant to be done with 4k by now, but generic/224 caused the kernel OOM
killer to halt testing, much like it does with 256 MB of RAM and no
finobt. No problem: I'll thank Stan in advance for introducing me to the
term O_PONIES.
The rest of this letter is random junk that hasn't been re-tested, just to
give a flavor of what might lie ahead. I failed to capture a stack trace,
something to the effect of "Error 117: offline filesystem operation in
progress", that appeared while a test later than xfstests xfs/296 was
running. None of this letter needs a reply.
Good luck!
Michael
[NOISE FOLLOWS]
***** I don't know if this one is an xfstests issue or an xfsprogs
issue. Something like this also happened during a non-finobt
`./check -g auto` run...
xfs/033 [failed, exit status 1] - output mismatch (see /var/lib/xfstests/results//xfs/033.out.bad)
--- tests/xfs/033.out 2013-11-11 13:46:22.367412935 -0500
+++ /var/lib/xfstests/results//xfs/033.out.bad 2013-11-17 12:57:28.010382465 -0500
@@ -17,9 +17,10 @@
- process known inodes and perform inode discovery...
bad magic number 0x0 on inode INO
bad version number 0x0 on inode INO
+inode identifier 0 mismatch on inode INO
bad magic number 0x0 on inode INO, resetting magic number
bad version number 0x0 on inode INO, resetting version number
-imap claims a free inode INO is in use, correcting imap and clearing inode
...
(Run 'diff -u tests/xfs/033.out /var/lib/xfstests/results//xfs/033.out.bad' to see the entire diff)
***** The diff for xfs/033:
19a20
> inode identifier 0 mismatch on inode INO
22c23
< imap claims a free inode INO is in use, correcting imap and clearing inode
---
> inode identifier 0 mismatch on inode INO
33,194c34,37
< - resetting contents of realtime bitmap and summary inodes
< - traversing filesystem ...
< - traversal finished ...
< - moving disconnected inodes to lost+found ...
< Phase 7 - verify and correct link counts...
< resetting inode INO nlinks from 1 to 2
< done
< Corrupting rt bitmap inode - setting bits to 0
< Wrote X.XXKb (value 0x0)
< Phase 1 - find and verify superblock...
< Phase 2 - using <TYPEOF> log
< - zero log...
< - scan filesystem freespace and inode maps...
< - found root inode chunk
< Phase 3 - for each AG...
< - scan and clear agi unlinked lists...
< - process known inodes and perform inode discovery...
< bad magic number 0x0 on inode INO
< bad version number 0x0 on inode INO
< bad magic number 0x0 on inode INO, resetting magic number
< bad version number 0x0 on inode INO, resetting version number
< imap claims a free inode INO is in use, correcting imap and clearing inode
< cleared realtime bitmap inode INO
< - process newly discovered inodes...
< Phase 4 - check for duplicate blocks...
< - setting up duplicate extent list...
< - check for inodes claiming duplicate blocks...
< Phase 5 - rebuild AG headers and trees...
< - reset superblock...
< Phase 6 - check inode connectivity...
< reinitializing realtime bitmap inode
< - resetting contents of realtime bitmap and summary inodes
< - traversing filesystem ...
< - traversal finished ...
< - moving disconnected inodes to lost+found ...
< Phase 7 - verify and correct link counts...
< done
< Corrupting rt summary inode - setting bits to 0
< Wrote X.XXKb (value 0x0)
< Phase 1 - find and verify superblock...
< Phase 2 - using <TYPEOF> log
< - zero log...
< - scan filesystem freespace and inode maps...
< - found root inode chunk
< Phase 3 - for each AG...
< - scan and clear agi unlinked lists...
< - process known inodes and perform inode discovery...
< bad magic number 0x0 on inode INO
< bad version number 0x0 on inode INO
< bad magic number 0x0 on inode INO, resetting magic number
< bad version number 0x0 on inode INO, resetting version number
< imap claims a free inode INO is in use, correcting imap and clearing inode
< cleared realtime summary inode INO
< - process newly discovered inodes...
< Phase 4 - check for duplicate blocks...
< - setting up duplicate extent list...
< - check for inodes claiming duplicate blocks...
< Phase 5 - rebuild AG headers and trees...
< - reset superblock...
< Phase 6 - check inode connectivity...
< reinitializing realtime summary inode
< - resetting contents of realtime bitmap and summary inodes
< - traversing filesystem ...
< - traversal finished ...
< - moving disconnected inodes to lost+found ...
< Phase 7 - verify and correct link counts...
< done
< Corrupting root inode - setting bits to -1
< Wrote X.XXKb (value 0xffffffff)
< Phase 1 - find and verify superblock...
< Phase 2 - using <TYPEOF> log
< - zero log...
< - scan filesystem freespace and inode maps...
< - found root inode chunk
< Phase 3 - for each AG...
< - scan and clear agi unlinked lists...
< - process known inodes and perform inode discovery...
< bad magic number 0xffff on inode INO
< bad version number 0xffffffff on inode INO
< bad (negative) size -1 on inode INO
< bad magic number 0xffff on inode INO, resetting magic number
< bad version number 0xffffffff on inode INO, resetting version number
< bad (negative) size -1 on inode INO
< cleared root inode INO
< - process newly discovered inodes...
< Phase 4 - check for duplicate blocks...
< - setting up duplicate extent list...
< root inode lost
< - check for inodes claiming duplicate blocks...
< Phase 5 - rebuild AG headers and trees...
< - reset superblock...
< Phase 6 - check inode connectivity...
< reinitializing root directory
< - resetting contents of realtime bitmap and summary inodes
< - traversing filesystem ...
< - traversal finished ...
< - moving disconnected inodes to lost+found ...
< Phase 7 - verify and correct link counts...
< resetting inode INO nlinks from 1 to 2
< done
< Corrupting rt bitmap inode - setting bits to -1
< Wrote X.XXKb (value 0xffffffff)
< Phase 1 - find and verify superblock...
< Phase 2 - using <TYPEOF> log
< - zero log...
< - scan filesystem freespace and inode maps...
< - found root inode chunk
< Phase 3 - for each AG...
< - scan and clear agi unlinked lists...
< - process known inodes and perform inode discovery...
< bad magic number 0xffff on inode INO
< bad version number 0xffffffff on inode INO
< bad (negative) size -1 on inode INO
< bad magic number 0xffff on inode INO, resetting magic number
< bad version number 0xffffffff on inode INO, resetting version number
< bad (negative) size -1 on inode INO
< cleared realtime bitmap inode INO
< - process newly discovered inodes...
< Phase 4 - check for duplicate blocks...
< - setting up duplicate extent list...
< - check for inodes claiming duplicate blocks...
< Phase 5 - rebuild AG headers and trees...
< - reset superblock...
< Phase 6 - check inode connectivity...
< reinitializing realtime bitmap inode
< - resetting contents of realtime bitmap and summary inodes
< - traversing filesystem ...
< - traversal finished ...
< - moving disconnected inodes to lost+found ...
< Phase 7 - verify and correct link counts...
< done
< Corrupting rt summary inode - setting bits to -1
< Wrote X.XXKb (value 0xffffffff)
< Phase 1 - find and verify superblock...
< Phase 2 - using <TYPEOF> log
< - zero log...
< - scan filesystem freespace and inode maps...
< - found root inode chunk
< Phase 3 - for each AG...
< - scan and clear agi unlinked lists...
< - process known inodes and perform inode discovery...
< bad magic number 0xffff on inode INO
< bad version number 0xffffffff on inode INO
< bad (negative) size -1 on inode INO
< bad magic number 0xffff on inode INO, resetting magic number
< bad version number 0xffffffff on inode INO, resetting version number
< bad (negative) size -1 on inode INO
< cleared realtime summary inode INO
< - process newly discovered inodes...
< Phase 4 - check for duplicate blocks...
< - setting up duplicate extent list...
< - check for inodes claiming duplicate blocks...
< Phase 5 - rebuild AG headers and trees...
< - reset superblock...
< Phase 6 - check inode connectivity...
< reinitializing realtime summary inode
< - resetting contents of realtime bitmap and summary inodes
< - traversing filesystem ...
< - traversal finished ...
< - moving disconnected inodes to lost+found ...
< Phase 7 - verify and correct link counts...
< done
---
> xfs_imap_to_bp: xfs_trans_read_buf() returned error 117.
>
> fatal error -- could not iget root inode -- error - 117
> _check_xfs_filesystem: filesystem on /dev/md126 is inconsistent (r) (see /var/lib/xfstests/results//xfs/033.full)
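***** A side note on those "error 117" messages above and the stack trace
I failed to capture: as I understand it, 117 is EUCLEAN on Linux
("Structure needs cleaning"), which XFS reports as EFSCORRUPTED. A
throwaway one-liner to confirm the decoding on a given box:

/* Decode errno 117; EUCLEAN is what XFS uses for EFSCORRUPTED. */
#include <errno.h>
#include <stdio.h>
#include <string.h>

int main(void)
{
	printf("117 = %s (EUCLEAN = %d)\n", strerror(117), EUCLEAN);
	return 0;
}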
***** This is the lone segfault so far:
xfs/291 [12832.846621] XFS (md126): Version 5 superblock detected. This kernel has EXPERIMENTAL support enabled!
[12832.846621] Use of these features in this kernel is at your own risk!
[12832.872608] XFS (md126): Mounting Filesystem
[12833.063779] XFS (md126): Ending clean mount
[13153.675046] XFS (md126): Version 5 superblock detected. This kernel has EXPERIMENTAL support enabled!
[13153.675046] Use of these features in this kernel is at your own risk!
[13153.694128] XFS (md126): Mounting Filesystem
[13154.105167] XFS (md126): Ending clean mount
[13201.470358] xfs_db[17902]: segfault at 9c157f8 ip 0809b6b0 sp bfe97950 error 4 in xfs_db[8048000+90000]
[failed, exit status 1] - output mismatch (see /var/lib/xfstests/results//xfs/291.out.bad)
--- tests/xfs/291.out 2013-11-11 13:46:26.652264785 -0500
+++ /var/lib/xfstests/results//xfs/291.out.bad 2013-11-17 16:28:05.133832908 -0500
@@ -1 +1,11 @@
QA output created by 291
+xfs_dir3_data_read_verify: XFS_CORRUPTION_ERROR
+xfs_dir3_data_read_verify: XFS_CORRUPTION_ERROR
+xfs_dir3_data_read_verify: XFS_CORRUPTION_ERROR
+xfs_dir3_data_read_verify: XFS_CORRUPTION_ERROR
+xfs_dir3_data_read_verify: XFS_CORRUPTION_ERROR
+__read_verify: XFS_CORRUPTION_ERROR
...
(Run 'diff -u tests/xfs/291.out /var/lib/xfstests/results//xfs/291.out.bad' to see the entire diff)
[13202.293470] XFS (md127): Version 5 superblock detected. This kernel has EXPERIMENTAL support enabled!
[13202.293470] Use of these features in this kernel is at your own risk!
[13202.309944] XFS (md127): Mounting Filesystem
[13202.587663] XFS (md127): Ending clean mount
***** I may not have seen this lockdep splat before, but then this is a
new merge window. The splat is repeatable and may be independent of
finobt.
xfs/078 [87803.635893]
======================================================
[ INFO: possible circular locking dependency detected ]
3.12.0+ #2 Not tainted
-------------------------------------------------------
xfs_repair/12944 is trying to acquire lock:
(timekeeper_seq){------}, at: [<c104f843>] __hrtimer_start_range_ns+0xc7/0x35d
but task is already holding lock:
(hrtimer_bases.lock){-.-.-.}, at: [<c104f7a4>] __hrtimer_start_range_ns+0x28/0x35d
which lock already depends on the new lock.
the existing dependency chain (in reverse order) is:
-> #5 (hrtimer_bases.lock){-.-.-.}:
[<c106577c>] lock_acquire+0x7f/0x15e
[<c162d072>] _raw_spin_lock_irqsave+0x4a/0x7a
[<c104f7a4>] __hrtimer_start_range_ns+0x28/0x35d
[<c1055b01>] start_bandwidth_timer+0x60/0x6f
[<c105b1c2>] enqueue_task_rt+0xd3/0xfd
[<c10546aa>] enqueue_task+0x45/0x60
[<c1055813>] __sched_setscheduler+0x243/0x372
[<c1056a21>] sched_setscheduler+0x17/0x19
[<c108ae53>] watchdog_enable+0x69/0x7d
[<c1053063>] smpboot_thread_fn+0x93/0x130
[<c104c4ab>] kthread+0xb3/0xc7
[<c162e4b7>] ret_from_kernel_thread+0x1b/0x28
-> #4 (&rt_b->rt_runtime_lock){-.-.-.}:
[<c106577c>] lock_acquire+0x7f/0x15e
[<c162cffb>] _raw_spin_lock+0x41/0x6e
[<c105b1ac>] enqueue_task_rt+0xbd/0xfd
[<c10546aa>] enqueue_task+0x45/0x60
[<c1055813>] __sched_setscheduler+0x243/0x372
[<c1056a21>] sched_setscheduler+0x17/0x19
[<c108ae53>] watchdog_enable+0x69/0x7d
[<c1053063>] smpboot_thread_fn+0x93/0x130
[<c104c4ab>] kthread+0xb3/0xc7
[<c162e4b7>] ret_from_kernel_thread+0x1b/0x28
-> #3 (&rq->lock){-.-.-.}:
[<c106577c>] lock_acquire+0x7f/0x15e
[<c162cffb>] _raw_spin_lock+0x41/0x6e
[<c10561da>] wake_up_new_task+0x3b/0x147
[<c102d132>] do_fork+0x116/0x305
[<c102d34e>] kernel_thread+0x2d/0x33
[<c161f0b2>] rest_init+0x22/0x128
[<c19f39da>] start_kernel+0x2df/0x2e5
[<c19f3378>] i386_start_kernel+0x12e/0x131
-> #2 (&p->pi_lock){-.-.-.}:
[<c106577c>] lock_acquire+0x7f/0x15e
[<c162d072>] _raw_spin_lock_irqsave+0x4a/0x7a
[<c1055e1a>] try_to_wake_up+0x23/0x138
[<c1055f60>] wake_up_process+0x1f/0x33
[<c104411c>] start_worker+0x25/0x28
[<c10451cc>] create_and_start_worker+0x37/0x5d
[<c1a03b34>] init_workqueues+0xd4/0x2c4
[<c19f3a99>] do_one_initcall+0xb9/0x153
[<c19f3b7e>] kernel_init_freeable+0x4b/0x17d
[<c161f1c8>] kernel_init+0x10/0xf2
[<c162e4b7>] ret_from_kernel_thread+0x1b/0x28
-> #1 (&(&pool->lock)->rlock){-.-.-.}:
[<c106577c>] lock_acquire+0x7f/0x15e
[<c162cffb>] _raw_spin_lock+0x41/0x6e
[<c104575a>] __queue_work+0x12b/0x393
[<c1045c26>] queue_work_on+0x2f/0x6a
[<c104f4b6>] clock_was_set_delayed+0x1d/0x1f
[<c1075a67>] do_adjtimex+0xf4/0x145
[<c1030ce0>] SYSC_adjtimex+0x30/0x62
[<c1030f67>] SyS_adjtimex+0x10/0x12
[<c162e53f>] sysenter_do_call+0x12/0x36
-> #0 (timekeeper_seq){------}:
[<c10648b9>] __lock_acquire+0x13a4/0x17ac
[<c106577c>] lock_acquire+0x7f/0x15e
[<c1073688>] ktime_get+0x4f/0x169
[<c104f843>] __hrtimer_start_range_ns+0xc7/0x35d
[<c104faff>] hrtimer_start_range_ns+0x26/0x2c
[<c104b17f>] common_timer_set+0xf5/0x164
[<c104bd58>] SyS_timer_settime+0xbe/0x183
[<c162dcc8>] syscall_call+0x7/0xb
other info that might help us debug this:
Chain exists of:
timekeeper_seq --> &rt_b->rt_runtime_lock --> hrtimer_bases.lock
Possible unsafe locking scenario:
CPU0 CPU1
---- ----
lock(hrtimer_bases.lock);
lock(&rt_b->rt_runtime_lock);
lock(hrtimer_bases.lock);
lock(timekeeper_seq);
*** DEADLOCK ***
2 locks held by xfs_repair/12944:
#0: (&(&new_timer->it_lock)->rlock){......}, at: [<c104b292>] __lock_timer+0xa4/0x1af
#1: (hrtimer_bases.lock){-.-.-.}, at: [<c104f7a4>] __hrtimer_start_range_ns+0x28/0x35d
stack backtrace:
CPU: 0 PID: 12944 Comm: xfs_repair Not tainted 3.12.0+ #2
Hardware name: Dell Computer Corporation Dimension 2350/07W080, BIOS A01 12/17/2002
c1cb2e70 c1cb2e70 deb01dd8 c162748d deb01df8 c162303c c17a3306 deb01e3c
deaad0c0 deaad550 deaad550 00000002 deb01e6c c10648b9 deaad528 0000006f
c106269b deb01e20 c1c8bd08 00000003 00000000 0000000e 00000002 00000001
Call Trace:
[<c162748d>] dump_stack+0x16/0x18
[<c162303c>] print_circular_bug+0x1b8/0x1c2
[<c10648b9>] __lock_acquire+0x13a4/0x17ac
[<c106269b>] ? trace_hardirqs_off+0xb/0xd
[<c106577c>] lock_acquire+0x7f/0x15e
[<c104f843>] ? __hrtimer_start_range_ns+0xc7/0x35d
[<c1073688>] ktime_get+0x4f/0x169
[<c104f843>] ? __hrtimer_start_range_ns+0xc7/0x35d
[<c162d098>] ? _raw_spin_lock_irqsave+0x70/0x7a
[<c104f7a4>] ? __hrtimer_start_range_ns+0x28/0x35d
[<c104f843>] __hrtimer_start_range_ns+0xc7/0x35d
[<c104faff>] hrtimer_start_range_ns+0x26/0x2c
[<c104b17f>] common_timer_set+0xf5/0x164
[<c104b08a>] ? __posix_timers_find+0xa7/0xa7
[<c104bd58>] SyS_timer_settime+0xbe/0x183
[<c162dcc8>] syscall_call+0x7/0xb