status of userspace release
Ben Myers
bpm at sgi.com
Fri Nov 2 13:59:23 CDT 2012
Hi Dave,
On Fri, Nov 02, 2012 at 04:51:02PM +1100, Dave Chinner wrote:
> On Thu, Oct 25, 2012 at 10:15:01AM -0500, Ben Myers wrote:
> > Hi Folks,
> >
> > We're working toward a userspace release this month. There are several patches
> > that need to go in first, including backing out the xfsdump format version bump
> > from Eric, fixes for the makefiles from Mike, and the Polish language update
> > for xfsdump from Jakub. If anyone knows of something else we need, now is the
> > time to flame about it. I will take a look around for other important patches
> > too.
> >
> > This time I'm going to tag an -rc1 (probably later today or tomorrow). We'll
> > give everyone a few working days to do a final test and/or pipe up if we have
> > missed something important. Then if all goes well we'll cut the release next
> > Tuesday.
>
> I think that dump/restore need more work/testing.
Sounds good. AFAIK there is no blazing hurry to release immediately.
> I've already pointed Eric to the header checksum failures (forkoff
> patch being needed), and that fixes the failures I've been seeing on
> normal xfstests runs.
I've pulled that patch in. Interesting that it doesn't reproduce on i586 but
is so reliable on x86_64. It's a good excuse to do some testing on a wider set
of arches before the release.
> Running some large filesystem testing, however, I see more problems.
> I'm using a 17TB filesystem and the --largefs patch series. This
> results in a futex hang in 059 like so:
>
> [ 4770.007858] xfsrestore S ffff88021fc52d40 5504 3926 3487 0x00000000
> [ 4770.007858] ffff880212ea9c68 0000000000000082 ffff880207830140 ffff880212ea9fd8
> [ 4770.007858] ffff880212ea9fd8 ffff880212ea9fd8 ffff880216cec2c0 ffff880207830140
> [ 4770.007858] ffff880212ea9d08 ffff880212ea9d58 ffff880207830140 0000000000000000
> [ 4770.007858] Call Trace:
> [ 4770.007858] [<ffffffff81b8a009>] schedule+0x29/0x70
> [ 4770.007858] [<ffffffff810db089>] futex_wait_queue_me+0xc9/0x100
> [ 4770.007858] [<ffffffff810db809>] futex_wait+0x189/0x290
> [ 4770.007858] [<ffffffff8113acf7>] ? __free_pages+0x47/0x70
> [ 4770.007858] [<ffffffff810dd41c>] do_futex+0x11c/0xa80
> [ 4770.007858] [<ffffffff810abbd5>] ? hrtimer_try_to_cancel+0x55/0x110
> [ 4770.007858] [<ffffffff810abcb2>] ? hrtimer_cancel+0x22/0x30
> [ 4770.007858] [<ffffffff81b88f44>] ? do_nanosleep+0xa4/0xd0
> [ 4770.007858] [<ffffffff810dde0d>] sys_futex+0x8d/0x1b0
> [ 4770.007858] [<ffffffff810ab6e0>] ? update_rmtp+0x80/0x80
> [ 4770.007858] [<ffffffff81b93a99>] system_call_fastpath+0x16/0x1b
> [ 4770.007858] xfsrestore S ffff88021fc52d40 5656 3927 3487 0x00000000
> [ 4770.007858] ffff880208f29c68 0000000000000082 ffff880208f84180 ffff880208f29fd8
> [ 4770.007858] ffff880208f29fd8 ffff880208f29fd8 ffff880216cec2c0 ffff880208f84180
> [ 4770.007858] ffff880208f29d08 ffff880208f29d58 ffff880208f84180 0000000000000000
> [ 4770.007858] Call Trace:
> [ 4770.007858] [<ffffffff81b8a009>] schedule+0x29/0x70
> [ 4770.007858] [<ffffffff810db089>] futex_wait_queue_me+0xc9/0x100
> [ 4770.007858] [<ffffffff810db809>] futex_wait+0x189/0x290
> [ 4770.007858] [<ffffffff810dd41c>] do_futex+0x11c/0xa80
> [ 4770.007858] [<ffffffff810abbd5>] ? hrtimer_try_to_cancel+0x55/0x110
> [ 4770.007858] [<ffffffff810abcb2>] ? hrtimer_cancel+0x22/0x30
> [ 4770.007858] [<ffffffff81b88f44>] ? do_nanosleep+0xa4/0xd0
> [ 4770.007858] [<ffffffff810dde0d>] sys_futex+0x8d/0x1b0
> [ 4770.007858] [<ffffffff810ab6e0>] ? update_rmtp+0x80/0x80
> [ 4770.007858] [<ffffffff81b93a99>] system_call_fastpath+0x16/0x1b
> [ 4770.007858] xfsrestore S ffff88021fc92d40 5848 3928 3487 0x00000000
> [ 4770.007858] ffff880212d0dc68 0000000000000082 ffff880208e76240 ffff880212d0dfd8
> [ 4770.007858] ffff880212d0dfd8 ffff880212d0dfd8 ffff880216cf2300 ffff880208e76240
> [ 4770.007858] ffff880212d0dd08 ffff880212d0dd58 ffff880208e76240 0000000000000000
> [ 4770.007858] Call Trace:
> [ 4770.007858] [<ffffffff81b8a009>] schedule+0x29/0x70
> [ 4770.007858] [<ffffffff810db089>] futex_wait_queue_me+0xc9/0x100
> [ 4770.007858] [<ffffffff810db809>] futex_wait+0x189/0x290
> [ 4770.007858] [<ffffffff810dd41c>] do_futex+0x11c/0xa80
> [ 4770.007858] [<ffffffff810abbd5>] ? hrtimer_try_to_cancel+0x55/0x110
> [ 4770.007858] [<ffffffff810abcb2>] ? hrtimer_cancel+0x22/0x30
> [ 4770.007858] [<ffffffff81b88f44>] ? do_nanosleep+0xa4/0xd0
> [ 4770.007858] [<ffffffff810dde0d>] sys_futex+0x8d/0x1b0
> [ 4770.007858] [<ffffffff810ab6e0>] ? update_rmtp+0x80/0x80
> [ 4770.007858] [<ffffffff81b93a99>] system_call_fastpath+0x16/0x1b
>
> I can't reliably reproduce it at this point, but there does appear
> to be some kind of locking problem in the multistream support.
One of my machines hit this overnight without --largefs. I wasn't able to get
a dump though. Just another data point.
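For the next time one of these turns up, a quick first check is where the kernel says each stream process is blocked; if every xfsrestore stream is parked in futex_wait, that points at a userspace lock in the multistream code rather than the kernel. A sketch, assuming Linux /proc and pgrep are available (show_wchan is just an illustrative helper name):

```shell
# Hypothetical helper: print each matching process id and its kernel wait
# channel (/proc/<pid>/wchan). A hung multistream xfsrestore should show
# every stream process blocked in futex_wait.
show_wchan() {
    for pid in $(pgrep -x "$1"); do
        printf '%s %s\n' "$pid" "$(cat /proc/"$pid"/wchan 2>/dev/null)"
    done
}
```

That's cheaper than waiting for the hung-task stack dumps, and it works even when we can't get a crash dump off the machine.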
> Speaking of which, most large filesystems dump/restore tests are
> failing because of this output:
>
> 026 20s ... - output mismatch (see 026.out.bad)
> --- 026.out 2012-10-05 11:37:51.000000000 +1000
> +++ 026.out.bad 2012-11-02 16:20:17.000000000 +1100
> @@ -20,6 +20,7 @@
> xfsdump: media file size NUM bytes
> xfsdump: dump size (non-dir files) : NUM bytes
> xfsdump: dump complete: SECS seconds elapsed
> +xfsdump: stream 0 DUMP_FILE OK (success)
> xfsdump: Dump Status: SUCCESS
> Restoring from file...
> xfsrestore -f DUMP_FILE -L stress_026 RESTORE_DIR
> @@ -32,6 +33,7 @@
> xfsrestore: directory post-processing
> xfsrestore: restoring non-directory files
> xfsrestore: restore complete: SECS seconds elapsed
> +xfsrestore: stream 0 DUMP_FILE OK (success)
> xfsrestore: Restore Status: SUCCESS
> Comparing dump directory with restore directory
> Files DUMP_DIR/big and RESTORE_DIR/DUMP_SUBDIR/big are identical
>
> Which looks like output from the multistream code. Why it is
> emitting this for large filesystem testing and not for small
> filesystems, I'm not sure yet.
>
> In fact, with --largefs, I see this for the dump group:
>
> Failures: 026 028 046 047 056 059 060 061 063 064 065 066 266 281
> 282 283
> Failed 16 of 19 tests
>
> And this for the normal sized (10GB) scratch device:
>
> Passed all 18 tests
>
> So there's something funky going on here....
Rich also reported some golden output changes with --largefs a while back. I
don't think he saw this one though.
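Until we work out why the per-stream status lines only show up on large filesystems, one stopgap (a sketch only, not necessarily how we'd want to fix it in xfstests; _filter_stream_status is a hypothetical filter name) would be to strip those lines before comparing against the golden output:

```shell
# Hypothetical xfstests-style filter: drop the per-stream status lines that
# multistream xfsdump/xfsrestore emit, so large and small filesystem runs
# compare against the same golden output.
_filter_stream_status() {
    sed -e '/^xfsdump: stream [0-9].*OK (success)$/d' \
        -e '/^xfsrestore: stream [0-9].*OK (success)$/d'
}
```

The real fix is probably to make the tools' output consistent across configurations, but a filter like this would at least stop the large-filesystem runs failing on cosmetic differences in the meantime.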
The TODO list for userspace release currently stands at:
1) fix the header checksum failures (resolved by the forkoff patch)
2) fix a futex hang in 059
3) fix the golden output changes related to multistream support in xfsdump
and --largefs
4) test on more platforms
Regards,
Ben