
Re: [PATCH 0/3] xfs: allocation worker causes freelist buffer lock hang

To: Dave Chinner <david@xxxxxxxxxxxxx>
Subject: Re: [PATCH 0/3] xfs: allocation worker causes freelist buffer lock hang
From: Mark Tinguely <tinguely@xxxxxxx>
Date: Wed, 26 Sep 2012 09:14:14 -0500
Cc: bpm@xxxxxxx, xfs@xxxxxxxxxxx
In-reply-to: <20120925220110.GF29154@dastard>
References: <20120924171159.GG1140@xxxxxxx> <201209241809.q8OI94s3003323@xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx> <20120925005632.GB23520@dastard> <5061CA48.3040202@xxxxxxx> <20120925220110.GF29154@dastard>
User-agent: Mozilla/5.0 (X11; FreeBSD amd64; rv:9.0) Gecko/20120122 Thunderbird/9.0
On 09/25/12 17:01, Dave Chinner wrote:
> On Tue, Sep 25, 2012 at 10:14:16AM -0500, Mark Tinguely wrote:

<deletes>


>>>> As a bonus, consolidating the loops into one worker actually gives
>>>> a slight performance advantage.

>>> Can you quantify it?

>> I was comparing the bonnie and iozone benchmark outputs. I will see
>> if someone can enlighten me on how to quantify those numbers.

> Ugh.

> Don't bother. Those are two of the worst offenders in the "useless
> benchmarks for regression testing" category. Yeah, they *look* like
> they give decent numbers, but I've wasted so much time looking at
> results from these benchmarks only to find they do basic things wrong
> and give numbers that vary simply because you've made a change that
> increases or decreases the CPU cache footprint of a code path.

> e.g. IOZone uses the same memory buffer as the source/destination of
> all its IO, and does not touch the contents of it at all. Hence for
> small IO, the buffer stays resident in the CPU caches and gives
> unrealistically high throughput results. Worse is the fact that CPU
> cache residency of the buffer can change according to the kernel
> code path taken, so you can get massive changes in throughput just
> by changing the layout of the code without changing any logic....
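
(To make that pitfall concrete, here is a minimal userspace sketch,
hypothetical code rather than IOZone's actual source: a benchmark loop
that reuses one small buffer and never dirties it, so the buffer stays
hot in the CPU cache and the measured throughput is partly a cache
artifact rather than a filesystem result.)

#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

#define BUF_SIZE 4096           /* small enough to stay cache resident */
#define N_WRITES (64 * 1024)

int main(void)
{
        static char buf[BUF_SIZE];      /* one buffer reused for every write */
        int i, fd;

        fd = open("testfile", O_WRONLY | O_CREAT | O_TRUNC, 0644);
        if (fd < 0) {
                perror("open");
                return 1;
        }

        for (i = 0; i < N_WRITES; i++) {
                /*
                 * A benchmark modelling a real writer would dirty the
                 * buffer here (e.g. memset(buf, i, BUF_SIZE)), forcing
                 * it through the cache hierarchy on every pass.
                 * Skipping that keeps buf hot in L1/L2 and inflates
                 * small-IO throughput numbers.
                 */
                if (write(fd, buf, BUF_SIZE) != BUF_SIZE) {
                        perror("write");
                        return 1;
                }
        }
        close(fd);
        return 0;
}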

> IOZone can be useful if you know exactly what you are doing and are
> using it to test a specific code path with a specific set of
> configurations. e.g. comparing ext3/4/xfs/btrfs on the same kernel
> and storage is fine. However, the moment you start using it to
> compare different kernels, it's a total crap shoot....
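
(For that kind of same-kernel comparison, the point is to pin one
workload down and run it unchanged on each filesystem. As a sketch,
with exact flags depending on the IOZone version, something like
"iozone -e -s 1g -r 4k -i 0 -i 1 -f /mnt/test/iozfile" repeated on
each filesystem under test.)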

Does anyone have a good benchmark XFS developers should use to share
performance results? A number we can agree on to show that a series
does not degrade the filesystem.

Lies, damn lies, statistics, and then filesystem benchmarks?! :)


> I guess I don't understand what you mean by "loop on
> xfs_alloc_vextent()" then.

> The problem I see above is this:
>
> thread 1                worker 1                worker 2..max
> xfs_bmapi_write(userdata)

    loops here calling xfs_bmapi_allocate()

>   xfs_bmapi_allocate(user)
>     xfs_alloc_vextent(user)
>       wait
>
>                         _xfs_alloc_vextent()
>                         locks AGF

                          first time through the loop it takes the
                          lock; one of the next times through the
                          above loop it cannot get a worker.
                          deadlock here.

                          I saved the xfs_bmalloca and xfs_alloc_arg
                          when allocating a buffer to verify the paths.

>                                                 _xfs_alloc_vextent()
>                                                 blocks on AGF lock
>
>                         completes allocation
>
>       <returns with AGF locked in transaction>
>     xfs_bmap_add_extent_hole_real
>       xfs_bmap_extents_to_btree
>         xfs_alloc_vextent(user)
>           wait

            this does not need a worker; since it is in the same
            transaction, all locks on the AGF buffer are recursive
            locks, so there is no wait on the AGF here.

> <deadlock as no more workers available>
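
(The same circular wait can be shown outside the kernel. Below is a
minimal userspace sketch with hypothetical names, using pthreads and a
semaphore in place of the kernel workqueue: MAX_WORKERS stands in for
the allocation workqueue's worker limit, one job holds the "AGF" mutex
and waits for a free pool slot, and every slot is held by a job
waiting on that mutex.)

#include <pthread.h>
#include <semaphore.h>
#include <stdio.h>
#include <unistd.h>

#define MAX_WORKERS 4

static sem_t worker_slots;              /* free slots in the bounded pool */
static pthread_mutex_t agf_lock = PTHREAD_MUTEX_INITIALIZER;

/* Worker 1 in the diagram: holds the AGF lock, then the same
 * transaction needs a second allocation, which is handed to the
 * same bounded pool. */
static void *holder(void *unused)
{
        (void)unused;
        pthread_mutex_lock(&agf_lock);  /* _xfs_alloc_vextent(): lock AGF */
        printf("holder: AGF locked, dispatching nested allocation...\n");
        sem_wait(&worker_slots);        /* needs a free worker: blocks forever */
        sem_post(&worker_slots);
        pthread_mutex_unlock(&agf_lock);
        return NULL;
}

/* Workers 2..max: each occupies a pool slot and blocks on the AGF. */
static void *blocked(void *unused)
{
        (void)unused;
        pthread_mutex_lock(&agf_lock);  /* blocks on the AGF lock */
        pthread_mutex_unlock(&agf_lock);
        sem_post(&worker_slots);        /* would free its slot on completion */
        return NULL;
}

int main(void)
{
        pthread_t tid[MAX_WORKERS];
        int i;

        sem_init(&worker_slots, 0, MAX_WORKERS);

        sem_wait(&worker_slots);        /* holder occupies one slot */
        pthread_create(&tid[0], NULL, holder, NULL);
        sleep(1);                       /* let the holder take the AGF lock */

        for (i = 1; i < MAX_WORKERS; i++) {
                sem_wait(&worker_slots); /* remaining slots are consumed */
                pthread_create(&tid[i], NULL, blocked, NULL);
        }

        /* Circular wait: the holder has the AGF and needs a slot; every
         * slot is held by a worker that needs the AGF. Nothing progresses. */
        for (i = 0; i < MAX_WORKERS; i++)
                pthread_join(tid[i], NULL);
        puts("never reached");
        return 0;
}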

<deletes>

--Mark.
