xfs

Re: hole punching performance

To: Dave Chinner <david@xxxxxxxxxxxxx>
Subject: Re: hole punching performance
From: "Bradley C. Kuszmaul" <kuszmaul@xxxxxxxxx>
Date: Thu, 3 Jan 2013 13:25:48 -0500
Cc: xfs@xxxxxxxxxxx
In-reply-to: <20130103055101.GE3120@dastard>
References: <CAKSyJXf66H2U-BF-aYnSr2fF24_6LJw6swOx1RhUc_3Eqayaiw@xxxxxxxxxxxxxx> <20130102232706.GD3120@dastard> <CAKSyJXf5bs4wfM4k-o+1p6zOLyH46U4eorFDS8Zzsb5W7uvwPg@xxxxxxxxxxxxxx> <20130103055101.GE3120@dastard>
Thanks Dave, this is very helpful information.  I now have a much better
sense of what the benchmark should measure (e.g., for regression testing).

-Bradley


On Thu, Jan 3, 2013 at 12:51 AM, Dave Chinner <david@xxxxxxxxxxxxx> wrote:
> On Wed, Jan 02, 2013 at 08:45:22PM -0500, Bradley C. Kuszmaul wrote:
>> Thanks for the help.  I got results similar to yours.  However, the
>> hole punching is much faster if you create the file with fallocate
>> than if you actually write some data into it.
>>  fallocate and then hole-punch is about 1us per hole punch.
>>  write and then hole-punch is about 90us per hole punch.
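[The test program itself was not posted in the thread. A minimal sketch of a punch-hole loop of this shape uses fallocate(2) with FALLOC_FL_PUNCH_HOLE | FALLOC_FL_KEEP_SIZE; the helper name, file path, and sizes below are assumptions, not the actual a.out:]

```c
#define _GNU_SOURCE
#include <fcntl.h>   /* open, fallocate, FALLOC_FL_* */
#include <unistd.h>  /* close */

/* Preallocate a file of `size` bytes (unwritten extents, no page cache),
 * then punch a `hole_len`-byte hole every `interval` bytes.
 * Returns the number of holes punched, or -1 on error. */
static int punch_holes(const char *path, off_t size, off_t interval,
                       off_t hole_len)
{
    int fd = open(path, O_RDWR | O_CREAT | O_TRUNC, 0644);
    if (fd < 0)
        return -1;

    /* mode 0: plain preallocation, like the fast case above */
    if (fallocate(fd, 0, 0, size) < 0) {
        close(fd);
        return -1;
    }

    int punched = 0;
    for (off_t off = 0; off < size; off += interval) {
        if (fallocate(fd, FALLOC_FL_PUNCH_HOLE | FALLOC_FL_KEEP_SIZE,
                      off, hole_len) < 0)
            break;
        punched++;
    }
    close(fd);
    return punched;
}
```

[A timing driver would call this once, wrap the punch loop in gettimeofday() pairs, and divide by the punch count to get the per-punch latency; replacing the mode-0 fallocate with a pwrite loop reproduces the slow (page-cache-populated) case.]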
>
> No surprise - after a write the hole punch has a lot more to do.
> I modified the test program to not use O_TRUNC, then ran:
>
> $ /usr/sbin/xfs_io -f -c "truncate 0" -c "pwrite -b 1m 0 20g" /mnt/scratch/blah
> wrote 21474836480/21474836480 bytes at offset 0
> 20.000 GiB, 20480 ops; 0:00:30.00 (675.049 MiB/sec and 675.0491 ops/sec)
> $ sync
> $ time ./a.out
>
> real    0m1.664s
> user    0m0.000s
> sys     0m1.656s
> $
>
> Why? perf top indicates that pretty quickly:
>
>  12.80%  [kernel]  [k] free_hot_cold_page
>  10.62%  [kernel]  [k] block_invalidatepage
>  10.62%  [kernel]  [k] _raw_spin_unlock_irq
>   8.35%  [kernel]  [k] kmem_cache_free
>   6.07%  [kernel]  [k] _raw_spin_unlock_irqrestore
>   3.65%  [kernel]  [k] put_page
>   3.51%  [kernel]  [k] __wake_up_bit
>   3.27%  [kernel]  [k] find_get_pages
>   2.84%  [kernel]  [k] get_pageblock_flags_group
>   2.66%  [kernel]  [k] cancel_dirty_page
>   2.09%  [kernel]  [k] truncate_inode_pages_range
>
> The page cache has to have holes punched in it after the write. So,
> let's rule that out by discarding it separately, and see just what
> the extent manipulation overhead is:
>
> $ rm -f /mnt/scratch/blah
> $ /usr/sbin/xfs_io -f -c "truncate 0" -c "pwrite -b 1m 0 20g" /mnt/scratch/blah
> wrote 21474836480/21474836480 bytes at offset 0
> 20.000 GiB, 20480 ops; 0:00:27.00 (749.381 MiB/sec and 749.3807 ops/sec)
> $ sudo sh -c "echo 3 > /proc/sys/vm/drop_caches"
> $ time ./a.out
>
> real    0m0.347s
> user    0m0.000s
> sys     0m0.332s
> $
>
> Which is the same as the fallocate/punch method gives....
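[The drop_caches knob above discards the page cache system-wide. For reference, a single file's clean pages can also be evicted with posix_fadvise(2); this is a sketch with an assumed helper name, not something used in the thread:]

```c
#define _GNU_SOURCE
#include <fcntl.h>   /* open, posix_fadvise */
#include <unistd.h>  /* fsync, close */

/* Evict one file's cached pages instead of using the global
 * /proc/sys/vm/drop_caches knob. POSIX_FADV_DONTNEED skips dirty
 * pages, so flush them to disk first with fsync().
 * Returns 0 on success, -1 if open fails, or a positive errno. */
int drop_file_cache(const char *path)
{
    int fd = open(path, O_RDONLY);
    if (fd < 0)
        return -1;
    fsync(fd);  /* write back dirty pages so they become evictable */
    int rc = posix_fadvise(fd, 0, 0, POSIX_FADV_DONTNEED); /* len 0 = whole file */
    close(fd);
    return rc;
}
```

[This targets only the benchmark file, so other cached data (and the dentry/inode caches that "echo 3" also drops) are left intact.]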
>
>> But 90us is likely to be plenty fast, so it's looking good.  (I'll
>> try to track down why my other program was slow.)
>
> If you open the file O_SYNC or O_DSYNC, then you'll still get
> synchronous behaviour....
>
> Cheers,
>
> Dave.
> --
> Dave Chinner
> david@xxxxxxxxxxxxx
