On 11/13/2013 04:10 PM, Dave Chinner wrote:
...
>
> The problem can be demonstrated with a single CPU and a single
> spindle. Create a single AG filesystem of a 100GB, and populate it
> with 10 million inodes.
>
> Time how long it takes to create another 10000 inodes in a new
> directory. Measure CPU usage.
>
> Randomly delete 10,000 inodes from the original population to
> sparsely populate the inobt with 10000 free inodes.
>
> Time how long it takes to create another 10000 inodes in a new
> directory. Measure CPU usage.
>
> The difference in time and CPU will be directly related to the
> additional time spent searching the inobt for free inodes...
>
Thanks for the suggestion, Dave. I've run some fs_mark tests along the
lines of what is described here. I create 10m files, randomly remove
~10k from that dataset and measure the process of allocating 10k new
inodes in both finobt and non-finobt scenarios (after a clean remount).
The tests run from a 4-cpu VM with 4GB RAM against an isolated SATA drive
I had lying around (mapped directly via virtio). The drive is set up with
a single VG/LV and formatted with XFS as follows:
meta-data=/dev/mapper/testvg-testlv isize=512    agcount=1, agsize=26214400 blks
         =                          sectsz=512   attr=2, projid32bit=1
         =                          crc=1        finobt=0
data     =                          bsize=4096   blocks=26214400, imaxpct=25
         =                          sunit=0      swidth=0 blks
naming   =version 2                 bsize=4096   ascii-ci=0 ftype=1
log      =internal                  bsize=4096   blocks=12800, version=2
         =                          sectsz=512   sunit=0 blks, lazy-count=1
realtime =none                      extsz=4096   blocks=0, rtextents=0
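Roughly, the prep sequence for each fs looks like the following. This is
just a sketch: the mkfs finobt switch assumes the corresponding xfsprogs
patches, and the populate parameters, paths and shuf-based removal are
illustrative rather than the exact commands used.

  # single AG, v5 (crc) fs; finobt=1 for the finobt runs, 0 for baseline
  mkfs.xfs -f -m crc=1,finobt=0 -d agcount=1 /dev/mapper/testvg-testlv
  mount /dev/mapper/testvg-testlv /mnt

  # populate with 10 million zero-length files
  mkdir -p /mnt/pop
  fs_mark -k -S 0 -D 4 -L 100 -n 100000 -s 0 -d /mnt/pop

  # randomly remove ~10k of them to scatter free inodes across the inobt
  find /mnt/pop -type f | shuf -n 10000 | xargs rm

  # clean remount before the timed allocation run
  umount /mnt && mount /dev/mapper/testvg-testlv /mnt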
Once the fs has been prepared with a random set of free inodes, the
following command is used to measure performance:
fs_mark -k -S 0 -D 4 -L 10 -n 1000 -s 0 -d /mnt/testdir
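(That's 10 iterations of 1000 zero-length file creates with no syncs,
i.e. 10k new inodes in total; -k keeps the files around.)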
I've also collected perf record data from these runs to compare CPU
usage. I can make the full/raw data available if desired. Snippets of
the results are included below.
--- non-finobt, agi freecount = 9961 after random removal
- fs_mark
FSUse%        Count         Size    Files/sec     App Overhead
     5         1000            0       1020.1            10811
     5         2000            0        361.4            19498
     5         3000            0        230.1            12154
     5         4000            0        166.7            12816
     5         5000            0        129.7            27409
     5         6000            0        105.7            13946
     5         7000            0         87.6            31792
     5         8000            0         77.8            14921
     5         9000            0         67.3            15597
     5        10000            0         62.4            15835
- time
real 1m26.579s
user 0m0.120s
sys 1m26.113s
- perf report
6.21% :1994 [kernel.kallsyms] [k] memcmp
5.66% :1993 [kernel.kallsyms] [k] memcmp
4.84% :1992 [kernel.kallsyms] [k] memcmp
4.76% :1994 [xfs] [k] xfs_btree_check_sblock
4.46% :1993 [xfs] [k] xfs_btree_check_sblock
4.39% :1991 [kernel.kallsyms] [k] memcmp
3.88% :1992 [xfs] [k] xfs_btree_check_sblock
3.54% :1990 [kernel.kallsyms] [k] memcmp
3.38% :1991 [xfs] [k] xfs_btree_check_sblock
2.91% :1989 [kernel.kallsyms] [k] memcmp
2.89% :1990 [xfs] [k] xfs_btree_check_sblock
2.44% :1988 [kernel.kallsyms] [k] memcmp
2.31% :1989 [xfs] [k] xfs_btree_check_sblock
1.84% :1988 [xfs] [k] xfs_btree_check_sblock
1.65% :1987 [kernel.kallsyms] [k] memcmp
1.28% :1987 [xfs] [k] xfs_btree_check_sblock
1.12% :1994 [xfs] [k] xfs_btree_increment
1.08% :1994 [xfs] [k] xfs_btree_get_rec
1.04% :1993 [xfs] [k] xfs_btree_increment
1.00% :1993 [xfs] [k] xfs_btree_get_rec
0.99% :1986 [kernel.kallsyms] [k] memcmp
0.89% :1992 [xfs] [k] xfs_btree_increment
0.85% :1994 [xfs] [k] xfs_inobt_get_rec
0.84% :1992 [xfs] [k] xfs_btree_get_rec
0.77% :1991 [xfs] [k] xfs_btree_increment
0.77% :1986 [xfs] [k] xfs_btree_check_sblock
0.77% :1993 [xfs] [k] xfs_inobt_get_rec
0.75% :1991 [xfs] [k] xfs_btree_get_rec
0.69% :1992 [xfs] [k] xfs_inobt_get_rec
0.64% :1990 [xfs] [k] xfs_btree_increment
0.62% :1994 [xfs] [k] xfs_inobt_get_maxrecs
0.61% :1990 [xfs] [k] xfs_btree_get_rec
0.58% :1991 [xfs] [k] xfs_inobt_get_rec
...
--- finobt, agi freecount = 10137 after random removal
- fs_mark
FSUse%        Count         Size    Files/sec     App Overhead
     5         1000            0       9210.0             8587
     5         2000            0       5592.1            14933
     5         3000            0       7095.4            11355
     5         4000            0       5371.1            13613
     5         5000            0       4919.3            14534
     5         6000            0       4375.7            15813
     5         7000            0       5011.3            15095
     5         8000            0       4629.8            17902
     5         9000            0       5622.9            12975
     5        10000            0       5761.4            12203
- time
real 0m1.831s
user 0m0.104s
sys 0m1.384s
- perf report
1.82% :2520 [kernel.kallsyms] [k] lock_acquire
1.65% :2519 [kernel.kallsyms] [k] lock_acquire
1.65% :2525 [kernel.kallsyms] [k] lock_acquire
1.45% :2523 [kernel.kallsyms] [k] lock_acquire
1.44% :2524 [kernel.kallsyms] [k] lock_acquire
1.34% :2521 [kernel.kallsyms] [k] lock_acquire
1.27% :2522 [kernel.kallsyms] [k] lock_acquire
1.18% :2526 [kernel.kallsyms] [k] lock_acquire
1.15% :2527 [kernel.kallsyms] [k] lock_acquire
1.09% :2525 [kernel.kallsyms] [k] _raw_spin_unlock_irqrestore
1.03% :2524 [kernel.kallsyms] [k] _raw_spin_unlock_irqrestore
0.88% :2520 [kernel.kallsyms] [k] _raw_spin_unlock_irqrestore
0.83% :2523 [kernel.kallsyms] [k] _raw_spin_unlock_irqrestore
0.81% :2521 [kernel.kallsyms] [k] _raw_spin_unlock_irqrestore
0.79% :2519 [kernel.kallsyms] [k] _raw_spin_unlock_irqrestore
0.79% :2522 [kernel.kallsyms] [k] _raw_spin_unlock_irqrestore
0.76% :2519 [kernel.kallsyms] [k] kmem_cache_free
0.76% :2520 [kernel.kallsyms] [k] kmem_cache_free
0.73% :2526 [kernel.kallsyms] [k] kmem_cache_free
...
0.30% :2525 [xfs] [k] xfs_dir3_leaf_check_int
0.28% :2525 [kernel.kallsyms] [k] memcpy
0.27% :2527 [kernel.kallsyms] [k] security_compute_sid.part.14
0.26% :2520 [kernel.kallsyms] [k] memcpy
0.26% :2523 [xfs] [k] _xfs_buf_find
0.26% :2526 [xfs] [k] _xfs_buf_find
In summary, the results show a nice improvement for inode allocation into
a set of inode chunks with randomly scattered free inodes. The 10k inode
allocation drops from ~90s to ~2s and XFS CPU usage falls way down in the
perf profile.
I haven't tested the following extensively, but a quick 1 million inode
allocation test on a fresh, single-AG fs shows a slight degradation in
time to complete with finobt enabled:
fs_mark -k -S 0 -D 4 -L 10 -n 100000 -s 0 -d /mnt/bigdir
- non-finobt
real 1m35.349s
user 0m4.555s
sys 1m29.749s
- finobt
real 1m42.396s
user 0m4.326s
sys 1m37.152s
Brian