Updates:
(1) The bug in bonnie++ is to do with memory allocation, and you can work
around it by putting '-n' before '-s' on the command line and using the same
custom chunk size for both (or by using '-n' with '-s 0'):
# time bonnie++ -d /data/sdc -n 98:800k:500k:1000:32k -s 16384k:32k -u root
Version 1.96 ------Sequential Output------ --Sequential Input- --Random-
Concurrency 1 -Per Chr- --Block-- -Rewrite- -Per Chr- --Block-- --Seeks--
Machine Size:chnk K/sec %CP K/sec %CP K/sec %CP K/sec %CP K/sec %CP /sec %CP
storage1 16G:32k 2061 91 101801 3 49405 4 5054 97 126748 6 130.9 3
Latency 15446us 222ms 412ms 23149us 83913us 452ms
Version 1.96 ------Sequential Create------ --------Random Create--------
storage1 -Create-- --Read--- -Delete-- -Create-- --Read--- -Delete--
files:max:min /sec %CP /sec %CP /sec %CP /sec %CP /sec %CP /sec %CP
98:819200:512000/1000 128 3 37 1 10550 25 108 3 38 1 8290 33
Latency 6874ms 99117us 45394us 4462ms 12582ms 4027ms
1.96,1.96,storage1,1,1328002525,16G,32k,2061,91,101801,3,49405,4,5054,97,126748,6,130.9,3,98,819200,512000,,1000,128,3,37,1,10550,25,108,3,38,1,8290,33,15446us,222ms,412ms,23149us,83913us,452ms,6874ms,99117us,45394us,4462ms,12582ms,4027ms
This shows that using 32k transfers instead of 8k doesn't really help; I'm
still only seeing 37-38 file reads per second, whether sequential or random.
(2) In case extents aren't being kept in the inode, I decided to build a
filesystem with '-i size=1024'.
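For reference, something along these lines should reproduce that filesystem;
the device name and the '-f' flag are assumptions on my part, and the only
option that matters here is '-i size=1024':
# mkfs.xfs -f -i size=1024 /dev/sdb
# mount -o noatime,nodiratime /dev/sdb /data/sdb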
# time bonnie++ -d /data/sdb -n 98:800k:500k:1000:32k -s0 -u root
Version 1.96 ------Sequential Create------ --------Random Create--------
storage1 -Create-- --Read--- -Delete-- -Create-- --Read--- -Delete--
files:max:min /sec %CP /sec %CP /sec %CP /sec %CP /sec %CP /sec %CP
98:819200:512000/1000 110 3 131 5 3410 10 110 3 33 1 387 1
Latency 6038ms 92092us 87730us 5202ms 117ms 7653ms
1.96,1.96,storage1,1,1328003901,,,,,,,,,,,,,,98,819200,512000,,1000,110,3,131,5,3410,10,110,3,33,1,387,1,,,,,,,6038ms,92092us,87730us,5202ms,117ms,7653ms
Wow! The sequential read just blows away the previous results. What's even
more amazing is the number of transactions per second reported by iostat
while bonnie++ was sequentially stat()ing and read()ing the files:
# iostat 5
...
sdb 820.80 86558.40 0.00 432792 0
!!
820 tps on a bog-standard hard drive is unbelievable, although the total
throughput of 86MB/sec is believable. It could be that either NCQ or drive
read-ahead is scoring big-time here.
However during random stat()+read() the performance drops:
# iostat 5
...
sdb 225.40 21632.00 0.00 108160 0
Here we appear to be limited by real seeks. 225 seeks/sec is still very good
for a hard drive, but it means the filesystem is generating about 7 seeks
for every file (stat+open+read+close); see the arithmetic below. Indeed, the
random read performance appears to be a bit worse than on the default
(-i size=256) filesystem, where I was getting 25MB/sec on iostat and 38 files
per second instead of 33.
There are only 1000 directories in this test, and I would expect those to
become cached quickly.
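That 7-seeks-per-file figure is just the iostat transaction rate divided by
bonnie++'s random read rate, if anyone wants to check my arithmetic:
# echo "scale=1; 225.4 / 33" | bc
6.8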
According to Wikipedia, XFS has variable-length extents. I think that as
long as the file data is contiguous, each file should only take a single
extent, and that is what xfs_bmap seems to be telling me:
# xfs_bmap -n1 -l -v /data/sdc/Bonnie.25448/00449/* | head
/data/sdc/Bonnie.25448/00449/000000b125mpBap4gg7U:
EXT: FILE-OFFSET BLOCK-RANGE AG AG-OFFSET TOTAL
0: [0..1559]: 4446598752..4446600311 3 (51198864..51200423) 1560
/data/sdc/Bonnie.25448/00449/000000b1262hBudG6gV:
EXT: FILE-OFFSET BLOCK-RANGE AG AG-OFFSET TOTAL
0: [0..1551]: 1484870256..1484871807 1 (19736960..19738511) 1552
/data/sdc/Bonnie.25448/00449/000000b127fM:
EXT: FILE-OFFSET BLOCK-RANGE AG AG-OFFSET TOTAL
0: [0..1111]: 2954889944..2954891055 2 (24623352..24624463) 1112
/data/sdc/Bonnie.25448/00449/000000b128:
It looks like I need to get familiar with xfs_db and
http://oss.sgi.com/projects/xfs/papers/xfs_filesystem_structure.pdf
to find out what's going on.
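As a first poke with xfs_db (read-only, and assuming /dev/sdc is the device
behind /data/sdc), the fragmentation report and a dump of one of the test
files' inodes should show how the extents are actually being stored; the
inode number below is a placeholder to be filled in from 'ls -i':
# xfs_db -r -c frag /dev/sdc
# ls -i /data/sdc/Bonnie.25448/00449/000000b125mpBap4gg7U
# xfs_db -r -c "inode <inum>" -c print /dev/sdc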
(Incidentally, these filesystems are mounted with noatime,nodiratime.)
Regards,
Brian.