
Re: Performance problem - reads slower than writes

To: Dave Chinner <david@xxxxxxxxxxxxx>
Subject: Re: Performance problem - reads slower than writes
From: Brian Candler <B.Candler@xxxxxxxxx>
Date: Tue, 31 Jan 2012 14:16:04 +0000
Cc: xfs@xxxxxxxxxxx
In-reply-to: <20120131103126.GA46170@xxxxxxxx>
References: <20120130220019.GA45782@xxxxxxxx> <20120131020508.GF9090@dastard> <20120131103126.GA46170@xxxxxxxx>
User-agent: Mutt/1.5.21 (2010-09-15)
Updates:

(1) The bug in bonnie++ is to do with memory allocation, and you can work
around it by putting '-n' before '-s' on the command line and specifying the
same custom chunk size for both (or by using '-n' with '-s 0'):

# time bonnie++ -d /data/sdc -n 98:800k:500k:1000:32k -s 16384k:32k -u root

Version  1.96       ------Sequential Output------ --Sequential Input- --Random-
Concurrency   1     -Per Chr- --Block-- -Rewrite- -Per Chr- --Block-- --Seeks--
Machine   Size:chnk K/sec %CP K/sec %CP K/sec %CP K/sec %CP K/sec %CP  /sec %CP
storage1    16G:32k  2061  91 101801   3 49405   4  5054  97 126748   6 130.9   3
Latency             15446us     222ms     412ms   23149us   83913us     452ms
Version  1.96       ------Sequential Create------ --------Random Create--------
storage1            -Create-- --Read--- -Delete-- -Create-- --Read--- -Delete--
files:max:min        /sec %CP  /sec %CP  /sec %CP  /sec %CP  /sec %CP  /sec %CP
98:819200:512000/1000   128   3    37   1 10550  25   108   3    38   1  8290  33
Latency              6874ms   99117us   45394us    4462ms   12582ms    4027ms
1.96,1.96,storage1,1,1328002525,16G,32k,2061,91,101801,3,49405,4,5054,97,126748,6,130.9,3,98,819200,512000,,1000,128,3,37,1,10550,25,108,3,38,1,8290,33,15446us,222ms,412ms,23149us,83913us,452ms,6874ms,99117us,45394us,4462ms,12582ms,4027ms

This shows that using 32k transfers instead of 8k doesn't really help; I'm
still only seeing 37-38 files read per second, whether sequential or random.


(2) In case extents aren't being kept in the inode, I decided to build a
filesystem with '-i size=1024'

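For reference, a rough sketch of how this filesystem would have been made
(the device name and the '-f' flag here are my assumption; only the
'-i size=1024' option is taken from the test itself):

# mkfs.xfs -f -i size=1024 /dev/sdb
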
# time bonnie++ -d /data/sdb -n 98:800k:500k:1000:32k -s0 -u root

Version  1.96       ------Sequential Create------ --------Random Create--------
storage1            -Create-- --Read--- -Delete-- -Create-- --Read--- -Delete--
files:max:min        /sec %CP  /sec %CP  /sec %CP  /sec %CP  /sec %CP  /sec %CP
98:819200:512000/1000   110   3   131   5  3410  10   110   3    33   1   387   1
Latency              6038ms   92092us   87730us    5202ms     117ms    7653ms
1.96,1.96,storage1,1,1328003901,,,,,,,,,,,,,,98,819200,512000,,1000,110,3,131,5,3410,10,110,3,33,1,387,1,,,,,,,6038ms,92092us,87730us,5202ms,117ms,7653ms

Wow! The sequential read just blows away the previous results. What's even
more amazing is the number of transactions per second reported by iostat
while bonnie++ was sequentially stat()ing and read()ing the files:

# iostat 5
...
sdb             820.80     86558.40         0.00     432792          0
                  !!

820 tps on a bog-standard hard drive is unbelievable, although the total
throughput of 86MB/sec is believable.  It could be that either NCQ or drive
read-ahead is scoring big-time here.

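One way to check whether request merging or queueing is what inflates the
tps figure would be iostat's extended statistics (a sketch; I haven't
captured that output here), where the rrqm/s and avgqu-sz columns show read
request merging and queue depth:

# iostat -x 5
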
However, during the random stat()+read() phase the performance drops:

# iostat 5
...
sdb             225.40     21632.00         0.00     108160          0

Here we appear to be limited by real seeks. 225 seeks/sec is still very good
for a hard drive, but it means the filesystem is generating about 7 seeks for
every file (stat+open+read+close): 225.4 tps / 33 files per second is roughly
6.8.  Indeed the random read performance appears to be a bit worse than on
the default (-i size=256) filesystem, where I was getting 25MB/sec on iostat,
and 38 files per second instead of 33.

There are only 1000 directories in this test, and I would expect those to
become cached quickly.

According to Wikipedia, XFS has variable-length extents. I think that as
long as the file data is contiguous, each file should only take up a single
extent, and this is what xfs_bmap seems to be telling me:

# xfs_bmap -n1 -l -v /data/sdc/Bonnie.25448/00449/* | head
/data/sdc/Bonnie.25448/00449/000000b125mpBap4gg7U:
 EXT: FILE-OFFSET      BLOCK-RANGE            AG AG-OFFSET            TOTAL
   0: [0..1559]:       4446598752..4446600311  3 (51198864..51200423)  1560
/data/sdc/Bonnie.25448/00449/000000b1262hBudG6gV:
 EXT: FILE-OFFSET      BLOCK-RANGE            AG AG-OFFSET            TOTAL
   0: [0..1551]:       1484870256..1484871807  1 (19736960..19738511)  1552
/data/sdc/Bonnie.25448/00449/000000b127fM:
 EXT: FILE-OFFSET      BLOCK-RANGE            AG AG-OFFSET            TOTAL
   0: [0..1111]:       2954889944..2954891055  2 (24623352..24624463)  1112
/data/sdc/Bonnie.25448/00449/000000b128:

It looks like I need to get familiar with xfs_db and
http://oss.sgi.com/projects/xfs/papers/xfs_filesystem_structure.pdf
to find out what's going on.

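As a starting point, something like this should dump the on-disk inode for
one of the test files (a rough sketch: the commands are my guess, the device
name is just the one used above, and xfs_db output may be inconsistent while
the filesystem is mounted, hence -r):

# inum=$(stat -c %i /data/sdc/Bonnie.25448/00449/000000b125mpBap4gg7U)
# xfs_db -r -c "inode $inum" -c "print core.format" -c "bmap" /dev/sdc

That should show whether the extent list is held inline in the inode
(extents format) or has spilled out into a btree.
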
(These filesystems are mounted with noatime,nodiratime, incidentally.)

Regards,

Brian.
