On 10/18/2014 04:26 AM, quanjun hu wrote:
> I am using xfs on a raid 5 (~100TB) and put log on external ssd device,
> the mount information is:
> /dev/sdc on /data/fhgfs/fhgfs_storage type xfs
> when doing only reading / only writing, the speed is very fast (~1.5G), but
> when doing both the speed is very slow (100M), with high r_await (160) and
> 1. how can I reduce average request time?
> 2. can I use ssd as write/read cache for xfs?
You apparently have 31 effective SATA 7.2k RPM spindles with 256 KiB chunk,
7.75 MiB stripe width, in RAID5. That should yield 3-4.6 GiB/s of streaming
throughput, assuming no cable, expander, or HBA limitations. You're achieving
only 1/3rd to 1/2 of this. Which hardware RAID controller is this? What are
the specs? Cache RAM, host and back end cable count and type?
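The arithmetic behind those numbers, as a quick sanity check (the spindle count and the ~100-150 MiB/s per-drive streaming rate are the assumptions stated above):

```shell
# Stripe geometry and streaming estimate for 31 effective data spindles.
data_spindles=31
chunk_kib=256
echo "stripe width: $((data_spindles * chunk_kib)) KiB"   # 7936 KiB = 7.75 MiB
# Assuming ~100-150 MiB/s sustained streaming per 7.2k SATA drive:
echo "aggregate: $((data_spindles * 100))-$((data_spindles * 150)) MiB/s"
```

That works out to roughly 3.0-4.5 GiB/s aggregate, matching the 3-4.6 GiB/s figure.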
When you say read or write is fast individually, but read+write is slow, what
types of files are you reading and writing, and how many in parallel? This
combined pattern is likely the cause of the slowdown due to excessive seeking
in the drives.
As others mentioned, this isn't an XFS problem. The problem is that your RAID
geometry doesn't match your workload. Your very wide parity stripe is
apparently causing excessive seeking with your read+write workload due to
read-modify-write (RMW) operations. To mitigate this, and to increase
resiliency, you should switch to RAID6 with a smaller chunk. If you need
maximum capacity, make a single RAID6 array with a 16 KiB chunk size. This
will yield a 496 KiB stripe width (31 data spindles x 16 KiB), increasing the
odds that all writes are full stripe writes, and hopefully eliminating much
of the RMW problem.
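The format command for that layout might look like this (the device name is illustrative, and sw=31 assumes the 31 data spindles the 496 KiB figure implies):

```shell
# Hypothetical: single wide RAID6 LUN exported by the controller as /dev/sdc.
# su = per-disk chunk size, sw = number of data (non-parity) spindles.
mkfs.xfs -d su=16k,sw=31 /dev/sdc
```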
A better option might be making three 10 drive RAID6 arrays (two spares) with
32 KiB chunk, 256 KiB stripe width, and concatenating the 3 arrays with mdadm
--linear. You'd have 24 spindles of capacity and throughput instead of 31, but
no more RMW operations, or at least very few. You'd format the linear md
device with:
# mkfs.xfs -d su=32k,sw=8 /dev/mdX
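Creating the concatenated device itself might look like this sketch (the LUN names /dev/sd[def] are hypothetical; substitute whatever your controller exports):

```shell
# Hypothetical: three 10-drive RAID6 LUNs concatenated end-to-end.
mdadm --create /dev/md0 --level=linear --raid-devices=3 \
    /dev/sdd /dev/sde /dev/sdf
# su = 32 KiB chunk, sw = 8 data spindles per RAID6 array.
mkfs.xfs -d su=32k,sw=8 /dev/md0
```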
As long as your file accesses are spread fairly evenly across at least 3
directories you should achieve excellent parallel throughput, though single
file streaming throughput will peak at 800-1200 MiB/s, that of 8 drives. With
a little understanding of how this setup works, you can write two streaming
files and read a third without any of the 3 competing with one another for disk
seeks/bandwidth--which is your current problem. Or you could do one read and
one write to each of the 3 directories, and no pair would interfere with the
other pairs. Scale up from here.
Basically what we're doing is pinning each RAID LUN to its own set of
directories. When you write to one of those directories the file goes into
only one of the 3 RAID arrays. Doing this isolates RMWs for a given write to
only a subset of your disks, and minimizes the number of seeks that write
generates across the rest of them.
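This works because XFS places each new directory in an allocation group and keeps that directory's files nearby, and on a linear concat the AGs map to specific underlying arrays. One way to encourage an even spread--shown here as an illustrative sketch with untuned values, not a prescription--is an AG count that divides evenly across the 3 arrays, plus the inode64 allocator:

```shell
# With inode64, XFS rotors new directories across allocation groups,
# and a file's blocks stay near its directory's AG. 24 AGs on a
# 3-array concat puts 8 AGs on each underlying RAID6 LUN.
mkfs.xfs -d su=32k,sw=8,agcount=24 /dev/md0
mount -o inode64 /dev/md0 /data/fhgfs/fhgfs_storage
```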