
Re: Problem about very high Average Read/Write Request Time

To: quanjun hu <huquanjun@xxxxxxxxx>, xfs@xxxxxxxxxxx
Subject: Re: Problem about very high Average Read/Write Request Time
From: Stan Hoeppner <stan@xxxxxxxxxxxxxxxxx>
Date: Sun, 19 Oct 2014 16:16:56 -0500
Delivered-to: xfs@xxxxxxxxxxx
In-reply-to: <CALSoAzD4ccHXBuD6mT3ggqMf1j_kDEK-RNMOeRLq+N+NiWVQXg@xxxxxxxxxxxxxx>
References: <CALSoAzD4ccHXBuD6mT3ggqMf1j_kDEK-RNMOeRLq+N+NiWVQXg@xxxxxxxxxxxxxx>
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:24.0) Gecko/20100101 Icedove/24.7.0

On 10/18/2014 04:26 AM, quanjun hu wrote:
> Hi,
>    I am using xfs on a raid 5 (~100TB) and put log on external ssd device, 
> the mount information is:
> /dev/sdc on /data/fhgfs/fhgfs_storage type xfs 
> (rw,relatime,attr2,delaylog,logdev=/dev/sdb1,sunit=512,swidth=15872,noquota).
>   When doing only reads or only writes the speed is very fast (~1.5G), but
> when doing both at once the speed is very slow (100M), with high r_await (160)
> and w_await (200000).
>    1. how can I reduce average request time?
>    2. can I use ssd as write/read cache for xfs?

You apparently have 31 effective SATA 7.2k RPM spindles with 256 KiB chunk,
7.75 MiB stripe width, in RAID5.  That should yield 3-4.6 GiB/s of streaming
throughput assuming no cable, expander, or HBA limitations.  You're achieving
only one third to one half of this.  Which hardware RAID controller is this?
What are the specs?  Cache RAM, host and back end cable count and type?
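
For reference, the geometry figures above can be read straight off the mount
options.  A minimal sketch, assuming sunit/swidth are given in 512-byte units
(as they are for XFS mount options) and assuming roughly 100-150 MiB/s of
streaming throughput per 7.2k drive.  The per-drive rate is an assumption
inferred from the 800-1200 MiB/s figure quoted for 8 drives, not a measurement:

```python
SECTOR_BYTES = 512  # XFS mount options sunit/swidth are in 512-byte units


def decode_geometry(sunit, swidth):
    """Return (chunk KiB, stripe width KiB, data spindles)."""
    chunk_kib = sunit * SECTOR_BYTES / 1024
    width_kib = swidth * SECTOR_BYTES / 1024
    return chunk_kib, width_kib, swidth // sunit


# Values from the mount line: sunit=512,swidth=15872
chunk_kib, width_kib, spindles = decode_geometry(sunit=512, swidth=15872)
print(chunk_kib, "KiB chunk,", width_kib / 1024, "MiB stripe,", spindles, "spindles")

# Assumed per-drive streaming rates in MiB/s (not measured)
low_mib_s, high_mib_s = spindles * 100, spindles * 150
print(low_mib_s, "-", high_mib_s, "MiB/s aggregate")
```

This gives a 256 KiB chunk, 7.75 MiB stripe width, 31 data spindles, and an
aggregate in the neighborhood of the 3-4.6 GiB/s estimate.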

When you say read or write is fast individually, but read+write is slow, what 
types of files are you reading and writing, and how many in parallel?  This 
combined pattern is likely the cause of the slowdown due to excessive seeking 
in the drives.

As others mentioned this isn't an XFS problem.  The problem is that your RAID 
geometry doesn't match your workload.  Your very wide parity stripe is 
apparently causing excessive seeking with your read+write workload due to 
read-modify-write operations.  To mitigate this, and to increase resiliency, 
you should switch to RAID6 with a smaller chunk.  If you need maximum capacity 
make a single RAID6 array with 16 KiB chunk size.  This will yield a 496 KiB 
stripe width, increasing the odds that all writes are a full stripe, and 
hopefully eliminating much of the RMW problem.
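
To see why stripe width matters here, the following rough sketch splits a
write into full stripes and leftover partial-stripe bytes.  The partial bytes
are the ones that can force a read-modify-write.  The function name and the
1 MiB example write are hypothetical, and the model ignores any coalescing
done by the controller's write cache:

```python
KIB = 1024


def split_write(offset, length, stripe_width):
    """Return (full_stripes, partial_bytes) for a write at a byte offset."""
    end = offset + length
    first_full = -(-offset // stripe_width) * stripe_width  # round offset up
    last_full = end // stripe_width * stripe_width          # round end down
    if last_full <= first_full:
        return 0, length  # the write never covers a whole stripe
    full = (last_full - first_full) // stripe_width
    partial = (first_full - offset) + (end - last_full)
    return full, partial


# A hypothetical aligned 1 MiB write against the stripe widths discussed
for width_kib in (7936, 496, 256):
    print(width_kib, "KiB stripe:", split_write(0, 1024 * KIB, width_kib * KIB))
```

With the current 7.75 MiB stripe the whole 1 MiB write is partial.  With a
496 KiB stripe it is two full stripes plus a 32 KiB tail, and with a 256 KiB
stripe it is four full stripes and nothing left over.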

A better option might be making three 10-drive RAID6 arrays (two spares) with
32 KiB chunk, 256 KiB stripe width, and concatenating the 3 arrays with mdadm
--linear.  You'd have 24 spindles of capacity and throughput instead of 31, but
no more RMW operations, or at least very few.  You'd format the linear md
device with

# mkfs.xfs -d su=32k,sw=8 /dev/mdX

As long as your file accesses are spread fairly evenly across at least 3 
directories you should achieve excellent parallel throughput, though single 
file streaming throughput will peak at 800-1200 MiB/s, that of 8 drives.  With 
a little understanding of how this setup works, you can write two streaming 
files and read a third without any of the 3 competing with one another for disk 
seeks/bandwidth--which is your current problem.  Or you could do one read and 
one write to each of 3 directories, and no pair would interfere with the
other pairs.  Scale up from here.

Basically what we're doing is isolating each RAID LUN into a set of 
directories.  When you write to one of those directories the file goes into 
only one of the 3 RAID arrays.  Doing this isolates RMWs for a given write to 
only a subset of your disks, and minimizes the number of seeks generated by
parallel accesses.
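
The directory-to-array isolation can be illustrated with a small sketch.  It
assumes XFS spreads new directories across allocation groups (AGs) and keeps
a directory's files in or near that AG, and that the AGs are laid out
end to end across the linear device.  The 24 TiB array size and the count of
72 AGs are hypothetical numbers chosen for illustration, not values from your
system:

```python
TIB = 2**40


def array_for_offset(offset, array_sizes):
    """Index of the member array holding a byte offset of the linear device."""
    start = 0
    for index, size in enumerate(array_sizes):
        if offset < start + size:
            return index
        start += size
    raise ValueError("offset beyond end of device")


def ag_to_array(ag_count, array_sizes):
    """Map each AG (by its starting offset) to the array it lands on."""
    ag_size = sum(array_sizes) // ag_count
    return [array_for_offset(ag * ag_size, array_sizes) for ag in range(ag_count)]


# Hypothetical: three arrays of 8 data drives x 3 TiB, and 72 AGs of 1 TiB
mapping = ag_to_array(72, [24 * TIB] * 3)
print(mapping[0], mapping[24], mapping[48])  # first AG on each array
```

Under these assumptions AGs 0-23 sit on the first array, 24-47 on the second,
and 48-71 on the third, so directories created in different thirds of the AG
range do their I/O on different sets of disks.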

