RAID setups, usage, Q's: effect of spindle groups... etc... blah blah blah...
Linda Walsh
xfs at tlinx.org
Sun Jan 20 21:19:20 CST 2013
Stan Hoeppner wrote:
> On 1/19/2013 6:46 PM, Dave Chinner wrote:
>> On Sat, Jan 19, 2013 at 03:55:17PM -0800, Linda Walsh wrote:
>
>>> All that talk about RAIDs recently got me a bit depressed
>>> when I realized that while I can get fast linear speeds, typical speeds when seeking
>>> around are about 1/10th-1/20th of that... sigh.
>>>
>>> Might that indicate that I should go with smaller RAID5 groups, but more of
>>> them? I.e. instead of 3 groups of RAID5 striped as a RAID0, go for 4-5 groups
>>> of RAID5 striped as a RAID0? Just aligning the darn things nearly takes a rocket
>>> scientist! But then you start talking about multiple spindles and optimizing
>>> IOPs... ARG!... ;-) (If it wasn't challenging, I'd find it boring...)...
>> Somebody on the list might be able to help you with this - I don't
>> have the time right now as I'm deep in metadata CRC changes...
>
> I have time, Dave. Hey Linda, if you're going to re-architect your storage,
> the first thing I'd do is ditch that RAID50 setup. RAID50 exists
> strictly to reduce some of the penalties of RAID5. But then you find
> new downsides specific to RAID50, including the alignment issues you
> mentioned.
----
Well, like I said, I don't have a pressing need, since it works
fairly well for most of my activities. But it's hard to characterize my
workload, as I do development and experimenting. My most recent failure (well,
not entirely) that I wouldn't call 'closed' was trying to increase the bandwidth between
my workstation and my server. I generally run the server as a 'backend' file store
for my workstation; though I do Linux devel, I work & play through a Win7
workstation. The server provides content for my living-room 'TV'[sic] as well as
music. Those are relatively low drain. I take breaks throughout the day from
software work/programming to watch a video or play the occasional game.
If I do any one thing for too long, I'm liable to worsen my back and RSI problems.
I ran a diff between my primary media disk and the duplicate of it that
I use as a backup. I'd just finished synchronizing them with rsync, then decided to
use my media library (~6.5T) as a test bed for the 'dedup' program I'm working on.
(I screwed myself once before when I thought it was working but it wasn't, and I
didn't catch it immediately -- which is why I only used it as a test bed after doing a
full sync, and then ran a diff -r on the two disks.) Fortunately, even though
the dedup program found and linked about 140 files on the media disk, they
were all correct. The diff ran in about 5 hours, averaging around 400+MB/s,
likely limited by the media disk, as it's a simple 5-disk/4-data-spindle RAID5.
I have 4 separate RAIDs:
1) Boot+OS: RAID5, 2 data spindles; short-stroked at 50%, 68GB 15K SAS Hitachi drives.
   (Just noticed today they aren't exactly matched: 2 are MUA3073RC, 1 is MBA3073RC.
   Odd.) This array is optimized more for faster seeking (the 50% usage is limited to
   the outside tracks) than for linear speed -- I may migrate those to SSDs at some point.
2) Downloaded + online media + SW: RAID5, 4 data spindles using 2TB (1.819TiB) Hitachi
   Ultrastar 7.2K SATAs (note: the disks in #3 & #4 are the same type).
3) Main data+devel disk: RAID50, 12 data spindles in 3 groups of 4. NOTE: I tried
   and benchmarked RAID60 but wasn't happy with the performance, not to mention the
   disk-space hit. RAID10 would be a bit too decadent for my usage/budget.
4) Backups: RAID6, 6 data spindles. Not the fastest config, but it's not bad
   for backups.
#3 is my play/devel/experimentation RAID; it's divided up with LVM. #4 and #2 have
an LVM layer as well, but since it's currently a 1:1 mapping it doesn't come into
play much, other than eating a few MB and possibly letting me reorganize them more
easily in the future. On #3 I'm currently using 12.31TB in 20 partitions (but only 3
work partitions); the rest are snapshots (only 1 is a live snapshot, the others are
copies of diffs for those dates).
-------------
NOTE:
One thing that had me less happy than usual with the speed: the internal
battery on #3's controller was going through reconditioning, which meant the cache
policy switched to write-through (WT) instead of write-back (WB).
I think that was causing me some noticeable slowdown
-- I just found out about it last night while reviewing the controller log.
Note -- I generally like the RAID50s. They don't "REALLY" have a stripe size of
768k -- that's just the optimal write size before a write wraps around and hits the
same disk again. Since it is a RAID50, any small write only needs to update 1 of the
RAID5 groups, so the effective stripe size is 256k, which is far more
reasonable/normal.
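
For reference, a quick back-of-the-envelope sketch of where those two numbers
come from (the 64 KiB per-disk chunk size is my assumption; it isn't stated above):

# Rough stripe arithmetic for the RAID50 above: 3 RAID5 groups of 4 data
# spindles each (5 disks per group), striped together as a RAID0.
# The 64 KiB per-disk chunk size is an assumption, not taken from the setup notes.

CHUNK_KIB = 64          # per-disk strip size (assumed)
DATA_PER_GROUP = 4      # data spindles in each RAID5 group
GROUPS = 3              # RAID5 groups striped together

group_stripe = CHUNK_KIB * DATA_PER_GROUP   # full stripe of one RAID5 group
full_stripe = group_stripe * GROUPS         # full width of the whole RAID50

print(f"RAID5 group stripe : {group_stripe} KiB")   # 256 KiB
print(f"RAID50 full width  : {full_stripe} KiB")    # 768 KiB
# A small write only dirties one RAID5 group, so the read-modify-write unit
# is the 256 KiB group stripe, not the 768 KiB RAID0 width.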
Cards: 1 internal Dell Perc 6/i (serving #1 & #2 above -- all internal drives)
       1 LSI MR9280DE-8e (serving #3 & #4)
Enclosures: 2 LSI-DE1600-SAS (12x3.5" each)
> Briefly describe your workload(s), the total capacity you have now, and
> (truly) need now and project to need 3 years from now.
---
3 years from now? Ha! Let's just say that the dollar dropping as fast as disk
prices have over the past 4 years has flamboozled any normal planning.
I was mostly interested in how increasing the number of spindle groups
in a RAID50 would help parallelism. My thinking
was that since each member of a RAID0 can be read or written independently
of any other member (as there is no parity to check across members), IF I wanted to
increase parallelism (while hurting maximum throughput AND disk space), I
**could** reconfigure to... well, the extreme would be 5 groups of 2-data/3-disk
RAID5s. That would, I think, theoretically (and if the controller is up to
it, which I think it is), allow *up_to* 5 separate reads/writes to be served
in parallel, vs. now, where I think it would be 3.
A middling approach is to use an extra disk (16 total instead of 15)
and go with 4 groups of RAID5 @ 3 data disks each -- which would give the same
space, but consume my spare. I'm unclear about what it would do to maximum
throughput, but it would likely go down a bit on writes, with the parity write
overhead increasing from 25% to 33% (a rough comparison of the three layouts is
sketched below).
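
To make that concrete, here's my own little back-of-the-envelope comparison of
the three layouts I keep talking about, assuming identical 2TB drives and
ignoring controller/cache effects:

# Compare RAID50 layouts as (groups, data disks per group); each RAID5
# group adds one parity disk. Assumes identical 2TB drives; ignores the
# controller, cache policy and hot-spare details.

DRIVE_TB = 2.0
layouts = [(3, 4),   # current:  3 groups of 4+1 -> 15 disks
           (4, 3),   # middling: 4 groups of 3+1 -> 16 disks
           (5, 2)]   # extreme:  5 groups of 2+1 -> 15 disks

print(f"{'layout':>9} {'disks':>5} {'data':>4} {'usable':>7} {'parity/data':>11} {'parallel':>8}")
for groups, data in layouts:
    label = f"{groups}x({data}+1)"
    disks = groups * (data + 1)          # total spindles incl. parity
    data_disks = groups * data           # spindles carrying data
    usable = data_disks * DRIVE_TB       # rough usable space in TB
    overhead = groups / data_disks       # parity disks per data disk
    # At best, each RAID5 group can service one independent small I/O.
    print(f"{label:>9} {disks:>5} {data_disks:>4} {usable:>6.1f}T {overhead:>10.0%} {groups:>8}")

Which matches the 25% -> 33% jump above, and shows the extreme 5-group layout
giving up ~4TB of space for the two extra streams.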
It was, I thought, a fairly simple question, but I have a history
of sometimes thinking things will be easier than they are, in proportion to
how far away (in the future, or someone else doing it! ;-)) something is...
> If it is needed, I'll recommend vendor specific hardware if you like
> that will plug into your existing gear, or I can provide information on
> new dissimilar brand storage gear. And of course I'll provide necessary
> Linux and XFS configuration information optimized to the workload and
> hardware. I'm not trying to consult here, just providing
> information/recommendations.
----
My **GENERAL** plan, if prices had cooperated, was to move
to 3TB SATAs and **maybe** a 3rd enclosure -- I sorta like the LSI ones;
they seem pretty solid. I have tried a few others and generally found them
not as good, but I've kept to the economical side since this is for
a home office^h^h^h^h^h^hlab^h^h^hplay setup...
>
> In general, yes, more spindles will always be faster if utilized
> properly. But depending on your workload(s) you might be able to fix
> your performance problems by simply moving your current array to non
> parity RAID10, layered stripe over RAID1 pairs, concat, etc, thus
> eliminating the RMW penalty entirely.
----
Consider this -- my max read and write (both) on my
large array is 1GB/s. There's no way I could get that with a RAID10 setup
without a much larger number of disks. Though I admit concurrency would
rise... but I generate most of my own workload, so usually I don't have
too many things going on at the same time... a few, maybe...
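
Roughly, assuming each of these 7.2K SATA drives streams on the order of
100MB/s (my assumption, not a measured number), the disk counts work out
something like this:

import math

PER_DISK_MBS  = 100      # assumed sequential MB/s per 7.2K SATA drive
TARGET_MBS    = 1000     # ~1GB/s
DATA_SPINDLES = 12       # data spindles in the current RAID50
DRIVE_TB      = 2.0

# Current RAID50 (3 x 4+1): all 12 data spindles stream in parallel for
# both reads and writes (ignoring parity computation overhead).
print(f"RAID50: {DATA_SPINDLES + 3} disks, ~{DATA_SPINDLES * PER_DISK_MBS} MB/s, "
      f"{DATA_SPINDLES * DRIVE_TB:.0f}TB usable")

# RAID10 with the same usable capacity needs two disks per data spindle;
# sequential writes only scale with the number of mirror pairs.
pairs_for_writes = math.ceil(TARGET_MBS / PER_DISK_MBS)
print(f"RAID10: {2 * DATA_SPINDLES} disks for the same capacity, "
      f"{2 * pairs_for_writes} disks just to hit {TARGET_MBS} MB/s on writes")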
When an xfs_fsr run kicks in and starts swallowing disk cache, *ahem*,
and the daily backup kicks in, AND the daily rsync to create a static
snapshot... things can slow down a bit... but rarely am I up at those hours...
The most intensive is the xfs_fsr, partly due to it swallowing
up disk cache (it runs at nice -19 ionice -c3, and I can still feel it!)...
I might play more with putting it in its own blkio cgroup
and just limiting its overall disk transactions (not to mention
fixing that disk-buffer usage issue)... roughly along the lines of the sketch below.
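
Something like this is what I have in mind (a sketch against the cgroup-v1
blkio controller; the group name, the 10MiB/s cap, the device major:minor and
the PID are all made-up placeholders):

import os

CGROUP    = "/sys/fs/cgroup/blkio/fsr_throttle"  # made-up group name
DEVICE    = "8:32"                 # major:minor of the array's block device (placeholder)
LIMIT_BPS = 10 * 1024 * 1024       # 10 MiB/s cap (placeholder)

os.makedirs(CGROUP, exist_ok=True)

# Cap both read and write bandwidth for anything placed in this group.
for knob in ("blkio.throttle.read_bps_device", "blkio.throttle.write_bps_device"):
    with open(os.path.join(CGROUP, knob), "w") as f:
        f.write(f"{DEVICE} {LIMIT_BPS}\n")

# Move an already-running xfs_fsr into the group by PID.
fsr_pid = 12345                    # placeholder; look up the real xfs_fsr PID
with open(os.path.join(CGROUP, "tasks"), "w") as f:
    f.write(str(fsr_pid))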
> You'll need more drives to maintain the same usable capacity,
---
(oh, a minor detail! ;^))...
;-)
Don't spend much time on this... (well, if you read it, that might be too much
already! ;-))... As I said, it's not THAT important... and it was mostly about
how the number of groups in a RAID50 relates to the performance tradeoffs.
Thanks for any insights... (I'm always open to learning how wrong I am! ;-))...