On Sat, Feb 04, 2012 at 03:44:25PM -0500, Joe Landman wrote:
> >Sure it can. A gluster volume consists of "bricks". Each brick is served by
> >a glusterfsd process listening on a different TCP port. Those bricks can be on
> >the same server or on different servers.
> I seem to remember that the Gluster folks abandoned this model
> (using their code versus MD RAID) on single servers due to
> performance issues. We did play with this a few times, and the
> performance wasn't that good. Basically limited by single disk
> seek/write speed.
I raised the same question on the gluster-users list recently and there
seemed to be no clear-cut answer; some people were using Gluster to
aggregate RAID nodes, and some were using it to mirror individual disks.
I do like the idea of having individual filesystems per disk, making data
recovery much more straightforward and allowing for efficient
However, I also like the idea of low-level RAID, which lets you pop out and
replace a disk invisibly to the higher levels, and is perhaps better
battle-tested than gluster file-level replication.
> RAID in this case can protect you from some of these issues (single
> disk failure issues, being replaced by RAID issues), but unless you
> are building mirror pairs of bricks on separate units, this magical
> "automatic" isn't quite so.
That was the idea: having mirror bricks on different nodes.
server1:/brick1 <-> server2:/brick1
server1:/brick2 <-> server2:/brick2, etc.
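A quick way to sanity-check a planned layout like this is to confirm that every mirror pair spans two different servers. What follows is a minimal Python sketch, assuming bricks are written as host:/path strings; parse_brick and same_server_pairs are hypothetical helper names for illustration, not part of any Gluster tool.

```python
# Hypothetical helpers for illustration only; this is not a Gluster API.

def parse_brick(spec):
    """Split a 'host:/path' brick spec into (host, path)."""
    host, sep, path = spec.partition(":")
    if not sep or not path.startswith("/"):
        raise ValueError(f"not a host:/path brick spec: {spec!r}")
    return host, path


def same_server_pairs(pairs):
    """Return the mirror pairs whose two bricks live on the same server."""
    bad = []
    for left, right in pairs:
        if parse_brick(left)[0] == parse_brick(right)[0]:
            bad.append((left, right))
    return bad


# Intended layout: each brick on server1 mirrored on server2.
layout = [
    ("server1:/brick1", "server2:/brick1"),
    ("server1:/brick2", "server2:/brick2"),
]
print(same_server_pairs(layout))  # [] -> every pair spans two servers

# A pair with both halves on one node is flagged.
print(same_server_pairs([("server2:/brick2", "server2:/brick2")]))
```

An empty result means no mirror pair depends on a single node; anything returned is a pair that one server failure would take out completely.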
> Moreover, last I checked, Gluster made no guarantees as to the
> ordering of the layout for mirrors. So if you have more than one
> brick per node, and build mirror pairs with the "replicate" option,
> you have to check the actual hashing to make sure it did what you
> expect. Or build up the mirror pairs more carefully.
AFAICS it does guarantee the ordering:
"Note: The number of bricks should be a multiple of the replica count for a
distributed replicated volume. Also, the order in which bricks are specified
has a great effect on data protection. Each replica_count consecutive bricks
in the list you give will form a replica set, with all replica sets combined
into a volume-wide distribute set. To make sure that replica-set members are
not placed on the same node, list the first brick on every server, then the
second brick on every server in the same order, and so on."
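The rule quoted above can be sketched in a few lines of Python: build the brick list in the recommended order (first brick on every server, then the second brick on every server, and so on), then cut it into consecutive groups of replica_count to see which bricks end up mirrored together. This is a hypothetical illustration of the documented grouping, assuming two servers and replica 2; the function names are made up and nothing here calls Gluster itself.

```python
# Hypothetical sketch of the documented brick-ordering rule; not a Gluster API.

def interleaved_bricks(servers, paths):
    """Order bricks as the docs recommend: the first brick on every server,
    then the second brick on every server in the same order, and so on."""
    return [f"{server}:{path}" for path in paths for server in servers]


def replica_sets(bricks, replica_count):
    """Group each replica_count consecutive bricks into one replica set."""
    if len(bricks) % replica_count != 0:
        raise ValueError("brick count must be a multiple of the replica count")
    return [bricks[i:i + replica_count]
            for i in range(0, len(bricks), replica_count)]


bricks = interleaved_bricks(["server1", "server2"], ["/brick1", "/brick2"])
for rs in replica_sets(bricks, 2):
    print(rs)
# ['server1:/brick1', 'server2:/brick1']
# ['server1:/brick2', 'server2:/brick2']
```

The printed groups are the replica sets you would expect if the bricks were listed in that order when creating the volume. Swapping the loop order (all of server1's bricks first) would instead pair server1:/brick1 with server1:/brick2, which is exactly the same-server mirror to avoid.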
> At this point, it sounds like there is a gluster side of this
> discussion that I'd recommend you take to the gluster list. There
> is an xfs portion as well which is fine here.
Understood. Whatever the final solution looks like, I'm totally sold on XFS.
> Disclosure: we build/sell/support gluster (and other) based systems
> atop xfs based RAID units (both hardware and software RAID;
> 1,10,6,60,...) so we have inherent biases.
You also have inherent experience, and that is extremely valuable as I try
to pick the best storage model which will work for us going forward.