Performance problem - reads slower than writes

Joe Landman landman at scalableinformatics.com
Sat Feb 4 14:44:25 CST 2012


On 02/04/2012 03:04 PM, Brian Candler wrote:
> On Sat, Feb 04, 2012 at 06:49:23AM -0600, Stan Hoeppner wrote:

[...]

> Sure it can. A gluster volume consists of "bricks". Each brick is served by
> a glusterd process listening on a different TCP port. Those bricks can be on
> the same server or on different servers.

I seem to remember that the Gluster folks abandoned this model (using 
their code rather than MD RAID) on single servers due to performance 
issues.  We played with it a few times ourselves, and the performance 
wasn't good; it was basically limited by single-disk seek/write speed.
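
For anyone who wants to reproduce the test, the single-server, 
one-brick-per-disk layout looks roughly like this with the current 
CLI (hostnames and paths are invented for illustration; the 2.x/3.0 
releases we tested used hand-written volume files instead):

    # one brick per disk, all on the same host; gluster does the
    # distribution instead of md
    gluster volume create testvol \
        node1:/bricks/d1 node1:/bricks/d2 \
        node1:/bricks/d3 node1:/bricks/d4
    gluster volume start testvol
    mount -t glusterfs node1:/testvol /mnt/testvol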

>
>> Even if what you describe can be done with Gluster, the performance will
>> likely be significantly less than a properly setup mdraid or hardware
>> raid.  Again, if it can be done, I'd test it head-to-head against RAID.
>
> I'd expect similar throughput but higher latency. Given that I'm using low

My recollection is that this wasn't the case.  Performance was 
suboptimal in all cases we tried.

> RPM drives which already have high latency, I'm hoping the additional
> latency will be insignificant.  Anyway, I'll know more once I've done the
> measurements.

I did this with the 3.0.x and the 2.x series of Gluster.  Usually atop 
xfs of some flavor.
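
The bricks themselves were nothing special; each disk carried a plain 
xfs filesystem along these lines (device names and mount options are 
placeholders, not a tuning recommendation):

    # per-disk xfs brick, no md underneath
    mkfs.xfs /dev/sdb
    mkdir -p /bricks/d1
    mount -o noatime,inode64 /dev/sdb /bricks/d1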

>
>> I've never been a fan of parity RAID, let alone double parity RAID.
>
> I'm with you on that one.

RAID's entire purpose in life is to give an administrator time to run in 
and change a disk.  RAID isn't a backup, or even a guarantee of data 
retention.  Many do treat it this way though, to their (and their 
data's) peril.

> The attractions of gluster are:
> - being able to scale a volume across many nodes, transparently to
>    the clients

This does work, though rebalance speed is limited as much by the seek 
and bandwidth of the slowest link as by anything else.  So if you have 
20 drives and you do a rebalance to add 5 more, it's going to be slow 
for a while.
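
For concreteness, the add/rebalance step with the current CLI looks 
something like this (volume and brick names invented):

    # add the new bricks, then migrate existing data onto them
    gluster volume add-brick testvol node1:/bricks/d5 node1:/bricks/d6
    gluster volume rebalance testvol start
    # this is the slow part -- watch it with:
    gluster volume rebalance testvol status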

> - being able to take a whole node out of service, while clients
>    automatically flip over to the other
>

I hate to put it like this, but this is true for various definitions of 
the word "automatically".  You need to make sure that your definitions 
line up with the reality of "automatic".

If a brick goes away, and you have a file on this brick you want to 
overwrite, it doesn't (unless you have a mirror) flip over to another 
unit "automatically" or otherwise.

RAID in this case can protect you from some of these issues (a single 
disk failure becomes a RAID issue rather than a Gluster issue), but 
unless you are building mirror pairs of bricks on separate units, this 
magical "automatic" isn't quite so.

Moreover, last I checked, Gluster made no guarantees as to the 
ordering of the layout for mirrors.  So if you have more than one 
brick per node and build mirror pairs with the "replicate" option, 
you have to check the actual hashing to make sure it did what you 
expect, or build up the mirror pairs more carefully.
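
In the versions I have looked at, the replica sets appear to be built 
from consecutive bricks in the order given on the create command (so 
the example above pairs node1:/bricks/d1 with node2:/bricks/d1), but 
check rather than assume:

    # brick order in the output reflects the replica pairing
    gluster volume info mirrorvol
    # the distribute layout can also be inspected directly on a brick
    getfattr -n trusted.glusterfs.dht -e hex /bricks/d1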

At this point, it sounds like there is a gluster side of this discussion 
that I'd recommend you take to the gluster list.  There is an xfs 
portion as well which is fine here.

Disclosure:  we build/sell/support gluster (and other) based systems 
atop xfs based RAID units (both hardware and software RAID; 
1,10,6,60,...) so we have inherent biases.  Your mileage may vary.  See 
your doctor if your re-balance exceeds 4 hours.

> Regards,
>
> Brian.

Joe

-- 
Joseph Landman, Ph.D
Founder and CEO
Scalable Informatics Inc.
email: landman at scalableinformatics.com
web  : http://scalableinformatics.com
        http://scalableinformatics.com/sicluster
phone: +1 734 786 8423 x121
fax  : +1 866 888 3112
cell : +1 734 612 4615


