
Re: performance over multiple disks

To: James Rich <james@xxxxxxxxxxxxx>
Subject: Re: performance over multiple disks
From: Ragnar Kjørstad <xfs@xxxxxxxxxxxxxxxxxxx>
Date: Thu, 31 Oct 2002 20:42:33 +0100
Cc: XFS mailing list <linux-xfs@xxxxxxxxxxx>
In-reply-to: <Pine.LNX.4.44.0210311039310.21908-100000@xxxxxxxxxxxxxxxxxxxx>; from james@xxxxxxxxxxxxx on Thu, Oct 31, 2002 at 10:49:47AM -0700
References: <Pine.LNX.4.44.0210311039310.21908-100000@xxxxxxxxxxxxxxxxxxxx>
Sender: linux-xfs-bounce@xxxxxxxxxxx
User-agent: Mutt/
On Thu, Oct 31, 2002 at 10:49:47AM -0700, James Rich wrote:
> On another mailing list a debate arose about performance over a single
> disk vs. multiple disks.  It goes something like this:
> Suppose you have a 6 megabyte file stored on disk.  Would it be read
> faster if it were stored contiguously on a single disk or spread over
> multiple (say 4) disks?
> It seems to me that as you get smaller it is faster for the single disk
> case (remember that we are assuming the file is stored contiguously - not
> spread all over the disk).  At some size it seems natural that it would be
> faster if the file were spread over multiple disks.  Can anyone comment on
> how XFS would perform?  I don't have the equipment available to test this,
> but I'm not too concerned with actual benchmark numbers.  Mostly I'm just
> wondering if I understand the filesystem correctly.

If the file is stored on a single disk the read-time will be that of a
single seek + file_size/transfer_rate (approximately). If the file is
spread over multiple disks the transfer-time will be reduced by a factor
equal to the number of disks, but depending on read-ahead and other
parameters the operation may happen serially or in parallel. If it
happens in perfect parallel it will naturally be faster on multiple
disks, but if it happens 100% serially it will be slower because it
requires more seeks. 
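The estimates above can be sketched numerically. This is only a back-of-the-envelope model; the seek time (8 ms), transfer rate (50 MB/s), and disk count are assumed numbers, not figures from the original post:

```python
# Rough model of contiguous vs. striped read time (assumed numbers).
SEEK = 0.008          # seconds per average seek (assumption)
RATE = 50e6           # bytes/second sustained transfer rate (assumption)
FILE_SIZE = 6e6       # the 6 MB file from the question

def single_disk(size):
    """One seek, then a contiguous read at full transfer rate."""
    return SEEK + size / RATE

def striped_parallel(size, disks):
    """Perfect parallelism: all disks seek at once, transfer together."""
    return SEEK + (size / disks) / RATE

def striped_serial(size, disks):
    """Fully serial: one seek per disk, chunks read one after another."""
    return disks * SEEK + size / RATE

print(single_disk(FILE_SIZE))            # 0.128 s
print(striped_parallel(FILE_SIZE, 4))    # 0.038 s
print(striped_serial(FILE_SIZE, 4))      # 0.152 s
```

With these numbers the perfectly parallel striped read wins, while the fully serial one loses to the single contiguous read, which is exactly the trade-off described above.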

So, the most obvious answer would be that for files with several MBs of
data the multiple-disk solution is faster, but it's not that simple. The
problem is that this is based on the assumption that only a single
operation is executed at once on the system - that's usually not the
case on real-life systems. 

If you do the same calculations when multiple operations are going on at
the same time you'll find that if you're reading 10 files from 10 disks
it's going to be a lot quicker if the files are _not_ spread out. 
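The 10-files-on-10-disks case can be sketched with the same toy model (same assumed 8 ms seek and 50 MB/s rate; "dedicated" means one file per disk, "striped" means every file spread over all disks):

```python
SEEK = 0.008          # seconds per average seek (assumption)
RATE = 50e6           # bytes/second transfer rate (assumption)
FILE_SIZE = 6e6       # 6 MB per file

def dedicated(files, disks):
    """One file per disk: each disk seeks once and streams its file.
    All disks work in parallel, so wall-clock time is one disk's time."""
    return SEEK + FILE_SIZE / RATE

def striped(files, disks):
    """Every file striped over every disk: each disk must seek once per
    file and transfer 1/disks of every file."""
    per_disk_bytes = files * FILE_SIZE / disks
    return files * SEEK + per_disk_bytes / RATE

print(dedicated(10, 10))  # 0.128 s
print(striped(10, 10))    # 0.200 s
```

The transfer work per disk is identical in both layouts; the striped layout loses purely because each disk now pays ten seeks instead of one.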

It's a common mistake to make, because if you run simple
(single-threaded) benchmarks the raid0 solution is the fastest one, but
in real life it's not always so.

Ragnar Kjørstad
Big Storage
