
Re: mount: Function not implemented?

To: Anthony Biacco <ABiacco@xxxxxxxxx>
Subject: Re: mount: Function not implemented?
From: Steve Lord <lord@xxxxxxx>
Date: Tue, 20 Jul 2004 10:59:10 -0500
Cc: linux-xfs@xxxxxxxxxxx
In-reply-to: <74918D8CA17F7C418753F01078F10B6BD08616@xxxxxxxxxxxxxxxxxxxxxxxx>
References: <74918D8CA17F7C418753F01078F10B6BD08616@xxxxxxxxxxxxxxxxxxxxxxxx>
Sender: linux-xfs-bounce@xxxxxxxxxxx
User-agent: Mozilla Thunderbird 0.7.1 (X11/20040626)
Anthony Biacco wrote:

>> Raw devices don't do any bigger I/O's, this is merely the unit of
>> allocation used by the filesystem, not the unit of I/O to the disk
>> drives. XFS will still allocate disk space in large contiguous chunks.
>
> They ALLOW bigger I/Os.
> Raw devices aren't limited by the page size or the OS cache. That's the
> whole purpose of the raw device. I could use them, but I don't want the
> maintenance nightmare of 1 oracle DB file per device.

If you read the linux kernel code, you will see that raw devices
are in exactly the same boat as the filesystem is when it comes
to actual I/Os.

Data is submitted to the block device in page sized chunks from
both. Up until a recent patch, which I think is only in the 2.6-mm
tree right now, the order in which memory was allocated meant that the
chances of two pages of memory in a user application being physically
contiguous, and hence mergeable into a single DMA scatter-gather
element, were minimal to say the least.
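
For what it's worth, here is a minimal user-space sketch of the kind of
request we are talking about (the device path and the request size are
made up for illustration). Whether this goes through a raw device or
through O_DIRECT on a filesystem, the kernel pins the buffer's pages
and builds its scatter-gather list from them one page at a time,
merging entries only when the pages happen to be physically adjacent:

    #define _GNU_SOURCE            /* for O_DIRECT */
    #include <fcntl.h>
    #include <stdio.h>
    #include <stdlib.h>
    #include <unistd.h>

    int main(void)
    {
        long pagesize = sysconf(_SC_PAGESIZE);
        size_t len = 1024 * 1024;          /* 1 Mbyte request, made-up size */
        void *buf;

        /* O_DIRECT wants an aligned buffer; page alignment is safe */
        if (posix_memalign(&buf, pagesize, len) != 0) {
            perror("posix_memalign");
            return 1;
        }

        /* /dev/sdb is a placeholder device */
        int fd = open("/dev/sdb", O_RDONLY | O_DIRECT);
        if (fd < 0) {
            perror("open");
            return 1;
        }

        /* One read() from the application's point of view; inside the
         * kernel the buffer is pinned page by page and the request is
         * split into device-sized commands. */
        ssize_t n = read(fd, buf, len);
        if (n < 0)
            perror("read");
        else
            printf("read %zd bytes\n", n);

        close(fd);
        free(buf);
        return 0;
    }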

The upshot of all of this is that if you submit a 100 Mbyte I/O from
user space using raw I/O, or using O_DIRECT on a filesystem like XFS
(which probably puts the whole 100 Mbytes in one spot on disk), then
because of memory allocation and the Linux block layer you end up
splitting it into SCSI commands which contain at most 128 pages each
(you run out of scatter-gather elements). Given enough memory and
channel bandwidth, you are much more likely to saturate your controller
than the channel itself.
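
To put rough numbers on it (assuming 4 Kbyte pages):

    128 pages/command * 4 Kbytes/page   = 512 Kbytes per SCSI command
    100 Mbytes / 512 Kbytes per command = 200 SCSI commands

so that one logical I/O turns into a couple of hundred commands the
controller has to process.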

Oh, and O_DIRECT and raw device access will actually chop that
large I/O up anyway because they both use the same code to limit
how much user memory you can have pinned down for I/O at once.



>> Large block sizes I think helped Irix more than they would Linux.
>
> Agreed.
>
> *sigh* maybe I'll check out OCFS.


Which is sitting on exactly the same infrastructure here.

Steve


