Re: How to deal with XFS stripe geometry mismatch with hardware RAID5

To: troby <Thorn.Roby@xxxxxxxxxxxxx>
Subject: Re: How to deal with XFS stripe geometry mismatch with hardware RAID5
From: Brian Candler <B.Candler@xxxxxxxxx>
Date: Wed, 14 Mar 2012 21:05:14 +0000
Cc: xfs@xxxxxxxxxxx
In-reply-to: <33504048.post@xxxxxxxxxxxxxxx>
References: <33498437.post@xxxxxxxxxxxxxxx> <4F605877.2030304@xxxxxxxxxxxxxxxxx> <33504048.post@xxxxxxxxxxxxxxx>
On Wed, Mar 14, 2012 at 10:43:44AM -0700, troby wrote:
> Mongo pre-allocates its datafiles and zero-fills them (there is a short
> header at the start of each, not rewritten as far as I know)  and then
> writes to them sequentially, wrapping around when it hits the end. In this
> case the entire load is inserts, no updates, hence the sequential writes.
> The data will not wrap around for about 6 months, at which time old files
> will be overwritten starting from the beginning. The BBU is functioning and
> the cache is set to write-back. The files are memory-mapped, I'll check
> whether fsync is used. Flushing is done about every 30 seconds and takes
> about 8 seconds.

How much data has been added to mongodb in those 30 seconds?

If everything really was being written sequentially then I reckon you could
write about 6.6GB in that time (11 disks x 75MB/sec x 8 seconds). From your
posting I suspect you are not achieving that level of performance :-)
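
To put a number on that, here is a minimal Python sketch of the same back-of-envelope estimate. It assumes 11 data disks streaming about 75MB/sec each (decimal megabytes) over an 8-second flush; those figures are the rough assumptions from the paragraph above, not measurements, and the function names are made up for illustration.

```python
def ideal_sequential_bytes(data_disks: int, mb_per_sec_per_disk: float, seconds: float) -> float:
    """Upper bound on bytes written if every data disk streams sequentially.

    Uses decimal megabytes (1 MB = 10**6 bytes), matching the rough estimate.
    """
    return data_disks * mb_per_sec_per_disk * 1e6 * seconds


def flush_efficiency(bytes_flushed: float, data_disks: int = 11,
                     mb_per_sec_per_disk: float = 75.0, seconds: float = 8.0) -> float:
    """Ratio of what was actually flushed to the ideal sequential figure."""
    return bytes_flushed / ideal_sequential_bytes(data_disks, mb_per_sec_per_disk, seconds)


if __name__ == "__main__":
    ideal = ideal_sequential_bytes(11, 75.0, 8.0)
    print(f"ideal: {ideal / 1e9:.1f} GB")
```

Feed flush_efficiency() the amount mongodb actually wrote in one flush cycle; a result far below 1.0 would suggest the data is not reaching the disks as long sequential runs.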

If it really is being written sequentially to a contiguous file then the
stripe alignment won't make any difference, because this is just a big
pre-allocated file, and XFS will do its best to give one big contiguous
chunk of space for it.

Anyway, you don't need to guess these things; you can easily find out.

(1) Is the file preallocated and contiguous, or fragmented?

    # xfs_bmap /path/to/file

This will show you if you get one huge extent. If you get a number of large
extents (say 100MB+) that would be fine for performance too.  If you get
lots of shrapnel then there's a problem.
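
For a big file the extent list can be long, so a rough Python sketch for summarizing it may help. It assumes the default (non-verbose) xfs_bmap output, where each line looks like "0: [0..2097151]: 96..2097247" in 512-byte blocks and holes are printed as "hole"; lines in any other form are simply ignored. The helper names and the 100MB threshold are illustrative choices, not part of xfs_bmap.

```python
import re
import subprocess
import sys

# Default xfs_bmap line: "N: [startoff..endoff]: startblock..endblock" or "...: hole".
# All numbers are in 512-byte blocks. Lines that do not match are ignored.
EXTENT_RE = re.compile(r"^\s*\d+:\s*\[(\d+)\.\.(\d+)\]:\s*(\d+\.\.\d+|hole)")


def parse_bmap(text: str) -> list[int]:
    """Return the size in bytes of each allocated extent, skipping holes."""
    sizes = []
    for line in text.splitlines():
        m = EXTENT_RE.match(line)
        if m and m.group(3) != "hole":
            start, end = int(m.group(1)), int(m.group(2))
            sizes.append((end - start + 1) * 512)
    return sizes


def summarize(sizes: list[int], big: int = 100 * 2**20) -> dict:
    """Count extents and report how much of the file sits in 'big' extents."""
    total = sum(sizes)
    big_bytes = sum(s for s in sizes if s >= big)
    return {
        "extents": len(sizes),
        "big_extents": sum(1 for s in sizes if s >= big),
        "big_fraction": big_bytes / total if total else 0.0,
    }


if __name__ == "__main__":
    out = subprocess.run(["xfs_bmap", sys.argv[1]],
                         capture_output=True, text=True, check=True).stdout
    print(summarize(parse_bmap(out)))
```

A big_fraction close to 1.0 means nearly all of the file lives in large extents; many small extents is the "shrapnel" case.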

(2) Are you really writing sequentially?

    # btrace /dev/whatever | grep ' [DC] '

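This will show you the block requests dispatched to the device [D] and completed by it [C], including the sector each one starts at, so you can see whether consecutive writes follow on from each other.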
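
Eyeballing a long trace gets tedious, so here is a minimal Python sketch that reads btrace/blkparse lines and reports what fraction of dispatched writes start exactly where the previous one ended. It assumes the default output format (device, CPU, sequence, time, PID, action, RWBS, "sector + count", [process]); a custom blkparse format string would break the field positions. The function names are hypothetical.

```python
import sys


def parse_blkparse(lines):
    """Yield (action, rwbs, sector, nsectors) from default-format blkparse lines.

    Assumed layout: dev cpu seq time pid action rwbs sector + nsectors [comm]
    Summary lines and events without a sector range are skipped.
    """
    for line in lines:
        f = line.split()
        if len(f) >= 10 and f[8] == "+" and f[7].isdigit() and f[9].isdigit():
            yield f[5], f[6], int(f[7]), int(f[9])


def sequential_fraction(events) -> float:
    """Fraction of dispatched writes that begin where the previous one ended."""
    prev_end = None
    hits = total = 0
    for action, rwbs, sector, nsectors in events:
        if action != "D" or "W" not in rwbs:
            continue  # only dispatched writes matter here
        if prev_end is not None:
            total += 1
            hits += sector == prev_end
        prev_end = sector + nsectors
    return hits / total if total else 0.0


if __name__ == "__main__":
    frac = sequential_fraction(parse_blkparse(sys.stdin))
    print(f"{frac:.1%} of dispatched writes were contiguous with the previous one")
```

One way to use it: capture the trace to a file first (btrace /dev/whatever > trace.txt, stopped with Ctrl-C), then run the script with trace.txt on stdin. Some reordering between queues will pull the fraction below 100% even for a basically sequential workload, so treat it as a rough indicator.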

And at a higher level:

    # strace -p <pid-of-mongodb-process>

will show you the seek/write/read operations that the application is performing.
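
Since the files are memory-mapped, the data may never show up as write() calls; you may mostly see msync/fsync instead. A rough Python sketch that tallies which system calls appear in strace output, to get the overall picture before reading individual lines. It assumes each traced call starts a line, optionally prefixed with "[pid N] " (the format on stderr with -f) or "N " (the format with -f -o file); "<... resumed>" continuation lines are deliberately not counted twice. The function name is hypothetical, and strace's own -c option gives a similar summary.

```python
import re
import sys
from collections import Counter

# Matches "msync(...", "[pid  1234] msync(..." and "1234  msync(...".
# "<... msync resumed>" lines do not match, so split calls count once.
CALL_RE = re.compile(r"^(?:\[pid\s+\d+\]\s+|\d+\s+)?(\w+)\(")


def tally_syscalls(lines) -> Counter:
    """Count how often each system call name appears in strace output."""
    counts = Counter()
    for line in lines:
        m = CALL_RE.match(line)
        if m:
            counts[m.group(1)] += 1
    return counts


if __name__ == "__main__":
    for name, n in tally_syscalls(sys.stdin).most_common(10):
        print(f"{n:8d}  {name}")
```

For example, save the trace with strace -f -o trace.txt -p <pid-of-mongodb-process> and pass trace.txt to the script on stdin.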

Once you have the answers to those, you can make a better judgement as to
what's happening.

(3) One other thing to check:

    # cat /sys/block/xxx/bdi/read_ahead_kb
    # cat /sys/block/xxx/queue/max_sectors_kb

Increasing those to 1024 (echo 1024 > ....) may make some improvement.
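
A minimal Python sketch of the same change, using the sysfs paths above. It needs root, the device name ("sdb" in the usage note) is a hypothetical placeholder, and the values do not survive a reboot. max_sectors_kb cannot be raised above the hardware limit in queue/max_hw_sectors_kb, so the sketch caps the requested value there; the sysfs_root parameter exists only so it can be tried against a scratch directory.

```python
from pathlib import Path


def tune_queue(dev: str, value_kb: int = 1024, sysfs_root: str = "/sys/block") -> dict:
    """Raise read-ahead and maximum request size for a block device.

    Writes value_kb to bdi/read_ahead_kb, and min(value_kb, max_hw_sectors_kb)
    to queue/max_sectors_kb. Returns the values before and after.
    """
    base = Path(sysfs_root) / dev
    readahead = base / "bdi" / "read_ahead_kb"
    max_sectors = base / "queue" / "max_sectors_kb"
    hw_limit = int((base / "queue" / "max_hw_sectors_kb").read_text())

    def snapshot() -> dict:
        return {"read_ahead_kb": int(readahead.read_text()),
                "max_sectors_kb": int(max_sectors.read_text())}

    before = snapshot()
    readahead.write_text(f"{value_kb}\n")
    max_sectors.write_text(f"{min(value_kb, hw_limit)}\n")
    return {"before": before, "after": snapshot()}


if __name__ == "__main__":
    print(tune_queue("sdb"))  # hypothetical device name; run as root
```

Rerun the benchmark after each change so you can tell which setting, if either, made the difference.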

> One thing I'm wondering is whether the incorrect stripe structure I
> specified with mkfs is actually written into the file system structure

I'd guess that things like inode chunks are stripe-aligned. But if you're
really writing sequentially to a huge contiguous file, then it won't matter
anyway.
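
If you do want to compare what mkfs recorded (xfs_info shows sunit/swidth) with the real array, the units are easy to mix up: mkfs.xfs takes su in bytes and sw as a count of data disks, the sunit/swidth mount options are in 512-byte sectors, and xfs_info reports them in filesystem blocks. A small Python sketch of that conversion; the 64KiB chunk size, 11 data disks (a 12-disk RAID5) and 4KiB block size used below are assumptions for illustration, so substitute the controller's actual values.

```python
def xfs_stripe_geometry(chunk_bytes: int, data_disks: int, fs_block: int = 4096) -> dict:
    """Express a RAID stripe in the units the XFS tools use.

    mkfs.xfs -d su=<bytes>,sw=<data disks>
    mount -o sunit=<512-byte sectors>,swidth=<512-byte sectors>
    xfs_info reports sunit/swidth in filesystem blocks
    """
    if chunk_bytes % 512:
        raise ValueError("chunk size must be a multiple of 512 bytes")
    sunit_sectors = chunk_bytes // 512
    return {
        "mkfs_su_bytes": chunk_bytes,
        "mkfs_sw": data_disks,
        "mount_sunit": sunit_sectors,
        "mount_swidth": sunit_sectors * data_disks,
        "xfs_info_sunit_blks": chunk_bytes // fs_block,
        "xfs_info_swidth_blks": chunk_bytes * data_disks // fs_block,
    }


if __name__ == "__main__":
    # Assumed geometry: 64KiB chunk, 11 data disks, 4KiB filesystem blocks.
    print(xfs_stripe_geometry(64 * 1024, 11))
```

If the xfs_info figures don't match what the function gives for the real controller settings, the recorded geometry is the mismatched one.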
