

To: Dave Chinner <david@xxxxxxxxxxxxx>
Subject: Re: XFS IO multiplication problem on centos/rhel 6 using hp p420i raid controllers
From: Dennis Kaarsemaker <dennis.kaarsemaker@xxxxxxxxxxx>
Date: Fri, 08 Mar 2013 12:49:00 +0100
Cc: xfs@xxxxxxxxxxx
In-reply-to: <1362733748.20926.6.camel@seahawk>
Organization: Booking.com
References: <1362060736.1247.30.camel@seahawk> <20130228194023.GQ5551@dastard> <1362577992.1247.84.camel@seahawk> <20130307035737.GC6369@dastard> <1362651128.16657.13.camel@seahawk> <20130307230020.GX23616@dastard> <1362733748.20926.6.camel@seahawk>
On Fri, 2013-03-08 at 10:09 +0100, Dennis Kaarsemaker wrote:
> On Fri, 2013-03-08 at 10:00 +1100, Dave Chinner wrote:
> > On Thu, Mar 07, 2013 at 11:12:08AM +0100, Dennis Kaarsemaker wrote:
> > > On Thu, 2013-03-07 at 14:57 +1100, Dave Chinner wrote:
> > > > On Wed, Mar 06, 2013 at 02:53:12PM +0100, Dennis Kaarsemaker wrote:
> > ....
> > > > > #<----CPU[HYPER]-----><----------Disks-----------><----------Network---------->
> > > > > #cpu sys inter  ctxsw KBRead  Reads KBWrit Writes   KBIn  PktIn  KBOut  PktOut
> > > > >    1   0  1636   4219     16      1   2336    313    184    195     12     133
> > > > >    1   0  1654   2804     64      3   2919    432    391    352     20     208
> > > > > 
> > > > > [root@bc291bprdb-01 ~]# collectl
> > > > > #<----CPU[HYPER]-----><----------Disks-----------><----------Network---------->
> > > > > #cpu sys inter  ctxsw KBRead  Reads KBWrit Writes   KBIn  PktIn  KBOut  PktOut
> > > > >    1   0  2220   3691    332     13  39992    331    112    122      6      92
> > > > >    0   0  1354   2708      0      0  39836    335    103    125      9      99
> > > > >    0   0  1563   3023    120      6  44036    369    399    317     13     188
> > > > > 
> > > > > Notice the KBWrit difference. These are two identical hp gen 8
> > > > > machines, doing the same thing (replicating the same mysql schema).
> > > > > The one writing ten times as many bytes in the same amount of
> > > > > transactions is running centos 6 (and was running rhel 6).
> > > > 
> > > > So what is the problem? Is it writing too much on the centos
> > > > 6 machine? Either way, this doesn't sound like a filesystem problem
> > > > - the size and amount of data writes is entirely determined by the
> > > > application.
> > > 
> > > For performing the same amount of work (processing the same mysql
> > > transactions, the same amount of IO transactions resulting from them),
> > > the 'broken' case writes ten-ish times as many bytes.
> > 
> > Thanks for clarifying.
> > 
> > > > > /dev/mapper/sysvm-mysqlVol /mysql/bp xfs rw,relatime,attr2,delaylog,allocsize=1024k,logbsize=256k,sunit=512,swidth=1536,noquota 0 0
> > > > 
> > > > What is the reason for using allocsize, sunit/swidth? Are you using
> > > > them on other machines?
> > > 
> > > xfs autodetects them from the hpsa driver. They seem to be correct for
> > > the raid layout (256 strips, 3 drives per mirror pool) and I don't seem
> > > to be able to override them.
> > 
> > That's fine, they're set correctly. I'd forgotten that the numbers
> > are emitted in /proc/mounts even when they are not specified as
> > mount options.
> > 
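
As a sanity check on the geometry discussed above, the /proc/mounts values can be converted by hand. This is only a minimal sketch, assuming (per the XFS mount option documentation) that sunit and swidth are expressed in 512-byte units; the function name is made up for illustration.

```python
SECTOR_BYTES = 512  # assumption: sunit/swidth mount values are in 512-byte units


def stripe_geometry(sunit, swidth):
    """Convert sunit/swidth mount values to (unit KiB, width KiB, data units)."""
    unit_kib = sunit * SECTOR_BYTES // 1024
    width_kib = swidth * SECTOR_BYTES // 1024
    data_units = swidth // sunit  # number of stripe units per full stripe
    return unit_kib, width_kib, data_units


# Values taken from the mount line quoted above.
print(stripe_geometry(512, 1536))
```

With sunit=512 and swidth=1536 this works out to a 256 KiB stripe unit, a 768 KiB stripe width and 3 stripe units per stripe, which agrees with the raid layout described above.
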
> > > > And if you remove the allocsize mount option, does the behaviour on
> > > > centos6.3 change? What happens if you set allocsize=4k?
> > > 
> > > The allocsize parameter has no effect. It was put in place to correct a
> > > monitoring issue: due to mysql's access patterns, using the default
> > > large allocsize on rhel 6 makes our monitoring report the filesystem as
> > > much fuller than it actually is.
> > 
> > Which is due to speculative EOF preallocation, and so it is only set
> > on the CentOS box that is showing the larger write behaviour? Have
> > you tried setting it to 4k? If not, please do - EOF preallocation for
> > sparse extending writes can result in extra zeroing occurring, and
> > so if it is anything related to the filesystem, this is the likely
> > culprit. Setting it to 4k sets it back to the default value used
> > on older versions of Linux....
> 
> I've set it to 4k, but no change, though I haven't rebuilt the files yet
> with this setting (doing that as we speak, takes 90 minutes). I'm also
> wondering how this could cause the increasing bytes out as reported by
> vmstat, should zeroing do that?

Unfortunately, even on a rebuilt filesystem, the symptoms did not
change.
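
To put a number on the multiplication, the collectl samples quoted above can be reduced to an average size per write. This is a rough sketch only: the figures are copied from the two collectl runs in this thread, the labels "baseline" and "centos6" are my own shorthand for the two machines, and a handful of one-second samples is of course not a benchmark.

```python
# (KBWrit, Writes) pairs copied from the collectl output quoted in this thread.
SAMPLES = {
    "baseline": [(2336, 313), (2919, 432)],
    "centos6": [(39992, 331), (39836, 335), (44036, 369)],
}


def avg_kb_per_write(samples):
    """Total KB written divided by total write requests over all samples."""
    total_kb = sum(kb for kb, _ in samples)
    total_writes = sum(writes for _, writes in samples)
    return total_kb / total_writes


for label, samples in SAMPLES.items():
    print(f"{label}: {avg_kb_per_write(samples):.1f} KB per write")
```

The number of writes per second is nearly the same on both machines, but the average write is roughly 7 KB on the baseline machine versus roughly 120 KB on the centos 6 machine.
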
-- 
Dennis Kaarsemaker, Systems Architect
Booking.com
Herengracht 597, 1017 CE Amsterdam
Tel external +31 (0) 20 715 3409
Tel internal (7207) 3409