To: xfs@xxxxxxxxxxx
Subject: Re: XFS IO multiplication problem on centos/rhel 6 using hp p420i raid controllers
From: Dennis Kaarsemaker <dennis.kaarsemaker@xxxxxxxxxxx>
Date: Wed, 06 Mar 2013 14:53:12 +0100
In-reply-to: <20130228194023.GQ5551@dastard>
Organization: Booking.com
References: <1362060736.1247.30.camel@seahawk> <20130228194023.GQ5551@dastard>

On Fri, 2013-03-01 at 06:40 +1100, Dave Chinner wrote:
> On Thu, Feb 28, 2013 at 03:12:16PM +0100, Dennis Kaarsemaker wrote:
> > Hello XFS developers,
> > 
> > I have a problem as described in the subject. If I read the xfs website
> > correctly, this would be a place to ask for support with that problem.
> > Before I spam you all with details, please confirm if this is true or
> > direct me to a better place. Thanks!
> 
> CentOS/RHEL problems can be triaged up to a point here. i.e. we will
> make an effort to pinpoint the problem, but we give no guarantees
> and we definitely can't fix it. If you want a better triage guarantee
> and to talk to someone who is able to fix the problem, you need to
> work through the problem with your RHEL support contact.

Hi Dave,

Thanks for responding. We have filed support tickets with HP and Red Hat
as well; I was trying to parallelize the search for an answer, as the
problem is really getting in the way here. So much so that I've offered
a bottle of $favourite_drink on a serverfault question to whoever
solves it. That offer applies here too :)

> Either way:
> 
> http://xfs.org/index.php/XFS_FAQ#Q:_What_information_should_I_include_when_reporting_a_problem.3F

A summary of the problem is this:

[root@bc290bprdb-01 ~]# collectl
#<----CPU[HYPER]-----><----------Disks-----------><----------Network---------->
#cpu sys inter  ctxsw KBRead  Reads KBWrit Writes   KBIn  PktIn  KBOut  PktOut 
   1   0  1636   4219     16      1   2336    313    184    195     12     133 
   1   0  1654   2804     64      3   2919    432    391    352     20     208 

[root@bc291bprdb-01 ~]# collectl
#<----CPU[HYPER]-----><----------Disks-----------><----------Network---------->
#cpu sys inter  ctxsw KBRead  Reads KBWrit Writes   KBIn  PktIn  KBOut  PktOut 
   1   0  2220   3691    332     13  39992    331    112    122      6      92 
   0   0  1354   2708      0      0  39836    335    103    125      9      99 
   0   0  1563   3023    120      6  44036    369    399    317     13     188 

Notice the KBWrit difference. These are two identical hp gen 8 machines,
doing the same thing (replicating the same mysql schema). The one
writing well over ten times as many bytes in roughly the same number of
writes is running centos 6 (and was running rhel 6).
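
To put a number on that difference, here is a rough sketch that averages
the write size over the collectl samples pasted above. The sample values
are copied from the output; the function and variable names are only
illustrative.

```python
# (KBWrit, Writes) pairs copied from the collectl samples above.
# bc290 is the centos 5 host, bc291 is the centos 6 host.
centos5_samples = [(2336, 313), (2919, 432)]
centos6_samples = [(39992, 331), (39836, 335), (44036, 369)]


def avg_kb_per_write(samples):
    """Average KB written per write operation over all samples."""
    total_kb = sum(kb for kb, _ in samples)
    total_writes = sum(writes for _, writes in samples)
    return total_kb / total_writes


c5 = avg_kb_per_write(centos5_samples)
c6 = avg_kb_per_write(centos6_samples)

print(f"centos 5: {c5:.1f} KB per write")
print(f"centos 6: {c6:.1f} KB per write")
print(f"ratio:    {c6 / c5:.1f}x")
```

On these few samples that comes to about 7 KB per write on centos 5
versus about 120 KB per write on centos 6, so the number of writes is
similar but each write is far larger.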

Changing to gen7 hardware (hp p410 controller instead of p420), running
centos 5 on the newer hardware (with its older xfs version, obviously),
or using ext3 instead of xfs on either makes the writes "normal sized"
again. We're most likely doing something wrong with XFS but can't figure
out what.

Any hint to get us moving in the right direction would be most helpful.

Now all the info asked for in that wikipage:

uname -a: Linux bc291bprdb-01.lhr4.prod.booking.com 2.6.32-279.1.1.el6.x86_64 #1 SMP Tue Jul 10 13:47:21 UTC 2012 x86_64 x86_64 x86_64 GNU/Linux

xfsprogs version: xfs_repair version 3.1.1

number of CPUs: 2 x 8-core hyperthreaded Intel(R) Xeon(R) CPU E5-2670 0 @ 2.60GHz

meminfo:

MemTotal:       99026008 kB
MemFree:          502692 kB
Buffers:          176964 kB
Cached:         44630620 kB
SwapCached:            0 kB
Active:         71178012 kB
Inactive:       24698980 kB
Active(anon):   48478524 kB
Inactive(anon):  2591228 kB
Active(file):   22699488 kB
Inactive(file): 22107752 kB
Unevictable:           0 kB
Mlocked:               0 kB
SwapTotal:       1048312 kB
SwapFree:        1048312 kB
Dirty:              4740 kB
Writeback:             0 kB
AnonPages:      51069600 kB
Mapped:            32992 kB
Shmem:               184 kB
Slab:            1517096 kB
SReclaimable:    1444392 kB
SUnreclaim:        72704 kB
KernelStack:        6152 kB
PageTables:       105940 kB
NFS_Unstable:          0 kB
Bounce:                0 kB
WritebackTmp:          0 kB
CommitLimit:    50561316 kB
Committed_AS:   87014212 kB
VmallocTotal:   34359738367 kB
VmallocUsed:      483464 kB
VmallocChunk:   34308527752 kB
HardwareCorrupted:     0 kB
AnonHugePages:  49096704 kB
HugePages_Total:       0
HugePages_Free:        0
HugePages_Rsvd:        0
HugePages_Surp:        0
Hugepagesize:       2048 kB
DirectMap4k:        8120 kB
DirectMap2M:     3102720 kB
DirectMap1G:    97517568 kB

mounts:

rootfs / rootfs rw 0 0
proc /proc proc rw,relatime 0 0
sysfs /sys sysfs rw,relatime 0 0
devtmpfs /dev devtmpfs rw,relatime,size=49503776k,nr_inodes=12375944,mode=755 0 0
devpts /dev/pts devpts rw,relatime,gid=5,mode=620,ptmxmode=000 0 0
tmpfs /dev/shm tmpfs rw,relatime 0 0
/dev/mapper/sysvm-root / ext4 rw,relatime,barrier=1,stripe=192,data=ordered 0 0
/proc/bus/usb /proc/bus/usb usbfs rw,relatime 0 0
/dev/sda1 /boot ext4 rw,relatime,barrier=1,stripe=768,data=ordered 0 0
none /proc/sys/fs/binfmt_misc binfmt_misc rw,relatime 0 0
debugfs /sys/kernel/debug debugfs rw,relatime 0 0
/dev/mapper/sysvm-mysqlVol /mysql/bp xfs rw,relatime,attr2,delaylog,allocsize=1024k,logbsize=256k,sunit=512,swidth=1536,noquota 0 0

raid layout:

hp p420i raid controller
7 x 600 GB SAS disk (HP EG0600FBLSH)
raid 1+0 with one hot spare

LVM: default red hat config (lvm.conf attached)
The raid array is partitioned into /boot swap and a PV for LVM
One volume group containing that PV
2 LVs: root and mysql. Root is ext4, mysql is xfs

[root@bc291bprdb-01 ~]# lvdisplay 
  --- Logical volume ---
  LV Path                /dev/sysvm/root
  LV Name                root
  VG Name                sysvm
  LV UUID                2xXk8Q-gor3-Ql0S-EKI3-dA20-E9el-FH8eDX
  LV Write Access        read/write
  LV Creation host, time bc291bprdb-01.lhr4.prod.booking.com, 2013-02-28 09:36:01 +0100
  LV Status              available
  # open                 1
  LV Size                39.06 GiB
  Current LE             1250
  Segments               1
  Allocation             inherit
  Read ahead sectors     auto
  - currently set to     256
  Block device           253:0
   
  --- Logical volume ---
  LV Path                /dev/sysvm/mysqlVol
  LV Name                mysqlVol
  VG Name                sysvm
  LV UUID                v2yezw-Ry8i-wy2d-PjZD-QHeJ-refb-96oAH8
  LV Write Access        read/write
  LV Creation host, time bc291bprdb-01.lhr4.prod.booking.com, 2013-02-28 09:49:58 +0100
  LV Status              available
  # open                 1
  LV Size                300.00 GiB
  Current LE             9600
  Segments               1
  Allocation             inherit
  Read ahead sectors     auto
  - currently set to     256
  Block device           253:1

type of disks: see raid config

write cache status: no disk write cache but raid controller cache

size of bbwc and mode:
   Cache Board Present: True
   Cache Status: OK
   Accelerator Ratio: 10% Read / 90% Write
   Cache Backup Power Source: Capacitors
   Battery/Capacitor Count: 1
   Battery/Capacitor Status: OK

xfs_info:

[root@bc291bprdb-01 ~]# xfs_info /mysql/bp/
meta-data=/dev/mapper/sysvm-mysqlVol isize=256    agcount=16, agsize=4915136 blks
         =                       sectsz=512   attr=2
data     =                       bsize=4096   blocks=78642176, imaxpct=25
         =                       sunit=64     swidth=192 blks
naming   =version 2              bsize=4096   ascii-ci=0
log      =internal               bsize=4096   blocks=38400, version=2
         =                       sectsz=512   sunit=64 blks, lazy-count=1
realtime =none                   extsz=4096   blocks=0, rtextents=0
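
As a sanity check on the stripe geometry, the sketch below converts the
values from the mount options and from xfs_info to bytes. It assumes the
sunit/swidth mount options are expressed in 512-byte units (the usual
XFS mount option convention), while xfs_info reports them in filesystem
blocks of bsize=4096. The constant and variable names are illustrative.

```python
SECTOR_BYTES = 512      # assumed unit of sunit/swidth in the mount options
FS_BLOCK_BYTES = 4096   # bsize reported by xfs_info

# Values copied from the mounts listing and the xfs_info output above.
mount_sunit, mount_swidth = 512, 1536
info_sunit, info_swidth = 64, 192

mount_sunit_bytes = mount_sunit * SECTOR_BYTES
mount_swidth_bytes = mount_swidth * SECTOR_BYTES
info_sunit_bytes = info_sunit * FS_BLOCK_BYTES
info_swidth_bytes = info_swidth * FS_BLOCK_BYTES

# Number of stripe units that make up one full stripe width.
units_per_width = info_swidth // info_sunit

print(f"mount options: sunit={mount_sunit_bytes // 1024} KiB, "
      f"swidth={mount_swidth_bytes // 1024} KiB")
print(f"xfs_info:      sunit={info_sunit_bytes // 1024} KiB, "
      f"swidth={info_swidth_bytes // 1024} KiB")
print(f"stripe units per stripe width: {units_per_width}")
```

Under that assumption both sources agree on a 256 KiB stripe unit and a
768 KiB stripe width, i.e. three stripe units per width, which matches
three mirrored pairs out of the six active disks in the raid 1+0 set.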


And for reference, xfs_info on centos 5:

[root@bc290bprdb-01 ~]# xfs_info /mysql/bp/
meta-data=/dev/sysvm/mysqlVol    isize=256    agcount=22, agsize=4915200 blks
         =                       sectsz=512   attr=0
data     =                       bsize=4096   blocks=104857600, imaxpct=25
         =                       sunit=0      swidth=0 blks, unwritten=1
naming   =version 2              bsize=4096  
log      =internal               bsize=4096   blocks=32768, version=1
         =                       sectsz=512   sunit=0 blks, lazy-count=0
realtime =none                   extsz=4096   blocks=0, rtextents=0

dmesg output: there are no errors on the filesystem or in dmesg

iostat/vmstat output: attached
-- 
Dennis Kaarsemaker, Systems Architect
Booking.com
Herengracht 597, 1017 CE Amsterdam
Tel external +31 (0) 20 715 3409
Tel internal (7207) 3409

Attachment: lvm.conf
Description: Text document

Attachment: vmstat.rh5.txt
Description: Text document

Attachment: iostat.rh5.txt
Description: Text document

Attachment: vmstat.rh6.txt
Description: Text document

Attachment: iostat.rh6.txt
Description: Text document
