xfs
[Top] [All Lists]

Re: definitions for /proc/fs/xfs/stat

To: Dave Chinner <david@xxxxxxxxxxxxx>
Subject: Re: definitions for /proc/fs/xfs/stat
From: Mark Seger <mjseger@xxxxxxxxx>
Date: Sun, 16 Jun 2013 18:31:13 -0400
Cc: Nathan Scott <nathans@xxxxxxxxxx>, xfs@xxxxxxxxxxx
Delivered-to: xfs@xxxxxxxxxxx
Dkim-signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20120113; h=mime-version:in-reply-to:references:date:message-id:subject:from:to :cc:content-type; bh=LgMnEkDVNhLVlhuBGBo+js7z7wLavsqkPLySoRxpcEM=; b=kTISq9vSVie2XHp88PRvzbLTlOWViYIEWrmCvWFx5EYVOa1Jh0DSUreFuZNb093HZM Mf8oGnI/OEF+ZPPHXB2qv+8ZMqmwHaRl+QbvLCjN6cOhPg2wP9aDbbl59VZcUjRHAo0j vEETbGAE1/3/viPwVuylkuUgL46IwKOQ1qsOO4GBO00o5knJ0B2EAJW9qYcTyOU1mfWI xdAkbvdFshvM91lnLXndnYkhwCBQHiT8I3US13ovcaaEdLXG4sQCWyD5lW10Pyd+Zcm9 JpJyHxjMGPX1Qu/s8NhjLaAzNcYj6bpyZf+xqOptH7zfHev+gR6N+rhhMRhfCnSVGSX0 qKkA==
In-reply-to: <20130616220648.GG29338@dastard>
References: <CAC2B=ZFP_Fg34aFpk857stgB7MGcrYs9tybRS-ttw1CXNeU41Q@xxxxxxxxxxxxxx> <91017249.1356192.1371248207334.JavaMail.root@xxxxxxxxxx> <CAC2B=ZHYV6d-1PO_=-jXsQidZnYPHVwcVAaQh2mxJt=5K03AEA@xxxxxxxxxxxxxx> <504625587.1365681.1371255450937.JavaMail.root@xxxxxxxxxx> <CAC2B=ZF+eMyNLPQmhA_onDPEUqgNfcgCdZVvobNH9pofvioN7Q@xxxxxxxxxxxxxx> <20130615020414.GB29338@dastard> <CAC2B=ZEUkd+ADnQLUKj9S-3rdo2=93WbW0tbLbwwHUvkh6v7Rw@xxxxxxxxxxxxxx> <CAC2B=ZGgr5WPWOEehHDHKekM8yHgQ3QS4HMzM8+j217AfEoPyQ@xxxxxxxxxxxxxx> <20130616001130.GE29338@dastard> <CAC2B=ZFZskLnp5baVJK+R1xrpOfTkr1QXpA9jyHvxfk5Wd4yDg@xxxxxxxxxxxxxx> <20130616220648.GG29338@dastard>
There is no way that fallocate() of 1000x1k files should be causing
450MB/s of IO for 5 seconds.

I agree and that's what has me puzzled as well.
 
However, I still have no idea what you are running this test on - as
I asked in another email, can you provide some information about
the system your are seeing this problem on so we can try to work out
what might be causing this?

sorry about that.  This is an HP box with 192GB RAM and 6 2-core hyperthreaded CPUs, running ubuntu/precise

segerm@az1-sw-object-0006:~$ uname -a
Linux az1-sw-object-0006 2.6.38-16-server #68hf1026116v20120926-Ubuntu SMP Wed Sep 26 14:34:13 UTC 2012 x86_64 x86_64 x86_64 GNU/Linux

segerm@az1-sw-object-0006:~$ python --version
Python 2.7.1+

segerm@az1-sw-object-0006:~$ xfs_repair -V
xfs_repair version 3.1.4

segerm@az1-sw-object-0006:~$ cat /proc/meminfo
MemTotal:       198191696 kB
MemFree:        166202324 kB
Buffers:          193268 kB
Cached:         21595332 kB
SwapCached:           36 kB
Active:          7992268 kB
Inactive:       14068428 kB
Active(anon):     719408 kB
Inactive(anon):  5244092 kB
Active(file):    7272860 kB
Inactive(file):  8824336 kB
Unevictable:           0 kB
Mlocked:               0 kB
SwapTotal:      126889980 kB
SwapFree:       126863972 kB
Dirty:                 0 kB
Writeback:             0 kB
AnonPages:        232156 kB
Mapped:            11384 kB
Shmem:           5731388 kB
Slab:            5755820 kB
SReclaimable:    4792292 kB
SUnreclaim:       963528 kB
KernelStack:        6240 kB
PageTables:         7328 kB
NFS_Unstable:          0 kB
Bounce:                0 kB
WritebackTmp:          0 kB
CommitLimit:    225985828 kB
Committed_AS:    8851976 kB
VmallocTotal:   34359738367 kB
VmallocUsed:      725112 kB
VmallocChunk:   34159749436 kB
HardwareCorrupted:     0 kB
HugePages_Total:       0
HugePages_Free:        0
HugePages_Rsvd:        0
HugePages_Surp:        0
Hugepagesize:       2048 kB
DirectMap4k:      118648 kB
DirectMap2M:     5089280 kB
DirectMap1G:    196083712 kB

over 60 mounts, but here's the one I'm writing to:

segerm@az1-sw-object-0006:~$ mount | grep disk0
/dev/sdc1 on /srv/node/disk0 type xfs (rw,nobarrier)

not sure what you're looking for here so here's it all

segerm@az1-sw-object-0006:~$ cat /proc/partitions
major minor  #blocks  name

   8        0  976762584 sda
   8        1     248976 sda1
   8        2          1 sda2
   8        5  976510993 sda5
 251        0   41943040 dm-0
 251        1    8785920 dm-1
 251        2    2928640 dm-2
   8       16  976762584 sdb
   8       17  976760832 sdb1
 251        3  126889984 dm-3
 251        4     389120 dm-4
 251        5   41943040 dm-5
   8       32 2930233816 sdc
   8       33 2930233344 sdc1
   8       48 2930233816 sdd
   8       49 2930233344 sdd1
   8       64 2930233816 sde
   8       65 2930233344 sde1
   8       80 2930233816 sdf
   8       81 2930233344 sdf1
   8       96 2930233816 sdg
   8       97 2930233344 sdg1
   8      112 2930233816 sdh
   8      113 2930233344 sdh1
   8      128 2930233816 sdi
   8      129 2930233344 sdi1
   8      144 2930233816 sdj
   8      145 2930233344 sdj1
   8      160 2930233816 sdk
   8      161 2930233344 sdk1
   8      176 2930233816 sdl
   8      177 2930233344 sdl1
   8      192 2930233816 sdm
   8      193 2930233344 sdm1
   8      208 2930233816 sdn
   8      209 2930233344 sdn1
   8      240 2930233816 sdp
   8      241 2930233344 sdp1
   8      224 2930233816 sdo
   8      225 2930233344 sdo1
  65        0 2930233816 sdq
  65        1 2930233344 sdq1
  65       16 2930233816 sdr
  65       17 2930233344 sdr1
  65       32 2930233816 sds
  65       33 2930233344 sds1
  65      176 2930233816 sdab
  65      177 2930233344 sdab1
  66        0 2930233816 sdag
  66        1 2930233344 sdag1
  65       64 2930233816 sdu
  65       65 2930233344 sdu1
  65      128 2930233816 sdy
  65      129 2930233344 sdy1
  65      224 2930233816 sdae
  65      225 2930233344 sdae1
  65      144 2930233816 sdz
  65      145 2930233344 sdz1
  65      208 2930233816 sdad
  65      209 2930233344 sdad1
  65       48 2930233816 sdt
  65       49 2930233344 sdt1
  65      240 2930233816 sdaf
  65      241 2930233344 sdaf1
  65       96 2930233816 sdw
  65       97 2930233344 sdw1
  66       32 2930233816 sdai
  66       33 2930233344 sdai1
  66       48 2930233816 sdaj
  66       49 2930233344 sdaj1
  65      112 2930233816 sdx
  65      113 2930233344 sdx1
  65      160 2930233816 sdaa
  65      161 2930233344 sdaa1
  65       80 2930233816 sdv
  65       81 2930233344 sdv1
  65      192 2930233816 sdac
  65      193 2930233344 sdac1
  66       16 2930233816 sdah
  66       17 2930233344 sdah1
  66       64 2930233816 sdak
  66       65 2930233344 sdak1
  66      144 2930233816 sdap
  66      145 2930233344 sdap1
  66      224 2930233816 sdau
  66      225 2930233344 sdau1
  67      208 2930233816 sdbj
  67      209 2930233344 sdbj1
  66      112 2930233816 sdan
  66      113 2930233344 sdan1
  66       96 2930233816 sdam
  66       97 2930233344 sdam1
  66       80 2930233816 sdal
  66       81 2930233344 sdal1
  67       96 2930233816 sdbc
  67       97 2930233344 sdbc1
  66      128 2930233816 sdao
  66      129 2930233344 sdao1
  67      128 2930233816 sdbe
  67      129 2930233344 sdbe1
  66      176 2930233816 sdar
  66      177 2930233344 sdar1
  67      160 2930233816 sdbg
  67      161 2930233344 sdbg1
  66      160 2930233816 sdaq
  66      161 2930233344 sdaq1
  67       48 2930233816 sdaz
  67       49 2930233344 sdaz1
  66      208 2930233816 sdat
  66      209 2930233344 sdat1
  67      112 2930233816 sdbd
  67      113 2930233344 sdbd1
  67      144 2930233816 sdbf
  67      145 2930233344 sdbf1
  67       64 2930233816 sdba
  67       65 2930233344 sdba1
  67       16 2930233816 sdax
  67       17 2930233344 sdax1
  67       32 2930233816 sday
  67       33 2930233344 sday1
  66      192 2930233816 sdas
  66      193 2930233344 sdas1
  67      192 2930233816 sdbi
  67      193 2930233344 sdbi1
  66      240 2930233816 sdav
  66      241 2930233344 sdav1
  67       80 2930233816 sdbb
  67       81 2930233344 sdbb1
  67      176 2930233816 sdbh
  67      177 2930233344 sdbh1
  67        0 2930233816 sdaw
  67        1 2930233344 sdaw1

segerm@az1-sw-object-0006:~$ xfs_info /srv/node/disk0
meta-data=""              isize=1024   agcount=32, agsize=22892416 blks
         =                       sectsz=512   attr=2
data     =                       bsize=4096   blocks=732557312, imaxpct=5
         =                       sunit=64     swidth=64 blks
naming   =version 2              bsize=4096   ascii-ci=0
log      =internal               bsize=4096   blocks=357696, version=2
         =                       sectsz=512   sunit=64 blks, lazy-count=1
realtime =none                   extsz=4096   blocks=0, rtextents=0

Is this enough to describe my environment?


http://xfs.org/index.php/XFS_FAQ#Q:_What_information_should_I_include_when_reporting_a_problem.3F

> This is the same behavior I'm seeing on swift.  10K 1KB files X 4kb minimal
> block size still comes out to a lot less than the multiple GB of writes
> being reported.  Actually since these whole thing only takes a few seconds
> and I know a single disk can't write that fast maybe it's just a bug in the
> way the kernel is reported allocated preallocated blocks and nothing to do
> with XFS?  Or iis xfs responsible for the stats?

I don't know what stats you are getting your information from. XFS
is only responsible for what is in /proc/fs/xfs/,,,

I'm getting my stats from 2 places, collectl reading /proc/diskstats AND xfs from /proc/fs/xfs though I've only shown you the collectl stats.  It's easy enough to send you the /proc/fs/xfs stats if you want them as both formatted output as well as the raw /proc/fs/xfs contents every second from which they're derived.
 
> If I remove the fallocate call I see the expected amount of disk traffic.

Have you run strace to confirm that your python script is actually
calling fallocate(), and it's not being emulated by python or libc?

I haven't.  However when I saw this exact behavior with swift, we did see both fallocate being called and here's a small section showing a complete 10K PUT operation and the beginning of the next one:

    0.067874 cpu=0 pid=41977 fallocate [285] entry fd=15 mode=0x1 offset=0x0 len=10240
    0.067980 cpu=0 pid=41977 block_rq_insert dev_t=0x04100030 wr=write flags=SYNC sector=0xaec11a00 len=262144
    0.067983 cpu=0 pid=41977 block_rq_issue dev_t=0x04100030 wr=write flags=SYNC sector=0xaec11a00 len=262144
    0.068010 cpu=0 pid=41977 sched_switch prio=120 state=SSLEEP next_pid=0 next_prio=120
    0.068125 cpu=0 pid=0 block_rq_complete dev_t=0x04100030 wr=write flags=SYNC sector=0xaec11a00 len=262144 qpid=41977 spid=41977 qtm=  0.000004 svtm=  0.000142
    0.068175 cpu=0 pid=48168 sched_wakeup target_pid=41977 prio=120 target_cpu=12 success=1
    0.068180 cpu=12 pid=0 sched_switch prio=n/a state=n/a next_pid=41977 next_prio=120
    0.068185 cpu=12 pid=41977 fallocate [285] ret=0x0 syscallbeg=    0.000311 fd=15 mode=0x1 offset=0x0 len=10240

I hope this helps but if there's any more I can provide I'll be happy to do so.
-mark

 
Cheers,

Dave.
--
Dave Chinner
david@xxxxxxxxxxxxx

<Prev in Thread] Current Thread [Next in Thread>