xfs
[Top] [All Lists]

A huge latency in ext4 and xfs because of stable page write

To: linux-ext4@xxxxxxxxxxxxxxx, xfs@xxxxxxxxxxx, linux-fsdevel@xxxxxxxxxxxxxxx, tytso@xxxxxxx, david@xxxxxxxxxxxxx, hch@xxxxxxxxxxxxx, darrick.wong@xxxxxxxxxx
Subject: A huge latency in ext4 and xfs because of stable page write
From: Zheng Liu <gnehzuil.liu@xxxxxxxxx>
Date: Tue, 11 Dec 2012 16:45:21 +0800
Dkim-signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20120113; h=date:from:to:subject:message-id:mail-followup-to:mime-version :content-type:content-disposition:user-agent; bh=h/ainV14yKD/MMNNKrdwRXyr8129+Bd7CBLytTv1rRA=; b=CYibJGHeqHBTlZ2Ji2OTHHD489R8t22T3w0FvquzW6EX2j+bZIT0IUj7sYpIl5eK5G EYVAX6/XdPut02VBUZskFbW9rgeQYV9gwiteQJyeKZDQrh3bejcjuF5pKHccuf32v83J P2p+ygx4tns8s4SmXk7whzH6TwOHIdnZXrHqPKWmunb/IEkH5IY4ectBZrSZK9OO8tri WJra04Uq1pZ8FVyKhgcuuahvXGZxZxpyAC3LGL1XUd2cxCkU3DDrP2S6NnRV+y9Vxyrl +89gDS+ZRtFDrW3jOgBdF21K4Xs38ejbXPNBo79MdoW9jLNykbzcrF/L5x2Xx2SAqfCx jgQg==
Mail-followup-to: linux-ext4@xxxxxxxxxxxxxxx, xfs@xxxxxxxxxxx, linux-fsdevel@xxxxxxxxxxxxxxx, tytso@xxxxxxx, david@xxxxxxxxxxxxx, hch@xxxxxxxxxxxxx, darrick.wong@xxxxxxxxxx
User-agent: Mutt/1.5.21 (2010-09-15)
Hi all,

At Tao Bao we meet a problem in our product system which causes a huge latency
because of stable page write.  This problem is easy to reproduce in a testing
environment, and I can reproduce it in my desktop with a SATA disk.  Here is the
fio config file that is used to reproduce this problem.

config file
-----------
[global]
iodepth=1
directory=/mnt/sda3
direct=0
group_reporting
thread
fallocate=0
runtime=120

[log-append]
ioengine=sync
rw=write
bs=1k
size=10g
filesize=10g
rate=5m
numjobs=1

I run this test case in ext4 and xfs, and both of them are affected by stable
page write.  The result shows as below.  Please notice the result of latency.

ext4 w/ stable page write
-------------------------
log-append: (groupid=0, jobs=1): err= 0: pid=3455: Tue Dec 11 14:41:27 2012
  write: io=614401KB, bw=5119.1KB/s, iops=5119 , runt=120001msec
    clat (usec): min=3 , max=225702 , avg= 7.32, stdev=411.95
     lat (usec): min=3 , max=225702 , avg= 7.50, stdev=411.95
    clat percentiles (usec):
     |  1.00th=[    3],  5.00th=[    3], 10.00th=[    3], 20.00th=[    4], 
     | 30.00th=[    4], 40.00th=[    4], 50.00th=[    4], 60.00th=[    5], 
     | 70.00th=[    6], 80.00th=[   12], 90.00th=[   13], 95.00th=[   14],
     | 99.00th=[   15], 99.50th=[   16], 99.90th=[   18], 99.95th=[   19],
     | 99.99th=[   25] 
    bw (KB/s)  : min= 5108, max= 5134, per=100.00%, avg=5120.92, stdev= 1.58
    lat (usec) : 4=15.93%, 10=59.05%, 20=24.98%, 50=0.03%, 100=0.01%
    lat (usec) : 250=0.01%
    lat (msec) : 20=0.01%, 250=0.01%
  cpu          : usr=0.18%, sys=4.31%, ctx=118850, majf=18446744073709551605,
minf=18446744073709538970
  IO depths    : 1=100.0%, 2=0.0%, 4=0.0%, 8=0.0%, 16=0.0%, 32=0.0%, >=64=0.0%
     submit    : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     complete  : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     issued    : total=r=0/w=614401/d=0, short=r=0/w=0/d=0

Run status group 0 (all jobs):
  WRITE: io=614401KB, aggrb=5119KB/s, minb=5119KB/s, maxb=5119KB/s,
mint=120001msec, maxt=120001msec

Disk stats (read/write):
  sda: ios=79/1287, merge=5/151907, ticks=702/147202, in_queue=147861, 
util=6.25%

ext4 w/o stable page write
--------------------------
log-append: (groupid=0, jobs=1): err= 0: pid=3062: Tue Dec 11 14:52:32 2012
  write: io=614400KB, bw=5119.1KB/s, iops=5119 , runt=120001msec
    clat (usec): min=3 , max=14565 , avg= 6.15, stdev=25.35
     lat (usec): min=3 , max=14566 , avg= 6.29, stdev=25.35
    clat percentiles (usec):
     |  1.00th=[    3],  5.00th=[    3], 10.00th=[    3], 20.00th=[    4], 
     | 30.00th=[    4], 40.00th=[    4], 50.00th=[    4], 60.00th=[    4], 
     | 70.00th=[    5], 80.00th=[   12], 90.00th=[   12], 95.00th=[   13],
     | 99.00th=[   15], 99.50th=[   15], 99.90th=[   17], 99.95th=[   18],
     | 99.99th=[   64] 
    bw (KB/s)  : min= 5108, max= 5140, per=100.00%, avg=5122.26, stdev= 4.54
    lat (usec) : 4=17.33%, 10=57.65%, 20=24.98%, 50=0.02%, 100=0.01%
    lat (usec) : 250=0.01%
    lat (msec) : 2=0.01%, 20=0.01%
  cpu          : usr=4.12%, sys=0.00%, ctx=58904, majf=18446744073709551605,
minf=18446744073709538969
  IO depths    : 1=100.0%, 2=0.0%, 4=0.0%, 8=0.0%, 16=0.0%, 32=0.0%, >=64=0.0%
     submit    : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     complete  : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     issued    : total=r=0/w=614400/d=0, short=r=0/w=0/d=0

Run status group 0 (all jobs):
  WRITE: io=614400KB, aggrb=5119KB/s, minb=5119KB/s, maxb=5119KB/s,
mint=120001msec, maxt=120001msec

Disk stats (read/write):
  sda: ios=34/1254, merge=4/152032, ticks=157/144066, in_queue=147725, 
util=5.82%

xfs w/ stable page write
------------------------
log-append: (groupid=0, jobs=1): err= 0: pid=2848: Tue Dec 11 15:39:25 2012
  write: io=614401KB, bw=5119.1KB/s, iops=5119 , runt=120002msec
    clat (usec): min=1 , max=635643 , avg= 5.02, stdev=1141.37
     lat (usec): min=1 , max=635643 , avg= 5.17, stdev=1141.37
    clat percentiles (usec):
     |  1.00th=[    1],  5.00th=[    1], 10.00th=[    1], 20.00th=[    1], 
     | 30.00th=[    2], 40.00th=[    2], 50.00th=[    2], 60.00th=[    2], 
     | 70.00th=[    4], 80.00th=[    5], 90.00th=[    6], 95.00th=[    6], 
     | 99.00th=[    9], 99.50th=[   10], 99.90th=[   11], 99.95th=[   12],
     | 99.99th=[   13] 
    bw (KB/s)  : min= 1388, max=11632, per=100.00%, avg=5147.03, stdev=675.82
    lat (usec) : 2=21.76%, 4=46.03%, 10=31.65%, 20=0.56%, 50=0.01%
    lat (msec) : 20=0.01%, 750=0.01%
  cpu          : usr=1.09%, sys=1.18%, ctx=58250, majf=18446744073709551605,
minf=18446744073709538970
  IO depths    : 1=100.0%, 2=0.0%, 4=0.0%, 8=0.0%, 16=0.0%, 32=0.0%, >=64=0.0%
     submit    : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     complete  : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     issued    : total=r=0/w=614401/d=0, short=r=0/w=0/d=0

Run status group 0 (all jobs):
  WRITE: io=614401KB, aggrb=5119KB/s, minb=5119KB/s, maxb=5119KB/s,
mint=120002msec, maxt=120002msec

Disk stats (read/write):
  sda: ios=15/1101, merge=0/28, ticks=150/554585, in_queue=554731, util=4.18%

xfs w/o stable page write
-------------------------
log-append: (groupid=0, jobs=1): err= 0: pid=3678: Tue Dec 11 15:01:10 2012
  write: io=614401KB, bw=5119.1KB/s, iops=5119 , runt=120001msec
    clat (usec): min=1 , max=20866 , avg= 3.50, stdev=26.72
     lat (usec): min=1 , max=20866 , avg= 3.60, stdev=26.72
    clat percentiles (usec):
     |  1.00th=[    1],  5.00th=[    1], 10.00th=[    1], 20.00th=[    2], 
     | 30.00th=[    2], 40.00th=[    2], 50.00th=[    2], 60.00th=[    2], 
     | 70.00th=[    5], 80.00th=[    6], 90.00th=[    6], 95.00th=[    8], 
     | 99.00th=[   10], 99.50th=[   10], 99.90th=[   12], 99.95th=[   12],
     | 99.99th=[   14] 
    bw (KB/s)  : min= 5110, max= 5132, per=100.00%, avg=5120.84, stdev= 2.25
    lat (usec) : 2=15.33%, 4=44.93%, 10=38.33%, 20=1.41%, 50=0.01%
    lat (usec) : 500=0.01%
    lat (msec) : 50=0.01%
  cpu          : usr=2.83%, sys=0.00%, ctx=119423, majf=18446744073709551605,
minf=18446744073709538968
  IO depths    : 1=100.0%, 2=0.0%, 4=0.0%, 8=0.0%, 16=0.0%, 32=0.0%, >=64=0.0%
     submit    : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     complete  : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     issued    : total=r=0/w=614401/d=0, short=r=0/w=0/d=0

Run status group 0 (all jobs):
  WRITE: io=614401KB, aggrb=5119KB/s, minb=5119KB/s, maxb=5119KB/s,
mint=120001msec, maxt=120001msec

Disk stats (read/write):
  sda: ios=106/1055, merge=3/35, ticks=1042/522739, in_queue=523780, util=4.56%

We can see that the max of latency is very high with stable page write in ext4
and xfs, and it reduces to a lower value after stable page write is reverted.  I
summarize as below.

      ext4    xfs  (usec, lower is better)
w/  225702 635643
w/o  14565  20866

Hence, I wonder whether or not we could revert stable page write temporarily.
After it is improved, we could add it back again.

Thanks,
                                                - Zheng

<Prev in Thread] Current Thread [Next in Thread>