xfs
[Top] [All Lists]

[PATCH 2/2][v2] blk-plug: don't flush nested plug lists

To: Jens Axboe <axboe@xxxxxxxxx>, Ming Lei <tom.leiming@xxxxxxxxx>
Subject: [PATCH 2/2][v2] blk-plug: don't flush nested plug lists
From: Jeff Moyer <jmoyer@xxxxxxxxxx>
Date: Tue, 07 Apr 2015 14:55:13 -0400
Cc: Konrad Rzeszutek Wilk <konrad.wilk@xxxxxxxxxx>, "Roger Pau Monn\?\?" <roger.pau@xxxxxxxxxx>, Alasdair Kergon <agk@xxxxxxxxxx>, Mike Snitzer <snitzer@xxxxxxxxxx>, Neil Brown <neilb@xxxxxxx>, "Nicholas A. Bellinger" <nab@xxxxxxxxxxxxxxx>, Alexander Viro <viro@xxxxxxxxxxxxxxxxxx>, Chris Mason <clm@xxxxxx>, Josef Bacik <jbacik@xxxxxx>, David Sterba <dsterba@xxxxxxx>, "Theodore Ts'o" <tytso@xxxxxxx>, Andreas Dilger <adilger.kernel@xxxxxxxxx>, Jaegeuk Kim <jaegeuk@xxxxxxxxxx>, Changman Lee <cm224.lee@xxxxxxxxxxx>, Steven Whitehouse <swhiteho@xxxxxxxxxx>, Mikulas Patocka <mikulas@xxxxxxxxxxxxxxxxxxxxxxxx>, Andrew Morton <akpm@xxxxxxxxxxxxxxxxxxxx>, Rik van Riel <riel@xxxxxxxxxx>, Johannes Weiner <hannes@xxxxxxxxxxx>, Mel Gorman <mgorman@xxxxxxx>, Trond Myklebust <trond.myklebust@xxxxxxxxxxxxxxx>, Anna Schumaker <anna.schumaker@xxxxxxxxxx>, Dave Chinner <david@xxxxxxxxxxxxx>, xfs@xxxxxxxxxxx, Christoph Hellwig <hch@xxxxxx>, Weston Andros Adamson <dros@xxxxxxxxxxxxxxx>, "Martin K. Petersen" <martin.petersen@xxxxxxxxxx>, Sagi Grimberg <sagig@xxxxxxxxxxxx>, Tejun Heo <tj@xxxxxxxxxx>, Fabian Frederick <fabf@xxxxxxxxx>, Matthew Wilcox <matthew.r.wilcox@xxxxxxxxx>, Ming Lei <ming.lei@xxxxxxxxxxxxx>, "Kirill A. Shutemov" <kirill.shutemov@xxxxxxxxxxxxxxx>, Wang Sheng-Hui <shhuiw@xxxxxxxxx>, Michal Hocko <mhocko@xxxxxxx>, Joe Perches <joe@xxxxxxxxxxx>, Miklos Szeredi <mszeredi@xxxxxxx>, Namjae Jeon <namjae.jeon@xxxxxxxxxxx>, Mark Rustad <mark.d.rustad@xxxxxxxxx>, Jianyu Zhan <nasa4836@xxxxxxxxx>, Fengguang Wu <fengguang.wu@xxxxxxxxx>, Vladimir Davydov <vdavydov@xxxxxxxxxxxxx>, Vlastimil Babka <vbabka@xxxxxxx>, Suleiman Souhlal <suleiman@xxxxxxxxxx>, linux-kernel@xxxxxxxxxxxxxxx, dm-devel@xxxxxxxxxx, xen-devel@xxxxxxxxxxxxxxxxxxxx, linux-raid@xxxxxxxxxxxxxxx, linux-scsi@xxxxxxxxxxxxxxx, target-devel@xxxxxxxxxxxxxxx, linux-fsdevel@xxxxxxxxxxxxxxx, linux-aio@xxxxxxxxx, linux-btrfs@xxxxxxxxxxxxxxx, linux-ext4@xxxxxxxxxxxxxxx, linux-f2fs-devel@xxxxxxxxxxxxxxxxxxxxx, cluster-devel@xxxxxxxxxx, linux-nfs@xxxxxxxxxxxxxxx, linux-mm@xxxxxxxxx
Delivered-to: xfs@xxxxxxxxxxx
In-reply-to: <1428347694-17704-2-git-send-email-jmoyer@xxxxxxxxxx> (Jeff Moyer's message of "Mon, 6 Apr 2015 15:14:54 -0400")
References: <1428347694-17704-1-git-send-email-jmoyer@xxxxxxxxxx> <1428347694-17704-2-git-send-email-jmoyer@xxxxxxxxxx>
User-agent: Gnus/5.13 (Gnus v5.13) Emacs/24.3 (gnu/linux)
The way the on-stack plugging currently works, each nesting level
flushes its own list of I/Os.  This can be less than optimal (read
awful) for certain workloads.  For example, consider an application
that issues asynchronous O_DIRECT I/Os.  It can send down a bunch of
I/Os together in a single io_submit call, only to have each of them
dispatched individually down in the bowels of the dirct I/O code.
The reason is that there are blk_plug-s instantiated both at the upper
call site in do_io_submit and down in do_direct_IO.  The latter will
submit as little as 1 I/O at a time (if you have a small enough I/O
size) instead of performing the batching that the plugging
infrastructure is supposed to provide.

Now, for the case where there is an elevator involved, this doesn't
really matter too much.  The elevator will keep the I/O around long
enough for it to be merged.  However, in cases where there is no
elevator (like blk-mq), I/Os are simply dispatched immediately.

Try this, for example:

fio --rw=read --bs=4k --iodepth=128 --iodepth_batch=16 
--iodepth_batch_complete=16 --runtime=10s --direct=1 --filename=/dev/vdd 
--name=job1 --ioengine=libaio --time_based

If you run that on a current kernel, you will get zero merges.  Zero!
After this patch, you will get many merges (the actual number depends
on how fast your storage is, obviously), and much better throughput.
Here are results from my test systems:

First, I tested in a VM using a virtio-blk device:

Unpatched kernel:
 Throughput: 280,262 KB/s
avg latency: 14,587.72 usec

Patched kernel:
 throughput: 832,158 KB/s
avg latency:   4,901.95 usec

Next, I tesetd using a micron p320h on bare metal:

Unpatched kernel:
 Throughput: 688,967 KB/s
avg latency: 5,933.92 usec

Patched kernel:
 Throughput: 1,160.6 MB/s
avg latency: 3,437.01 usec

As you can see, both throughput and latency improved dramatically.
I've included the full fio output below, so you can also see the
marked improvement in standard deviation as well.

I considered several approaches to solving the problem:
1)  get rid of the inner-most plugs
2)  handle nesting by using only one on-stack plug
2a) #2, except use a per-cpu blk_plug struct, which may clean up the
    code a bit at the expense of memory footprint

Option 1 will be tricky or impossible to do, since inner most plug
lists are sometimes the only plug lists, depending on the call path.
Option 2 is what this patch implements.  Option 2a may add unneeded
complexity.

Much of the patch involves modifying call sites to blk_finish_plug,
since its signature is changed.  The meat of the patch is actually
pretty simple and constrained to block/blk-core.c and
include/linux/blkdev.h.  The only tricky bits were places where plugs
were finished and then restarted to flush out I/O.  There, I left things
as-is.  So long as they are the outer-most plugs, they should continue
to function as before.

NOTE TO SUBSYSTEM MAINTAINERS: Before this patch, blk_finish_plug
would always flush the plug list.  After this patch, this is only the
case for the outer-most plug.  If you require the plug list to be
flushed, you should be calling blk_flush_plug(current).  Btrfs and dm
maintainers should take a close look at this patch and ensure they get
the right behavior in the end.

Signed-off-by: Jeff Moyer <jmoyer@xxxxxxxxxx>

---
Changelog:
v1->v2: Keep the blk_start_plug interface the same, suggested by Ming Lei.

Test results
------------
Virtio-blk:

unpatched:

job1: (groupid=0, jobs=1): err= 0: pid=8032: Tue Apr  7 13:33:53 2015
  read : io=2736.1MB, bw=280262KB/s, iops=70065, runt= 10000msec
    slat (usec): min=40, max=10472, avg=207.82, stdev=364.02
    clat (usec): min=211, max=35883, avg=14379.83, stdev=2213.95
     lat (usec): min=862, max=36000, avg=14587.72, stdev=2223.80
    clat percentiles (usec):
     |  1.00th=[11328],  5.00th=[12096], 10.00th=[12480], 20.00th=[12992],
     | 30.00th=[13376], 40.00th=[13760], 50.00th=[14144], 60.00th=[14400],
     | 70.00th=[14784], 80.00th=[15168], 90.00th=[15936], 95.00th=[16768],
     | 99.00th=[24448], 99.50th=[25216], 99.90th=[28544], 99.95th=[35072],
     | 99.99th=[36096]
    bw (KB  /s): min=265984, max=302720, per=100.00%, avg=280549.84, 
stdev=10264.36
    lat (usec) : 250=0.01%, 1000=0.01%
    lat (msec) : 2=0.02%, 4=0.02%, 10=0.05%, 20=96.57%, 50=3.34%
  cpu          : usr=7.56%, sys=55.57%, ctx=6174, majf=0, minf=523
  IO depths    : 1=0.0%, 2=0.0%, 4=0.0%, 8=0.0%, 16=0.1%, 32=0.1%, >=64=100.0%
     submit    : 0=0.0%, 4=0.0%, 8=0.0%, 16=100.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     complete  : 0=0.0%, 4=0.0%, 8=0.0%, 16=100.0%, 32=0.0%, 64=0.0%, >=64=0.1%
     issued    : total=r=700656/w=0/d=0, short=r=0/w=0/d=0, drop=r=0/w=0/d=0
     latency   : target=0, window=0, percentile=100.00%, depth=1024

Run status group 0 (all jobs):
   READ: io=2736.1MB, aggrb=280262KB/s, minb=280262KB/s, maxb=280262KB/s, 
mint=10000msec, maxt=10000msec

Disk stats (read/write):
  vdd: ios=695490/0, merge=0/0, ticks=785741/0, in_queue=785442, util=90.69%


patched:
job1: (groupid=0, jobs=1): err= 0: pid=7743: Tue Apr  7 13:19:07 2015
  read : io=8126.6MB, bw=832158KB/s, iops=208039, runt= 10000msec
    slat (usec): min=20, max=14351, avg=55.08, stdev=143.47
    clat (usec): min=283, max=20003, avg=4846.77, stdev=1355.35
     lat (usec): min=609, max=20074, avg=4901.95, stdev=1362.40
    clat percentiles (usec):
     |  1.00th=[ 4016],  5.00th=[ 4048], 10.00th=[ 4080], 20.00th=[ 4128],
     | 30.00th=[ 4192], 40.00th=[ 4192], 50.00th=[ 4256], 60.00th=[ 4512],
     | 70.00th=[ 4896], 80.00th=[ 5664], 90.00th=[ 5920], 95.00th=[ 6752],
     | 99.00th=[11968], 99.50th=[13632], 99.90th=[15552], 99.95th=[17024],
     | 99.99th=[19840]
    bw (KB  /s): min=740992, max=896640, per=100.00%, avg=836978.95, 
stdev=51034.87
    lat (usec) : 500=0.01%, 750=0.01%, 1000=0.01%
    lat (msec) : 4=0.50%, 10=97.79%, 20=1.70%, 50=0.01%
  cpu          : usr=20.28%, sys=69.11%, ctx=879, majf=0, minf=522
  IO depths    : 1=0.0%, 2=0.0%, 4=0.0%, 8=0.0%, 16=0.1%, 32=0.1%, >=64=100.0%
     submit    : 0=0.0%, 4=0.0%, 8=0.0%, 16=100.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     complete  : 0=0.0%, 4=0.0%, 8=0.0%, 16=100.0%, 32=0.0%, 64=0.0%, >=64=0.1%
     issued    : total=r=2080396/w=0/d=0, short=r=0/w=0/d=0, drop=r=0/w=0/d=0
     latency   : target=0, window=0, percentile=100.00%, depth=1024

Run status group 0 (all jobs):
   READ: io=8126.6MB, aggrb=832158KB/s, minb=832158KB/s, maxb=832158KB/s, 
mint=10000msec, maxt=10000msec

Disk stats (read/write):
  vdd: ios=127877/0, merge=1918166/0, ticks=23118/0, in_queue=23047, util=94.08%

micron p320h:

unpatched:

job1: (groupid=0, jobs=1): err= 0: pid=3244: Tue Apr  7 13:29:14 2015
  read : io=6728.9MB, bw=688968KB/s, iops=172241, runt= 10001msec
    slat (usec): min=43, max=6273, avg=81.79, stdev=125.96
    clat (usec): min=78, max=12485, avg=5852.06, stdev=1154.76
     lat (usec): min=146, max=12572, avg=5933.92, stdev=1163.75
    clat percentiles (usec):
     |  1.00th=[ 4192],  5.00th=[ 4384], 10.00th=[ 4576], 20.00th=[ 5600],
     | 30.00th=[ 5664], 40.00th=[ 5728], 50.00th=[ 5792], 60.00th=[ 5856],
     | 70.00th=[ 6112], 80.00th=[ 6176], 90.00th=[ 6240], 95.00th=[ 6368],
     | 99.00th=[11840], 99.50th=[11968], 99.90th=[12096], 99.95th=[12096],
     | 99.99th=[12224]
    bw (KB  /s): min=648328, max=859264, per=98.80%, avg=680711.16, 
stdev=62016.70
    lat (usec) : 100=0.01%, 250=0.01%, 500=0.01%, 750=0.01%, 1000=0.01%
    lat (msec) : 2=0.01%, 4=0.04%, 10=97.07%, 20=2.87%
  cpu          : usr=10.28%, sys=73.61%, ctx=104436, majf=0, minf=6217
  IO depths    : 1=0.0%, 2=0.0%, 4=0.0%, 8=0.0%, 16=0.1%, 32=0.1%, >=64=100.0%
     submit    : 0=0.0%, 4=0.0%, 8=0.0%, 16=100.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     complete  : 0=0.0%, 4=0.0%, 8=0.0%, 16=100.0%, 32=0.0%, 64=0.0%, >=64=0.1%
     issued    : total=r=1722592/w=0/d=0, short=r=0/w=0/d=0
     latency   : target=0, window=0, percentile=100.00%, depth=1024

Run status group 0 (all jobs):
   READ: io=6728.9MB, aggrb=688967KB/s, minb=688967KB/s, maxb=688967KB/s, 
mint=10001msec, maxt=10001msec

Disk stats (read/write):
  rssda: ios=1688772/0, merge=0/0, ticks=188820/0, in_queue=188678, util=96.61%

patched:

job1: (groupid=0, jobs=1): err= 0: pid=9531: Tue Apr  7 13:22:28 2015
  read : io=11607MB, bw=1160.6MB/s, iops=297104, runt= 10001msec
    slat (usec): min=21, max=6376, avg=43.05, stdev=81.82
    clat (usec): min=116, max=9844, avg=3393.90, stdev=752.57
     lat (usec): min=167, max=9889, avg=3437.01, stdev=757.02
    clat percentiles (usec):
     |  1.00th=[ 2832],  5.00th=[ 2992], 10.00th=[ 3056], 20.00th=[ 3120],
     | 30.00th=[ 3152], 40.00th=[ 3248], 50.00th=[ 3280], 60.00th=[ 3344],
     | 70.00th=[ 3376], 80.00th=[ 3504], 90.00th=[ 3728], 95.00th=[ 3824],
     | 99.00th=[ 9152], 99.50th=[ 9408], 99.90th=[ 9664], 99.95th=[ 9664],
     | 99.99th=[ 9792]
    bw (MB  /s): min= 1139, max= 1183, per=100.00%, avg=1161.07, stdev=10.58
    lat (usec) : 250=0.01%, 500=0.01%, 750=0.01%, 1000=0.01%
    lat (msec) : 2=0.01%, 4=98.31%, 10=1.67%
  cpu          : usr=18.59%, sys=66.65%, ctx=55655, majf=0, minf=6218
  IO depths    : 1=0.0%, 2=0.0%, 4=0.0%, 8=0.0%, 16=0.1%, 32=0.1%, >=64=100.0%
     submit    : 0=0.0%, 4=0.0%, 8=0.0%, 16=100.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     complete  : 0=0.0%, 4=0.0%, 8=0.0%, 16=100.0%, 32=0.0%, 64=0.0%, >=64=0.1%
     issued    : total=r=2971338/w=0/d=0, short=r=0/w=0/d=0
     latency   : target=0, window=0, percentile=100.00%, depth=1024

Run status group 0 (all jobs):
   READ: io=11607MB, aggrb=1160.6MB/s, minb=1160.6MB/s, maxb=1160.6MB/s, 
mint=10001msec, maxt=10001msec

Disk stats (read/write):
  rssda: ios=183005/0, merge=2745105/0, ticks=31972/0, in_queue=31948, 
util=97.63%
---
 block/blk-core.c                    | 29 ++++++++++++++++-------------
 block/blk-lib.c                     |  2 +-
 block/blk-throttle.c                |  2 +-
 drivers/block/xen-blkback/blkback.c |  2 +-
 drivers/md/dm-bufio.c               |  6 +++---
 drivers/md/dm-crypt.c               |  2 +-
 drivers/md/dm-kcopyd.c              |  2 +-
 drivers/md/dm-thin.c                |  2 +-
 drivers/md/md.c                     |  2 +-
 drivers/md/raid1.c                  |  2 +-
 drivers/md/raid10.c                 |  2 +-
 drivers/md/raid5.c                  |  4 ++--
 drivers/target/target_core_iblock.c |  2 +-
 fs/aio.c                            |  2 +-
 fs/block_dev.c                      |  2 +-
 fs/btrfs/scrub.c                    |  2 +-
 fs/btrfs/transaction.c              |  2 +-
 fs/btrfs/tree-log.c                 | 12 ++++++------
 fs/btrfs/volumes.c                  |  6 +++---
 fs/buffer.c                         |  2 +-
 fs/direct-io.c                      |  2 +-
 fs/ext4/file.c                      |  2 +-
 fs/ext4/inode.c                     |  4 ++--
 fs/f2fs/checkpoint.c                |  2 +-
 fs/f2fs/gc.c                        |  2 +-
 fs/f2fs/node.c                      |  2 +-
 fs/gfs2/log.c                       |  2 +-
 fs/hpfs/buffer.c                    |  2 +-
 fs/jbd/checkpoint.c                 |  2 +-
 fs/jbd/commit.c                     |  4 ++--
 fs/jbd2/checkpoint.c                |  2 +-
 fs/jbd2/commit.c                    |  2 +-
 fs/mpage.c                          |  2 +-
 fs/nfs/blocklayout/blocklayout.c    |  4 ++--
 fs/xfs/xfs_buf.c                    |  4 ++--
 fs/xfs/xfs_dir2_readdir.c           |  2 +-
 fs/xfs/xfs_itable.c                 |  2 +-
 include/linux/blkdev.h              |  5 +++--
 mm/madvise.c                        |  2 +-
 mm/page-writeback.c                 |  2 +-
 mm/readahead.c                      |  2 +-
 mm/swap_state.c                     |  2 +-
 mm/vmscan.c                         |  2 +-
 43 files changed, 74 insertions(+), 70 deletions(-)

diff --git a/block/blk-core.c b/block/blk-core.c
index 794c3e7..fcd9c2f 100644
--- a/block/blk-core.c
+++ b/block/blk-core.c
@@ -3018,21 +3018,21 @@ void blk_start_plug(struct blk_plug *plug)
 {
        struct task_struct *tsk = current;
 
+       if (tsk->plug) {
+               tsk->plug->depth++;
+               return;
+       }
+
+       plug->depth = 1;
        INIT_LIST_HEAD(&plug->list);
        INIT_LIST_HEAD(&plug->mq_list);
        INIT_LIST_HEAD(&plug->cb_list);
 
        /*
-        * If this is a nested plug, don't actually assign it. It will be
-        * flushed on its own.
+        * Store ordering should not be needed here, since a potential
+        * preempt will imply a full memory barrier
         */
-       if (!tsk->plug) {
-               /*
-                * Store ordering should not be needed here, since a potential
-                * preempt will imply a full memory barrier
-                */
-               tsk->plug = plug;
-       }
+       tsk->plug = plug;
 }
 EXPORT_SYMBOL(blk_start_plug);
 
@@ -3177,12 +3177,15 @@ void blk_flush_plug_list(struct blk_plug *plug, bool 
from_schedule)
        local_irq_restore(flags);
 }
 
-void blk_finish_plug(struct blk_plug *plug)
+void blk_finish_plug(void)
 {
-       blk_flush_plug_list(plug, false);
+       struct blk_plug *plug = current->plug;
 
-       if (plug == current->plug)
-               current->plug = NULL;
+       if (--plug->depth > 0)
+               return;
+
+       blk_flush_plug_list(plug, false);
+       current->plug = NULL;
 }
 EXPORT_SYMBOL(blk_finish_plug);
 
diff --git a/block/blk-lib.c b/block/blk-lib.c
index 7688ee3..ac347d3 100644
--- a/block/blk-lib.c
+++ b/block/blk-lib.c
@@ -128,7 +128,7 @@ int blkdev_issue_discard(struct block_device *bdev, 
sector_t sector,
                 */
                cond_resched();
        }
-       blk_finish_plug(&plug);
+       blk_finish_plug();
 
        /* Wait for bios in-flight */
        if (!atomic_dec_and_test(&bb.done))
diff --git a/block/blk-throttle.c b/block/blk-throttle.c
index 5b9c6d5..222a77a 100644
--- a/block/blk-throttle.c
+++ b/block/blk-throttle.c
@@ -1281,7 +1281,7 @@ static void blk_throtl_dispatch_work_fn(struct 
work_struct *work)
                blk_start_plug(&plug);
                while((bio = bio_list_pop(&bio_list_on_stack)))
                        generic_make_request(bio);
-               blk_finish_plug(&plug);
+               blk_finish_plug();
        }
 }
 
diff --git a/drivers/block/xen-blkback/blkback.c 
b/drivers/block/xen-blkback/blkback.c
index 2a04d34..74bea21 100644
--- a/drivers/block/xen-blkback/blkback.c
+++ b/drivers/block/xen-blkback/blkback.c
@@ -1374,7 +1374,7 @@ static int dispatch_rw_block_io(struct xen_blkif *blkif,
                submit_bio(operation, biolist[i]);
 
        /* Let the I/Os go.. */
-       blk_finish_plug(&plug);
+       blk_finish_plug();
 
        if (operation == READ)
                blkif->st_rd_sect += preq.nr_sects;
diff --git a/drivers/md/dm-bufio.c b/drivers/md/dm-bufio.c
index 86dbbc7..502c63b 100644
--- a/drivers/md/dm-bufio.c
+++ b/drivers/md/dm-bufio.c
@@ -715,7 +715,7 @@ static void __flush_write_list(struct list_head *write_list)
                submit_io(b, WRITE, b->block, write_endio);
                dm_bufio_cond_resched();
        }
-       blk_finish_plug(&plug);
+       blk_finish_plug();
 }
 
 /*
@@ -1126,7 +1126,7 @@ void dm_bufio_prefetch(struct dm_bufio_client *c,
                                &write_list);
                if (unlikely(!list_empty(&write_list))) {
                        dm_bufio_unlock(c);
-                       blk_finish_plug(&plug);
+                       blk_finish_plug();
                        __flush_write_list(&write_list);
                        blk_start_plug(&plug);
                        dm_bufio_lock(c);
@@ -1149,7 +1149,7 @@ void dm_bufio_prefetch(struct dm_bufio_client *c,
        dm_bufio_unlock(c);
 
 flush_plug:
-       blk_finish_plug(&plug);
+       blk_finish_plug();
 }
 EXPORT_SYMBOL_GPL(dm_bufio_prefetch);
 
diff --git a/drivers/md/dm-crypt.c b/drivers/md/dm-crypt.c
index 713a962..65d7b72 100644
--- a/drivers/md/dm-crypt.c
+++ b/drivers/md/dm-crypt.c
@@ -1224,7 +1224,7 @@ pop_from_list:
                        rb_erase(&io->rb_node, &write_tree);
                        kcryptd_io_write(io);
                } while (!RB_EMPTY_ROOT(&write_tree));
-               blk_finish_plug(&plug);
+               blk_finish_plug();
        }
        return 0;
 }
diff --git a/drivers/md/dm-kcopyd.c b/drivers/md/dm-kcopyd.c
index 3a7cade..4a76e42 100644
--- a/drivers/md/dm-kcopyd.c
+++ b/drivers/md/dm-kcopyd.c
@@ -593,7 +593,7 @@ static void do_work(struct work_struct *work)
        process_jobs(&kc->complete_jobs, kc, run_complete_job);
        process_jobs(&kc->pages_jobs, kc, run_pages_job);
        process_jobs(&kc->io_jobs, kc, run_io_job);
-       blk_finish_plug(&plug);
+       blk_finish_plug();
 }
 
 /*
diff --git a/drivers/md/dm-thin.c b/drivers/md/dm-thin.c
index 921aafd..be42bf5 100644
--- a/drivers/md/dm-thin.c
+++ b/drivers/md/dm-thin.c
@@ -1824,7 +1824,7 @@ static void process_thin_deferred_bios(struct thin_c *tc)
                        dm_pool_issue_prefetches(pool->pmd);
                }
        }
-       blk_finish_plug(&plug);
+       blk_finish_plug();
 }
 
 static int cmp_cells(const void *lhs, const void *rhs)
diff --git a/drivers/md/md.c b/drivers/md/md.c
index 717daad..c4ec179 100644
--- a/drivers/md/md.c
+++ b/drivers/md/md.c
@@ -7686,7 +7686,7 @@ void md_do_sync(struct md_thread *thread)
        /*
         * this also signals 'finished resyncing' to md_stop
         */
-       blk_finish_plug(&plug);
+       blk_finish_plug();
        wait_event(mddev->recovery_wait, !atomic_read(&mddev->recovery_active));
 
        /* tell personality that we are finished */
diff --git a/drivers/md/raid1.c b/drivers/md/raid1.c
index d34e238..4f8fad4 100644
--- a/drivers/md/raid1.c
+++ b/drivers/md/raid1.c
@@ -2441,7 +2441,7 @@ static void raid1d(struct md_thread *thread)
                if (mddev->flags & ~(1<<MD_CHANGE_PENDING))
                        md_check_recovery(mddev);
        }
-       blk_finish_plug(&plug);
+       blk_finish_plug();
 }
 
 static int init_resync(struct r1conf *conf)
diff --git a/drivers/md/raid10.c b/drivers/md/raid10.c
index a7196c4..92bb5dd 100644
--- a/drivers/md/raid10.c
+++ b/drivers/md/raid10.c
@@ -2835,7 +2835,7 @@ static void raid10d(struct md_thread *thread)
                if (mddev->flags & ~(1<<MD_CHANGE_PENDING))
                        md_check_recovery(mddev);
        }
-       blk_finish_plug(&plug);
+       blk_finish_plug();
 }
 
 static int init_resync(struct r10conf *conf)
diff --git a/drivers/md/raid5.c b/drivers/md/raid5.c
index cd2f96b..695bf0f 100644
--- a/drivers/md/raid5.c
+++ b/drivers/md/raid5.c
@@ -5281,7 +5281,7 @@ static void raid5_do_work(struct work_struct *work)
        pr_debug("%d stripes handled\n", handled);
 
        spin_unlock_irq(&conf->device_lock);
-       blk_finish_plug(&plug);
+       blk_finish_plug();
 
        pr_debug("--- raid5worker inactive\n");
 }
@@ -5352,7 +5352,7 @@ static void raid5d(struct md_thread *thread)
        spin_unlock_irq(&conf->device_lock);
 
        async_tx_issue_pending_all();
-       blk_finish_plug(&plug);
+       blk_finish_plug();
 
        pr_debug("--- raid5d inactive\n");
 }
diff --git a/drivers/target/target_core_iblock.c 
b/drivers/target/target_core_iblock.c
index d4a4b0f..17d8730 100644
--- a/drivers/target/target_core_iblock.c
+++ b/drivers/target/target_core_iblock.c
@@ -367,7 +367,7 @@ static void iblock_submit_bios(struct bio_list *list, int 
rw)
        blk_start_plug(&plug);
        while ((bio = bio_list_pop(list)))
                submit_bio(rw, bio);
-       blk_finish_plug(&plug);
+       blk_finish_plug();
 }
 
 static void iblock_end_io_flush(struct bio *bio, int err)
diff --git a/fs/aio.c b/fs/aio.c
index f8e52a1..b873698 100644
--- a/fs/aio.c
+++ b/fs/aio.c
@@ -1616,7 +1616,7 @@ long do_io_submit(aio_context_t ctx_id, long nr,
                if (ret)
                        break;
        }
-       blk_finish_plug(&plug);
+       blk_finish_plug();
 
        percpu_ref_put(&ctx->users);
        return i ? i : ret;
diff --git a/fs/block_dev.c b/fs/block_dev.c
index 975266b..f5848de 100644
--- a/fs/block_dev.c
+++ b/fs/block_dev.c
@@ -1609,7 +1609,7 @@ ssize_t blkdev_write_iter(struct kiocb *iocb, struct 
iov_iter *from)
                if (err < 0)
                        ret = err;
        }
-       blk_finish_plug(&plug);
+       blk_finish_plug();
        return ret;
 }
 EXPORT_SYMBOL_GPL(blkdev_write_iter);
diff --git a/fs/btrfs/scrub.c b/fs/btrfs/scrub.c
index ec57687..f314cfb8 100644
--- a/fs/btrfs/scrub.c
+++ b/fs/btrfs/scrub.c
@@ -3316,7 +3316,7 @@ out:
        scrub_wr_submit(sctx);
        mutex_unlock(&sctx->wr_ctx.wr_lock);
 
-       blk_finish_plug(&plug);
+       blk_finish_plug();
        btrfs_free_path(path);
        btrfs_free_path(ppath);
        return ret < 0 ? ret : 0;
diff --git a/fs/btrfs/transaction.c b/fs/btrfs/transaction.c
index 8be4278..fee10af 100644
--- a/fs/btrfs/transaction.c
+++ b/fs/btrfs/transaction.c
@@ -983,7 +983,7 @@ static int btrfs_write_and_wait_marked_extents(struct 
btrfs_root *root,
 
        blk_start_plug(&plug);
        ret = btrfs_write_marked_extents(root, dirty_pages, mark);
-       blk_finish_plug(&plug);
+       blk_finish_plug();
        ret2 = btrfs_wait_marked_extents(root, dirty_pages, mark);
 
        if (ret)
diff --git a/fs/btrfs/tree-log.c b/fs/btrfs/tree-log.c
index c5b8ba3..879c7fd 100644
--- a/fs/btrfs/tree-log.c
+++ b/fs/btrfs/tree-log.c
@@ -2574,7 +2574,7 @@ int btrfs_sync_log(struct btrfs_trans_handle *trans,
        blk_start_plug(&plug);
        ret = btrfs_write_marked_extents(log, &log->dirty_log_pages, mark);
        if (ret) {
-               blk_finish_plug(&plug);
+               blk_finish_plug();
                btrfs_abort_transaction(trans, root, ret);
                btrfs_free_logged_extents(log, log_transid);
                btrfs_set_log_full_commit(root->fs_info, trans);
@@ -2619,7 +2619,7 @@ int btrfs_sync_log(struct btrfs_trans_handle *trans,
                if (!list_empty(&root_log_ctx.list))
                        list_del_init(&root_log_ctx.list);
 
-               blk_finish_plug(&plug);
+               blk_finish_plug();
                btrfs_set_log_full_commit(root->fs_info, trans);
 
                if (ret != -ENOSPC) {
@@ -2635,7 +2635,7 @@ int btrfs_sync_log(struct btrfs_trans_handle *trans,
        }
 
        if (log_root_tree->log_transid_committed >= root_log_ctx.log_transid) {
-               blk_finish_plug(&plug);
+               blk_finish_plug();
                mutex_unlock(&log_root_tree->log_mutex);
                ret = root_log_ctx.log_ret;
                goto out;
@@ -2643,7 +2643,7 @@ int btrfs_sync_log(struct btrfs_trans_handle *trans,
 
        index2 = root_log_ctx.log_transid % 2;
        if (atomic_read(&log_root_tree->log_commit[index2])) {
-               blk_finish_plug(&plug);
+               blk_finish_plug();
                ret = btrfs_wait_marked_extents(log, &log->dirty_log_pages,
                                                mark);
                btrfs_wait_logged_extents(trans, log, log_transid);
@@ -2669,7 +2669,7 @@ int btrfs_sync_log(struct btrfs_trans_handle *trans,
         * check the full commit flag again
         */
        if (btrfs_need_log_full_commit(root->fs_info, trans)) {
-               blk_finish_plug(&plug);
+               blk_finish_plug();
                btrfs_wait_marked_extents(log, &log->dirty_log_pages, mark);
                btrfs_free_logged_extents(log, log_transid);
                mutex_unlock(&log_root_tree->log_mutex);
@@ -2680,7 +2680,7 @@ int btrfs_sync_log(struct btrfs_trans_handle *trans,
        ret = btrfs_write_marked_extents(log_root_tree,
                                         &log_root_tree->dirty_log_pages,
                                         EXTENT_DIRTY | EXTENT_NEW);
-       blk_finish_plug(&plug);
+       blk_finish_plug();
        if (ret) {
                btrfs_set_log_full_commit(root->fs_info, trans);
                btrfs_abort_transaction(trans, root, ret);
diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
index 8222f6f..16db068 100644
--- a/fs/btrfs/volumes.c
+++ b/fs/btrfs/volumes.c
@@ -358,7 +358,7 @@ loop_lock:
                if (pending_bios == &device->pending_sync_bios) {
                        sync_pending = 1;
                } else if (sync_pending) {
-                       blk_finish_plug(&plug);
+                       blk_finish_plug();
                        blk_start_plug(&plug);
                        sync_pending = 0;
                }
@@ -415,7 +415,7 @@ loop_lock:
                }
                /* unplug every 64 requests just for good measure */
                if (batch_run % 64 == 0) {
-                       blk_finish_plug(&plug);
+                       blk_finish_plug();
                        blk_start_plug(&plug);
                        sync_pending = 0;
                }
@@ -431,7 +431,7 @@ loop_lock:
        spin_unlock(&device->io_lock);
 
 done:
-       blk_finish_plug(&plug);
+       blk_finish_plug();
 }
 
 static void pending_bios_fn(struct btrfs_work *work)
diff --git a/fs/buffer.c b/fs/buffer.c
index 20805db..8181c44 100644
--- a/fs/buffer.c
+++ b/fs/buffer.c
@@ -758,7 +758,7 @@ static int fsync_buffers_list(spinlock_t *lock, struct 
list_head *list)
        }
 
        spin_unlock(lock);
-       blk_finish_plug(&plug);
+       blk_finish_plug();
        spin_lock(lock);
 
        while (!list_empty(&tmp)) {
diff --git a/fs/direct-io.c b/fs/direct-io.c
index e181b6b..16f16ed 100644
--- a/fs/direct-io.c
+++ b/fs/direct-io.c
@@ -1262,7 +1262,7 @@ do_blockdev_direct_IO(int rw, struct kiocb *iocb, struct 
inode *inode,
        if (sdio.bio)
                dio_bio_submit(dio, &sdio);
 
-       blk_finish_plug(&plug);
+       blk_finish_plug();
 
        /*
         * It is possible that, we return short IO due to end of file.
diff --git a/fs/ext4/file.c b/fs/ext4/file.c
index 33a09da..3a293eb 100644
--- a/fs/ext4/file.c
+++ b/fs/ext4/file.c
@@ -183,7 +183,7 @@ ext4_file_write_iter(struct kiocb *iocb, struct iov_iter 
*from)
                        ret = err;
        }
        if (o_direct)
-               blk_finish_plug(&plug);
+               blk_finish_plug();
 
 errout:
        if (aio_mutex)
diff --git a/fs/ext4/inode.c b/fs/ext4/inode.c
index 5cb9a21..90ce0cb 100644
--- a/fs/ext4/inode.c
+++ b/fs/ext4/inode.c
@@ -2302,7 +2302,7 @@ static int ext4_writepages(struct address_space *mapping,
 
                blk_start_plug(&plug);
                ret = write_cache_pages(mapping, wbc, __writepage, mapping);
-               blk_finish_plug(&plug);
+               blk_finish_plug();
                goto out_writepages;
        }
 
@@ -2438,7 +2438,7 @@ retry:
                if (ret)
                        break;
        }
-       blk_finish_plug(&plug);
+       blk_finish_plug();
        if (!ret && !cycled && wbc->nr_to_write > 0) {
                cycled = 1;
                mpd.last_page = writeback_index - 1;
diff --git a/fs/f2fs/checkpoint.c b/fs/f2fs/checkpoint.c
index 7f794b7..86ba453 100644
--- a/fs/f2fs/checkpoint.c
+++ b/fs/f2fs/checkpoint.c
@@ -846,7 +846,7 @@ retry_flush_nodes:
                goto retry_flush_nodes;
        }
 out:
-       blk_finish_plug(&plug);
+       blk_finish_plug();
        return err;
 }
 
diff --git a/fs/f2fs/gc.c b/fs/f2fs/gc.c
index 76adbc3..abeef77 100644
--- a/fs/f2fs/gc.c
+++ b/fs/f2fs/gc.c
@@ -678,7 +678,7 @@ static void do_garbage_collect(struct f2fs_sb_info *sbi, 
unsigned int segno,
                gc_data_segment(sbi, sum->entries, gc_list, segno, gc_type);
                break;
        }
-       blk_finish_plug(&plug);
+       blk_finish_plug();
 
        stat_inc_seg_count(sbi, GET_SUM_TYPE((&sum->footer)));
        stat_inc_call_count(sbi->stat_info);
diff --git a/fs/f2fs/node.c b/fs/f2fs/node.c
index 97bd9d3..c4aa9e2 100644
--- a/fs/f2fs/node.c
+++ b/fs/f2fs/node.c
@@ -1098,7 +1098,7 @@ repeat:
                ra_node_page(sbi, nid);
        }
 
-       blk_finish_plug(&plug);
+       blk_finish_plug();
 
        lock_page(page);
        if (unlikely(page->mapping != NODE_MAPPING(sbi))) {
diff --git a/fs/gfs2/log.c b/fs/gfs2/log.c
index 536e7a6..06f25d17 100644
--- a/fs/gfs2/log.c
+++ b/fs/gfs2/log.c
@@ -159,7 +159,7 @@ restart:
                        goto restart;
        }
        spin_unlock(&sdp->sd_ail_lock);
-       blk_finish_plug(&plug);
+       blk_finish_plug();
        trace_gfs2_ail_flush(sdp, wbc, 0);
 }
 
diff --git a/fs/hpfs/buffer.c b/fs/hpfs/buffer.c
index 8057fe4..138462d 100644
--- a/fs/hpfs/buffer.c
+++ b/fs/hpfs/buffer.c
@@ -35,7 +35,7 @@ void hpfs_prefetch_sectors(struct super_block *s, unsigned 
secno, int n)
                secno++;
                n--;
        }
-       blk_finish_plug(&plug);
+       blk_finish_plug();
 }
 
 /* Map a sector into a buffer and return pointers to it and to the buffer. */
diff --git a/fs/jbd/checkpoint.c b/fs/jbd/checkpoint.c
index 08c0304..cd6b09f 100644
--- a/fs/jbd/checkpoint.c
+++ b/fs/jbd/checkpoint.c
@@ -263,7 +263,7 @@ __flush_batch(journal_t *journal, struct buffer_head **bhs, 
int *batch_count)
        blk_start_plug(&plug);
        for (i = 0; i < *batch_count; i++)
                write_dirty_buffer(bhs[i], WRITE_SYNC);
-       blk_finish_plug(&plug);
+       blk_finish_plug();
 
        for (i = 0; i < *batch_count; i++) {
                struct buffer_head *bh = bhs[i];
diff --git a/fs/jbd/commit.c b/fs/jbd/commit.c
index bb217dc..e1046c3 100644
--- a/fs/jbd/commit.c
+++ b/fs/jbd/commit.c
@@ -447,7 +447,7 @@ void journal_commit_transaction(journal_t *journal)
        blk_start_plug(&plug);
        err = journal_submit_data_buffers(journal, commit_transaction,
                                          write_op);
-       blk_finish_plug(&plug);
+       blk_finish_plug();
 
        /*
         * Wait for all previously submitted IO to complete.
@@ -697,7 +697,7 @@ start_journal_io:
                }
        }
 
-       blk_finish_plug(&plug);
+       blk_finish_plug();
 
        /* Lo and behold: we have just managed to send a transaction to
            the log.  Before we can commit it, wait for the IO so far to
diff --git a/fs/jbd2/checkpoint.c b/fs/jbd2/checkpoint.c
index 988b32e..6aa0039 100644
--- a/fs/jbd2/checkpoint.c
+++ b/fs/jbd2/checkpoint.c
@@ -187,7 +187,7 @@ __flush_batch(journal_t *journal, int *batch_count)
        blk_start_plug(&plug);
        for (i = 0; i < *batch_count; i++)
                write_dirty_buffer(journal->j_chkpt_bhs[i], WRITE_SYNC);
-       blk_finish_plug(&plug);
+       blk_finish_plug();
 
        for (i = 0; i < *batch_count; i++) {
                struct buffer_head *bh = journal->j_chkpt_bhs[i];
diff --git a/fs/jbd2/commit.c b/fs/jbd2/commit.c
index b73e021..8f532c8 100644
--- a/fs/jbd2/commit.c
+++ b/fs/jbd2/commit.c
@@ -805,7 +805,7 @@ start_journal_io:
                        __jbd2_journal_abort_hard(journal);
        }
 
-       blk_finish_plug(&plug);
+       blk_finish_plug();
 
        /* Lo and behold: we have just managed to send a transaction to
            the log.  Before we can commit it, wait for the IO so far to
diff --git a/fs/mpage.c b/fs/mpage.c
index 3e79220..bf7d6c3 100644
--- a/fs/mpage.c
+++ b/fs/mpage.c
@@ -695,7 +695,7 @@ mpage_writepages(struct address_space *mapping,
                if (mpd.bio)
                        mpage_bio_submit(WRITE, mpd.bio);
        }
-       blk_finish_plug(&plug);
+       blk_finish_plug();
        return ret;
 }
 EXPORT_SYMBOL(mpage_writepages);
diff --git a/fs/nfs/blocklayout/blocklayout.c b/fs/nfs/blocklayout/blocklayout.c
index 1cac3c1..e93b6a8 100644
--- a/fs/nfs/blocklayout/blocklayout.c
+++ b/fs/nfs/blocklayout/blocklayout.c
@@ -311,7 +311,7 @@ bl_read_pagelist(struct nfs_pgio_header *header)
        }
 out:
        bl_submit_bio(READ, bio);
-       blk_finish_plug(&plug);
+       blk_finish_plug();
        put_parallel(par);
        return PNFS_ATTEMPTED;
 }
@@ -433,7 +433,7 @@ bl_write_pagelist(struct nfs_pgio_header *header, int sync)
        header->res.count = header->args.count;
 out:
        bl_submit_bio(WRITE, bio);
-       blk_finish_plug(&plug);
+       blk_finish_plug();
        put_parallel(par);
        return PNFS_ATTEMPTED;
 }
diff --git a/fs/xfs/xfs_buf.c b/fs/xfs/xfs_buf.c
index 1790b00..2f89ca2 100644
--- a/fs/xfs/xfs_buf.c
+++ b/fs/xfs/xfs_buf.c
@@ -1289,7 +1289,7 @@ _xfs_buf_ioapply(
                if (size <= 0)
                        break;  /* all done */
        }
-       blk_finish_plug(&plug);
+       blk_finish_plug();
 }
 
 /*
@@ -1823,7 +1823,7 @@ __xfs_buf_delwri_submit(
 
                xfs_buf_submit(bp);
        }
-       blk_finish_plug(&plug);
+       blk_finish_plug();
 
        return pinned;
 }
diff --git a/fs/xfs/xfs_dir2_readdir.c b/fs/xfs/xfs_dir2_readdir.c
index 098cd78..7e8fa3f 100644
--- a/fs/xfs/xfs_dir2_readdir.c
+++ b/fs/xfs/xfs_dir2_readdir.c
@@ -455,7 +455,7 @@ xfs_dir2_leaf_readbuf(
                        }
                }
        }
-       blk_finish_plug(&plug);
+       blk_finish_plug();
 
 out:
        *bpp = bp;
diff --git a/fs/xfs/xfs_itable.c b/fs/xfs/xfs_itable.c
index 82e3142..c3ac5ec 100644
--- a/fs/xfs/xfs_itable.c
+++ b/fs/xfs/xfs_itable.c
@@ -196,7 +196,7 @@ xfs_bulkstat_ichunk_ra(
                                             &xfs_inode_buf_ops);
                }
        }
-       blk_finish_plug(&plug);
+       blk_finish_plug();
 }
 
 /*
diff --git a/include/linux/blkdev.h b/include/linux/blkdev.h
index 7f9a516..188133f 100644
--- a/include/linux/blkdev.h
+++ b/include/linux/blkdev.h
@@ -1091,6 +1091,7 @@ static inline void blk_post_runtime_resume(struct 
request_queue *q, int err) {}
  * schedule() where blk_schedule_flush_plug() is called.
  */
 struct blk_plug {
+       int depth; /* number of nested plugs */
        struct list_head list; /* requests */
        struct list_head mq_list; /* blk-mq requests */
        struct list_head cb_list; /* md requires an unplug callback */
@@ -1107,7 +1108,7 @@ struct blk_plug_cb {
 extern struct blk_plug_cb *blk_check_plugged(blk_plug_cb_fn unplug,
                                             void *data, int size);
 extern void blk_start_plug(struct blk_plug *);
-extern void blk_finish_plug(struct blk_plug *);
+extern void blk_finish_plug(void);
 extern void blk_flush_plug_list(struct blk_plug *, bool);
 
 static inline void blk_flush_plug(struct task_struct *tsk)
@@ -1646,7 +1647,7 @@ static inline void blk_start_plug(struct blk_plug *plug)
 {
 }
 
-static inline void blk_finish_plug(struct blk_plug *plug)
+static inline void blk_finish_plug(void)
 {
 }
 
diff --git a/mm/madvise.c b/mm/madvise.c
index d551475..18a34ee 100644
--- a/mm/madvise.c
+++ b/mm/madvise.c
@@ -539,7 +539,7 @@ SYSCALL_DEFINE3(madvise, unsigned long, start, size_t, 
len_in, int, behavior)
                        vma = find_vma(current->mm, start);
        }
 out:
-       blk_finish_plug(&plug);
+       blk_finish_plug();
        if (write)
                up_write(&current->mm->mmap_sem);
        else
diff --git a/mm/page-writeback.c b/mm/page-writeback.c
index 644bcb6..4570f6e 100644
--- a/mm/page-writeback.c
+++ b/mm/page-writeback.c
@@ -2020,7 +2020,7 @@ int generic_writepages(struct address_space *mapping,
 
        blk_start_plug(&plug);
        ret = write_cache_pages(mapping, wbc, __writepage, mapping);
-       blk_finish_plug(&plug);
+       blk_finish_plug();
        return ret;
 }
 
diff --git a/mm/readahead.c b/mm/readahead.c
index 9356758..64182a2 100644
--- a/mm/readahead.c
+++ b/mm/readahead.c
@@ -136,7 +136,7 @@ static int read_pages(struct address_space *mapping, struct 
file *filp,
        ret = 0;
 
 out:
-       blk_finish_plug(&plug);
+       blk_finish_plug();
 
        return ret;
 }
diff --git a/mm/swap_state.c b/mm/swap_state.c
index 405923f..5721f64 100644
--- a/mm/swap_state.c
+++ b/mm/swap_state.c
@@ -478,7 +478,7 @@ struct page *swapin_readahead(swp_entry_t entry, gfp_t 
gfp_mask,
                        SetPageReadahead(page);
                page_cache_release(page);
        }
-       blk_finish_plug(&plug);
+       blk_finish_plug();
 
        lru_add_drain();        /* Push any new pages onto the LRU now */
 skip:
diff --git a/mm/vmscan.c b/mm/vmscan.c
index 5e8eadd..56bb274 100644
--- a/mm/vmscan.c
+++ b/mm/vmscan.c
@@ -2222,7 +2222,7 @@ static void shrink_lruvec(struct lruvec *lruvec, int 
swappiness,
 
                scan_adjusted = true;
        }
-       blk_finish_plug(&plug);
+       blk_finish_plug();
        sc->nr_reclaimed += nr_reclaimed;
 
        /*
-- 
1.8.3.1

<Prev in Thread] Current Thread [Next in Thread>