
XFS direct IO problem

To: "xfs" <xfs@xxxxxxxxxxx>
Subject: XFS direct IO problem
From: "YeYin" <eyniy@xxxxxx>
Date: Wed, 8 Apr 2015 12:21:45 +0800
Hi,
About two months ago I asked about a problem with XFS; see here (http://oss.sgi.com/archives/xfs/2015-02/msg00197.html).

After that, I switched MySQL to direct IO; see here (https://dev.mysql.com/doc/refman/5.5/en/innodb-parameters.html#sysvar_innodb_flush_method).

However, I found that MySQL performance is still poor sometimes. Using some tracing tools (https://github.com/brendangregg/perf-tools) on the kernel, I found the following:

# ./funccount -i 1  "xfs_f*"
Tracing "xfs_f*"... Ctrl-C to end.
FUNC                              COUNT
xfs_file_aio_read                 15591
xfs_flushinval_pages              15591
xfs_find_bdev_for_inode           31182

As we can see, every call to xfs_file_aio_read results in a call to xfs_flushinval_pages.
Note that I am using direct IO!
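
For context, the direct IO branch of xfs_file_aio_read in a tree of this vintage looks roughly like the sketch below (paraphrased for illustration, not a verbatim copy of the source, so details may differ): whenever the inode still has cached pages, the read takes the iolock exclusively, flushes and invalidates the page cache, and only then demotes the lock and issues the direct read.

	if (unlikely(ioflags & IO_ISDIRECT)) {
		xfs_rw_ilock(ip, XFS_IOLOCK_EXCL);
		if (inode->i_mapping->nrpages) {
			/* leftover cached pages: write them back and drop them */
			ret = -xfs_flushinval_pages(ip,
					(iocb->ki_pos & PAGE_CACHE_MASK),
					-1, FI_REMAPF_LOCKED);
			if (ret) {
				xfs_rw_iunlock(ip, XFS_IOLOCK_EXCL);
				return ret;
			}
		}
		xfs_rw_ilock_demote(ip, XFS_IOLOCK_EXCL, XFS_IOLOCK_SHARED);
	} else
		xfs_rw_ilock(ip, XFS_IOLOCK_SHARED);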

xfs_flushinval_pages in turn calls truncate_inode_pages_range; see here (https://bitbucket.org/hustcat/kernel-2.6.32/src/0e5d90ed6f3ef8a3b5fe62a04cc6766a721c70f8/fs/xfs/linux-2.6/xfs_fs_subr.c?at=master#cl-56).
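
And xfs_flushinval_pages itself is a thin wrapper, roughly like this (again a paraphrased sketch rather than the exact source): write dirty pages back, then drop every cached page over the range, which is the truncate_inode_pages_range call traced below.

	struct address_space	*mapping = VFS_I(ip)->i_mapping;

	xfs_iflags_clear(ip, XFS_ITRUNCATED);
	ret = filemap_write_and_wait_range(mapping, first,
					   last == -1 ? LLONG_MAX : last);
	if (!ret)
		truncate_inode_pages_range(mapping, first,
					   last == -1 ? LLONG_MAX : last);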

Indeed:
# ./funccount -i 1  "truncate_inode_page*"
Tracing "truncate_inode_page*"... Ctrl-C to end.
FUNC                              COUNT
truncate_inode_page                   4
truncate_inode_pages                176
truncate_inode_pages_range        15474
FUNC                              COUNT
truncate_inode_page                   1
truncate_inode_pages                  5
truncate_inode_pages_range        15566

As we can see, truncate_inode_pages_range is called roughly as many times as xfs_flushinval_pages.
However, I found that truncate_inode_pages_range never actually calls truncate_inode_page:

# ./funcgraph truncate_inode_pages_range
Tracing "truncate_inode_pages_range"... Ctrl-C to end.
  2)   1.020 us    |  finish_task_switch();
  2)               |  truncate_inode_pages_range() {
  2)               |    pagevec_lookup() {
  2)   0.413 us    |      find_get_pages();
  2)   1.033 us    |    }
  2)   0.238 us    |    _cond_resched();
  2)               |    pagevec_lookup() {
  2)   0.234 us    |      find_get_pages();
  2)   0.690 us    |    }
  2)   3.362 us    |  }
  2)               |  truncate_inode_pages_range() {
  2)               |    pagevec_lookup() {
  2)   0.266 us    |      find_get_pages();
  2)   0.745 us    |    }
  2)   0.238 us    |    _cond_resched();
  2)               |    pagevec_lookup() {
  2)   0.248 us    |      find_get_pages();
  2)   0.701 us    |    }
  2)   2.844 us    |  }
  2)               |  truncate_inode_pages_range() {
  2)               |    pagevec_lookup() {
  2)   0.262 us    |      find_get_pages();
  2)   0.740 us    |    }
  2)   0.238 us    |    _cond_resched();
  2)               |    pagevec_lookup() {
  2)   0.251 us    |      find_get_pages();
  2)   0.705 us    |    }
  2)   2.767 us    |  }

This leaves inode->i_mapping->nrpages > 0 forever, so xfs_file_aio_read/xfs_file_dio_aio_write will always call xfs_flushinval_pages. Even worse, xfs_file_dio_aio_write then takes the exclusive iolock:

	if (mapping->nrpages && iolock == XFS_IOLOCK_SHARED) {
		xfs_rw_iunlock(ip, iolock);
		iolock = XFS_IOLOCK_EXCL;
		xfs_rw_ilock(ip, iolock);
	}

See here (https://bitbucket.org/hustcat/kernel-2.6.32/src/0e5d90ed6f3ef8a3b5fe62a04cc6766a721c70f8/fs/xfs/linux-2.6/xfs_file.c?at=master#cl-659).
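
Right after that lock upgrade, the write path flushes the cached pages and demotes the lock again, roughly like this (a paraphrased sketch, not verbatim from the linked file):

	if (mapping->nrpages) {
		ret = -xfs_flushinval_pages(ip, (pos & PAGE_CACHE_MASK), -1,
						FI_REMAPF_LOCKED);
		if (ret)
			return ret;
	}

	/* demote back to shared once the page cache has been invalidated */
	if (iolock == XFS_IOLOCK_EXCL) {
		xfs_rw_ilock_demote(ip, XFS_IOLOCK_EXCL, XFS_IOLOCK_SHARED);
		iolock = XFS_IOLOCK_SHARED;
	}

So as long as nrpages never drops to zero, every single direct write pays this upgrade/flush/demote cycle while holding the iolock exclusively.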

This causes bad performance even with direct IO. What I still don't understand is why truncate_inode_page is never called.

Every time after I run this:
echo 1 > /proc/sys/vm/drop_caches

performance immediately improves.
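
For what it's worth, a quick way to see from userspace whether a file still has resident page-cache pages (and hence nrpages > 0) is an mmap(2)/mincore(2) check like the hypothetical helper below (not something from my setup, just an illustration); run against the InnoDB data files, it should report a non-zero count before drop_caches and zero afterwards.

/* pagecache_resident.c -- hypothetical helper, not part of the original
 * report: count how many pages of a file are resident in the page cache
 * using mmap(2) + mincore(2). */
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

int main(int argc, char **argv)
{
	struct stat st;
	long pagesize = sysconf(_SC_PAGESIZE);
	int fd;

	if (argc != 2) {
		fprintf(stderr, "usage: %s <file>\n", argv[0]);
		return 1;
	}

	fd = open(argv[1], O_RDONLY);
	if (fd < 0 || fstat(fd, &st) < 0) {
		perror(argv[1]);
		return 1;
	}
	if (st.st_size == 0) {
		printf("%s: empty file\n", argv[1]);
		return 0;
	}

	/* Map the file without touching the data, so the check itself does
	 * not pull anything into the page cache. */
	size_t pages = (st.st_size + pagesize - 1) / pagesize;
	void *addr = mmap(NULL, st.st_size, PROT_READ, MAP_SHARED, fd, 0);
	unsigned char *vec = malloc(pages);

	if (addr == MAP_FAILED || !vec || mincore(addr, st.st_size, vec) < 0) {
		perror("mmap/mincore");
		return 1;
	}

	size_t resident = 0;
	for (size_t i = 0; i < pages; i++)
		if (vec[i] & 1)
			resident++;

	printf("%s: %zu of %zu pages resident in page cache\n",
	       argv[1], resident, pages);
	return 0;
}

Build with something like "gcc -O2 -o pagecache_resident pagecache_resident.c" and point it at the data files (paths depend on the installation).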

Thanks,
Ye

