[PATCH v4] xfs: probe data buffer from page cache for unwritten extents
Jeff Liu
jeff.liu at oracle.com
Fri Jul 20 03:28:06 CDT 2012
Hi All,
According to Mark and Christoph's comments for v3, except optimizing xfs_seek_data() with unwritten extents
probing, the xfs_seek_hole() is also refined to that in this version.
Hi Mark,
I have one code change in xfs_has_unwritten_buffer() need to update for you.
Originally, we did BH state examination combine with "*offset <= XFS_FSB_TO_B(mp, last)", this caused a few
corner cases run failed per my tests.
For instance, assuming we have a tiny pre-allocated file only has 8 bytes and call SEEK_DATA with offset = 1 against it.
In this case, '*offset' is 1 but XFS_FSB_TO_B(mp, last) is calculated to *ZERO*, xfs_has_unwritten_buffea r() will return
false with a data extent missing.
I have removed this check logic from there, it does not affect the test results that I have verified through your seekv8 test program.
Hence, we still got optimized as transfer '*offset' ranther than 'map[x].br_startoff' to xfs_has_unwritten_buffer(). :)
v3->v4:
xfs_seek_hole() refinement, suggested by Mark and Christoph.
* refine xfs_seek_hole() with unwritten extents search, treat it as a hole if no data buffer was found from page cache.
* s/goto out/break/g, break out of the extent maps reading loop rather than 'go to', I must have got my head up in the clouds when writing v3. :(
* xfs_has_unwritten_buffer(), remove 'offset <= XFS_FSB_TO_B(mp, last))' from BH state checking branch.
The page index offset might less than '*start', so we will miss a data extent if so.
* xfs_has_unwritten_buffer(), don't reset '*offset' to *ZERO* if no data buffer was found because of xfs_seek_hole() will
call this function to examine an unwritten extent has data or not. If not, it will use the returned '*offset' as a hole offset.
So set '*offset' to zero in xfs_has_unwritten_buffer() will lead to wrong result.
* avoid re-starting the next round search in both xfs_seek_data() and xfs_seek_hole() if the end offset of the 2nd extent map is hit the EOF.
So for SEEK_DATA, it means there is no data extent beyond the current offset and return ENXIO, for SEEK_HOLE, return the file
size to indicate hitting EOF. The comments were also changed(s/reading offset not beyond/reading offset not beyond or hit EOF/)accordingly.
v2->v3:
Tested by Mark, hit BUG() for continuous unwritten extents without data wrote.
* xfs_seek_data(), remove BUG() and having extents map search in loop.
v1->v2:
suggested by Mark.
* xfs_has_unwritten_buffer(), use the input offset instead of bmap->br_startoff to
calculate page index for data buffer probing.
Thanks,
-Jeff
Signed-off-by: Jie Liu <jeff.liu at oracle.com>
Reviewed-by: Mark Tinguely <tinguely at sgi.com>
Reviewed-by: Christoph Hellwig <hch at lst.de>
Reviewed-by: Dave Chinner <dchinner at redhat.com>
---
fs/xfs/xfs_file.c | 283 +++++++++++++++++++++++++++++++++++++++++++++++------
1 files changed, 254 insertions(+), 29 deletions(-)
diff --git a/fs/xfs/xfs_file.c b/fs/xfs/xfs_file.c
index 9f7ec15..5934de9 100644
--- a/fs/xfs/xfs_file.c
+++ b/fs/xfs/xfs_file.c
@@ -36,6 +36,7 @@
#include <linux/dcache.h>
#include <linux/falloc.h>
+#include <linux/pagevec.h>
static const struct vm_operations_struct xfs_file_vm_ops;
@@ -966,6 +967,106 @@ xfs_vm_page_mkwrite(
return block_page_mkwrite(vma, vmf, xfs_get_blocks);
}
+/*
+ * Probe the data buffer offset in page cache for unwritten extents.
+ * Iterate each page to find out if a buffer head state has BH_Unwritten
+ * or BH_Uptodate set.
+ */
+STATIC bool
+xfs_has_unwritten_buffer(
+ struct inode *inode,
+ struct xfs_bmbt_irec *map,
+ loff_t *offset)
+{
+ struct xfs_inode *ip = XFS_I(inode);
+ struct xfs_mount *mp = ip->i_mount;
+ struct pagevec pvec;
+ pgoff_t index;
+ pgoff_t end;
+ bool found = false;
+
+ pagevec_init(&pvec, 0);
+
+ index = XFS_FSB_TO_B(mp, XFS_B_TO_FSBT(mp, *offset))
+ >> PAGE_CACHE_SHIFT;
+ end = XFS_FSB_TO_B(mp, map->br_startoff + map->br_blockcount)
+ >> PAGE_CACHE_SHIFT;
+ pr_info("page search: index=%lu, end=%lu\n", index, end);
+ do {
+ unsigned int i;
+ unsigned nr_pages;
+ int want = min_t(pgoff_t, end - index,
+ (pgoff_t)PAGEVEC_SIZE - 1) + 1;
+ nr_pages = pagevec_lookup(&pvec, inode->i_mapping, index,
+ want);
+ if (nr_pages == 0)
+ break;
+
+ for (i = 0; i < nr_pages; i++) {
+ struct page *page = pvec.pages[i];
+ struct buffer_head *bh;
+ struct buffer_head *head;
+ xfs_fileoff_t last;
+
+ /*
+ * There is no need to check the following pages
+ * if the current page offset is out of range.
+ */
+ if (page->index > end)
+ goto out;
+
+ if (!trylock_page(page))
+ goto out;
+
+ if (!page_has_buffers(page)) {
+ unlock_page(page);
+ continue;
+ }
+
+ last = XFS_B_TO_FSBT(mp,
+ page->index << PAGE_CACHE_SHIFT);
+ bh = head = page_buffers(page);
+ do {
+ /*
+ * An extent in XFS_EXT_UNWRITTEN has disk
+ * blocks already mapped to it, but no data
+ * has been committed to them yet. If it has
+ * dirty data in the page cache it can be
+ * identified by having BH_Unwritten set in
+ * each buffers. Also, the buffer head state
+ * might be in BH_Uptodate too if the buffer
+ * writeback procedure was fired, we need to
+ * examine it as well.
+ */
+ if (buffer_unwritten(bh) ||
+ buffer_uptodate(bh)) {
+ found = true;
+ *offset = max_t(loff_t, *offset,
+ XFS_FSB_TO_B(mp, last));
+ unlock_page(page);
+ goto out;
+ }
+ last++;
+ } while ((bh = bh->b_this_page) != head);
+ unlock_page(page);
+ }
+
+ /*
+ * If the number of probed pages less than our desired,
+ * there should no more pages mapped, search done.
+ */
+ if (nr_pages < want)
+ break;
+
+ index = pvec.pages[i - 1]->index + 1;
+ pagevec_release(&pvec);
+ } while (index < end);
+
+out:
+ pagevec_release(&pvec);
+ return found;
+}
+
STATIC loff_t
xfs_seek_data(
struct file *file,
@@ -975,8 +1076,6 @@ xfs_seek_data(
struct inode *inode = file->f_mapping->host;
struct xfs_inode *ip = XFS_I(inode);
struct xfs_mount *mp = ip->i_mount;
- struct xfs_bmbt_irec map[2];
- int nmap = 2;
loff_t uninitialized_var(offset);
xfs_fsize_t isize;
xfs_fileoff_t fsbno;
@@ -987,39 +1086,97 @@ xfs_seek_data(
lock = xfs_ilock_map_shared(ip);
isize = i_size_read(inode);
+
if (start >= isize) {
error = ENXIO;
goto out_unlock;
}
- fsbno = XFS_B_TO_FSBT(mp, start);
-
/*
* Try to read extents from the first block indicated
* by fsbno to the end block of the file.
*/
+ fsbno = XFS_B_TO_FSBT(mp, start);
end = XFS_B_TO_FSB(mp, isize);
+ for (;;) {
+ struct xfs_bmbt_irec map[2];
+ int nmap = 2;
- error = xfs_bmapi_read(ip, fsbno, end - fsbno, map, &nmap,
- XFS_BMAPI_ENTIRE);
- if (error)
- goto out_unlock;
+ error = xfs_bmapi_read(ip, fsbno, end - fsbno, map, &nmap,
+ XFS_BMAPI_ENTIRE);
+ if (error)
+ goto out_unlock;
- /*
- * Treat unwritten extent as data extent since it might
- * contains dirty data in page cache.
- */
- if (map[0].br_startblock != HOLESTARTBLOCK) {
- offset = max_t(loff_t, start,
- XFS_FSB_TO_B(mp, map[0].br_startoff));
- } else {
- if (nmap == 1) {
+ /* No extents at given offset, must be beyond EOF */
+ if (nmap == 0) {
error = ENXIO;
goto out_unlock;
}
offset = max_t(loff_t, start,
- XFS_FSB_TO_B(mp, map[1].br_startoff));
+ XFS_FSB_TO_B(mp, map[0].br_startoff));
+ if (map[0].br_state == XFS_EXT_NORM &&
+ !isnullstartblock(map[0].br_startblock))
+ break;
+ else {
+ /*
+ * Landed in an unwritten extent, try to search
+ * data buffer from page cache firstly. Treat
+ * it as a hole if nothing was found, and skip
+ * to check the next extent.
+ */
+ if (map[0].br_startblock == DELAYSTARTBLOCK ||
+ map[0].br_state == XFS_EXT_UNWRITTEN) {
+ /* Probing page cache start from offset */
+ if (xfs_has_unwritten_buffer(inode, &map[0],
+ &offset))
+ break;
+ }
+
+ /*
+ * Found a hole in map[0] and nothing in map[1].
+ * Probably means that we are reading after EOF.
+ */
+ if (nmap == 1) {
+ error = ENXIO;
+ goto out_unlock;
+ }
+
+ /*
+ * We have two mappings, and need to check map[1] to
+ * see if there is data or not.
+ */
+ offset = max_t(loff_t, start,
+ XFS_FSB_TO_B(mp, map[1].br_startoff));
+ if (map[1].br_state == XFS_EXT_NORM &&
+ !isnullstartblock(map[1].br_startblock))
+ break;
+ else {
+ /*
+ * So map[1] is an unwritten extent as well,
+ * try to search for data buffer in page cache.
+ * We might find nothing if it is an allocated
+ * and resvered extent.
+ */
+ if (map[1].br_startblock == DELAYSTARTBLOCK ||
+ map[1].br_state == XFS_EXT_UNWRITTEN) {
+ if (xfs_has_unwritten_buffer(inode,
+ &map[1], &offset))
+ break;
+ }
+ }
+ }
+
+ /*
+ * Nothing was found, proceed to the next round of search if
+ * reading offset not beyond or hit EOF.
+ */
+ fsbno = map[1].br_startoff + map[1].br_blockcount;
+ start = XFS_FSB_TO_B(mp, fsbno);
+ if (start >= isize) {
+ error = ENXIO;
+ goto out_unlock;
+ }
}
if (offset != file->f_pos)
@@ -1043,9 +1200,9 @@ xfs_seek_hole(
struct xfs_inode *ip = XFS_I(inode);
struct xfs_mount *mp = ip->i_mount;
loff_t uninitialized_var(offset);
- loff_t holeoff;
xfs_fsize_t isize;
xfs_fileoff_t fsbno;
+ xfs_filblks_t end;
uint lock;
int error;
@@ -1061,19 +1218,87 @@ xfs_seek_hole(
}
fsbno = XFS_B_TO_FSBT(mp, start);
- error = xfs_bmap_first_unused(NULL, ip, 1, &fsbno, XFS_DATA_FORK);
- if (error)
- goto out_unlock;
+ end = XFS_B_TO_FSB(mp, isize);
+ for (;;) {
+ struct xfs_bmbt_irec map[2];
+ int nmap = 2;
+
+ error = xfs_bmapi_read(ip, fsbno, end - fsbno, map, &nmap,
+ XFS_BMAPI_ENTIRE);
+ if (error)
+ goto out_unlock;
+
+ /* No extents at given offset, must be beyond EOF */
+ if (nmap == 0) {
+ error = ENXIO;
+ goto out_unlock;
+ }
+
+ offset = max_t(loff_t, start,
+ XFS_FSB_TO_B(mp, map[0].br_startoff));
+ if (map[0].br_startblock == HOLESTARTBLOCK)
+ break;
+ else {
+ /*
+ * Landed in an unwritten extent, try to search
+ * data buffer from page cache firstly. Treat
+ * it as a hole and return the offset if nothing
+ * was found.
+ */
+ if (map[0].br_state == XFS_EXT_UNWRITTEN ||
+ isnullstartblock(map[0].br_startblock)) {
+ /* Probing page cache start from offset */
+ if (!xfs_has_unwritten_buffer(inode, &map[0],
+ &offset))
+ break;
+ }
+
+ /*
+ * map[0] is unwritten and there is no hole past
+ * offset, probably means that we are reading after
+ * EOF. Then the file offset is adjusted to the
+ * end of the file(i.e., there is an implicit hole
+ * at the end of any file).
+ */
+ if (nmap == 1) {
+ offset = isize;
+ break;
+ }
+
+ /*
+ * We have two mappings, and need to check map[1] to
+ * see if it is a hole or not.
+ */
+ offset = max_t(loff_t, start,
+ XFS_FSB_TO_B(mp, map[1].br_startoff));
+ if (map[1].br_startblock == HOLESTARTBLOCK)
+ break;
+ else {
+ /*
+ * So map[1] is an unwritten extent as well.
+ * Try to search for data buffer in page cache.
+ * Treat it as a hole and return the offset if
+ * nothing was found.
+ */
+ if (map[1].br_state == XFS_EXT_UNWRITTEN ||
+ isnullstartblock(map[1].br_startblock)) {
+ if (!xfs_has_unwritten_buffer(inode,
+ &map[1], &offset))
+ break;
+ }
+ }
+ }
- holeoff = XFS_FSB_TO_B(mp, fsbno);
- if (holeoff <= start)
- offset = start;
- else {
/*
- * xfs_bmap_first_unused() could return a value bigger than
- * isize if there are no more holes past the supplied offset.
+ * Both mappings are have data, proceed to the next round of
+ * search if reading offset not beyond or hit EOF.
*/
- offset = min_t(loff_t, holeoff, isize);
+ fsbno = map[1].br_startoff + map[1].br_blockcount;
+ start = XFS_FSB_TO_B(mp, fsbno);
+ if (start >= isize) {
+ offset = isize;
+ break;
+ }
}
if (offset != file->f_pos)
--
1.7.4.1
More information about the xfs
mailing list