xfs
[Top] [All Lists]

[PATCH v4] xfs: probe data buffer from page cache for unwritten extents

To: xfs@xxxxxxxxxxx
Subject: [PATCH v4] xfs: probe data buffer from page cache for unwritten extents
From: Jeff Liu <jeff.liu@xxxxxxxxxx>
Date: Fri, 20 Jul 2012 16:28:06 +0800
Organization: Oracle
Reply-to: jeff.liu@xxxxxxxxxx
User-agent: Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.9.2.28) Gecko/20120313 Thunderbird/3.1.20
Hi All,

According to Mark and Christoph's comments for v3, except optimizing 
xfs_seek_data() with unwritten extents
probing, the xfs_seek_hole() is also refined to that in this version.

Hi Mark,
I have one code change in xfs_has_unwritten_buffer() need to update for you.
Originally, we did BH state examination combine with "*offset <= 
XFS_FSB_TO_B(mp, last)", this caused a few
corner cases run failed per my tests.
For instance, assuming we have a tiny pre-allocated file only has 8 bytes and 
call SEEK_DATA with offset = 1 against it.
In this case, '*offset' is 1 but XFS_FSB_TO_B(mp, last) is calculated to 
*ZERO*, xfs_has_unwritten_buffea r() will return
false with a data extent missing.
I have removed this check logic from there, it does not affect the test results 
that I have verified through your seekv8 test program.
Hence, we still got optimized as transfer '*offset' ranther than 
'map[x].br_startoff' to xfs_has_unwritten_buffer(). :)


v3->v4:
xfs_seek_hole() refinement, suggested by Mark and Christoph.
* refine xfs_seek_hole() with unwritten extents search, treat it as a hole if 
no data buffer was found from page cache.
* s/goto out/break/g, break out of the extent maps reading loop rather than 'go 
to', I must have got my head up in the clouds when writing v3. :(
* xfs_has_unwritten_buffer(), remove 'offset <= XFS_FSB_TO_B(mp, last))' from 
BH state checking branch.
  The page index offset might less than '*start', so we will miss a data extent 
if so.
* xfs_has_unwritten_buffer(), don't reset '*offset' to *ZERO* if no data buffer 
was found because of xfs_seek_hole() will
  call this function to examine an unwritten extent has data or not.  If not, 
it will use the returned '*offset' as a hole offset.
  So set '*offset' to zero in xfs_has_unwritten_buffer() will lead to wrong 
result.
* avoid re-starting the next round search in both xfs_seek_data() and 
xfs_seek_hole() if the end offset of the 2nd extent map is hit the EOF.
  So for SEEK_DATA, it means there is no data extent beyond the current offset 
and return ENXIO, for SEEK_HOLE, return the file
  size to indicate hitting EOF.  The comments were also changed(s/reading 
offset not beyond/reading offset not beyond or hit EOF/)accordingly.

v2->v3:
Tested by Mark, hit BUG() for continuous unwritten extents without data wrote.
* xfs_seek_data(), remove BUG() and having extents map search in loop.

v1->v2:
suggested by Mark.
* xfs_has_unwritten_buffer(), use the input offset instead of bmap->br_startoff 
to
 calculate page index for data buffer probing.

Thanks,
-Jeff


Signed-off-by: Jie Liu <jeff.liu@xxxxxxxxxx>
Reviewed-by: Mark Tinguely <tinguely@xxxxxxx>
Reviewed-by: Christoph Hellwig <hch@xxxxxx>
Reviewed-by: Dave Chinner <dchinner@xxxxxxxxxx>

---
 fs/xfs/xfs_file.c |  283 +++++++++++++++++++++++++++++++++++++++++++++++------
 1 files changed, 254 insertions(+), 29 deletions(-)

diff --git a/fs/xfs/xfs_file.c b/fs/xfs/xfs_file.c
index 9f7ec15..5934de9 100644
--- a/fs/xfs/xfs_file.c
+++ b/fs/xfs/xfs_file.c
@@ -36,6 +36,7 @@
 
 #include <linux/dcache.h>
 #include <linux/falloc.h>
+#include <linux/pagevec.h>
 
 static const struct vm_operations_struct xfs_file_vm_ops;
 
@@ -966,6 +967,106 @@ xfs_vm_page_mkwrite(
        return block_page_mkwrite(vma, vmf, xfs_get_blocks);
 }
 
+/*
+ * Probe the data buffer offset in page cache for unwritten extents.
+ * Iterate each page to find out if a buffer head state has BH_Unwritten
+ * or BH_Uptodate set.
+ */
+STATIC bool
+xfs_has_unwritten_buffer(
+       struct inode            *inode,
+       struct xfs_bmbt_irec    *map,
+       loff_t                  *offset)
+{
+       struct xfs_inode        *ip = XFS_I(inode);
+       struct xfs_mount        *mp = ip->i_mount;
+       struct pagevec          pvec;
+       pgoff_t                 index;
+       pgoff_t                 end;
+       bool                    found = false;
+
+       pagevec_init(&pvec, 0);
+
+       index = XFS_FSB_TO_B(mp, XFS_B_TO_FSBT(mp, *offset))
+                            >> PAGE_CACHE_SHIFT;
+       end = XFS_FSB_TO_B(mp, map->br_startoff + map->br_blockcount)
+                          >> PAGE_CACHE_SHIFT;
+       pr_info("page search: index=%lu, end=%lu\n", index, end);
+       do {
+               unsigned int    i;
+               unsigned        nr_pages;
+               int             want = min_t(pgoff_t, end - index,
+                                            (pgoff_t)PAGEVEC_SIZE - 1) + 1;
+               nr_pages = pagevec_lookup(&pvec, inode->i_mapping, index,
+                                         want);
+               if (nr_pages == 0)
+                       break;
+
+               for (i = 0; i < nr_pages; i++) {
+                       struct page             *page = pvec.pages[i];
+                       struct buffer_head      *bh;
+                       struct buffer_head      *head;
+                       xfs_fileoff_t           last;
+
+                       /*
+                        * There is no need to check the following pages
+                        * if the current page offset is out of range.
+                        */
+                       if (page->index > end)
+                               goto out;
+
+                       if (!trylock_page(page))
+                               goto out;
+
+                       if (!page_has_buffers(page)) {
+                               unlock_page(page);
+                               continue;
+                       }
+
+                       last = XFS_B_TO_FSBT(mp,
+                                            page->index << PAGE_CACHE_SHIFT);
+                       bh = head = page_buffers(page);
+                       do {
+                               /*
+                                * An extent in XFS_EXT_UNWRITTEN has disk
+                                * blocks already mapped to it, but no data
+                                * has been committed to them yet.  If it has
+                                * dirty data in the page cache it can be
+                                * identified by having BH_Unwritten set in
+                                * each buffers.  Also, the buffer head state
+                                * might be in BH_Uptodate too if the buffer
+                                * writeback procedure was fired, we need to
+                                * examine it as well.
+                                */
+                               if (buffer_unwritten(bh) ||
+                                   buffer_uptodate(bh)) {
+                                       found = true;
+                                       *offset = max_t(loff_t, *offset,
+                                                       XFS_FSB_TO_B(mp, last));
+                                       unlock_page(page);
+                                       goto out;
+                               }
+                               last++;
+                       } while ((bh = bh->b_this_page) != head);
+                       unlock_page(page);
+               }
+
+               /*
+                * If the number of probed pages less than our desired,
+                * there should no more pages mapped, search done.
+                */
+               if (nr_pages < want)
+                       break;
+
+               index = pvec.pages[i - 1]->index + 1;
+               pagevec_release(&pvec);
+       } while (index < end);
+
+out:
+       pagevec_release(&pvec);
+       return found;
+}
+
 STATIC loff_t
 xfs_seek_data(
        struct file             *file,
@@ -975,8 +1076,6 @@ xfs_seek_data(
        struct inode            *inode = file->f_mapping->host;
        struct xfs_inode        *ip = XFS_I(inode);
        struct xfs_mount        *mp = ip->i_mount;
-       struct xfs_bmbt_irec    map[2];
-       int                     nmap = 2;
        loff_t                  uninitialized_var(offset);
        xfs_fsize_t             isize;
        xfs_fileoff_t           fsbno;
@@ -987,39 +1086,97 @@ xfs_seek_data(
        lock = xfs_ilock_map_shared(ip);
 
        isize = i_size_read(inode);
+
        if (start >= isize) {
                error = ENXIO;
                goto out_unlock;
        }
 
-       fsbno = XFS_B_TO_FSBT(mp, start);
-
        /*
         * Try to read extents from the first block indicated
         * by fsbno to the end block of the file.
         */
+       fsbno = XFS_B_TO_FSBT(mp, start);
        end = XFS_B_TO_FSB(mp, isize);
+       for (;;) {
+               struct xfs_bmbt_irec    map[2];
+               int                     nmap = 2;
 
-       error = xfs_bmapi_read(ip, fsbno, end - fsbno, map, &nmap,
-                              XFS_BMAPI_ENTIRE);
-       if (error)
-               goto out_unlock;
+               error = xfs_bmapi_read(ip, fsbno, end - fsbno, map, &nmap,
+                                      XFS_BMAPI_ENTIRE);
+               if (error)
+                       goto out_unlock;
 
-       /*
-        * Treat unwritten extent as data extent since it might
-        * contains dirty data in page cache.
-        */
-       if (map[0].br_startblock != HOLESTARTBLOCK) {
-               offset = max_t(loff_t, start,
-                              XFS_FSB_TO_B(mp, map[0].br_startoff));
-       } else {
-               if (nmap == 1) {
+               /* No extents at given offset, must be beyond EOF */
+               if (nmap == 0) {
                        error = ENXIO;
                        goto out_unlock;
                }
 
                offset = max_t(loff_t, start,
-                              XFS_FSB_TO_B(mp, map[1].br_startoff));
+                              XFS_FSB_TO_B(mp, map[0].br_startoff));
+               if (map[0].br_state == XFS_EXT_NORM &&
+                   !isnullstartblock(map[0].br_startblock))
+                       break;
+               else {
+                       /*
+                        * Landed in an unwritten extent, try to search
+                        * data buffer from page cache firstly.  Treat
+                        * it as a hole if nothing was found, and skip
+                        * to check the next extent.
+                        */
+                       if (map[0].br_startblock == DELAYSTARTBLOCK ||
+                           map[0].br_state == XFS_EXT_UNWRITTEN) {
+                               /* Probing page cache start from offset */
+                               if (xfs_has_unwritten_buffer(inode, &map[0],
+                                                            &offset))
+                                       break;
+                       }
+
+                       /*
+                        * Found a hole in map[0] and nothing in map[1].
+                        * Probably means that we are reading after EOF.
+                        */
+                       if (nmap == 1) {
+                               error = ENXIO;
+                               goto out_unlock;
+                       }
+
+                       /*
+                        * We have two mappings, and need to check map[1] to
+                        * see if there is data or not.
+                        */
+                       offset = max_t(loff_t, start,
+                                      XFS_FSB_TO_B(mp, map[1].br_startoff));
+                       if (map[1].br_state == XFS_EXT_NORM &&
+                           !isnullstartblock(map[1].br_startblock))
+                               break;
+                       else {
+                               /*
+                                * So map[1] is an unwritten extent as well,
+                                * try to search for data buffer in page cache.
+                                * We might find nothing if it is an allocated
+                                * and resvered extent.
+                                */
+                               if (map[1].br_startblock == DELAYSTARTBLOCK ||
+                                   map[1].br_state == XFS_EXT_UNWRITTEN) {
+                                       if (xfs_has_unwritten_buffer(inode,
+                                                       &map[1], &offset))
+                                               break;
+                               }
+                       }
+               }
+
+               /*
+                * Nothing was found, proceed to the next round of search if
+                * reading offset not beyond or hit EOF.
+                */
+               fsbno = map[1].br_startoff + map[1].br_blockcount;
+               start = XFS_FSB_TO_B(mp, fsbno);
+               if (start >= isize) {
+                       error = ENXIO;
+                       goto out_unlock;
+               }
        }
 
        if (offset != file->f_pos)
@@ -1043,9 +1200,9 @@ xfs_seek_hole(
        struct xfs_inode        *ip = XFS_I(inode);
        struct xfs_mount        *mp = ip->i_mount;
        loff_t                  uninitialized_var(offset);
-       loff_t                  holeoff;
        xfs_fsize_t             isize;
        xfs_fileoff_t           fsbno;
+       xfs_filblks_t           end;
        uint                    lock;
        int                     error;
 
@@ -1061,19 +1218,87 @@ xfs_seek_hole(
        }
 
        fsbno = XFS_B_TO_FSBT(mp, start);
-       error = xfs_bmap_first_unused(NULL, ip, 1, &fsbno, XFS_DATA_FORK);
-       if (error)
-               goto out_unlock;
+       end = XFS_B_TO_FSB(mp, isize);
+       for (;;) {
+               struct xfs_bmbt_irec    map[2];
+               int                     nmap = 2;
+
+               error = xfs_bmapi_read(ip, fsbno, end - fsbno, map, &nmap,
+                                      XFS_BMAPI_ENTIRE);
+               if (error)
+                       goto out_unlock;
+
+               /* No extents at given offset, must be beyond EOF */
+               if (nmap == 0) {
+                       error = ENXIO;
+                       goto out_unlock;
+               }
+
+               offset = max_t(loff_t, start,
+                              XFS_FSB_TO_B(mp, map[0].br_startoff));
+               if (map[0].br_startblock == HOLESTARTBLOCK)
+                       break;
+               else {
+                       /*
+                        * Landed in an unwritten extent, try to search
+                        * data buffer from page cache firstly.  Treat
+                        * it as a hole and return the offset if nothing
+                        * was found.
+                        */
+                       if (map[0].br_state == XFS_EXT_UNWRITTEN ||
+                           isnullstartblock(map[0].br_startblock)) {
+                               /* Probing page cache start from offset */
+                               if (!xfs_has_unwritten_buffer(inode, &map[0],
+                                                             &offset))
+                                       break;
+                       }
+
+                       /*
+                        * map[0] is unwritten and there is no hole past
+                        * offset, probably means that we are reading after
+                        * EOF.  Then the file offset is adjusted to the
+                        * end of the file(i.e., there is an implicit hole
+                        * at the end of any file).
+                        */
+                       if (nmap == 1) {
+                               offset = isize;
+                               break;
+                       }
+
+                       /*
+                        * We have two mappings, and need to check map[1] to
+                        * see if it is a hole or not.
+                        */
+                       offset = max_t(loff_t, start,
+                                      XFS_FSB_TO_B(mp, map[1].br_startoff));
+                       if (map[1].br_startblock == HOLESTARTBLOCK)
+                               break;
+                       else {
+                               /*
+                                * So map[1] is an unwritten extent as well.
+                                * Try to search for data buffer in page cache.
+                                * Treat it as a hole and return the offset if
+                                * nothing was found.
+                                */
+                               if (map[1].br_state == XFS_EXT_UNWRITTEN ||
+                                   isnullstartblock(map[1].br_startblock)) {
+                                       if (!xfs_has_unwritten_buffer(inode,
+                                                       &map[1], &offset))
+                                               break;
+                               }
+                       }
+               }
 
-       holeoff = XFS_FSB_TO_B(mp, fsbno);
-       if (holeoff <= start)
-               offset = start;
-       else {
                /*
-                * xfs_bmap_first_unused() could return a value bigger than
-                * isize if there are no more holes past the supplied offset.
+                * Both mappings are have data, proceed to the next round of
+                * search if reading offset not beyond or hit EOF.
                 */
-               offset = min_t(loff_t, holeoff, isize);
+               fsbno = map[1].br_startoff + map[1].br_blockcount;
+               start = XFS_FSB_TO_B(mp, fsbno);
+               if (start >= isize) {
+                       offset = isize;
+                       break;
+               }
        }
 
        if (offset != file->f_pos)
-- 
1.7.4.1

<Prev in Thread] Current Thread [Next in Thread>