This is an automated email from the git hooks/post-receive script. It was
generated because a ref change was pushed to the repository containing
the project "XFS development tree".
The branch, master has been updated
df9e825 xfs: page type check in writeback only checks last buffer
c74915f xfs: Do background CIL flushes via a workqueue
8562ab3 xfs: pass shutdown method into xfs_trans_ail_delete_bulk
7de77f7 xfs: remove some obsolete comments in xfs_trans_ail.c
d03b7cb xfs: on-stack delayed write buffer lists
a20e7de xfs: do not add buffers to the delwri queue until pushed
9e46ca7 xfs: do not write the buffer from xfs_qm_dqflush
23bb1cd xfs: do not write the buffer from xfs_iflush
b2e8fbb xfs: don't flush inodes from background inode reclaim
353923d xfs: implement freezing by emptying the AIL
5f221bb xfs: allow assigning the tail lsn with the AIL lock held
9994be3 xfs: remove log item from AIL in xfs_iflush after a shutdown
58237be xfs: remove log item from AIL in xfs_qm_dqflush after a shutdown
0d7f4c8 xfs: using GFP_NOFS for blkdev_issue_flush
41226c2 xfs: punch all delalloc blocks beyond EOF on write failure.
from bc236c3b202daa6aecdb100522dcf078379210e5 (commit)
Those revisions listed above that are new to this repository have
not appeared on any other notification email; so we list those
revisions in full, below.
- Log -----------------------------------------------------------------
commit df9e825962c70e77098587fce8c9fe8a71367425
Author: Dave Chinner <david@xxxxxxxxxxxxx>
Date: Mon Apr 23 15:58:43 2012 +1000
xfs: page type check in writeback only checks last buffer
xfs_is_delayed_page() checks to see if a page has buffers matching
the given IO type passed in. It does so by walking the buffer heads
on the page and checking if the state flags match the IO type.
However, the "acceptable" variable that is calculated is overwritten
every time a new buffer is checked. Hence if the first buffer on the
page is of the right type, this state is lost if the second buffer
is not of the correct type. This means that xfs_aops_discard_page()
may not discard delalloc regions when it is supposed to, and
xfs_convert_page() may not cluster IO as efficiently as possible.
This problem only occurs on filesystems with a block size smaller
than page size.
Also, rename xfs_is_delayed_page() to xfs_check_page_type() to
better describe what it is doing - it is not delalloc specific
anymore.
The problem was first noticed by Peter Watkins.
Signed-off-by: Dave Chinner <dchinner@xxxxxxxxxx>
Reviewed-by: Christoph Hellwig <hch@xxxxxx>
Reviewed-by: Mark Tinguely <tinguely@xxxxxxx>
Signed-off-by: Ben Myers <bpm@xxxxxxx>
commit c74915f82eb5b73e09f47ae9b603da0b3a818ad5
Author: Dave Chinner <david@xxxxxxxxxxxxx>
Date: Mon Apr 23 17:54:32 2012 +1000
xfs: Do background CIL flushes via a workqueue
Doing background CIL flushes adds significant latency to whatever
async transaction that triggers it. To avoid blocking async
transactions on things like waiting for log buffer IO to complete,
move the CIL push off into a workqueue. By moving the push work
into a workqueue, we remove all the latency that the commit adds
from the foreground transaction commit path. This also means that
single threaded workloads won't do the CIL push procssing, leaving
them more CPU to do more async transactions.
To do this, we need to keep track of the sequence number we have
pushed work for. This avoids having many transaction commits
attempting to schedule work for the same sequence, and ensures that
we only ever have one push (background or forced) in progress at a
time. It also means that we don't need to take the CIL lock in write
mode to check for potential background push races, which reduces
lock contention.
To avoid potential issues with "smart" IO schedulers, don't use the
workqueue for log force triggered flushes. Instead, do them directly
so that the log IO is done directly by the process issuing the log
force and so doesn't get stuck on IO elevator queue idling
incorrectly delaying the log IO from the workqueue.
Signed-off-by: Dave Chinner <dchinner@xxxxxxxxxx>
Reviewed-by: Mark Tinguely <tinguely@xxxxxxx>
Signed-off-by: Ben Myers <bpm@xxxxxxx>
commit 8562ab3dec9fb806785fd86fe3eb3ccc54d6ce1d
Author: Dave Chinner <dchinner@xxxxxxxxxx>
Date: Mon Apr 23 15:58:41 2012 +1000
xfs: pass shutdown method into xfs_trans_ail_delete_bulk
xfs_trans_ail_delete_bulk() can be called from different contexts so
if the item is not in the AIL we need different shutdown for each
context. Pass in the shutdown method needed so the correct action
can be taken.
Signed-off-by: Dave Chinner <dchinner@xxxxxxxxxx>
Reviewed-by: Christoph Hellwig <hch@xxxxxx>
Reviewed-by: Mark Tinguely <tinguely@xxxxxxx>
Signed-off-by: Ben Myers <bpm@xxxxxxx>
commit 7de77f79785ed8dbac9c5f86e6440aaac506fa15
Author: Christoph Hellwig <hch@xxxxxxxxxxxxx>
Date: Mon Apr 23 15:58:40 2012 +1000
xfs: remove some obsolete comments in xfs_trans_ail.c
Signed-off-by: Christoph Hellwig <hch@xxxxxx>
Reviewed-by: Dave Chinner <dchinner@xxxxxxxxxx>
Reviewed-by: Mark Tinguely <tinguely@xxxxxxx>
Signed-off-by: Ben Myers <bpm@xxxxxxx>
commit d03b7cb00946bae3d0a31b19558164f66d51b45c
Author: Christoph Hellwig <hch@xxxxxxxxxxxxx>
Date: Mon Apr 23 15:58:39 2012 +1000
xfs: on-stack delayed write buffer lists
Queue delwri buffers on a local on-stack list instead of a per-buftarg one,
and write back the buffers per-process instead of by waking up xfsbufd.
This is now easily doable given that we have very few places left that write
delwri buffers:
- log recovery:
Only done at mount time, and already forcing out the buffers
synchronously using xfs_flush_buftarg
- quotacheck:
Same story.
- dquot reclaim:
Writes out dirty dquots on the LRU under memory pressure. We might
want to look into doing more of this via xfsaild, but it's already
more optimal than the synchronous inode reclaim that writes each
buffer synchronously.
- xfsaild:
This is the main beneficiary of the change. By keeping a local list
of buffers to write we reduce latency of writing out buffers, and
more importably we can remove all the delwri list promotions which
were hitting the buffer cache hard under sustained metadata loads.
The implementation is very straight forward - xfs_buf_delwri_queue now gets
a new list_head pointer that it adds the delwri buffers to, and all callers
need to eventually submit the list using xfs_buf_delwi_submit or
xfs_buf_delwi_submit_nowait. Buffers that already are on a delwri list are
skipped in xfs_buf_delwri_queue, assuming they already are on another delwri
list. The biggest change to pass down the buffer list was done to the AIL
pushing. Now that we operate on buffers the trylock, push and pushbuf log
item methods are merged into a single push routine, which tries to lock the
item, and if possible add the buffer that needs writeback to the buffer
list.
This leads to much simpler code than the previous split but requires the
individual IOP_PUSH instances to unlock and reacquire the AIL around calls
to blocking routines.
Given that xfsailds now also handle writing out buffers, the conditions for
log forcing and the sleep times needed some small changes. The most
important one is that we consider an AIL busy as long we still have buffers
to push, and the other one is that we do increment the pushed LSN for
buffers that are under flushing at this moment, but still count them towards
the stuck items for restart purposes. Without this we could hammer on stuck
items without ever forcing the log and not make progress under heavy random
delete workloads on fast flash storage devices.
[ Dave Chinner:
- rebase on previous patches.
- improved comments for XBF_DELWRI_Q handling
- fix XBF_ASYNC handling in queue submission (test 106 failure)
- rename delwri submit function buffer list parameters for clarity
- xfs_efd_item_push() should return XFS_ITEM_PINNED ]
Signed-off-by: Christoph Hellwig <hch@xxxxxx>
Reviewed-by: Dave Chinner <dchinner@xxxxxxxxxx>
Reviewed-by: Mark Tinguely <tinguely@xxxxxxx>
Signed-off-by: Ben Myers <bpm@xxxxxxx>
commit a20e7de64186dfe16be453efa85fbaf0bbb77668
Author: Christoph Hellwig <hch@xxxxxxxxxxxxx>
Date: Mon Apr 23 15:58:38 2012 +1000
xfs: do not add buffers to the delwri queue until pushed
Instead of adding buffers to the delwri list as soon as they are logged,
even if they can't be written until commited because they are pinned
defer adding them to the delwri list until xfsaild pushes them. This
makes the code more similar to other log items and prepares for writing
buffers directly from xfsaild.
The complication here is that we need to fail buffers that were added
but not logged yet in xfs_buf_item_unpin, borrowing code from
xfs_bioerror.
Signed-off-by: Christoph Hellwig <hch@xxxxxx>
Reviewed-by: Dave Chinner <dchinner@xxxxxxxxxx>
Reviewed-by: Mark Tinguely <tinguely@xxxxxxx>
Signed-off-by: Ben Myers <bpm@xxxxxxx>
commit 9e46ca73d83721e0f841e41498272e18bb04b50d
Author: Christoph Hellwig <hch@xxxxxxxxxxxxx>
Date: Mon Apr 23 15:58:37 2012 +1000
xfs: do not write the buffer from xfs_qm_dqflush
Instead of writing the buffer directly from inside xfs_qm_dqflush return it
to the caller and let the caller decide what to do with the buffer. Also
remove the pincount check in xfs_qm_dqflush that all non-blocking callers
already implement and the now unused flags parameter and the XFS_DQ_IS_DIRTY
check that all callers already perform.
[ Dave Chinner: fixed build error cause by missing '{'. ]
Signed-off-by: Christoph Hellwig <hch@xxxxxx>
Reviewed-by: Dave Chinner <dchinner@xxxxxxxxxx>
Reviewed-by: Mark Tinguely <tinguely@xxxxxxx>
Signed-off-by: Ben Myers <bpm@xxxxxxx>
commit 23bb1cd05c9792e0e5b2f769bb342b1e393c4a01
Author: Christoph Hellwig <hch@xxxxxxxxxxxxx>
Date: Mon Apr 23 15:58:36 2012 +1000
xfs: do not write the buffer from xfs_iflush
Instead of writing the buffer directly from inside xfs_iflush return it to
the caller and let the caller decide what to do with the buffer. Also
remove the pincount check in xfs_iflush that all non-blocking callers
already
implement and the now unused flags parameter.
Signed-off-by: Christoph Hellwig <hch@xxxxxx>
Reviewed-by: Dave Chinner <dchinner@xxxxxxxxxx>
Reviewed-by: Mark Tinguely <tinguely@xxxxxxx>
Signed-off-by: Ben Myers <bpm@xxxxxxx>
commit b2e8fbb162ec9e850026068b2a0d5aecd595a154
Author: Christoph Hellwig <hch@xxxxxxxxxxxxx>
Date: Mon Apr 23 15:58:35 2012 +1000
xfs: don't flush inodes from background inode reclaim
We already flush dirty inodes throug the AIL regularly, there is no reason
to have second thread compete with it and disturb the I/O pattern. We still
do write inodes when doing a synchronous reclaim from the shrinker or during
unmount for now.
Signed-off-by: Christoph Hellwig <hch@xxxxxx>
Reviewed-by: Dave Chinner <dchinner@xxxxxxxxxx>
Reviewed-by: Mark Tinguely <tinguely@xxxxxxx>
Signed-off-by: Ben Myers <bpm@xxxxxxx>
commit 353923db2736113086ec17b509621a9061470f19
Author: Christoph Hellwig <hch@xxxxxxxxxxxxx>
Date: Mon Apr 23 15:58:34 2012 +1000
xfs: implement freezing by emptying the AIL
Now that we write back all metadata either synchronously or through
the AIL we can simply implement metadata freezing in terms of
emptying the AIL.
The implementation for this is fairly simply and straight-forward:
A new routine is added that asks the xfsaild to push the AIL to the
end and waits for it to complete and send a wakeup. The routine will
then loop if the AIL is not actually empty, and continue to do so
until the AIL is compeltely empty.
We keep an inode reclaim pass in the freeze process to avoid having
memory pressure have to reclaim inodes that require dirtying the
filesystem to be reclaimed after the freeze has completed. This
means we can also treat unmount in the exact same way as freeze.
As an upside we can now remove the radix tree based inode writeback
and xfs_unmountfs_writesb.
[ Dave Chinner:
- Cleaned up commit message.
- Added inode reclaim passes back into freeze.
- Cleaned up wakeup mechanism to avoid the use of a new
sleep counter variable. ]
Signed-off-by: Christoph Hellwig <hch@xxxxxx>
Signed-off-by: Dave Chinner <dchinner@xxxxxxxxxx>
Reviewed-by: Mark Tinguely <tinguely@xxxxxxx>
Signed-off-by: Ben Myers <bpm@xxxxxxx>
commit 5f221bb04e73ab5cfbea56097fcecfe8b2929ed8
Author: Christoph Hellwig <hch@xxxxxxxxxxxxx>
Date: Mon Apr 23 15:58:33 2012 +1000
xfs: allow assigning the tail lsn with the AIL lock held
Provide a variant of xlog_assign_tail_lsn that has the AIL lock already
held. By doing so we do an additional atomic_read + atomic_set under
the lock, which comes down to two instructions.
Switch xfs_trans_ail_update_bulk and xfs_trans_ail_delete_bulk to the
new version to reduce the number of lock roundtrips, and prepare for
a new addition that would require a third lock roundtrip in
xfs_trans_ail_delete_bulk. This addition is also the reason for
slightly rearranging the conditionals and relying on xfs_log_space_wake
for checking that the filesystem has been shut down internally.
Signed-off-by: Christoph Hellwig <hch@xxxxxx>
Reviewed-by: Dave Chinner <dchinner@xxxxxxxxxx>
Reviewed-by: Mark Tinguely <tinguely@xxxxxxx>
Signed-off-by: Ben Myers <bpm@xxxxxxx>
commit 9994be38f2b55d4d64e360dcaff4d8820d2ee5cb
Author: Christoph Hellwig <hch@xxxxxxxxxxxxx>
Date: Mon Apr 23 15:58:32 2012 +1000
xfs: remove log item from AIL in xfs_iflush after a shutdown
If a filesystem has been forced shutdown we are never going to write inodes
to disk, which means the inode items will stay in the AIL until we free
the inode. Currently that is not a problem, but a pending change requires us
to empty the AIL before shutting down the filesystem. In that case leaving
the inode in the AIL is lethal. Make sure to remove the log item from the
AIL
to allow emptying the AIL on shutdown filesystems.
Signed-off-by: Christoph Hellwig <hch@xxxxxx>
Reviewed-by: Dave Chinner <dchinner@xxxxxxxxxx>
Reviewed-by: Mark Tinguely <tinguely@xxxxxxx>
Signed-off-by: Ben Myers <bpm@xxxxxxx>
commit 58237bea40964e083c26f934aaa13d4878607aa4
Author: Christoph Hellwig <hch@xxxxxxxxxxxxx>
Date: Mon Apr 23 15:58:31 2012 +1000
xfs: remove log item from AIL in xfs_qm_dqflush after a shutdown
If a filesystem has been forced shutdown we are never going to write dquots
to disk, which means the dquot items will stay in the AIL forever.
Currently that is not a problem, but a pending chance requires us to
empty the AIL before shutting down the filesystem, in which case this
behaviour is lethal. Make sure to remove the log item from the AIL
to allow emptying the AIL on shutdown filesystems.
Signed-off-by: Christoph Hellwig <hch@xxxxxx>
Reviewed-by: Dave Chinner <dchinner@xxxxxxxxxx>
Reviewed-by: Mark Tinguely <tinguely@xxxxxxx>
Signed-off-by: Ben Myers <bpm@xxxxxxx>
commit 0d7f4c8c971ead1aa633016d8265ff3fdd08a53d
Author: Shaohua Li <shli@xxxxxxxxxx>
Date: Tue Apr 24 21:23:46 2012 +0800
xfs: using GFP_NOFS for blkdev_issue_flush
Issuing a block device flush request in transaction context using GFP_KERNEL
directly can cause deadlocks due to memory reclaim recursion. Use GFP_NOFS
to
avoid recursion from reclaim context.
Signed-off-by: Shaohua Li <shli@xxxxxxxxxxxx>
Reviewed-by: Mark Tinguely <tinguely@xxxxxxx>
Signed-off-by: Ben Myers <bpm@xxxxxxx>
commit 41226c22aaabfbcfc7e7286027db2d3ab1518b80
Author: Dave Chinner <dchinner@xxxxxxxxxx>
Date: Thu Apr 26 09:23:09 2012 +1000
xfs: punch all delalloc blocks beyond EOF on write failure.
I've been seeing regular ASSERT failures in xfstests when running
fsstress based tests over the past month. xfs_getbmap() has been
failing this test:
XFS: Assertion failed: ((iflags & BMV_IF_DELALLOC) != 0) ||
(map[i].br_startblock != DELAYSTARTBLOCK), file: fs/xfs/xfs_bmap.c,
line: 5650
where it is encountering a delayed allocation extent after writing
all the dirty data to disk and then walking the extent map
atomically by holding the XFS_IOLOCK_SHARED to prevent new delayed
allocation extents from being created.
Test 083 on a 512 byte block size filesystem was used to reproduce
the problem, because it only had a 5s run timeand would usually fail
every 3-4 runs. This test is exercising ENOSPC behaviour by running
fsstress on a nearly full filesystem. The following trace extract
shows the final few events on the inode that tripped the assert:
xfs_ilock: flags ILOCK_EXCL caller xfs_setfilesize
xfs_setfilesize: isize 0x180000 disize 0x12d400 offset 0x17e200
count 7680
file size updated to 0x180000 by IO completion
xfs_ilock: flags ILOCK_EXCL caller xfs_iomap_write_delay
xfs_iext_insert: state idx 3 offset 3072 block 4503599627239432
count 1 flag 0 caller xfs_bmap_add_extent_hole_delay
xfs_get_blocks_alloc: size 0x180000 offset 0x180000 count 512 type
startoff 0xc00 startblock -1 blockcount 0x1
xfs_ilock: flags ILOCK_EXCL caller __xfs_get_blocks
delalloc write, adding a single block at offset 0x180000
xfs_delalloc_enospc: isize 0x180000 disize 0x180000 offset 0x180200
count 512
ENOSPC trying to allocate a dellalloc block at offset 0x180200
xfs_ilock: flags ILOCK_EXCL caller xfs_iomap_write_delay
xfs_get_blocks_alloc: size 0x180000 offset 0x180200 count 512 type
startoff 0xc00 startblock -1 blockcount 0x2
And succeeding on retry after flushing dirty inodes.
xfs_ilock: flags ILOCK_EXCL caller __xfs_get_blocks
xfs_delalloc_enospc: isize 0x180000 disize 0x180000 offset 0x180400
count 512
ENOSPC trying to allocate a dellalloc block at offset 0x180400
xfs_ilock: flags ILOCK_EXCL caller xfs_iomap_write_delay
xfs_delalloc_enospc: isize 0x180000 disize 0x180000 offset 0x180400
count 512
And failing the retry, giving a real ENOSPC error.
xfs_ilock: flags ILOCK_EXCL caller xfs_vm_write_failed
^^^^^^^^^^^^^^^^^^^
The smoking gun - the write being failed and cleaning up delalloc
blocks beyond EOF allocated by the failed write.
xfs_getattr:
xfs_ilock: flags IOLOCK_SHARED caller xfs_getbmap
xfs_ilock: flags ILOCK_SHARED caller xfs_ilock_map_shared
And that's where we died almost immediately afterwards.
xfs_bmapi_read() found delalloc extent beyond current file in memory
file size. Some debug I added to xfs_getbmap() showed the state just
before the assert failure:
ino 0x80e48: off 0xc00, fsb 0xffffffffffffffff, len 0x1, size 0x180000
start_fsb 0x106, end_fsb 0x638
ino flags 0x2 nex 0xd bmvcnt 0x555, len 0x3c58a6f23c0bf1, start 0xc00
ext 0: off 0x1fc, fsb 0x24782, len 0x254
ext 1: off 0x450, fsb 0x40851, len 0x30
ext 2: off 0x480, fsb 0xd99, len 0x1b8
ext 3: off 0x92f, fsb 0x4099a, len 0x3b
ext 4: off 0x96d, fsb 0x41844, len 0x98
ext 5: off 0xbf1, fsb 0x408ab, len 0xf
which shows that we found a single delalloc block beyond EOF (first
line of output) when we were returning the map for a length
somewhere around 10^16 bytes long (second line), and the on-disk
extents showed they didn't go past EOF (last lines).
Further debug added to xfs_vm_write_failed() showed this happened
when punching out delalloc blocks beyond the end of the file after
the failed write:
[ 132.606693] ino 0x80e48: vwf to 0x181000, sze 0x180000
[ 132.609573] start_fsb 0xc01, end_fsb 0xc08
It punched the range 0xc01 -> 0xc08, but the range we really need to
punch is 0xc00 -> 0xc07 (8 blocks from 0xc00) as this testing was
run on a 512 byte block size filesystem (8 blocks per page).
the punch from is 0xc00. So end_fsb is correct, but start_fsb is
wrong as we punch from start_fsb for (end_fsb - start_fsb) blocks.
Hence we are not punching the delalloc block beyond EOF in the case.
The fix is simple - it's a silly off-by-one mistake in calculating
the range. It's especially silly because the macro used to calculate
the start_fsb already takes into account the case where the inode
size is an exact multiple of the filesystem block size...
Signed-off-by: Dave Chinner <dchinner@xxxxxxxxxx>
Reviewed-by: Eric Sandeen <sandeen@xxxxxxxxxx>
Signed-off-by: Ben Myers <bpm@xxxxxxx>
-----------------------------------------------------------------------
Summary of changes:
fs/xfs/xfs_aops.c | 14 +-
fs/xfs/xfs_buf.c | 341 +++++++++++++++++----------------------------
fs/xfs/xfs_buf.h | 28 +---
fs/xfs/xfs_buf_item.c | 107 +++++---------
fs/xfs/xfs_dquot.c | 90 ++++--------
fs/xfs/xfs_dquot.h | 3 +-
fs/xfs/xfs_dquot_item.c | 160 ++++++---------------
fs/xfs/xfs_extfree_item.c | 58 +++-----
fs/xfs/xfs_iget.c | 18 +--
fs/xfs/xfs_inode.c | 100 ++++---------
fs/xfs/xfs_inode.h | 3 +-
fs/xfs/xfs_inode_item.c | 174 +++++++----------------
fs/xfs/xfs_inode_item.h | 2 +-
fs/xfs/xfs_log.c | 31 +++--
fs/xfs/xfs_log.h | 1 +
fs/xfs/xfs_log_cil.c | 244 +++++++++++++++++++-------------
fs/xfs/xfs_log_priv.h | 2 +
fs/xfs/xfs_log_recover.c | 49 ++++---
fs/xfs/xfs_mount.c | 56 ++------
fs/xfs/xfs_mount.h | 2 +-
fs/xfs/xfs_qm.c | 169 +++++++++++-----------
fs/xfs/xfs_super.c | 25 ++--
fs/xfs/xfs_sync.c | 240 +++++++++----------------------
fs/xfs/xfs_trace.h | 7 +-
fs/xfs/xfs_trans.h | 18 +--
fs/xfs/xfs_trans_ail.c | 206 +++++++++++++--------------
fs/xfs/xfs_trans_buf.c | 86 ++++--------
fs/xfs/xfs_trans_priv.h | 12 +-
28 files changed, 884 insertions(+), 1362 deletions(-)
hooks/post-receive
--
XFS development tree
|