This is an automated email from the git hooks/post-receive script. It was
generated because a ref change was pushed to the repository containing
the project "XFS development tree".
The branch, for-linus has been updated
d0eb2f3 xfs: convert grant head manipulations to lockless algorithm
3f16b98 xfs: introduce new locks for the log grant ticket wait queues
c8a09ff xfs: convert log grant heads to atomic variables
1c3cb9e xfs: convert l_tail_lsn to an atomic variable.
84f3c68 xfs: convert l_last_sync_lsn to an atomic variable
2ced19c xfs: make AIL tail pushing independent of the grant lock
eb40a87 xfs: use wait queues directly for the log wait queues
a69ed03 xfs: combine grant heads into a single 64 bit integer
663e496 xfs: rework log grant space calculations
3f336c6 xfs: fact out common grant head/log tail verification code
1054794 xfs: convert log grant ticket queues to list heads
9552e7f xfs: use AIL bulk delete function to implement single delete
e605994 xfs: use AIL bulk update function to implement single updates
3013683 xfs: remove all the inodes on a buffer from the AIL in bulk
c90821a xfs: consume iodone callback items on buffers as they are processed
e677d0f xfs: reduce the number of AIL push wakeups
0e57f6a xfs: bulk AIL insertion during transaction commit
eb3efa1 xfs: clean up xfs_ail_delete()
b199c8a xfs: Pull EFI/EFD handling out from under the AIL lock
9c5f841 xfs: fix EFI transaction cancellation.
821eb21 xfs: connect up buffer reclaim priority hooks
430cbeb xfs: add a lru to the XFS buffer cache
ff57ab2 xfs: convert xfsbud shrinker to a per-buftarg shrinker.
1a427ab xfs: convert pag_ici_lock to a spin lock
1a3e8f3 xfs: convert inode cache lookups to use RCU locking
d95b7aa xfs: rcu free inodes
6e85756 xfs: don't truncate prealloc from frequently accessed inodes
055388a xfs: dynamic speculative EOF preallocation
622d814 xfs: use KM_NOFS for allocations during attribute list operations
dcfcf20 xfs: provide a inode iolock lockdep class
489a150 xfs: factor duplicate code in xfs_alloc_ag_vextent_near into a helper
9f9baab xfs: clean up xfs_alloc_ag_vextent_exact
ecff71e xfs: simplify xfs_map_at_offset
aeea1b1 xfs: refactor xfs_vm_writepage
2fa24f9 xfs: remove the all_bh flag from xfs_convert_page
ed1e7b7 xfs: remove xfs_probe_cluster
8ff2957 xfs: simplify xfs_map_blocks
a206c81 xfs: kill xfs_iomap
405f804 xfs: cleanup the xfs_iomap_write_* helpers
6ac7248 xfs: a few small tweaks for overwrites in xfs_vm_writepage
221cb25 xfs: remove some dead bio handling code
85da94c xfs: improve mapping type check in xfs_vm_writepage
c9f71f5 xfs: untangle phase1 vs phase2 recovery helpers
d045094 xfs: refactor xlog_recover_commit_trans
d5689ea xfs: use struct list_head for the buf cancel table
e2714bf xfs: remove leftovers of old buffer log items in recovery code
576ecb8 xfs: fix exporting with left over 64-bit inodes
from 05340d4ab2ec2b6b4962c1c41c6ea8fb550f947b (commit)
Those revisions listed above that are new to this repository have
not appeared on any other notification email; so we list those
revisions in full, below.
- Log -----------------------------------------------------------------
commit d0eb2f38b250b7d6c993adf81b0e4ded0565497e
Author: Dave Chinner <dchinner@xxxxxxxxxx>
Date: Tue Dec 21 12:29:14 2010 +1100
xfs: convert grant head manipulations to lockless algorithm
The only thing that the grant lock remains to protect is the grant head
manipulations when adding or removing space from the log. These calculations
are already based on atomic variables, so we can already update them safely
without locks. However, the grant head manipulations require atomic
multi-step calculations to be executed, which the algorithms currently
don't allow.
To make these multi-step calculations atomic, convert the algorithms to
compare-and-exchange loops on the atomic variables. That is, we sample the
old value, perform the calculation and use atomic64_cmpxchg() to attempt to
update the head with the new value. If the head has not changed since we
sampled it, it will succeed and we are done. Otherwise, we rerun the
calculation from a new sample of the head.
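The loop described above can be sketched in userspace C. This is a minimal
illustration, not the XFS code: it uses C11 atomics as a stand-in for the
kernel's atomic64_cmpxchg(), and the grant_head_t type, grant_head_add()
helper and modulo wrap are hypothetical names and arithmetic chosen only to
show a multi-step calculation being retried.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

/* Hypothetical stand-in for a grant head: one 64-bit atomic value. */
typedef _Atomic int64_t grant_head_t;

/*
 * Add `bytes` to the head, wrapping at `log_size`. The wrap is an
 * illustrative multi-step calculation, not the real XFS arithmetic.
 * Returns the value that was successfully stored.
 */
static int64_t grant_head_add(grant_head_t *head, int64_t bytes,
                              int64_t log_size)
{
    int64_t old_val, new_val;

    assert(log_size > 0 && bytes >= 0);

    /* Sample the head once before entering the loop. */
    old_val = atomic_load(head);
    do {
        /* Recompute from the most recent sample of the head. */
        new_val = (old_val + bytes) % log_size;
        /*
         * If the head still equals old_val, store new_val and finish.
         * Otherwise old_val is refreshed with the current head value
         * and the calculation is rerun.
         */
    } while (!atomic_compare_exchange_weak(head, &old_val, new_val));

    return new_val;
}
```

No lock is held at any point; a racing updater only forces another trip
around the loop.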
This allows us to remove the grant lock from around all the grant head space
manipulations, and that effectively removes the grant lock from the log
completely. Hence we can remove the grant lock completely from the log at
this point.
Signed-off-by: Dave Chinner <dchinner@xxxxxxxxxx>
Reviewed-by: Christoph Hellwig <hch@xxxxxx>
commit 3f16b9850743b702380f098ab5e0308cd6af1792
Author: Dave Chinner <dchinner@xxxxxxxxxx>
Date: Tue Dec 21 12:29:01 2010 +1100
xfs: introduce new locks for the log grant ticket wait queues
The log grant ticket wait queues are currently protected by the log
grant lock. However, the queues are functionally independent from
each other, and operations on them only require serialisation
against other queue operations now that all of the other log
variables they use are atomic values.
Hence, we can make them independent of the grant lock by introducing
new locks just to protect the list operations. Because the lists
are independent, we can use a lock per list and ensure that reserve
and write head queuing do not contend.
To ensure forced shutdowns work correctly in conjunction with the
new fast paths, ensure that we check whether the log has been shut
down in the grant functions once we hold the relevant spin locks but
before we go to sleep. This is needed to co-ordinate correctly with
the wakeups that are issued on the ticket queues so we don't leave
any processes sleeping on the queues during a shutdown.
Signed-off-by: Dave Chinner <dchinner@xxxxxxxxxx>
Reviewed-by: Christoph Hellwig <hch@xxxxxx>
commit c8a09ff8ca2235bccdaea8a52fbd5349646a8ba4
Author: Dave Chinner <dchinner@xxxxxxxxxx>
Date: Sat Dec 4 00:02:40 2010 +1100
xfs: convert log grant heads to atomic variables
Convert the log grant heads to atomic64_t types in preparation for
converting the accounting algorithms to atomic operations. This patch
just converts the variables; the algorithmic changes are in a
separate patch for clarity.
Signed-off-by: Dave Chinner <dchinner@xxxxxxxxxx>
Reviewed-by: Christoph Hellwig <hch@xxxxxx>
commit 1c3cb9ec07fabf0c0970adc46fd2a1f09c1186dd
Author: Dave Chinner <dchinner@xxxxxxxxxx>
Date: Tue Dec 21 12:28:39 2010 +1100
xfs: convert l_tail_lsn to an atomic variable.
log->l_tail_lsn is currently protected by the log grant lock. The
lock is only needed for serialising readers against writers, so we
don't really need the lock if we make the l_tail_lsn variable an
atomic. Converting the l_tail_lsn variable to an atomic64_t means we
can start to peel back the grant lock from various operations.
Also, provide functions to safely crack an atomic LSN variable into
its component pieces and to recombine the components into an
atomic variable. Use them where appropriate.
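As a rough illustration of cracking and recombining an atomic LSN, the
sketch below assumes the LSN keeps the cycle number in the upper 32 bits and
the block number in the lower 32 bits. The atomic_lsn_t type and the
lsn_crack()/lsn_assign() names are hypothetical and use C11 atomics rather
than the kernel's atomic64_t.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

/* Hypothetical userspace stand-in for an atomic64_t holding an LSN. */
typedef _Atomic int64_t atomic_lsn_t;

/* Split an atomic LSN into its cycle and block components. */
static void lsn_crack(atomic_lsn_t *lsn, uint32_t *cycle, uint32_t *block)
{
    uint64_t val;

    assert(cycle && block);

    /* One atomic load, so both halves come from the same sample. */
    val = (uint64_t)atomic_load(lsn);
    *cycle = (uint32_t)(val >> 32);
    *block = (uint32_t)(val & 0xffffffffu);
}

/* Recombine the components and publish them with a single store. */
static void lsn_assign(atomic_lsn_t *lsn, uint32_t cycle, uint32_t block)
{
    uint64_t val = ((uint64_t)cycle << 32) | block;

    atomic_store(lsn, (int64_t)val);
}
```

Because the whole value is read or written in one atomic operation, a reader
never sees a cycle from one update paired with a block from another.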
This also removes the need for explicitly holding a spinlock to read
the l_tail_lsn on 32 bit platforms.
Signed-off-by: Dave Chinner <dchinner@xxxxxxxxxx>
commit 84f3c683c4d3f36d3c3ed320babd960a332ac458
Author: Dave Chinner <dchinner@xxxxxxxxxx>
Date: Fri Dec 3 22:11:29 2010 +1100
xfs: convert l_last_sync_lsn to an atomic variable
log->l_last_sync_lsn is updated in only one critical spot - log
buffer IO completion - and is protected by the grant lock here. This
requires the grant lock to be taken for every log buffer IO
completion. Converting the l_last_sync_lsn variable to an atomic64_t
means that we do not need to take the grant lock in log buffer IO
completion to update it.
This also removes the need for explicitly holding a spinlock to read
the l_last_sync_lsn on 32 bit platforms.
Signed-off-by: Dave Chinner <dchinner@xxxxxxxxxx>
Reviewed-by: Christoph Hellwig <hch@xxxxxx>
commit 2ced19cbae5448b720919a494606c62095d4f4db
Author: Dave Chinner <dchinner@xxxxxxxxxx>
Date: Tue Dec 21 12:09:20 2010 +1100
xfs: make AIL tail pushing independent of the grant lock
xlog_grant_push_ail() currently takes the grant lock internally to sample
the tail lsn, last sync lsn and the reserve grant head. Most of the callers
already hold the grant lock but have to drop it before calling
xlog_grant_push_ail(). This is a leftover from when the AIL tail pushing was
done in line and hence xlog_grant_push_ail() had to drop the grant lock. AIL
push is now done in another thread and hence we can safely hold the grant
lock over the entire xlog_grant_push_ail() call.
Push the grant lock outside of xlog_grant_push_ail() to simplify the locking
and synchronisation needed for tail pushing. This will reduce traffic on the
grant lock by itself, but this is only one step in preparing for the
complete removal of the grant lock.
While there, clean up the formatting of xlog_grant_push_ail() to match the
rest of the XFS code.
Signed-off-by: Dave Chinner <dchinner@xxxxxxxxxx>
Reviewed-by: Christoph Hellwig <hch@xxxxxx>
commit eb40a87500ac2f6be7eaf8ebb35610e6d0e60e9a
Author: Dave Chinner <dchinner@xxxxxxxxxx>
Date: Tue Dec 21 12:09:01 2010 +1100
xfs: use wait queues directly for the log wait queues
The log grant queues are one of the few places left using sv_t
constructs for waiting. Given we are touching this code, we should
convert them to plain wait queues. While there, convert all the
other sv_t users in the log code as well.
Seeing as this removes the last users of the sv_t type, remove the
header file defining the wrapper and the fragments that still
reference it.
Signed-off-by: Dave Chinner <dchinner@xxxxxxxxxx>
Reviewed-by: Christoph Hellwig <hch@xxxxxx>
commit a69ed03c24d4a336c23b7116127713d5a8c5ac4d
Author: Dave Chinner <dchinner@xxxxxxxxxx>
Date: Tue Dec 21 12:08:20 2010 +1100
xfs: combine grant heads into a single 64 bit integer
Prepare for switching the grant heads to atomic variables by
combining the two 32 bit values that make up the grant head into a
single 64 bit variable. Provide wrapper functions to combine and
split the grant heads appropriately for calculations and use them as
necessary.
Signed-off-by: Dave Chinner <dchinner@xxxxxxxxxx>
Reviewed-by: Christoph Hellwig <hch@xxxxxx>
commit 663e496a720a3a9fc08ea70b29724e8906b34e43
Author: Dave Chinner <dchinner@xxxxxxxxxx>
Date: Tue Dec 21 12:06:05 2010 +1100
xfs: rework log grant space calculations
The log grant space calculations are repeated for both write and
reserve grant heads. To make it simpler to convert the calculations
to a different algorithm, factor them so both the grant heads use the
same calculation functions. Once this is done we can drop the
wrappers that are used in only a couple of places to update both
grant heads at once as they don't provide any particular value.
Signed-off-by: Dave Chinner <dchinner@xxxxxxxxxx>
Reviewed-by: Christoph Hellwig <hch@xxxxxx>
commit 3f336c6fa17c2b3d14b3dd1bd6e64e9cc97b6359
Author: Dave Chinner <dchinner@xxxxxxxxxx>
Date: Tue Dec 21 12:02:52 2010 +1100
xfs: fact out common grant head/log tail verification code
Factor repeated debug code out of grant head manipulation functions into a
separate function. This removes ifdef DEBUG spaghetti from the code and makes
the code easier to follow.
Signed-off-by: Dave Chinner <dchinner@xxxxxxxxxx>
Reviewed-by: Christoph Hellwig <hch@xxxxxx>
commit 1054794198e39103cb986618c4c10ec2252b7089
Author: Dave Chinner <dchinner@xxxxxxxxxx>
Date: Tue Dec 21 12:02:25 2010 +1100
xfs: convert log grant ticket queues to list heads
The grant write and reserve queues use a roll-your-own doubly linked
list, so convert it to a standard list_head structure and convert
all the list traversals to use list_for_each_entry(). We can also
get rid of the XLOG_TIC_IN_Q flag as we can use the list_empty()
check to tell if the ticket is in a list or not.
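The list_empty() trick only works if a ticket's list node points back at
itself whenever it is off the queue. The sketch below is a minimal userspace
imitation of the kernel list_head idiom; the struct ticket layout and all
helper names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

/* Minimal circular doubly linked list node, modelled on list_head. */
struct list_head {
    struct list_head *next, *prev;
};

static void list_init(struct list_head *h)
{
    h->next = h;
    h->prev = h;
}

static bool list_is_empty(const struct list_head *h)
{
    return h->next == h;
}

static void list_add_to_tail(struct list_head *item, struct list_head *head)
{
    item->prev = head->prev;
    item->next = head;
    head->prev->next = item;
    head->prev = item;
}

/* Unlink and re-initialise, so the node reads as "not queued" again. */
static void list_del_and_init(struct list_head *item)
{
    item->prev->next = item->next;
    item->next->prev = item->prev;
    list_init(item);
}

/* Hypothetical ticket: no separate "in queue" flag is needed. */
struct ticket {
    int id;
    struct list_head queue;
};

static bool ticket_is_queued(const struct ticket *t)
{
    assert(t);
    return !list_is_empty(&t->queue);
}
```

The re-initialising delete is what makes the flag redundant: a node that is
merely unlinked would still hold stale pointers and look queued.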
Signed-off-by: Dave Chinner <dchinner@xxxxxxxxxx>
Reviewed-by: Christoph Hellwig <hch@xxxxxx>
commit 9552e7f2f3dd13a7580e488a7a3582332daad4f5
Author: Dave Chinner <dchinner@xxxxxxxxxx>
Date: Mon Dec 20 12:36:15 2010 +1100
xfs: use AIL bulk delete function to implement single delete
We now have two copies of AIL delete operations that are mostly
duplicate functionality. The single log item deletes can be
implemented via the bulk updates by turning xfs_trans_ail_delete()
into a simple wrapper. This removes all the duplicate delete
functionality and associated helpers.
Signed-off-by: Dave Chinner <dchinner@xxxxxxxxxx>
Reviewed-by: Christoph Hellwig <hch@xxxxxx>
commit e60599492990d1b52c70e9ed2f8e062fe11ca937
Author: Dave Chinner <dchinner@xxxxxxxxxx>
Date: Mon Dec 20 12:34:26 2010 +1100
xfs: use AIL bulk update function to implement single updates
We now have two copies of AIL insert operations that are mostly
duplicate functionality. The single log item updates can be
implemented via the bulk updates by turning xfs_trans_ail_update()
into a simple wrapper. This removes all the duplicate insert
functionality and associated helpers.
Signed-off-by: Dave Chinner <dchinner@xxxxxxxxxx>
Reviewed-by: Christoph Hellwig <hch@xxxxxx>
commit 3013683253ad04f67d8cfaa25be708353686b90a
Author: Dave Chinner <dchinner@xxxxxxxxxx>
Date: Mon Dec 20 12:03:17 2010 +1100
xfs: remove all the inodes on a buffer from the AIL in bulk
When inode buffer IO completes, usually all of the inodes are removed from
the AIL. This involves processing them one at a time and taking the AIL lock
once for every inode. When all CPUs are processing inode IO completions,
this causes excessive amounts of contention on the AIL lock.
Instead, change the way we process inode IO completion in the buffer
IO done callback. Allow the inode IO done callback to walk the list
of IO done callbacks and pull all the inodes off the buffer in one
go and then process them as a batch.
Once all the inodes for removal are collected, take the AIL lock
once and do a bulk removal operation to minimise traffic on the AIL
lock.
Signed-off-by: Dave Chinner <dchinner@xxxxxxxxxx>
Reviewed-by: Christoph Hellwig <hch@xxxxxx>
commit c90821a26a8c90ad1e3116393b8a8260ab46bffb
Author: Dave Chinner <dchinner@xxxxxxxxxx>
Date: Fri Dec 3 17:00:52 2010 +1100
xfs: consume iodone callback items on buffers as they are processed
To allow buffer iodone callbacks to consume multiple items off the
callback list, first we need to convert the xfs_buf_do_callbacks()
to consume items and always pull the next item from the head of the
list.
This means the item list walk is never dependent on knowing the
next item on the list and hence allows callbacks to remove items
from the list as well. This allows callbacks to do bulk operations
by scanning the list for identical callbacks, consuming them all
and then processing them in bulk, negating the need for multiple
callbacks of that type.
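A small sketch of the consuming walk, under simplifying assumptions: items
sit on a singly linked list, each item carries a type instead of a callback
pointer, and one batch_done() handler stands in for every callback. All
names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

struct item {
    int type;           /* stands in for the item's callback */
    struct item *next;
};

struct buf {
    struct item *head;  /* pending callback items */
    int batches;        /* number of callback invocations */
    int processed;      /* number of items handled in total */
};

/*
 * Callback for `first`, which has already been taken off the list.
 * It also consumes every remaining item of the same type and handles
 * them together as one batch.
 */
static void batch_done(struct buf *bp, struct item *first)
{
    struct item **pp = &bp->head;
    int n = 1;

    while (*pp) {
        if ((*pp)->type == first->type) {
            struct item *it = *pp;

            *pp = it->next;     /* unlink the matching item */
            it->next = NULL;
            n++;
        } else {
            pp = &(*pp)->next;
        }
    }
    bp->processed += n;
    bp->batches++;
}

/* Always pull the next item from the head; never cache a next pointer. */
static void do_callbacks(struct buf *bp)
{
    struct item *it;

    assert(bp);
    while ((it = bp->head) != NULL) {
        bp->head = it->next;
        it->next = NULL;
        batch_done(bp, it);
    }
}
```

Since the walk re-reads the list head on every iteration, a callback is free
to remove any number of other items without confusing the loop.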
Signed-off-by: Dave Chinner <dchinner@xxxxxxxxxx>
Reviewed-by: Christoph Hellwig <hch@xxxxxx>
commit e677d0f9548e2245ee3c2977661ca8ca165af188
Author: Dave Chinner <dchinner@xxxxxxxxxx>
Date: Fri Dec 17 20:08:04 2010 +1100
xfs: reduce the number of AIL push wakeups
The xfsaild often tries to rest to wait for congestion to pass or for
IO to complete, but is regularly woken in tail-pushing situations.
In severe cases, the xfsaild is getting woken tens of thousands of
times a second. Reduce the number of needless wakeups by only waking
the xfsaild if the new target is larger than the old one. Further,
make short sleeps uninterruptible as they occur when the xfsaild has
decided it needs to back off to allow some IO to complete and being
woken early is counter-productive.
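The wakeup filter amounts to a single comparison. In this sketch the push
target is modelled as a plain 64-bit integer and the wakeup as a counter;
struct ail_pusher and ail_push() are hypothetical names, and the real code
compares LSNs rather than raw integers.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

struct ail_pusher {
    int64_t target;     /* furthest point the pusher was asked to reach */
    unsigned wakeups;   /* stand-in for waking the pusher thread */
};

/* Returns true if the pusher was woken, false if the request was a no-op. */
static bool ail_push(struct ail_pusher *p, int64_t new_target)
{
    assert(p);

    /* A target that does not move forward gives the pusher nothing new. */
    if (new_target <= p->target)
        return false;

    p->target = new_target;
    p->wakeups++;
    return true;
}
```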
Signed-off-by: Dave Chinner <dchinner@xxxxxxxxxx>
Reviewed-by: Christoph Hellwig <hch@xxxxxx>
commit 0e57f6a36f9be03e5abb755f524ee91c4aebe854
Author: Dave Chinner <dchinner@xxxxxxxxxx>
Date: Mon Dec 20 12:02:19 2010 +1100
xfs: bulk AIL insertion during transaction commit
When inserting items into the AIL from the transaction committed
callbacks, we take the AIL lock for every single item that is to be
inserted. For a CIL checkpoint commit, this can be tens of thousands
of individual inserts, yet almost all of the items will be inserted
at the same point in the AIL because they have the same index.
To reduce the overhead and contention on the AIL lock for such
operations, introduce a "bulk insert" operation which allows a list
of log items with the same LSN to be inserted in a single operation
via a list splice. To do this, we need to pre-sort the log items
being committed into a temporary list for insertion.
The complexity is that not every log item will end up with the same
LSN, and not every item is actually inserted into the AIL. Items
that don't match the commit LSN will be inserted and unpinned as per
the current one-at-a-time method (relatively rare), while items that
are not to be inserted will be unpinned and freed immediately. Items
that are to be inserted at the given commit lsn are placed in a
temporary array and inserted into the AIL in bulk each time the
array fills up.
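The batching step can be pictured as below. This is only a sketch: the batch
size of 32, the integer "items" and the lock_takes counter are illustrative
assumptions, and ail_insert_bulk() is a hypothetical stand-in for the real
bulk insert.

```c
#include <assert.h>

#define BATCH_SIZE 32   /* hypothetical size of the temporary array */

struct ail {
    unsigned lock_takes;    /* how often the AIL lock would be taken */
    unsigned inserted;      /* items inserted so far */
};

/* One lock acquisition covers the whole batch. */
static void ail_insert_bulk(struct ail *a, const int *items, unsigned n)
{
    (void)items;    /* item contents are irrelevant to the sketch */
    a->lock_takes++;
    a->inserted += n;
}

/* Gather items into a temporary array and flush it whenever it fills. */
static void commit_items(struct ail *a, const int *items, unsigned nitems)
{
    int batch[BATCH_SIZE];
    unsigned used = 0;
    unsigned i;

    assert(a);
    for (i = 0; i < nitems; i++) {
        batch[used++] = items[i];
        if (used == BATCH_SIZE) {
            ail_insert_bulk(a, batch, used);
            used = 0;
        }
    }
    /* Flush whatever is left over in a final, partial batch. */
    if (used)
        ail_insert_bulk(a, batch, used);
}
```

With 100 items this takes the lock four times instead of one hundred, which
is the kind of traffic reduction the commit message describes.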
As a result of this, we trade off AIL hold time for a significant
reduction in traffic. lock_stat output shows that the worst case
hold time is unchanged, but contention from AIL inserts drops by an
order of magnitude and the number of lock traversals decreases
significantly.
Signed-off-by: Dave Chinner <dchinner@xxxxxxxxxx>
Reviewed-by: Christoph Hellwig <hch@xxxxxx>
commit eb3efa1249b6413be930bdf13d10b6238028a440
Author: Dave Chinner <dchinner@xxxxxxxxxx>
Date: Fri Dec 3 16:42:57 2010 +1100
xfs: clean up xfs_ail_delete()
xfs_ail_delete() has a needlessly complex interface. It returns the log item
that was passed in for deletion (which the callers then assert is identical
to the one passed in), and callers of xfs_ail_delete() still need to
invalidate current traversal cursors.
Make xfs_ail_delete() return void, move the cursor invalidation inside it,
and clean up the callers just to use the log item pointer they passed in.
While cleaning up, remove the messy and unnecessary "/* ARGUSED */" comments
around all these functions.
Signed-off-by: Dave Chinner <dchinner@xxxxxxxxxx>
Reviewed-by: Christoph Hellwig <hch@xxxxxx>
commit b199c8a4ba11879df87daad496ceee41fdc6aa82
Author: Dave Chinner <dchinner@xxxxxxxxxx>
Date: Mon Dec 20 11:59:49 2010 +1100
xfs: Pull EFI/EFD handling out from under the AIL lock
EFI/EFD interactions are protected from races by the AIL lock. They
are the only type of log items that require the AIL lock to
serialise internal state, so they need to be separated from the AIL
lock before we can do bulk insert operations on the AIL.
To achieve this, convert the counter of the number of extents in the
EFI to an atomic so it can be safely manipulated by EFD processing
without locks. Also, convert the EFI state flag manipulations to use
atomic bit operations so no locks are needed to record state
changes. Finally, use the state bits to determine when it is safe to
free the EFI and clean up the code to do this neatly.
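A hedged sketch of the two lockless primitives mentioned above, using C11
atomics in place of the kernel's atomic_t and atomic bit operations. The
struct efi layout, the flag bit names and both helpers are hypothetical; the
commit message does not spell out the exact rule for when the EFI is freed,
so that decision is left to the caller here.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical state bits; the real flag names may differ. */
#define EFI_RECOVERED (1u << 0)
#define EFI_COMMITTED (1u << 1)

struct efi {
    atomic_uint next_extent;    /* extents not yet covered by an EFD */
    atomic_uint flags;          /* state bits, updated without a lock */
};

/*
 * Account for `nextents` extents completed by EFD processing.
 * Returns true when this call consumed the last outstanding extent.
 */
static bool efi_release_extents(struct efi *e, unsigned nextents)
{
    unsigned prev = atomic_fetch_sub(&e->next_extent, nextents);

    assert(prev >= nextents);
    return prev == nextents;
}

/* Atomically set a state bit; returns whether it was already set. */
static bool efi_test_and_set(struct efi *e, unsigned bit)
{
    return (atomic_fetch_or(&e->flags, bit) & bit) != 0;
}
```

Both operations return the previous state, so exactly one caller observes
each transition even when several race.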
Signed-off-by: Dave Chinner <dchinner@xxxxxxxxxx>
Reviewed-by: Christoph Hellwig <hch@xxxxxx>
commit 9c5f8414efd5eeed9f498d4170337a3eb126341f
Author: Dave Chinner <dchinner@xxxxxxxxxx>
Date: Mon Dec 20 11:57:24 2010 +1100
xfs: fix EFI transaction cancellation.
XFS_EFI_CANCELED has not been set in the code base since
xfs_efi_cancel() was removed back in 2006 by commit
065d312e15902976d256ddaf396a7950ec0350a8 ("[XFS] Remove unused
iop_abort log item operation"), and even then xfs_efi_cancel() was
never called. I haven't tracked it back further than that (beyond
git history), but it indicates that the handling of EFIs in
cancelled transactions has been broken for a long time.
Basically, when we get an IOP_UNPIN(lip, 1); call from
xfs_trans_uncommit() (i.e. remove == 1), if we don't free the log
item descriptor we leak it. Fix the behaviour to be correct and kill
the XFS_EFI_CANCELED flag.
Signed-off-by: Dave Chinner <dchinner@xxxxxxxxxx>
Reviewed-by: Christoph Hellwig <hch@xxxxxx>
commit 821eb21d97a8b686649c08b7284d0b9f34d0e138
Author: Dave Chinner <dchinner@xxxxxxxxxx>
Date: Thu Dec 2 16:31:13 2010 +1100
xfs: connect up buffer reclaim priority hooks
Now that the buffer reclaim infrastructure can handle different reclaim
priorities for different types of buffers, reconnect the hooks in the
XFS code that has been sitting dormant since it was ported to Linux. This
should finally give us reclaim prioritisation that is on a par with the
functionality that Irix provided XFS 15 years ago.
Signed-off-by: Dave Chinner <dchinner@xxxxxxxxxx>
Reviewed-by: Christoph Hellwig <hch@xxxxxx>
commit 430cbeb86fdcbbdabea7d4aa65307de8de425350
Author: Dave Chinner <dchinner@xxxxxxxxxx>
Date: Thu Dec 2 16:30:55 2010 +1100
xfs: add a lru to the XFS buffer cache
Introduce a per-buftarg LRU for memory reclaim to operate on. This
is the last piece we need to put in place so that we can fully
control the buffer lifecycle. This allows XFS to be responsible for
maintaining the working set of buffers under memory pressure instead
of relying on the VM reclaim not to take pages we need out from
underneath us.
The implementation introduces a b_lru_ref counter into the buffer.
This is currently set to 1 whenever the buffer is referenced and so is used
to determine if the buffer should be added to the LRU or not when freed.
Effectively it allows lazy LRU initialisation of the buffer so we do not
need to touch the LRU list and locks in xfs_buf_find().
Instead, when the buffer is being released and we drop the last
reference to it, we check the b_lru_ref count and if it is non-zero
we re-add the buffer reference and add the buffer to the LRU. The
b_lru_ref counter is decremented by the shrinker, and whenever the
shrinker comes across a buffer with a zero b_lru_ref counter, it
releases the LRU reference on the buffer. In the absence of a lookup
race, this will result in the buffer being freed.
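The lifecycle above can be modelled with a few counters. This is a
single-threaded toy, not the buffer cache code: struct buf, its fields and
the three helpers are hypothetical, the LRU list is reduced to an on_lru
flag, and all locking is omitted.

```c
#include <assert.h>
#include <stdbool.h>

struct buf {
    int hold;       /* active references */
    int lru_ref;    /* reclaim passes the buffer may still survive */
    bool on_lru;
    bool freed;
};

/* A lookup takes a reference and marks the buffer as recently used. */
static void buf_get(struct buf *b)
{
    b->hold++;
    b->lru_ref = 1;
}

/* Drop a reference; on the last one, park the buffer on the LRU or free it. */
static void buf_release(struct buf *b)
{
    assert(b->hold > 0);
    if (--b->hold > 0)
        return;
    if (!b->on_lru && b->lru_ref != 0) {
        b->hold++;          /* the LRU now owns a reference */
        b->on_lru = true;
        return;
    }
    b->freed = true;
}

/* One shrinker visit: age the buffer, or drop the LRU reference. */
static void shrinker_scan_one(struct buf *b)
{
    if (!b->on_lru)
        return;
    if (b->lru_ref > 0) {
        b->lru_ref--;       /* give it one more pass */
        return;
    }
    b->on_lru = false;
    buf_release(b);         /* frees it unless a lookup raced in */
}
```

Note that buf_get() never touches the LRU, which mirrors the lazy
initialisation described above: the list is only updated on release and
during shrinking.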
This counting mechanism is used instead of a reference flag so that
it is simple to re-introduce buffer-type specific reclaim reference
counts to prioritise reclaim more effectively. We still have all
those hooks in the XFS code, so this will provide the infrastructure
to re-implement that functionality.
Signed-off-by: Dave Chinner <dchinner@xxxxxxxxxx>
Reviewed-by: Christoph Hellwig <hch@xxxxxx>
commit ff57ab21995a8636cfc72efeebb09cc6034d756f
Author: Dave Chinner <dchinner@xxxxxxxxxx>
Date: Tue Nov 30 17:27:57 2010 +1100
xfs: convert xfsbud shrinker to a per-buftarg shrinker.
Before we introduce per-buftarg LRU lists, split the shrinker
implementation into per-buftarg shrinker callbacks. At the moment
we wake all the xfsbufds to run the delayed write queues to free
the dirty buffers and make their pages available for reclaim.
However, with an LRU, we want to be able to free clean, unused
buffers as well, so we need to separate the xfsbufd from the
shrinker callbacks.
Signed-off-by: Dave Chinner <dchinner@xxxxxxxxxx>
Reviewed-by: Christoph Hellwig <hch@xxxxxx>
Reviewed-by: Alex Elder <aelder@xxxxxxx>
commit 1a427ab0c1b205d1bda8da0b77ea9d295ac23c57
Author: Dave Chinner <dchinner@xxxxxxxxxx>
Date: Thu Dec 16 17:08:41 2010 +1100
xfs: convert pag_ici_lock to a spin lock
Now that we are using RCU protection for the inode cache lookups,
the lock is only needed on the modification side. Hence it is not
necessary for the lock to be a rwlock as there are no read side
holders anymore. Convert it to a spin lock to reflect its exclusive
nature.
Signed-off-by: Dave Chinner <dchinner@xxxxxxxxxx>
Reviewed-by: Alex Elder <aelder@xxxxxxx>
Reviewed-by: Christoph Hellwig <hch@xxxxxx>
commit 1a3e8f3da09c7082d25b512a0ffe569391e4c09a
Author: Dave Chinner <dchinner@xxxxxxxxxx>
Date: Fri Dec 17 17:29:43 2010 +1100
xfs: convert inode cache lookups to use RCU locking
With delayed logging greatly increasing the sustained parallelism of inode
operations, the inode cache locking is showing significant read vs write
contention when inode reclaim runs at the same time as lookups. There are
also a lot more write lock acquisitions than there are read locks (4:1 ratio)
so the read locking is not really buying us much in the way of parallelism.
To avoid the read vs write contention, change the cache to use RCU locking
on the read side. To avoid needing to RCU free every single inode, use the
built in slab RCU freeing mechanism. This requires us to be able to detect
lookups of freed inodes, so ensure that every freed inode has an inode
number of zero and the XFS_IRECLAIM flag set. We already check the
XFS_IRECLAIM flag in the cache hit lookup path, but also add a check for a
zero inode number as well.
We can then convert all the read locking lookups to use RCU read side locking
and hence remove all read side locking.
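The "found a freed inode" check can be sketched as a simple predicate. The
struct cached_inode layout, the IRECLAIM_FLAG bit and both helpers are
hypothetical, the final inode-number comparison is an added assumption, and
the locking and memory ordering a real RCU lookup needs are left out.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define IRECLAIM_FLAG (1u << 0)     /* hypothetical stand-in for XFS_IRECLAIM */

struct cached_inode {
    uint64_t ino;
    unsigned flags;
};

/* What freeing must leave behind so that later lookups can reject it. */
static void inode_mark_freed(struct cached_inode *ip)
{
    ip->ino = 0;
    ip->flags |= IRECLAIM_FLAG;
}

/* Validate an inode found by a lockless lookup before using it. */
static bool inode_lookup_valid(const struct cached_inode *ip, uint64_t want_ino)
{
    assert(want_ino != 0);

    if (ip->ino == 0)
        return false;               /* freed, memory not yet reused */
    if (ip->flags & IRECLAIM_FLAG)
        return false;               /* being torn down */
    return ip->ino == want_ino;     /* assumption: reject a reused slot */
}
```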
Signed-off-by: Dave Chinner <dchinner@xxxxxxxxxx>
Reviewed-by: Alex Elder <aelder@xxxxxxx>
commit d95b7aaf9ab6738bef1ebcc52ab66563085e44ac
Author: Dave Chinner <dchinner@xxxxxxxxxx>
Date: Thu Dec 16 16:41:39 2010 +1100
xfs: rcu free inodes
Introduce RCU freeing of XFS inodes so that we can convert lookup
traversals to use rcu_read_lock() protection. This patch only
introduces the RCU freeing to minimise the potential conflicts with
mainline if this is merged into mainline via a VFS patchset. It
abuses the i_dentry list for the RCU callback structure because the
VFS patches make this a union so it is safe to use like this and
simplifies any merge issues.
This patch uses basic RCU freeing rather than SLAB_DESTROY_BY_RCU.
The later lookup patches need the same "found free inode" protection
regardless of the RCU freeing method used, so once again the RCU
freeing method can be dealt with appropriately at merge time without
affecting any other code.
Signed-off-by: Dave Chinner <dchinner@xxxxxxxxxx>
Reviewed-by: Paul E. McKenney <paulmck@xxxxxxxxxxxxxxxxxx>
commit 6e857567dbbfe14dd6cc3f7414671b047b1ff5c7
Author: Dave Chinner <dchinner@xxxxxxxxxx>
Date: Thu Dec 23 12:02:31 2010 +1100
xfs: don't truncate prealloc from frequently accessed inodes
A long standing problem for streaming writes through the NFS server
has been that the NFS server opens and closes file descriptors on an
inode for every write. The result of this behaviour is that the
->release() function is called on every close and that results in
XFS truncating speculative preallocation beyond the EOF. This has
an adverse effect on file layout when multiple files are being
written at the same time - they interleave their extents and can
result in severe fragmentation.
To avoid this problem, keep track of ->release calls made on a dirty
inode. For most cases, an inode is only going to be opened once for
writing and then closed again during its lifetime in cache. Hence
if there are multiple ->release calls when the inode is dirty, there
is a good chance that the inode is being accessed by the NFS server.
Hence set a flag the first time ->release is called while there are
delalloc blocks still outstanding on the inode.
If this flag is set when ->release is next called, then do not
truncate away the speculative preallocation - leave it there so that
subsequent writes do not need to reallocate the delalloc space. This
will prevent interleaving of extents of different inodes written
concurrently to the same AG.
If we get this wrong, it is not a big deal as we truncate
speculative allocation beyond EOF anyway in xfs_inactive() when the
inode is thrown out of the cache.
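The heuristic reduces to one sticky flag per inode. The sketch below models
only the decision described above; struct inode_state, its fields and
on_release() are hypothetical, and the real ->release path checks more
conditions than this.

```c
#include <assert.h>
#include <stdbool.h>

struct inode_state {
    bool dirty_release;     /* set once ->release saw delalloc blocks */
    bool has_delalloc;      /* delalloc blocks still outstanding */
    bool has_eof_prealloc;  /* speculative preallocation beyond EOF */
};

/* Model of one ->release call. Returns true if prealloc was truncated. */
static bool on_release(struct inode_state *ip)
{
    bool truncated;

    assert(ip);

    /* Seen before while dirty: probably NFS, so keep the preallocation. */
    if (ip->dirty_release)
        return false;

    truncated = ip->has_eof_prealloc;
    ip->has_eof_prealloc = false;

    /* Remember this release so the next one leaves the prealloc alone. */
    if (ip->has_delalloc)
        ip->dirty_release = true;

    return truncated;
}
```

The first close still truncates, so an ordinary open-write-close workload
behaves exactly as before; only repeated closes of a dirty inode change
behaviour.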
Signed-off-by: Dave Chinner <dchinner@xxxxxxxxxx>
Reviewed-by: Christoph Hellwig <hch@xxxxxx>
commit 055388a3188f56676c21e92962fc366ac8b5cb72
Author: Dave Chinner <dchinner@xxxxxxxxxx>
Date: Tue Jan 4 11:35:03 2011 +1100
xfs: dynamic speculative EOF preallocation
Currently the size of the speculative preallocation during delayed
allocation is fixed by either the allocsize mount option or a
default size. We are seeing a lot of cases where we need to
recommend using the allocsize mount option to prevent fragmentation
when buffered writes land in the same AG.
Rather than using a fixed preallocation size by default (up to 64k),
make it dynamic by basing it on the current inode size. That way the
EOF preallocation will increase as the file size increases. Hence
for streaming writes we are much more likely to get large
preallocations exactly when we need it to reduce fragmentation.
For default settings, the size of the initial extents is determined
by the number of parallel writers and the amount of memory in the
machine. For 4GB RAM and 4 concurrent 32GB file writes:
 EXT: FILE-OFFSET           BLOCK-RANGE          AG AG-OFFSET                 TOTAL
   0: [0..1048575]:         1048672..2097247      0 (1048672..2097247)      1048576
   1: [1048576..2097151]:   5242976..6291551      0 (5242976..6291551)      1048576
   2: [2097152..4194303]:   12583008..14680159    0 (12583008..14680159)    2097152
   3: [4194304..8388607]:   25165920..29360223    0 (25165920..29360223)    4194304
   4: [8388608..16777215]:  58720352..67108959    0 (58720352..67108959)    8388608
   5: [16777216..33554423]: 117440584..134217791  0 (117440584..134217791) 16777208
   6: [33554424..50331511]: 184549056..201326143  0 (184549056..201326143) 16777088
   7: [50331512..67108599]: 251657408..268434495  0 (251657408..268434495) 16777088
and for 16 concurrent 16GB file writes:
 EXT: FILE-OFFSET           BLOCK-RANGE          AG AG-OFFSET                 TOTAL
   0: [0..262143]:          2490472..2752615      0 (2490472..2752615)       262144
   1: [262144..524287]:     6291560..6553703      0 (6291560..6553703)       262144
   2: [524288..1048575]:    13631592..14155879    0 (13631592..14155879)     524288
   3: [1048576..2097151]:   30408808..31457383    0 (30408808..31457383)    1048576
   4: [2097152..4194303]:   52428904..54526055    0 (52428904..54526055)    2097152
   5: [4194304..8388607]:   104857704..109052007  0 (104857704..109052007)  4194304
   6: [8388608..16777215]:  209715304..218103911  0 (209715304..218103911)  8388608
   7: [16777216..33554423]: 452984848..469762055  0 (452984848..469762055) 16777208
Because it is hard to take back speculative preallocation, cases
where there are large slow growing log files on a nearly full
filesystem may cause premature ENOSPC. Hence as the filesystem nears
full, the maximum dynamic prealloc size is reduced according to this
table (based on 4k block size):
freespace       max prealloc size
  >5%           full extent (8GB)
  4-5%          2GB (8GB >> 2)
  3-4%          1GB (8GB >> 3)
  2-3%          512MB (8GB >> 4)
  1-2%          256MB (8GB >> 5)
  <1%           128MB (8GB >> 6)
This should reduce the amount of space held in speculative
preallocation for such cases.
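The throttling table translates directly into a shift lookup. This sketch
assumes free space arrives as a whole-number percentage (so 4 means "at
least 4% but under 5%") and treats exactly 5% as unthrottled, which is a
guess at the boundary; max_prealloc_bytes() and MAX_EXTENT_BYTES are
hypothetical names.

```c
#include <assert.h>
#include <stdint.h>

/* Full extent size from the table: 8GB with a 4k block size. */
#define MAX_EXTENT_BYTES (UINT64_C(8) << 30)

/*
 * Largest dynamic preallocation allowed for a given free space level.
 * `free_pct` is free space as a truncated whole percentage (0..100).
 */
static uint64_t max_prealloc_bytes(unsigned free_pct)
{
    unsigned shift;

    assert(free_pct <= 100);

    if (free_pct >= 5)
        shift = 0;      /* >5%:  full extent (8GB) */
    else if (free_pct == 4)
        shift = 2;      /* 4-5%: 2GB */
    else if (free_pct == 3)
        shift = 3;      /* 3-4%: 1GB */
    else if (free_pct == 2)
        shift = 4;      /* 2-3%: 512MB */
    else if (free_pct == 1)
        shift = 5;      /* 1-2%: 256MB */
    else
        shift = 6;      /* <1%:  128MB */

    return MAX_EXTENT_BYTES >> shift;
}
```

The first throttling step shifts by 2 rather than 1, following the table, so
the limit drops from 8GB straight to 2GB once free space falls below 5%.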
The allocsize mount option turns off the dynamic behaviour and fixes
the prealloc size to whatever the mount option specifies. i.e. the
behaviour is unchanged.
Signed-off-by: Dave Chinner <dchinner@xxxxxxxxxx>
commit 622d81494fa32343a4b97b607619656c7a4a6d1a
Author: Dave Chinner <dchinner@xxxxxxxxxx>
Date: Thu Dec 23 11:57:37 2010 +1100
xfs: use KM_NOFS for allocations during attribute list operations
When listing attributes, we are doing memory allocations under the
inode ilock using only KM_SLEEP. This allows memory allocation to
recurse back into the filesystem and do writeback, which may take the
ilock we already hold on the current inode. This will deadlock.
Hence use KM_NOFS for such allocations outside of transaction
context to ensure that reclaim recursion does not occur.
Reported-by: Nick Piggin <npiggin@xxxxxxxxx>
Signed-off-by: Dave Chinner <dchinner@xxxxxxxxxx>
Reviewed-by: Christoph Hellwig <hch@xxxxxx>
commit dcfcf20512cb517ac18b9433b676183fa1257911
Author: Dave Chinner <dchinner@xxxxxxxxxx>
Date: Thu Dec 23 11:57:13 2010 +1100
xfs: provide a inode iolock lockdep class
The XFS iolock needs to be re-initialised to a new lock class before
it enters reclaim to prevent lockdep false positives. Unfortunately,
this is not sufficient protection as inodes in the XFS_IRECLAIMABLE
state can be recycled and not re-initialised before being reused.
We need to re-initialise the lock state when transferring out of
XFS_IRECLAIMABLE state to XFS_INEW, but we need to keep the same
class as if the inode was just allocated. Hence we need a specific
lockdep class variable for the iolock so that both initialisations
use the same class.
While there, add a specific class for inodes in the reclaim state so
that it is easy to tell from lockdep reports what state the inode
was in that generated the report.
Signed-off-by: Dave Chinner <dchinner@xxxxxxxxxx>
Reviewed-by: Christoph Hellwig <hch@xxxxxx>
commit 489a150f6454e2cd93d9e0ee6d7c5a361844f62a
Author: Christoph Hellwig <hch@xxxxxxxxxxxxx>
Date: Fri Dec 10 15:04:11 2010 +0000
xfs: factor duplicate code in xfs_alloc_ag_vextent_near into a helper
Add a new xfs_alloc_find_best_extent that does a forward/backward
search in the allocation btree. That code previously existed
two times in xfs_alloc_ag_vextent_near, once for each search
direction.
Based on an earlier patch from Dave Chinner.
Signed-off-by: Christoph Hellwig <hch@xxxxxx>
Signed-off-by: Alex Elder <aelder@xxxxxxx>
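The shape of that refactoring can be sketched in userspace C. This is
only an illustration under stated assumptions: it walks a sorted array
rather than the allocation btree, and toy_extent, toy_find_best_extent
and toy_alloc_near are hypothetical names, not the functions touched by
the commit. The point is that one direction-parameterised helper
replaces two near-identical loops.

```c
#include <assert.h>

struct toy_extent {
	unsigned long start;  /* first block of the free extent */
	unsigned long len;    /* length in blocks */
};

/*
 * Shared helper: walk from index `from` in direction `dir` (+1 or -1)
 * and return the index of the first extent of at least `minlen` blocks,
 * or -1 if the walk runs off either end of the array.
 */
static long toy_find_best_extent(const struct toy_extent *recs, long nrecs,
				 long from, int dir, unsigned long minlen)
{
	assert(dir == 1 || dir == -1);
	for (long i = from; i >= 0 && i < nrecs; i += dir) {
		if (recs[i].len >= minlen)
			return i;
	}
	return -1;
}

/*
 * Caller: search both directions around `target` with the same helper
 * and keep whichever candidate starts closer to the target block.
 */
static long toy_alloc_near(const struct toy_extent *recs, long nrecs,
			   unsigned long target, unsigned long minlen)
{
	long split = 0;

	/* First record starting at or after the target block. */
	while (split < nrecs && recs[split].start < target)
		split++;

	long right = toy_find_best_extent(recs, nrecs, split, 1, minlen);
	long left = toy_find_best_extent(recs, nrecs, split - 1, -1, minlen);

	if (left < 0)
		return right;
	if (right < 0)
		return left;
	return (recs[right].start - target < target - recs[left].start) ? right : left;
}
```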
commit 9f9baab38dacd11fe6095a1e59f3783a305f7020
Author: Christoph Hellwig <hch@xxxxxxxxxxxxx>
Date: Fri Dec 10 15:03:57 2010 +0000
xfs: clean up xfs_alloc_ag_vextent_exact
Use a goto label to consolidate all block not found cases, and add a
tracepoint for them. Also clean up a few whitespace issues.
Based on an earlier patch from Dave Chinner.
Signed-off-by: Christoph Hellwig <hch@xxxxxx>
Signed-off-by: Alex Elder <aelder@xxxxxxx>
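For readers unfamiliar with the idiom, the goto-label consolidation
mentioned above looks roughly like the following userspace C sketch. The
names (toy_extent, toy_alloc_exact, toy_notfound_events, TOY_NULLBLOCK)
are hypothetical, and the counter merely stands in for a tracepoint; it
is not the code from xfs_alloc_ag_vextent_exact.

```c
#include <assert.h>
#include <stdbool.h>

#define TOY_NULLBLOCK (~0UL)  /* hypothetical "no block" marker */

struct toy_extent {
	unsigned long start;
	unsigned long len;
};

static unsigned int toy_notfound_events;  /* stands in for a tracepoint */

/*
 * Try to allocate exactly [want, want + len) from the free extent `fe`
 * (NULL if the lookup found nothing). Every failure jumps to one label.
 */
static bool toy_alloc_exact(const struct toy_extent *fe, unsigned long want,
			    unsigned long len, unsigned long *result)
{
	assert(result);

	if (!fe)
		goto not_found;                  /* lookup found no record */
	if (fe->start > want)
		goto not_found;                  /* extent begins after the request */
	if (fe->start + fe->len < want + len)
		goto not_found;                  /* extent ends too early */

	*result = want;
	return true;

not_found:
	toy_notfound_events++;                   /* one trace hook for all cases */
	*result = TOY_NULLBLOCK;
	return false;
}
```

With a single exit label, the failure bookkeeping and the trace hook
live in one place instead of being repeated at each early return.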
commit ecff71e677c6d469f525dcf31ada709d5858307c
Author: Christoph Hellwig <hch@xxxxxxxxxxxxx>
Date: Fri Dec 10 08:42:25 2010 +0000
xfs: simplify xfs_map_at_offset
Move the buffer locking into the callers as they need to do it
whether they call xfs_map_at_offset or not. Remove the b_bdev
assignment, which is already done by get_blocks. Remove the
duplicate extent type asserts in xfs_convert_page just before
calling xfs_map_at_offset.
Signed-off-by: Christoph Hellwig <hch@xxxxxx>
Reviewed-by: Dave Chinner <dchinner@xxxxxxxxxx>
Signed-off-by: Alex Elder <aelder@xxxxxxx>
commit aeea1b1f81800e362a3aca86d769d02e137a8fa7
Author: Christoph Hellwig <hch@xxxxxxxxxxxxx>
Date: Fri Dec 10 08:42:24 2010 +0000
xfs: refactor xfs_vm_writepage
After the last patches the code for overwrites is the same as for
delayed and unwritten extents except that it doesn't need to call
xfs_map_at_offset. Take care of that fact to simplify
xfs_vm_writepage.
The buffer loop now first checks the type of buffer and checks/sets
the ioend type, or continues to the next buffer if it's not
interesting to us. Only after that we validate the iomap and
perform the block mapping if needed, all in common code for the
cases where we have to do work.
Signed-off-by: Christoph Hellwig <hch@xxxxxx>
Reviewed-by: Dave Chinner <dchinner@xxxxxxxxxx>
Signed-off-by: Alex Elder <aelder@xxxxxxx>
commit 2fa24f92530edaf86c3b5f662464e0d2e3b3e517
Author: Christoph Hellwig <hch@xxxxxxxxxxxxx>
Date: Fri Dec 10 08:42:23 2010 +0000
xfs: remove the all_bh flag from xfs_convert_page
The all_bh flag is always set when entering the page clustering
machinery with a regular written extent, which means the check for
it is superfluous.
Signed-off-by: Christoph Hellwig <hch@xxxxxx>
Reviewed-by: Dave Chinner <dchinner@xxxxxxxxxx>
Signed-off-by: Alex Elder <aelder@xxxxxxx>
commit ed1e7b7e484dfb64168755613d499f32a97409bd
Author: Christoph Hellwig <hch@xxxxxxxxxxxxx>
Date: Fri Dec 10 08:42:22 2010 +0000
xfs: remove xfs_probe_cluster
xfs_map_blocks always calls xfs_bmapi with the XFS_BMAPI_ENTIRE
flag, which tells it to not cap the extent at the passed in
size, but just treat the size as a minimum to map. This means
xfs_probe_cluster is entirely useless as we'll always get the whole
extent back anyway.
Signed-off-by: Christoph Hellwig <hch@xxxxxx>
Reviewed-by: Dave Chinner <dchinner@xxxxxxxxxx>
Signed-off-by: Alex Elder <aelder@xxxxxxx>
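The argument can be made concrete with a small userspace C model. It is
a sketch under assumptions: toy_map, toy_bmapi and TOY_BMAPI_ENTIRE are
hypothetical stand-ins for the real mapping call and flag, and a single
extent is passed in directly instead of being looked up.

```c
#include <assert.h>

#define TOY_BMAPI_ENTIRE 0x1u  /* hypothetical stand-in for XFS_BMAPI_ENTIRE */

struct toy_map {
	unsigned long start;  /* first block of the mapping */
	unsigned long len;    /* number of blocks mapped */
};

/*
 * Map `len` blocks at `off` inside the extent `ext`. Without the flag the
 * result is trimmed to the requested range; with it, the whole extent is
 * returned and `len` is only a lower bound.
 */
static struct toy_map toy_bmapi(struct toy_map ext, unsigned long off,
				unsigned long len, unsigned int flags)
{
	/* The request must lie inside the extent for this toy model. */
	assert(off >= ext.start && off + len <= ext.start + ext.len);

	if (flags & TOY_BMAPI_ENTIRE)
		return ext;

	struct toy_map trimmed = { .start = off, .len = len };
	return trimmed;
}
```

Because the flagged call returns the full extent whatever length is
requested, probing ahead to compute a larger length beforehand gains
nothing in this model, which is the reasoning the commit gives for
dropping xfs_probe_cluster.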
commit 8ff2957d581582890693affc09920108a67cb05d
Author: Christoph Hellwig <hch@xxxxxxxxxxxxx>
Date: Fri Dec 10 08:42:21 2010 +0000
xfs: simplify xfs_map_blocks
No need to lock the extent map exclusive when performing an
overwrite, we know the extent map must already have been loaded by
get_blocks. Apply the non-blocking inode semantics to all mapping
types instead of just delayed allocations. Remove the handling of
not yet allocated blocks for the IO_UNWRITTEN case - if an extent is
marked as unwritten allocated in the buffer it must already have an
extent on disk.
Add asserts to verify all the assumptions above in debug builds.
Signed-off-by: Christoph Hellwig <hch@xxxxxx>
Reviewed-by: Dave Chinner <dchinner@xxxxxxxxxx>
Signed-off-by: Alex Elder <aelder@xxxxxxx>
commit a206c817c864583c44e2f418db8e6c7a000fbc38
Author: Christoph Hellwig <hch@xxxxxxxxxxxxx>
Date: Fri Dec 10 08:42:20 2010 +0000
xfs: kill xfs_iomap
Opencode the xfs_iomap code in its two callers. The overlap of
passed flags already was minimal and will be further reduced in the
next patch.
As a side effect the BMAPI_* flags for xfs_bmapi and the IO_* flags
for I/O end processing are merged into a single set of flags, which
should be a bit more descriptive of the operation we perform.
Also improve the tracing by giving each caller its own set of
tracepoints.
Signed-off-by: Christoph Hellwig <hch@xxxxxx>
Reviewed-by: Dave Chinner <dchinner@xxxxxxxxxx>
Signed-off-by: Alex Elder <aelder@xxxxxxx>
commit 405f80429436d38ab4e6b4c0d99861a1f00648fd
Author: Christoph Hellwig <hch@xxxxxxxxxxxxx>
Date: Fri Dec 10 08:42:19 2010 +0000
xfs: cleanup the xfs_iomap_write_* helpers
Remove passing the BMAPI_* flags to these helpers: in
xfs_iomap_write_direct the BMAPI_DIRECT check was always true, and
in the xfs_iomap_write_delay path it was never checked at all.
Remove the nmap return value as we never make use of it.
Signed-off-by: Christoph Hellwig <hch@xxxxxx>
Reviewed-by: Dave Chinner <dchinner@xxxxxxxxxx>
Signed-off-by: Alex Elder <aelder@xxxxxxx>
commit 6ac7248ec5f20cb44a063d7c7191b8e0068b5a28
Author: Christoph Hellwig <hch@xxxxxxxxxxxxx>
Date: Fri Dec 10 08:42:18 2010 +0000
xfs: a few small tweaks for overwrites in xfs_vm_writepage
Don't trylock the buffer. We are the only one ever locking it for a
regular file address space, and trylock was only copied from the
generic code which did it due to the old buffer based writeout in
jbd. Also make sure to only write out the buffer if the iomap
actually is valid, because we wouldn't have a proper mapping
otherwise. In practice we will never get an invalid mapping here as
the page lock guarantees truncate doesn't race with us, but better
be safe than sorry. Also make sure we allocate a new ioend when
crossing boundaries between mappings, just like we do for delalloc
and unwritten extents. Again this currently doesn't matter as the
I/O end handler only cares for the boundaries for unwritten extents,
but this makes the code fully correct and the same as for
delalloc/unwritten extents.
Signed-off-by: Christoph Hellwig <hch@xxxxxx>
Reviewed-by: Dave Chinner <dchinner@xxxxxxxxxx>
Signed-off-by: Alex Elder <aelder@xxxxxxx>
commit 221cb2517e8fc9a1d67c7a8a9c19fc5a916b583f
Author: Christoph Hellwig <hch@xxxxxxxxxxxxx>
Date: Fri Dec 10 08:42:17 2010 +0000
xfs: remove some dead bio handling code
We'll never have BIO_EOPNOTSUPP set after calling submit_bio as this
can only happen for discards, and used to happen for barriers, none
of which is ever submitted by xfs_submit_ioend_bio. Also remove
the loop around bio_alloc as it will never fail due to its mempool
backing.
Signed-off-by: Christoph Hellwig <hch@xxxxxx>
Reviewed-by: Dave Chinner <dchinner@xxxxxxxxxx>
Signed-off-by: Alex Elder <aelder@xxxxxxx>
commit 85da94c6b4666582c38579ccdcd90a5d9b5697ef
Author: Christoph Hellwig <hch@xxxxxxxxxxxxx>
Date: Fri Dec 10 08:42:16 2010 +0000
xfs: improve mapping type check in xfs_vm_writepage
Currently we only refuse a "read-only" mapping for writing out
unwritten and delayed buffers, and refuse any other for overwrites.
Improve the checks to require delalloc mappings for delayed buffers,
and unwritten extent mappings for unwritten extents.
Signed-off-by: Christoph Hellwig <hch@xxxxxx>
Reviewed-by: Dave Chinner <dchinner@xxxxxxxxxx>
Signed-off-by: Alex Elder <aelder@xxxxxxx>
commit c9f71f5fc4390ea3a8087c00d53a799e7e0f0f8e
Author: Christoph Hellwig <hch@xxxxxxxxxxxxx>
Date: Wed Dec 1 22:06:24 2010 +0000
xfs: untangle phase1 vs phase2 recovery helpers
Dispatch to a different helper for phase1 vs phase2 in
xlog_recover_commit_trans instead of doing it in all the
low-level functions.
Signed-off-by: Christoph Hellwig <hch@xxxxxx>
Signed-off-by: Alex Elder <aelder@xxxxxxx>
commit d0450948641b2090b5d467ba638bbebd40b20b21
Author: Christoph Hellwig <hch@xxxxxxxxxxxxx>
Date: Wed Dec 1 22:06:23 2010 +0000
xfs: refactor xlog_recover_commit_trans
Merge the call to xlog_recover_reorder_trans and the loop over the
recovery items from xlog_recover_do_trans into xlog_recover_commit_trans,
and keep the switch statement over the log item types as a separate helper.
Signed-off-by: Christoph Hellwig <hch@xxxxxx>
Signed-off-by: Alex Elder <aelder@xxxxxxx>
commit d5689eaa0ac5588cf459ee32f86d5700dd7d6403
Author: Christoph Hellwig <hch@xxxxxxxxxxxxx>
Date: Wed Dec 1 22:06:22 2010 +0000
xfs: use struct list_head for the buf cancel table
Signed-off-by: Christoph Hellwig <hch@xxxxxx>
Signed-off-by: Alex Elder <aelder@xxxxxxx>
commit e2714bf8d5c8e131a6df6b0ea2269433e9a03a9b
Author: Christoph Hellwig <hch@xxxxxxxxxxxxx>
Date: Wed Dec 1 22:06:21 2010 +0000
xfs: remove leftovers of old buffer log items in recovery code
XFS used to support different types of buffer log items a long time
ago. Remove the switch statements checking the log item type in
various buffer recovery helpers that were left over from those days
and the rather useless xlog_recover_do_buffer_pass2 wrapper.
Signed-off-by: Christoph Hellwig <hch@xxxxxx>
Signed-off-by: Alex Elder <aelder@xxxxxxx>
commit 576ecb8e2b725726471cc62b12c01e28d33127ba
Author: Samuel Kvasnica <samuel.kvasnica@xxxxxxxxx>
Date: Fri Nov 19 13:38:49 2010 +0000
xfs: fix exporting with left over 64-bit inodes
We now support mounting and using filesystems with 64-bit inodes
even when not mounted with the inode64 option (which now only
controls if we allocate new inodes in that space or not). Make sure
we always use large NFS file handles when exporting a filesystem
that may contain 64-bit inodes. Note that this only affects newly
generated file handles, any outstanding 32-bit file handle is still
accepted.
[hch: the comment and commit log are mine, the rest is from a patch
snippet from Samuel]
Signed-off-by: Christoph Hellwig <hch@xxxxxx>
Signed-off-by: Alex Elder <aelder@xxxxxxx>
-----------------------------------------------------------------------
Summary of changes:
fs/xfs/linux-2.6/sv.h | 59 ----
fs/xfs/linux-2.6/xfs_aops.c | 425 ++++++++++--------------
fs/xfs/linux-2.6/xfs_aops.h | 16 +
fs/xfs/linux-2.6/xfs_buf.c | 235 +++++++++-----
fs/xfs/linux-2.6/xfs_buf.h | 22 +-
fs/xfs/linux-2.6/xfs_export.c | 12 +-
fs/xfs/linux-2.6/xfs_linux.h | 1 -
fs/xfs/linux-2.6/xfs_super.c | 22 +-
fs/xfs/linux-2.6/xfs_sync.c | 92 ++++--
fs/xfs/linux-2.6/xfs_trace.h | 59 ++--
fs/xfs/quota/xfs_dquot.c | 1 -
fs/xfs/xfs_ag.h | 2 +-
fs/xfs/xfs_alloc.c | 351 ++++++++------------
fs/xfs/xfs_attr_leaf.c | 4 +-
fs/xfs/xfs_btree.c | 9 +-
fs/xfs/xfs_buf_item.c | 32 ++-
fs/xfs/xfs_buf_item.h | 11 -
fs/xfs/xfs_extfree_item.c | 97 +++---
fs/xfs/xfs_extfree_item.h | 11 +-
fs/xfs/xfs_fsops.c | 1 +
fs/xfs/xfs_iget.c | 90 ++++-
fs/xfs/xfs_inode.c | 54 +++-
fs/xfs/xfs_inode.h | 15 +-
fs/xfs/xfs_inode_item.c | 92 +++++-
fs/xfs/xfs_iomap.c | 233 +++++---------
fs/xfs/xfs_iomap.h | 27 +--
fs/xfs/xfs_log.c | 739 +++++++++++++++++++----------------------
fs/xfs/xfs_log_cil.c | 17 +-
fs/xfs/xfs_log_priv.h | 127 ++++++--
fs/xfs/xfs_log_recover.c | 620 ++++++++++++++---------------------
fs/xfs/xfs_mount.c | 23 ++-
fs/xfs/xfs_mount.h | 14 +
fs/xfs/xfs_trans.c | 79 +++++-
fs/xfs/xfs_trans.h | 2 +-
fs/xfs/xfs_trans_ail.c | 232 +++++++------
fs/xfs/xfs_trans_extfree.c | 8 +-
fs/xfs/xfs_trans_priv.h | 35 ++-
fs/xfs/xfs_vnodeops.c | 61 +++--
38 files changed, 2009 insertions(+), 1921 deletions(-)
delete mode 100644 fs/xfs/linux-2.6/sv.h
hooks/post-receive
--
XFS development tree