[XFS updates] XFS development tree branch, for-linus, updated. v2.6.34-19248-g2bfc96a
xfs at oss.sgi.com
Mon Aug 30 13:34:38 CDT 2010
This is an automated email from the git hooks/post-receive script. It was
generated because a ref change was pushed to the repository containing
the project "XFS development tree".
The branch, for-linus has been updated
b5420f2 xfs: do not discard page cache data on EAGAIN
3b93c7a xfs: don't do memory allocation under the CIL context lock
a44f13e xfs: Reduce log force overhead for delayed logging
1a387d3 xfs: dummy transactions should not dirty VFS state
2fe3366 xfs: ensure f_ffree returned by statfs() is non-negative
efceab1 xfs: handle negative wbc->nr_to_write during sync writeback
4536f2a xfs: fix untrusted inode number lookup
5b3eed7 xfs: ensure we mark all inodes in a freed cluster XFS_ISTALE
d17c701 xfs: unlock items before allowing the CIL to commit
5f248c9 Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs-2.6
b57922d convert remaining ->clear_inode() to ->evict_inode()
a4ffdde simplify checks for I_CLEAR/I_FREEING
fa9b227 xfs: new truncate sequence
155130a get rid of block_write_begin_newtrunc
eafdc7d sort out blockdev_direct_IO variants
90e0c22 Merge branch 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs-2.6
ade7ce3 quota: Clean up the namespace in dqblk_xfs.h
from 6b0a2996a0c023d84bc27ec7528a6e54cb5ea264 (commit)
Those revisions listed above that are new to this repository have
not appeared on any other notification email; so we list those
revisions in full, below.
- Log -----------------------------------------------------------------
commit b5420f235953448eeae615b3361584dc5e414f34
Author: Christoph Hellwig <hch at infradead.org>
Date: Tue Aug 24 11:47:51 2010 +1000
xfs: do not discard page cache data on EAGAIN
If xfs_map_blocks returns EAGAIN because of lock contention we must redirty the
page and not discard the pagecache content and return an error from writepage.
We used to do this correctly, but the logic got lost during the recent
reshuffle of the writepage code.
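
The intended behaviour can be modelled in a few lines of Python. This is only a sketch of the control flow described above, not the kernel code: the `Page` class, the `writepage` function and the `map_blocks` callback are hypothetical stand-ins.

```python
import errno


class Page:
    """Hypothetical stand-in for a page cache page."""

    def __init__(self):
        self.dirty = False
        self.discarded = False


def writepage(page, map_blocks):
    """Return 0 or a negative errno; map_blocks is a hypothetical callback."""
    err = map_blocks(page)
    if err == -errno.EAGAIN:
        # Lock contention: keep the data and let writeback retry later.
        page.dirty = True
        return 0
    if err:
        # A real error: the page contents are dropped and the error returned.
        page.discarded = True
        return err
    return 0
```

With a callback that returns `-errno.EAGAIN`, the page ends up dirty again and the caller sees success rather than an error.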
Signed-off-by: Christoph Hellwig <hch at lst.de>
Reported-by: Mike Gao <ygao.linux at gmail.com>
Tested-by: Mike Gao <ygao.linux at gmail.com>
Reviewed-by: Dave Chinner <dchinner at redhat.com>
Signed-off-by: Dave Chinner <dchinner at redhat.com>
commit 3b93c7aaefc05ee2a75e2726929b01a321402984
Author: Dave Chinner <dchinner at redhat.com>
Date: Tue Aug 24 11:45:53 2010 +1000
xfs: don't do memory allocation under the CIL context lock
Formatting items requires memory allocation when using delayed
logging. Currently that memory allocation is done while holding the
CIL context lock in read mode. This means that if memory allocation
takes some time (e.g. enters reclaim), we cannot push on the CIL
until the allocation(s) required by formatting complete. This can
stall CIL pushes for some time, and once a push is stalled so are
all new transaction commits.
Fix this by splitting the item formatting into two steps. The first
step which does the allocation and memcpy() into the allocated
buffer is now done outside the CIL context lock, and only the CIL
insert is done inside the CIL context lock. This avoids the stall
issue.
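
As a rough illustration of the two-step split, the Python sketch below does the allocation and copy before taking the lock and holds the lock only for the insert. The `CIL` class and its fields are hypothetical, and a plain mutex stands in for the CIL context lock, which the text above says is held in read mode.

```python
import threading


class CIL:
    """Hypothetical model of the committed item list."""

    def __init__(self):
        self.ctx_lock = threading.Lock()  # simplification of the context lock
        self.items = []

    def commit(self, item_data: bytes):
        # Step 1: allocate and copy outside the lock. This may be slow
        # (for example if allocation has to wait for memory reclaim).
        buf = bytearray(len(item_data))
        buf[:] = item_data
        # Step 2: only the insert into the CIL happens under the lock.
        with self.ctx_lock:
            self.items.append(buf)
```

The point of the ordering is that a slow allocation no longer extends the time the lock is held, so a push waiting on the lock is not stalled by it.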
Signed-off-by: Dave Chinner <dchinner at redhat.com>
Reviewed-by: Christoph Hellwig <hch at lst.de>
commit a44f13edf0ebb4e41942d0f16ca80489dcf6659d
Author: Dave Chinner <dchinner at redhat.com>
Date: Tue Aug 24 11:40:03 2010 +1000
xfs: Reduce log force overhead for delayed logging
Delayed logging adds some serialisation to the log force process to
ensure that it does not dereference a bad commit context structure
when determining if a CIL push is necessary or not. It does this by
grabbing the CIL context lock exclusively, then dropping it before
pushing the CIL if necessary. This causes serialisation of all log
forces and pushes regardless of whether a force is necessary or not.
As a result fsync heavy workloads (like dbench) can be significantly
slower with delayed logging than without.
To avoid this penalty, copy the current sequence from the context to
the CIL structure when they are swapped. This allows us to do
unlocked checks on the current sequence without having to worry
about dereferencing context structures that may have already been
freed. Hence we can remove the CIL context locking in the forcing
code and only call into the push code if the current context matches
the sequence we need to force.
By passing the sequence into the push code, we can check the
sequence again once we have the CIL lock held exclusive and abort if
the sequence has already been pushed. This avoids a lock round-trip
and unnecessary CIL pushes when we have racing push calls.
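
The check-then-recheck pattern described above can be sketched in Python as follows. All names are hypothetical, and incrementing `current_sequence` inside `push` is a stand-in for swapping in a new context; the real push code does considerably more.

```python
import threading


class CIL:
    """Hypothetical model: the sequence is mirrored in the CIL itself."""

    def __init__(self):
        self.ctx_lock = threading.Lock()
        self.current_sequence = 1  # copied from the context when swapped
        self.pushes = 0

    def push(self, seq):
        with self.ctx_lock:
            # Recheck under the lock: a racing caller may have pushed seq.
            if seq < self.current_sequence:
                return False
            self.pushes += 1
            self.current_sequence += 1  # stands in for the context swap
            return True

    def force(self, seq):
        # Unlocked check: no context structure is dereferenced here.
        if seq == self.current_sequence:
            return self.push(seq)
        return False
```

A force for a sequence that has already been pushed returns without taking the lock at all, which is where the reduction in serialisation comes from.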
The result is that the regression in dbench performance goes away -
this change improves dbench performance on a ramdisk from ~2100MB/s
to ~2500MB/s. This compares favourably to not using delayed logging
which returns ~2500MB/s for the same workload.
Signed-off-by: Dave Chinner <dchinner at redhat.com>
Reviewed-by: Christoph Hellwig <hch at lst.de>
commit 1a387d3be2b30c90f20d49a3497a8fc0693a9d18
Author: Dave Chinner <dchinner at redhat.com>
Date: Tue Aug 24 11:46:31 2010 +1000
xfs: dummy transactions should not dirty VFS state
When we need to cover the log, we issue dummy transactions to ensure
the current log tail is on disk. Unfortunately we currently use the
root inode in the dummy transaction, and the act of committing the
transaction dirties the inode at the VFS level.
As a result, the VFS writeback of the dirty inode will prevent the
filesystem from idling long enough for the log covering state
machine to complete. The state machine gets stuck in a loop issuing
new dummy transactions to cover the log and never makes progress.
To avoid this problem, the dummy transactions should not cause
externally visible state changes. To ensure this occurs, make sure
that dummy transactions log an unchanging field in the superblock as
its state is never propagated outside the filesystem. This allows
the log covering state machine to complete successfully and the
filesystem now correctly enters a fully idle state about 90s after
the last modification was made.
Signed-off-by: Dave Chinner <dchinner at redhat.com>
Reviewed-by: Christoph Hellwig <hch at lst.de>
commit 2fe33661fcd79d4c53022509f7223d526b5fa233
Author: Stuart Brodsky <sbrodsky at sgi.com>
Date: Tue Aug 24 11:46:05 2010 +1000
xfs: ensure f_ffree returned by statfs() is non-negative
Because of delayed updates to sb_icount field in the super block, it
is possible to allocate over maxicount number of inodes. This
causes the arithmetic to calculate a negative number of free inodes
in user commands like df or stat -f.
Since maxicount is a somewhat arbitrary number, a slight over
allocation is not critical, but the value displayed by user commands
should be 0 or greater and never go negative. To do this the value in the
stats buffer f_ffree is capped to never go negative.
[ Modified to use max_t as per Christoph's comment. ]
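
The capping amounts to clamping a signed subtraction at zero. A minimal Python sketch of the arithmetic, with a hypothetical function name and simplified inputs (the kernel change uses max_t, as noted above):

```python
def free_inodes(f_files, icount, ifree):
    """Free inode count reported to userspace, never negative.

    f_files: total inodes reported (limited by maxicount)
    icount:  inodes allocated, which may slightly exceed maxicount
    ifree:   allocated inodes that are currently free
    """
    ffree = f_files - (icount - ifree)
    return max(ffree, 0)
```

For example, with 100 reportable inodes, 105 allocated and 2 of those free, the raw result is -3 and the reported value is 0.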
Signed-off-by: Stu Brodsky <sbrodsky at sgi.com>
Signed-off-by: Dave Chinner <dchinner at redhat.com>
commit efceab1d563153a2b1a6e7d35376241a48126989
Author: Dave Chinner <dchinner at redhat.com>
Date: Tue Aug 24 11:44:56 2010 +1000
xfs: handle negative wbc->nr_to_write during sync writeback
During data integrity (WB_SYNC_ALL) writeback, wbc->nr_to_write will
go negative on inodes with more than 1024 dirty pages due to
implementation details of write_cache_pages(). Currently XFS will
abort page clustering in writeback once nr_to_write drops below
zero, and so for data integrity writeback we will do very
inefficient page at a time allocation and IO submission for inodes
with large numbers of dirty pages.
Fix this by only aborting the page clustering code when
wbc->nr_to_write is negative and the sync mode is WB_SYNC_NONE.
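
Expressed as a predicate, the new condition looks roughly like the Python sketch below. The function name is hypothetical and the comparison follows the wording of this message; the constants mirror the kernel's writeback sync modes.

```python
WB_SYNC_NONE = 0  # background writeback, may stop early
WB_SYNC_ALL = 1   # data integrity writeback, must write everything


def stop_clustering(nr_to_write, sync_mode):
    """True if page clustering should be aborted."""
    return nr_to_write < 0 and sync_mode == WB_SYNC_NONE
```

Under WB_SYNC_ALL the predicate is always false, so clustering continues even after nr_to_write has gone negative.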
Cc: <stable at kernel.org>
Signed-off-by: Dave Chinner <dchinner at redhat.com>
Reviewed-by: Christoph Hellwig <hch at lst.de>
commit 4536f2ad8b330453d7ebec0746c4374eadd649b1
Author: Dave Chinner <dchinner at redhat.com>
Date: Tue Aug 24 11:42:30 2010 +1000
xfs: fix untrusted inode number lookup
Commit 7124fe0a5b619d65b739477b3b55a20bf805b06d ("xfs: validate untrusted inode
numbers during lookup") changes the inode lookup code to do btree lookups for
untrusted inode numbers. This change made an invalid assumption about the
alignment of inodes and hence incorrectly calculated the first inode in the
cluster. As a result, some inode numbers were being incorrectly considered
invalid when they were actually valid.
The issue was not picked up by the xfstests suite because it always runs fsr
and dump (the two utilities that utilise the bulkstat interface) on cache hot
inodes and hence the lookup code in the cold cache path was not sufficiently
exercised to uncover this intermittent problem.
Fix the issue by relaxing the btree lookup criteria and then checking if the
record returned contains the inode number we are looking up. If we get an
incorrect record, then the inode number is invalid.
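
The relaxed lookup followed by a containment check can be illustrated with a sorted list standing in for the inode btree. This Python sketch assumes a "less than or equal" search and 64 inodes per record; both are assumptions made for illustration, and the function name is hypothetical.

```python
from bisect import bisect_right

INODES_PER_CHUNK = 64  # assumed record size, for illustration only


def lookup_untrusted(chunk_starts, ino):
    """Return the start inode of the record containing ino, or None.

    chunk_starts: sorted list of the first inode number of each record.
    """
    # Relaxed lookup: find the last record starting at or before ino.
    i = bisect_right(chunk_starts, ino) - 1
    if i < 0:
        return None
    start = chunk_starts[i]
    # Validate: the record found must actually cover the inode number.
    if start <= ino < start + INODES_PER_CHUNK:
        return start
    return None
```

An inode number that falls in a gap between records still finds a record, but fails the containment check and is rejected as invalid.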
Cc: <stable at kernel.org>
Signed-off-by: Dave Chinner <dchinner at redhat.com>
Reviewed-by: Christoph Hellwig <hch at lst.de>
commit 5b3eed756cd37255cad1181bd86bfd0977e97953
Author: Dave Chinner <dchinner at redhat.com>
Date: Tue Aug 24 11:42:41 2010 +1000
xfs: ensure we mark all inodes in a freed cluster XFS_ISTALE
Under heavy parallel metadata loads (e.g. dbench), we can fail
to mark all the inodes in a cluster being freed as XFS_ISTALE as we
skip inodes we cannot get the XFS_ILOCK_EXCL or the flush lock on.
When this happens and the inode cluster buffer has already been
marked stale and freed, inode reclaim can try to write the inode out
as it is dirty and not marked stale. This can result in writing the
metadata to a freed extent, or in the case it has already
been overwritten trigger a magic number check failure and return an
EUCLEAN error such as:
Filesystem "ram0": inode 0x442ba1 background reclaim flush failed with 117
Fix this by ensuring that we hoover up all in memory inodes in the
cluster and mark them XFS_ISTALE when freeing the cluster.
Cc: <stable at kernel.org>
Signed-off-by: Dave Chinner <dchinner at redhat.com>
Reviewed-by: Christoph Hellwig <hch at lst.de>
commit d17c701ce6a548a92f7f8a3cec20299465f36ee3
Author: Dave Chinner <dchinner at redhat.com>
Date: Tue Aug 24 11:42:52 2010 +1000
xfs: unlock items before allowing the CIL to commit
When we commit a transaction using delayed logging, we need to
unlock the items in the transaction before we unlock the CIL context
and allow it to be checkpointed. If we unlock them after we release
the CIL context lock, the CIL can checkpoint and complete before
we free the log items. This breaks stale buffer item unlock and
unpin processing as there is an implicit assumption that the unlock
will occur before the unpin.
Also, some log items need to store the LSN of the transaction commit
in the item (inodes and EFIs) and so can race with other transaction
completions if we don't prevent the CIL from checkpointing before
the unlock occurs.
Cc: <stable at kernel.org>
Signed-off-by: Dave Chinner <dchinner at redhat.com>
Reviewed-by: Christoph Hellwig <hch at lst.de>
commit 5f248c9c251c60af3403902b26e08de43964ea0b
Merge: f6cec0ae58c17522a7bc4e2f39dae19f199ab534 dca332528bc69e05f67161e1ed59929633d5e63d
Author: Linus Torvalds <torvalds at linux-foundation.org>
Date: Tue Aug 10 11:26:52 2010 -0700
Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs-2.6
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs-2.6: (96 commits)
no need for list_for_each_entry_safe()/resetting with superblock list
Fix sget() race with failing mount
vfs: don't hold s_umount over close_bdev_exclusive() call
sysv: do not mark superblock dirty on remount
sysv: do not mark superblock dirty on mount
btrfs: remove junk sb_dirt change
BFS: clean up the superblock usage
AFFS: wait for sb synchronization when needed
AFFS: clean up dirty flag usage
cifs: truncate fallout
mbcache: fix shrinker function return value
mbcache: Remove unused features
add f_flags to struct statfs(64)
pass a struct path to vfs_statfs
update VFS documentation for method changes.
All filesystems that need invalidate_inode_buffers() are doing that explicitly
convert remaining ->clear_inode() to ->evict_inode()
Make ->drop_inode() just return whether inode needs to be dropped
fs/inode.c:clear_inode() is gone
fs/inode.c:evict() doesn't care about delete vs. non-delete paths now
...
Fix up trivial conflicts in fs/nilfs2/super.c
commit b57922d97fd6f79b6dbe6db0c4fd30d219fa08c1
Author: Al Viro <viro at zeniv.linux.org.uk>
Date: Mon Jun 7 14:34:48 2010 -0400
convert remaining ->clear_inode() to ->evict_inode()
Signed-off-by: Al Viro <viro at zeniv.linux.org.uk>
commit a4ffdde6e56fdf8c34ddadc2674d6eb978083369
Author: Al Viro <viro at zeniv.linux.org.uk>
Date: Wed Jun 2 17:38:30 2010 -0400
simplify checks for I_CLEAR/I_FREEING
add I_CLEAR instead of replacing I_FREEING with it. I_CLEAR is
equivalent to I_FREEING for almost all code looking at either;
it's there to keep track of having called clear_inode() exactly
once per inode lifetime, at some point after having set I_FREEING.
I_CLEAR and I_FREEING never get set at the same time with the
current code, so we can switch to setting i_flags to I_FREEING | I_CLEAR
instead of I_CLEAR without loss of information. As the result of
such change, checks become simpler and the amount of code that needs
to know about I_CLEAR shrinks a lot.
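
The simplification can be seen in a small Python model of the state bits. The bit values and function names are illustrative only, not taken from the kernel headers.

```python
I_FREEING = 0x20  # illustrative bit values
I_CLEAR = 0x40


def going_away_old(state):
    """Old scheme: I_CLEAR replaces I_FREEING, so both must be tested."""
    return bool(state & (I_FREEING | I_CLEAR))


def going_away_new(state):
    """New scheme: I_CLEAR is added on top of I_FREEING, so one test is enough."""
    return bool(state & I_FREEING)


# Lifetime of an inode's state under each scheme.
OLD_STATES = [0, I_FREEING, I_CLEAR]
NEW_STATES = [0, I_FREEING, I_FREEING | I_CLEAR]
```

Both checks give the same answers over the corresponding lifetimes, while code that needs to know whether clear_inode() has run can still test I_CLEAR on its own.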
Signed-off-by: Al Viro <viro at zeniv.linux.org.uk>
commit fa9b227e9019ebaeeb06224ba531a490f91144b3
Author: Christoph Hellwig <hch at infradead.org>
Date: Mon Jun 14 05:17:31 2010 -0400
xfs: new truncate sequence
Convert XFS to the new truncate sequence. We still can have errors after
updating the file size in xfs_setattr, but these are real I/O errors and lead
to a transaction abort and filesystem shutdown, so they are not an issue.
Errors from ->write_begin and write_end can now be handled correctly because
we can actually get rid of the delalloc extents while previously the buffer
state was stripped in block_invalidatepage.
There is still no error handling for ->direct_IO, because doing so will need
some major restructuring given that we only have the iolock shared and do not
hold i_mutex at all. Fortunately leaving the normally allocated blocks behind
there is not a major issue and this will get cleaned up by xfs_free_eofblock
later.
Note: the patch is against Al's vfs.git tree as that contains the necessary
preparations. I'd prefer to get it applied there so that we can get some
testing in linux-next.
Signed-off-by: Christoph Hellwig <hch at lst.de>
Signed-off-by: Al Viro <viro at zeniv.linux.org.uk>
commit 155130a4f7848b1aac439cab6bda1a175507c71c
Author: Christoph Hellwig <hch at lst.de>
Date: Fri Jun 4 11:29:58 2010 +0200
get rid of block_write_begin_newtrunc
Move the call to vmtruncate to get rid of excessive blocks to the callers
in preparation of the new truncate sequence and rename the non-truncating
version to block_write_begin.
While we're at it also remove several unused arguments to block_write_begin.
Signed-off-by: Christoph Hellwig <hch at lst.de>
Signed-off-by: Al Viro <viro at zeniv.linux.org.uk>
commit eafdc7d190a944c755a9fe68573c193e6e0217e7
Author: Christoph Hellwig <hch at lst.de>
Date: Fri Jun 4 11:29:53 2010 +0200
sort out blockdev_direct_IO variants
Move the call to vmtruncate to get rid of excessive blocks to the callers
in preparation of the new truncate calling sequence. This was only done
for DIO_LOCKING filesystems, so the __blockdev_direct_IO_newtrunc variant
was not needed anyway. Get rid of blockdev_direct_IO_no_locking and
its _newtrunc variant while at it as just opencoding the two additional
parameters is shorter than the name suffix.
Signed-off-by: Christoph Hellwig <hch at lst.de>
Signed-off-by: Al Viro <viro at zeniv.linux.org.uk>
commit 90e0c225968f0878e090c7ff3f88323973476cee
Merge: 938a73b959cf77aadc41bded3bf416b618aa20b3 5f11e6a44059f728dddd8d0dbe5b4368ea93575b
Author: Linus Torvalds <torvalds at linux-foundation.org>
Date: Sat Aug 7 12:57:07 2010 -0700
Merge branch 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs-2.6
* 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs-2.6:
ext3: Fix dirtying of journalled buffers in data=journal mode
ext3: default to ordered mode
quota: Use mark_inode_dirty_sync instead of mark_inode_dirty
quota: Change quota error message to print out disk and function name
MAINTAINERS: Update entries of ext2 and ext3
MAINTAINERS: Update address of Andreas Dilger
ext3: Avoid filesystem corruption after a crash under heavy delete load
ext3: remove vestiges of nobh support
ext3: Fix set but unused variables
quota: clean up quota active checks
quota: Clean up the namespace in dqblk_xfs.h
quota: check quota reservation on remove_dquot_ref
commit ade7ce31c22e961dfbe1a6d57fd362c90c187cbd
Author: Christoph Hellwig <hch at lst.de>
Date: Fri Jun 4 10:56:01 2010 +0200
quota: Clean up the namespace in dqblk_xfs.h
Almost all identifiers use the FS_* namespace, so rename the missing few
XFS_* ones to FS_* as well. Without this some people might get upset
about having too many XFS names in generic code.
Acked-by: Steven Whitehouse <swhiteho at redhat.com>
Signed-off-by: Christoph Hellwig <hch at lst.de>
Signed-off-by: Jan Kara <jack at suse.cz>
-----------------------------------------------------------------------
Summary of changes:
fs/xfs/linux-2.6/xfs_aops.c | 75 +++++++++---
fs/xfs/linux-2.6/xfs_iops.c | 20 +---
fs/xfs/linux-2.6/xfs_linux.h | 2 -
fs/xfs/linux-2.6/xfs_quotaops.c | 10 +-
fs/xfs/linux-2.6/xfs_super.c | 17 ++-
fs/xfs/linux-2.6/xfs_sync.c | 42 +------
fs/xfs/linux-2.6/xfs_trace.h | 2 +-
fs/xfs/quota/xfs_qm_syscalls.c | 32 +++---
fs/xfs/xfs_fsops.c | 31 +++--
fs/xfs/xfs_fsops.h | 2 +-
fs/xfs/xfs_ialloc.c | 16 ++-
fs/xfs/xfs_inode.c | 49 ++++----
fs/xfs/xfs_log.c | 7 +-
fs/xfs/xfs_log_cil.c | 263 +++++++++++++++++++++++----------------
fs/xfs/xfs_log_priv.h | 13 ++-
fs/xfs/xfs_trans.c | 5 +-
fs/xfs/xfs_trans_priv.h | 3 +-
fs/xfs/xfs_vnodeops.c | 38 +++---
18 files changed, 350 insertions(+), 277 deletions(-)
hooks/post-receive
--
XFS development tree