xfs

Re: [PATCH 06/13] xfs: xfs_sync_data is redundant.

To: Brian Foster <bfoster@xxxxxxxxxx>
Subject: Re: [PATCH 06/13] xfs: xfs_sync_data is redundant.
From: Dave Chinner <david@xxxxxxxxxxxxx>
Date: Tue, 2 Oct 2012 10:10:22 +1000
Cc: xfs@xxxxxxxxxxx
In-reply-to: <5069F9B0.50804@xxxxxxxxxx>
References: <1348807485-20165-1-git-send-email-david@xxxxxxxxxxxxx> <1348807485-20165-7-git-send-email-david@xxxxxxxxxxxxx> <5069F9B0.50804@xxxxxxxxxx>
User-agent: Mutt/1.5.21 (2010-09-15)
Hi, Brian.

On Mon, Oct 01, 2012 at 04:14:40PM -0400, Brian Foster wrote:
> Warning: This message has had one or more attachments removed
> Warning: (273.out.bad).
> Warning: Please read the "boprocket-Attachment-Warning.txt" attachment(s) for more information.

Which says:

> At Mon Oct  1 20:14:58 2012 the virus scanner said:                
>    MailScanner: Attempt to hide real filename extension (273.out.bad)

Looks like your mailer did something wrong with the attachment....


> Heads up... I was doing some testing against my eofblocks set rebased
> against this patchset and I'm reproducing a new 273 failure. The failure
> bisects down to this patch.
> 
> With the bisection, I'm running xfs top of tree plus the following patch:
> 
> xfs: only update the last_sync_lsn when a transaction completes
> 
> ... and patches 1-6 of this set on top of that. i.e.:
> 
> xfs: xfs_sync_data is redundant.
> xfs: Bring some sanity to log unmounting
> xfs: sync work is now only periodic log work
> xfs: don't run the sync work if the filesystem is read-only
> xfs: rationalise xfs_mount_wq users
> xfs: xfs_syncd_stop must die
> xfs: only update the last_sync_lsn when a transaction completes
> xfs: Make inode32 a remountable option
> 
> This is on a 16p (according to /proc/cpuinfo) x86-64 system with 32GB
> RAM. The test and scratch volumes are both 500GB LVM volumes on top of a
> hardware RAID. I haven't looked into this at all yet, but I wanted to
> drop it on the list for now. The 273 output is attached.

I bet you had writes fail with ENOSPC. 201 * 426 = 85626 files of 8k
each gives 685MB of data. When the test is
running, I see upwards of 1.5GB of space consumed, which then slowly
drops again as data files are closed and data is written.

Some of that space is speculative preallocation (4k per file, I
think), but a significant amount of it is also metadata reservation
for delayed allocation (4 blocks per file, IIRC). With only 2GB RAM
on my machine, writeback starts once 200MB has been written, so the
metadata reservations are being released well before the fs runs out
of space.

I just upped the VM to 8GB RAM, and immediately I see the test
starting to fail. And this is in 273.full:

cp: cannot create regular file `/mnt/scratch/sub_198/origin/file_141': No space left on device
cp: cannot create regular file `/mnt/scratch/sub_198/origin/file_142': No space left on device
cp: cannot create regular file `/mnt/scratch/sub_198/origin/file_1cp: cannot 
create regular filcp: cannot create regular file 
`/mnt/scratch/sub_198/origin/file_147': No space left on device
cp: cannot create regular file `/mnt/scratch/sub_198/origin/file_1cp: cannot 
create regular filcp: writing `/mnt/scratch/sub_198/origin/file_149': No space 
left cp: writing `/mnt/scratch/sub_156/origin/file_275': No space left on device
cp: failed to extencp: writing `/mnt/scratch/sub_198/origin/file_150': No space 
left cp: writing `/mnt/scratch/sub_156/origin/file_276': No space left on device
cp: failed to extencp: cannot create regular file 
`/mnt/scratch/sub_124/origin/file_3cp: cannot create regular filcp: writing 
`/mnt/scratch/sub_124/origin/file_378': No space left cp: writing 
`/mnt/scratch/sub_173/origin/file_250': No space left on device
cp: failed to extencp: writing `/mnt/scratch/sub_124/origin/file_379': No space 
left cp: cannot create regular file `/mnt/scratch/sub_173/origin/file_2cp: 
cannot create regular file `/mnt/scratch/sub_134/origin/file_337': No space 
left on device
cp: cannot create regular filcp: cannot create regular filcp: writing 
`/mnt/scratch/sub_159/origin/file_307': No space left on device
cp: failed to extend `/mnt/scratch/sub_159/origin/file_307': No space left on device
cp: writing `/mnt/scratch/sub_159/origin/file_308': No space left on device
cp: failed to extend `/mnt/scratch/sub_159/origin/file_308': No space left on device
cp: cannot create regular file `/mnt/scratch/sub_159/origin/file_309': No space left on device
cp: cannot create regular file `/mnt/scratch/sub_159/origin/file_310': No space left on device
cp: cannot create regular file `/mnt/scratch/sub_159/origin/file_311': No space left on device
cp: cannot create regular file `/mnt/scratch/sub_159/origin/file_312': No space left on device
cp: cannot create regular file `/mnt/scratch/sub_159/origin/file_313': No space left on device
cp: cannot create regular file `/mnt/scratch/sub_159/origin/file_314': No space left on device
.....

So turning off speculative preallocation via the allocsize mount
option doesn't fix the problem. IOWs, the problem is too much active
metadata reservation. If we are caching 685MB, that's less than the
writeback threshold of a large-memory machine, so the metadata
reservations won't be trimmed at all until ENOSPC actually occurs
and writeback is then started.

The problem is that writeback_inodes_sb_if_idle() does not block if
there is already writeback in progress, so the callers just keep
hitting ENOSPC rather than being throttled waiting for delalloc
conversion.

The patch below should fix this. It changes xfs_flush_inodes() to
use sync_inodes_sb(), which will issue IO and block waiting for it to
complete, just like xfs_flush_inodes() used to. Indeed, the test passes
again on my VM with 8GB RAM....

Cheers,

Dave.
-- 
Dave Chinner
david@xxxxxxxxxxxxx

xfs: make inode writeback at ENOSPC blocking.

From: Dave Chinner <dchinner@xxxxxxxxxx>

writeback_inodes_sb_if_idle() is not sufficient to trigger delalloc
conversion fast enough to prevent spurious ENOSPC when there are
hundreds of writers, thousands of small files and GBs of free RAM.
Change this to use sync_inodes_sb() to block callers while we wait
for writeback, like the previous xfs_flush_inodes implementation did.

Signed-off-by: Dave Chinner <dchinner@xxxxxxxxxx>
---
 fs/xfs/xfs_inode.h |    2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/fs/xfs/xfs_inode.h b/fs/xfs/xfs_inode.h
index da69c18..0ec7a46 100644
--- a/fs/xfs/xfs_inode.h
+++ b/fs/xfs/xfs_inode.h
@@ -294,7 +294,7 @@ xfs_new_eof(struct xfs_inode *ip, xfs_fsize_t new_size)
 static inline void
 xfs_flush_inodes(struct xfs_inode *ip)
 {
-       writeback_inodes_sb_if_idle(VFS_I(ip)->i_sb, WB_REASON_FS_FREE_SPACE);
+       sync_inodes_sb(VFS_I(ip)->i_sb);
 }
 
 /*
