xfs
[Top] [All Lists]

Re: fs corruption exposed by "xfs: increase prealloc size to double that

To: Al Viro <viro@xxxxxxxxxxxxxxxxxx>
Subject: Re: fs corruption exposed by "xfs: increase prealloc size to double that of the previous extent"
From: Dave Chinner <david@xxxxxxxxxxxxx>
Date: Tue, 18 Mar 2014 12:16:32 +1100
Cc: linux-fsdevel@xxxxxxxxxxxxxxx, Brian Foster <bfoster@xxxxxxxxxx>, xfs@xxxxxxxxxxx, Dave Chinner <dchinner@xxxxxxxxxx>
Delivered-to: xfs@xxxxxxxxxxx
In-reply-to: <20140317024305.GD7072@dastard>
References: <20140315210216.GP18016@xxxxxxxxxxxxxxxxxx> <20140316022105.GQ18016@xxxxxxxxxxxxxxxxxx> <20140316023931.GR18016@xxxxxxxxxxxxxxxxxx> <20140316205624.GS18016@xxxxxxxxxxxxxxxxxx> <20140317013638.GB7072@dastard> <20140317024305.GD7072@dastard>
User-agent: Mutt/1.5.21 (2010-09-15)
On Mon, Mar 17, 2014 at 01:43:05PM +1100, Dave Chinner wrote:
> On Mon, Mar 17, 2014 at 12:36:39PM +1100, Dave Chinner wrote:
> > On Sun, Mar 16, 2014 at 08:56:24PM +0000, Al Viro wrote:
> > > Looks like __xfs_get_blocks() is broken in that respect - I'm definitely
> > > seeing O_DIRECT write() crossing the EOF calling it *once*, getting
> > > ->b_size set to a lot more than what remains until EOF and buffer_head
> > > not getting BH_New on it.
> > 
> > XFS should never do that. It does not mix existing block mappings
> > with newly allocated mappings, and so the newly allocated region
> > beyond EOF should always give a "xfs_get_blocks_alloc" trace, and
> > when that is emitted we always set buffer_new()....
> 
> And so XFS is not mixing extent types, nor is it seeing newly
> allocated blocks beyond EOF, and so it sees it as a single
> extent full of valid data....
> 
> OK, now I understand where the bad mapping problem is coming from,
> and why a changing in speculative prealloc size might be seen as the
> "cause" by a bisect. And it is a pre-existing condition that has
> been there for years - I'd think that testing with
> allocsize=<something large> would expose it easily on old kernels.
> 
> Al, you're right that the code that calculates the mapping size
> needs to split mappings that extend across EOF for direct IO.  I'll
> cook up a patch to fix it after lunch.

Except that this now leaves stale, unreferenced delalloc extents on
the inode, so there's some other problem being masked by this
behaviour and so generic/270 now assert fails. I'll have to look
further.

Cheers,

Dave.
-- 
Dave Chinner
david@xxxxxxxxxxxxx

<Prev in Thread] Current Thread [Next in Thread>