xfs
[Top] [All Lists]

Re: [PATCH] prototype file data inode inlining

To: David Chinner <dgc@xxxxxxx>
Subject: Re: [PATCH] prototype file data inode inlining
From: IWAMOTO Toshihiro <iwamoto@xxxxxxxxxxxxx>
Date: Tue, 11 Mar 2008 17:30:38 +0900
Cc: IWAMOTO Toshihiro <iwamoto@xxxxxxxxxxxxx>, xfs@xxxxxxxxxxx
In-reply-to: <20080308002124.GN155407@xxxxxxx>
References: <20080307093411.4B1912DC9B2@xxxxxxxxxxxxxxxxxx> <20080308002124.GN155407@xxxxxxx>
Sender: xfs-bounce@xxxxxxxxxxx
User-agent: Wanderlust/2.15.5 (Almost Unreal) SEMI/1.14.6 (Maruoka) FLIM/1.14.9 (Gojō) APEL/10.7 Emacs/22.1 (x86_64-pc-linux-gnu) MULE/5.0 (SAKAKI)
At Sat, 8 Mar 2008 11:21:24 +1100,
David Chinner wrote:
> Interesting. I'm not going to comment on the code, just the overall
> design and implementation.

Thanks for comments.

> Problems:

>       - local -> extent conversion occurs at copy-in time, not writeback
>         time, so using the normal read/write paths through ->get_blocks()
>         will fail here in xfs_bmapi():
> 
>    4793         if (XFS_IFORK_FORMAT(ip, whichfork) == XFS_DINODE_FMT_LOCAL) {
>    4794 >>>>>>>>        ASSERT(wr && tp);
>    4795                 if ((error = xfs_bmap_local_to_extents(tp, ip,
>    4796                                 firstblock, total, &logflags, 
> whichfork)))
>    4797                         goto error0;
>    4798         }
> 
>         because on a normal read or write (delayed allocation) we
>         are not doing allocation and hence do not have an open
>         transaction the first time we come through here. Just
>         avoiding this conversion and returning zero maps if we are
>         not writing will not help in the delayed allocation case.
>       
> 
> I note that you hacked around this by special casing the inline
> format ->get_blocks() callout to copy the data into the page and
> marking it mapped and uptodate. I think this is the wrong approach
> and is not the right layer at which to make this distinction - the special
> casing needs to be done at higher layers, not in the block mapping
> function.
> 
> I think for inline data, we'd do best to special case this as high
> up the read/write paths as possible. e.g. for read() type operations
> intercept in xfs_read() and just do what we need to do there for
> populating the single page cache page. For write, we should let it
> go through the normal delayed allocation mechanisms, only converting
> to local format during ->writepage() if there's a single block
> extent and it fits in the data fork. This also handles the truncate
> case nicely.

I was vaguely aware of layering violation, but took my dirty&quick
approach as the largest concern at that time was whether file inlining
gives enough performance gain.

With your suggestion, I would be able to implement better.
Unfortunately, I lack time to do so.

> > Some random notes and the patch itself follows.
> > 
> > Inlined file data are written from xfs_page_state_convert().
> > The xfs_trans related operations in that function is to get inode
> > written on disk and isn't for crash consistency.
> 
> Which is the exact opposite of what they are supposed to be used for.
> Given that the next thing that happens after data write in the writeback path
> is ->write_inode(), forcing the inode into the log for pure data changes
> is unnecessary. We just need to format the data into the inode during
> data writeback.

It seemed that setting the XFS_ILOG_DDATA bit in
ip->i_itemp->ili_format.ilf_fields was necessary for xfs_iflush_fork,
and I wasn't aware of other solutions.


> > xfs_bmap_local_to_extents() has been modified to work with file data,
> > but logging isn't implemented.  A machine crash can cause data
> > corruption.
> 
> There are two ways to do inline->extent safely from a crash recovery
> perspective. 
> 
> Method 1: Use an Intent/Done transaction pair

> Method 2: Log the data

Thanks for explanation.  It doesn't sound so complicated as I've
imagined.

--
IWAMOTO Toshihiro


<Prev in Thread] Current Thread [Next in Thread>