At Sat, 8 Mar 2008 11:21:24 +1100,
David Chinner wrote:
> Interesting. I'm not going to comment on the code, just the overall
> design and implementation.
Thanks for comments.
> Problems:
> - local -> extent conversion occurs at copy-in time, not writeback
> time, so using the normal read/write paths through ->get_blocks()
> will fail here in xfs_bmapi():
>
> 4793 if (XFS_IFORK_FORMAT(ip, whichfork) == XFS_DINODE_FMT_LOCAL) {
> 4794 >>>>>>>> ASSERT(wr && tp);
> 4795 if ((error = xfs_bmap_local_to_extents(tp, ip,
> 4796 firstblock, total, &logflags,
> whichfork)))
> 4797 goto error0;
> 4798 }
>
> because on a normal read or write (delayed allocation) we
> are not doing allocation and hence do not have an open
> transaction the first time we come through here. Just
> avoiding this conversion and returning zero maps if we are
> not writing will not help in the delayed allocation case.
>
>
> I note that you hacked around this by special casing the inline
> format ->get_blocks() callout to copy the data into the page and
> marking it mapped and uptodate. I think this is the wrong approach
> and is not the right layer at which to make this distinction - the special
> casing needs to be done at higher layers, not in the block mapping
> function.
>
> I think for inline data, we'd do best to special case this as high
> up the read/write paths as possible. e.g. for read() type operations
> intercept in xfs_read() and just do what we need to do there for
> populating the single page cache page. For write, we should let it
> go through the normal delayed allocation mechanisms, only converting
> to local format during ->writepage() if there's a single block
> extent and it fits in the data fork. This also handles the truncate
> case nicely.
I was vaguely aware of layering violation, but took my dirty&quick
approach as the largest concern at that time was whether file inlining
gives enough performance gain.
With your suggestion, I would be able to implement better.
Unfortunately, I lack time to do so.
> > Some random notes and the patch itself follows.
> >
> > Inlined file data are written from xfs_page_state_convert().
> > The xfs_trans related operations in that function is to get inode
> > written on disk and isn't for crash consistency.
>
> Which is the exact opposite of what they are supposed to be used for.
> Given that the next thing that happens after data write in the writeback path
> is ->write_inode(), forcing the inode into the log for pure data changes
> is unnecessary. We just need to format the data into the inode during
> data writeback.
It seemed that setting the XFS_ILOG_DDATA bit in
ip->i_itemp->ili_format.ilf_fields was necessary for xfs_iflush_fork,
and I wasn't aware of other solutions.
> > xfs_bmap_local_to_extents() has been modified to work with file data,
> > but logging isn't implemented. A machine crash can cause data
> > corruption.
>
> There are two ways to do inline->extent safely from a crash recovery
> perspective.
>
> Method 1: Use an Intent/Done transaction pair
> Method 2: Log the data
Thanks for explanation. It doesn't sound so complicated as I've
imagined.
--
IWAMOTO Toshihiro
|