Cleancache support in XFS

James Dingwall james.dingwall at zynstra.com
Thu May 2 03:24:51 CDT 2013


Dave Chinner wrote:
> On Wed, May 01, 2013 at 11:20:44AM -0500, Ben Myers wrote:
>> Hi James,
>>
>> On Wed, May 01, 2013 at 01:39:09PM +0100, James Dingwall wrote:
>>> In reference to: http://oss.sgi.com/archives/xfs/2012-05/msg00046.html
>>>
>>> $ grep -r cleancache fs/xfs
>>> on the 3.9 kernel source suggests that no patch was submitted to
>>> enable cleancache for the XFS filesystem.  Since it was suggested
>>> that this could be a one liner I've had a go and my first effort is
>>> inline below.  While this seems to compile OK I have no experience
>>> in filesystems so I would appreciate it if anyone can point out that
>>> it is obviously wrong and likely to eat my data before I try booting
>>> the kernel.
>>>
>>> If it seems a reasonable attempt what would be the best way to check
>>> that it isn't doing nasty things?
>> Hrm.. Looks like there is a doc in Documentation/vm/cleancache.txt which
>> includes a list of attributes the filesystem needs to have to work properly
>> with cleancache.
> So, those points are:
I had started to look at these too but I feel very out of my depth! I 
had similar conclusions to what Dave wrote but I don't think my thoughts 
should carry very much (any) weight.  Anyway I gambled and booted my xen 
domU with this patch and so far so good...  xen top shows that tmem is 
now being used where previously it wasn't.  I'll try running the 
xfstests at the weekend after a couple more days up time to see what 
happens.
>
> | Some points for a filesystem to consider:
> |
> | - The FS should be block-device-based (e.g. a ram-based FS such
> |  as tmpfs should not enable cleancache)
>
> OK.
>
> |- To ensure coherency/correctness, the FS must ensure that all
> |  file removal or truncation operations either go through VFS or
> |  add hooks to do the equivalent cleancache "invalidate" operations
>
> There be dragons - do all the XFS ioctls do the right thing?
vfs_unlink() calls *dir->i_op->unlink, in xfs_iops.c for S_IFDIR there is:

if (xfs_sb_version_hasasciici(&XFS_M(inode->i_sb)->m_sb))
         inode->i_op = &xfs_dir_ci_inode_operations;
else
         inode->i_op = &xfs_dir_inode_operations;

where .unlink in xfs_dir_inode_operations is xfs_vn_unlink() in either
condition.

I can't work out how to follow the vfs_truncate() in to the filesystem
code and perhaps there are other paths that would lead to file removal.

>
> |- To ensure coherency/correctness, either inode numbers must
> |  be unique across the lifetime of the on-disk file OR the
> |  FS must provide an "encode_fh" function.
>
> Ok.
>
> |- The FS must call the VFS superblock alloc and deactivate routines
> |  or add hooks to do the equivalent cleancache calls done there.
>
> OK.
>
> |- To maximize performance, all pages fetched from the FS should
> |  go through the do_mpag_readpage routine or the FS should add
> |  hooks to do the equivalent (cf. btrfs)
>
> xfs uses mpage_readpages() so should be fine.
I think there is a cleancache documentation bug since no other fs calls 
do_mpage_readpage().
>
> |- Currently, the FS blocksize must be the same as PAGESIZE.  This
> |  is not an architectural restriction, but no backends currently
> |  support anything different.
>
> Which means that we need hooks in the mount path to determine if
> this is the case or not. I note that neither ext3/ext4 do this check
> so I can't determine why this restriction is mentioned, and I'm not
> sure if it has any relevance to btrfs.
>
> IOWs, I'd like to know why this restriction exists - what does
> cleancache care about how the filesystem maps blocks to the page in
> the page cache - any way the filesystem does this it uses
> page->private to hide this fact from the page cache....
+ Konrad (cleancache maintainer) for any opinion.
>
> |- A clustered FS should invoke the "shared_init_fs" cleancache
> |  hook to get best performance for some backends.
>
> Not a problem.
>
> So there's a couple of things that need to be explained and
> explored, and a bunch of testing to be done....
>
James



More information about the xfs mailing list