
RE: Snapshots



[on XFS with LVM snapshots...]
> man xfs_freeze
> 
> Or if you have the correct LVM patches in place then the kernel does it
> all for you.
> 
I've been using XFS with snapshots off and on for quite a while, mostly with
good results (currently using XFS CVS from March 19th and LVM 1.0.3, both
with some patches).  That said, the comment accompanying the last change to
xfs_fs_freeze in XFS's CVS tree still worries me:
---Begin TAKE from Eric Sandeen---
There are still some problems with xfs_freeze, but this solves
part of the problem.  Even with this change, people are still 
seeing consistency problems on a frozen filesystem, so don't
trust xfs_freeze just yet.

Date:  Mon Feb  4 08:39:30 PST 2002
Workarea:  stout.americas.sgi.com:/localhome/eric/2.4.x-xfs/workarea-reallyclean

The following file(s) were checked into:
  bonnie.engr.sgi.com:/isms/slinx/2.4.x-xfs


Modid:  2.4.x-xfs:slinx:110921a
linux/fs/xfs/xfs_fsops.c - 1.73
	- Remove xfs_iflush_all() from xfs_fs_freeze().  This doesn't play
	  well with Linux, leaving us with dentries no longer connected to
	  xfs_inodes.
---End TAKE ----

Now, I haven't seen consistency problems.  What I have seen is lvcreate (the
LVM snapshot creation command) getting stuck in D state in each of three
setups: with the [LVM] VFS lock patch and XFS, without the VFS lock patch but
surrounded by xfs_freeze -f and xfs_freeze -u, and with the VFS lock patch
and surrounded by xfs_freeze -f.  It certainly didn't happen every time, but
under heavy load, with multiple snapshots already existing on the XFS volume
we were snapshotting, we'd hit it eventually.

That led to the first attached patch, no_freeze.patch.  It's not really a
fix, just a kludge to make sure some things don't get stuck because of the
freeze.  It uses an otherwise unused process flag bit as PF_NO_FREEZE, which
signals xfs_check_frozen to let the flagged process through.  Using the flag
to protect fsync_dev() and kupdate() greatly improved the situation.  (The
patch also contains a protection against writing the log to a read-only
device, an unrelated problem.)  The second attached patch,
no_freeze_lockfs.patch, protects fsync_dev_lockfs, in case you're using LVM's
VFS lock patch.
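
For anyone who doesn't want to read the patches, here is a rough user-space
model of the idea, not the patch itself.  The struct names, the bit value
chosen for PF_NO_FREEZE, and the helpers check_frozen_would_pass() and
run_exempt() are all made up for illustration; the real code works on
task_struct and blocks in xfs_check_frozen instead of returning a boolean.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical bit value; the patch uses a PF_* bit that was otherwise unused. */
#define PF_NO_FREEZE 0x1u

struct task { unsigned int flags; };  /* stand-in for the process flags */
struct fs   { bool frozen; };         /* stand-in for the frozen filesystem */

/* Model of the frozen check: true means the caller may proceed, false
 * marks the point where the real kernel code would sleep until thaw. */
static bool check_frozen_would_pass(const struct fs *fs, const struct task *t)
{
    assert(fs != NULL && t != NULL);
    return !fs->frozen || (t->flags & PF_NO_FREEZE) != 0;
}

/* Run fn with PF_NO_FREEZE set, then restore the previous flag state so
 * that a caller who already held the exemption keeps it. */
static bool run_exempt(struct task *t, const struct fs *fs,
                       bool (*fn)(const struct fs *, const struct task *))
{
    unsigned int had = t->flags & PF_NO_FREEZE;
    bool ok;

    t->flags |= PF_NO_FREEZE;
    ok = fn(fs, t);
    if (!had)
        t->flags &= ~PF_NO_FREEZE;
    return ok;
}
```

In this model a plain caller is held back while the filesystem is frozen,
but the same call wrapped in run_exempt() goes through, which is the effect
the flag is meant to have for the protected kernel paths.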

After those patches were applied, I still saw a couple of situations where
xfs_freeze -f and xfs_freeze -u themselves froze up (not at the same time
:->).  In both cases, the process was stuck in xfs_check_frozen(), called
beneath xfs_unmountfs_writesb().  The freeze can't be allowed to stop the
xfs_freeze calls themselves, so I added PF_NO_FREEZE protection inside
xfs_fs_freeze and xfs_fs_thaw (also in the second patch).  That solved that
problem.

Finally, I saw a few cases under load (mixed smb/nfs) where lvcreate would
consume 98% of the CPU and never complete.  In that case, lvcreate wasn't
stuck itself, but was looping endlessly inside write_unlocked_buffers().
The cause seemed to be nfsd, which was stuck in xfs_check_frozen()
descending from write_buffer_delay().  That call didn't come from the
original write nfsd was trying to process, but from balance_dirty() calling
write_some_buffers().  I figured balance_dirty() should be allowed to write
buffers, so I protected that call from the freeze as well (also in the second
patch).  I haven't seen a lockup since, but I'm not convinced that another
one isn't out there.

So that's my experience with XFS and LVM snapshots.  YMMV.

Dale Stephenson
steph@snapserver.com

Attachment: no_freeze.patch
Description: Binary data

Attachment: no_freeze_lockfs.patch
Description: Binary data