Bugzilla – Bug 259
xfs_freeze gets stuck in "D" state in the function "down"
Last modified: 2004-02-16 07:39:15 CST
A script calls xfs_freeze on a given filesystem prior to taking a snapshot. The script runs automatically every three hours. After running successfully for several iterations, xfs_freeze eventually gets stuck in "D" state.

"ps -p <PID of xfs_freeze> -o wchan,args" returns:

WCHAN  COMMAND
down   /usr/sbin/xfs_freeze -f <fs2freeze>

A hard reset is needed to recover; until then, the filesystem on which xfs_freeze was run stays frozen, but the rest of the system remains available.

This behavior occurs in recent kernels (with the xfssyncd code; in this particular case, CVS of June 19th). In kernels without the xfssyncd code (e.g., XFS Release 1.2), lvcreate would get stuck in xfs_check_frozen (see http://marc.theaimsgroup.com/?l=linux-xfs&m=105277405929107&w=2), and an xfs_freeze -u <fs> would recover the system (though it did ruin the snapshot).

The key to replicating the problem seems to be the number of times the script runs, not the load at the time the script runs. (The system had not been heavily loaded within days of the occurrence.) This behavior occurred more frequently in versions close to the date the xfssyncd code went in: for example, on a kernel I compiled on June 10th from a CVS update of the same day, I was able to reproduce the behavior on the second manual iteration of the script.

A version of this script is available at: http://marc.theaimsgroup.com/?l=postgresql-admin&m=104932938526084&w=2
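The manual ps check described above could be automated along these lines. This is only a minimal sketch, not part of the original script: the function names is_uninterruptible and check_pid are hypothetical, and it assumes a ps that accepts "-o stat= -o wchan=" (empty column headers suppress the header line, as in procps).

```shell
#!/bin/sh
# Hypothetical helpers (not from the original script): report whether a
# process is in uninterruptible sleep ("D" state) and where it is waiting.

# is_uninterruptible STAT
# Succeeds when a ps STAT value (e.g. "D", "D+", "Ss") begins with "D".
is_uninterruptible() {
    case "$1" in
        D*) return 0 ;;
        *)  return 1 ;;
    esac
}

# check_pid PID
# Prints "stuck in <wchan>", "ok", or "gone" (no such process, or ps failed).
check_pid() {
    # Separate -o options with empty headers avoid printing a header line.
    info=$(ps -p "$1" -o stat= -o wchan= 2>/dev/null)
    if [ -z "$info" ]; then
        echo "gone"
        return 0
    fi
    # Split the ps output into positional parameters: $1=STAT, $2=WCHAN.
    set -- $info
    if is_uninterruptible "$1"; then
        echo "stuck in ${2:-unknown}"
    else
        echo "ok"
    fi
}
```

For the hang reported here, running check_pid against the PID of the stuck xfs_freeze would be expected to print "stuck in down". A process in "D" state cannot be killed, so a check like this can only report the hang, not clear it.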
Solved? I updated linux-2.4-xfs from CVS 9/30/2003 13:28; the reported failure has not recurred after a week of running the test script hourly, so I'm cautiously optimistic that the problem has gone away.
Marking as fixed.
Really closing now, as it's not reproducible.