Bug 259 - xfs_freeze gets stuck in "D" state in the function "down"
Status: RESOLVED FIXED
Product: XFS
Classification: Unclassified
Component: XFS kernel code
Version: Current
Hardware: All All
Importance: critical
Assigned To: XFS power people
Reported: 2003-07-03 14:10 CDT by Murthy Kambhampaty
Modified: 2004-02-16 07:39 CST

Description Murthy Kambhampaty 2003-07-03 14:10:10 CDT
A script calls xfs_freeze on a given filesystem, prior to taking a snapshot.
This script is run automatically, every three hours. After running successfully
for several "iterations", xfs_freeze eventually gets stuck in "D" state. "ps -p
<PID of xfs_freeze> -o wchan,args" returns:
WCHAN  COMMAND
down   /usr/sbin/xfs_freeze -f <fs2freeze>
A hard reset is needed to recover; until then, the filesystem on which
xfs_freeze is run is frozen, but the rest of the system is available.
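The check above can be automated. The following is a minimal sketch, not part of the original report: the function names classify_ps_line and check_freeze_pid are hypothetical, and it assumes a procps-style ps that accepts "-o stat=,wchan=". It flags a process whose state begins with "D" (uninterruptible sleep) and whose wait channel is "down". The wait channel name reported by other kernels may differ.

```shell
#!/bin/sh
# Classify one process from its ps state and wait channel.
# $1 = process state (e.g. D, S, R+), $2 = wait channel (e.g. down).
# Prints "stuck" for a D-state process waiting in "down", "ok" otherwise.
classify_ps_line() {
    case "$1" in
        D*) [ "$2" = "down" ] && { echo stuck; return 0; } ;;
    esac
    echo ok
}

# Check a live process by PID (hypothetical helper).
# Prints "gone" when ps cannot find the process.
check_freeze_pid() {
    line=$(ps -p "$1" -o stat=,wchan=) || { echo gone; return 0; }
    # Intentional word splitting: state and wchan become $1 and $2.
    set -- $line
    classify_ps_line "$1" "$2"
}
```

A wrapper script could call check_freeze_pid on the PID of a backgrounded xfs_freeze after a timeout and raise an alert on "stuck". It cannot recover the process, since a task in "D" state does not respond to signals.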

This behavior occurs in recent kernels (with the xfssyncd code; in this
particular case CVS of June 19th). In kernels (e.g., XFS Release 1.2) without
the xfssyncd code, lvcreate would get stuck in xfs_check_frozen (see
http://marc.theaimsgroup.com/?l=linux-xfs&m=105277405929107&w=2), and an
xfs_freeze -u <fs> would recover the system (though it did ruin the snapshot).

The key to reproducing the problem seems to be the number of times the script
runs, not the load at the time it runs. (The system had not been heavily loaded
in the days before the occurrence.) The failure occurred more frequently in
versions close to the date the xfssyncd code went in. For example, on a kernel
I compiled on June 10th, from a CVS update of the same day, I was able to
reproduce the behavior on the second manual iteration of the script. A version
of this script is available at:
http://marc.theaimsgroup.com/?l=postgresql-admin&m=104932938526084&w=2
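The linked script is not reproduced here. As an illustration only, the freeze, snapshot, thaw sequence described in this report might look like the sketch below. The function names run and snapshot_frozen, the DRY_RUN variable, and all paths and sizes are assumptions made for this example; only "xfs_freeze -f", "xfs_freeze -u", and "lvcreate -s -L <size> -n <name> <origin>" are real commands.

```shell
#!/bin/sh
# With DRY_RUN set to a non-empty value, commands are printed, not executed.
run() {
    if [ -n "${DRY_RUN:-}" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# snapshot_frozen MOUNTPOINT ORIGIN_LV SNAP_NAME SNAP_SIZE
snapshot_frozen() {
    mnt=$1 origin=$2 snap=$3 size=$4
    # Suspend writes to the filesystem before snapshotting.
    run xfs_freeze -f "$mnt" || return 1
    # Take the LVM snapshot while the filesystem is frozen.
    run lvcreate -s -L "$size" -n "$snap" "$origin"
    status=$?
    # Always thaw, even if lvcreate failed.
    run xfs_freeze -u "$mnt" || return 1
    return "$status"
}
```

Setting DRY_RUN=1 and calling "snapshot_frozen /mnt/data /dev/vg0/data snap0 1G" prints the three commands in order without touching the system. Note that this structure does not protect against the failure reported here: if "xfs_freeze -f" itself blocks in "D" state, the function never reaches the thaw step.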
Comment 1 Murthy Kambhampaty 2003-10-09 07:37:05 CDT
Solved?

I updated linux-2.4-xfs from CVS 9/30/2003 13:28; the failure pattern reported
has not occurred after a week of running the test script hourly, so I'm
cautiously optimistic that the problem went away.
Comment 2 Christoph Hellwig 2004-01-02 05:53:29 CST
Marking as fixed.
Comment 3 Christoph Hellwig 2004-02-16 05:39:15 CST
Really closing now as it's not reproducible