http://oss.sgi.com/bugzilla/show_bug.cgi?id=259
Summary: xfs_freeze gets stuck in "D" state in the function "down"
Product: Linux XFS
Version: Current
Platform: All
OS/Version: All
Status: NEW
Severity: critical
Priority: High
Component: XFS kernel code
AssignedTo: xfs-master@xxxxxxxxxxx
ReportedBy: murthy.kambhampaty@xxxxxxxxx
A script calls xfs_freeze on a given filesystem prior to taking a snapshot.
This script is run automatically every three hours. After the script has run
successfully for several iterations, xfs_freeze eventually gets stuck in "D"
(uninterruptible sleep) state. "ps -p <PID of xfs_freeze> -o wchan,args" returns:
WCHAN COMMAND
down /usr/sbin/xfs_freeze -f <fs2freeze>
A hard reset is needed to recover; until then, the filesystem on which
xfs_freeze is run is frozen, but the rest of the system is available.
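For anyone trying to confirm the same symptom, the check above can be widened to every process in "D" state. The following is only a rough sketch, assuming a procps-style ps that accepts the stat and wchan output keywords; the function names filter_d_state and list_d_state are made up for illustration.

```shell
#!/bin/sh
# Sketch only: assumes procps ps (Linux) with the "stat" and "wchan" keywords.
# Function names are hypothetical, not part of any XFS tooling.

# Keep only rows whose second column (process state) starts with "D".
# Expects rows of the form "pid stat wchan args..." on stdin.
filter_d_state() {
    awk '$2 ~ /^D/'
}

# Print pid, state, wait channel and command line of every process
# that is currently in uninterruptible sleep.
list_d_state() {
    ps -e -o pid= -o stat= -o wchan= -o args= | filter_d_state
}
```

Calling list_d_state on an affected system should show a row for xfs_freeze with "down" in the wait-channel column if the hang matches the one reported here.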
This behavior occurs in recent kernels (with the xfssyncd code; in this
particular case CVS of June 19th). In kernels (e.g., XFS Release 1.2) without
the xfssyncd code, lvcreate would get stuck in xfs_check_frozen (see
http://marc.theaimsgroup.com/?l=linux-xfs&m=105277405929107&w=2), and an
xfs_freeze -u <fs> would recover the system (though it did ruin the snapshot).
The key to replicating the problem seems to be the number of times the script
runs, not the load at the time it runs. (The system had not been
heavily loaded within days of the occurrence.) This behavior occurred more
frequently in versions close to the date the xfssyncd code went in. For
example, on a kernel I compiled on June 10th from a CVS update of the same day,
I was able to reproduce the behavior on the second manual iteration of the
script. A version of this script is posted at:
http://marc.theaimsgroup.com/?l=postgresql-admin&m=104932938526084&w=2
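For readers who do not follow the link, the general freeze, snapshot, thaw sequence described in this report can be sketched roughly as below. This is not the attached script. The mount point, volume names, snapshot size and the DRY_RUN switch are all placeholder assumptions, and the sketch defaults to printing the commands instead of running them.

```shell
#!/bin/sh
# Hypothetical sketch of a freeze -> snapshot -> thaw sequence.
# All values below are placeholders, not taken from the reporter's setup.
MOUNTPOINT=${MOUNTPOINT:-/mnt/data}
ORIGIN_LV=${ORIGIN_LV:-/dev/vg0/data}
SNAP_NAME=${SNAP_NAME:-data_snap}
SNAP_SIZE=${SNAP_SIZE:-1G}
DRY_RUN=${DRY_RUN:-1}   # 1 = only print commands, 0 = execute them

# Run a command, or just print it when DRY_RUN is 1.
run() {
    if [ "$DRY_RUN" = 1 ]; then
        echo "would run: $*"
    else
        "$@"
    fi
}

snapshot_frozen() {
    # Freeze the filesystem; give up if the freeze itself fails.
    run xfs_freeze -f "$MOUNTPOINT" || return 1
    # Take the LVM snapshot while the filesystem is frozen.
    run lvcreate -s -L "$SNAP_SIZE" -n "$SNAP_NAME" "$ORIGIN_LV"
    status=$?
    # Always attempt to thaw, even if the snapshot failed.
    run xfs_freeze -u "$MOUNTPOINT" || status=1
    return $status
}
```

Note that a wrapper like this cannot work around the hang reported here: if xfs_freeze -f itself blocks in "D" state, the script never reaches the thaw step.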
------- You are receiving this mail because: -------
You are the assignee for the bug, or are watching the assignee.