On a test computer running the CVS kernel from June 19th, xfs_freeze has
been stuck in "D" state for two days.
ps -p <PID of xfs_freeze> -o wchan,args returns:
WCHAN COMMAND
down /usr/sbin/xfs_freeze -f /home/db
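For anyone checking whether other tasks are stuck the same way, here is a minimal sketch that lists every process in uninterruptible sleep together with its wait channel. It assumes a procps-style ps; the helper names filter_d_state and list_d_state are hypothetical and not part of any XFS tool.

```shell
#!/bin/sh
# Hypothetical helpers for illustration only.

# Keep only lines whose second field (process state) starts with "D",
# i.e. uninterruptible sleep. Reads ps-style lines on stdin.
filter_d_state() {
    awk '$2 ~ /^D/'
}

# Print PID, state, wait channel and command line for all D-state tasks.
# The trailing "=" on each column suppresses the header line.
list_d_state() {
    ps -eo pid=,stat=,wchan=,args= | filter_d_state
}
```

Running list_d_state on the affected machine should show the xfs_freeze process (and shutdown, once it has been issued) along with the kernel function each one is sleeping in.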
The rest of the system appears to run fine (/home/db hosts the postgresql
test server's "database cluster"). Issuing /usr/sbin/xfs_freeze -u /home/db
seems to do nothing.
Some background:
On this system, a backup script is run every three (3) hours which does the
following:
1.) rsync from /home/db/pgdata to a backup server
2.) create snapshot of /home/db
3.) create snapshot of /home/db/pgdata/pg_xlog
4.) mount the snapshots to /mnt/pgsnap and /mnt/pgxlsnap
5.) rsync the two folders to the appropriate folders on the backup server
6.) remove the snapshots
[OT: 7.) start and stop the postgresql server on the backup server with the
data just backed-up and capture the log; then quit]
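Steps 1 through 6 can be sketched roughly as follows. This is not the actual script: it assumes LVM snapshots (the snapshot tool is not named above), and the volume group, logical volume names, snapshot size, backup host and the explicit freeze/unfreeze around lvcreate are all hypothetical. By default it only prints the commands it would run.

```shell
#!/bin/sh
# Sketch only. VG, LV names, snapshot size and BACKUP are made-up values,
# and the use of LVM is an assumption.
VG=vg0
BACKUP=backuphost:/backup
DRY_RUN=${DRY_RUN:-1}   # 1 = print commands only, anything else = execute

# Print the command in dry-run mode, otherwise execute it.
run() {
    if [ "$DRY_RUN" = 1 ]; then printf '%s\n' "$*"; else "$@"; fi
}

# snapshot <mountpoint> <origin LV> <snapshot name>
# Freeze the filesystem, take the snapshot, then thaw it again.
snapshot() {
    run xfs_freeze -f "$1"
    run lvcreate -s -L 1G -n "$3" "/dev/$VG/$2"
    run xfs_freeze -u "$1"
}

backup() {
    run rsync -a /home/db/pgdata/ "$BACKUP/pgdata/"                  # step 1
    snapshot /home/db db pgsnap                                      # step 2
    snapshot /home/db/pgdata/pg_xlog xlog pgxlsnap                   # step 3
    run mount -t xfs -o ro,nouuid "/dev/$VG/pgsnap" /mnt/pgsnap      # step 4
    run mount -t xfs -o ro,nouuid "/dev/$VG/pgxlsnap" /mnt/pgxlsnap
    run rsync -a /mnt/pgsnap/ "$BACKUP/pgsnap/"                      # step 5
    run rsync -a /mnt/pgxlsnap/ "$BACKUP/pgxlsnap/"
    run umount /mnt/pgsnap                                           # step 6
    run umount /mnt/pgxlsnap
    run lvremove -f "/dev/$VG/pgsnap"
    run lvremove -f "/dev/$VG/pgxlsnap"
}
```

Calling backup with DRY_RUN left at 1 prints the command sequence, which makes it easy to see that the hang described here would sit at the first xfs_freeze -f /home/db call.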
This issue occurred very frequently with the CVS kernel from just after the
xfssyncd changes, but with the June 19th CVS, things ran stably for over a
week.
So far, the only way I've been able to "recover" from this is to do a hard
reset. The "halt" command logs "Jul 3 10:51:16 comptest shutdown: shutting
down for system halt", but shutdown seems to zombie immediately:
24739 D 10:51:15 shutdown -h 0 w
24740 Z 10:51:16 [shutdown <defunct>]
If needed I can compile a kernel with kdb and try to get a stack trace on
xfs_freeze if the problem recurs. I was wondering if this was a known issue
with a fix available ;-)
Thanks,
Murthy