Stalled xfs_repair on 100TB filesystem

Dave Chinner david at fromorbit.com
Tue Mar 2 18:25:00 CST 2010


On Tue, Mar 02, 2010 at 09:22:34AM -0800, Jason Vagalatos wrote:
> Hello, On Friday 2/26 I started an xfs_repair on a 100TB
> filesystem:
> 
> #> nohup xfs_repair -v -l /dev/logfs-sessions/logdev
> /dev/logfs-sessions/sessions >
> /root/xfs_repair.out.logfs1.sjc.02262010 &
> 
> I've been monitoring the process with 'top' and tailing the output
> file from the redirect above.  I believe the repair has
> "stalled".  When the process was running 'top' showed almost all
> physical memory consumed and 12.6G of virt memory consumed by
> xfs_repair.  It made it all the way to Phase 6 and has been
> sitting at agno = 14 for almost 48 hours.  The memory consumption
> of xfs_repair has ceased but the process is still "running" and
> consuming 100% CPU:

I wish we could reproduce hangs like this easily. I'd kill the
repair and run with the -P option. From the xfs_repair man page:

       -P     Disable prefetching of inode and directory blocks. Use
	      this option if you find xfs_repair gets stuck and
	      stops proceeding. Interrupting a stuck xfs_repair is safe.
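
As a rough sketch, the rerun might look like the following. The device
paths are copied from the command quoted above; the output file name and
the repair_cmd helper are hypothetical, and the snippet only prints the
command rather than running it.

```shell
#!/bin/sh
# Sketch only: device paths come from the command quoted in this thread.
# The output file name and the repair_cmd helper are hypothetical.
LOGDEV=/dev/logfs-sessions/logdev
DATADEV=/dev/logfs-sessions/sessions
OUT=/root/xfs_repair.out.noprefetch

# Print the repair command with prefetching disabled (-P), without running it.
repair_cmd() {
    printf 'xfs_repair -P -v -l %s %s\n' "$LOGDEV" "$DATADEV"
}

repair_cmd

# To actually run it, first stop the stuck process (kill <pid>), then:
#   nohup $(repair_cmd) > "$OUT" 2>&1 &
```

The only change from the original invocation is the added -P; the external
log device still has to be passed with -l.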

Cheers,

Dave.
-- 
Dave Chinner
david at fromorbit.com



