
Re: Stalled xfs_repair on 100TB filesystem

To: Jason Vagalatos <Jason.Vagalatos@xxxxxxxxxxxxxxxx>
Subject: Re: Stalled xfs_repair on 100TB filesystem
From: Dave Chinner <david@xxxxxxxxxxxxx>
Date: Wed, 3 Mar 2010 11:25:00 +1100
Cc: "xfs@xxxxxxxxxxx" <xfs@xxxxxxxxxxx>
In-reply-to: <DD534F7C25BFA14FB18E6D603135D7EA0A11E82ECB@sbapexch05>
References: <DD534F7C25BFA14FB18E6D603135D7EA0A11E82ECB@sbapexch05>
User-agent: Mutt/1.5.18 (2008-05-17)
On Tue, Mar 02, 2010 at 09:22:34AM -0800, Jason Vagalatos wrote:
> Hello. On Friday 2/26 I started an xfs_repair on a 100TB
> filesystem:
> #> nohup xfs_repair -v -l /dev/logfs-sessions/logdev
> /dev/logfs-sessions/sessions >
> /root/xfs_repair.out.logfs1.sjc.02262010 &
> I've been monitoring the process with 'top' and tailing the output
> file from the redirect above.  I believe the repair has
> "stalled".  When the process was running 'top' showed almost all
> physical memory consumed and 12.6G of virt memory consumed by
> xfs_repair.  It made it all the way to Phase 6 and has been
> sitting at agno = 14 for almost 48 hours.  The memory consumption
> of xfs_repair has ceased but the process is still "running" and
> consuming 100% CPU:

I wish we could reproduce hangs like this easily. I'd kill the
repair and run with the -P option. From the xfs_repair man page:

       -P     Disable prefetching of inode and directory blocks. Use
              this option if you find xfs_repair gets stuck and
              stops proceeding. Interrupting a stuck xfs_repair is safe.
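
As a rough sketch of what the rerun might look like, reusing the device
paths from the command quoted above and adding only -P: the output file
name and the RUN_REPAIR guard are made up for illustration, so adjust
them to taste. By default the snippet only prints the command; it starts
the repair only if RUN_REPAIR=1 is set.

```shell
#!/bin/sh
# Sketch only. Device paths are taken from the original report;
# OUT and RUN_REPAIR are hypothetical names used for illustration.
LOGDEV=/dev/logfs-sessions/logdev
DATADEV=/dev/logfs-sessions/sessions
OUT=/root/xfs_repair.out.noprefetch

# Stop the stuck xfs_repair first (e.g. kill its PID); the man page
# says interrupting a stuck xfs_repair is safe.

# Same invocation as before, with -P added to disable prefetching.
REPAIR_CMD="xfs_repair -P -v -l $LOGDEV $DATADEV"

if [ "${RUN_REPAIR:-0}" = 1 ]; then
    # $REPAIR_CMD is intentionally unquoted so it splits into arguments.
    nohup $REPAIR_CMD > "$OUT" 2>&1 &
else
    echo "Would run: $REPAIR_CMD > $OUT"
fi
```

Run it once without RUN_REPAIR to check the command line, then again
with RUN_REPAIR=1 once the old process is gone.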


Dave Chinner
