Stalled xfs_repair on 100TB filesystem
Stan Hoeppner
stan at hardwarefreak.com
Tue Mar 2 18:35:22 CST 2010
Jason Vagalatos put forth on 3/2/2010 11:22 AM:
> Hello,
> On Friday 2/26 I started an xfs_repair on a 100TB filesystem:
>
> #> nohup xfs_repair -v -l /dev/logfs-sessions/logdev /dev/logfs-sessions/sessions > /root/xfs_repair.out.logfs1.sjc.02262010 &
>
> I've been monitoring the process with 'top' and tailing the output file from the redirect above. I believe the repair has "stalled". When the process was running 'top' showed almost all physical memory consumed and 12.6G of virt memory consumed by xfs_repair. It made it all the way to Phase 6 and has been sitting at agno = 14 for almost 48 hours. The memory consumption of xfs_repair has ceased but the process is still "running" and consuming 100% CPU:
Here's how another user solved this xfs_repair "hanging" problem. I say
"hang" because "stall" didn't return the right Google results.
http://marc.info/?l=linux-xfs&m=120600321509730&w=2
Excerpt:
"In betwenn I created a test filesystem 360GB with 120million inodes on it.
xfs_repair without options is unable to complete. If I run xfs_repair -o
bhash=8192 the repair process terminates normally (the filesystem is
actually ok)."
Unfortunately it appears you'll have to start the repair over again.
--
Stan
More information about the xfs
mailing list