Speeding up xfs_repair on filesystem with millions of inodes
Dave Chinner
david at fromorbit.com
Tue Oct 27 19:17:44 CDT 2015
On Tue, Oct 27, 2015 at 11:51:35PM +0100, Michael Weissenbacher wrote:
> Hi Dave!
> First of all, today I cancelled the running xfs_repair (CTRL-C) and
> upped the system RAM from 8GB to 16GB - the maximum possible with this
> hardware.
>
> Dave Chinner wrote:
> > It's waiting on inode IO to complete in memory reclaim. I'd say you
> > have a problem with lots of dirty inodes in memory and very slow
> > writeback due to using something like RAID5/6 (this can be
> > *seriously* slow as mentioned recently here:
> > http://oss.sgi.com/archives/xfs/2015-10/msg00560.html).
> Unfortunately, this is a rather slow RAID-6 setup with 7200RPM disks.
> However, before the power loss occurred it performed quite OK for our
> use case and without any hiccups. But some time after the power loss
> some "rm" commands hung and didn't proceed at all. There was no CPU
> usage and there was hardly any I/O on the file system. That's why I
> suspected some sort of corruption.
Maybe you have a disk that is dying. Do your drives have TLER
enabled on them?
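You can check that (and, on drives that support it, change it) with
smartctl's SCT ERC commands - roughly like this, where /dev/sdX is
just a placeholder for each member disk:

# smartctl -l scterc /dev/sdX           (query current ERC/TLER setting)
# smartctl -l scterc,70,70 /dev/sdX     (set 7s read/write timeout)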
> Dave Chinner wrote:
> > Was it (xfs_repair) making progress, just burning CPU, or was it just hung?
> > Attaching the actual output of repair is also helpful, as are all
> > the things here:
> > ...
> The xfs_repair seemed to be making progress, albeit very, very slowly.
> In iotop I saw about 99% I/O usage on kswapd0. Looking at the HDD LEDs
> of the array, I could see that there was hardly any access to it at all
> (only once about every 10-15 seconds).
kswapd is trying to reclaim kernel memory, which has nothing directly
to do with xfs_repair IO or CPU usage. Unless, of course, it is
trying to reclaim memory to grab more for xfs_repair...
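You can usually see that from the swap columns in vmstat while repair
is running - a quick check, nothing repair-specific:

# vmstat 5       (watch the si/so columns for swap traffic, free memory)
# free -m        (snapshot of memory and swap usage in MB)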
> I didn't include xfs_repair output, since it showed nothing unusual.
> ---snip---
> Phase 1 - find and verify superblock...
> Phase 2 - using internal log
> - zero log...
> - scan filesystem freespace and inode maps...
> - found root inode chunk
> Phase 3 - for each AG...
> - scan and clear agi unlinked lists...
> - process known inodes and perform inode discovery...
> - agno = 0
> ...
> - agno = 14
> - process newly discovered inodes...
> Phase 4 - check for duplicate blocks...
> - setting up duplicate extent list...
> - check for inodes claiming duplicate blocks...
> - agno = 0
> ...
> - agno = 14
> Phase 5 - rebuild AG headers and trees...
> - reset superblock...
> Phase 6 - check inode connectivity...
> - resetting contents of realtime bitmap and summary inodes
> - traversing filesystem ...
> ---snip---
> (and sitting there for about 72 hours)
It really hasn't made much progress if it's still traversing the fs
after 72 hours.
> Dave Chinner wrote:
> > If repair is swapping, then adding more RAM and/or faster swap space
> > will help. There is nothing that you can tweak that changes the
> > runtime or behaviour of phase 6 - it is single threaded and requires
> > traversal of the entire filesystem directory hierarchy to find all
> > the disconnected inodes so they can be moved to lost+found. And it
> > does write inodes, so if you have a slow SATA RAID5/6...
> OK, so if I understand you correctly, none of the parameters will help
> for phase 6? I know that RAID-6 has slow write characteristics. But in
> fact I didn't see any writes at all with iotop and iostat.
If kswapd is doing all the work, then it's essentially got no memory
available. I would add significantly more swap space as well (e.g.
add swap files to the root filesystem - you can do this while repair
is running, too). If there's sufficient swap space, then repair
should use it fairly efficiently - it doesn't tend to thrash swap
because most of its memory usage is for information that is only
accessed once per phase or is parked until it is needed in a later
phase so it doesn't need to be read from disk again...
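Roughly like this, with the path and size just examples - adjust them
to whatever the root filesystem can hold:

# dd if=/dev/zero of=/swapfile1 bs=1M count=16384   (16GB; example path)
# chmod 600 /swapfile1
# mkswap /swapfile1
# swapon /swapfile1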
> Dave Chinner wrote:
> >
> > See above. Those numbers don't include reclaimable memory like the
> > buffer cache footprint, which is affected by bhash and concurrency....
> >
> As said above, I have now doubled the RAM of the machine from 8GB to
> 16GB. Now I started xfs_repair again with the following options. I hope
> that the verbose output will help to better understand what's actually
> going on.
> # xfs_repair -m 8192 -vv /dev/sdb1
>
> Besides, is it wise to limit the memory with "-m" to keep the system
> from swapping, or would it be better to use the defaults (which would
> use 75% of RAM)?
Defaults, but it's really only a guideline for cache sizing. If
repair needs more memory to store metadata it is validating (like
the directory structure) then it will consume as much as it needs.
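i.e. for the next run, something like:

# xfs_repair -vv /dev/sdb1      (no -m; let repair size its own cache)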
Cheers,
Dave.
--
Dave Chinner
david at fromorbit.com