[Bug 16348] kswapd continuously active when doing IO

--- Comment #13 from Dave Chinner <david@xxxxxxxxxxxxx>  2010-07-13 11:10:36 ---
(In reply to comment #12)
> meta-data=/dev/sdc               isize=256    agcount=2048, agsize=1668912 blks
> meta-data=/dev/sdb               isize=256    agcount=2048, agsize=1668912 blks
> meta-data=/dev/sdd               isize=256    agcount=2048, agsize=1668912 blks
> meta-data=/dev/sde               isize=256    agcount=2048, agsize=1668912 blks
> meta-data=/dev/sdg               isize=256    agcount=13, agsize=268435392 blks
> meta-data=/dev/sdf               isize=256    agcount=26, agsize=268435392 blks

It's what I was expecting. The old filesystems were configured strangely with a
lot (2048) of relatively small allocation groups. The newer 13TB and 26TB
filesystems are using 1TB allocation groups, so are a lot more sensible in

Basically the problem is that every iteration of the xfs inode shrinker will be
doing an xfs_perag_get/put on every AG in every filesystem. That means it may be
scanning >8000 AGs before finding an inode to reclaim. This is where the CPU is
being used.
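
To put a number on ">8000", here is a rough sketch in Python. It is a toy model,
not XFS code: the AG counts come from the xfs_info output quoted above, while
the ags_visited() helper, the (device, agno) layout and the fixed walk order
are assumptions made purely for illustration.

```python
# Toy model, not XFS code: count the per-AG lookups made by one shrinker pass.
# AG counts are taken from the xfs_info output quoted above.
AG_COUNTS = {
    "sdb": 2048, "sdc": 2048, "sdd": 2048, "sde": 2048,
    "sdg": 13, "sdf": 26,
}


def ags_visited(ag_counts, reclaimable):
    """Walk every AG of every filesystem in a fixed order and return how
    many AGs are looked up before reaching one with a reclaimable inode.

    `reclaimable` is a set of (device, agno) pairs (hypothetical layout).
    """
    visited = 0
    for dev, agcount in ag_counts.items():
        for agno in range(agcount):
            visited += 1  # stands in for one xfs_perag_get/put pair
            if (dev, agno) in reclaimable:
                return visited
    return visited


total_ags = sum(AG_COUNTS.values())
# Worst case for this walk order: the only reclaimable inode lives in the
# last AG of the last filesystem visited.
worst_case = ags_visited(AG_COUNTS, {("sdf", 25)})
print(total_ags, worst_case)  # prints "8231 8231"
```

The four old filesystems account for 8192 of the 8231 AGs, so nearly all of the
walk is spent on them.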

Per-filesystem shrinker callouts will help this, as will a variable reclaim
start AG. However, the overhead of aggregating across all those AGs won't go
away. I think I have a solution to that (propagate the reclaimable inode bit to
the per-ag radix tree) that will reduce the overhead of aggregation and reclaim
scanning quite a bit, however it is only an idea right now. Let me think on this
a bit....
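
One possible reading of the "propagate the reclaimable inode bit" idea, again
as a toy Python model rather than kernel code: each filesystem remembers which
AGs currently hold reclaimable inodes, so reclaim goes straight to a tagged AG
instead of walking all of them. The Filesystem class, its method names and the
plain set standing in for a radix tree tag are all hypothetical.

```python
# Toy model, not XFS code: track which AGs hold reclaimable inodes so that
# reclaim never has to visit an AG that has nothing to give back.
class Filesystem:
    def __init__(self, agcount):
        self.agcount = agcount
        self.reclaimable = [0] * agcount  # reclaimable inodes per AG
        self.tagged_ags = set()           # stands in for a radix tree tag

    def mark_reclaimable(self, agno):
        """Record one more reclaimable inode in AG `agno`."""
        self.reclaimable[agno] += 1
        self.tagged_ags.add(agno)         # propagate the bit to the AG index

    def reclaim_one(self):
        """Reclaim one inode and return its AG number, or None if no AG
        is tagged. Untagged AGs are never looked at."""
        if not self.tagged_ags:
            return None
        agno = min(self.tagged_ags)       # lowest tagged AG (fine for a toy)
        self.reclaimable[agno] -= 1
        if self.reclaimable[agno] == 0:
            self.tagged_ags.discard(agno)  # clear the tag once the AG is empty
        return agno


fs = Filesystem(2048)
fs.mark_reclaimable(2000)
first = fs.reclaim_one()   # jumps directly to AG 2000
second = fs.reclaim_one()  # nothing left, so no AG is visited
print(first, second)
```

In this model the cost of a reclaim pass depends on the number of tagged AGs
rather than on agcount, which is the property the idea above is after.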

