On Fri, Apr 17, 2015 at 10:22:24AM +0800, xuw2015@xxxxxxxxx wrote:
> From: George Wang <xuw2015@xxxxxxxxx>
> Function percpu_counter_read just returns the current counter, regardless of
> every cpu's count. This counter can be a negative value, which will cause the
> check "allocated inode counts <= m_maxicount" to give a false positive.
Have you actually seen this, or is it just theoretical?
> Commit 501ab3238753 "xfs: use generic percpu counters for inode counter"
> introduced this problem.
> Use percpu_counter_compare, which will first do stuff on the current cpu for
> performance; if it cannot get the result, it will get the exact counter
> based on the count of every cpu.
That defeats the purpose of using percpu_counter_read() for this
check. We most definitely do not want to lock up the counter twice
for every allocation where we are close to the threshold. We don't
care if we aren't perfectly accurate at the threshold, but we do
care about the overhead of accurately summing the counter as it can
be read hundreds of thousands of times a second.
The correct fix is to use percpu_counter_read_positive(). In the
majority of cases args.mp->m_maxicount (20 million inodes per 100GB
of fs space for small filesystems) is orders of magnitude larger
than the error that the unaggregated per-cpu counts can introduce
when they cause the sum to go negative. Hence if it is negative, it
may as well be zero because it makes no difference to the default
threshold configurations.
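
To make the trade-off concrete, here is a minimal userspace sketch of
the idea. It is not the kernel implementation: struct model_counter,
the model_*() helpers, NR_CPUS and the batch parameter are all
hypothetical names for illustration, and there is no locking. It only
models a shared count plus unfolded per-cpu deltas, a cheap read, a
cheap read clamped at zero, and an exact sum that has to walk every
cpu.

```c
#include <assert.h>
#include <stdint.h>

#define NR_CPUS 4   /* hypothetical CPU count for this model */

/*
 * Simplified model of a per-cpu counter: one shared count plus
 * per-CPU deltas that have not been folded into it yet.
 */
struct model_counter {
	int64_t count;            /* shared, approximate value */
	int32_t delta[NR_CPUS];   /* unfolded per-CPU contributions */
};

/* Add to one CPU's delta; fold into the shared count once |delta| >= batch. */
static void model_add(struct model_counter *c, int cpu,
		      int32_t amount, int32_t batch)
{
	assert(cpu >= 0 && cpu < NR_CPUS);

	int32_t d = c->delta[cpu] + amount;

	if (d >= batch || d <= -batch) {
		c->count += d;
		c->delta[cpu] = 0;
	} else {
		c->delta[cpu] = d;
	}
}

/* Cheap read: shared count only, so the result can be negative. */
static int64_t model_read(const struct model_counter *c)
{
	return c->count;
}

/* Cheap read clamped at zero, in the spirit of percpu_counter_read_positive(). */
static int64_t model_read_positive(const struct model_counter *c)
{
	return c->count > 0 ? c->count : 0;
}

/* Exact value: has to visit every CPU, which is the expensive part. */
static int64_t model_sum(const struct model_counter *c)
{
	int64_t sum = c->count;

	for (int i = 0; i < NR_CPUS; i++)
		sum += c->delta[i];
	return sum;
}

/* Hypothetical threshold check using only the cheap, clamped read. */
static int model_over_limit(const struct model_counter *c,
			    int64_t newlen, int64_t maxicount)
{
	return model_read_positive(c) + newlen > maxicount;
}
```

With a batch of 32, if cpu 0 and cpu 2 each add 1 thirty-one times
(nothing folded yet) and cpu 1 then subtracts 32 in one go, the shared
count is -32 while the exact sum is 30. The clamped read returns 0,
which is off by 30 from the true value, and that error is irrelevant
when the limit is in the millions.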