
Re: TAKE 950027 - xfs_icsb_lock_all_counters fails with CONFIG_PREEMPT and >=256p

To: "Luck, Tony" <tony.luck@xxxxxxxxx>
Subject: Re: TAKE 950027 - xfs_icsb_lock_all_counters fails with CONFIG_PREEMPT and >=256p
From: David Chinner <dgc@xxxxxxx>
Date: Fri, 3 Mar 2006 08:23:34 +1100
Cc: Andi Kleen <ak@xxxxxxx>, David Chinner <dgc@xxxxxxx>, linux-xfs@xxxxxxxxxxx, mingo@xxxxxxx, torvalds@xxxxxxxx
In-reply-to: <B8E391BBE9FE384DAA4C5C003888BE6F05D97285@xxxxxxxxxxxxxxxxxxxxxxxxxxxx>
References: <B8E391BBE9FE384DAA4C5C003888BE6F05D97285@xxxxxxxxxxxxxxxxxxxxxxxxxxxx>
Sender: linux-xfs-bounce@xxxxxxxxxxx
User-agent: Mutt/1.4.2.1i
On Thu, Mar 02, 2006 at 09:09:08AM -0800, Luck, Tony wrote:
> > Ingo, Linus, Tony, what do you think? XFS is running into trouble 
> > on preemptive kernels on >256CPU systems because there are 
> > cases where one thread can hold 2*NR_CPUS spinlocks
> > and that overflows the current 8 bit preempt count.
> 
> NR_CPUS can be 1024 now ... I thought that spinlocks were intended
> for code that will be held for a _short_ time.  Even if the code in
> XFS only wants to execute a single instruction inside this hyper-
> critical region, you need to contend with the fact that the first
> one of those locks that XFS acquired is going to be held while you
> acquire and then release the other 2047 locks.  Does that sound like
> a short time (rhetorical question)?

Perspective: In filesystem and disk I/O terms, yes, it is a _very_
short time.

CPUs are orders of magnitude faster than disks, so something that
looks very wasteful in terms of CPU time pays off very quickly in
I/O performance if it saves a disk seek or two, doubles the I/O
size or reduces fragmentation. Burn as much CPU as you need, as
long as you get the expected payoff at the end.

The tradeoff being made here is that we spend a millisecond or two
every few seconds to lock and rebalance counters instead of wasting
large amounts of CPU time on a single lock. The result is a _major_
improvement in parallel buffered write throughput (an order of
magnitude on our test rig) because it removes the only point of
global contention within XFS for this load.
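
To make the shape of that tradeoff concrete, here is a minimal
userspace sketch of the general technique. It is not the XFS code:
the names (pcp_counter, pcp_mod, pcp_rebalance), the pthread mutexes
and NCPUS = 4 are all assumptions made for illustration. The fast
path only ever takes the local slot's lock; the slow path takes every
slot's lock in a fixed order, sums the slots and spreads the total
back out evenly.

```c
#include <pthread.h>
#include <stdint.h>

#define NCPUS 4 /* hypothetical; the systems in this thread have >= 256 */

/* One slot per CPU, each with its own lock (illustrative layout). */
struct pcp_slot {
    pthread_mutex_t lock;
    int64_t count;
};

struct pcp_counter {
    struct pcp_slot slot[NCPUS];
};

/* Split the initial total evenly; the remainder goes to slot 0. */
void pcp_init(struct pcp_counter *c, int64_t total)
{
    for (int i = 0; i < NCPUS; i++) {
        pthread_mutex_init(&c->slot[i].lock, NULL);
        c->slot[i].count = total / NCPUS;
    }
    c->slot[0].count += total % NCPUS;
}

/*
 * Fast path: modify only the caller's slot, so CPUs do not contend
 * with each other. Returns 0 on success, or -1 if the local slot
 * cannot cover the change and the caller should rebalance and retry.
 */
int pcp_mod(struct pcp_counter *c, int cpu, int64_t delta)
{
    struct pcp_slot *s = &c->slot[cpu];
    int ret = -1;

    pthread_mutex_lock(&s->lock);
    if (s->count + delta >= 0) {
        s->count += delta;
        ret = 0;
    }
    pthread_mutex_unlock(&s->lock);
    return ret;
}

/*
 * Slow path: hold every slot lock at once (taken in index order to
 * avoid deadlock), sum the slots and redistribute. Returns the total.
 * This is the step that holds NCPUS locks simultaneously.
 */
int64_t pcp_rebalance(struct pcp_counter *c)
{
    int64_t total = 0;

    for (int i = 0; i < NCPUS; i++)
        pthread_mutex_lock(&c->slot[i].lock);
    for (int i = 0; i < NCPUS; i++)
        total += c->slot[i].count;
    for (int i = 0; i < NCPUS; i++)
        c->slot[i].count = total / NCPUS;
    c->slot[0].count += total % NCPUS;
    for (int i = NCPUS - 1; i >= 0; i--)
        pthread_mutex_unlock(&c->slot[i].lock);
    return total;
}
```

In this sketch a caller would use pcp_mod() for every update and
fall back to pcp_rebalance() only when it returns -1, so the
lock-everything step stays rare compared to the fast path.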

It also changes the CPU usage scaling from increasing linearly with
thread count to scaling linearly with throughput.  And we get this
without any other measurable performance regressions, so I think
that this is a good tradeoff to make.

Cheers,

Dave.
-- 
Dave Chinner
R&D Software Engineer
SGI Australian Software Group

