xfs
[Top] [All Lists]

Re: TAKE 950027 - xfs_icsb_lock_all_counters fails with CONFIG_PREEMPT a

To: Andi Kleen <ak@xxxxxxx>
Subject: Re: TAKE 950027 - xfs_icsb_lock_all_counters fails with CONFIG_PREEMPT and >=256p
From: David Chinner <dgc@xxxxxxx>
Date: Thu, 2 Mar 2006 17:58:07 +1100
Cc: David Chinner <dgc@xxxxxxx>, linux-xfs@xxxxxxxxxxx
In-reply-to: <p73fym1zbqo.fsf@verdi.suse.de>
References: <20060301125320.20FDA49F1681@chook.melbourne.sgi.com> <p73lkvtzw3e.fsf@verdi.suse.de> <20060302030647.GV1174024@melbourne.sgi.com> <p73fym1zbqo.fsf@verdi.suse.de>
Sender: linux-xfs-bounce@xxxxxxxxxxx
User-agent: Mutt/1.4.2.1i
On Thu, Mar 02, 2006 at 05:55:43AM +0100, Andi Kleen wrote:
> David Chinner <dgc@xxxxxxx> writes:
> > 
> > However, when we are excluding the counters, we are already running
> > atomic due to holding the in-core superblock spinlock and the
> > per-cpu code uses get_cpu()/put_cpu() so neither can get preempted
> > in the critical sections the spinlocks used to protect.  So the only
> > considerations are having an atomic exclusion mechanism and a wait
> > method that does not sleep.
> 
> spinlocks are should be fine for this in theory. All CPUs spinning
> on a spinlock shouldn't happen normally, but it is supposed to work 
> correctly if it happens.

And that does work correctly. However, that's not the problem being
solved here.

The problem is one thread grabbing > 245 spinlocks and holding them
locked. The preempt count is per thread, so it has to be a single
thread that holds all the locks at the same time to cause the problem.

> If preemptible spinlocks don't scale to this for large NR_CPUs they're
> broken and need to be fixed.
> You discovered a important limitation in them and now instead
> of fixing them properly you're trying to work around it.

It's not NR_CPUs that they need to scale to - I could have 2 locks
per cpu that need to be held, and then it's 2*NR_CPUs that the
pre-empt count must scale to. Where do you draw the line?

Or do you just say "Don't do that" because there are other ways
of acheiving the same goal?

> Please try to see the big picture a bit more instead of just XFS.

Sure, I try to.

From my POV, we've got a regression in new code that will affect a
tiny, tiny minority of machines out there. The new code is arguably
broken or stupid and has been easily fixed without affecting
anything else.

Seems like a no-brainer compared to changing code that affects core
functionality of all plaforms of which all but one would never be
affected by the problem in the first place....

Cheers,

Dave.
-- 
Dave Chinner
R&D Software Enginner
SGI Australian Software Group


<Prev in Thread] Current Thread [Next in Thread>