
To: David Chinner <dgc@xxxxxxx>
Subject: Re: TAKE 950027 - xfs_icsb_lock_all_counters fails with CONFIG_PREEMPT and >=256p
From: Andi Kleen <ak@xxxxxxx>
Date: Thu, 2 Mar 2006 13:09:45 +0100
Cc: linux-xfs@xxxxxxxxxxx, mingo@xxxxxxx, torvalds@xxxxxxxx, tony.luck@xxxxxxxxx
In-reply-to: <20060302065807.GG1173973@xxxxxxxxxxxxxxxxx>
References: <20060301125320.20FDA49F1681@xxxxxxxxxxxxxxxxxxxxxxx> <p73fym1zbqo.fsf@xxxxxxxxxxxxx> <20060302065807.GG1173973@xxxxxxxxxxxxxxxxx>
Sender: linux-xfs-bounce@xxxxxxxxxxx
User-agent: KMail/1.9.1
On Thursday 02 March 2006 07:58, David Chinner wrote:
> On Thu, Mar 02, 2006 at 05:55:43AM +0100, Andi Kleen wrote:
> > David Chinner <dgc@xxxxxxx> writes:
> > > However, when we are excluding the counters, we are already running
> > > atomic due to holding the in-core superblock spinlock and the
> > > per-cpu code uses get_cpu()/put_cpu() so neither can get preempted
> > > in the critical sections the spinlocks used to protect.  So the only
> > > considerations are having an atomic exclusion mechanism and a wait
> > > method that does not sleep.
> >
> > spinlocks should be fine for this in theory. All CPUs spinning
> > on a spinlock shouldn't happen normally, but it is supposed to work
> > correctly if it happens.
>
> And that does work correctly. However, that's not the problem being
> solved here.
>
> The problem is one thread grabbing > 255 spinlocks and holding them
> locked. The preempt count is per thread, so it has to be a single
> thread that holds all the locks at the same time to cause the problem.

> > If preemptible spinlocks don't scale to this for large NR_CPUS they're
> > broken and need to be fixed.
> > You discovered an important limitation in them and now, instead
> > of fixing them properly, you're trying to work around it.
>
> It's not NR_CPUS that they need to scale to - I could have 2 locks
> per cpu that need to be held, and then it's 2*NR_CPUS that the
> preempt count must scale to. Where do you draw the line?

2*NR_CPUS is still quite reasonable. And bits in a counter are cheap
anyway - we can easily go to 16 bits, or even more on a 64-bit
architecture. It just requires tweaking the bit shifts in
linux/hardirq.h.

I suspect 256 softirq nestings are not needed, so how about setting
PREEMPT_BITS to 16, SOFTIRQ_BITS to 4, HARDIRQ_BITS to 11, and
moving PREEMPT_ACTIVE to bit 31?

I don't see anything that relies on the count being positive, so using
the sign bit is probably ok. 11 hardirq bits might be a bit tight
though, so maybe it would be better to move 64-bit machines to a
64-bit preempt count here.

Ingo, Linus, Tony, what do you think? XFS is running into trouble
on preemptible kernels on >=256-CPU systems because there are
cases where one thread can hold 2*NR_CPUS spinlocks,
and that overflows the current 8-bit preempt count.

> Seems like a no-brainer compared to changing code that affects core
> functionality of all platforms, of which all but one would never be
> affected by the problem in the first place....

I disagree because I can easily see other code running into 
the same problem. You were just the first to notice.

-Andi

