Re: XFS/md/blkdev warning (was Re: Linux 2.6.26-rc2)

To: Linus Torvalds <torvalds@xxxxxxxxxxxxxxxxxxxx>
Subject: Re: XFS/md/blkdev warning (was Re: Linux 2.6.26-rc2)
From: Jens Axboe <jens.axboe@xxxxxxxxxx>
Date: Mon, 12 May 2008 18:49:20 +0200
Cc: Alistair John Strachan <alistair@xxxxxxxxxxxxx>, xfs@xxxxxxxxxxx, Neil Brown <neilb@xxxxxxx>, Nick Piggin <npiggin@xxxxxxx>
In-reply-to: <alpine.LFD.1.10.0805120933310.3019@woody.linux-foundation.org>
References: <alpine.LFD.1.10.0805120731480.3188@woody.linux-foundation.org> <200805121726.15576.alistair@devzero.co.uk> <alpine.LFD.1.10.0805120933310.3019@woody.linux-foundation.org>
Sender: xfs-bounce@xxxxxxxxxxx

On Mon, May 12 2008, Linus Torvalds wrote:
> On Mon, 12 May 2008, Alistair John Strachan wrote:
> >
> > I've been getting this since -rc1. It's still present in -rc2, so I thought 
> > I'd bug some people. Everything seems to be working fine.
>
> Hmm. The problem is that blk_remove_plug() does a non-atomic
>       queue_flag_clear(QUEUE_FLAG_PLUGGED, q);
> without holding the queue lock.
>
> Now, sometimes that's ok, because of higher-level locking on the same
> queue, so there is no possibility of any races.
>
> And yes, this comes through the raid5 layer, and yes, the raid layer holds
> the 'device_lock' on the raid5_conf_t, so it's all safe from other 
> accesses by that raid5 configuration, but I wonder if at least in theory 
> somebody could access that same device directly.
>
> So I do suspect that this whole situation with md needs to be resolved
> some way. Either the queue is already safe (because of md layer locking), 
> and in that case maybe the queue lock should be changed to point to that 
> md layer lock (or that sanity test simply needs to be removed). Or the 
> queue is unsafe (because non-md users can find it too), and we need to fix 
> the locking.
>
> Alternatively, we may just need to totally revert the thing that made the
> bit operations non-atomic and depend on the locking. This was introduced 
> by Nick in commit 75ad23bc0fcb4f992a5d06982bf0857ab1738e9e ("block: make 
> queue flags non-atomic"), and maybe it simply isn't viable.
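
For anyone following along, here is a rough userspace sketch of the
situation described above. It is not kernel code: every toy_* name and
the bit number are made up for illustration, and the "lock" is just a
boolean that records whether somebody holds it. It shows a non-atomic
flag clear with a sanity check on the queue's lock, and how the check
stops firing if the queue's lock pointer is aimed at the upper layer's
lock (one of the options mentioned above).

```c
#include <stdbool.h>
#include <stdio.h>

/* Bit number picked for this sketch; not taken from the kernel headers. */
#define TOY_FLAG_PLUGGED 7u

struct toy_lock {
	bool held;              /* stand-in for a real spinlock's state */
};

struct toy_queue {
	unsigned long flags;
	struct toy_lock *lock;  /* the lock the sanity check looks at */
	unsigned int warnings;  /* how many times the check fired */
};

static void toy_lock_acquire(struct toy_lock *l) { l->held = true; }
static void toy_lock_release(struct toy_lock *l) { l->held = false; }

/*
 * Non-atomic clear: a plain read-modify-write on q->flags. That is only
 * safe if all writers are serialized, so complain when q->lock is not held.
 */
static void toy_flag_clear(struct toy_queue *q, unsigned int flag)
{
	if (!q->lock->held) {
		q->warnings++;
		fprintf(stderr, "warning: flag %u cleared without queue lock\n",
			flag);
	}
	q->flags &= ~(1UL << flag);
}

/*
 * The caller serializes with its own upper-level lock, as a stacked
 * driver might. If share_upper_lock is false the queue checks its own
 * private lock, which nobody took; if true, the queue's lock pointer
 * refers to the upper-level lock. Returns the number of warnings.
 */
static unsigned int toy_demo(bool share_upper_lock)
{
	struct toy_lock queue_lock = { false };
	struct toy_lock upper_lock = { false };
	struct toy_queue q = {
		1UL << TOY_FLAG_PLUGGED,
		share_upper_lock ? &upper_lock : &queue_lock,
		0
	};

	toy_lock_acquire(&upper_lock);
	toy_flag_clear(&q, TOY_FLAG_PLUGGED);
	toy_lock_release(&upper_lock);

	return q.warnings;
}
```

Calling toy_demo(false) from a main() trips the check once, while
toy_demo(true) stays quiet. The sketch says nothing about whether the
upper-level lock really covers every path to the queue, which is the
open question here.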

There's been a proposed patch for at least a week, so Neil just needs to
send it in...

Jens Axboe

