On Mon, May 12 2008, Linus Torvalds wrote:
>
>
> On Mon, 12 May 2008, Alistair John Strachan wrote:
> >
> > I've been getting this since -rc1. It's still present in -rc2, so I thought
> > I'd bug some people. Everything seems to be working fine.
>
> Hmm. The problem is that blk_remove_plug() does a non-atomic
>
> queue_flag_clear(QUEUE_FLAG_PLUGGED, q);
>
> without holding the queue lock.
>
> Now, sometimes that's ok, because of higher-level locking on the same
> queue, so there is no possibility of any races.
>
> And yes, this comes through the raid5 layer, and yes, the raid layer holds
> the 'device_lock' on the raid5_conf_t, so it's all safe from other
> accesses by that raid5 configuration, but I wonder if at least in theory
> somebody could access that same device directly.
>
> So I do suspect that this whole situation with md needs to be resolved
> some way. Either the queue is already safe (because of md layer locking),
> and in that case maybe the queue lock should be changed to point to that
> md layer lock (or that sanity test simply needs to be removed). Or the
> queue is unsafe (because non-md users can find it too), and we need to fix
> the locking.
>
> Alternatively, we may just need to totally revert the thing that made the
> bit operations non-atomic and depend on the locking. This was introduced
> by Nick in commit 75ad23bc0fcb4f992a5d06982bf0857ab1738e9e ("block: make
> queue flags non-atomic"), and maybe it simply isn't viable.
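To make the locking trade-off above concrete, here is a minimal user-space sketch in C. Every name in it is hypothetical: struct demo_queue, demo_flag_clear_locked(), demo_flag_clear_atomic(), and the lock_held field (a crude stand-in for a "queue is locked" sanity check). It only models the idea. It is not the kernel's queue_flag_clear() or its actual check. The non-atomic variant is a plain load/mask/store, so it is only safe while the lock is held. The atomic variant does one indivisible read-modify-write and needs no lock.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-in for a request queue (not struct request_queue). */
struct demo_queue {
    pthread_mutex_t lock;      /* plays the role of the queue lock */
    bool lock_held;            /* crude "is locked" flag for the sanity check */
    unsigned long flags;       /* updated with non-atomic read-modify-write */
    atomic_ulong atomic_flags; /* updated with atomic read-modify-write */
};

enum { DEMO_FLAG_PLUGGED = 0 };

static void demo_init(struct demo_queue *q, unsigned long initial) {
    pthread_mutex_init(&q->lock, NULL);
    q->lock_held = false;
    q->flags = initial;
    atomic_init(&q->atomic_flags, initial);
}

static void demo_lock(struct demo_queue *q) {
    pthread_mutex_lock(&q->lock);
    q->lock_held = true;
}

static void demo_unlock(struct demo_queue *q) {
    q->lock_held = false;
    pthread_mutex_unlock(&q->lock);
}

/* Non-atomic clear: load, mask, store. Two unsynchronized callers can
 * both load the old word, and one store can then undo the other's
 * update, so the caller must hold q->lock. Instead of warning, this
 * sketch returns false when the sanity check fails. (Reading lock_held
 * here is only meaningful in a single-threaded demo.) */
static bool demo_flag_clear_locked(struct demo_queue *q, int bit) {
    if (!q->lock_held)
        return false;              /* sanity test: lock not held */
    q->flags &= ~(1UL << bit);
    return true;
}

/* Atomic clear: one indivisible read-modify-write, safe without the
 * lock, but usually more expensive than a plain store. */
static void demo_flag_clear_atomic(struct demo_queue *q, int bit) {
    atomic_fetch_and(&q->atomic_flags, ~(1UL << bit));
}
```

The sketch mirrors the two options in the message. One option is to keep the cheap non-atomic update and require (and check) the lock. The other is to go back to atomic bit operations and drop the locking requirement for these flags.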
There's been a proposed patch for at least a week, so Neil just needs to
send it in...
--
Jens Axboe