Re: use-after-free on log replay failure

To: Dave Chinner <david@xxxxxxxxxxxxx>
Subject: Re: use-after-free on log replay failure
From: Brian Foster <bfoster@xxxxxxxxxx>
Date: Wed, 13 Aug 2014 09:11:42 -0400
Cc: Alex Lyakas <alex@xxxxxxxxxxxxxxxxx>, xfs@xxxxxxxxxxx
Delivered-to: xfs@xxxxxxxxxxx
In-reply-to: <20140813000312.GC20518@dastard>
References: <9D3CBECB663B4A77B7EF74B67973310A@alyakaslap> <20140804230721.GA20518@dastard> <AC10852F403846A182491ED8071135ED@alyakaslap> <20140806152042.GB39990@xxxxxxxxxxxxxxx> <CAOcd+r3bC59m7Rh-3tmjrnWnF5XoPQfE=U+=hz78NcAGu+Ou1g@xxxxxxxxxxxxxx> <20140811132057.GA1186@xxxxxxxxxxxxxxx> <20140811215207.GS20518@dastard> <20140812120341.GA46654@xxxxxxxxxxxxxxx> <DC407F6E8F8C4EE1AF7117D7D6ABF282@alyakaslap> <20140813000312.GC20518@dastard>
User-agent: Mutt/1.5.23 (2014-03-12)

On Wed, Aug 13, 2014 at 10:03:12AM +1000, Dave Chinner wrote:
> On Tue, Aug 12, 2014 at 03:39:02PM +0300, Alex Lyakas wrote:
> > Then I set up the following Device Mapper target onto /dev/vde:
> > dmsetup create VDE --table "0 41943040 linear-custom /dev/vde 0"
> > I am attaching the code (and Makefile) of dm-linear-custom target.
> > It is an exact copy of dm-linear, except that it has a module
> > parameter. With the parameter set to 0, this is an identity mapping
> > onto /dev/vde. If the parameter is set to non-0, all WRITE bios are
> > failed with ENOSPC. There is a workqueue to fail them in a different
> > context (not sure if really needed, but that's what our "real"
> > custom block device does).
> 
> FWIW, now I've looked at the dm module, this could easily be added
> to the dm-flakey driver by adding a "queue_write_error" option
> to it (i.e. similar to the current drop_writes and corrupt_bio_byte
> options).
> 
> If we add the code there, then we could add a debug-only XFS sysfs
> variable to trigger the log recovery sleep, and then use dm-flakey
> to queue and error out writes. That gives us a reproducible xfstest
> for this condition. Brian, does that sound like a reasonable plan to
> you?
> 

It would be nice if we could avoid this kind of timing hack, but I'll
have to look at the related code and see what we have for options. I'm
also assuming the oops/crash is a consistent behavior when we hit the
race, since that probably defines the failure mode of the test.

But yeah, seems like a reasonable plan in general. Added to the todo
list for once we have this sorted out...

Brian

> Thanks for describing the method you've been using to reproduce the
> bug so clearly, Alex.
> 
> Cheers,
> 
> Dave.
> -- 
> Dave Chinner
> david@xxxxxxxxxxxxx
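
For reference, the device-mapper setups discussed in this thread can be
sketched as table lines. This is only a sketch under assumptions: the
helper function names are made up for illustration, the 180 second down
interval is an arbitrary example value, and the stock "linear" target
stands in for Alex's out-of-tree linear-custom module. Note that the
existing dm-flakey drop_writes feature silently discards writes rather
than failing them, which is why a new option (Dave's proposed
"queue_write_error", not an existing feature in this thread) would be
needed to reproduce the ENOSPC behaviour.

```shell
#!/bin/sh
# Sketch only: these helpers print dm table lines and touch no device.
# Generic table format: <start> <length> <target> <target args...>

# Identity mapping onto a backing device, using the stock "linear" target
# (Alex's command uses his own "linear-custom" target instead).
linear_table() {
    # $1 = backing device, $2 = length in 512-byte sectors
    printf '0 %s linear %s 0\n' "$2" "$1"
}

# dm-flakey table with the existing drop_writes feature:
# offset 0, up interval 0 s, down interval $3 s, 1 feature argument.
flakey_drop_writes_table() {
    # $1 = backing device, $2 = length in sectors, $3 = down interval (s)
    printf '0 %s flakey %s 0 0 %s 1 drop_writes\n' "$2" "$1" "$3"
}

# Example usage (requires root, so it is left commented out):
#   dmsetup create VDE --table "$(flakey_drop_writes_table /dev/vde 41943040 180)"
```

The helpers only build strings, so the resulting table can be inspected
before it is handed to dmsetup.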
