On 02/16/15 18:17, Dave Chinner wrote:
> On Mon, Feb 16, 2015 at 11:35:50AM -0600, Mark Tinguely wrote:
>> Thanks Michael, you don't need to hold your test box for me. I do
>> have a way to recreate these ABBA AGF buffer allocation deadlocks
>> and understand the whys and hows very well. I don't have a community
>> way to make a xfstest for it but I think your test is getting close.
>
> If you know what is causing them, then please explain how it occurs
> and how you think it needs to be fixed. Just telling us that you know
> something that we don't doesn't help us solve the problem. :(
>
> In general, the use of the args->firstblock is supposed to avoid the
> ABBA locking order issues with multiple allocations in the one
> transaction by preventing AG selection loops from looping back into
> AGs with a lower index than the first allocation that was made.
>
> So if you are seeing deadlocks, then it may be that we aren't
> following this constraint correctly in all locations....
>
> Cheers,
>
> Dave.
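
To check my understanding of the firstblock constraint described above, here is a toy sketch in C. It is not the XFS allocator: toy_tx, toy_pick_ag, NO_AG and the has_space array are all hypothetical names made up for illustration, and the real code is considerably more involved. The sketch only assumes that per-AG locks get taken in ascending AG order once a transaction has made its first allocation, so two transactions can never hold them in A-B versus B-A order.

```c
#include <assert.h>
#include <stdbool.h>

#define NO_AG (-1) /* hypothetical sentinel: nothing allocated yet / no AG usable */

/* Toy transaction: remembers the AG of its first allocation. */
struct toy_tx {
    int first_ag; /* stands in for the AG implied by args->firstblock */
};

/*
 * Pick an AG for the next allocation, scanning from start_ag.
 * has_space[i] says whether AG i could satisfy the request.
 * Returns the chosen AG index, or NO_AG if no permitted AG has space.
 */
static int toy_pick_ag(struct toy_tx *tx, const bool *has_space,
                       int agcount, int start_ag)
{
    assert(agcount > 0 && start_ag >= 0 && start_ag < agcount);

    if (tx->first_ag == NO_AG) {
        /* First allocation: free to wrap around and try every AG once. */
        for (int i = 0; i < agcount; i++) {
            int ag = (start_ag + i) % agcount;
            if (has_space[ag]) {
                tx->first_ag = ag;
                return ag;
            }
        }
        return NO_AG;
    }

    /* Later allocations: never go below first_ag and never wrap. */
    int ag = start_ag < tx->first_ag ? tx->first_ag : start_ag;
    for (; ag < agcount; ag++) {
        if (has_space[ag])
            return ag;
    }
    return NO_AG;
}
```

In this model a later allocation fails rather than dropping back to a lower-numbered AG, even when that AG has free space. If some path skipped that check, two transactions could end up waiting on each other's AGF buffers.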

Will this be a classic deadlock that will cause problems when trying to
kill processes and unmount filesystems? If so, then I was unable to use
generic/224 to trigger a deadlock. If not, then I'll need a better way
of looking at the problem.

The longest generic/224 loop lasted only 3-1/2 hours, though. The
fstests enospc group was given some consideration as well.

If this issue does not require a lot of files, I might see if fio can
be helpful here.

Hints on whether to use a fast kernel or a miserably slow kernel would
be rather helpful.

My test setup is torn because most of the recent warning messages are
coming from the CONFIG_XFS_WARN kernels. The i686 Pentium 4 box will be
left that way. However, the Core 2 box was configured per
Documentation/SubmitChecklist from the kernel source, adding debug XFS
and locktorture. The locktorture settings are in flux, exercising
spinlocks at present. There was a mild halt in I/O for generic/017, but
that was XFS waiting on kmem-something waiting on a kmemleak function.
kmemleak was removed, and I'll continue from there.

Thanks!

Michael