
[PATCH] xfs: avoid AGI/AGF deadlock scenario for inode chunk allocation

To: xfs@xxxxxxxxxxx
Subject: [PATCH] xfs: avoid AGI/AGF deadlock scenario for inode chunk allocation
From: Brian Foster <bfoster@xxxxxxxxxx>
Date: Mon, 10 Feb 2014 09:33:27 -0500
Delivered-to: xfs@xxxxxxxxxxx

The inode chunk allocation path can lead to deadlock conditions if
a transaction is dirtied with an AGF (to fix up the freelist) for
an AG that cannot satisfy the actual allocation request. This code
path is written to try to avoid this scenario, but it can still be
reproduced by running xfstests generic/270 in a loop on a 512b fs.

An example situation is:
- process A attempts an inode allocation on AG 3, modifies
  the freelist, fails the allocation and ultimately moves on to
  AG 0 with the AG 3 AGF held
- process B is doing a free space operation (e.g., truncate),
  acquires the AG 0 AGF and waits on the AG 3 AGF
- process A acquires the AG 0 AGI, waits on the AG 0 AGF (deadlock)
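
The sequence above can be modeled as a small wait-for graph. The
following is only an illustrative userspace sketch, not XFS code: the
toy_* names, the enum values and the two-process model are assumptions
made for this example. Each lock records its owner, each process
records the lock it is blocked on, and a deadlock is a chain of waits
that leads back to the starting process.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical labels for the buffers and processes in the example. */
enum toy_lock { AG0_AGI, AG0_AGF, AG3_AGF, TOY_NLOCKS };
enum toy_proc { PROC_A, PROC_B, TOY_NPROCS };

#define NO_LOCK (-1)

struct toy_state {
    int owner[TOY_NLOCKS];      /* process holding each buffer lock */
    int waiting_on[TOY_NPROCS]; /* lock each process is blocked on */
};

/* Follow "waits on lock -> lock owner" links; a return to start is a cycle. */
bool toy_deadlocked(const struct toy_state *s, int start)
{
    int p = start;

    assert(start >= 0 && start < TOY_NPROCS);
    for (int hops = 0; hops < TOY_NPROCS; hops++) {
        int lock = s->waiting_on[p];

        if (lock == NO_LOCK)
            return false;   /* chain ends at a runnable process */
        p = s->owner[lock];
        if (p == start)
            return true;
    }
    return false;
}

/*
 * State after the three steps listed above. If process A had not kept
 * the AG 3 AGF, process B would own it and would not be blocked.
 */
struct toy_state toy_scenario(bool a_keeps_ag3_agf)
{
    struct toy_state s;

    s.owner[AG0_AGI] = PROC_A;
    s.owner[AG0_AGF] = PROC_B;
    s.owner[AG3_AGF] = a_keeps_ag3_agf ? PROC_A : PROC_B;
    s.waiting_on[PROC_A] = AG0_AGF;
    s.waiting_on[PROC_B] = a_keeps_ag3_agf ? AG3_AGF : NO_LOCK;
    return s;
}
```

With toy_scenario(true), toy_deadlocked() finds a cycle starting from
either process. With toy_scenario(false), process A merely waits for
process B to finish, which is the behavior a clean transaction on AG 3
is meant to guarantee.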

The problem here is that process A still holds the AG 3 AGF when
it moves on to AG 0 (it releases the AG 3 AGI but not the AG 3
AGF). xfs_dialloc() makes one pass through each of the AGs when
attempting to allocate an inode chunk. The expectation is a clean
transaction if a particular AG cannot satisfy the allocation
request. xfs_ialloc_ag_alloc() is written to support this through
use of the minalignslop allocation args field.

When using the agi->agi_newino optimization, we attempt an exact
bno allocation request based on the location of the previously
allocated chunk. minalignslop is set to inform the allocator that
we will require alignment on this chunk, and thus to not allow the
request for this AG if the extra space is not available. Suppose
that the AG in question has just enough space for this request, but
not at the requested bno. xfs_alloc_fix_freelist() will proceed as
normal as it determines the request should succeed, and thus it is
allowed to modify the AGF. xfs_alloc_ag_vextent() ultimately fails
because the requested bno is not available. In response, the caller
moves on to a NEAR_BNO allocation request for the same AG. The
alignment is set, but the minalignslop field is never reset. This
increases the overall space requirement relative to the first
attempt. If this delta is the difference between allocation success
and failure for the AG, xfs_alloc_fix_freelist() rejects this
request outright the second time around and causes the allocation
request to unnecessarily fail for this AG.
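
The arithmetic behind this failure can be sketched with a toy model.
It assumes that the space check in xfs_alloc_fix_freelist() compares
minlen + alignment + minalignslop - 1 against the longest free extent,
and that the exact bno attempt uses an alignment of 1 with minalignslop
set to the cluster alignment minus one. The struct, the function names
and the numbers are hypothetical and only illustrate how a stale
minalignslop inflates the retry.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for the allocation args fields discussed above. */
struct toy_args {
    unsigned int minlen;
    unsigned int alignment;
    unsigned int minalignslop;
};

/* Assumed form of the space requirement checked before the AGF is modified. */
unsigned int toy_required(const struct toy_args *a)
{
    assert(a->alignment >= 1);
    return a->minlen + a->alignment + a->minalignslop - 1;
}

bool toy_space_ok(const struct toy_args *a, unsigned int longest)
{
    return toy_required(a) <= longest;
}

/* Exact bno attempt: no alignment yet, the slop reserves room for it. */
struct toy_args toy_exact_request(unsigned int minlen,
                                  unsigned int cluster_align)
{
    struct toy_args a = { minlen, 1, cluster_align - 1 };
    return a;
}

/* NEAR_BNO retry: alignment is applied; optionally clear the stale slop. */
struct toy_args toy_near_retry(struct toy_args a, unsigned int cluster_align,
                               bool reset_slop)
{
    a.alignment = cluster_align;
    if (reset_slop)
        a.minalignslop = 0;
    return a;
}
```

For example, with a minlen of 32, a cluster alignment of 8 and a
longest free extent of 40 blocks, the exact request needs 39 blocks and
passes the check. The retry with the stale slop needs 46 and is
rejected, while the retry with the slop cleared needs 39 again, the
same as the first attempt.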

To address this situation, reset the minalignslop field when we
transition from a THIS_BNO to a NEAR_BNO allocation request in

[NOTE: It appears at first glance that the optimized agi_newino
 allocation case is problematic in that it doesn't consider use of
 mp->m_sinoalign, if enabled. If the m_sinoalign allocation fails,
 however, we revert back to normal cluster alignment, which I think
 makes the overall sequence safe.]

Signed-off-by: Brian Foster <bfoster@xxxxxxxxxx>
---
 fs/xfs/xfs_ialloc.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/fs/xfs/xfs_ialloc.c b/fs/xfs/xfs_ialloc.c
index 5d7f105..584daf0 100644
--- a/fs/xfs/xfs_ialloc.c
+++ b/fs/xfs/xfs_ialloc.c
@@ -382,6 +382,8 @@ xfs_ialloc_ag_alloc(
                        isaligned = 1;
                } else
                        args.alignment = xfs_ialloc_cluster_alignment(&args);
+               args.minalignslop = 0;
+
                /*
                 * Need to figure out where to allocate the inode blocks.
                 * Ideally they should be spaced out through the a.g.
