Re: a deadlock at xfs_growfs

To: nathans@xxxxxxx
Subject: Re: a deadlock at xfs_growfs
From: ASANO Masahiro <masano@xxxxxxxxxxxxxx>
Date: Wed, 25 Aug 2004 18:21:31 +0900 (JST)
Cc: linux-xfs@xxxxxxxxxxx
In-reply-to: <20040825173042.A3534270@xxxxxxxxxxxxxxxxxxxxxxxx>
References: <20040825053035.GB9823@frodo> <20040825.140454.1025204076.masano@xxxxxxxxxxxxxx> <20040825173042.A3534270@xxxxxxxxxxxxxxxxxxxxxxxx>
Sender: linux-xfs-bounce@xxxxxxxxxxx
From: Nathan Scott <nathans@xxxxxxx>
Subject: Re: a deadlock at xfs_growfs
Date: Wed, 25 Aug 2004 17:30:42 +1000

> On Wed, Aug 25, 2004 at 02:04:54PM +0900, ASANO Masahiro wrote:
> > Hi Nathan,
> > 
> > Thank you for your quick response.
> > But I have another problem report for xfs_growfs. :-p
> > 
> > Growing a filesystem under heavy dinode allocation/deallocation may
> > cause a deadlock.
> > It looks like (a) a process which holds m_peraglock for reading is
> > waiting for a pagebuf (AGI), and (b) another process which holds the
> > pagebuf (AGI) is waiting in down_read on m_peraglock, while (c) xfs_growfs
> > is waiting in down_write on m_peraglock.
> 
> Hmm, I don't see the code path where (b) - the rm process - can
> be holding the AGI buffer locked while trying to down m_peraglock?
> That would seem to be an ABBA deadlock, with (a) - the tar - (but
> the growfs process wouldn't even be involved?).  Where is it that
> the xfs_ifree/xfs_difree takes the AGI buffer lock before trying to
> grab the peraglock?

Possibly (b) was a victim.  There were some other hung processes;
kupdated also hung.  I'll check it again more closely.
Anyway, at that time the pagebuf was linked from an XFS_TRANS_INACTIVE
transaction.

BTW, how about xfs_dialloc()?  It seems to take the locks in ABA order:
m_peraglock, then the AGI buffer, then m_peraglock again.

     xfs_dialloc()
     {
          ...
          down_read(&mp->m_peraglock);
          xfs_ialloc_read_agi();
          up_read(&mp->m_peraglock);
          ...
          down_read(&mp->m_peraglock);
          mp->m_perag[tagno].pagi_freecount--;
          up_read(&mp->m_peraglock);
          ...
     }
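
To make the concern concrete, here is a minimal single-threaded sketch
that replays one possible interleaving.  It is a toy model, not kernel
code: the toy_* types and functions and replay_aba_interleaving() are
hypothetical names, and the model assumes that once a writer is queued
on the semaphore, new readers wait behind it.  It also assumes the AGI
buffer stays locked after the first up_read().

```c
#include <assert.h>
#include <stdbool.h>

/* Toy reader/writer semaphore. Assumption: a queued writer blocks new readers. */
struct toy_rwsem {
    int  readers;         /* number of current read holders */
    bool writer_waiting;  /* true once a writer has queued */
};

/* Toy buffer lock standing in for the AGI buffer: 0 = unlocked, else task id. */
struct toy_buf {
    int owner;
};

/* Returns true if the read lock is granted, false if the caller would block. */
static bool toy_down_read(struct toy_rwsem *s)
{
    if (s->writer_waiting)
        return false;
    s->readers++;
    return true;
}

static void toy_up_read(struct toy_rwsem *s)
{
    s->readers--;
}

/* Returns true if the write lock is granted; otherwise queues the writer. */
static bool toy_down_write(struct toy_rwsem *s)
{
    if (s->readers > 0) {
        s->writer_waiting = true;
        return false;
    }
    return true;
}

/* Returns true if the buffer lock is granted, false if the caller would block. */
static bool toy_buf_lock(struct toy_buf *b, int task)
{
    if (b->owner != 0)
        return false;
    b->owner = task;
    return true;
}

/* Replays the interleaving and returns how many tasks end up blocked. */
int replay_aba_interleaving(void)
{
    enum { TASK_B = 1, TASK_A = 2 };
    struct toy_rwsem peraglock = { 0, false };
    struct toy_buf agi = { 0 };
    int blocked = 0;
    bool ok;

    /* Task B, first half: peraglock (read), AGI buffer, drop peraglock. */
    ok = toy_down_read(&peraglock);
    assert(ok);
    ok = toy_buf_lock(&agi, TASK_B);
    assert(ok);
    toy_up_read(&peraglock);

    /* Task A: takes peraglock for reading, then waits for the AGI buffer. */
    ok = toy_down_read(&peraglock);
    assert(ok);
    if (!toy_buf_lock(&agi, TASK_A))
        blocked++;                      /* (a) waits for B's buffer */

    /* growfs: wants peraglock for writing while A still holds it for reading. */
    if (!toy_down_write(&peraglock))
        blocked++;                      /* (c) waits for A */

    /* Task B, second half: down_read again, still holding the AGI buffer. */
    if (!toy_down_read(&peraglock))
        blocked++;                      /* (b) queues behind growfs */

    return blocked;
}
```

Under these assumptions replay_aba_interleaving() returns 3: every task
waits on another one, which would match the (a), (b), (c) picture from
my first report.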

> The correct order would be first m_peraglock, then the AGI buffer,
> and I can't see anything that violates that in the code paths where
> your deadlocked processes are.  Odd.

I see.

> > The following is a kernel backtrace. Its kernel version was 2.4.25,
> > but I guess that the recent kernel also has the same problem. What do
> > you think of it?
> 
> I would expect this deadlock still exists, I don't remember fixing
> it or seeing anyone else fix it - do you have a reproducible test
> case?

I'll make one later.
--
masano

