[Top] [All Lists]

Re: [PATCH 59/60] xfs: Add xfs_log_rlimit.c

To: Mark Tinguely <tinguely@xxxxxxx>
Subject: Re: [PATCH 59/60] xfs: Add xfs_log_rlimit.c
From: Dave Chinner <david@xxxxxxxxxxxxx>
Date: Thu, 27 Jun 2013 08:18:37 +1000
Cc: xfs@xxxxxxxxxxx
Delivered-to: xfs@xxxxxxxxxxx
In-reply-to: <51CAF11F.4050905@xxxxxxx>
References: <1371617468-32559-1-git-send-email-david@xxxxxxxxxxxxx> <1371617468-32559-60-git-send-email-david@xxxxxxxxxxxxx> <51C8B984.3090400@xxxxxxx> <20130624222708.GV29338@dastard> <51C9A3EF.70005@xxxxxxx> <20130626040518.GG29376@dastard> <51CAF11F.4050905@xxxxxxx>
User-agent: Mutt/1.5.21 (2010-09-15)
On Wed, Jun 26, 2013 at 08:48:15AM -0500, Mark Tinguely wrote:
> On 06/25/13 23:05, Dave Chinner wrote:
> >On Tue, Jun 25, 2013 at 09:06:39AM -0500, Mark Tinguely wrote:
> >>On 06/24/13 17:27, Dave Chinner wrote:
> >>>On Mon, Jun 24, 2013 at 04:26:28PM -0500, Mark Tinguely wrote:
> >>>>On 06/18/13 23:51, Dave Chinner wrote:
> >>>>>+         * 2) If the lsunit option is specified, a transaction requires 
> >>>>>2 LSU
> >>>>>+         *    for the reservation because there are two log writes that 
> >>>>>can
> >>>>>+         *    require padding - the transaction data and the commit 
> >>>>>record which
> >>>>>+         *    are written separately and both can require padding to 
> >>>>>the LSU.
> >>>>>+         *    Consider that we can have an active CIL reservation 
> >>>>>holding 2*LSU,
> >>>>>+         *    but the CIL is not over a push threshold, in this case, 
> >>>>>if we
> >>>>>+         *    don't have enough log space for at one new transaction, 
> >>>>>which
> >>>>>+         *    includes another 2*LSU in the reservation, we will run 
> >>>>>into dead
> >>>>>+         *    loop situation in log space grant procedure. i.e.
> >>>>>+         *    xlog_grant_head_wait().
> >>>>>+         *
> >>>>>+         *    Hence the log size needs to be able to contain two 
> >>>>>maximally sized
> >>>>>+         *    and padded transactions, which is (2 * (2 * LSU + 
> >>>>>maxlres)).
> >>>>>+         *
> >>>>
> >>>>Any thoughts on how we can separate the 2 * log stripe unit from the
> >>>>reservation.
> >>>
> >>>You can't. The reservation, by definition, is the worse case log
> >>>space usage for the transaction. Therefore, it has to take into
> >>>account the LSU padding that may be necessary if this transaction is
> >>>committed by itself to the log (e.g. wsync operation)
> >>
> >>I am thinking we should separate the extra 2*LSU allocated space
> >>from the individual pieces of the transaction
> >
> >Why? How are you going to prevent log space deadlocks if we don't
> >reserve that space for each individual transaction in a permanent
> >transaction reservation?
> Allocate 2 per transaction, not one per segment of a transaction.
> It may be a wash for 2 or 3 segments per transaction, but would help
> if there are 10 segments per transaction.

I'll repeat: The unit reservation used for every commit in a
permanent transaction *requires* LSU padding because we don't know
ahead of time when it will be required.  We can't avoid that worst
case behaviour - violating design constraints of the log subsystem
because the worst case reservations get large is simply not an

> >>>IOWs, rather than building new, large "aggregation" transactions, we
> >>>should be adding infrastructure to the CIL to allow atomic
> >>>transaction aggregation techniques to be used and then just using
> >>>the existing transaction infrastructure to implement operations such
> >>>as "create with ACL"....
> >>
> >>I will have to think on this. I don't see how the allocation of log
> >>space for a new transaction while holding locks is a good thing.
> >
> >Who said anything about requiring new locks? :)
> I meant we have to hold the inode locks between the directory code
> the the attribute code. If we do not, then the attributes quickly
> become out of line with the directory status.

But why would we need to hold locks atomically across all operations
there?  The inodes are already locked at that VFS level via i_mutex,
so nobody is going to see the newly created/modified inodes in the
namespace until we complete the whole create/rename operation, so
the only thing we realy need to be concerned about is whether the
operations can be recovered in whole or in part.....

> >>Even if you are creative, the multipliers add up quickly.
> >>xfs_rename will do the directory ops. possibly a attibset and up to
> >>2 attrrm and we have to hold the src and target inodes locks over
> >>all the operations.
> >
> >We don't have to hold them locked across the entire operation
> >because the VFS is holding them locked so nothing else can do a
> >namespace operation which we are modifying them.
> I find without holding the lock over both dir and attrib routines
> that remove occasionally can't find parent pointer attribute entry.
> I suspect remove racing in rename.

Racing with what? The VFS prevents create/link/unlink/rename races
via the i_mutex locking on the parent directories and the


Dave Chinner

<Prev in Thread] Current Thread [Next in Thread>