xfs
[Top] [All Lists]

Re: [PATCH 0/11] xfs: delayed logging

To: xfs@xxxxxxxxxxx
Subject: Re: [PATCH 0/11] xfs: delayed logging
From: Dave Chinner <david@xxxxxxxxxxxxx>
Date: Thu, 6 May 2010 23:26:41 +1000
In-reply-to: <1273110351-2333-1-git-send-email-david@xxxxxxxxxxxxx>
References: <1273110351-2333-1-git-send-email-david@xxxxxxxxxxxxx>
User-agent: Mutt/1.5.20 (2009-06-14)
On Thu, May 06, 2010 at 11:45:40AM +1000, Dave Chinner wrote:
> 
> Hi Folks,
> 
> This is version 4 of the delayed logging series.
> 
> I won't repeat everything about what it is, just point you
> here:
> 
> http://marc.info/?l=linux-xfs&m=126862777118946&w=2
> 
> for the description, and here:
> 
> git://git.kernel.org/pub/scm/linux/kernel/git/dgc/xfs.git delayed-logging
> 
> for the current code. Note that this is a rebased branch, so you'll
> need to pull it again into a new branch to review.
> 
> This version includes a number of fixes and cleanups related to the
> busy extent tracking. This includes fixing a long standing
> off-by-one that was causing assert failures when inserting busy
> extents that overlapped with existing busy extents.

Ok, so I'm still getting assert failures, but they are much harder
to hit. However, here's the fragment of a trace that points out why
delayed logging is now tripping over this problem:

$ grep _busy: t.t |tail -20|cut -d ":" -f 2-
xfs_alloc_busy: dev 253:16 trans 0xffff8801067eada0 tid 0xa9a1ac8d agno 1 agbno 
133 len 1 async
xfs_alloc_busy: dev 253:16 trans 0xffff8801067eada0 tid 0xa9a1ac8d agno 1 agbno 
24 len 1 async
xfs_alloc_busy: dev 253:16 trans 0xffff8801067eada0 tid 0xa9a1ac8d agno 1 agbno 
91909 len 1 async
xfs_alloc_busy: dev 253:16 trans 0xffff8801067eada0 tid 0xa9a1ac8d agno 1 agbno 
75 len 1 async
xfs_alloc_busy: dev 253:16 trans 0xffff8801067eada0 tid 0xfe95d7e3 agno 1 agbno 
100504 len 1 async
xfs_alloc_busy: dev 253:16 trans 0xffff8801067eada0 tid 0xfe95d7e3 agno 1 agbno 
100505 len 1 async
xfs_alloc_busy: dev 253:16 trans 0xffff8801067eada0 tid 0xfe95d7e3 agno 1 agbno 
100506 len 1 async
xfs_alloc_busy: dev 253:16 trans 0xffff8801067eada0 tid 0xfe95d7e3 agno 1 agbno 
100507 len 1 async
xfs_alloc_busy: dev 253:16 trans 0xffff8801067eada0 tid 0xfe95d7e3 agno 1 agbno 
100508 len 1 async
xfs_alloc_busy: dev 253:16 trans 0xffff8801067eada0 tid 0xfe95d7e3 agno 1 agbno 
100509 len 1 async
xfs_alloc_busy: dev 253:16 trans 0xffff880067c42da0 tid 0xbe3d9ca6 agno 0 agbno 
37809 len 1 async
xfs_alloc_busy: dev 253:16 trans 0xffff880067c42da0 tid 0x4575b77f agno 2 agbno 
133387 len 1 async
xfs_alloc_busy: dev 253:16 trans 0xffff880067c42da0 tid 0xeb882f7e agno 2 agbno 
151935 len 1 async
xfs_alloc_busy: dev 253:16 trans 0xffff880067c42da0 tid 0x362c7c31 agno 2 agbno 
133386 len 1 async
xfs_alloc_busy: dev 253:16 trans 0xffff880067c42da0 tid 0x4dda728f agno 2 agbno 
151936 len 1 async
xfs_alloc_busy: dev 253:16 trans 0xffff880067c42da0 tid 0xc1ca9675 agno 3 agbno 
49832 len 1 async
xfs_alloc_busy: dev 253:16 trans 0xffff880067c42da0 tid 0x8a3e0a41 agno 3 agbno 
49833 len 1 async
xfs_alloc_busy: dev 253:16 trans 0xffff880067c42da0 tid 0x3fe9fb72 agno 1 agbno 
109637 len 1 async
xfs_alloc_busy: dev 253:16 trans 0xffff8801067eada0 tid 0x69f15b1c agno 1 agbno 
91909 len 1 async

First thing to note is that there are only two different addresses
for transaction structures here, but there are 11 different
transaction IDs. That's a bit of a problem, really. The assert fail
was triggered by the last line:

xfs_alloc_busy: dev 253:16 trans 0xffff8801067eada0 tid 0x69f15b1c agno 1 agbno 
91909 len 1 async

Which appears to have already been marked busy by a different
transaction earlier on:

xfs_alloc_busy: dev 253:16 trans 0xffff8801067eada0 tid 0xa9a1ac8d agno 1 agbno 
91909 len 1 async

If I just pull out the last operations on that block:

$ grep "agno 1 agbno 919" t.t |tail -5
         fs_mark-2741  [007] 795063.752648: xfs_alloc_busy: dev 253:16 trans 
0xffff8801067eada0 tid 0xa9a1ac8d agno 1 agbno 91909 len 1 async
         fs_mark-2742  [005] 795063.754591: xfs_alloc_busysearch: dev 253:16 
agno 1 agbno 91909 len 1 found
         fs_mark-2745  [007] 795063.775012: xfs_alloc_busy: dev 253:16 trans 
0xffff8801067eada0 tid 0x69f15b1c agno 1 agbno 91909 len 1 async
       xfslogd/4-497   [004] 795063.775540: xfs_alloc_unbusy: dev 253:16 agno 1 
agbno 91909 len 1
       xfslogd/4-497   [004] 795063.775542: xfs_alloc_busysearch: dev 253:16 
agno 1 agbno 91909 len 1 found

We can see that there are two separate processes marking the same
extent busy, using the same transaction structure address. But they
have to be two different transactions, because a transaction is
always done in the context of a single kernel thread.  i.e:

        fs_mark-2741    fs_mark-2742    fs_mark-2743    xfslogd
        xact alloc
        free 1:91909
        mark busy
        commit xact
        free xact
                        xact alloc
                        alloc 1:91909
                        busy search
                        mark xact sync
                        commit xact
                        free xact
                        force log
                        checkpoint starts
                        ....
                                        xact alloc
                                        free 1:91909
                                        mark busy
                                        finds match, not sync
                                        *** KABOOM! ***
                                        ....
                                                        log IO completes
                                                        unbusy 1:91909
                        checkpoint completes

So, now I think I can explain the causes of that assert failure.
Firstly, the off-by one I found this morning in the search code
which would trigger for both delayed and normal logging modes.  This
had nothing to do with transaction commits and lifecycles, just a
search failing to set the transaction synchronous when it should
have.

Secondly, for delayed logging only, matching by transaction
structure address triggers the failure because busy extents
have a much longer life than the transaction structure. It is clear
why the transaction ID matching didn't trip over - it would have
triggered a log force in this situation, and hence blocked until
the checkpoint that fs_mark-2742 had triggered was complete before
redoing the rbtree insert.

Right now I'm simply going to go back to using the transaction ID
for matching transactions, even though the above analysis points out
that even that is not as efficient as it could be for delayed
logging. That is, we don't even need to force the log or have a
synchronous transaction if the extent was first freed in the current
checkpoint seqeunce. Doing that, however, requires pinning the
checkpoint sequence (i.e. preventing a flush) until the current
transaction commits. While that is in the plan for delayed logging,
it is future functionality and hence I'm not going to attempt to
design and implement it this close to 2.6.35-rc cycle. [*]

Christoph - does this answer all your concerns with the busy extent
tracking modifications, or is there still something that are left
unexplained?

Cheers,

Dave.

[*] Checkpoint pinning is needed to implement atomic
multi-transaction operations such as "create with attribute".
-- 
Dave Chinner
david@xxxxxxxxxxxxx

<Prev in Thread] Current Thread [Next in Thread>