Re: xfs_repair deleting realtime files.

To: Dave Chinner <david@xxxxxxxxxxxxx>
Subject: Re: xfs_repair deleting realtime files.
From: Anand Tiwari <tiwarikanand@xxxxxxxxx>
Date: Tue, 25 Sep 2012 19:26:32 -0600
Cc: Eric Sandeen <sandeen@xxxxxxxxxxx>, xfs@xxxxxxxxxxx
In-reply-to: <CAHt31_8rEc93vpnbbKngY4uS0kAct3Z5A+2G0LmBzv5rWKdSfA@xxxxxxxxxxxxxx>
References: <CAHt31_9K_vrzoqwSVsz-6VNVmMUzMyGCFEZfviRV-xPcUqv8-w@xxxxxxxxxxxxxx> <505BF45D.5050909@xxxxxxxxxxx> <20120924075551.GF20960@dastard> <CAHt31_8rEc93vpnbbKngY4uS0kAct3Z5A+2G0LmBzv5rWKdSfA@xxxxxxxxxxxxxx>


On Mon, Sep 24, 2012 at 6:51 AM, Anand Tiwari <tiwarikanand@xxxxxxxxx> wrote:


On Mon, Sep 24, 2012 at 1:55 AM, Dave Chinner <david@xxxxxxxxxxxxx> wrote:
On Fri, Sep 21, 2012 at 12:00:13AM -0500, Eric Sandeen wrote:
> On 9/20/12 7:40 PM, Anand Tiwari wrote:
> > Hi All,
> >
> > I have been looking into an issue with xfs_repair on a realtime subvolume. Sometimes, while running xfs_repair, I see the following errors:
> >
> > ----------------------------
> > data fork in rt inode 134 claims used rt block 19607
> > bad data fork in inode 134
> > would have cleared inode 134
> > data fork in rt inode 135 claims used rt block 29607
> > bad data fork in inode 135
> > would have cleared inode 135
.....
> > xfs_db> inode 135
> > xfs_db> bmap
> > data offset 0 startblock 13062144 (12/479232) count 2097000 flag 0
> > data offset 2097000 startblock 15159144 (14/479080) count 2097000 flag 0
> > data offset 4194000 startblock 17256144 (16/478928) count 2097000 flag 0
> > data offset 6291000 startblock 19353144 (18/478776) count 2097000 flag 0
> > data offset 8388000 startblock 21450144 (20/478624) count 2097000 flag 0
> > data offset 10485000 startblock 23547144 (22/478472) count 2097000 flag 0
> > data offset 12582000 startblock 25644144 (24/478320) count 2097000 flag 0
> > data offset 14679000 startblock 27741144 (26/478168) count 2097000 flag 0
> > data offset 16776000 startblock 29838144 (28/478016) count 2097000 flag 0
> > data offset 18873000 startblock 31935144 (30/477864) count 1607000 flag 0
> > xfs_db> inode 134
> > xfs_db> bmap
> > data offset 0 startblock 7942144 (7/602112) count 2097000 flag 0
> > data offset 2097000 startblock 10039144 (9/601960) count 2097000 flag 0
> > data offset 4194000 startblock 12136144 (11/601808) count 926000 flag 0
>
> It's been a while since I thought about realtime, but -
>
> That all seems fine, I don't see anything overlapping there, they are
> all perfectly adjacent, though of interesting size.

Yeah, the size is the problem.

....
> Every extent above is length 2097000 blocks, and they are adjacent.
> But you say your realtime extent size is 512 blocks ... which doesn't go
> into 2097000 evenly.   So that's odd, at least.

Once you realise that the bmapbt is recording multiples of FSB (4k)
rather than rtextsz (2MB), it becomes more obvious what the problem
is: rounding of the extent size at MAXEXTLEN - 2097000 is only 152
blocks short of 2^21 (2097152).
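
The arithmetic above can be checked directly against the bmap output for inode 134. This is only a sketch: it assumes the startblock values printed by xfs_db are realtime block numbers in 4k filesystem blocks and that the realtime extent size is 512 blocks, as stated earlier in the thread. The helper name misaligned() is made up for illustration.

```python
RTEXTSIZE = 512        # realtime extent size in filesystem blocks (from the thread)
TWO_POW_21 = 2 ** 21   # 2097152, the figure quoted above

# (file offset, start block, block count), copied from the bmap output for inode 134
extents_134 = [
    (0, 7942144, 2097000),
    (2097000, 10039144, 2097000),
    (4194000, 12136144, 926000),
]

def misaligned(extents, rextsize=RTEXTSIZE):
    """Return (start block, remainder) for every extent whose start block
    is not a multiple of the realtime extent size."""
    return [(start, start % rextsize) for _, start, _ in extents if start % rextsize]

print(TWO_POW_21 - 2097000)     # 152
print(2097000 % RTEXTSIZE)      # 360
print(misaligned(extents_134))  # [(10039144, 360), (12136144, 208)]
```

The first extent starts on a 512-block boundary, but because its length is not a multiple of 512, every following extent of the file starts in the middle of a realtime extent.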

I haven't looked at the kernel code yet to work out why it is
rounding to a non-rtextsz multiple, but that is the source of the
problem.

The repair code is detecting that extents are not of the
correct granularity, but the error message indicates that this was
only ever expected for duplicate blocks occurring rather than a
kernel bug. So "fixing repair" is not what is needed here - finding
and fixing the kernel bug is what you should be looking at.

Cheers,

Dave.
--
Dave Chinner
david@xxxxxxxxxxxxx


Thanks, I have started looking at the allocator code and will report back if I see something.


I think this is what is happening. The problem shows up when the following conditions hold:
  1) we have more than 8 GB of contiguous free space available to allocate (i.e. more than 2^21 4k blocks).
  2) only one file is open for writing on the real-time volume.

To satisfy the first condition, I just took an empty file system.

Now let's start allocating, say in chunks of 25000 blocks. The realtime allocator will have no problem allocating the "exact" blocks while searching forward in xfs_rtfind_forw(). It will allocate 49 "real-time extents", where the 49th "real-time extent" is only partially full (25000/512 = 48.8, rounded up to 49).

Everything is fine for the first 83 allocations, as we are able to grow the extent each time. We now have 2075000 (25000*83) blocks in the first extent, i.e. 4053 "real-time extents" (where the last "real-time extent" is partially full).

For the 84th allocation, the real-time allocator will allocate another 49 "real-time extents", as it does not know about the maximum extent size, but we cannot grow the extent in xfs_bmap_add_extent_unwritten_real(), because 2075000 + 25000 = 2100000 blocks would exceed the maximum extent length. So we insert a new extent (case BMAP_LEFT_FILLING). The new extent starts at 2075000, which is not aligned with rextsize (512 in this case).
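
The sequence described above can be reproduced with a small model. This is a simplified sketch, not the kernel logic: it assumes the maximum extent length is 2^21 - 1 blocks (a 21-bit length field), that every allocation is contiguous with the previous one, and that an allocation is merged into the last extent whenever the result still fits. The function simulate() is made up for illustration.

```python
MAXEXTLEN = 2 ** 21 - 1   # assumed maximum extent length in blocks
RTEXTSIZE = 512           # realtime extent size in blocks (from the thread)
CHUNK = 25000             # blocks per allocation, as in the example above

def simulate(n_allocs, chunk=CHUNK, maxlen=MAXEXTLEN):
    """Model n_allocs contiguous allocations of `chunk` blocks.
    Returns a list of [start offset, length] extents."""
    extents = []
    for _ in range(n_allocs):
        if extents and extents[-1][1] + chunk <= maxlen:
            # The last extent can still grow: merge the new blocks into it.
            extents[-1][1] += chunk
        else:
            # Merging would exceed maxlen: start a new extent right after it.
            start = extents[-1][0] + extents[-1][1] if extents else 0
            extents.append([start, chunk])
    return extents

print(simulate(83))                  # [[0, 2075000]]
print(simulate(84))                  # [[0, 2075000], [2075000, 25000]]
print(simulate(84)[1][0] % RTEXTSIZE)  # 376
```

In this model the second extent begins 376 blocks into a realtime extent, so that realtime extent is shared by both extent records.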

To fix this, I see two options:
1) fix the real-time allocator and teach it about the maximum extent size.
2) for real-time files, align the new extent before inserting it.
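
One possible reading of option 2 is to cap each extent record at the largest multiple of rextsize that fits in the maximum extent length, so that every split falls on a realtime extent boundary. The sketch below only illustrates that arithmetic. It is not taken from the kernel, and split_aligned() and the 2^21 - 1 limit are assumptions.

```python
def split_aligned(total_blocks, rextsize=512, maxlen=2 ** 21 - 1):
    """Split a contiguous run of total_blocks into (offset, length) records,
    each at most the largest rextsize multiple that fits in maxlen."""
    cap = (maxlen // rextsize) * rextsize   # 2096640 for the defaults
    extents = []
    offset = 0
    while offset < total_blocks:
        length = min(cap, total_blocks - offset)
        extents.append((offset, length))
        offset += length
    return extents

print(split_aligned(2100000))  # [(0, 2096640), (2096640, 3360)]
```

With the split at 2096640 (4095 * 512), the second record starts on a 512-block boundary, unlike the split at 2075000 in the example above.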

In my opinion, we should not worry about either of the above, as this looks like a good method of allocation. I can fix the xfs_repair tool instead and make it aware of this condition ("real-time extents" shared by two or more extents).

Let me know what you guys think,

anand
