
To: Eric Sandeen <sandeen@xxxxxxxxxxx>
Subject: Re: Xfs Access to block zero exception and system crash
From: Sagar Borikar <sagar_borikar@xxxxxxxxxxxxxx>
Date: Fri, 04 Jul 2008 15:48:24 +0530
Cc: Nathan Scott <nscott@xxxxxxxxxx>, xfs@xxxxxxxxxxx
In-reply-to: <486CE9EA.90502@xxxxxxxxxxx>
Organization: PMC Sierra Inc
References: <20080628000516.GD29319@disturbed> <340C71CD25A7EB49BFA81AE8C8392667028A1CA7@xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx> <20080629215647.GJ29319@disturbed> <20080630034112.055CF18904C4@xxxxxxxxxxxxxxxxxxxxxxxxxx> <4868B46C.9000200@xxxxxxxxxxxxxx> <20080701064437.GR29319@disturbed> <486B01A6.4030104@xxxxxxxxxxxxxx> <20080702051337.GX29319@disturbed> <486B13AD.2010500@xxxxxxxxxxxxxx> <1214979191.6025.22.camel@xxxxxxxxxxxxxxxxxx> <20080702065652.GS14251@xxxxxxxxxxxxxxxxxxxxx> <486B6062.6040201@xxxxxxxxxxxxxx> <486C4F89.9030009@xxxxxxxxxxx> <486C6053.7010503@xxxxxxxxxxxxxx> <486CE9EA.90502@xxxxxxxxxxx>
Sender: xfs-bounce@xxxxxxxxxxx
User-agent: Thunderbird 2.0.0.14 (X11/20080421)


Eric Sandeen wrote:
Sagar Borikar wrote:
Eric Sandeen wrote:


Eric, could you please let me know about the bits and pieces we need to keep in mind while backporting xfs to 2.6.18?
If you could share patches that take care of it, that would be great.
http://sandeen.net/rhel5_xfs/xfs-2.6.25-for-rhel5-testing.tar.bz2

should be pretty close.  It was quick 'n' dirty and it has some warts
but would give an idea of what backporting was done (see patches/ and
the associated quilt series; quilt push -a to apply them all)
Thanks a lot Eric. I'll go through it. I am also trying another option of regularly defragmenting the filesystem while it is under stress.

Ok, but that won't get to the bottom of the problem.  It might alleviate
it at best, but if I were shipping a product using xfs I'd want to know
that it was properly solved.  :)

We don't want to leave it as it is either. I am still working on backporting the latest xfs code,
and your patches are helping a lot.
To check whether the issue lies with 2.6.18 or with our MIPS port, I tested it on a 2.6.24 x86 platform as well.
There we created a 10 GB loopback device and mounted xfs on it.
What I observe is that xfs_repair reports quite a few bad blocks and bad extents on that setup too. So is developing bad blocks and extents normal behavior for xfs that gets recovered in the background, or is it a bug? I still didn't see the exception, but the bad blocks and extents are
generated within 10 minutes of running the tests.
Attaching the log.
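For reference, the loopback setup on the x86 box was essentially the standard
sequence, along these lines (the backing file, mount point and loop device
names are just examples from my test machine, and the stress tool itself
isn't shown):

    $ dd if=/dev/zero of=/tmp/xfs.img bs=1M seek=10239 count=1   # 10 GB sparse backing file
    $ mkfs.xfs /tmp/xfs.img                                      # make the XFS filesystem
    $ mkdir -p /mnt/xfstest
    $ mount -o loop /tmp/xfs.img /mnt/xfstest                    # mount via a loop device
    # ... run the stress test against /mnt/xfstest, then unmount and check:
    $ xfs_repair -n /dev/loop0                                   # no-modify check; output attached below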
The tarball above should give you almost everything you need to run your
testcase with current xfs code on your older kernel to see if the bug
persists or if it's been fixed upstream, in which case you have a
relatively easy path to an actual solution that your customers can
depend on.
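For reference, the way I am planning to apply the series to our tree is
roughly the following (the paths are just examples, and I am assuming the
tarball unpacks with the patches/ directory and quilt series file you
describe):

    $ wget http://sandeen.net/rhel5_xfs/xfs-2.6.25-for-rhel5-testing.tar.bz2
    $ tar xjf xfs-2.6.25-for-rhel5-testing.tar.bz2
    $ cd /path/to/linux-2.6.18                   # our kernel source tree
    $ ln -s /path/to/unpacked/patches patches    # point quilt at the series
    $ quilt push -a                              # apply every patch in the series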

I wanted to understand a couple of things about using the xfs_fsr utility:

1. What should the state of the filesystem be when I am running xfs_fsr? Or should we ideally stop all I/O before running the defragmentation?

you can run in any state.  Some files will not get defragmented due to
busy-ness or other conditions; look at the xfs_swap_extents() function
in the kernel which is very well documented; some cases return EBUSY.
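Good to know. So I can leave the workload running and just point xfs_fsr at
the mounted filesystem, or at individual files, something like the following
(the mount point is the one from my loopback test, and I am assuming a file
that is busy at that moment is simply skipped rather than touched):

    $ xfs_fsr -v /mnt/xfstest              # reorganize one mounted XFS filesystem, verbose
    $ xfs_fsr -v /mnt/xfstest/testfile     # or target a single file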

2. How effective is the utility when run on a highly fragmented filesystem? I saw that if the filesystem is 99.89% fragmented, the recovery is very slow. It took around 25 minutes to clean up a 100 GB JBOD volume, and after that the filesystem was still 82% fragmented. So I was confused about how exactly the defragmentation works.

Again read the code, but basically it tries to preallocate as much space
as the file is currently using, then checks that it is more contiguous
space than the file currently has and if so, it copies the data from old
to new and swaps the new allocation for the old.  Note, this involves a
fair amount of IO.

Also don't get hung up on that fragmentation factor, at least not until
you've read xfs_db code to see how it's reported, and you've thought
about what that means.  For example: a 100G filesystem with 10 10G files
each with 5x2G extents will report 80% fragmentation.  Now, ask
yourself, is a 10G file in 5x2G extents "bad" fragmentation?
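Just to check that I am reading that factor the same way you are (the device
below is the loop device from my test, and the formula is my understanding
of what xfs_db's frag command reports, not something I have traced through
the code yet):

    $ xfs_db -r -c frag /dev/loop0
    # reports actual extents, ideal extents, and a fragmentation factor of
    # roughly (actual - ideal) / actual * 100; your example of 10 files in
    # 5x2G extents each gives (50 - 10) / 50 = 80%, even though every
    # extent is still 2G of contiguous space.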

Agreed; on x86 too I see 99.12% fragmentation when I run the above-mentioned test, and xfs_fsr
doesn't help much even after freezing the filesystem.
Any pointers on the optimum use of xfs_fsr?
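In the meantime I will look at a few individual files before and after a
pass, to see whether the extent swap you describe actually went through for
them, roughly like this (the file name is just one from my test; I am
assuming xfs_bmap's verbose extent listing is enough to see the change):

    $ xfs_bmap -v /mnt/xfstest/testfile    # extent map before the run
    $ xfs_fsr -v /mnt/xfstest/testfile
    $ xfs_bmap -v /mnt/xfstest/testfile    # should show fewer, larger extents if the swap succeeded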
3. Are there any precautions I need to take from a data consistency or robustness point of view? Any disadvantages?

Anything which corrupts data is a bug, and I'm not aware of any such
bugs in the defragmentation process.

Assuming that we get some improvement by running xfs_fsr, is it safe to run
the defragmentation utility regularly at some periodic interval?
4. Is there any threshold for starting defragmentation on xfs?

Pretty well determined by your individual use case and requirements, I
think.

-Eric
Thanks for the detailed response, Eric.
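For now I am thinking of just scheduling a time-limited run and adjusting
the interval based on what xfs_db reports, something like the following
crontab entry (the schedule, time limit and mount point are only
placeholders until we have measured the impact on our workload):

    # run xfs_fsr for at most one hour, every night at 03:00
    0 3 * * * /usr/sbin/xfs_fsr -t 3600 /mnt/xfstest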

Sagar
bad nblocks 13345 for inode 50331785, would reset to 19431
bad nextents 156 for inode 50331785, would reset to 251
        - process newly discovered inodes...
Phase 4 - check for duplicate blocks...
        - setting up duplicate extent list...
        - check for inodes claiming duplicate blocks...
        - agno = 1
        - agno = 0
entry "testfile" in shortform directory 132 references free inode 142
would have junked entry "testfile" in directory inode 132
entry "testfile" in shortform directory 138 references free inode 143
would have junked entry "testfile" in directory inode 138
entry "testfile" in shortform directory 140 references free inode 144
would have junked entry "testfile" in directory inode 140
bad nblocks 15848 for inode 141, would reset to 18634
bad nextents 269 for inode 141, would reset to 306
bad nblocks 18888 for inode 16777350, would reset to 19144
bad nextents 303 for inode 16777350, would reset to 309
bad nblocks 18704 for inode 16777351, would reset to 19144
bad nextents 291 for inode 16777351, would reset to 299
bad fwd (right) sibling pointer (saw 107678 should be NULLDFSBNO)
        in inode 142 ((null) fork) bmap btree block 236077307437232
would have cleared inode 142
bad fwd (right) sibling pointer (saw 1139882 should be NULLDFSBNO)
        in inode 143 ((null) fork) bmap btree block 4556402090352816
would have cleared inode 143
bad fwd (right) sibling pointer (saw 1138473 should be NULLDFSBNO)
        in inode 144 ((null) fork) bmap btree block 4564279060373680
would have cleared inode 144
bad nblocks 13825 for inode 145, would reset to 18503
bad nextents 221 for inode 145, would reset to 222
        - agno = 2
entry "testfile" in shortform directory 33595588 references free inode 33595593
would have junked entry "testfile" in directory inode 33595588
bad nblocks 18704 for inode 33595589, would reset to 19121
bad nextents 306 for inode 33595589, would reset to 314
bad nblocks 18704 for inode 33595590, would reset to 19432
bad nextents 302 for inode 33595590, would reset to 313
bad nblocks 18640 for inode 33595591, would reset to 19432
bad nextents 311 for inode 33595591, would reset to 317
bad nblocks 18888 for inode 33595592, would reset to 19432
bad nextents 312 for inode 33595592, would reset to 322
bad fwd (right) sibling pointer (saw 104113 should be NULLDFSBNO)
        in inode 33595593 ((null) fork) bmap btree block 9041060911947952
would have cleared inode 33595593
        - agno = 3
bad nblocks 18888 for inode 50331781, would reset to 19432
bad nextents 315 for inode 50331781, would reset to 324
bad nblocks 18888 for inode 50331782, would reset to 19432
bad nextents 326 for inode 50331782, would reset to 333
bad nblocks 18888 for inode 50331783, would reset to 19432
bad nblocks 18428 for inode 50331784, would reset to 19784
bad nextents 285 for inode 50331784, would reset to 306
bad nblocks 18704 for inode 16777352, would reset to 19144
bad nextents 311 for inode 16777352, would reset to 315
bad nblocks 13345 for inode 50331785, would reset to 19431
bad nextents 156 for inode 50331785, would reset to 251
bad nblocks 18888 for inode 16777353, would reset to 19144
bad nextents 318 for inode 16777353, would reset to 321
No modify flag set, skipping phase 5
Phase 6 - check inode connectivity...
        - traversing filesystem ...
        - agno = 0
entry "testfile" in shortform directory inode 132 points to free inode 142would 
junk entry
entry "testfile" in shortform directory inode 138 points to free inode 143would 
junk entry
entry "testfile" in shortform directory inode 140 points to free inode 144would 
junk entry
        - agno = 1
        - agno = 2
entry "testfile" in shortform directory inode 33595588 points to free inode 
33595593would junk entry
        - agno = 3
        - traversal finished ...
        - moving disconnected inodes to lost+found ...
Phase 7 - verify link counts...
No modify flag set, skipping filesystem flush and exiting.

        XFS_REPAIR Summary    Fri Jul  4 15:34:47 2008

Phase           Start           End             Duration
Phase 1:        07/04 15:34:00  07/04 15:34:04  4 seconds
Phase 2:        07/04 15:34:04  07/04 15:34:31  27 seconds
Phase 3:        07/04 15:34:31  07/04 15:34:47  16 seconds
Phase 4:        07/04 15:34:47  07/04 15:34:47
Phase 5:        Skipped
Phase 6:        07/04 15:34:47  07/04 15:34:47
Phase 7:        07/04 15:34:47  07/04 15:34:47

Total run time: 47 seconds