
Re: XFS hang during xfs_fsr run

To: Michael Weissenbacher <mw@xxxxxxxxxxxx>
Subject: Re: XFS hang during xfs_fsr run
From: Dave Chinner <david@xxxxxxxxxxxxx>
Date: Fri, 5 Mar 2010 09:26:11 +1100
Cc: Christoph Hellwig <hch@xxxxxxxxxxxxx>, xfs@xxxxxxxxxxx
In-reply-to: <4B8FC1B7.3070505@xxxxxxxxxxxx>
References: <4B8F871C.60802@xxxxxxxxxxxx> <20100304112018.GG14317@xxxxxxxxxxxxxxxx> <4B8FA2CD.6010904@xxxxxxxxxxxx> <20100304131511.GH14317@xxxxxxxxxxxxxxxx> <20100304134641.GA26871@xxxxxxxxxxxxx> <4B8FC1B7.3070505@xxxxxxxxxxxx>
User-agent: Mutt/1.5.18 (2008-05-17)
On Thu, Mar 04, 2010 at 03:20:39PM +0100, Michael Weissenbacher wrote:
> Hi Christoph/Dave!
>> Also when you next rebuilt the kernel please make sure to include
>> CONFIG_KALLSYMS in the configuration, possibly CONFIG_KALLSYMS_ALL too.
>> This will help greatly with decoding any kind of warning / oops.
> Thanks for this information. Unfortunately my current kernel was built  
> without CONFIG_KALLSYMS. I'm now recompiling with CONFIG_KALLSYMS and  
> CONFIG_KALLSYMS_ALL set. I reckon that my old traces can't be  
> ksymoops'ed even if i enable that kernel option now? I will see if i can  
> get a fresh trace then (even though i hope it won't happen again).

Yeah, that seems to be the case.

>> Was there anything else in the logs prior to the oops messages
>> that might indicate errors were occurring?
> Unfortunately everything in the logs is dandy until the error happens.  
> It seems that xfs_fsr randomly stops at some files and then locks up the  
> whole /var partition. I searched for the inode numbers where xfs_fsr  
> stopped and one time it was "/var/log/xfs_fsr.log" and the other time it  
> was "/var/spool/imap/x/user/xxxx/cyrus.cache" (username obfuscated).  

If you've got the inode numbers, then you're running with the verbose
flag set? Do you still have the logs for those inodes that it hung on?

> What's interesting is that i have the no-defrag flag set on the whole
> /var/log directory and still it seemed to hang on that log file.

xfs_fsr doesn't do directory traversals to find files to defrag -
it uses the more efficient bulkstat + open-by-handle method to visit
every inode in the filesystem once. As a result, it will still open
inodes that have the nodefrag flag set on them, but will then skip them
once it finds the flag is set.
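
For illustration only, here is a minimal Python sketch of that flag check.
It is not xfs_fsr's actual code. It assumes the x86-style ioctl number
encoding, a 28-byte struct fsxattr whose first field is fsx_xflags, and
0x2000 as the nodefrag bit (XFS_XFLAG_NODEFRAG in the kernel headers).
The helper names has_nodefrag() and read_xflags() are made up for this
example.

```python
import fcntl
import os
import struct

# Assumed layout: fsx_xflags first, 28 bytes in total (five u32 + 8 pad bytes).
FSXATTR_FMT = "=IIIII8s"

# Assumed x86-style encoding of _IOR('X', 31, struct fsxattr).
FS_IOC_FSGETXATTR = (
    (2 << 30) | (struct.calcsize(FSXATTR_FMT) << 16) | (ord("X") << 8) | 31
)

XFLAG_NODEFRAG = 0x00002000  # nodefrag bit in fsx_xflags


def has_nodefrag(xflags):
    """Return True if the nodefrag bit is set in an fsx_xflags value."""
    return bool(xflags & XFLAG_NODEFRAG)


def read_xflags(path):
    """Fetch fsx_xflags for path (hypothetical helper).

    Raises OSError on filesystems that do not support the ioctl.
    """
    buf = bytearray(struct.calcsize(FSXATTR_FMT))
    fd = os.open(path, os.O_RDONLY)
    try:
        fcntl.ioctl(fd, FS_IOC_FSGETXATTR, buf)  # kernel fills buf in place
    finally:
        os.close(fd)
    return struct.unpack(FSXATTR_FMT, bytes(buf))[0]
```

Calling has_nodefrag(read_xflags(path)) on a file shows whether a defrag
pass would skip it. Note that the open and the ioctl both happen before
the flag can be inspected, which is why a flagged file can still be
touched.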

If xfs_fsr hung before it checked the nodefrag flag, then there are
only a few things it could get stuck on:

        1. fsync() of the file
        2. file lock checks
        3. statvfs64()
        4. ioctl(XFS_IOC_FSGETXATTR)

A trace would tell us which one it was....
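
Short of a full trace, a rough first look at where a stuck process is
sleeping can be had from /proc. The sketch below is only an assumption
about how one might do that: it reads the state letter from
/proc/<pid>/stat and the wait channel from /proc/<pid>/wchan. The helper
names are made up, and wchan only shows a symbol name when the kernel has
CONFIG_KALLSYMS. A task in state "D" is in uninterruptible sleep.

```python
def task_state(stat_line):
    """Extract the state letter from the contents of /proc/<pid>/stat."""
    # The command name is parenthesised and may contain spaces or ')',
    # so split after the last ')' rather than on whitespace.
    return stat_line[stat_line.rfind(")") + 1:].split()[0]


def describe_task(pid):
    """Return (state, wchan) for a pid (hypothetical helper)."""
    with open(f"/proc/{pid}/stat") as f:
        state = task_state(f.read())
    with open(f"/proc/{pid}/wchan") as f:
        wchan = f.read().strip()
    return state, wchan
```

This gives a single function name at best, not a call chain. For the
stack traces of all blocked tasks, writing "w" to /proc/sysrq-trigger
dumps them to the kernel log.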


Dave Chinner
