To: Michael Monnerie <michael.monnerie@xxxxxxxxxxxxxxxxxxx>
Subject: Re: bug and fun with XFS: unable to handle kernel NULL pointer dereference
From: Dave Chinner <david@xxxxxxxxxxxxx>
Date: Mon, 26 Jul 2010 10:18:59 +1000
Cc: xfs@xxxxxxxxxxx
In-reply-to: <201007260019.51568@xxxxxx>
References: <201007260019.51568@xxxxxx>
User-agent: Mutt/1.5.20 (2009-06-14)
On Mon, Jul 26, 2010 at 12:19:47AM +0200, Michael Monnerie wrote:
> I just enjoy an obviously broken XFS filesystem. It was a running 
> server, which I planned to migrate so I did "rsync -aHAX / 
> otherhost::rsyncmodule", and experienced a "killed". At that time I 
> thought it was a one time mistake, so restarted rsync, but Murphy made 
> it get killed again.
> 
> So I looked into dmesg, just to find this: It's the log of all messages, 
> so maybe twice the same, I copy everything for reference. See attachment 
> "xfs-bug.dmesg.txt".

The first occurrence is:

> Pid: 1809, comm: syslog-ng Not tainted 2.6.27.48-0.1-xen #1

That's an old kernel, and it doesn't seem related to the rsync-triggered
problem, even though it has the same oops signature.

> I started to look, and quickly found a funny problem: Once I mount that 
> partition, I cannot unmount it again:
> 
> # mount /disks/work/
> # umount /disks/work/
> umount: /disks/work: device is busy.
>         (In some cases useful info about processes that use
>          the device is found by lsof(8) or fuser(1))

Some other process has taken a reference to the fs, I'd say. If that
process then oopsed while holding the reference, it never gets dropped,
and umount will keep reporting the device as busy.

> So I rebooted without mounting that partition, and 
> 
> # xfs_repair -n /dev/xvda2 [VERSION:3.1.2]
> xfs_repair: /lib64/libuuid.so.1: no version information available (required by xfs_repair)
> Phase 1 - find and verify superblock...
> Phase 2 - using internal log
>         - scan filesystem freespace and inode maps...
>         - found root inode chunk
> Phase 3 - for each AG...
>         - scan (but don't clear) agi unlinked lists...
>         - process known inodes and perform inode discovery...
>         - agno = 0
>         - agno = 1
> local inode 8636461 attr too small (size = 0, min size = 4)
> bad attribute fork in inode 8636461, would clear attr fork
> would have cleared inode 8636461

Corrupt attribute fork, which matches the oops signatures. I'd
definitely consider upgrading your kernel as a first step...
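The repair message "attr too small (size = 0, min size = 4)" says the short-form attribute fork records a total size smaller than the header it must at least contain. The C fragment below is a minimal sketch of that kind of sanity check, assuming a simplified 4-byte header (big-endian 16-bit total size, 8-bit entry count, one pad byte). `struct sf_attr_hdr` and `sf_attr_size_ok` are hypothetical names for illustration, not the actual xfsprogs definitions.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified mirror of a short-form attribute header.
 * Assumed layout: big-endian 16-bit total size, 8-bit entry count,
 * one pad byte, so sizeof() is 4. Not the real xfsprogs struct. */
struct sf_attr_hdr {
    uint8_t totsize_be[2]; /* total size of the short-form attr area, big-endian */
    uint8_t count;         /* number of attribute entries */
    uint8_t padding;       /* keeps the header at 4 bytes */
};

/* Decode a big-endian 16-bit value byte by byte (endian-independent). */
static uint16_t be16_decode(const uint8_t b[2])
{
    return (uint16_t)((b[0] << 8) | b[1]);
}

/* Returns 1 if the recorded size can hold at least the header itself,
 * 0 if it is "too small". Optionally reports both sizes, mirroring the
 * "size = N, min size = M" fields of the repair message. */
static int sf_attr_size_ok(const struct sf_attr_hdr *hdr,
                           size_t *size_out, size_t *min_out)
{
    size_t size = be16_decode(hdr->totsize_be);
    size_t min = sizeof(struct sf_attr_hdr); /* 4 with this layout */

    if (size_out)
        *size_out = size;
    if (min_out)
        *min_out = min;
    return size >= min;
}
```

With a zeroed header, as in inode 8636461 above, the check fails with size 0 against a minimum of 4, which is the condition xfs_repair -n says it would fix by clearing the attr fork.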

Cheers,

Dave.
-- 
Dave Chinner
david@xxxxxxxxxxxxx
