
To: xfs-master@xxxxxxxxxxx
Subject: [xfs-masters] [Bug 742] Kernel Oops caused by attempting to mount XFS filesystem on stopped md RAID0 device.
From: bugzilla-daemon@xxxxxxxxxxx
Date: Sun, 15 Apr 2007 17:55:53 -0700
Reply-to: xfs-masters@xxxxxxxxxxx
Sender: xfs-masters-bounce@xxxxxxxxxxx
http://oss.sgi.com/bugzilla/show_bug.cgi?id=742





------- Additional Comments From dgc@xxxxxxx  2007-04-15 17:55 CST -------
Unstarted md gives:     
     
# mount -t xfs /dev/md1 /data/test     
mount: /dev/md1: can't read superblock     
#     
     
Stopped md gives:    
Apr  5 09:50:23 TPC-DAL-SUSE2 kernel: XFS: osyncisdsync is now the default, option is deprecated.
Apr  5 09:50:23 TPC-DAL-SUSE2 kernel: XFS: SB read failed
Apr  5 09:50:23 TPC-DAL-SUSE2 kernel: Unable to handle kernel NULL pointer dereference at 0000000000000008 RIP:
Apr  5 09:50:23 TPC-DAL-SUSE2 kernel: <ffffffff8840b333>{:raid0:raid0_unplug+17}
 
  
These are different errors. The unstarted md error comes from the *mount*
process, not from the kernel trying to mount the filesystem. That is, mount
aborts before calling the mount syscall because it can't read the md device.
    
The stopped md passes this test in the mount process and makes the mount
syscall. md is obviously leaving /dev/mdX lying around after it has been
stopped, in a state where some operations on it succeed but others fail
badly.
  
The first failure XFS sees is when it tries to read the superblock via
xfs_readsb(), and that's where the "SB read failed" error in the log comes
from. XFS then enters the mount failure error handling path, where it
invalidates the block devices and then returns the error.
    
The system then oopses while unplugging the underlying block device as it
invalidates the (just allocated) data device, before returning the read
error: xfs_flush_buftarg() calls blk_run_address_space() on the block
device mapping, and that unplug is what oopses (hence raid0_unplug in the
trace). The other filesystems don't do this unplug, which is why they
don't oops the machine.
   
So, yes, I'd agree that this is an MD bug: it leaves enough stubs around
for the block device to be opened successfully, but not enough to error
out every type of operation, so some operations lead to panics.
 
I'll attach a hack to XFS to only do the unplug if we flushed something
to disk. That should work around the problem you are seeing, but it
doesn't prevent the problem if we really had to flush a buffer out. In
other words, the MD driver really needs to be fixed....
    
Cheers, 
 
Dave. 
     
     
     


