
[xfs-masters] [Bug 8414] soft lockup and filesystem corruption on XFS wr

To: xfs-masters@xxxxxxxxxxx
Subject: [xfs-masters] [Bug 8414] soft lockup and filesystem corruption on XFS write (by nfsd)
From: bugme-daemon@xxxxxxxxxxxxxxxxxxx
Date: Wed, 2 May 2007 06:50:26 -0700
Reply-to: xfs-masters@xxxxxxxxxxx
Sender: xfs-masters-bounce@xxxxxxxxxxx
http://bugzilla.kernel.org/show_bug.cgi?id=8414





------- Additional Comments From dgc@xxxxxxx  2007-05-02 06:50 -------
The original problem - the soft lockup - is probably the result
of a massively fragmented file, as we were searching the extent
list when the soft-lockup detector fired. The soft-lockup
detector firing is not, by itself, indicative of an actual bug.

The second problem you report (access to block zero) indicates
something did go wrong and there is on-disk corruption of an
extent tree - there are zeros instead of real block numbers.
You need to run xfs_repair to fix this.
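If you want to confirm the fragmentation theory before repairing, xfs_db can report it, and xfs_repair has a no-modify mode. A rough sketch - the dm-6 path is a placeholder for your actual volume, and the filesystem should be unmounted first:

```shell
# Placeholder device path - substitute your real LVM volume.
DEV=/dev/dm-6

# Report overall fragmentation (actual vs. ideal extent counts).
xfs_db -r -c frag "$DEV"

# Dry run first: -n reports what repair would fix without writing anything.
xfs_repair -n "$DEV"
```

Running the -n pass first lets you attach its output here before any changes are made to the disk.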
 
> attempt to access beyond end of device  
> dm-6: rw=1, want=0, limit=2097152000  
> Buffer I/O error on device dm-6, logical block 18446744073709551615  
> lost page write due to I/O error on dm-6  
  
This shows an attempt to write to a block marked as either a hole or a
delayed allocation. You're not having memory errors, are you?
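For what it's worth, that "logical block 18446744073709551615" is just the unsigned 64-bit representation of -1 - the sentinel block number you see for an extent that has no real disk block (hole or delayed allocate, as above) - which is why the write trips the beyond-end-of-device check. A quick way to see the correspondence:

```shell
# 18446744073709551615 == 2^64 - 1, i.e. the value of -1 when
# interpreted as an unsigned 64-bit block number.
printf '%u\n' "$(( -1 ))"
```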
     
Now, the mount error:   
   
        error = xfs_iget(mp, NULL, sbp->sb_rootino, 0, XFS_ILOCK_EXCL,
                         &rip, 0);
        if (error) {
                cmn_err(CE_WARN, "XFS: failed to read root inode");
                goto error3;
        }
   
means the root inode number in the superblock is wrong. Something went really
wrong here. Can you run:
 
# xfs_db -r -c "sb 0" -c p <dev>  
 
and 
 
# dd if=<dev> bs=512 count=1 | od -Ax 
 
And attach the output so we can see how badly corrupted the superblock 
is? 
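As a quick sanity check on that dd capture: a healthy XFS superblock starts with the magic bytes "XFSB" (0x58465342), so they should be visible right at the start of the dump. A sketch, with the device path again a placeholder:

```shell
# Placeholder device path - substitute your real volume.
DEV=/dev/dm-6

# The first line of the dump should show the superblock magic "XFSB".
dd if="$DEV" bs=512 count=1 2>/dev/null | od -Ax -c | head -n 1
```

If the magic bytes are missing or mangled, that's the first concrete sign of how far the corruption extends.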
 
And finally, repair. What problem did you encounter? Did you just end 
up with everything in lost+found? 
 
Hmmm - this reminds me of problems seen when the filesystem wraps the 
device at 2TB. How big is the LVM volume this partition is on, and where is 
it placed? Along the same train of thought, I'm wondering 
if the device wasn't reconstructed correctly with the new kernel and that 
led to the problem. Were there any other error messages or warnings in the 
syslog from the boot when the problem first happened? 
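On the size question: the limit=2097152000 in the error above is in 512-byte sectors, which works out to roughly 1000 GiB, so comparing it against the kernel's current idea of the device size would show whether the volume came back at a different size after the reboot. A rough sketch (blockdev is from util-linux; the device path is a placeholder):

```shell
# limit=2097152000 from the I/O error is in 512-byte sectors.
echo "$(( 2097152000 * 512 ))"    # 1073741824000 bytes, i.e. ~1000 GiB

# Compare with the size the kernel currently reports for the device
# (placeholder path - substitute the real LVM volume):
blockdev --getsize64 /dev/dm-6
```

If the two numbers disagree, that would support the bad-reconstruction theory.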
 
 
      


