To: Dave Chinner <david@xxxxxxxxxxxxx>
Subject: Re: [PATCH] xfs: fix the xfs_iflush_done callback search
From: Mark Tinguely <tinguely@xxxxxxx>
Date: Thu, 02 Oct 2014 08:27:55 -0500
User-agent: Mozilla/5.0 (X11; FreeBSD amd64; rv:9.0) Gecko/20120122 Thunderbird/9.0
On 10/01/14 17:34, Dave Chinner wrote:
> On Wed, Oct 01, 2014 at 04:18:02PM -0500, Mark Tinguely wrote:
>> Commit "xfs: remove all the inodes on a buffer from the AIL in bulk" made the xfs inode flush callback more efficient by combining all the inode writes on the buffer and the deletions of the inode log items from the AIL. The initial loop in this patch should loop through all the log items on the buffer to see which items have xfs_iflush_done as their callback function. But currently, only the log item passed to the function has its callback compared to xfs_iflush_done. If the log item pointer passed to the function does have the xfs_iflush_done callback function, then all the log items on the buffer are removed from the li_bio_list on the buffer b_fspriv and could be removed from the AIL even though they may not have been written yet.
>
> Looks like a bug, but what I don't know from this description is the symptoms and impact of this bug being hit. Is there a risk of filesystem corruption on crash or power loss? Perhaps it's a data loss issue? I can't tell, and anyone scanning the commit logs to decide whether they need to backport the fix will be asking the same questions. Also, is there a reproducible test case for it?
I was looking in this code for a way an inode could be removed from the AIL but not written to disk.
I have a metadata dump that shows a truncate on two inodes just before a clean unmount. The free space btrees were updated, but neither inode was. A clean unmount waits until the AIL is empty, so something removed the inodes from the AIL without writing the latest changes to disk. Yes, there were earlier changes to the inode/chunk in previous pushes. This results in blocks that are both free and allocated. If we are lucky, we get XFS_WANT_CORRUPTED_GOTO forced shutdowns because of partial frees in xfs_free_ag_extent(). Most of the time the space is reallocated, and that leads to all kinds of data/metadata corruption.
I tried to recreate the problem with truncates on files while unmounting, but have not found the right combination. I have been chasing the ghost of the duplicate allocation corruption problem for months and did not want to overstate my beliefs until I could replicate the problem.
I would advise running xfs_repair in addition to applying the patch; the corruption could have happened a long time ago and could be waiting to trigger.