Thomas J. Teixeira wrote:
On Fri, 2005-04-15 at 10:56, Thomas J. Teixeira wrote:
From the symptoms we are seeing -- no corruption as long as the file
system stays mounted -- I would expect to find some data buffer in a
strange state -- something like dirty but locked in a way that
prevents it from being written, but keeps it around so reads will find
the data. If XFS were simply dropping a dirty bit, I would expect to
see corruption even while the file system stays continuously mounted.
Okay, I've been reading through more of the code and am very suspicious
of this code in xfs_unmountfs:
    /*
     * Flush out the log synchronously so that we know for sure
     * that nothing is pinned. This is important because bflush()
     * will skip pinned buffers.
     */
    xfs_log_force(mp, (xfs_lsn_t)0, XFS_LOG_FORCE | XFS_LOG_SYNC);
    xfs_binval(mp->m_ddev_targp);
xfs_binval seems to be the same as bflush(), and actually invokes
xfs_flush_buftarg which helpfully returns a count of how many buffers it
found pinned. It seems as though unmount should at least make sure
xfs_binval returned 0, although I haven't read enough code to have a
clue about what it should do if it isn't zero: if nothing else, print a
message and maybe try xfs_log_force yet again. This is where I need to
read a bunch more code, but at first guess if a pagebuf got pinned early
in its lifetime, and the pagebuf pin count was never decremented
properly, wouldn't this result in an always-cached-but-never-written
page?
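
To make the suggestion concrete, here is a rough, self-contained model of "flush, then check how many buffers were skipped because they were pinned". This is not XFS code: struct model_buf, model_flush and model_unmount_flush are hypothetical names invented for illustration, and the only behavior borrowed from the discussion above is that the flush skips pinned buffers and returns a count of them.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical stand-in for a cached buffer; NOT the real pagebuf. */
struct model_buf {
    int dirty;      /* nonzero: holds data not yet written to disk */
    int pin_count;  /* nonzero: pinned, so the flush must skip it */
};

/*
 * Write out every unpinned buffer and return the number of buffers
 * that were skipped because they were still pinned.
 */
static int model_flush(struct model_buf *bufs, size_t n)
{
    int pinned = 0;

    for (size_t i = 0; i < n; i++) {
        if (bufs[i].pin_count > 0) {
            pinned++;           /* skipped: stays dirty in the cache */
            continue;
        }
        bufs[i].dirty = 0;      /* "written" in this model */
    }
    return pinned;
}

/*
 * Assumed unmount-time policy: complain if anything is still pinned
 * after the flush instead of silently dropping the count.
 */
static int model_unmount_flush(struct model_buf *bufs, size_t n)
{
    int pinned = model_flush(bufs, n);

    if (pinned != 0)
        fprintf(stderr, "unmount: %d buffer(s) still pinned\n", pinned);
    return pinned;
}
```

In this model a buffer whose pin count never drops to zero stays dirty across every flush, which is the always-cached-but-never-written behavior guessed at above. What a real unmount should do beyond reporting it is left open here.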
For what it's worth, we are running on a Pentium 4 with hyper-threading
enabled.
The code is very funky here.
A metadata structure gets pinned when a transaction is copied into the
in-core log buffers. It gets unpinned when the in-core log buffer's
write completes. The xfs_log_force call with LOG_SYNC set is intended
to wait for that to happen. We are in unmount here, so no new
transactions are possible.
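
A toy model of that lifecycle, for illustration only: the names model_item, model_log, model_commit and model_log_force_sync are made up and do not correspond to real XFS structures or functions. It assumes an item is pinned once per commit into the in-core log and unpinned once when that log write completes, and that a synchronous force completes every outstanding write.

```c
#include <assert.h>

#define MODEL_MAX_PENDING 8

/* Hypothetical metadata item; only the pin count is modeled. */
struct model_item {
    int pin_count;
};

/* Hypothetical in-core log: items whose log write has not completed. */
struct model_log {
    struct model_item *pending[MODEL_MAX_PENDING];
    int npending;
};

/* Copying a transaction into the in-core log pins the item. */
static int model_commit(struct model_log *log, struct model_item *it)
{
    if (log->npending >= MODEL_MAX_PENDING)
        return -1;              /* model limit reached, nothing pinned */
    it->pin_count++;
    log->pending[log->npending++] = it;
    return 0;
}

/*
 * Synchronous force: every outstanding log write completes, and each
 * completion unpins the item it covered.
 */
static void model_log_force_sync(struct model_log *log)
{
    for (int i = 0; i < log->npending; i++)
        log->pending[i]->pin_count--;
    log->npending = 0;
}
```

With no new commits arriving, every pin taken through model_commit is released by the force, so any item with a nonzero pin count afterwards must have been pinned without a matching log write. That is the broken case described next.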
Now, if there are indeed still pinned items after the log_force call
then yes, things are broken, and it would be educational to learn
that.
Steve