>What kernel(s), exactly, is/are showing this problem?
Well, that part is a bit tricky. The base kernel is 2.6.21
but it has a lot of patches, including the one you mentioned.
(The customer is double checking to make sure they actually have
that patch in.)
>> We have a customer who is seeing data not "make it" to disk on a
>> stress test that involves doing an fsync() or fdatasync() and then
>> deliberately rebooting the machine (to simulate a failure; note
>> that the underlying RAID has its own battery backup and this is
>> just one of many different parts of the stress-test).
>
>What is the symptom? The file size does not change? The file is the
>right size but has no data in it?
Their system has a large number of databases (on the order of 50)
all open simultaneously, and uses direct I/O (with a call to
fdatasync()) to make entries in many of them; apparently *some*
of them get corrupted. Exactly how, I do not know. Naturally, we
cannot reproduce this with our own system, and when they tried a
simplified system with just one database the problem went away on
their end too. (Agh.)
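For reference, the write pattern I understand them to be using looks
roughly like the sketch below. This is not their code: the helper name
write_block_direct(), the 4096-byte block size, and the fallback to
buffered I/O when O_DIRECT is refused are all my own assumptions, made
only to show an aligned direct write followed by fdatasync().

```c
#define _GNU_SOURCE          /* exposes O_DIRECT in <fcntl.h> on Linux/glibc */
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

#ifndef O_DIRECT
#define O_DIRECT 0           /* non-Linux fallback: plain buffered I/O */
#endif

/* Assumed alignment; the real requirement depends on the device and fs. */
#define BLOCK_SIZE 4096

/*
 * Hypothetical helper: write one zero-padded block at offset 0 of `path`
 * using direct I/O, then call fdatasync() before reporting success.
 * Returns 0 on success, -1 on failure.
 */
int write_block_direct(const char *path, const void *data, size_t len)
{
    void *buf = NULL;
    ssize_t n;
    int fd, err, rc = -1;

    assert(path != NULL && data != NULL);
    if (len > BLOCK_SIZE) {
        errno = EINVAL;
        return -1;
    }

    /* O_DIRECT wants the buffer address, length and offset aligned. */
    err = posix_memalign(&buf, BLOCK_SIZE, BLOCK_SIZE);
    if (err != 0) {
        errno = err;
        return -1;
    }
    memset(buf, 0, BLOCK_SIZE);
    memcpy(buf, data, len);

    fd = open(path, O_WRONLY | O_CREAT | O_DIRECT, 0644);
    if (fd < 0 && errno == EINVAL)   /* this filesystem refuses O_DIRECT */
        fd = open(path, O_WRONLY | O_CREAT, 0644);
    if (fd < 0) {
        free(buf);
        return -1;
    }

    n = pwrite(fd, buf, BLOCK_SIZE, 0);
    if (n == BLOCK_SIZE && fdatasync(fd) == 0)
        rc = 0;                      /* data written and flush requested */
    else if (n >= 0 && n != BLOCK_SIZE)
        errno = EIO;                 /* treat a short write as an error */

    if (close(fd) != 0)
        rc = -1;
    free(buf);
    return rc;
}
```

A caller would treat the record as durable only when this returns 0,
e.g. write_block_direct("db0.dat", rec, rec_len). Note that fdatasync()
covers the file's data but not the directory entry of a newly created
file, so a real application would probably also fsync() the containing
directory after creating the file.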
>No, the filemap_fdatawrite() has already been executed by this
>point [by do_fsync()].
D'oh! I somehow missed this in eyeballing the code paths.
>However, I do ask exactly what kernel version you are running ...
It is mostly 2.6.21. We brought in a large number of miscellaneous
XFS fixes, not including the ones that remove the "behavior" layer
stuff, but definitely including this one:
>http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commit;h=978b7237123d007b9fa983af6e0e2fa8f97f9934
(which of course necessitated a bit of hacking on the patches to
fit, as a lot of the later ones assume the bhv* layer has been
removed).
Chris