I posted about this data corruption problem with
nfs+xfs a couple of months ago. I thought we had solved the
problem by blaming the gigE driver. However, it appears
that this is not the case. The code we were running had enough
bugs in it that it was difficult to isolate the problem (plus
there was one other unrelated hardware problem).
Problem: Files are corrupted when they are written under
load. Load is defined as 20+ clients writing to the same
filesystem, but different files, simultaneously. However,
with the testcase we are using now, we can see corruption
with as few as 4 clients writing, but it is more difficult
to generate errors this way.
The IO of the test code is relatively simple. It reads
in some data over NFS. Each case reads the exact same file(s). It then
massages the data and writes out a temporary file. Each file has
about 20 different records, and each record is written as 1 big
unformatted write in fortran. Each run creates 17 files. The
temporary files are written to the NFS filesystem. Each file is then
read back in, 1 record chunk at a time, massaged, then
written back out in the same manner as before, but to a different file.
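For anyone who wants to reproduce the I/O pattern without the Fortran code, the record framing can be approximated as below. This is only a sketch under assumptions: it assumes the compiler frames each unformatted sequential record with a 4-byte little-endian length marker before and after the payload (common on x86, but not confirmed for this test), and the names write_record and read_records are made up for illustration.

```python
import struct

# Assumed framing: 4-byte little-endian length before and after each record.
MARKER = struct.Struct("<i")


def write_record(f, payload: bytes) -> None:
    """Write one record as a single framed unit (one 'big unformatted write')."""
    f.write(MARKER.pack(len(payload)))
    f.write(payload)
    f.write(MARKER.pack(len(payload)))


def read_records(f):
    """Yield record payloads one at a time, checking both length markers."""
    while True:
        head = f.read(MARKER.size)
        if not head:
            return  # clean end of file
        if len(head) != MARKER.size:
            raise ValueError("truncated leading record marker")
        (n,) = MARKER.unpack(head)
        if n < 0:
            raise ValueError("negative record length")
        payload = f.read(n)
        tail = f.read(MARKER.size)
        if (len(payload) != n or len(tail) != MARKER.size
                or MARKER.unpack(tail)[0] != n):
            raise ValueError("record marker mismatch")
        yield payload
```

A marker mismatch on read-back would be one cheap early sign that a record was damaged on the server side.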
Configuration:
Clients: Fedora Core 1 + Updates, e100 network driver
Server: Fedora Core 1 + Updates, kernel updated to 2.4.26,
- Broadcom (bcm5700, tg3) and Acenic gigE tested, both fail
- Filesystems are mounted over FC, qla2200 v6.01.10 driver used
- Filesystem is striped with LVM across multiple targets
to increase performance.
In the past, other kernels for the clients and servers have been
tested and showed the same problem.
The tests had the server exporting the xfs filesystem over
nfs. Other servers were not tried this time (to test for bad hardware),
but this was done previously and the same results occurred. Tests were
conducted with the clients mounting the filesystems over UDP
and over TCP. Both cases failed.
Tests run against an ext2 filesystem exported over nfs
(LVM striped, FC disks, qla2200 driver) do not exhibit this problem.
Tests configured to write the temporary file and final output
file to local disk never show corruption. However, there isn't
much load on each system in this case. The only heavy load
is on the read of the initial data, but I never get any file
corruption due to these reads.
I looked at all of the bits just after reads and just before writes.
From this I can conclude that corruption only occurs during writes, and
not reads.
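A check of this kind can be made repeatable by recording a checksum of each record just before it is written and comparing against checksums of what is read back. The following is a minimal sketch, not the code used in the test; record_checksums and first_mismatch are hypothetical helper names, and CRC32 is just a convenient choice.

```python
import zlib


def record_checksums(records):
    """Return a CRC32 for each record (an iterable of bytes objects)."""
    return [zlib.crc32(r) & 0xFFFFFFFF for r in records]


def first_mismatch(before, after):
    """Return the index of the first differing checksum, or None if equal."""
    for i, (a, b) in enumerate(zip(before, after)):
        if a != b:
            return i
    if len(before) != len(after):
        # One list is shorter: the first missing entry counts as a mismatch.
        return min(len(before), len(after))
    return None
```

If the checksums taken just before the write match the source data but the ones taken after a later read do not, the damage happened somewhere between the client write and the data landing on disk.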
I checked the source code and I could not find anywhere that
HAVE_REFCACHE was enabled. So functions like xfs_refcache_purge_some
should be no-ops.
When there is corruption of the final output files, it appears that
all corruption happens on 4KB boundaries. The first byte that
differs is always at POS%4096 = 0. The last corrupt byte is
always at (POS+1)%4096 = 0, i.e. at the end of a page.
The corruption usually spans multiple pages, but
it always starts and stops on page boundaries.
The corrupted data are non-zero, or at least not all zero.
They appear to be more or less random.
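The page-boundary observation can be checked mechanically by comparing a known-good output file against a corrupted one. The sketch below is an illustration under assumptions, not the tool actually used: diff_ranges and page_aligned are hypothetical names, and the 4096-byte page size is taken from the description above. Ranges are reported as [start, end), so "last corrupt byte at (POS+1)%4096 = 0" corresponds to end % 4096 == 0.

```python
PAGE = 4096  # page size assumed from the description above


def diff_ranges(good_path, bad_path, chunk=1 << 20):
    """Return a list of (start, end) byte ranges, end exclusive, that differ."""
    ranges = []
    start = None  # start offset of the differing run currently open, if any
    pos = 0       # absolute file offset of the current chunk
    with open(good_path, "rb") as g, open(bad_path, "rb") as b:
        while True:
            x, y = g.read(chunk), b.read(chunk)
            if not x and not y:
                break
            n = max(len(x), len(y))
            if x == y:
                # Fast path: identical chunk closes any open run.
                if start is not None:
                    ranges.append((start, pos))
                    start = None
            else:
                for i in range(n):
                    differs = i >= len(x) or i >= len(y) or x[i] != y[i]
                    if differs and start is None:
                        start = pos + i
                    elif not differs and start is not None:
                        ranges.append((start, pos + i))
                        start = None
            pos += n
    if start is not None:
        ranges.append((start, pos))
    return ranges


def page_aligned(ranges, page=PAGE):
    """True if every differing range starts and ends on a page boundary."""
    return all(s % page == 0 and e % page == 0 for s, e in ranges)
```

Note that a corrupt byte which happens to equal the original byte will split or shorten a reported range, so with random-looking corruption a range may occasionally appear to start or end a byte or two off a boundary.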
I will try any suggestions people might have. I will try to
reduce the size of the test, but running 16 cases of the test
requires over 200 GB.
Thanks,
Craig