See
http://marc.theaimsgroup.com/?t=111642565400004&r=1&w=2
for the original report.
Summary
-------
When a large (>4Gb) file on an XFS realtime partition is opened with the
O_DIRECT flag and read, it has been observed that reads above the 4Gb point
sometimes return data from the wrong location in the file.
This problem only occurs when the file has been written to the disk under a
particular set of conditions which cause XFS to write it in a sequence of
contiguous filesystem extents. These conditions are:
- The data is being written to a realtime subvolume.
- No other processes are writing data to the disk concurrently.
- The file was opened with the O_DIRECT flag.
- There is a contiguous block of >4Gb of available filesystem extents
(for example, on a freshly formatted XFS partition).
- The XFS filesystem has been created with a blocksize of 4kb or
greater.
Symptoms
--------
Reads at file offsets between the 4Gb point and the 8Gb point wrap around
to the beginning of the file, reads from the 12Gb point until the 16Gb point
wrap around to the 8Gb point, and so on. These symptoms have been reproduced on
Linux 2.4.29 compiled for MIPS and on Linux 2.4.29 compiled for x86.
Detail
------
The maximum number of blocks that an XFS extent record may refer to is
MAXEXTLEN, which is defined as 0x1FFFFF blocks. The maximum amount of data
that one record can map is MAXEXTLEN * blocksize (mkfs.xfs sets the
blocksize to 4kb by default). With this default, each extent record can
cover at most 0x1FFFFF000 bytes, which is just under 8Gb.
When a request is made to read from an offset in a file, XFS translates this
offset into an extent record and a delta (offset within the extent) (see
xfs_bmap_do_search_extents in xfs_bmap.c, and xfs_imap_to_bmap in
xfs_iomap.c). The delta is stored inside xfs_iomap_t (see xfs_iomap.h) and
is 32 bits wide. When the blocksize is 4kb or greater, a single extent can
be larger than 4Gb, so this delta value is no longer large enough to contain
the offset inside the extent and wraps back down to zero. This produces the
anomalous read results.
This problem does not occur on XFS data partitions because they make
use of allocation groups, which extent records may not span.
Allocation groups have a maximum size of just under 4Gb.
A patch is attached which resolves this problem by increasing the storage
size of iomap_delta to loff_t. Please also see the comment attached to
revision 1.2 of xfs_iomap.h in the SGI CVS at
http://oss.sgi.com/cgi-bin/cvsweb.cgi/linux-2.4-xfs/fs/xfs/xfs_iomap.h
I believe that iomap_delta should probably have been widened at the same
time as iomap_bsize.
Thanks,
--
Chris Elston
Katalix Systems Ltd
http://www.katalix.com
Catalysts for your Embedded Linux software development
begin 666 XFS_RT_4GB.patch
M+2TM(%1$0S<W-2TR+C0N,CDO9G,O>&9S+WAF<U]I;VUA<"YH(" @(#(P,#4M
M,3 M,S$@,34Z,S<Z-30N,# P,# P,# P("LP,# P"BLK*R!0051#2$5$+V9S
M+WAF<R]X9G-?:6]M87 N:" @,C P-2TQ,"TS,2 Q-3HQ.#HR.2XP,# P,# P
M,# @*S P,# *0$ @+3@V+#<@*S@V+#<@0$ *(" @(" @("!X9G-?8G5F=&%R
M9U]T(" @(" @(" @(" J:6]M87!?=&%R9V5T.PH@(" @(" @(&QO9F9?=" @
M(" @(" @(" @(" @(" @(&EO;6%P7V]F9G-E=#L@(" O*B!O9F9S970@;V8@
M;6%P<&EN9RP@8GET97,@*B\*(" @(" @("!L;V9F7W0@(" @(" @(" @(" @
M(" @("!I;VUA<%]B<VEZ93L@(" @+RH@<VEZ92!O9B!M87!P:6YG+"!B>71E
M<R J+PHM(" @(" @('-I>F5?=" @(" @(" @(" @(" @(" @(&EO;6%P7V1E
M;'1A.R @(" O*B!O9F9S970@:6YT;R!M87!P:6YG+"!B>71E<R J+PHK(" @
M(" @(&QO9F9?=" @(" @(" @(" @(" @(" @(&EO;6%P7V1E;'1A.R @(" O
M*B!O9F9S970@:6YT;R!M87!P:6YG+"!B>71E<R J+PH@(" @(" @(&EO;6%P
M7V9L86=S7W0@(" @(" @(" @(&EO;6%P7V9L86=S.PH@?2!X9G-?:6]M87!?
$=#L*"@``
`
end