XFS peculiar behavior
klonatos at ics.forth.gr
Thu Jun 24 10:35:42 CDT 2010
On 6/24/2010 6:21 PM, Eric Sandeen wrote:
> On 06/24/2010 09:11 AM, Yannis Klonatos wrote:
>> Hello again,
>> First of all, thank you all for your quick replies. I attach
>> all the information you requested in your responses.
>> 1) The output of xfs_info is the following:
>> meta-data=/dev/sdf               isize=256    agcount=32, agsize=45776328 blks
>>          =                       sectsz=512   attr=0
>> data     =                       bsize=4096   blocks=1464842496, imaxpct=25
>>          =                       sunit=0      swidth=0 blks, unwritten=1
>> naming   =version 2              bsize=4096
>> log      =internal               bsize=4096   blocks=32768, version=1
>>          =                       sectsz=512   sunit=0 blks, lazy-count=0
>> realtime =none                   extsz=4096   blocks=0, rtextents=0
>> 2) The output of xfs_bmap for the lineitem.MYI table of the TPC-H
>> workload in one run is:
>>  EXT: FILE-OFFSET              BLOCK-RANGE               AG AG-OFFSET           TOTAL
>>    0: [0..6344271]:            11352529416..11358873687  31 (72..6344343)     6344272
>>    1: [6344272..10901343]:     1464842608..1469399679     4 (112..4557183)    4557072
>>    2: [10901344..18439199]:    1831053200..1838591055     5 (80..7537935)     7537856
>>    3: [18439200..25311519]:    2197263840..2204136159     6 (96..6872415)     6872320
>>    4: [25311520..26660095]:    2563474464..2564823039     7 (96..1348671)     1348576
>> Given that all disk blocks are in units of 512-byte blocks, if I
>> interpret the output correctly, the first block of the file is at
>> block 1465352792 = 698.4GByte offset and the last block is at
>> 5421.1GByte offset, meaning that this specific table is split over
>> a 4.7TByte distance.
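
The offset arithmetic above can be checked with a short sketch. It assumes only the extent list from the xfs_bmap output and its 512-byte units; the variable and function names are illustrative. Because it uses the bmap listing alone, its results may differ somewhat from the figures quoted in the message.

```python
# Sketch: turn xfs_bmap sector numbers into byte offsets and a physical span.
SECTOR_BYTES = 512   # xfs_bmap reports positions in 512-byte units
GIB = 2 ** 30

# (first_sector, last_sector) of each extent, copied from the listing above.
extents = [
    (11352529416, 11358873687),
    (1464842608, 1469399679),
    (1831053200, 1838591055),
    (2197263840, 2204136159),
    (2563474464, 2564823039),
]

def to_gib(sectors):
    """Convert a count of 512-byte sectors to GiB."""
    return sectors * SECTOR_BYTES / GIB

lowest = min(start for start, _ in extents)    # lowest sector used by the file
highest = max(end for _, end in extents)       # highest sector used by the file
file_sectors = sum(end - start + 1 for start, end in extents)

lowest_gib = to_gib(lowest)
highest_gib = to_gib(highest)
span_gib = to_gib(highest - lowest)
file_gib = to_gib(file_sectors)

print(f"lowest offset : {lowest_gib:.1f} GiB")
print(f"highest offset: {highest_gib:.1f} GiB")
print(f"physical span : {span_gib:.1f} GiB")
print(f"file size     : {file_gib:.1f} GiB")
```

The span is measured between the lowest and highest sectors the file occupies, not between its first and last logical blocks, which in this listing are not the same thing.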
> The file started out in the last AG, and then had to wrap around,
> because it hit the end of the filesystem. :) It was then somewhat
> sequential in AGs 4,5,6,7 after that, though not perfectly so.
> This run was with a clean filesystem? Was the mountpoint
> /mnt/test? XFS distributes new directories into new AGs (allocation
> groups, or disk regions) for parallelism, and then files in those dirs
> start populating the same AG. So if /mnt/test/mysql/tpch ended up in
> the last AG (#31) then the file likely started there, too.
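
To relate the BLOCK-RANGE column to allocation groups, the following sketch maps a 512-byte sector number to an AG index and an offset inside that AG. It assumes the agsize and bsize values from the xfs_info output and that AGs are laid out back to back from the start of the device; the function name is illustrative.

```python
# Sketch: locate a sector reported by xfs_bmap inside its allocation group.
AG_SIZE_FSB = 45776328      # agsize from xfs_info, in filesystem blocks
FS_BLOCK_BYTES = 4096       # bsize from xfs_info
SECTOR_BYTES = 512          # unit used by xfs_bmap
SECTORS_PER_AG = AG_SIZE_FSB * (FS_BLOCK_BYTES // SECTOR_BYTES)

def locate(sector):
    """Return (ag_index, sector_offset_within_ag) for a device sector."""
    return divmod(sector, SECTORS_PER_AG)

# Start sector of each extent in the first run.
for start in (11352529416, 1464842608, 1831053200, 2197263840, 2563474464):
    ag, offset = locate(start)
    print(f"sector {start}: AG {ag}, offset {offset}")
```

For the extents listed, this reproduces the AG and AG-OFFSET columns printed by xfs_bmap.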
Ok. Your argument makes a lot of sense. However, this is a clean file
system (mount point /mnt/test), and I am certain that the files copied
before the aforementioned index file (lineitem.MYI) require 28GByte of
space in total. So this still raises the question of why XFS split these
files in a way that caused the whole file system space to be "covered",
and the lineitem file to be placed starting at the end of the FS (as you
mentioned).
Also, based on my limited XFS knowledge and background, I seriously doubt
that parallelism along AGs is an issue here, since the copy utility
copies files sequentially, so a new AG would be allocated for the
/mnt/test/mysql/tpch directory and would be populated completely with
all its files before another AG was created. This is true, of course,
only if your observation holds.
> Also, the "inode32" allocator biases data towards the end of the
> filesystem, because inode numbers in xfs reflect their on-disk location,
> and to keep inodes numbers below 2^32, it must save space in the lower
> portions of the filesystem. You might want to re-test with a fresh
> filesystem mounted with the "inode64" mount option.
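
As a rough illustration of the inode32 point, the sketch below estimates which AGs of this filesystem can hold inodes whose numbers fit in 32 bits. It assumes an inode number is composed as (AG index, block within AG, slot within block), with the field widths derived from agsize, bsize and isize in the xfs_info output. This is a simplified model with illustrative names, not the kernel's actual allocation policy.

```python
# Sketch: which AGs can contain only inode numbers below 2^32.
# Assumed layout of an inode number (simplified):
#   (ag_index << (agblklog + inopblog)) | (block_in_ag << inopblog) | slot
AG_COUNT = 32               # agcount from xfs_info
AG_SIZE_FSB = 45776328      # agsize from xfs_info, in filesystem blocks
FS_BLOCK_BYTES = 4096       # bsize from xfs_info
INODE_BYTES = 256           # isize from xfs_info

# Bits needed to address any block within an AG (log2 rounded up).
agblklog = (AG_SIZE_FSB - 1).bit_length()
# Bits needed to address an inode slot within a block (exact log2).
inopblog = (FS_BLOCK_BYTES // INODE_BYTES).bit_length() - 1
bits_per_ag = agblklog + inopblog

# An AG qualifies if even its largest representable inode number is < 2^32.
inode32_ags = [
    ag for ag in range(AG_COUNT)
    if ((ag + 1) << bits_per_ag) - 1 < 2 ** 32
]

print("agblklog:", agblklog, "inopblog:", inopblog)
print("AGs usable for 32-bit inode numbers:", inode32_ags)
```

Under these assumptions only AGs 0 to 3 qualify, which fits the description of space being reserved in the lower portion of the filesystem, although the allocator weighs more than this single check.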
>> However, in another run (with a clean file system again)
>>  EXT: FILE-OFFSET              BLOCK-RANGE               AG AG-OFFSET
>>    0: [0..26660095]:           11352529416..11379189511  31 (72..26660167)
>> 3) For the copy, as I mentioned in my previous mail, I copied the
>> database over NFS using the Linux cp -R command.
>> Thus, I believe all the files are copied sequentially, one after the
>> other, with no other concurrent write operations running in the
>> background. The file system was pristine before the cp, with no files;
>> only the mount directory had been created (all the other necessary
>> files and directories are created by the cp program).
> IIRC, copies over NFS can affect xfs allocator performance, because
> (IIRC) it tends to close the filehandle periodically and xfs loses the
> allocator context. We used to have a filehandle cache which held them
> open, but that went away some time ago.
> Dave will probably correct significant swaths of this information for
> me, though ;)
>> 4) The version of xfsprogs is 2.9.4 (acquired with xfs_info -v) and the
>> version of the kernel is 2.6.18-164.11.1.el5.
> Ah! A Red Hat kernel; have you asked your Red Hat support folks for
> help on this issue?
I suppose that they will redirect me back to you, won't they? :-)
>> If you require any further information, let me know. I can
>> also provide you with the complete data set if you feel it is
>> necessary for reproducing the issue.
>> Yannis Klonatos
>>>> Hi all!
>>>> I have come across the following peculiar behavior in XFS
>>>> and I would appreciate any information anyone could provide.
>>>> In our lab we have a system that has twelve 500GByte hard
>>>> disks (total capacity 6TByte), connected to an Areca
>>>> (ARC-1680D-IX-12) SAS storage controller. The disks are configured
>>>> as a RAID-0 device. I then create a clean XFS filesystem on top of
>>>> the RAID volume, using the whole capacity. We use this test setup
>>>> to measure performance improvement for a TPC-H experiment. We copy
>>>> the database onto the clean XFS filesystem using the cp utility.
>>>> The database used in our experiments is 56GBytes in size
>>>> (data + indices).
>>>> The problem is that I have noticed that XFS may (not always)
>>>> split a table over a large disk distance. For example, in one run
>>>> I noticed that a file of 13GByte is split over a 4.7TByte
>>>> distance (I calculate this distance by subtracting the first block
>>>> used for the file from the final one. The two disk block values
>>>> are acquired using the FIBMAP ioctl).
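
A measurement of this kind could look roughly like the sketch below. It is not the script used for the numbers in this thread. It assumes Linux, sufficient privileges to issue FIBMAP (normally root), and the FIBMAP and FIGETBSZ request values from <linux/fs.h>; the function names are illustrative.

```python
# Sketch: physical distance between a file's first and last blocks via FIBMAP.
import fcntl
import os
import struct

FIBMAP = 1     # _IO(0x00, 1) in <linux/fs.h>
FIGETBSZ = 2   # _IO(0x00, 2) in <linux/fs.h>

def physical_block(fd, logical_block):
    """Map a logical block of the open file to its block number on the device."""
    result = fcntl.ioctl(fd, FIBMAP, struct.pack("i", logical_block))
    return struct.unpack("i", result)[0]

def block_distance_bytes(first, last, block_size):
    """Distance in bytes between two block numbers of the given block size."""
    return abs(last - first) * block_size

def first_last_distance(path):
    """Return the byte distance between the first and last blocks of a file."""
    with open(path, "rb") as f:
        fd = f.fileno()
        # FIGETBSZ reports the block size that FIBMAP block numbers refer to.
        raw = fcntl.ioctl(fd, FIGETBSZ, struct.pack("i", 0))
        block_size = struct.unpack("i", raw)[0]
        nblocks = (os.fstat(fd).st_size + block_size - 1) // block_size
        if nblocks == 0:
            raise ValueError("file is empty")
        first = physical_block(fd, 0)
        last = physical_block(fd, nblocks - 1)
    return block_distance_bytes(first, last, block_size)
```

FIBMAP reports block numbers in units of the filesystem block size rather than 512-byte sectors, and the first and last logical blocks are not necessarily the lowest and highest physical ones, so this figure can differ from a span computed over all extents.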
>>>> Is there some reasoning behind this (peculiar) behavior? I
>>>> would expect that since the underlying storage is so large, and
>>>> the dataset is so small, XFS would try to minimize disk seeks and
>>>> thus place the file sequentially on disk. Furthermore, I understand
>>>> that XFS may leave some blocks unused between subsequent file
>>>> blocks in order to handle any appending writes that may come
>>>> afterward. But I wouldn't expect such a large splitting of a single
>>>> file.
>>>> Any help?