| To: | Stewart Smith <stewart@xxxxxxxxx> |
|---|---|
| Subject: | Re: XFS_IOC_RESVSP64 versus XFS_IOC_ALLOCSP64 with multiple threads |
| From: | Sam Vaughan <sjv@xxxxxxx> |
| Date: | Mon, 13 Nov 2006 15:53:54 +1100 |
| Cc: | xfs@xxxxxxxxxxx |
| In-reply-to: | <1163390942.14517.12.camel@localhost.localdomain> |
| References: | <1163381602.11914.10.camel@localhost.localdomain> <965ECEF2-971D-46A1-B3F2-C6C1860C9ED8@sgi.com> <1163390942.14517.12.camel@localhost.localdomain> |
| Sender: | xfs-bounce@xxxxxxxxxxx |
On 13/11/2006, at 3:09 PM, Stewart Smith wrote:

> On Mon, 2006-11-13 at 13:58 +1100, Sam Vaughan wrote:
>> Are the two processes in your test writing files to the same directory
>> as each other?  If so then their allocations will go into the same AG
>> as the directory by default, hence the fragmentation.  If you can
>> limit yourself to an AG's worth of data per directory then you should
>> be able to avoid fragmentation using the default allocator.  If you
>> need to reserve more than that per AG, then the files will most likely
>> start interleaving again once they spill out of their original AGs.
>> If that's the case then the upcoming filestreams allocator may be your
>> best bet.

Just to be clear, are we talking about intra-file fragmentation, i.e. file data laid out discontiguously on disk, or inter-file fragmentation, where each file is contiguous on disk but the files from different processes are getting interleaved?

Also, are there just a couple of user data files, each of them potentially much larger than the size of an AG, or do you split the data up into many files, e.g. datafile01.dat ... datafile99.dat ...?  If you have the flexibility to break the data up at arbitrary points into separate files, you could get optimal allocation behaviour by starting a new directory as soon as the files in the current one are large enough to fill an AG.

The problem with the filestreams allocator is that it will only dedicate an AG to a directory for a fixed and short period of time after the last file was written to it.  This works well to limit the resource drain on AGs when running file-per-frame video captures, but not so well with a database that writes its data in a far less regimented and timely way.

The following two tests illustrate the standard allocation policy I'm referring to here.  I've simplified the output to take advantage of the fact that each file ends up as just one extent, but you can run `xfs_bmap -v` over all the files to verify that's the case.
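By the way, since the subject line is about the ioctls themselves: xfs_io's allocsp and resvsp commands are just wrappers around XFS_IOC_ALLOCSP64 and XFS_IOC_RESVSP64.  The tests below use allocsp, which extends the file out to the given offset (reading back zeroes), while resvsp reserves the same space as unwritten extents without changing the file size.  I'd expect the on-disk allocation pattern to be the same either way, so the results below should apply to both.  A quick comparison, untested here and with made-up file names, would look something like:

$ touch alloc.dat resv.dat
$ xfs_io -c 'allocsp 100m 0' alloc.dat    # XFS_IOC_ALLOCSP64
$ xfs_io -c 'resvsp 0 100m' resv.dat      # XFS_IOC_RESVSP64
$ ls -l alloc.dat resv.dat                # only alloc.dat reports a 100MB size
$ xfs_bmap -v alloc.dat resv.dat          # both should show one ~100MB extent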
Standard SLES 10 kernel, standard mount options:

$ uname -r
2.6.16.21-0.8-smp
$ xfs_info .
meta-data=/dev/sdb8            isize=256    agcount=16, agsize=3267720 blks
         =                     sectsz=512   attr=0
data     =                     bsize=4096   blocks=52283520, imaxpct=25
         =                     sunit=0      swidth=0 blks, unwritten=1
naming   =version 2            bsize=4096
log      =internal             bsize=4096   blocks=25529, version=1
         =                     sectsz=512   sunit=0 blks
realtime =none                 extsz=65536  blocks=0, rtextents=0
$ mount | grep sdb8
/dev/sdb8 on /spare200 type xfs (rw)
$

Create two directories and start two processes off, one per directory.  The processes preallocate ten 100MB files each.  The result is that their data goes into separate AGs on disk, all nicely contiguous:

$ mkdir a b
$ for dir in a b; do
> for file in `seq 0 9`; do
> touch $dir/$file
> xfs_io -c 'allocsp 100m 0' $dir/$file
> done &
> done; wait
[1] 5649
[2] 5650
$ for file in `seq 0 9`; do
> bmap_a=`xfs_bmap -v a/$file | tail -1`
> bmap_b=`xfs_bmap -v b/$file | tail -1`
> ag_a=`echo $bmap_a | awk '{print $4}'`
> ag_b=`echo $bmap_b | awk '{print $4}'`
> br_a=`echo $bmap_a | awk '{printf "%-18s", $3}'`
> br_b=`echo $bmap_b | awk '{printf "%-18s", $3}'`
> echo a/$file: $ag_a "$br_a" b/$file: $ag_b "$br_b"
> done
a/0: 8 209338416..209543215 b/0: 9 235275936..235480735
a/1: 8 209543216..209748015 b/1: 9 235480736..235685535
a/2: 8 209748016..209952815 b/2: 9 235685536..235890335
a/3: 8 209952816..210157615 b/3: 9 235890336..236095135
a/4: 8 210157616..210362415 b/4: 9 236095136..236299935
a/5: 8 210362416..210567215 b/5: 9 236299936..236504735
a/6: 8 210567216..210772015 b/6: 9 236504736..236709535
a/7: 8 210772016..210976815 b/7: 9 236709536..236914335
a/8: 8 210976816..211181615 b/8: 9 236914336..237119135
a/9: 8 211181616..211386415 b/9: 9 237119136..237323935
$

Now do the same thing, except have the processes write their files into the same directory using different file names.  This time the files are allocated on top of each other.

$ dir=c
$ mkdir $dir
$ for process in 1 2; do
> for file in `seq 0 9`; do
> touch $dir/$process.$file
> xfs_io -c 'allocsp 100m 0' $dir/$process.$file
> done &
> done; wait
[1] 5985
[2] 5986
$ for file in c/*; do
> bmap=`xfs_bmap -v $file | tail -1`
> ag=`echo $bmap | awk '{print $4}'`
> br=`echo $bmap | awk '{printf "%-18s", $3}'`
> echo $file: $ag "$br"
> done
c/1.0: 11 287559456..287764255
c/1.1: 11 287969056..288173855
c/1.2: 11 288378656..288583455
c/1.3: 11 288788256..288993055
c/1.4: 11 289197856..289402655
c/1.5: 11 289607456..289812255
c/1.6: 11 290017056..290221855
c/1.7: 11 290426656..290631455
c/1.8: 11 290836264..291041063
c/1.9: 11 291450664..291655463
c/2.0: 11 287764256..287969055
c/2.1: 11 288173856..288378655
c/2.2: 11 288583456..288788255
c/2.3: 11 288993056..289197855
c/2.4: 11 289402656..289607455
c/2.5: 11 289812256..290017055
c/2.6: 11 290221856..290426655
c/2.7: 11 290631464..290836263
c/2.8: 11 291041064..291245863
c/2.9: 11 291245864..291450663
$

Now in your case you're using different directories, so your files are probably OK at the start of day.  Once the AGs they start in fill up though, the files for both processes will start getting allocated from the next available AG.  At that point, allocations that started out looking like the first test above will end up looking like the second.

The filestreams allocator will stop this from happening for applications that write data regularly, like video ingest servers, but I wouldn't expect it to be a cure-all for your database app because your writes could have large delays between them.  Instead, I'd look into ways to break up your data into AG-sized chunks, starting a new directory every time you go over that magic size.

Sam
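P.S.  In case it helps, here's the kind of thing I mean by AG-sized chunks, as a completely untested sketch.  The directory and variable names are made up, and the numbers come from your xfs_info output above: agsize is 3267720 4k blocks, a little under 12.5GB, so each directory is kept just inside that before a new one is started.

# Untested sketch: preallocate numbered 100MB files, starting a new
# directory before the current one grows past one AG's worth of data.
agbytes=$((3267720 * 4096))        # agsize from xfs_info above
filebytes=$((100 * 1024 * 1024))
limit=$((agbytes - filebytes))     # rotate before the next file would spill past the AG

dir=0
used=0
mkdir chunk$dir
for file in `seq 0 399`; do
    if [ $used -gt $limit ]; then
        dir=$((dir + 1))
        used=0
        mkdir chunk$dir
    fi
    touch chunk$dir/$file
    xfs_io -c 'allocsp 100m 0' chunk$dir/$file
    used=$((used + filebytes))
done

With those numbers each directory takes 127 of the 100MB files before a new one is started, which should keep its contents just inside a single AG and so keep each directory's allocations contiguous, the way they are in the first test above.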