Preallocation with direct IO?
Amit Sahrawat
amit.sahrawat83 at gmail.com
Sat Dec 31 06:46:00 CST 2011
On Sat, Dec 31, 2011 at 2:13 AM, Dave Chinner <david at fromorbit.com> wrote:
> On Fri, Dec 30, 2011 at 08:37:00AM +0530, Amit Sahrawat wrote:
>> On Fri, Dec 30, 2011 at 2:27 AM, Dave Chinner <david at fromorbit.com> wrote:
>> > On Thu, Dec 29, 2011 at 01:10:49PM +0000, amit.sahrawat83 at gmail.com wrote:
>> >> Hi, I am using a test setup that writes from multiple threads
>> >> using direct IO. The write buffer size is 512KB. After running
>> >> this continuously for a long duration, I observe that the number
>> >> of extents in each file becomes huge (2K..4K..). Each extent is
>> >> 512KB (aligned to the write buffer size). I would like a low
>> >> number of extents (i.e., reduced fragmentation). With buffered
>> >> IO, preallocation works well along with the mount option
>> >> 'allocsize'. Is there anything that can be done for direct IO?
>> >> Please advise on reducing fragmentation with direct IO.
>> >
>> > Direct IO does not do any implicit preallocation. The filesystem
>> > simply gets out of the way of direct IO as it is assumed you know
>> > what you are doing.
>> This is the supporting line I was looking for.
>> >
>> > i.e. you know how to use the fallocate() or ioctl(XFS_IOC_RESVSP64)
>> > calls to preallocate space or to set up extent size hints to use
>> > larger allocations than the IO being done during syscalls...
>> I tried to make use of preallocating space using
>> ioctl(XFS_IOC_RESVSP64) - but over time - this is also not working
>> well with the Direct I/O.
>
> Without knowing how you are using preallocation, I cannot comment on
> this. Can you describe how your application does IO (size,
> frequency, location in file, etc) and preallocation (same again), as
> well as xfs_bmap -vp <file> output of fragmented files? That way I
> have some idea of what your problem is and so might be able to
> suggest fixes...
Preallocation was done using snippets like this:
fl.l_whence = SEEK_SET;
fl.l_start = 0;
fl.l_len = (long long) PREALLOC; /* 1GB */
printf ("Preallocating %lld MB\n", (fl.l_len / (1024 * 1024)));
err = ioctl (hFile, XFS_IOC_RESVSP64, &fl);
I verified that preallocation was working by checking the file size
(ls -l), the disk usage ('df -kh'), and the file extents using
xfs_bmap. xfs_bmap shows an extent of the preallocated length,
i.e., preallocation was working as expected.
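Dave also mentioned fallocate() as an alternative to the XFS ioctl. A
minimal sketch of that route, assuming Linux with glibc. The helper
name prealloc_keep_size is only illustrative, and this is not the
exact test code:

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE        /* needed for fallocate() and FALLOC_FL_KEEP_SIZE */
#endif
#include <assert.h>
#include <fcntl.h>         /* fallocate(), FALLOC_FL_KEEP_SIZE */
#include <stdio.h>         /* perror() */
#include <sys/types.h>     /* off_t */

/* Reserve `len` bytes starting at offset 0 without changing the
 * reported file size. Returns 0 on success, -1 on failure (errno is
 * set by fallocate()). */
int prealloc_keep_size(int fd, off_t len)
{
    assert(len > 0);
    if (fallocate(fd, FALLOC_FL_KEEP_SIZE, 0, len) != 0) {
        perror("fallocate");
        return -1;
    }
    return 0;
}
```

With the 1GB PREALLOC value from the snippet above, this would be
called as prealloc_keep_size(hFile, PREALLOC) right after opening the
file.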
I cannot share the exact code of the test case, but it works like
this:
The test case has 5 threads
WRITE_SIZE - 512KB
TRUNCSIZE - 250MB
1st Thread - this one does the actual I/O among all the threads
buffer = valloc(WRITE_SIZE);
fd = open64(file, O_CREAT|O_DIRECT|O_WRONLY|O_TRUNC);
/* Initial write of 5GB of data to the file using a 512KB buffer */
for (i = 0; i < WRITE_COUNT; i++)
{
    write(fd, buffer, WRITE_SIZE);
}
fsync(fd);
while (1)
{
    if (ncount++ < TRUNCSIZE)
    {
        write(fd, buffer, WRITE_SIZE);
    }
    else
    {
        close(fd);
        fd = open(file, O_RDWR|O_CREAT);
        gettimeofday() - Start Point
        sync(); // At times this sync takes around 5sec even
                // though the test case is doing I/O using O_DIRECT
        gettimeofday() - End Point
        if (sync time greater than 2sec)
            exit(0);
        gettimeofday() - Start Point
        ftruncate(fd, TRUNCSIZE);
        gettimeofday() - End Point
        if (truncate time greater than 2sec)
            exit(0);
        fsync(fd);
        close(fd);
        fd = open64(file, O_WRONLY|O_APPEND|O_DIRECT);
        ncount = 0;
    }
    fsync(fd);
}
2nd Thread - writing to a file in a while loop
while (1)
{
    write(10 bytes);
    fsync();
    usleep(100 * 1000);
}
3rd Thread - reading the file written by the 2nd thread
while (1)
{
    read(file, buffer, 10);
    lseek(file, 0, 0);
    usleep(10000);
}
4th Thread - just printing the size information for the 2 files
that are written
5th Thread - also reading the file written by the 2nd thread
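The timing check around sync() in the 1st thread can be sketched
roughly as below. This is an illustration, not the test code: it uses
clock_gettime(CLOCK_MONOTONIC) instead of gettimeofday() so that wall
clock adjustments do not distort the measurement, and the function
names are made up for the example.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE        /* makes sync() and clock_gettime() visible under strict -std modes */
#endif
#include <assert.h>
#include <time.h>          /* clock_gettime(), CLOCK_MONOTONIC, struct timespec */
#include <unistd.h>        /* sync() */

/* Milliseconds elapsed between two timespec samples. */
double elapsed_ms(const struct timespec *start, const struct timespec *end)
{
    return (double)(end->tv_sec - start->tv_sec) * 1000.0
         + (double)(end->tv_nsec - start->tv_nsec) / 1e6;
}

/* Time a single sync() call and return its duration in milliseconds. */
double timed_sync_ms(void)
{
    struct timespec t0, t1;
    int rc;

    rc = clock_gettime(CLOCK_MONOTONIC, &t0);
    assert(rc == 0);
    sync();
    rc = clock_gettime(CLOCK_MONOTONIC, &t1);
    assert(rc == 0);
    (void)rc;              /* silence unused warning when built with NDEBUG */
    return elapsed_ms(&t0, &t1);
}
```

The 2sec threshold from the test case then becomes
if (timed_sync_ms() > 2000.0) exit(0);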
>
>> Is there any call to set up extent size
>> also? please update I can try to make use of that also.
>
> `man xfsctl` and search for XFS_IOC_FSSETXATTR.
Thanks Dave, this is exactly what was needed - this is working as of now.
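For reference, a minimal sketch of setting an extent size hint. It is
written under assumptions and is not the exact test code: it uses the
generic names FS_IOC_FSGETXATTR / FS_IOC_FSSETXATTR and
FS_XFLAG_EXTSIZE from <linux/fs.h> (available in newer kernel headers)
rather than the XFS_IOC_* names from the xfsctl man page, and
set_extsize_hint is an illustrative name.

```c
#include <assert.h>
#include <linux/fs.h>      /* struct fsxattr, FS_IOC_FSGETXATTR, FS_IOC_FSSETXATTR, FS_XFLAG_EXTSIZE */
#include <stdio.h>         /* perror() */
#include <sys/ioctl.h>     /* ioctl() */

/* Set a per-file extent size hint of `bytes`.
 * Returns 0 on success, -1 on failure (errno is set by ioctl()). */
int set_extsize_hint(int fd, unsigned int bytes)
{
    struct fsxattr fsx;

    /* Read the current attributes first so other flags are preserved. */
    if (ioctl(fd, FS_IOC_FSGETXATTR, &fsx) != 0) {
        perror("FS_IOC_FSGETXATTR");
        return -1;
    }

    fsx.fsx_xflags |= FS_XFLAG_EXTSIZE;
    fsx.fsx_extsize = bytes;   /* should be a multiple of the filesystem block size */

    if (ioctl(fd, FS_IOC_FSSETXATTR, &fsx) != 0) {
        perror("FS_IOC_FSSETXATTR");
        return -1;
    }
    return 0;
}
```

As far as I understand, the hint has to be set while the file has no
extents allocated yet, i.e. right after creating it and before the
first write. The value to pass (for example 16MB) is workload
dependent and only an example here.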
But there continues to be a problem with the sync time. Even though
there is no dirty data, sync still takes around 5sec (this is very
rare, observed only a few times in overnight runs), so it is very
difficult to debug what the issue could be and who the culprit is.
At one point I tried to capture a trace during this sync time
issue - please find it below:
(dump_backtrace+0x0/0x11c) from [<c0389520>] (dump_stack+0x20/0x24)
(dump_stack+0x0/0x24) from [<c0067b70>] (__schedule_bug+0x7c/0x8c)
(__schedule_bug+0x0/0x8c) from [<c0389bc0>] (schedule+0x88/0x5fc)
(schedule+0x0/0x5fc) from [<c020a0c8>] (_xfs_log_force+0x238/0x28c)
(_xfs_log_force+0x0/0x28c) from [<c020a320>] (xfs_log_force+0x20/0x40)
(xfs_log_force+0x0/0x40) from [<c02308c4>] (xfs_commit_dummy_trans+0xc8/0xd4)
(xfs_commit_dummy_trans+0x0/0xd4) from [<c0231468>] (xfs_quiesce_data+0x60/0x88)
(xfs_quiesce_data+0x0/0x88) from [<c022e080>] (xfs_fs_sync_fs+0x2c/0xe8)
(xfs_fs_sync_fs+0x0/0xe8) from [<c015cccc>] (__sync_filesystem+0x8c/0xa8)
(__sync_filesystem+0x0/0xa8) from [<c015cd1c>] (sync_one_sb+0x34/0x38)
(sync_one_sb+0x0/0x38) from [<c013b1f0>] (iterate_supers+0x7c/0xc0)
(iterate_supers+0x0/0xc0) from [<c015cbf4>] (sync_filesystems+0x28/0x34)
(sync_filesystems+0x0/0x34) from [<c015cd68>] (sys_sync+0x48/0x78)
(sys_sync+0x0/0x78) from [<c003b4c0>] (ret_fast_syscall+0x0/0x48)
In order to resolve this, I applied the patch below:
xfs: dummy transactions should not dirty VFS state
http://git.kernel.org/?p=linux/kernel/git/stable/linux-stable.git;a=commitdiff;h=1a387d3be2b30c90f20d49a3497a8fc0693a9d18
But I still continued to observe the sync timing issue.
One thing: do we need fsync() when performing writes using O_DIRECT?
I think 'no'.
Also, should sync() be taking time when there is no 'dirty' data?
Please share your opinion.
Thanks & Regards,
Amit Sahrawat
>
> Cheers,
>
> Dave.
> --
> Dave Chinner
> david at fromorbit.com