
Re: concurrent direct IO write in xfs

To: Dave Chinner <david@xxxxxxxxxxxxx>
Subject: Re: concurrent direct IO write in xfs
From: Zheng Da <zhengda1936@xxxxxxxxx>
Date: Mon, 23 Jan 2012 14:34:34 -0500
Cc: xfs@xxxxxxxxxxx
In-reply-to: <20120123051155.GI15102@dastard>
References: <CAFLer83FBZG9ZCrT2jUZBcTC2a2tx_CDmykyPF4cTP0dbHGw7Q@xxxxxxxxxxxxxx> <20120116232549.GC6922@dastard> <CAFLer81XkMTh_gxd95pzxCEs1yGRsTrZijX3c7ewgRzeA7DCSQ@xxxxxxxxxxxxxx> <20120123051155.GI15102@dastard>

On Mon, Jan 23, 2012 at 12:11 AM, Dave Chinner <david@xxxxxxxxxxxxx> wrote:
> >
> > This is weird. Yes, I'm sure. I use pwrite() to write data to a 4G file,
> > and I check the offset of each write and they are always smaller than 4G.
> > I instrument the code with systemtap and it shows me that ip->i_new_size
> > and new_size in xfs_aio_write_newsize_update are both 0.
> > Since in my case there is only overwrite, ip->i_new_size will always be 0
> > (the only place that updates ip->i_new_size is xfs_file_aio_write_checks).
> > For the same reason, new_size returned by xfs_file_aio_write_checks
> > is always 0.
> > Is this what you expected?

> No idea. I don't know what the problem you are seeing is yet, or if
> indeed there even is a problem, as I don't really understand what you
> are trying to do or what results you are expecting to see...

Here I was just wondering whether i_new_size is always 0 when there are
only overwrites. I think it has nothing to do with my workload pattern or
the device I used for the test.

> Indeed, have you run the test on something other than a RAM disk and
> confirmed that the problem exists on a block device that has real IO
> latency? If your IO takes close to zero time, then there isn't any
> IO level concurrency you can extract from single file direct IO; it
> will all just serialise on the extent tree lookups.

It's difficult to test the scalability problem on traditional disks: they
provide very low IOPS (IOs per second), and even two SSDs can't provide
enough IOPS.
I don't think all direct IO serializes on the extent tree lookups. Direct
IO reads can be parallelized pretty well, and they also need extent tree
lookups.

> > > >  0xffffffff812829f4 : __xfs_get_blocks+0x94/0x4a0 [kernel]
> > >
> > > And for direct IO writes, this will be the block mapping lookup so
> > > always hit.
> > >
> > >
> > > What this says to me is that you are probably doing lots of very
> > > small concurrent write IOs, but I'm only guessing.  Can you provide
> > > your test case and a description of your test hardware so we can try
> > > to reproduce the problem?
> > >
> > I build XFS on top of a ramdisk. So yes, there are a lot of small
> > concurrent writes per second.
> > I create a file of 4GB in XFS (the ramdisk has 5GB of space). My test
> > program overwrites 4GB of data in the file, each time writing a page of
> > data at a random offset. It's always overwriting, never appending. The
> > offset of each write is always aligned to the page size. There is no
> > overlap between writes.

> Why are you using XFS for this? tmpfs was designed to do this sort
> of stuff as efficiently as possible....

OK, I can try that.

> > So the test case is pretty simple and I think it's easy to reproduce.
> > It'll be great if you can try the test case.

> Can you post your test code so I know that what I test is exactly what
> you are running?

I can do that. My test code has become quite complicated, so I need to
simplify it first.

