definitions for /proc/fs/xfs/stat
Mark Seger
mjseger at gmail.com
Sat Jun 15 11:22:35 CDT 2013
I was thinking a little color commentary might be helpful from the
perspective of what the functionality is that's driving the need for
fallocate. I think I mentioned somewhere in this thread that the
application is OpenStack Swift, which is a highly scalable cloud object
store. If you're not familiar with it, it doesn't do successive sequential
writes to a preallocated file but rather writes out a full object in one
shot. In other words, object = file. The whole purpose of preallocation,
at least as I understand it, is to make sure there is enough room when
the time comes to write the actual object; if there isn't, a redundant
server elsewhere can do it instead. This makes the notion of
speculative preallocation for future sequential writes moot, the ideal
being to preallocate only the object size with minimal extra I/O. Does
that help?
-mark
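The pattern described above - reserve exactly the object size up front, and fail over to a redundant server if there isn't room - can be sketched roughly like this. This is a minimal illustration, not Swift's actual code; the function name, the `.obj` suffix, and the error handling are all made up for the sketch:

```python
import errno
import os
import tempfile

def write_object(directory, data):
    """Preallocate exactly len(data) bytes, then write the object in one shot.

    Returns the final path on success; on ENOSPC the temp file is removed
    and the error propagates so the caller can retry on another server.
    """
    fd, tmp_path = tempfile.mkstemp(dir=directory)
    try:
        # Reserve space for the whole object up front; if the filesystem
        # is full this fails with ENOSPC before any data is written.
        os.posix_fallocate(fd, 0, len(data))
        os.write(fd, data)              # the full object, one shot
        os.fsync(fd)
        final_path = tmp_path + ".obj"  # hypothetical naming scheme
        os.rename(tmp_path, final_path)
        return final_path
    except OSError:
        os.unlink(tmp_path)  # clean up; on ENOSPC a redundant server takes over
        raise
    finally:
        os.close(fd)
```

Note that the fallocate happens before any data hits the file, which is why the files in the test below exist at size zero when the preallocation I/O is generated.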
On Sat, Jun 15, 2013 at 6:35 AM, Mark Seger <mjseger at gmail.com> wrote:
> Basically everything I do is with collectl, a tool I wrote and open-sourced
> almost 10 years ago. Its numbers are very accurate - I've compared with
> iostat on numerous occasions whenever I might have had doubts and they
> always agree. Since both tools get their data from the same place,
> /proc/diskstats, it's hard for them not to agree, AND its numbers also agree
> with /proc/fs/xfs.
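For anyone following along, both tools are just decoding the same per-device counter line. A quick sketch of how one /proc/diskstats line maps to the columns shown below, using the field layout from the kernel's iostats documentation (the sample line is made up, but its write counters match the iostat output that follows; sector counts are always 512-byte units):

```python
def parse_diskstats_line(line):
    """Decode one /proc/diskstats line into the counters iostat/collectl use.

    Field layout (kernel Documentation/iostats): major, minor, name,
    then reads completed, reads merged, sectors read, ms reading,
    writes completed, writes merged, sectors written, ms writing, ...
    """
    f = line.split()
    return {
        "name": f[2],
        "reads": int(f[3]),
        "read_merges": int(f[4]),
        "read_kb": int(f[5]) // 2,    # 512-byte sectors -> KB
        "writes": int(f[7]),
        "write_merges": int(f[8]),
        "write_kb": int(f[9]) // 2,
    }

# Illustrative line, same shape as a real /proc/diskstats entry:
sample = "8 32 sdc 12 0 96 40 494 0 252928 110 0 80 150"
stats = parse_diskstats_line(sample)
```

Here 494 writes completed and 252928 sectors written per second decode to the 494 w/s and 126464 wkB/s in the iostat sample below, which is why the two tools can't really disagree.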
>
> Here's an example of comparing the two on a short run, leaving off the -m
> since collectl reports its output in KB.
>
> Device:   rrqm/s  wrqm/s   r/s     w/s   rkB/s      wkB/s  avgrq-sz  avgqu-sz  await  r_await  w_await  svctm  %util
> sdc         0.00    0.00  0.00  494.00    0.00  126464.00    512.00      0.11   0.22     0.00     0.22   0.22  11.00
>
> #               <---------reads---------><---------writes---------><--------averages--------> Pct
> #Time     Name  KBytes Merged IOs Size    KBytes Merged  IOs Size   RWSize  QLen  Wait SvcTim Util
> 10:18:32  sdc1       0      0   0    0    127488      0  498  256      256     1     0      0    7
> 10:18:33  sdc1       0      0   0    0    118784      0  464  256      256     1     0      0    4
>
> For grins I also ran a set of numbers at a monitoring interval of 0.2
> seconds just to see if they were steady, and they are:
>
> #                   <---------reads---------><---------writes---------><--------averages--------> Pct
> #Time         Name  KBytes Merged IOs Size    KBytes Merged  IOs Size   RWSize  QLen  Wait SvcTim Util
> 10:19:50.601  sdc1       0      0   0    0       768      0    3  256      256     0     0      0    0
> 10:19:50.801  sdc1       0      0   0    0     23296      0   91  256      256     1     0      0   19
> 10:19:51.001  sdc1       0      0   0    0     32256      0  126  256      256     1     0      0   14
> 10:19:51.201  sdc1       0      0   0    0     29696      0  116  256      256     1     0      0   19
> 10:19:51.401  sdc1       0      0   0    0     30464      0  119  256      256     1     0      0    4
> 10:19:51.601  sdc1       0      0   0    0     32768      0  128  256      256     1     0      0   14
>
> But back to the problem at hand, and that's the question: why is this
> happening?
>
> To restate what's going on, I have a very simple script in which I'm
> duplicating what OpenStack Swift is doing, namely creating a file with
> mkstemp and then running fallocate against it. The files are being created
> with a size of zero, but it seems that xfs is generating a ton of logging
> activity. I had read your post back in 2011 about speculative
> preallocation and can't help but wonder if that's what's hitting me here. I
> also saw where system memory can come into play, and this box has 192GB and
> 12 hyperthreaded cores.
>
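The test amounts to the following loop. This is a stand-in sketch for my script, not the script itself; the directory path is hypothetical and the count and size are the ones from the 1K-file run:

```python
import os
import tempfile

# Mimic what Swift's object server does on disk: mkstemp a zero-length
# file, then fallocate it to the eventual object size. The mount point
# below is hypothetical.
TEST_DIR = "/srv/node/sdc1/test"
NFILES = 10000
OBJ_SIZE = 1024  # 1K, as in the run below

def make_files(directory, count, size):
    for _ in range(count):
        fd, path = tempfile.mkstemp(dir=directory)
        try:
            os.posix_fallocate(fd, 0, size)  # reserve space; no data written
        finally:
            os.close(fd)
```

Each iteration writes no user data at all, so all the disk traffic in the tables below is metadata and log I/O.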
> I also tried one more run without fallocate, this time creating 10000 1K
> files, which should be about 10MB, and it looks like it's still doing 140MB
> of I/O. That still feels like a lot, but at least it's less than the run
> with fallocate:
>
> #               <---------reads---------><---------writes---------><--------averages--------> Pct
> #Time     Name  KBytes Merged IOs Size    KBytes Merged  IOs Size   RWSize  QLen  Wait SvcTim Util
> 10:29:20  sdc1       0      0   0    0     89608      0  351  255      255     1     0      0   11
> 10:29:21  sdc1       0      0   0    0     55296      0  216  256      256     1     0      0    5
>
> and to repeat the full run with falloc:
>
> # DISK STATISTICS (/sec)
> #               <---------reads---------><---------writes---------><--------averages--------> Pct
> #Time     Name  KBytes Merged IOs Size    KBytes Merged  IOs Size   RWSize  QLen  Wait SvcTim Util
> 10:30:50  sdc1       0      0   0    0     56064      0  219  256      256     1     0      0    2
> 10:30:51  sdc1       0      0   0    0    409720    148 1622  253      252     1     0      0   26
> 10:30:52  sdc1       0      0   0    0    453240    144 1796  252      252     1     0      0   36
> 10:30:53  sdc1       0      0   0    0    441768    298 1800  245      245     1     0      0   37
> 10:30:54  sdc1       0      0   0    0    455576    144 1813  251      251     1     0      0   25
> 10:30:55  sdc1       0      0   0    0    453532    145 1805  251      251     1     0      0   35
> 10:30:56  sdc1       0      0   0    0    307352    145 1233  249      249     1     0      0   17
> 10:30:57  sdc1       0      0   0    0         0      0    0    0        0     0     0      0    0
>
> If there is anything more I can provide, I'll be happy to do so. Actually,
> I should point out I can easily generate graphs, and if you'd like to see
> some examples I can provide those too. Also, if there is anything I can
> report from /proc/fs/xfs I can relatively easily do that as well and
> display it side by side with the disk I/O.
>
> -mark
>
>
> On Fri, Jun 14, 2013 at 10:04 PM, Dave Chinner <david at fromorbit.com> wrote:
>
>> On Fri, Jun 14, 2013 at 09:55:17PM -0400, Mark Seger wrote:
>> > I'm doing 1-second samples and the rates are very steady. The reason I
>> > ended up at this level of testing was that I had done a sustained test
>> > for 2 minutes at about 5MB/sec and was seeing over 500MB/sec going to
>> > the disk, again sampling at 1-second intervals. I'd be happy to provide
>> > detailed output and can even sample more frequently if you like.
>>
>> Where are you getting your IO throughput numbers from?
>>
>> How do they compare to, say, the output of `iostat -d -x -m 1`?
>>
>> Cheers,
>>
>> Dave.
>> --
>> Dave Chinner
>> david at fromorbit.com
>>
>
>