definitions for /proc/fs/xfs/stat

Mark Seger mjseger at gmail.com
Sat Jun 15 11:22:35 CDT 2013


I was thinking a little color commentary might be helpful from the
perspective of what the functionality is that's driving the need for
fallocate.  I think I mentioned somewhere in this thread that the
application is OpenStack Swift, which is a highly scalable cloud object
store.  If you're not familiar with it, it doesn't do successive sequential
writes to a preallocated file but rather writes out a full object in one
shot.  In other words, object = file.  The whole purpose of preallocation,
at least my understanding of it, is to make sure there is enough room when
the time comes to write the actual object, so that if there isn't, a redundant
server elsewhere can do it instead.  This makes the notion of
speculative preallocation for future sequential writes moot; the ideal
is to preallocate only the object size with minimal extra I/O.  Does
that help?
-mark



On Sat, Jun 15, 2013 at 6:35 AM, Mark Seger <mjseger at gmail.com> wrote:

> Basically I do everything with collectl, a tool I wrote and open-sourced
> almost 10 years ago.  Its numbers are very accurate - I've compared them with
> iostat on numerous occasions whenever I might have had doubts and they
> always agree.  Since both tools get their data from the same place,
> /proc/diskstats, it's hard for them not to agree, AND its numbers also agree
> with /proc/fs/xfs.
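>
> To make that concrete, here's a minimal sketch (not collectl's actual code)
> of the kind of read both tools do against /proc/diskstats; the device name is
> just the one from my tests, and the field positions follow the kernel's
> Documentation/iostats.txt:
>
>     #!/usr/bin/env python
>     # Minimal sketch: read /proc/diskstats and pick out the per-device write
>     # counters that both collectl and iostat report.  Sectors are 512 bytes,
>     # so dividing by 2 gives KB.
>     DEVICE = "sdc"   # device from the runs below
>
>     with open("/proc/diskstats") as f:
>         for line in f:
>             fields = line.split()
>             if fields[2] != DEVICE:
>                 continue
>             writes_completed = int(fields[7])
>             writes_merged = int(fields[8])
>             sectors_written = int(fields[9])
>             print("writes=%d merged=%d wkB=%d" % (
>                 writes_completed, writes_merged, sectors_written // 2))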
>
> Here's an example of comparing the two on a short run, leaving off the -m
> since collectl reports its output in KB.
>
> Device:         rrqm/s   wrqm/s     r/s     w/s    rkB/s    wkB/s avgrq-sz avgqu-sz   await r_await w_await  svctm  %util
> sdc               0.00     0.00    0.00  494.00     0.00 126464.00   512.00     0.11    0.22    0.00    0.22   0.22  11.00
>
> #                  <---------reads---------><---------writes---------><--------averages--------> Pct
> #Time     Name       KBytes Merged  IOs Size  KBytes Merged  IOs Size  RWSize  QLen  Wait SvcTim Util
> 10:18:32 sdc1             0      0    0    0  127488      0  498  256     256     1     0      0    7
> 10:18:33 sdc1             0      0    0    0  118784      0  464  256     256     1     0      0    4
>
> For grins I also ran a set of numbers at a monitoring interval of 0.2
> seconds just to see if they were steady, and they are:
>
> #                      <---------reads---------><---------writes---------><--------averages--------> Pct
> #Time         Name       KBytes Merged  IOs Size  KBytes Merged  IOs Size  RWSize  QLen  Wait SvcTim Util
> 10:19:50.601 sdc1             0      0    0    0     768      0    3  256     256     0     0      0    0
> 10:19:50.801 sdc1             0      0    0    0   23296      0   91  256     256     1     0      0   19
> 10:19:51.001 sdc1             0      0    0    0   32256      0  126  256     256     1     0      0   14
> 10:19:51.201 sdc1             0      0    0    0   29696      0  116  256     256     1     0      0   19
> 10:19:51.401 sdc1             0      0    0    0   30464      0  119  256     256     1     0      0    4
> 10:19:51.601 sdc1             0      0    0    0   32768      0  128  256     256     1     0      0   14
>
> But back to the problem at hand: why is this happening?
>
> To restate what's going on, I have a very simple script with which I'm
> duplicating what OpenStack Swift is doing, namely creating a file with
> mkstemp and then running fallocate against it.  The files are being created
> with a size of zero, but it seems that XFS is generating a ton of logging
> activity.  I had read your post back in 2011 about speculative
> preallocation and can't help but wonder if that's what's hitting me here.  I
> also saw where system memory can come into play, and this box has 192GB and
> 12 hyperthreaded cores.
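>
> In case it helps, this is roughly the shape of the script (a sketch, not the
> real thing - the directory, file count and allocation size are placeholders,
> and Python 3's os.posix_fallocate() stands in for however the real code
> invokes fallocate):
>
>     #!/usr/bin/env python
>     # Sketch of the reproducer: create each file with mkstemp() and then
>     # preallocate space with fallocate(), the way Swift does, without ever
>     # writing any data.  All the constants here are placeholders.
>     # (The real code may pass FALLOC_FL_KEEP_SIZE, which posix_fallocate
>     # doesn't expose - hence the size-of-zero observation above.)
>     import os
>     import tempfile
>
>     TARGET_DIR = "/srv/node/sdc1/tmp"   # placeholder XFS mount point
>     NUM_FILES = 10000                   # placeholder file count
>     ALLOC_SIZE = 1024                   # placeholder object size in bytes
>
>     for _ in range(NUM_FILES):
>         fd, path = tempfile.mkstemp(dir=TARGET_DIR)
>         try:
>             # Comment this call out to get the "without falloc" run below.
>             os.posix_fallocate(fd, 0, ALLOC_SIZE)
>         finally:
>             os.close(fd)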
>
> I also tried one more run without falloc; this one creates 10000 1K files,
> which should be about 10MB, and it looks like it's still doing 140MB of I/O,
> which still feels like a lot but at least it's less than the falloc case:
>
> #                  <---------reads---------><---------writes---------><--------averages--------> Pct
> #Time     Name       KBytes Merged  IOs Size  KBytes Merged  IOs Size  RWSize  QLen  Wait SvcTim Util
> 10:29:20 sdc1             0      0    0    0   89608      0  351  255     255     1     0      0   11
> 10:29:21 sdc1             0      0    0    0   55296      0  216  256     256     1     0      0    5
>
> And to repeat, here is the full run with falloc:
>
> # DISK STATISTICS (/sec)
> #                  <---------reads---------><---------writes---------><--------averages--------> Pct
> #Time     Name       KBytes Merged  IOs Size  KBytes Merged  IOs Size  RWSize  QLen  Wait SvcTim Util
> 10:30:50 sdc1             0      0    0    0   56064      0  219  256     256     1     0      0    2
> 10:30:51 sdc1             0      0    0    0  409720    148 1622  253     252     1     0      0   26
> 10:30:52 sdc1             0      0    0    0  453240    144 1796  252     252     1     0      0   36
> 10:30:53 sdc1             0      0    0    0  441768    298 1800  245     245     1     0      0   37
> 10:30:54 sdc1             0      0    0    0  455576    144 1813  251     251     1     0      0   25
> 10:30:55 sdc1             0      0    0    0  453532    145 1805  251     251     1     0      0   35
> 10:30:56 sdc1             0      0    0    0  307352    145 1233  249     249     1     0      0   17
> 10:30:57 sdc1             0      0    0    0       0      0    0    0       0     0     0      0    0
>
> If there is anything more I can provide, I'll be happy to do so.  Actually,
> I should point out that I can easily generate graphs, and if you'd like to
> see some examples I can provide those too.  Also, if there is anything I can
> report from /proc/fs/xfs, I can relatively easily do that as well and
> display it side by side with the disk I/O.
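>
> For example, something as simple as this sketch would let me print deltas
> from /proc/fs/xfs/stat next to the disk numbers (it just treats each line as
> a counter-group name followed by integers, since the field definitions are
> exactly what I'm asking about):
>
>     #!/usr/bin/env python
>     # Sketch: sample /proc/fs/xfs/stat twice, one second apart, and print
>     # the counter groups that changed.  Each line is a group name followed
>     # by space-separated integer counters.
>     import time
>
>     def read_xfs_stats(path="/proc/fs/xfs/stat"):
>         stats = {}
>         with open(path) as f:
>             for line in f:
>                 fields = line.split()
>                 stats[fields[0]] = [int(v) for v in fields[1:]]
>         return stats
>
>     before = read_xfs_stats()
>     time.sleep(1)
>     after = read_xfs_stats()
>     for name in sorted(after):
>         deltas = [a - b for a, b in zip(after[name], before[name])]
>         if any(deltas):
>             print(name, deltas)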
>
> -mark
>
>
> On Fri, Jun 14, 2013 at 10:04 PM, Dave Chinner <david at fromorbit.com> wrote:
>
>> On Fri, Jun 14, 2013 at 09:55:17PM -0400, Mark Seger wrote:
>> > I'm doing 1 second samples and the rates are very steady.  The reason I
>> > ended up at this level of testing was that I had done a sustained test for
>> > 2 minutes at about 5MB/sec and was seeing over 500MB/sec going to the disk,
>> > again sampling at 1-second intervals.  I'd be happy to provide detailed
>> > output and can even sample more frequently if you like.
>>
>> Where are you getting your IO throughput numbers from?
>>
>> How do they compare to, say, the output of `iostat -d -x -m 1`?
>>
>> Cheers,
>>
>> Dave.
>> --
>> Dave Chinner
>> david at fromorbit.com
>>
>
>