
Re: understanding speculative preallocation

To: Dave Chinner <david@xxxxxxxxxxxxx>
Subject: Re: understanding speculative preallocation
From: Jason Rosenberg <jbr@xxxxxxxxxxxx>
Date: Fri, 26 Jul 2013 13:40:16 -0400
Cc: xfs@xxxxxxxxxxx
In-reply-to: <20130726115021.GO13468@dastard>
References: <1374823420041-35002.post@xxxxxxxxxxxxx> <20130726115021.GO13468@dastard>
Hi Dave,

Thanks for your responses.  I'm a bit confused, as I didn't see them on the actual forum (only in my email inbox).

Anyway, I'm surprised that you don't have some list or other way to correlate the version history of XFS with OS release versions.  I'm guessing the version I have is not the latest/greatest.  We actually have another system that uses an older kernel (2.6.32-279), and it behaves differently (it still preallocates space beyond what will ever be used, but not by quite as much).  When we rolled our newer machines out to 2.6.32-358, we started seeing a marked increase in disk-full problems.

If, say, you tell me the mainline xfs code has improved behavior, it would be nice to have a way to know which version of CentOS might include that.  Telling me to read source code across multiple kernel versions sounds like an interesting endeavor, but not the most efficient use of my time, unless there truly is no one who can easily tell me anything about XFS version history.

Do you have any plans for an improved documentation story around this?  This speculative preallocation behavior is truly unexpected and not transparent to the user.  I can see that it's probably a great performance boost (especially for something like Kafka), but Kafka does have predictable log file rotation capped at fixed sizes, so it would be great if that could be factored in.

I suppose using the allocsize setting might work in the short term.  But I probably don't want to set allocsize to 1Gb, since that would mean every single file created would start with that much allocated space, is that right?  Does the allocsize setting basically work by always keeping the allocated size ahead of consumed space by the allocsize amount?



On Fri, Jul 26, 2013 at 7:50 AM, Dave Chinner <david@xxxxxxxxxxxxx> wrote:
On Fri, Jul 26, 2013 at 12:23:40AM -0700, jbr wrote:
> Hello,
> I'm looking for general documentation/help with the speculative
> preallocation feature in xfs.  So far, I haven't really been able to find
> any definitive, up to date documentation on it.

Read the code - it's documented in the comments. ;)

Or ask questions here, because the code changes and the only up to
date reference is the code and/or the developers that work on it...

> I'm wondering how I can find out definitively which version of xfs I am
> using, and what the preallocation scheme in use is.

Look at the kernel version, then look at the corresponding source code.

> We are running apache kafka on our servers, and kafka uses sequential io to
> write data log files.  Kafka uses, by default, a maximum log file size of
> 1Gb.  However, most of the log files end up being 2Gb, and thus the disk
> fills up twice as fast as it should.
> We are using xfs on CentOS 2.6.32-358.  Is there a way I can know which
> version of xfs is built into this version of the kernel?

The XFS code is part of the kernel, so look at the kernel source
code that CentOS ships.
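For example, a quick way to pin down which XFS code you are running (the yumdownloader step is a sketch and assumes the yum-utils package is installed):

```shell
# The XFS code is part of the kernel tree, so the kernel release
# identifies it; there is no separate XFS version number.
uname -r                       # running kernel, e.g. 2.6.32-358.el6.x86_64

# On CentOS the matching kernel source package contains the exact
# XFS code, under fs/xfs/ in the tree:
# yumdownloader --source kernel    (not run here; needs yum-utils)
```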

> We are using xfs (mounted with no allocsize specified).  I've seen varying
> info suggesting this means it either defaults to an allocsize of 64K (which
> doesn't seem to match my observations), or that it will use dynamic
> preallocation.
> I've also seen hints (but no actual canonical documentation) suggesting that
> the dynamic preallocation works by progressively doubling the current file
> size (which does match my observations).

Well, it started off that way, but it has been refined since to
handle many different cases where this behaviour is sub-optimal.

> What I'm not clear on, is the scheduling for the preallocation. At what
> point does it decide to preallocate the next doubling of space.

Depends on the type of IO being done.

> Is it when
> the current preallocated space is used up,


> or does it happen when the
> current space is used up within some threshold.


> What I'd like to do, is
> keep the doubling behavior in tact, but have it capped so it never increases
> the file beyond 1Gb.  Is there a way to do that?


> Can I trick the
> preallocation to not do a final doubling, if I cap my kafka log files at
> say, 900Mb (or some percentage under 1Gb)?


> There are numerous references to an allocation schedule like this:
> freespace       max prealloc size
>   >5%             full extent (8GB)
>   4-5%             2GB (8GB >> 2)
>   3-4%             1GB (8GB >> 3)
>   2-3%           512MB (8GB >> 4)
>   1-2%           256MB (8GB >> 5)
>   <1%            128MB (8GB >> 6)
> I'm just not sure I understand what this is telling me.  It seems to tell me
> what the max prealloc size is, with being reduced if the disk is nearly
> full.

Yes, that's correct. Mainline also does this for quota exhaustion.
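The right-hand column of that table is just the 8GB default shifted right as free space shrinks; a quick sketch to verify the arithmetic:

```shell
# 8 GB expressed in MB, shifted right by the table's exponents.
max_mb=$((8 * 1024))
for shift in 2 3 4 5 6; do
  echo "8GB >> $shift = $((max_mb >> shift)) MB"
done
# prints 2048, 1024, 512, 256, 128 MB, matching the table
```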

> But it doesn't tell me about the progressive doubling in
> preallocation (I assume up to a max of 8Gb).  Is any of this configurable?


> Can we specify a max prealloc size somewhere?

Use the allocsize mount option. It turns off dynamic behaviour and
fixes the pre-allocation size.
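For example (the device path, mount point, and size here are hypothetical; this is a config sketch, not a recommendation for your workload):

```shell
# Fix preallocation at a constant 64 MB instead of the dynamic
# doubling; /dev/sdb1 and /data are placeholders.
# /etc/fstab entry:
#   /dev/sdb1  /data  xfs  allocsize=64m  0 0

# Or apply to an already-mounted filesystem (needs root):
mount -o remount,allocsize=64m /data
```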

> The other issue seems to be that after the files are closed (from within the
> java jvm), they still don't seem to have their pre-allocated space
> reclaimed.  Are there known issues with closing the files in java not
> properly causing a flush of the preallocated space?

Possibly. There's a heuristic that turns off truncation at close - if
your application keeps doing "open-write-close" it will not truncate
preallocation. Log files typically see this IO pattern from
applications, and hence triggering that "no truncate" heuristic is
exactly what you want to have happen to avoid severe fragmentation
of the log files.
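A quick way to see the effect from userspace (a generic sketch, not XFS-specific tooling) is to compare a file's apparent size with its allocated blocks; on XFS, `xfs_bmap -v <file>` will additionally show the actual extents, including preallocated ones:

```shell
# Compare apparent size (what the application wrote) against
# allocated blocks (which include any speculative preallocation).
f=$(mktemp)
printf 'hello' > "$f"
ls -l "$f" | awk '{print $5}'   # apparent size: 5 bytes
du -k "$f"                      # allocated KB; can be far larger on XFS
rm -f "$f"
```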

> Any help pointing me to any documentation/user guides which accurately
> describes this would be appreciated!

The mechanism is not documented outside the code as it changes from
kernel release to kernel release and is supposed to be transparent
to userspace. It's being refined and optimised as issues are reported.
Indeed, I suspect that all your problems would disappear on mainline
due to the background removal of preallocation that is no longer
needed, and CentOS doesn't have that...


Dave Chinner
