[Top] [All Lists]

Re: spurious -ENOSPC on XFS

To: Lachlan McIlroy <lachlan@xxxxxxx>, Christoph Hellwig <hch@xxxxxxxxxxxxx>, Mikulas Patocka <mpatocka@xxxxxxxxxx>, linux-kernel@xxxxxxxxxxxxxxx, xfs@xxxxxxxxxxx
Subject: Re: spurious -ENOSPC on XFS
From: Lachlan McIlroy <lachlan@xxxxxxx>
Date: Thu, 15 Jan 2009 11:57:08 +1100
In-reply-to: <20090114221655.GX8071@disturbed>
Organization: SGI
References: <Pine.LNX.4.64.0901120509550.11089@xxxxxxxxxxxxxxxxxxxxxxxxxxx> <20090112151133.GA24852@xxxxxxxxxxxxx> <496C2D69.2010301@xxxxxxx> <20090114221655.GX8071@disturbed>
Reply-to: lachlan@xxxxxxx
User-agent: Thunderbird (X11/20081209)
Dave Chinner wrote:
On Tue, Jan 13, 2009 at 04:58:01PM +1100, Lachlan McIlroy wrote:
Christoph Hellwig wrote:
On Mon, Jan 12, 2009 at 06:14:36AM -0500, Mikulas Patocka wrote:

I discovered a bug in XFS in delayed allocation.

When you take a small partition (52MB in my case) and copy many small files on it (source code) that barely fits there, you get -ENOSPC. Then sync the partition, some free space pops up, click "retry" in MC an the copy continues. They you get again -ENOSPC, you must sync, click "retry" and go on. And so on few times until the source code finally fits on the XFS partition.

This misbehavior is apparently caused by delayed allocation, delayed allocation does not exactly know how much space will be occupied by data, so it makes some upper bound guess. Because free space count is only a guess, not the actual data being consumed, XFS should not return -ENOSPC on behalf of it. When the free space overflows, XFS should sync itself, retry allocation and only return -ENOSPC if it fails the second time, after the sync.
This sounds like a problem with speculative allocation - delayed allocations
beyond eof.  Even if we write a small file, say 4k, a 64k chunk of delayed
allocation will be credited to the file.

The second retry occurs without speculative EOF allocation. That's
what the BMAPI_SYNC flag does....
By then it's too late.  There could already be many files with delayed
allocations beyond eof that are unneccesarily consuming space.  I suspect
that when those files are flushed some are not able to convert the full
delayed allocation in one extent and only convert what is needed to write
out data.  The remaining unused delayed allocation is released and that's
why the freespace is going up and down.

That being said, it can't truncate away pre-existing speculative
allocations on other files, which is why there is a global flush
and wait before the third retry.....



<Prev in Thread] Current Thread [Next in Thread>