| To: | Trond Myklebust <trond.myklebust@xxxxxxxxxx> |
|---|---|
| Subject: | Re: several messages |
| From: | Stephane Doyon <sdoyon@xxxxxxxxx> |
| Date: | Thu, 5 Oct 2006 11:39:45 -0400 (EDT) |
| Cc: | David Chinner <dgc@xxxxxxx>, xfs@xxxxxxxxxxx, nfs@xxxxxxxxxxxxxxxxxxxxx, Shailendra Tripathi <stripathi@xxxxxxxxx> |
| In-reply-to: | <1159893642.5592.12.camel@lade.trondhjem.org> |
| References: | <Pine.LNX.4.64.0609191533240.25914@madrid.max-t.internal> <451A618B.5080901@agami.com> <Pine.LNX.4.64.0610020939450.5072@madrid.max-t.internal> <20061002223056.GN4695059@melbourne.sgi.com> <Pine.LNX.4.64.0610030917060.31738@madrid.max-t.internal> <1159893642.5592.12.camel@lade.trondhjem.org> |
| Sender: | xfs-bounce@xxxxxxxxxxx |
On Tue, 3 Oct 2006, Trond Myklebust wrote:

> On Tue, 2006-10-03 at 09:39 -0400, Stephane Doyon wrote:
>> Sorry for insisting, but it seems to me there's still a problem in need of fixing: when writing a 5 GB file over NFS to an XFS file system and hitting ENOSPC, it takes on the order of 22 hours before my application gets an error, whereas it would normally take about 2 minutes if the file system did not become full.
> You are allowing the kernel to cache 5 GB, and that means you only get the error message when close() completes.

But it's not actually caching the entire 5 GB at once... I guess you're saying that doesn't matter?

> If you want faster error reporting, there are modes like O_SYNC and O_DIRECT that will attempt to flush the data more quickly. In addition, you can force flushing using fsync().
> Finally, you can tweak the VM into flushing more often using /proc/sys/vm.

It doesn't look to me like a question of degrees of how early to flush. My client can't possibly be caching all of 5 GB; it doesn't have the RAM or swap for that. Tracing it more carefully, it appears dirty data starts being flushed after a few hundred MB. No error is returned on the subsequent writes, only on the final close(). I see some of the write() calls are delayed, presumably when the machine reaches the dirty threshold, so I don't see how the VM settings can help in this case.

I hadn't realized that the issue isn't just with the final flush on close(): it has actually been flushing all along, delaying some of the subsequent write()s and getting ENOSPC errors, but not reporting them until the end.

I understand that since my application did not request any syncing, the system cannot guarantee to report errors until cached data has been flushed. But some data has indeed been flushed with an error; can't this be reported earlier than at close()? Would it be incorrect for a subsequent write() to return the error that occurred while flushing data from previous writes? Then the app could decide whether to continue and retry or not. But I guess I can see how that might get convoluted.

Thanks for your patience,