Re: xfs_repair: "fatal error -- ran out of disk space!"

To: Stan Hoeppner <stan@xxxxxxxxxxxxxxxxx>
Subject: Re: xfs_repair: "fatal error -- ran out of disk space!"
From: "Patrick J. LoPresti" <lopresti@xxxxxxxxx>
Date: Thu, 23 Jun 2011 07:16:16 -0700
Cc: Dave Chinner <david@xxxxxxxxxxxxx>, Eric Sandeen <sandeen@xxxxxxxxxxx>, xfs@xxxxxxxxxxx
In-reply-to: <4E02EE71.7010300@xxxxxxxxxxxxxxxxx>
References: <BANLkTi=gS5iO9R9pVk_df-4ofkkb0ZJgfw@xxxxxxxxxxxxxx> <4E026C42.2030500@xxxxxxxxxxx> <20110622232418.GV32466@dastard> <BANLkTinu03WnKTL=SzoWt+Sd9YHjy0_w6g@xxxxxxxxxxxxxx> <4E02EE71.7010300@xxxxxxxxxxxxxxxxx>

Of course we monitor our file systems.  But as I thought I made
clear, we were "unaware" the file system was full because df said it
still had 399 gigabytes of free space.  Granted, this is "only" 7%,
but it could just as easily have been 30% or 50% or 80% because _the
file system was corrupt_.

Also, given your "80% rule", I suspect you have never worked in an
environment like mine. This file system is one of around 40 of similar
size in a single pool.  Am I supposed to tell my boss that we need
more disk as soon as our free space goes below 40 terabytes?

The bottom line is that the file system was full but appeared not to
be, and thus xfs_repair bombed out.  I realize this is a corner case,
but it is a nasty one, and it has nothing to do with my "policy on
filesystem space management".

But thank you for your input.

 - Pat

On Thu, Jun 23, 2011 at 12:42 AM, Stan Hoeppner <stan@xxxxxxxxxxxxxxxxx> wrote:
> On 6/22/2011 6:41 PM, Patrick J. LoPresti wrote:
>> I guess one question is how xfs_repair should behave in this case.  I
>> mean, what if the file system had been full, but too corrupt for me to
>> delete anything?
> Maybe you should rethink your policy on filesystem space management.
> From what you stated the FS in question actually was full.  You
> apparently were unaware of it until a problem (misbehaving nfsd process)
> brought it to your attention.  You should be monitoring your FS usage.
> Something as simple as logwatch daily summaries can save your bacon here.
> As a general rule, when an FS begins steadily growing past the 80% mark
> heading toward 90%, you need to take action, either adding more disk to
> the underlying LVM device and growing the FS, mounting a new device/FS
> into a new directory in the tree and manually moving files, or making
> use of some HSM software.
> Full filesystems have been a source of problems basically forever.  It's
> best to avoid such situations instead of tickling the dragon.
> --
> Stan
