Of course we monitor our file systems. But as I thought I made
clear, we were "unaware" the file system was full because df said it
still had 399 gigabytes of free space. Granted, this is "only" 7%,
but it could just as easily have been 30% or 50% or 80% because _the file
system was corrupt_.

Also, given your "80% rule", I suspect you have never worked in an
environment like mine. This file system is one of around 40 of similar
size in a single pool. Am I supposed to tell my boss that we need
more disk as soon as our free space goes below 40 terabytes?

The bottom line is that the file system was full but appeared not to
be, and thus xfs_repair bombed out. I realize this is a corner case,
but it is a nasty one, and it has nothing to do with my "policy on
filesystem space management".

But thank you for your input.

On Thu, Jun 23, 2011 at 12:42 AM, Stan Hoeppner <stan@xxxxxxxxxxxxxxxxx> wrote:
> On 6/22/2011 6:41 PM, Patrick J. LoPresti wrote:
>> I guess one question is how xfs_repair should behave in this case. I
>> mean, what if the file system had been full, but too corrupt for me to
>> delete anything?
>
> Maybe you should rethink your policy on filesystem space management.
> From what you stated the FS in question actually was full. You
> apparently were unaware of it until a problem (misbehaving nfsd process)
> brought it to your attention. You should be monitoring your FS usage.
> Something as simple as logwatch daily summaries can save your bacon here.
>
> As a general rule, when an FS begins steadily growing past the 80% mark
> heading toward 90%, you need to take action, either adding more disk to
> the underlying LVM device and growing the FS, mounting a new device/FS
> into a new directory in the tree and manually moving files, or making
> use of some HSM software.
>
> Full filesystems have been a source of problems basically forever. It's
> best to avoid such situations instead of tickling the dragon.