Re: 1B files, slow file creation, only AG0 used

To: Dave Chinner <david@xxxxxxxxxxxxx>
Subject: Re: 1B files, slow file creation, only AG0 used
From: Michael Spiegle <mike@xxxxxxxxxxxxxxxx>
Date: Mon, 12 Mar 2012 14:54:20 -0700
Cc: xfs@xxxxxxxxxxx
In-reply-to: <20120312005632.GY5091@dastard>
References: <CAEm1Pvny7Q2rrsCLURvo5kQM3vt+yMg17WxoSYGKVWm7Lgp8MA@xxxxxxxxxxxxxx> <20120312005632.GY5091@dastard>
Reply-to: mike@xxxxxxxxxxxxxxxx

I believe we figured out what was going wrong:
1) You definitely need inode64 as a mount option.
2) It seems that the AG metadata was being cached.  We had to unmount
the filesystem and remount it to get updated counts on per-AG usage.
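
One way to check which AG a given file landed in, independent of the per-AG counters, is to decode its inode number: XFS stores the AG number in the high bits of the inode number. The following is a minimal sketch, assuming `agblklog` and `inopblog` have been read from the superblock (for example with something like `xfs_db -r -c "sb 0" -c "p agblklog inopblog" <device>`). The numeric values below are made up for illustration and are not taken from this filesystem.

```python
import os


def ino_to_agno(ino: int, agblklog: int, inopblog: int) -> int:
    """Return the allocation group number encoded in an XFS inode number.

    The low (agblklog + inopblog) bits locate the inode inside its AG;
    the remaining high bits are the AG number.
    """
    return ino >> (agblklog + inopblog)


def file_agno(path: str, agblklog: int, inopblog: int) -> int:
    """Look up the inode number of `path` and return its AG number."""
    return ino_to_agno(os.lstat(path).st_ino, agblklog, inopblog)


# Hypothetical superblock values, for illustration only.
AGBLKLOG = 22
INOPBLOG = 4

# Build an inode number that sits in AG 3 and decode it again.
example_ino = (3 << (AGBLKLOG + INOPBLOG)) | 128
print(ino_to_agno(example_ino, AGBLKLOG, INOPBLOG))  # prints 3
```

This also illustrates why inode32 matters here: inode numbers then have to fit in 32 bits, so on a large filesystem only the lowest AGs can hold inodes.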

For the moment, I've written a script to copy/rename/delete our files
so that they are gradually migrated to new AGs.  FWIW, I noticed that
this operation is significantly faster on an EL6.2-based kernel
(2.6.32) compared to EL5 (2.6.18).  I'm also using the 'delaylog'
mount option, which probably helps a bit.  I still have a few other
questions about this particular issue, though:
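
A copy/rename/delete migration of this kind might look roughly like the sketch below. It is not the script mentioned above; the function names and the `staging_dir` parameter are hypothetical. It also assumes that, with inode64, a new inode is usually allocated in the AG of the directory it is created in and that newly created directories are spread across AGs, so the copy is made in a staging directory created after the remount. That placement is allocator behaviour the code does not enforce or verify.

```python
import os
import shutil
import tempfile


def rewrite_file(path: str, staging_dir: str) -> None:
    """Replace `path` with a fresh copy so its data lives in a new inode.

    `staging_dir` must be on the same filesystem as `path`; otherwise
    os.replace() raises OSError instead of renaming.
    """
    fd, tmp_path = tempfile.mkstemp(dir=staging_dir)
    os.close(fd)
    try:
        shutil.copy2(path, tmp_path)  # copy data, mode and timestamps
        os.replace(tmp_path, path)    # rename over the original; old inode is freed
    except BaseException:
        if os.path.exists(tmp_path):
            os.unlink(tmp_path)       # do not leave a partial copy behind
        raise


def migrate_tree(root: str, staging_dir: str) -> int:
    """Rewrite every regular file under `root`; return how many were rewritten."""
    count = 0
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            full = os.path.join(dirpath, name)
            if os.path.islink(full):
                continue              # leave symlinks untouched
            rewrite_file(full, staging_dir)
            count += 1
    return count
```

Things this sketch leaves out: ownership is not preserved by `shutil.copy2`, hard links are broken by the rewrite, and `staging_dir` should live outside `root`.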

On Sun, Mar 11, 2012 at 5:56 PM, Dave Chinner <david@xxxxxxxxxxxxx> wrote:
> Entirely normal. Some operations require IO to complete (e.g.
> reading directory blocks to find where to insert the new entry),
> while adding the first file to a directory generally requires zero
> IO. You're seeing the difference between cold cache and hot cache
> performance.

In this situation, any files written to the same directory exhibited
this issue regardless of cache state.  For example:

Takes 300ms to complete:
touch tmp/0

Takes 600ms to complete:
touch tmp/0 tmp/1

Takes 1200ms to complete:
touch tmp/0 tmp/1 tmp/2 tmp/3

I would expect the directory to be cached after the first file is
created.  I don't understand why all subsequent writes were affected
as well.
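
To see whether the cost is spread evenly across files or concentrated in the first create, each creation can be timed separately rather than timing the whole `touch` invocation. A minimal sketch follows; the helper name `time_creates` is hypothetical, and `os.open` with `O_CREAT` only approximates what `touch` does for a file that does not exist yet.

```python
import os
import time


def time_creates(directory: str, names: list[str]) -> list[tuple[str, float]]:
    """Create each name in `directory` and return (name, seconds) per create."""
    timings = []
    for name in names:
        path = os.path.join(directory, name)
        start = time.perf_counter()
        fd = os.open(path, os.O_WRONLY | os.O_CREAT, 0o644)
        os.close(fd)
        timings.append((name, time.perf_counter() - start))
    return timings
```

Calling `time_creates("tmp", ["0", "1", "2", "3"])` and printing the result would show whether every create costs roughly the same, as the totals above suggest, or whether only the first one is slow.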

> Go look up what the inode32 and inode64 mount options do. The
> default is inode32....

So now that we're mounting inode64, I wonder if we'll see degraded
performance in the future due to a sub-optimal on-disk layout of our
data.  Even though I am migrating data to other AGs, will there be any
permanent "damage" to AG0 since it had to allocate 1B inodes?  What
happens to all of that metadata when the files are removed?

