I believe we figured out what was going wrong:
1) You definitely need inode64 as a mount option
2) It seems that the AG metadata was being cached. We had to unmount
the filesystem and remount it to get updated per-AG usage counts.
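For anyone following along, the remount step looks roughly like this (a
sketch only — the device and mount point below are placeholders, not our
actual paths):

```shell
# Remount with inode64 so new inodes are no longer restricted to
# 32-bit inode numbers (and hence to the lower AGs).
# /dev/sdb1 and /data are illustrative placeholders.
umount /data
mount -o inode64,delaylog /dev/sdb1 /data

# Confirm the options took effect:
grep ' /data ' /proc/mounts
```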
For the moment, I've written a script to copy/rename/delete our files
so that they are gradually migrated to new AGs. FWIW, I noticed that
this operation is significantly faster on an EL6.2-based kernel
(2.6.32) compared to EL5 (2.6.18). I'm also using the 'delaylog'
mount option, which probably helps a bit. I still have a few other
questions about this particular issue, though:
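For context, the migration pass I mentioned above is roughly the
following (a sketch of the approach, not the exact script; the path in
the usage comment is a placeholder):

```shell
# Migrate-by-rewrite: copying a file allocates a fresh inode, which
# under inode64 may land in a different AG; the rename then atomically
# replaces the original with the copy.
migrate() {
    src="$1"
    tmp="$src.migrate.$$"   # temporary name alongside the original
    cp -p -- "$src" "$tmp" && mv -- "$tmp" "$src"
}

# Usage sketch (placeholder path):
#   find /data/olddir -type f | while read -r f; do migrate "$f"; done
```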
On Sun, Mar 11, 2012 at 5:56 PM, Dave Chinner <david@xxxxxxxxxxxxx> wrote:
> Entirely normal. Some operations require IO to complete (e.g.
> reading directory blocks to find where to insert the new entry),
> while adding the first file to a directory generally requires zero
> IO. You're seeing the difference between cold cache and hot cache.
In this situation, any files written to the same directory exhibited
this issue regardless of cache state. For example:
Takes 300ms to complete:
touch tmp/0
Takes 600ms to complete:
touch tmp/0 tmp/1
Takes 1200ms to complete:
touch tmp/0 tmp/1 tmp/2 tmp/3
I would expect the directory to be cached after the first file is
created, so I don't understand why all subsequent writes were affected.
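The pattern can be reproduced by simply timing the touch invocations
(assuming tmp/ is a scratch directory on the affected filesystem;
absolute times will obviously vary):

```shell
# Minimal reproduction sketch: each batch doubles the file count,
# and on our box the wall time roughly doubled with it.
mkdir -p tmp
time touch tmp/0
time touch tmp/0 tmp/1
time touch tmp/0 tmp/1 tmp/2 tmp/3
```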
> Go look up what the inode32 and inode64 mount options do. The
> default is inode32....
So now that we're mounting inode64, I wonder if we'll see degraded
performance in the future due to a sub-optimal on-disk layout of our
data. Even though I am migrating data to other AGs, will there be any
permanent "damage" to AG0 since it had to allocate 1B inodes? What
happens to all of that metadata when the files are removed?
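For anyone who wants to watch this, the per-AG inode counts live in each
AG's AGI header, and xfs_db can read them without remounting. A hedged
sketch (the device path is a placeholder for our real device):

```shell
# Read-only inspection of AG 0's inode header: 'count' is allocated
# inodes, 'freecount' is free slots in existing inode chunks.
# /dev/sdb1 is a placeholder.
xfs_db -r -c "agi 0" -c "print count freecount" /dev/sdb1

# Filesystem-wide totals from the superblock:
xfs_db -r -c "sb 0" -c "print icount ifree" /dev/sdb1
```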