On Jul 26, 2010, at 3:20 AM, Dave Chinner wrote:
> On Sun, Jul 25, 2010 at 11:46:29PM -0700, Eli Morris wrote:
>> On Jul 25, 2010, at 11:06 PM, Dave Chinner wrote:
>>> On Sun, Jul 25, 2010 at 09:04:03PM -0700, Eli Morris wrote:
>>>> On Jul 25, 2010, at 8:45 PM, Dave Chinner wrote:
>>> I've just confirmed that the problem does not exist at top-of-tree.
>>> The following commands gives the right output, and the repair at the
>>> end does not truncate the filesystem:
>>> xfs_io -f -c "truncate $((13427728384 * 4096))" fsfile
>>> mkfs.xfs -f -l size=128m,lazy-count=0 -d
>>> xfs_io -f -c "truncate $((16601554944 * 4096))" fsfile
>>> mount -o loop fsfile /mnt/scratch
>>> xfs_growfs /mnt/scratch
>>> xfs_info /mnt/scratch
>>> umount /mnt/scratch
>>> xfs_db -c "sb 0" -c "p agcount" -c "p dblocks" -f fsfile
>>> xfs_db -c "sb 1" -c "p agcount" -c "p dblocks" -f fsfile
>>> xfs_db -c "sb 127" -c "p agcount" -c "p dblocks" -f fsfile
>>> xfs_repair -f fsfile
>>> So rather than try to triage this any further, can you upgrade your
>>> kernel/system to something more recent?
>> I can update this to CentOS 5 Update 4, but I can't install
>> updates forward of its release date of Dec 15, 2009. The reason
>> is that this is the head node of a cluster and it uses the Rocks
>> cluster distribution. The newest Rocks is based on CentOS 5
>> Update 4, but Rocks systems do not support updates (via yum, for
>> example). Updating the OS takes me a day or two for the whole
>> cluster and all the user programs. If you're pretty sure that
>> will fix the problem, I'll go for it tomorrow. I'd appreciate it
>> very much if you could let me know if CentOS 5.4 is recent enough
>> that it will fix the problem.
> The only way I can find out is to load CentOS 5.4 onto a
> system and run the above test. You can probably do that just as
> easily as I can...
>> I will note that I've grown the filesystem several times, and
>> while I recall having to unmount and remount the filesystem each
>> time for it to report its new size, I've never seen it fall back
>> to its old size when running xfs_repair. In fact, the original
>> filesystem is about 12 TB, so xfs_repair only reverses the last
>> grow and not the previous ones.
> Hmmm - I can't recall any bug where unmount was required before
> the new size would show up. I know we had problems with arithmetic
> overflows in both the xfs_growfs binary and the kernel code, but
> they did not manifest in this manner. Hence I can't really say why
> you are seeing that behaviour or why this time it is different.
> The suggestion of using a recent live CD to do the grow is a good
> one - it might be your best option, rather than upgrading everything....
> Dave Chinner
Thanks for all the help. I was finally able to get a USB thumb drive made up
with Fedora 13 (the 64-bit version, which turned out to be important!). I did
the xfs_growfs after booting off that, then rebooted back to my normal
configuration, ran xfs_repair, and this time the file system stayed OK. I'm
doing an overnight write test and will run xfs_repair again tomorrow morning,
but I think that solved the problem. BTW, Fedora has a great tool for making
USB thumb drives with the live distro on them. It does everything for you,
including downloading the disc image. Nice. That's a pretty nasty bug.