> Eli Morris put forth on 7/9/2010 6:07 PM:
>> Hi All,
>>
>> I've got this problem where if I run xfs_repair, my filesystem shrinks by 11
>> TB, from a volume size of 62 TB to 51 TB. I can grow the filesystem again
>> with xfs_growfs, but then rerunning xfs_repair shrinks it back down again.
>> The first time this happened was a few days ago and running xfs_repair took
>> about 7 TB of data with it. That is, out of the 11 TB of disk space that
>> vanished, 7 TB had data on it, and 4 TB was empty space. XFS is running on
>> top of an LVM volume. It's on an Intel/Linux system running Centos 5
>> (2.6.18-128.1.14.el5). Does anyone have an idea of what would cause such a
>> thing, and what I might try to keep it from continuing to happen? I could
>> just never run xfs_repair again, but that doesn't seem like a good thing to
>> count on. Major bonus points if anyone has any ideas on how to get my 7 TB
>> of data back also. It must be there somewhere and it would be very bad to
>> lose.
>>
>> thanks for any help and ideas. I'm just stumped right now.
>
> It may be helpful if you can provide more history (how long has this been
> happening, recent upgrade?), the exact xfs_repair command line used, why you
> were running xfs_repair in the first place, hardware or software RAID, what
> xfsprogs version, relevant log snippets, etc.
Hi Stan,
Thanks for responding. Sure, I'll try and give more information.
I got some automated emails this Sunday about I/O errors coming from the
computer (a Dell PowerEdge 2950 attached to a 16-bay hardware RAID, which is
itself connected to four 16-bay JBODs; the RAID controller is connected to the
PowerEdge, "Nimbus", via a SAS / LSI Fusion card). Since it was Sunday, I just
logged in, rebooted, ran xfs_repair, and remounted the filesystem. I did a
quick write test, just to make sure I could write a file and read it back, and
left it at that until the next work day. When I came in and looked at the
volume more closely, I noticed the filesystem had shrunk as described above.

Each of the RAID/JBOD enclosures is configured as a separate device and
represents one physical volume in my LVM2 scheme; those physical volumes are
combined into a single logical volume, and the filesystem sits on top of that.
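For reference, the layout can be listed with the standard LVM commands (vg1
and vol5 are the real names from my setup; apart from /dev/sdc1, any device
names below are just examples):

    pvs         # one PV per RAID/JBOD enclosure, e.g. /dev/sdc1
    vgs vg1     # the volume group combining those PVs
    lvs vg1     # the single logical volume (vol5) that holds the XFS filesystem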
On one of the physical volumes (PVs), /dev/sdc1, I noticed when I ran
pvdisplay that of the 12.75 TB comprising the volume, 12.00 TB was being shown
as 'not usable'. Usually that number is a couple of megabytes. So, after
staring at this for a while, I ran pvresize on that PV. The volume then listed
the full 12.75 TB as usable, with only a couple of megabytes not usable, as
one would expect.
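The sequence on that PV was roughly the following (the device name is the
real one; the sizes are what pvdisplay reported, as best I recall):

    pvdisplay /dev/sdc1   # 'PV Size' showed ~12.00 TB of the 12.75 TB as not usable
    pvresize /dev/sdc1    # afterwards pvdisplay showed the full 12.75 TB usable,
                          # minus the usual couple of megabytes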
I then ran xfs_growfs on the filesystem, and once again it was back to 62 TB.
But it showed all of the regained space as free space, instead of only the
4.x TB that was free before all this happened.
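The grow step itself was nothing special; the mount point below is just a
placeholder, not the real path:

    # xfs_growfs works on a mounted XFS filesystem and takes the mount point;
    # with no size argument it grows the filesystem to fill the underlying LV.
    xfs_growfs /mnt/vol5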
I then ran xfs_repair on it again, thinking it might find the missing data;
instead, the filesystem shrank back to 51 TB. I rebooted and tried again a
couple of times, and the same thing happened each time. I'd really, really
like to get that data back somehow, and also to get the filesystem to a state
where we can start using it again.
The xfsprogs version is 2.9.4. The xfs_repair command line used was
'xfs_repair /dev/vg1/vol5', vol5 being the LVM2 logical volume. I spoke with
tech support from my RAID vendor, and he said he did not see any sign of
errors on the RAID itself, for what that is worth.
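If it would help, I can also capture the output of a read-only pass; my
understanding is that something like this reports problems without changing
anything:

    xfs_repair -V                 # print the xfs_repair / xfsprogs version
    xfs_repair -n /dev/vg1/vol5   # no-modify mode: scan and report only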
Nimbus is the hostname of the computer that is directly attached to the
RAID/JBOD unit. The other computers (compute-0-XX) access the RAID/JBOD
storage only over NFS.
I've tried to provide a lot here, but if I can provide any more information,
please let me know. Thanks very much,
Eli
I'm trying to post logs, but my emails keep getting bounced. I'll see if this
one makes it.