Thanks for the reply!
> > We have a RHEL 6.3 machine with a large XFS mount that suffered a
> > power outage.
> For starters, have you engaged your RH support folks?
Unfortunately we don't have support for these machines. We have tons of RH
machines and licenses, but only a few with paid support. Generally the
(grant-funded) research machines don't include RH support. (And generally we
don't run into problems like this. :))
> > When it came back up, it allegedly fixed itself, but
> > now many files are zero bytes. I found a bug report/errata fix at RH
> > that mentions something similar, which might be what we ran into.
> Which one? RH support can probably help you decide if that bug report
> applies, and where/when it was fixed.
This one: https://access.redhat.com/site/solutions/272673
You need a login to view that, though... I think this is the same one, which I
just found today:
That URL is currently broken for me, so here is a cache of it:
Reading this, I'm no longer sure we have a kernel with the fix. That machine is
I'm not really sure when the files were created or how long it was idle before
the crash... I wonder if ctime/mtime would be reliable for the files. I also
don't know how to reproduce the situation in order to test if it's fixed in a
later kernel. I could pull the power to test, if I knew how to modify files
ahead of time such that they would zero themselves out.
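
To get a handle on the timestamps, here is a minimal Python sketch (my own, not
taken from the RH report) that walks a directory tree and lists every
zero-length regular file along with its mtime and ctime. The function names are
only illustrative, and whether those timestamps can be trusted after the crash
is still an open question.

```python
import os
import stat
import time


def find_zero_byte_files(root):
    """Yield (path, mtime, ctime) for each zero-length regular file under root."""
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                # lstat so symlinks are not followed
                st = os.lstat(path)
            except OSError:
                # File vanished or is unreadable; skip it
                continue
            if stat.S_ISREG(st.st_mode) and st.st_size == 0:
                yield path, st.st_mtime, st.st_ctime


def format_row(path, mtime, ctime):
    """Render one result as 'mtime  ctime  path' using local time."""
    fmt = "%Y-%m-%d %H:%M:%S"
    return "%s  %s  %s" % (
        time.strftime(fmt, time.localtime(mtime)),
        time.strftime(fmt, time.localtime(ctime)),
        path,
    )
```

Calling find_zero_byte_files() on the mount point and sorting the results by
mtime should show whether the empty files cluster around the time of the
outage or were last modified well before it.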
> > We
> > are running a kernel that should have the fix as far as I can tell,
> > but we definitely have zero byte files that shouldn't be.
> shouldn't be because they had all been properly synced to disk
> before the power loss, or? (just in general, files not fsynced
> aren't guaranteed to be in any particular state if you lose power,
> though of course there are certain expectations of timely flushing).
No, I mean they shouldn't be zero normally. They weren't zero a week ago. In
other words, the files definitely changed unexpectedly, I'm assuming due to the
power outage. The files had not been touched in at least a few days before the
crash, according to the researcher working on those files. If I read the report
correctly, though, that might not matter much.
> > My question is: is there a way to restore this or fix it before going
> > to backups? Is it worth it to unmount and run xfs_check or similar?
> > Unfortunately, since the system came up and appeared to be working,
> > some users have been using that mount point.
> If you have backups that's probably the best option.
There aren't any backups of these files. The researchers should be able to
recreate them (I hope so); the data sets come from various places. It's a lot
of data, so I was hoping I could recover something to lessen the downtime. They
opted not to back up that directory because it's just too many TBs for normal
I'm not really expecting to be able to restore everything; I just want to put
some effort into getting back what I can before telling them they need to