Re: Unable to mount and repair filesystems

To: Gerard Beekmans <GBeekmans@xxxxxxxx>
Subject: Re: Unable to mount and repair filesystems
From: Dave Chinner <david@xxxxxxxxxxxxx>
Date: Fri, 30 Jan 2015 09:57:50 +1100
Cc: Eric Sandeen <sandeen@xxxxxxxxxxx>, "xfs@xxxxxxxxxxx" <xfs@xxxxxxxxxxx>
Delivered-to: xfs@xxxxxxxxxxx
In-reply-to: <D90435AEFF34654AA1122988C66C8678023F027956@xxxxxxxxxxxxxxxxxxx>
References: <D90435AEFF34654AA1122988C66C8678023F0277C9@xxxxxxxxxxxxxxxxxxx> <54CA9586.1010607@xxxxxxxxxxx> <D90435AEFF34654AA1122988C66C8678023F027956@xxxxxxxxxxxxxxxxxxx>
User-agent: Mutt/1.5.21 (2010-09-15)

On Thu, Jan 29, 2015 at 09:27:32PM +0000, Gerard Beekmans wrote:
> > -----Original Message-----
> > Are you certain that the volume /
> > storage behind dm-9 is in decent shape?  (i.e. is it really even
> > an xfs filesystem?)
> The outage occurred at the SAN level making the NFS storage
> unavailable which in turn turned off all the VMs running on it
> (turned off in the virtual sense).

Define "SAN outage". All this tells me is that the backing store
went bad in some way and needed recovery, not what the actual
problem in the SAN was. If it was a potential data loss event, then
that's the prime candidate for the storage returning zeros where
there should be data.

The second candidate is the NFS server. What was the NFS server?
Did the NFS server get rebooted? Did the NFS clients (i.e. the
physical machines running the hypervisor, not the guests) get
rebooted too?  If you reboot the server, the NFS clients are
supposed to retransmit any unstable data they hold (writes the
server has acknowledged but not yet committed to stable storage).
If the clients are rebooted or the NFS mount is forcibly unmounted
while the server is down, then that unstable data is lost forever.
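To make the failure mode concrete, here is a toy model, not a real NFS implementation, loosely following NFSv3 UNSTABLE WRITE / COMMIT semantics with a write verifier that changes when the server reboots. All class and method names are hypothetical. It shows that a server reboot alone is survivable because the client still holds the data and resends it, while a client reboot or forced unmount during the outage leaves nothing to resend.

```python
class ToyServer:
    """Hypothetical NFS server: UNSTABLE writes land in a volatile cache."""

    def __init__(self):
        self.stable = {}    # offset -> data on stable storage
        self.cache = {}     # offset -> data accepted UNSTABLE, not yet stable
        self.verifier = 1   # write verifier; changes whenever the server reboots

    def write_unstable(self, offset, data):
        self.cache[offset] = data
        return self.verifier

    def commit(self):
        self.stable.update(self.cache)
        self.cache.clear()
        return self.verifier

    def reboot(self):
        self.cache.clear()   # uncommitted data is gone
        self.verifier += 1   # new verifier tells clients they must retransmit


class ToyClient:
    """Hypothetical NFS client: keeps UNSTABLE data until COMMIT confirms it."""

    def __init__(self, server):
        self.server = server
        self.uncommitted = {}   # offset -> (data, verifier returned by WRITE)

    def write(self, offset, data):
        verf = self.server.write_unstable(offset, data)
        self.uncommitted[offset] = (data, verf)

    def commit(self):
        while self.uncommitted:
            verf = self.server.commit()
            stale = {off: data for off, (data, v) in self.uncommitted.items()
                     if v != verf}
            if not stale:
                self.uncommitted.clear()   # everything is on stable storage
                return
            for off, data in stale.items():
                self.write(off, data)      # server rebooted since WRITE: resend

    def reboot(self):
        # Client reboot or forced unmount: the client's copy is discarded,
        # so there is nothing left to retransmit after the server returns.
        self.uncommitted.clear()
```

In this model, `client.write(...)`, `server.reboot()`, `client.commit()` ends with the data on the server's stable storage; inserting `client.reboot()` before the commit leaves the server with nothing.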

Really, fully zeroed critical XFS metadata blocks are almost always
an indication of data loss somewhere in the lower layers of the
storage stack.  As a precaution, though, if one VMDK is bad, I'd
consider all the others suspect, even if the filesystem checkers
haven't thrown errors.  Data lost to random block corruption can
really only be reliably recovered from backups, as user data is
notoriously difficult to validate as correct.
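A quick, read-only way to see whether the leading XFS metadata of a device or image has been replaced by zeros is to look at the first few sectors of allocation group 0, which normally start with the primary superblock ("XFSB"), the AGF ("XAGF") and the AGI ("XAGI") headers. The sketch below assumes those headers sit in consecutive sectors and that the sector size is 512 bytes unless told otherwise; the function names are hypothetical, and reading a block device this way typically requires root.

```python
# Magic numbers at the start of the first three sectors of XFS
# allocation group 0: primary superblock, AGF and AGI headers.
AG0_HEADERS = [("superblock", b"XFSB"), ("AGF", b"XAGF"), ("AGI", b"XAGI")]


def classify(sector: bytes, magic: bytes) -> str:
    """Label one sector as 'ok', 'zeroed' or 'unexpected'."""
    if sector.count(0) == len(sector):
        return "zeroed"          # every byte is 0x00
    if sector[:4] == magic:
        return "ok"
    return "unexpected"


def check_ag0_headers(path: str, sector_size: int = 512) -> dict:
    """Read-only check of the AG 0 header sectors of an XFS image/device.

    sector_size is an assumption: 512 is common, but a filesystem made
    with 4096-byte sectors keeps these headers 4096 bytes apart.
    """
    results = {}
    # Opened read-only: never write to storage you suspect is damaged.
    with open(path, "rb") as f:
        for index, (name, magic) in enumerate(AG0_HEADERS):
            f.seek(index * sector_size)
            sector = f.read(sector_size)
            if len(sector) < sector_size:
                results[name] = "short read"
            else:
                results[name] = classify(sector, magic)
    return results
```

For example, `check_ag0_headers("/path/to/suspect.img")` (a hypothetical path) returns something like `{"superblock": "ok", "AGF": "zeroed", "AGI": "ok"}`. A "zeroed" result points at the lower-layer data loss described above rather than at an XFS bug; an "unexpected" superblock raises the question of whether the volume holds XFS at all.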


> It is possible that it is the vmware VMDK file that belongs to
> this VM that is the issue but it does not appear to be corrupt
> from a vmdk standpoint. Just the data inside of it.

Also, you are using VMDK image files, which implies you are running
ESX as your hypervisor, yes? If so, that limits our ability to help
you track down the source of the corruption...


Dave Chinner