xfs

Re: Unable to mount and repair filesystems

To: Eric Sandeen <sandeen@xxxxxxxxxxx>
Subject: Re: Unable to mount and repair filesystems
From: Dave Chinner <david@xxxxxxxxxxxxx>
Date: Fri, 30 Jan 2015 10:12:31 +1100
Cc: Gerard Beekmans <GBeekmans@xxxxxxxx>, "xfs@xxxxxxxxxxx" <xfs@xxxxxxxxxxx>
Delivered-to: xfs@xxxxxxxxxxx
In-reply-to: <54CAB11A.7040509@xxxxxxxxxxx>
References: <D90435AEFF34654AA1122988C66C8678023F0277C9@xxxxxxxxxxxxxxxxxxx> <54CA9586.1010607@xxxxxxxxxxx> <D90435AEFF34654AA1122988C66C8678023F027956@xxxxxxxxxxxxxxxxxxx> <54CAAAEC.1080803@xxxxxxxxxxx> <D90435AEFF34654AA1122988C66C8678023F0279AB@xxxxxxxxxxxxxxxxxxx> <54CAB11A.7040509@xxxxxxxxxxx>
User-agent: Mutt/1.5.21 (2010-09-15)
On Thu, Jan 29, 2015 at 04:15:54PM -0600, Eric Sandeen wrote:
> On 1/29/15 3:59 PM, Gerard Beekmans wrote:
> I'm sure it's not related to this issue (unless it was very recently grown?
> Was it grown shortly before the failures?)
> 
> Hm, it would have started at 4 AGs by default, and it's the 5th one that
> looks bad; maybe that's a clue.  Are agf 6, 7, 8 etc also full of 0s?

Gerard is using the default mount options, so XFS issues cache
flushes and FUA (Force Unit Access) writes for its log writes. Hence
if the new AG headers are zeroed yet the superblock says they are
valid, that's a storage bug.

In more detail: we force the new AG headers to be written to disk
synchronously during the growfs operation, before we commit the
transaction. The superblock with the larger AG count can only reach
the disk after the transaction has been written to the log. Log writes
trigger a storage device cache flush, which results in this IO
ordering:

new AG header IO
IO complete
transaction commit
....
Device cache flush
        (new AG headers guaranteed to be on disk)
journal write (FUA)
        (journal write guaranteed to be on disk)
.....
superblock write IO.
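The ordering above can be modelled as a toy userspace sketch. It assumes a single POSIX file standing in for the block device, `os.fsync` standing in for the device cache flush, and "write then fsync" standing in for the FUA journal write. The layout constants (`SB_OFFSET`, `JOURNAL_OFFSET`, `AG_BASE`, `AG_SIZE`) and the `grow`/`demo` helpers are hypothetical; this is not the XFS on-disk format or the kernel's growfs code.

```python
import os
import tempfile

# Hypothetical toy layout; this is NOT the XFS on-disk format.
SB_OFFSET = 0          # toy "superblock"
JOURNAL_OFFSET = 512   # toy "journal" record
AG_BASE = 8192         # base of the toy "AG header" region
AG_SIZE = 4096         # spacing between toy AG headers


def ag_offset(agno):
    """Byte offset of the toy header for allocation group `agno`."""
    return AG_BASE + agno * AG_SIZE


def grow(fd, old_agcount, new_agcount, events):
    """Apply the growfs IO ordering to a file descriptor, logging each step."""
    # 1. Write the new AG headers. pwrite() returning models "IO complete":
    #    the data may still sit in a volatile cache at this point.
    for agno in range(old_agcount, new_agcount):
        os.pwrite(fd, b"AGHDR%04d" % agno, ag_offset(agno))
    events.append("ag headers written")

    # 2. Cache flush issued ahead of the log write: afterwards the AG headers
    #    are on stable storage (fsync stands in for the device flush).
    os.fsync(fd)
    events.append("cache flush")

    # 3. Journal write with FUA semantics, modelled as write + fsync.
    os.pwrite(fd, b"COMMIT agcount=%d" % new_agcount, JOURNAL_OFFSET)
    os.fsync(fd)
    events.append("journal write (FUA)")

    # 4. Only now may the superblock with the larger AG count be written.
    os.pwrite(fd, b"SB agcount=%03d" % new_agcount, SB_OFFSET)
    events.append("superblock write")


def demo(old_agcount=4, new_agcount=25):
    """Run grow() on a scratch file; return the event log and two headers."""
    events = []
    with tempfile.TemporaryFile() as f:
        fd = f.fileno()
        grow(fd, old_agcount, new_agcount, events)
        sb = os.pread(fd, 14, SB_OFFSET)
        last_ag = os.pread(fd, 9, ag_offset(new_agcount - 1))
    return events, sb, last_ag


if __name__ == "__main__":
    print(demo())
```

The point of the model is the dependency chain: the superblock write is issued only after both synchronisation points, so storage that honours flushes cannot persist the larger AG count without the AG headers. `os.fsync` only approximates the kernel's flush and FUA requests.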

Hence if the superblock is showing 25 AGs and the new AGs (AG 4
through AG 24; AG numbering starts at 0) are not found on disk, then
either:

        a) if the grow was very recent the storage is not obeying
        cache flushes and hence breaking fundamental IO ordering
        behaviour; or,

        b) if the growfs happened long ago, the storage has lost the
        data that was written to stable media...
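To check which AGs the superblock claims but whose headers are missing, a read-only sketch can pull the geometry from the primary superblock and test the AGF magic in every AG. It assumes the big-endian `struct xfs_dsb` field offsets noted in the comments and that the AGF occupies the second sector of each AG; `read_geometry`, `check_agfs` and `make_image` are hypothetical helpers, and `xfs_db` remains the supported tool for inspecting these headers.

```python
import io
import struct

# Assumed from the XFS on-disk format (struct xfs_dsb): big-endian fields,
# magic at byte 0, sb_blocksize at 4, sb_agblocks/sb_agcount at 84/88,
# sb_sectsize at 102. The AGF sits in the second sector of every AG.
SB_MAGIC = b"XFSB"
AGF_MAGIC = b"XAGF"


def read_geometry(dev):
    """Return (blocksize, agblocks, agcount, sectsize) from the primary sb."""
    dev.seek(0)
    sb = dev.read(512)
    if sb[:4] != SB_MAGIC:
        raise ValueError("no XFS superblock at offset 0")
    (blocksize,) = struct.unpack_from(">I", sb, 4)
    agblocks, agcount = struct.unpack_from(">II", sb, 84)
    (sectsize,) = struct.unpack_from(">H", sb, 102)
    return blocksize, agblocks, agcount, sectsize


def check_agfs(dev):
    """Classify the AGF of every AG that the superblock says exists."""
    blocksize, agblocks, agcount, sectsize = read_geometry(dev)
    report = {}
    for agno in range(agcount):
        dev.seek(agno * agblocks * blocksize + sectsize)
        hdr = dev.read(sectsize)
        if len(hdr) < sectsize:
            report[agno] = "short read"   # past the end of the device/image
        elif hdr[:4] == AGF_MAGIC:
            report[agno] = "ok"
        elif not any(hdr):
            report[agno] = "zeroed"       # whole AGF sector is 0s
        else:
            report[agno] = "bad magic"
    return report


def make_image(agcount=25, good=4, blocksize=512, agblocks=16, sectsize=512):
    """Synthetic image (not a real fs): valid AGFs in AGs 0..good-1 only."""
    img = bytearray(agcount * agblocks * blocksize)
    img[0:4] = SB_MAGIC
    struct.pack_into(">I", img, 4, blocksize)
    struct.pack_into(">II", img, 84, agblocks, agcount)
    struct.pack_into(">H", img, 102, sectsize)
    for agno in range(good):
        off = agno * agblocks * blocksize + sectsize
        img[off:off + 4] = AGF_MAGIC
    return io.BytesIO(bytes(img))


if __name__ == "__main__":
    print(check_agfs(make_image()))
```

Against real storage, the same `check_agfs` call would be pointed at a read-only copy of the device opened with `open(path, "rb")`. A run of "zeroed" results starting right after the original AG count matches the failure pattern described above.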

Cheers,

Dave.
-- 
Dave Chinner
david@xxxxxxxxxxxxx
