XFS filesystem corruption
Dave Chinner
david at fromorbit.com
Sun Mar 10 17:45:36 CDT 2013
On Sat, Mar 09, 2013 at 12:51:25PM -0600, Stan Hoeppner wrote:
> On 3/9/2013 3:11 AM, Dave Chinner wrote:
> > On Fri, Mar 08, 2013 at 12:59:22PM -0600, Stan Hoeppner wrote:
> >> On 3/8/2013 6:20 AM, Ric Wheeler wrote:
> >>>> Something that none of us mentioned WRT write barriers is that while the
> >>>> filesystem structure may avoid corruption when the power is cut, files
> >>>> may still be corrupted, in conditions such as any/all of these:
> >>
> >> I made it very clear I was discussing file corruption here, not
> >> filesystem corruption. You already covered that base. I was
> >> specifically addressing the fact that XFS performs barriers on metadata
> >> writes but not file data writes.
> >
> > Actually, you're not correct there, either, Stan. ;)
>
> With "either" you're implying I was incorrect twice, and I wasn't, not
> in whole anyway, maybe in part. ;)
The "either" was in reference to you correcting someone else...
> > XFS only issues cache flushes/FUA writes for log IO. Metadata IO is
> > done exactly the same way that data IO is done - without barriers.
> > It's because metadata lost in drive caches at the time of a crash is
> > rewritten by journal replay that filesystem corruption does not
> > occur.
>
> Technical semantics. Geeze, give the non dev a break now and then. ;)
It's the technical semantics that matter when it comes to behaviour
at power loss. That's why I pick on "technical semantics" - it
makes your analysis and understanding of problems better, and that
means there's less for me to do in future ;)
> Does everyone remember the transitive property of equality from math
> class decades ago? It states "If A=B and B=C then A=C". Thus if
> barrier writes to the journal protect the journal, and the journal
> protects metadata, then barrier writes to the journal protect metadata.
Yup, but the devil is in the detail - we don't protect individual
metadata writes at all and that difference is significant enough to
comment on.... :P
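
To make that distinction concrete, here is a toy model, not XFS code. It
assumes a drive with a volatile write cache where only log writes are
followed by a cache flush and metadata is written back like ordinary data.
The names ToyDisk, commit and replay are hypothetical and exist only for
this illustration.

```python
class ToyDisk:
    """Toy drive with a volatile write cache (illustration only, not XFS)."""

    def __init__(self):
        self.stable = {}   # contents that survive power loss
        self.cache = {}    # volatile write cache, lost on power loss
        self.next_seq = 0  # sequence number for log records

    def write(self, key, value):
        # Ordinary write: may sit in the volatile cache indefinitely.
        self.cache[key] = value

    def flush(self):
        # Cache flush: everything cached so far reaches stable media.
        self.stable.update(self.cache)
        self.cache.clear()

    def power_loss(self):
        # Anything still in the volatile cache is gone.
        self.cache.clear()


def commit(disk, block, value):
    """Log the change durably, then write the metadata without a flush."""
    disk.write(("log", disk.next_seq), (block, value))
    disk.next_seq += 1
    disk.flush()                        # only the log write is made durable
    disk.write(("meta", block), value)  # metadata IO: no flush, like data IO


def replay(disk):
    """Rewrite metadata from the log records that reached stable media."""
    log_keys = sorted(k for k in disk.stable if k[0] == "log")
    for key in log_keys:
        block, value = disk.stable[key]
        disk.stable[("meta", block)] = value
```

After a commit followed by power_loss(), the metadata block is missing from
stable storage, and replay() puts it back from the log. The individual
metadata write was never protected, only the log record describing it.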
> I had a detail incorrect, but not the big picture. And I'd bet the OP
> is more interested in the big picture. So surely I'd get a B or a C
> here, but certainly not an F.
Certainly a B+ - like I said, I'm being picky because you seem to
understand the details once explained... :)
> > As it is, if the application uses direct IO (likely, as it
> > sounds like video capture/editing/playout here) then log IO
> > will also ensure that the data written by the app is on disk (i.e.
> > that's the mechanism by which fsync works).
>
> So this would be an interesting upside down case for XFS, as the file
> data may be intact, but the filesystem gets corrupted, the opposite of
> the design point.
Well, if barriers are working correctly, then there won't be any
filesystem corruption, either...
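
From the application side, the fsync path mentioned above can be sketched
as below. This is a minimal example under stated assumptions: durable_write
is a hypothetical helper name, it uses ordinary buffered writes rather than
direct IO, and it relies on the storage stack honouring cache flushes.

```python
import os
import tempfile


def durable_write(path, data):
    """Write data to path and return only after fsync() has completed."""
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o644)
    try:
        view = memoryview(data)
        while view:
            # os.write() may write fewer bytes than requested, so loop.
            written = os.write(fd, view)
            view = view[written:]
        # Ask the kernel to push the file's data and metadata to stable
        # storage; without this the data may still be in volatile caches.
        os.fsync(fd)
    finally:
        os.close(fd)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        durable_write(os.path.join(tmp, "frame.bin"), b"captured frame")
```

An application that skips the fsync() call has no guarantee its file data
is on disk at power loss, regardless of how well the filesystem structure
itself is protected.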
> >>> Also, if there are active writers, this is inherently racy. A better
> >>> script would unmount the file systems :)
> >>
> >> Yes, a umount would be even better.
> >
> > Change the BIOS so that the power button does not cause a power down
> > so the OS can capture the button event and trigger an orderly
> > shutdown.
>
> Dare I say "Dave you're incorrect". ;)
Heh. Not so much incorrect as "unaware of the entire scope". I
browsed the thread and didn't pick up on this little detail...
Cheers,
Dave.
--
Dave Chinner
david at fromorbit.com