Re: Transactional XFS?

To: Stewart Smith <stewart@xxxxxxxxxxxxxxxx>
Subject: Re: Transactional XFS?
From: Dave Chinner <david@xxxxxxxxxxxxx>
Date: Thu, 16 Feb 2012 17:42:30 +1100
Cc: Grozdan <neutrino8@xxxxxxxxx>, xfs@xxxxxxxxxxx
In-reply-to: <87ehtvz6bp.fsf@xxxxxxxxxxxxxxxx>
References: <CAFLt3phWcaQ4K3OPSVUkyN0BXqh+jQgQbeA59Oav23aOPLYMYw@xxxxxxxxxxxxxx> <20120216002237.GW14132@dastard> <87k43nzj5e.fsf@xxxxxxxxxxxxxxxx> <20120216014338.GX14132@dastard> <87ehtvz6bp.fsf@xxxxxxxxxxxxxxxx>
User-agent: Mutt/1.5.21 (2010-09-15)
On Thu, Feb 16, 2012 at 04:38:02PM +1100, Stewart Smith wrote:
> On Thu, 16 Feb 2012 12:43:38 +1100, Dave Chinner <david@xxxxxxxxxxxxx> wrote:
> > Oh, so making some set of random user changes to random user data
> > have ACID properties? That's what databases are for, isn't it?  :P
> Yep :)
> > I don't see us implementing anything like this in XFS anytime soon.
> > We are looking to add transaction grouping so that we can make
> > things that currently require multiple transactions (e.g. create a
> > file, add a default ACL) atomic, but I don't have any plans to
> > open the can of worms that is userspace controlled transactions any
> > time soon.
> The worst part is working out the semantics as to not break existing apps
> (without completely sacrificing concurrency).

That doesn't seem like a show stopper to me.

The problem I see is that it is basically impossible to do
arbitrarily large transactions in a filesystem - they are limited by
the size of the log. e.g. you can't have a user transaction that
writes more data or modifies more data than the log allows in a
single checkpoint/transaction. e.g. you can't just overwrite a 100MB
file in a transaction and expect it to work. It might work if you've
got a 2GB log, but if you've only got a 10MB log, then that
overwrite transaction is full of fail.
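The constraint can be sketched in a few lines. This is an illustrative model only, not XFS's actual reservation accounting (which also covers metadata, checkpoints, and grant heads); the names `Journal`, `begin_transaction`, and `LogFullError` are hypothetical. The core point is the same: a single transaction must reserve all the log space it may need up front, so a reservation larger than the whole log can never succeed.

```python
class LogFullError(Exception):
    pass

class Journal:
    """Toy model of a fixed-size filesystem log (hypothetical API)."""

    def __init__(self, log_size_bytes):
        self.log_size = log_size_bytes

    def begin_transaction(self, reservation_bytes):
        # A transaction reserves its worst-case log space before it
        # starts; asking for more than the log holds can never be
        # satisfied, no matter how empty the log is.
        if reservation_bytes > self.log_size:
            raise LogFullError(
                "reservation %d exceeds log size %d"
                % (reservation_bytes, self.log_size))
        return reservation_bytes

big_log = Journal(2 * 1024 * 1024 * 1024)    # 2GB log
small_log = Journal(10 * 1024 * 1024)        # 10MB log
overwrite = 100 * 1024 * 1024                # 100MB overwrite as one txn

big_log.begin_transaction(overwrite)         # fits
try:
    small_log.begin_transaction(overwrite)   # can never fit
except LogFullError as e:
    print("failed:", e)
```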

It's issues like that that doom the generic usefulness of
userspace controlled filesystem transactions as part of the normal
filesystem operation. If you need this sort of functionality, it has
to be layered over the top of the filesystem to avoid filesystem
atomicity limitations. i.e. another layer of tracking and
journalling. And at that point you're talking about implementing a
database on top of the filesystem in the filesystem....
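For a single file, the usual way applications layer this on top of the filesystem is the write-to-temp / fsync / rename pattern, which leans on rename atomicity instead of filesystem transactions. A minimal sketch (the function name `atomic_replace` is my own; anything beyond one file, such as multi-file transactions or rollback, needs exactly the extra tracking and journalling layer described above):

```python
import os
import tempfile

def atomic_replace(path, data):
    """Atomically replace the contents of `path` with `data` (bytes).

    Readers see either the old contents or the new contents, never a
    partial write: the data is written to a temp file in the same
    directory, fsync'd, then renamed over the target. rename() within
    one filesystem is atomic on POSIX systems.
    """
    dirname = os.path.dirname(os.path.abspath(path))
    fd, tmp = tempfile.mkstemp(dir=dirname)
    try:
        with os.fdopen(fd, "wb") as f:
            f.write(data)
            f.flush()
            os.fsync(f.fileno())   # data durable before the rename
        os.rename(tmp, path)       # atomic commit point
    except BaseException:
        os.unlink(tmp)             # roll back: drop the temp file
        raise
```

Note that this only commits one file at a time; coordinating several such updates atomically is where you end up reinventing a write-ahead log in userspace.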

> > We already have this upgrade rollback functionality in development
> > with none of that complexity - it uses filesystem snapshots so is
> > effectively filesystem independent and already works with yum and
> > btrfs. You don't need any special application support for this -
> > rollback from a failed upgrade is as simple as a reboot.
> The downside being you also roll back your logs and any other changes
> made during that time. On the whole though, it's probably sufficient.

That, IMO, is one of the good things about it. You go back to a
pristine condition, but still have the failed upgrade image that you
can mount and debug. The logs and all the failed state is still
intact in the upgrade image, and when you are done debugging it you
can blow it away and try again....

> > Sure, Microsoft have been trying to make their filesystem a database
> > for years. It's theoretically possible, but in practice they've
> > fallen short in every attempt in the past 15 years.
> err... try 20 years :)

Time gets away from me these days ;)

> It's funny in a way, sqlite succeeds at effectively doing this for an
> awful large number of applications.

/me nods


Dave Chinner
