
Re: trying to avoid a lengthy quotacheck by deleting all quota data

To: xfs@xxxxxxxxxxx
Subject: Re: trying to avoid a lengthy quotacheck by deleting all quota data
From: Harry <harry@xxxxxxxxxxxxxxxxxx>
Date: Thu, 05 Mar 2015 17:34:39 +0000
Cc: "developers@xxxxxxxxxxxxxxxxxx" <developers@xxxxxxxxxxxxxxxxxx>
Delivered-to: xfs@xxxxxxxxxxx
In-reply-to: <54F89201.60805@xxxxxxxxxxx>
References: <54EC958E.2000001@xxxxxxxxxxxxxxxxxx> <20150224215907.GA18360@dastard> <54EF1A8F.7030505@xxxxxxxxxxxxxxxxxx> <54F856E7.10006@xxxxxxxxxxxxxxxxxx> <54F87BF3.3000405@xxxxxxxxxxx> <54F88CEC.4030009@xxxxxxxxxxxxxxxxxx> <54F89201.60805@xxxxxxxxxxx>
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:31.0) Gecko/20100101 Thunderbird/31.5.0
We're on 3.13.0-39 (Ubuntu Trusty).

If you're interested in looking into it further, I'd be happy to provide any extra info you'd like.

But just to make sure I'm not wasting any of your time -- I think the team have pretty much decided to make the switch no matter what. The quotacheck issue is one thing, but actually the switch to ext4 simplifies lots of other aspects of our quota system (one of the reasons we picked xfs was to be able to use project quotas, but it turns out we don't need them any more, so user quotas are simpler...)

On 05/03/15 17:27, Eric Sandeen wrote:
On 3/5/15 11:05 AM, Harry wrote:
Thanks for the reply Eric.

One of our problems is that we're limited in terms of what
manipulations we can apply to the live system, and so instead we've
been running our experiments against the backup system, and you're
quite right that DRBD may be introducing some weirdness of its own,
so those experiments may not be safe to draw conclusions from.

Here's what we know about the live system
-> it had an outage, equivalent to having its power cable yanked, or doing an 
'echo b > /proc/sysrq-trigger'
-> when it came back, it decided to mount the drive without quotas.
-> we saw a message in syslog saying "Failed to initialize disk quotas"
-> last time we had to run a quotacheck (several months ago) it took about 2
hours

We can repro the quotacheck issue on our test clusters, as follows:
-> kick off a job that writes to the disk
-> hard reboot with "echo b > /proc/sysrq-trigger"
-> on next boot, see "Failed to initialize disk quotas" message, xfs mounts 
without quotas
-> soft reboot with "reboot"
-> on next boot, see "Quotacheck needed: Please wait." message.
-> Quotacheck completes some time later.
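
For anyone reproducing the steps above, the outcome of each boot can be told apart by searching the kernel log for the two messages quoted. This is a minimal sketch, assuming the log text is fed on stdin (for example from dmesg). The function name classify_quota_mount is made up for illustration, and the match strings are only the messages quoted in this thread, not an exhaustive list of what XFS may print:

```shell
#!/bin/sh
# Hypothetical helper: classify an XFS mount from kernel log text on stdin.
# Assumption: the only patterns matched are the two messages quoted in this
# thread; nothing else is assumed about the log format.
classify_quota_mount() {
    log=$(cat)
    case "$log" in
        # Checked first: this is the "mounted without quotas" case.
        *"Failed to initialize disk quotas"*) echo "quota-init-failed" ;;
        # Seen on the following boot, while the quotacheck is running.
        *"Quotacheck needed"*)                echo "quotacheck-running" ;;
        # Neither message present (e.g. a clean log recovery).
        *)                                    echo "no-quota-message" ;;
    esac
}
```

Typical use would be something like: dmesg | classify_quota_mount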

So our best-case scenario is that, next time we reboot, we'll have an
outage of about 2 hours. And our paranoid worst-case scenario,
induced by our experiments with our drbd backup drives, is that the
disk will actually turn out not to be mountable at all.

Is that "quotacheck always required after hard reboot" behaviour that
we're observing something you expected? You seemed to be saying that
the fact that quotas are journaled should mean it's not needed?

In general, that's correct.  It's not clear why "Failed to initialize disk
quotas" appeared; that seems closer to the root cause.  But again, we don't have your
full logs to look at, so I don't know if anything else offers a clue.  (For that
matter, we don't even know what kernel version you're on...)

here, on a recent 4.0-rc1 kernel:

# mount -o quota /dev/sdc6 /mnt/test
# cp -aR /lib/modules/ /mnt/test
# echo b > /proc/sysrq-trigger

[152807.209688] sysrq: SysRq : Resetting

# mount -o quota /dev/sdc6 /mnt/test
# dmesg | tail -n 3
[   90.822601] XFS (sdc6): Mounting V4 Filesystem
[   90.921346] XFS (sdc6): Starting recovery (logdev: internal)
[   93.399133] XFS (sdc6): Ending recovery (logdev: internal)
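
The dmesg output above shows that log recovery ran, but not by itself whether quota accounting is active on the resulting mount. One way to check is to look at the options listed for the mount point in /proc/mounts. This is a rough sketch: the function name quota_opts_for is hypothetical, and it assumes XFS reports its quota state there as options such as usrquota or noquota, which is worth verifying on the kernel in question:

```shell
#!/bin/sh
# Hypothetical helper: print the quota-related mount options for an XFS
# mount point, reading /proc/mounts-style lines on stdin.
# Assumption: quota state appears as options containing "quota" or
# "qnoenforce" (e.g. usrquota, prjquota, noquota).
quota_opts_for() {
    # $1 = mount point to look up
    awk -v mp="$1" '$2 == mp && $3 == "xfs" {
        n = split($4, opts, ",")
        out = ""
        for (i = 1; i <= n; i++)
            if (opts[i] ~ /quota|qnoenforce/)
                out = out (out == "" ? "" : ",") opts[i]
        # Report explicitly when no quota-related option is listed.
        print (out == "" ? "none-listed" : out)
    }'
}
```

For the mount in the transcript this would be run as: quota_opts_for /mnt/test < /proc/mounts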


Harry + the PythonAnywhere team.

Harry Percival

PythonAnywhere - a fully browser-based Python development and hosting 

PythonAnywhere LLP
17a Clerkenwell Road, London EC1M 5RD, UK
VAT No.: GB 893 5643 79
Registered in England and Wales as company number OC378414.
Registered address: 28 Ely Place, 3rd Floor, London EC1N 6TD, UK
