trying to avoid a lengthy quotacheck by deleting all quota data
Harry
harry at pythonanywhere.com
Thu Mar 5 11:09:57 CST 2015
PS. We might be interested in getting a better estimate of how long a
quotacheck would take. From an old thread on the mailing list, we see
this suggestion:
xfstests:src/bstat
We're a bit hesitant to run this on the live system, in case it
impacts performance substantially. Is that an unfounded worry? I
presume it's a read-only operation, so it would be safe to kill it
if we see performance degradation?
Rgds,
Harry + the team.
On 05/03/15 17:05, Harry wrote:
> Thanks for the reply Eric.
>
> One of our problems is that we're limited in terms of what
> manipulations we can apply to the live system, so instead we've
> been running our experiments against the backup system. You're
> quite right that DRBD may be introducing some weirdness of its own,
> so it may not be safe to draw conclusions from those experiments.
>
> Here's what we know about the live system:
> -> it had an outage, equivalent to having its power cable yanked, or
> doing an 'echo b > /proc/sysrq-trigger'
> -> when it came back, it decided to mount the drive without quotas.
> -> we saw "Failed to initialize disk quotas" in syslog (see the
>    check sketched after this list)
> -> last time we had to run a quotacheck (several months ago) it took
> about 2 hours.
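> (For reference, here's how we've been checking whether quotas came
> up after a mount -- a sketch, with "/srv/data" standing in for the
> real mount point:)
>
>     grep /srv/data /proc/mounts       # usrquota/uquota should show in the options
>     xfs_quota -x -c state /srv/data   # reports whether accounting is ON or OFF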
>
> We can repro the quotacheck issue on our test clusters, as follows
> (condensed into a script sketch after the list):
> -> kick off a job that writes to the disk
> -> hard reboot with "echo b > /proc/sysrq-trigger"
> -> on next boot, see "Failed to initialize disk quotas" message, xfs
> mounts without quotas
> -> soft reboot with "reboot"
> -> on next boot, see "Quotacheck needed: Please wait." message.
> -> Quotacheck completes some time later.
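> (The same steps condensed into a sketch -- device, path and size
> are placeholders for whatever the test cluster uses:)
>
>     dd if=/dev/zero of=/srv/data/scratch bs=1M count=10000 &   # keep writes in flight
>     echo b > /proc/sysrq-trigger                               # hard reboot mid-write
>     # next boot: "Failed to initialize disk quotas" in syslog,
>     # XFS mounts without quotas
>     reboot                                                     # clean reboot
>     # next boot: "Quotacheck needed: Please wait.", then a long wait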
>
> So our best-case scenario is that, next time we reboot, we'll have an
> outage of about 2 hours. And our paranoid worst-case scenario,
> induced by our experiments with our DRBD backup drives, is that the
> disk will actually turn out not to be mountable at all.
>
> is that "quotacheck always required after hard reboot" behaviour that
> we're observing something you expected? you seemed to be saying that
> the fact that quota are journaled should mean it's not needed?
>
> HP
>
> On 05/03/15 15:53, Eric Sandeen wrote:
>> On 3/5/15 7:15 AM, Harry wrote:
>>> Update -- so far, we've not managed to gain any confidence that we'll
>>> ever be able to re-mount that disk. The general consensus seems to be
>>> to fish all the data off the disk using rsync, and then move off XFS
>>> to ext4.
>>>
>>> Not a very helpful message for y'all to hear, I know. But if it's any
>>> help in prioritising your future work, I think the dealbreaker for us
>>> was the inescapable quotacheck on mount, which means that any time a
>>> fileserver goes down unexpectedly, we have an unavoidable,
>>> indeterminate-but-long period of downtime...
>>>
>>> hp
>> What you decide to use is up to you of course, and causes us no
>> heartbreak. :) But I think you fundamentally misunderstand the situation;
>> an unexpected fileserver failure should not result in a lengthy quotacheck
>> on xfs, because xfs quota is journaled, and will simply be replayed along with
>> the rest of the log.
>>
>> I honestly don't know what has led you to the conclusion that remounting
>> the filesystem will lead to any quotacheck at all, let alone a lengthy one.
>>
>>> * We're even a bit worried the disk might be in a broken state, such
>>> that the quotacheck won't actually complete successfully at all.
>> If your disk is broken, that's not a filesystem issue. It seems possible
>> that whatever drbd manipulation you're doing is causing an issue, but because
>> you haven't really explained it in detail, I don't know.
>>
>>> We take DRBD offline, so it's no longer writing, then we take
>>> snapshots of the drives, then remount those elsewhere so we can
>>> experiment without disturbing the live system.
>> Did you quiesce the filesystem first with e.g. xfs_freeze?
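>> (For example -- mount point is a placeholder:)
>>
>>     xfs_freeze -f /srv/data    # quiesce: flush dirty data, block new writes
>>     # ... take the DRBD snapshot here ...
>>     xfs_freeze -u /srv/data    # thaw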
>>
>> So far this thread has been long on prose and speculation, and short
>> on actual analysis, log messages, etc. Feel free to use ext4 or whatever
>> suits you, but given that nothing in this thread has implicated misbehavior
>> by xfs, I don't think that switching filesystems will solve the perceived
>> problem.
>>
>> -Eric
>
Rgds,
Harry + the PythonAnywhere team.
--
Harry Percival
Developer
harry at pythonanywhere.com
PythonAnywhere - a fully browser-based Python development and hosting environment
<http://www.pythonanywhere.com/>
PythonAnywhere LLP
17a Clerkenwell Road, London EC1M 5RD, UK
VAT No.: GB 893 5643 79
Registered in England and Wales as company number OC378414.
Registered address: 28 Ely Place, 3rd Floor, London EC1N 6TD, UK