Re: OOM on quotacheck (again?)

To: xfs@xxxxxxxxxxx
Subject: Re: OOM on quotacheck (again?)
From: Volker <mail@xxxxxxxxxx>
Date: Tue, 02 Oct 2012 18:29:27 +0200
In-reply-to: <5060727D.4000009@xxxxxxxxxx>
References: <5059D2B4.8010300@xxxxxxxxxx> <20120919205924.GC31501@dastard> <505AE2A1.5060703@xxxxxxxxxx> <20120924132113.GL20960@dastard> <5060727D.4000009@xxxxxxxxxx>
User-agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10.7; rv:15.0) Gecko/20120907 Thunderbird/15.0.1

Hi again,

> Great! That answered all my questions! Thanks a lot!
> 3.6.0-rc6-x64 is currently running fine on 6 machines.

Just as a follow-up, I would like to share some info.

The six machines mentioned above are still running fine. So are a few
more that we tested with the new kernel. All of the servers tested so
far were rebooted immediately after the new 3.6 kernel was installed.

Because of that, we decided to roll out the new kernel to all our
servers (approximately 330) and let it "sink in" over the next few
days, as the machines get rebooted.

This morning we experienced some problems with the superblock being
corrupted on 6 machines that had been rebooted during the night. For all
of them, the following was true:

a) the server was still running the old, buggy 2.6.37 kernel and had
filesystem trouble under heavy I/O (that was our original problem,
besides the OOM)

b) because of the filesystem trouble, the xfs partition had become
unresponsive and the server had been rebooted by our hardware support
team (sadly not necessarily via SysRq)

c) after being rebooted into the new 3.6 kernel, the server complained
that the superblock of the xfs partition was corrupted and was not
able to mount the partition

d) running xfs_repair -L -P <device> fixed the problem

e) trying to remount the repaired partition triggered a quota check,
which always ended in a stack trace; after a reboot, the quota check
ran fine and the partition mounted successfully
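
For reference, the sequence from d) and e) can be sketched roughly as
the shell snippet below. It only prints the commands instead of running
them. The device and mount point are placeholders, and the initial
xfs_repair -n dry run is an addition for caution that was not part of
what we actually ran.

```shell
#!/bin/sh
# Sketch only: prints the recovery commands for one device, runs nothing.
# recovery_plan is a hypothetical helper name; /dev/sdX1 and /mnt/data
# are placeholders, not values from our servers.
recovery_plan() {
    device=$1
    mountpoint=$2
    # -n: check only, modify nothing (assumed extra step, not from d)
    echo "xfs_repair -n $device"
    # -L: zero the log (may discard pending metadata updates),
    # -P: disable prefetching; this is the invocation from d)
    echo "xfs_repair -L -P $device"
    # the mount from e); this is where the quota check ran for us
    echo "mount $device $mountpoint"
}

recovery_plan /dev/sdX1 /mnt/data
```

Printing first makes it easy to review the exact commands per machine
before anything touches the log.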

Has anyone ever experienced problems like this updating from an older
kernel to the current 3.6?

Any idea what could have caused the bad superblock that the 3.6 kernel
complained about?

Is it possible that the 2.6.37 kernel left a superblock behind that
could not be recognized by the 3.6 kernel?

If it's of any interest, I can supply the stack traces.

- volker