OOM on quotacheck (again?)
Volker
mail at blafoo.org
Thu Oct 4 09:19:28 CDT 2012
Hi
> So you had a hang on 2.6.37 to do with dquot reclaim, you rebooted
> the server into what I think is a 3.6 kernel.
Correct.
> Log recovery failed with "bad clientid 0x0", so no superblock
> problem.
I was told by 'mount' that it's a superblock problem :-)
###
server044:~# mount -a
mount: /dev/sdb1: can't read superblock
###
What does the bad client-id in syslog indicate?
> It does tend to indicate that 2.6.37 wrote bad data to the
> log, though. If you reboot into 2.6.37, does log recovery run
> successfully?
Yes. A server which was rebooted on Oct 3rd 07:18am, running 2.6.37 with
a stacktrace involving xfs_qm_dqreclaim_one came back up fine a couple
minutes later on 2.6.37.
If this had not worked, we would have had far more trouble with
crashed xfs partitions in the past, since the xfs_qm_dqreclaim_one
stack trace has been a very common error for us.
> i.e. does the failure only occur on 2.6.37 -> 3.6
> with a dirty log?
Yes. All 6 servers failed to mount the xfs partition after they had
xfs trouble on 2.6.37 and came back up on the new 3.6 kernel. I did not
try to reboot them into 2.6.37 though.
> You then mounted the filesystem on the same kernel (has
> xfs_trans_read_buf_map() in the trace, hence the 3.6 version)
Correct. A quotacheck ran on all servers, and on every one of them it
ended in the stack trace shown (see pastebin). After a reboot the
partition mounted just fine.
> What mount options are you using on the 2.6.37 kernel?
2.6.37 and 3.6 use the same options:
noatime,nosuid,nodev,gquota
> If you are upgrading your kernel, you should also upgrade your
> xfsprogs installation as well.
Will do.
- volker