OOM on quotacheck (again?)
blafoo
mail at blafoo.org
Wed Sep 19 09:12:04 CDT 2012
Hi all,
For the last couple of days I've been trying to compile a new kernel for
our webserver platform, which is based on Debian squeeze.
Hardware: a mix of Dell PE2850, 2950, R710
- raid-10 with 4 disks (old setup, PE2850)
- raid-1 system, raid-10 content (current setup)
- currently running linux-2.6.37 custom built, vmalloc set to default
(128MB)
All systems have an xfs-filesystem as their content-partition and have
group-quota enabled (no other xfs-settings active). The
content-partition varies in size between 250GB and 1TB and contains
between 3 and 10 million files.
Every time I try to mount the xfs-filesystem and a quota-check is
needed, the server runs out of memory (OOM). I can easily reproduce this
by rebooting the server, resetting the quota-flags with
xfs_db -x -c 'sb 0' -c 'write qflags 0'
and rerunning the quota-check.
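
A minimal sketch of these reproduction steps as a script. The device and
mount point are not given above, so DEV and MNT are placeholders
(assumptions), as are the explicit umount step and the gquota mount option.
By default the script only prints the commands; set DRY_RUN=0 to run them.

```shell
#!/bin/sh
# Sketch of the reproduction steps. DEV and MNT are placeholders
# (assumptions); the command quoted above does not name the device.
DEV="${DEV:-/dev/sdXN}"
MNT="${MNT:-/mnt/content}"
DRY_RUN="${DRY_RUN:-1}"   # default: only print the commands

# Print the command in dry-run mode, otherwise execute it.
run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

repro_quotacheck() {
    # Assumption: the filesystem has to be unmounted before xfs_db
    # writes to the superblock.
    run umount "$MNT"
    # Clear the quota flags so the next mount needs a quota-check.
    run xfs_db -x -c 'sb 0' -c 'write qflags 0' "$DEV"
    # Mounting with group quota enabled triggers the quota-check.
    run mount -o gquota "$DEV" "$MNT"
}

repro_quotacheck
```

Watching dmesg and free memory from a second console during the mount
shows when the OOM killer kicks in.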
This happens with most of the kernels I tested, but not all. What I've tried so far:
2.6.37.x - fails with OOM
2.6.39.4 - surprisingly works (see below why)
3.2.29 - fails with OOM
3.4.10 - fails with OOM
3.6.0rc5 - fails with a vmalloc error (XFS (sda7): xfs_buf_get_map: failed
to map pages); with vmalloc=256 the system hangs indefinitely on mount.
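
To double-check which vmalloc size a test kernel actually booted with, a
small helper along these lines can be used. This is only a sketch: the
function name is made up for illustration, and it assumes the usual
"VmallocTotal: <n> kB" line in /proc/meminfo.

```shell
# Print VmallocTotal in MB from meminfo-formatted input on stdin.
# vmalloc_total_mb is a hypothetical helper name, not an existing tool.
vmalloc_total_mb() {
    awk '/^VmallocTotal:/ { printf "%d\n", $2 / 1024 }'
}

# Typical use on a running system:
#   vmalloc_total_mb < /proc/meminfo
#   grep -o 'vmalloc=[^ ]*' /proc/cmdline
```

The second command shows whether a vmalloc= parameter was passed on the
kernel command line at all.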
Some more information from my test system is available here:
http://pastebin.com/2DkDyH4R
I found a couple of references regarding this problem but no final
solution so far.
Please correct the following if I misunderstood anything:
1. There was an OOM problem with quota-checks which was fixed in
2.6.39.4 which is mentioned here:
a) http://permalink.gmane.org/gmane.comp.file-systems.xfs.general/43565
and fixed here:
b) http://patchwork.xfs.org/patch/3337/
That is why 2.6.39.4 works for me.
2. That fix was later replaced (not extended) with a nicer patch which
is mentioned/published here:
c) http://oss.sgi.com/archives/xfs/2011-03/msg00240.html
I checked all kernel-versions above for the patch mentioned in 2. and
can confirm its presence in each kernel-tree. Still our servers fail to
check quota successfully.
Am I missing something here?
PS: As a side-note: we've been running xfs for years without any
problems. But since we activated the gquota-feature, we've been having
problems in a couple of places. One is the OOM on quota-check, another
is xfs-errors on high-io volumes with gquota enabled. But since the
high-io problem might be connected to the OOM problem, we'll try
to fix the latter first :-)
best regards
Volker