For the last couple of days I've been trying to compile a new kernel for
our webserver platform, which is based on Debian squeeze.
Hardware: a mix of Dell PE2850, 2950, R710
- raid-10 with 4 disks (old setup, PE2850)
- raid-1 system, raid-10 content (current setup)
- currently running linux-2.6.37 custom built, vmalloc set to default
All systems have an XFS filesystem as their content partition with
group quota enabled (no other XFS settings active). The content
partition varies in size between 250GB and 1TB and contains between
3 and 10 million files.
Every time I try to mount the XFS filesystem and a quota check is
needed, the server runs out of memory (OOM). I can easily reproduce this
by rebooting the server, resetting the quota flags with
xfs_db -x -c 'sb 0' -c 'write qflags 0'
and rerunning the quota check.
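For anyone who wants to try this on a test box, the steps above can be sketched roughly as follows. This is only a sketch under assumptions: the device /dev/sda7 is taken from the error message quoted further down, the mount point /mnt/content is hypothetical, and it uses umount/mount instead of a full reboot. The run helper and the DRY_RUN switch are illustrative names; with DRY_RUN=1 (the default here) nothing is executed, the commands are only printed.

```shell
#!/bin/sh
# Sketch of the reproduction steps. DEV and MNT are assumptions;
# adjust them to the actual content partition and mount point.
DEV="${DEV:-/dev/sda7}"
MNT="${MNT:-/mnt/content}"
DRY_RUN="${DRY_RUN:-1}"   # 1 = only print the commands, do not execute

# Hypothetical helper: print the command in dry-run mode, else run it.
run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "would run: $*"
    else
        "$@"
    fi
}

repro_quotacheck() {
    # The filesystem must not be mounted while xfs_db writes to it.
    run umount "$MNT"
    # Clear the quota flags in superblock 0 so the next mount with
    # quota enabled has to run a full quota check.
    run xfs_db -x -c 'sb 0' -c 'write qflags 0' "$DEV"
    # Mounting with group quota should now trigger the quota check.
    run mount -t xfs -o gquota "$DEV" "$MNT"
}
```

Calling repro_quotacheck with DRY_RUN=0 would actually modify the superblock, so this should only be done on a test system.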
This is true for various kernels but not all. What I've tried so far:
2.6.37.x - fails with OOM
22.214.171.124 - surprisingly works (see below why)
3.2.29 - fails with OOM
3.4.10 - fails with OOM
3.6.0rc5 - fails with vmalloc error (XFS (sda7): xfs_buf_get_map: failed
to map pages); with vmalloc=256 the system hangs indefinitely on mount.
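Since the 3.6.0rc5 failure is a vmalloc mapping error, it may help to look at the vmalloc figures the running kernel reports. A minimal sketch, assuming a Linux /proc/meminfo that contains Vmalloc* lines; the function name vmalloc_info is made up for illustration, and the optional file argument only exists so the function can be pointed at a saved copy:

```shell
#!/bin/sh
# Print the vmalloc-related lines from a meminfo-style file.
# Defaults to /proc/meminfo when no file is given.
vmalloc_info() {
    grep '^Vmalloc' "${1:-/proc/meminfo}"
}
```

Running vmalloc_info before the mount attempt shows how much vmalloc space the kernel has in total; /proc/cmdline shows whether a vmalloc= boot parameter is actually in effect.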
Some more information from my test system is available here:
I found a couple of references regarding this problem but no final
solution so far.
Please correct the following if I misunderstood anything:
1. There was an OOM problem with quota-checks which was fixed in
126.96.36.199 which is mentioned here:
and fixed here:
That is why 188.8.131.52 works for me.
2. That fix was later replaced (not extended) with a nicer patch which
is mentioned/published here:
I checked all kernel versions above for the patch mentioned in 2. and
can confirm its presence in each kernel tree. Still, our servers fail to
complete the quota check successfully.
Am I missing something here?
PS: As a side note, we've been running XFS for years without any
problems. But since we activated the gquota feature, we've been having
problems in a couple of places. One is the OOM on quota check, another
is XFS errors on high-I/O volumes with gquota enabled. But since the
high-I/O problem might be connected to the OOM problem, we'll try
to fix the OOM problem first :-)