Chris Allen wrote:
So, a few questions:
1. How much is known about this problem? Seeing as it is 100% reproducible,
is there any active development underway to fix it?
XFS is a lot less stack-heavy than it used to be, but if you put enough
I/O code between sys_write and your disks, the stack usage of all those
layers adds up and can still overflow the stack.
2. I have seen postings that say compiling a kernel with 8K stacks will
fix the problem. Is this the case? Or will I be able to trigger it again
by running 100 or 200 simultaneous writes?
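More threads probably won't matter. Each thread has its own stack, so
what counts is how deep a single call chain goes, not how many writers
are running.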
3. Any suggestions as to what I should try? At present it looks like I
am stuck between finding a fix for XFS and splitting the box into 2 or 3
EXT3 partitions (which I really don't want to do). I have tried ReiserFS
(max FS size is 8TB even though the FAQ says 16TB) and JFS (jfs_fsck
segfaults, which doesn't fill me with confidence).
If you can run with 8K stacks, you will probably be in better shape.
If you want to do a bit of testing, go into do_IRQ() and raise the
warning threshold (STACK_WARN) slightly. You'll get the warning message
earlier, and it should come with a backtrace that shows how you got
there.
-Eric
Many thanks for any suggestions,
Chris Allen.