

To: <linux-xfs@xxxxxxxxxxx>
Subject: Page failure allocation - in-memory data corruption
From: Hümmer Andreas <andreas.huemmer@xxxxxxxxx>
Date: Sat, 14 Aug 2004 16:05:00 +0200
Sender: linux-xfs-bounce@xxxxxxxxxxx
Thread-index: AcSCB7EGHAB7Ot9pSkOfSMdaiY3BvA==
Thread-topic: Page failure allocation - in-memory data corruption
Hi there,

Recently I observed some annoying errors:

"Corruption of in-memory data detected" along with page faillure allocations of 
ohter applications (smbd or nfsd).

Basically I'm running SuSE 9.1 (2.6.5-7.104-smp) on a dual Athlon with 4 GB 
of RAM and "some" storage devices attached. A Gbit network interface is also 
part of the system, and all filesystems run on an encrypted loop device.

Well, fortunately it is currently only partially in production, so I had the 
chance to do some investigation.

In every case the failures occurred under load through the fileserver application. 
I therefore started watching the machine's MemFree value and found that it drops 
(as expected) to the limit defined in vm.min_free_kbytes, which in this case is 
set to 1914.
Every failure started with a "page allocation failure", mostly from smbd; that 
process was dead afterwards. This was then followed by in-memory data corruptions 
reported by xfs.
After a handful of tests this finally resulted in about 50% data loss on a 
completely garbled 1.8 TB xfs partition.
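
In case anyone wants to watch the same thing, this is roughly how I kept an eye 
on the free-memory floor (these are just the standard 2.6 /proc and sysctl paths, 
nothing specific to this SuSE build):

    # current free memory and the configured floor
    grep MemFree /proc/meminfo
    cat /proc/sys/vm/min_free_kbytes

    # watch it under load, once per second
    watch -n 1 'grep MemFree /proc/meminfo'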

After doing some investigations the following behaviour was observed:

- Changing the eth interface speed from 1 Gbit to 100 Mbit: the errors occurred 
less often.
- Changing the sync behaviour (strictsync, etc.) in smbd: the errors occurred 
less often.
- Nevertheless there was no clear picture of the circumstances under which these 
errors occur.
- As mentioned above, vm.min_free_kbytes is set to about 2 MB (the default SuSE 
or 2.6 setting), so the idea was to raise it to a higher value to give the 
system a little more room for its buffer handling. And it worked: after setting 
this value to about 20 MB, all problems were gone, even under full-load 
conditions up to the system's limit (the exact commands are shown below).
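
For anyone wanting to try the same workaround, the change itself is just a 
sysctl; the value 20480 is simply my rough "about 20 MB", not a tuned number:

    # raise the free-memory floor to ~20 MB at runtime
    sysctl -w vm.min_free_kbytes=20480
    # or, equivalently:
    echo 20480 > /proc/sys/vm/min_free_kbytes

    # to make it persistent across reboots, add to /etc/sysctl.conf:
    # vm.min_free_kbytes = 20480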

The behaviour I observed after setting vm.min_free_kbytes to 20 MB was that some 
processes (for the most part smbd and xfs) allocate memory very quickly. It 
seems (unfortunately I have no proof for it) that there can be a race condition 
while concurrent applications allocate memory buffers. I can't say clearly yet 
where the problem originally comes from (kernel, samba, xfs or the Intel 
e100/e1000 driver), but the ones mentioned before are my favourites.

The questions I have now are:

- Is there a "known" problem with xfs and memory-allocation ?
- Even if this bug which is not originated by xfs itself, is there or can there 
be a function to avoid damage to the filesystems in case another app goes 
"wild"?
- Can this issue be used by an attacker to damage a system?
- Is there a table or list stating some basic (known to be good or best choice) 
(kernel|fs|application)parameters for given filesystem(sizes)?

If useful, I can provide more info and do some more tests - but only until the 
end of August, then the machine will go into production.

Ciao
Andi





Andreas Hümmer
IT-Service 
Mobile: +49 (0) 1 60.90 53 02 04
Mailto:andreas.huemmer@xxxxxxxxx 
_____________________________________ 

     ELAXY GmbH 
     Spitalgasse 23 
     D - 96450 Coburg 
     Phone: +49 (0) 95 61.5543.0 
     FAX:   +49 (0) 95 61.5543.344 
     http://www.elaxy.com 
_____________________________________ 

 

