Author: Michael Monnerie <michael.monnerie@xxxxxxxxxxxxxxxxxxx>
Date: Wed, 1 Sep 2010 01:30:41 +0200
I'm just trying the delaylog mount option on a filesystem (LVM over 2x 2TB 4K sector drives), and I see this while running 8 processes of "rm -r * & 2>/dev/null": Device: rrqm/s wrqm/s r/s w/s rkB/s
You're probably getting RMW cycles on inode writeback. I've been noticing this lately with my benchmarking - the VM is being _very aggressive_ reclaiming page cache pages vs inode caches and as a res
Author: Michael Monnerie <michael.monnerie@xxxxxxxxxxxxxxxxxxx>
Date: Wed, 1 Sep 2010 02:22:31 +0200
Nice explanation. This is a hexa-core AMD Phenom(tm) II X6 1090T Processor with up to 3.2GHz per core, so that shouldn't be - or is there only one core used? I think I read somewhere that each AG sho
Dave Chinner put forth on 8/31/2010 7:06 PM: 7200 rpm is the highest spindle speed for 2TB drives--5400 is most common. None of them are going to do much over 200 random seeks/second, if that. That's
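A rough sanity check on the "not much over 200 random seeks/second" figure, as a minimal sketch: it counts only the average rotational latency (half a revolution) and assumes zero head seek time, which is optimistic. The function name is made up for illustration.

```python
def max_random_iops_from_rotation(rpm):
    """Upper bound on random IOs/s if only average rotational latency is paid.

    Assumption: head seek time is ignored, so real drives will do fewer.
    """
    seconds_per_rev = 60.0 / rpm
    avg_rotational_latency = seconds_per_rev / 2  # half a revolution on average
    return 1.0 / avg_rotational_latency


for rpm in (5400, 7200):
    bound = max_random_iops_from_rotation(rpm)
    print(f"{rpm} rpm: at most {bound:.0f} random IOs/s before any seek time")
```

Since any real seek adds to the per-IO time, the achievable rate sits below this bound.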
I'm seeing an 8core/16thread server become CPU bound with multithreaded unlink workloads using delaylog, so it's entirely possible that all CPU cores are fully utilised on your machine. If all the fil
Absolutely. Nothing in XFS is simple. ;) Unlinks that free the inode clusters result in no inode writeback load, so the majority of the IO is log traffic. Hence they are either log IO bound or read
Dave Chinner put forth on 8/31/2010 10:19 PM: What's your disk configuration on this 8 core machine? Are you implying/stating that the performance of the disk subsystem is irrelevant WRT multithreade
Depends on where I place the disk image for the VM's I run on it ;) For example, running fs_mark with 4 threads to create then delete 200k files in a directory per thread in a 4p VM w/ 2GB RAM with t
Author: Michael Monnerie <michael.monnerie@xxxxxxxxxxxxxxxxxxx>
Date: Wed, 1 Sep 2010 09:45:58 +0200
Just like Stan, I'm puzzled by this. Why is it such hard work for the CPU, and what does it do? Is it really about calculating something, or does it have to do with lock contention, cold caches, cache line bou
Ok, it seems that people don't have any real idea of the complexity of directory operations in XFS, so I'll give you a quick overview. The XFS directory structure is exceedingly complex and the algorit
Author: Michael Monnerie <michael.monnerie@xxxxxxxxxxxxxxxxxxx>
Date: Thu, 2 Sep 2010 04:15:47 +0200
Does the "trim" that SSDs need belong in here? Now I understand, thanks again for that great explanation. Time to invent the "XFS rm co-processor". Should be multi-core so it scales better. Maybe some
Dave Chinner put forth on 9/1/2010 1:44 AM: Is this a single socket quad core Intel machine with hyperthreading enabled? That would fully explain the results above. Looks like you ran out of memory b
No, it's a dual socket (8c/16t) server. No, that's definitely not the case. A different kernel in the same 8p VM, 12x2TB SAS storage, w/ 4 threads, mount options "logbsize=262144" FSUse% Count Size F
Dave Chinner put forth on 9/1/2010 8:17 PM: More like an FPGA. As we see on list, daily, the XFS code changes far too rapidly for implementation in an ASIC. ;) Hay, there's a sales opportunity for SG
Dave Chinner put forth on 9/2/2010 2:01 AM: What happens when you bump each of these to 8 threads, 1 per core? If the test consumes all cpus/cores, what instrumentation are you viewing that tells you
FSUse%        Count         Size    Files/sec     App Overhead
     0      1600000            0     127979.3         13156823

So, 1 thread does 19k files/s, 2 threads do 37k files/s, 4 get 67k, and 8 get 128k. I'd say that's almost linear scaling and
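To put a number on "almost linear", here is a small sketch that turns the rounded throughput figures quoted above into speedup and per-thread efficiency relative to the single-thread run. The function and variable names are illustrative only, and the inputs are the rounded "k files/s" values from the message, not the raw fs_mark output.

```python
# Rounded throughput figures quoted in the message (files/s per thread count).
files_per_sec = {1: 19_000, 2: 37_000, 4: 67_000, 8: 128_000}


def scaling_table(results):
    """Return (threads, speedup, efficiency) rows relative to the 1-thread run."""
    base = results[1]
    rows = []
    for threads in sorted(results):
        speedup = results[threads] / base
        efficiency = speedup / threads  # 1.0 would be perfectly linear scaling
        rows.append((threads, speedup, efficiency))
    return rows


for threads, speedup, efficiency in scaling_table(files_per_sec):
    print(f"{threads} threads: {speedup:.2f}x speedup, {efficiency:.0%} efficiency")
```

With these rounded inputs the 8-thread run works out to roughly 84% of ideal linear scaling.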
Thanks Dave. I don't normally top post, but I just wanted to quickly say I _really_ enjoyed reading your reply below. It was seriously educational. I really enjoyed your note about the 24p Altix syst