Re: deleting 2TB lots of files with delaylog: sync helps?

To: xfs@xxxxxxxxxxx
Subject: Re: deleting 2TB lots of files with delaylog: sync helps?
From: Stan Hoeppner <stan@xxxxxxxxxxxxxxxxx>
Date: Thu, 02 Sep 2010 03:41:59 -0500
In-reply-to: <20100902070108.GY705@dastard>
References: <201009010130.41500@xxxxxx> <20100901000631.GO705@dastard> <201009010222.57350@xxxxxx> <20100901031954.GP705@dastard> <4C7DD99F.7000401@xxxxxxxxxxxxxxxxx> <20100901064439.GR705@dastard> <4C7F3823.1040404@xxxxxxxxxxxxxxxxx> <20100902070108.GY705@dastard>
User-agent: Mozilla/5.0 (Windows; U; Windows NT 5.0; en-US; rv:1.9.2.8) Gecko/20100802 Thunderbird/3.1.2
Dave Chinner put forth on 9/2/2010 2:01 AM:

> No, that's definitely not the case. A different kernel in the 
> same 8p VM, 12x2TB SAS storage, w/ 4 threads, mount options "logbsize=262144"
> 
> FSUse%        Count         Size    Files/sec     App Overhead
>      0       800000            0      39554.2          7590355
> 
> 4 threads with mount options "logbsize=262144,delaylog"
> 
> FSUse%        Count         Size    Files/sec     App Overhead
>      0       800000            0      67269.7          5697246

What happens when you bump each of these to 8 threads, 1 per core?  If
the test consumes all cpus/cores, what instrumentation are you viewing
that tells you the cpu utilization _isn't_ due to memory b/w starvation?

A modern 64 bit 2 GHz core from AMD or Intel has an L1 instruction issue
rate of 8 bytes/cycle * 2,000 MHz = 16,000 MB/s = 16 GB/s per core.  An
8 core machine would therefore have an instruction issue rate of 8 * 16
GB/s = 128 GB/s.  A modern dual socket system is going to top out at
24-48 GB/s of memory b/w, well short of the instruction issue rate.  Now, this doesn't
even take the b/w of data load/store operations into account, but I'm
guessing the data size per directory operation is smaller than the total
instruction sequence, which operates on the same variable(s).
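
As a rough check of the arithmetic above, here is a minimal sketch in Python. The variable and function names are made up for illustration, and the 8 bytes/cycle and 24-48 GB/s figures are simply the estimates quoted in the paragraph above, not measurements from any particular machine:

```python
# Back-of-the-envelope check of the instruction issue rate estimate.
# All inputs are the assumed figures from the text, not measured values.
BYTES_PER_CYCLE = 8        # assumed L1 instruction issue width per core
CLOCK_MHZ = 2_000          # 2 GHz core clock
CORES = 8                  # one thread per core
MEM_BW_GBS = (24, 48)      # assumed dual socket memory b/w range, GB/s


def issue_rate_gbs(bytes_per_cycle, clock_mhz):
    """Per-core instruction issue rate in GB/s (decimal units)."""
    mb_per_s = bytes_per_cycle * clock_mhz   # bytes/cycle * Mcycles/s = MB/s
    return mb_per_s / 1000                   # MB/s -> GB/s


per_core = issue_rate_gbs(BYTES_PER_CYCLE, CLOCK_MHZ)
total = per_core * CORES
# How many times the aggregate issue rate exceeds the memory b/w estimate.
ratios = [total / bw for bw in MEM_BW_GBS]

print(f"per core: {per_core} GB/s, {CORES} cores: {total} GB/s")
print("issue rate / memory b/w:", [round(r, 2) for r in ratios])
```

Under these assumptions the aggregate issue rate is roughly 2.7x to 5.3x the available memory b/w, which is the gap the cache argument below relies on.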

So, if the CPUs are pegging, and we're not running out of memory b/w,
then this would lead me to believe that the hot kernel code, core
fs_mark code and the filesystem data are fully, or near fully, contained
in level 2 and 3 CPU caches.  Is this correct, more or less?

> You are free to choose to believe I don't know I'm doing - if you
> can get XFS to perform better, then I'll happily take the patches ;)

Not at all.  I have near total faith in you, Dave.  I just like to play
Monday morning quarterback now and then.  It lets me show that my
knuckles drag the ground, and gives you an opportunity to educate me, and
others, so we can one day walk upright when discussing XFS. ;)

> Did that a long time ago - it's in the archives a few months back.

I'll have to dig around.  I've never even looked for the archives for
this list.  It's hopefully mirrored in the usual places.

Out of curiosity, have you ever run into memory b/w starvation before
peaking all CPUs while running this test?  I could see that maybe
occurring with dual 1GHz+ P3 class systems with their smallish caches
and lowly single channel PC100, back before the switch to DDR memory,
but those machines were probably gone before XFS was open sourced, IIRC,
so you may not have had the pleasure (if you could call it that).

-- 
Stan
