Re: TAKE - make xfs's in memory extents host byte ordered

To: Andi Kleen <ak@xxxxxxx>
Subject: Re: TAKE - make xfs's in memory extents host byte ordered
From: Stephen Lord <lord@xxxxxxx>
Date: 10 Oct 2002 19:31:34 -0500
Cc: linux-xfs@xxxxxxxxxxx
In-reply-to: <20021010213259.A23019@xxxxxxxxxxxxx>
References: <200210101909.g9AJ9pd12773@xxxxxxxxxxxxxxxxxxxx> <20021010213259.A23019@xxxxxxxxxxxxx>
Sender: linux-xfs-bounce@xxxxxxxxxxx
On Thu, 2002-10-10 at 14:32, Andi Kleen wrote:
> On Thu, Oct 10, 2002 at 02:09:51PM -0500, Steve Lord wrote:
> > Switch xfs from using a big endian internal representation for
> > the in memory copy of extents to a host byte order representation.
> > The internal extents are read in once, then modified separately
> > from the on disk ones. Since we search and manipulate the extents
> > multiple times, it is cheaper to convert them to host byte order
> > once and then keep them in that format. Worth about 5 to 10%
> > reduction in cpu time for some loads. Complicated by the fact
> > that the in memory extents are written out to the log sometimes,
> > and when expanding extents are used to write out the initial
> > block of extents.
> 
> 
> That is quite surprising. BSWAP (= htonl) is a single cycle on Athlon.
> On Intel it's likely similar.
> 
> When you do it for 64bit it's a bit more expensive, but not much when you
> use BSWAP and shift. Are you sure it's not just bogus code generated by the
> compiler? Perhaps some problem with the way your 64bit conversion works
> (does it use BSWAP for the 64bit conversion too?)
> 
> BSWAP should only be used when you compile the kernel for 586+,
> so generic kernels compiled for i386 will be slower too.
> 
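
For reference, a minimal sketch of the "BSWAP and shift" approach to a 64-bit
swap described above. The function names are hypothetical and this is not the
kernel's actual byte-order code; it assumes an optimizing compiler that may
recognize the 32-bit shift-and-mask pattern and emit a single BSWAP for it.

```c
#include <stdint.h>

/* Hypothetical helper: reverse the four bytes of a 32-bit value.
 * On x86 a compiler may turn this whole expression into one BSWAP. */
static inline uint32_t swab32_sketch(uint32_t x)
{
    return ((x & 0x000000ffu) << 24) |
           ((x & 0x0000ff00u) << 8)  |
           ((x & 0x00ff0000u) >> 8)  |
           ((x & 0xff000000u) >> 24);
}

/* Hypothetical helper: 64-bit swap built from two 32-bit swaps.
 * Each half is byte-reversed, then the halves trade places via a shift. */
static inline uint64_t swab64_sketch(uint64_t x)
{
    uint32_t hi = (uint32_t)(x >> 32);
    uint32_t lo = (uint32_t)x;

    return ((uint64_t)swab32_sketch(lo) << 32) | swab32_sketch(hi);
}
```

If the generated code for something like swab64_sketch() is much longer than
two swaps, a shift and an OR, that would point at the compiler or the macro
expansion rather than at the cost of the swap itself.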

I should be compiling a 586 kernel (P-III right?), so possibly there is
something inefficient going on in the endian swapping code. We do call
down to the kernel primitives that networking uses. The preprocessor
output looks horrible; it should all compile down to nothing, but
perhaps it does not always. Hmm, I need to find someone to take a
closer look at this.

One definite number I do have (for a dual 450MHz platform): the cpu time
to remove a complete kernel tree drops from ~3.7 to ~3.5 seconds.
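
To illustrate the idea behind the change (convert once at read-in, then search
in host byte order), here is a rough sketch. The struct layouts and function
names are hypothetical and heavily simplified; real XFS extent records are
bit-packed and this is not the actual XFS code.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical on-disk record: two big-endian 64-bit fields. */
struct ext_disk {
    unsigned char startoff[8];
    unsigned char blockcount[8];
};

/* Hypothetical in-memory record, kept in host byte order. */
struct ext_host {
    uint64_t startoff;
    uint64_t blockcount;
};

/* Decode a big-endian 64-bit value on any host. */
static uint64_t be64_decode(const unsigned char *p)
{
    uint64_t v = 0;

    for (int i = 0; i < 8; i++)
        v = (v << 8) | p[i];
    return v;
}

/* Pay the conversion cost once, when the extents are read in. */
static void extents_to_host(const struct ext_disk *in,
                            struct ext_host *out, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        out[i].startoff = be64_decode(in[i].startoff);
        out[i].blockcount = be64_decode(in[i].blockcount);
    }
}

/* Binary search over sorted, non-overlapping extents.
 * No byte swapping happens here, however often it is called.
 * Returns the index of the extent containing off, or -1. */
static long extent_lookup(const struct ext_host *ext, size_t n, uint64_t off)
{
    size_t lo = 0, hi = n;

    while (lo < hi) {
        size_t mid = lo + (hi - lo) / 2;

        if (off < ext[mid].startoff)
            hi = mid;
        else if (off >= ext[mid].startoff + ext[mid].blockcount)
            lo = mid + 1;
        else
            return (long)mid;
    }
    return -1;
}
```

With a big-endian in-memory copy, every comparison in the lookup loop would
need its own conversion; with the host-order copy that work is done once per
extent in extents_to_host().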

Steve


