As more and more diverse groups try to add different optimizations to Pro64,
one important aspect that has not been discussed is how memory should be
handled.
I had some discussions with Raymond Lo and Wilson Ho a couple of months ago.
The following are the notes I took; they give our conclusions on how to
obtain memory (in place of calling malloc and friends directly).
In case you don't know, in Pro64 the preferred way to get memory is through
MEMPOOLs. This write-up is meant to fill in some gaps for new contributors
to Pro64.
The notes were taken with respect to MEMPOOL usage in CG, but the same
really holds true for all backend phases (namely IPA, LNO, WOPT and CG).
The following are ways to handle mempools in CG.
1. The old C-style way
MEM_POOL_Push(&MEM_local_pool);
MEM_POOL_Push(&MEM_local_nz_pool);
The two pools are passed to the relevant routines that need memory and are
later used like the following:
loop_set = BB_SET_Create_Empty(PU_BB_Count + 2, pool);
where pool is the formal parameter whose actual argument is &MEM_local_pool.
There also needs to be a finalize routine that pops the mempools.
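To make the push/pop pairing concrete, here is a minimal sketch of that
pattern; the routine names Phase_Initialize, Do_Loop_Analysis and
Phase_Finalize are made up for illustration, and I am assuming BB_SET* is
the result type of BB_SET_Create_Empty:

void
Phase_Initialize (void)
{
  MEM_POOL_Push (&MEM_local_pool);
  MEM_POOL_Push (&MEM_local_nz_pool);
}

/* Worker routines receive the pool as a parameter and never push
   or pop it themselves. */
void
Do_Loop_Analysis (MEM_POOL *pool)
{
  BB_SET *loop_set = BB_SET_Create_Empty (PU_BB_Count + 2, pool);
  /* ... use loop_set; nothing is freed here, the memory lives
     until the matching pop ... */
}

/* The finalize routine pops each pool that was pushed. */
void
Phase_Finalize (void)
{
  MEM_POOL_Pop (&MEM_local_nz_pool);
  MEM_POOL_Pop (&MEM_local_pool);
}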
2. A slightly newer way: keep the push/pop tightly scoped inside the
routine that needs the memory; for purely local temporary memory you can
also simply use alloca (see the sketch after the example below).
e.g.
void
Local_Insn_Sched (void)
{
  for (BB *bb = REGION_First_BB; bb != NULL; bb = BB_next(bb)) {
    if (bb->Spilled() || !bb->Scheduled()) {
      MEM_POOL_Push(&MEM_local_pool);
      LOCAL_SCHEDULER *sched = CXX_NEW(LOCAL_SCHEDULER(bb), &MEM_local_pool);
      sched->Schedule();
      CXX_DELETE(sched, &MEM_local_pool);
      MEM_POOL_Pop(&MEM_local_pool);
    }
  }
}
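And here is a hedged sketch of the alloca variant for scratch data whose
lifetime is exactly one call; the routine name, and the choice of an
OP-pointer array sized with BB_length, are only for illustration:

#include <alloca.h>

void
Schedule_One_BB (BB *bb)
{
  INT nops = BB_length(bb);

  /* Stack allocation: the scratch array disappears automatically on
     every exit path, so no pop (and no pool) is needed. */
  OP **ops = (OP **) alloca (nops * sizeof(OP *));

  /* ... fill and use ops ... */
}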
3. A more sophisticated way, which also works in the CG phase. I am taking
fb_cfg.h and fb_cfg.cxx as the example; I've enclosed the source as an
attachment.
Essentially, a base class owns the mempool:
class FB_CFG_MEM {
protected:
  MEM_POOL _m;

  FB_CFG_MEM() {
    MEM_POOL_Initialize( &_m, "FB_CFG_MEM", true );
    MEM_POOL_Push( &_m );
  }

  ~FB_CFG_MEM() {
    MEM_POOL_Pop( &_m );
    MEM_POOL_Delete( &_m );
  }
};
and the CFG class derives from it:

class FB_CFG : public FB_CFG_MEM {
  ...
  hashmap< LABEL_IDX, FB_NODEX, hash<LABEL_IDX>, equal_to<LABEL_IDX>,
           mempool_allocator< pair<LABEL_IDX, FB_NODEX> > >  _lblx_to_nx;
  deque< FB_EDGE_DELAYED, mempool_allocator<FB_EDGE_DELAYED> > _delayed_edges;
  FB_NODEX _curr_nx;
  ...
};
The FB_CFG graph is implemented as a deque of nodes and edges. Here no
mempool operation is exposed in routines such as FB_CFG::New_Node and
Add_Edge. The constructors for the containers _nodes, _lblx_to_nx and
_delayed_edges are called explicitly from the constructor of FB_CFG, and
the mempool _m is passed down to them, making sure the mempool is properly
initialized, stacked and deleted upon exit.
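A simplified sketch of what that constructor looks like (the member list is
trimmed, I am assuming _nodes is a vector of FB_NODE, and I am assuming
mempool_allocator can be constructed from a MEM_POOL*, as the declaration
of _lblx_to_nx implies):

class FB_CFG : public FB_CFG_MEM {
private:
  vector< FB_NODE, mempool_allocator<FB_NODE> >                _nodes;
  deque< FB_EDGE_DELAYED, mempool_allocator<FB_EDGE_DELAYED> > _delayed_edges;

public:
  FB_CFG() :
    /* FB_CFG_MEM() has already run, so _m is initialized and pushed;
       hand that same pool to every container. */
    _nodes         ( mempool_allocator<FB_NODE>( &_m ) ),
    _delayed_edges ( mempool_allocator<FB_EDGE_DELAYED>( &_m ) )
  {}
};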
We all agreed that method #1 is NOT desirable, since it hides an assumed
protocol of push and pop between an initialize routine and a finalize
routine, with no way to force implementors to follow that sequence
correctly.
Currently, CG uses #1 or #2. The problem with #2 is (according to Raymond):
when you "break" out of loop or "return" from a procedure, you have to
insert MEM_POOL_pop at each exit paths. And I usually don't get it
correct the first time, or forget to add the necessary insert
when I fix bugs (the MEM_POOL_Push is too far up in the procedure!).
For #3, let me explain a bit more:
If you look at CXX_MEM_POOL, it does the same thing as FB_CFG_MEM.
So class FB_CFG could have been declared as
class FB_CFG : public CXX_MEM_POOL {...};
Memory used by the containers (i.e., the deque and hashmap here) is
released by the containers themselves. In this example, the class simply
tells the containers to use the specified mempool as their memory
allocator. Any object created by these containers will acquire memory
from this mempool, and the memory will be released back to the mempool
when these objects' destructors are called.
According to Wilson:
There were two reasons to implement (3). First, that's the
standard way to integrate a user-defined memory allocation scheme
with STL. You can't use mempool with STL if you stay with (1) or
(2). Second, even if you are not using STL, wrapping the mempool
in a class is better because it guarantees that each mempool is
properly deallocated at the right scope.
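To illustrate the first point, here is a minimal, self-contained sketch of
hooking a mempool into an STL container through mempool_allocator; the pool
name and contents are made up, and again I am assuming the allocator is
constructed from a MEM_POOL*:

MEM_POOL pool;
MEM_POOL_Initialize (&pool, "stl_example_pool", true);
MEM_POOL_Push (&pool);
{
  mempool_allocator<INT> alloc (&pool);
  vector< INT, mempool_allocator<INT> > v (alloc);
  v.push_back (1);   /* all of v's storage comes from 'pool', not malloc */
  v.push_back (2);
}                    /* v's destructor runs before the pop */
MEM_POOL_Pop (&pool);
MEM_POOL_Delete (&pool);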
Some minor comments from Raymond on the side:
Just as a personal style, I never use MEM_POOL_Push (except to initialize a
MEMPOOL). I always create a new MEMPOOL, since there is no reason to
reuse a MEMPOOL for different things. That's enforced by (3).
On a different issue, I prefer not to use MEMPOOLs with STL, since it makes
the STL constructors very ugly. I am not convinced that MEMPOOL is any
better than malloc in the case of STL, if it is not worse. For example,
an STL vector already does bulk allocation. MEMPOOL might speed up an STL
linked list, but who is using linked lists? The advantage of malloc is
that memory is returned immediately and there is not much overhead. The
disadvantage of MEMPOOL is that it allocates in chunks of 4K or more and
is very inefficient for small data structures.
Hope this gives you enough input to talk about mempools. Also, I suppose
that is why it was suggested that for really temporary memory usage that
stays local to a single routine, one should also consider using alloca;
it is a good compromise to avoid unnecessary memory-allocation overhead.
On the other hand, this usage should rarely be needed.
Note that so far we have recommended method #3 for allocating memory, used
in conjunction with STL. For people who are new to or not familiar with
STL, or who for whatever reason don't want to use the STL container
classes, the following usage is also possible:
in cg_swp.cxx:

  CXX_MEM_POOL   swp_local_pool(...);
  SWP_OP_vector  swp_op_vector(..., swp_local_pool());
  ...
No CXX_NEW is needed, nor do you need to push and pop; it is all taken
care of by the constructor/destructor.
This usage is simply a variation of #3.
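In general terms the pattern is the sketch below; MY_OLD_TABLE is only a
placeholder for any pre-STL data structure whose constructor accepts a
MEM_POOL*, and the single pool-name argument to CXX_MEM_POOL is my
assumption about its interface:

void
Some_Phase (void)
{
  /* The constructor initializes and pushes the pool; the destructor
     pops and deletes it when Some_Phase returns, on every path. */
  CXX_MEM_POOL local_pool ("Some_Phase pool");

  /* operator() returns the underlying MEM_POOL*. */
  MY_OLD_TABLE table (local_pool());

  /* ... use table; everything it allocates comes from local_pool ... */
}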
Why is this not the recommended method? STL provides a lot of data
structures and access routines, so there is really no need to write your
own linked list, hash table, vector, etc. The example cited uses it only
because it needs SWP_OP_vector, an old data structure written before STL
became part of the ANSI standard. I would say the guideline is: if you
have to use an existing data structure that is not STL, use this method.