Bala2000 (09 Apr 2011, ToanMai)
Dynamo: A Transparent Dynamic Optimization System
Introduction
Overview
Startup & Initialization
Fragment Formation
Trace selection
Introduction
What?
Dynamo: a software dynamic optimization system that transparently improves the performance of a native instruction stream
Why?
Static compiler optimizations are becoming less effective:
  Software side: software increasingly relies on dynamically linked libraries (DLLs), whose code is bound only at load/run time and is therefore invisible to the static compiler
  Hardware side: more complexity is offloaded to the software compiler (CISC -> RISC -> VLIW)
  -> a greater performance burden on the static compiler, while there are more obstacles to static compiler analysis
-> This leads to:
  complex compiler software
  modest performance gains on general-purpose apps
  highly customized compilers for very narrow classes of apps
How?
COMPLEMENT, not COMPETE with, the compiler
Dynamo operates @ runtime
It interprets the native instruction stream, which is either:
  produced by a traditional optimizing compiler, or
  dynamically generated by an app (JIT, ...)
The opportunities for Dynamo depend on the source of the input.
Overview
Dynamo interprets the instruction stream until a "hot" instruction sequence (trace) is identified.
It then generates an optimized version of that trace (fragment) into the fragment cache.
The next time the entry address of the fragment is encountered -> execute the fragment from the cache (no need to interpret anymore).
Flow of control:
Start by interpreting until a taken branch is encountered (A).
Look up the branch target in the fragment cache (B):
  if found -> jump to the fragment in the cache (F)
  if not -> check the start-of-trace condition (C)
    what's start-of-trace? loop headers, or exits from previously identified hot traces
    if yes -> increment the counter associated with that branch target address (D)
      if counter > preset hot threshold (E) -> enter code generation mode (G)
        the interpreted sequence is recorded in a hot trace buffer
        check the end-of-trace condition (H)
          what's end-of-trace? a backward taken branch
          if yes -> the hot trace buffer is optimized (I) into a fragment
            what's a fragment? a single-entry, multi-exit, contiguous sequence of instructions
            if too long? -> truncated (why? how?)
            save to the fragment cache (J), indexed by the app binary address of the start-of-trace instruction
            connect to other fragments if possible! -> minimizes expensive fragment cache exits
    if no -> back to normal interpretation
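The flow of control above, with its labels (A) to (J), can be sketched as a small simulator. This is a minimal Python sketch under loud assumptions, not Dynamo's implementation: the toy instruction set (addi, blt, jmp, halt) and the values of HOT_THRESHOLD and MAX_TRACE_LEN are hypothetical; start-of-trace is approximated as the target of a backward taken branch or of a fragment exit; the "optimizer" (I) stores the recorded trace unchanged; running fragments back to back without returning to the interpreter stands in for fragment linking; and the recorder does not consult the fragment cache while recording.

```python
from collections import defaultdict

HOT_THRESHOLD = 50  # hypothetical; the notes only say "preset hot threshold"
MAX_TRACE_LEN = 64  # hypothetical cap for the "too long -> truncated" rule


def step(program, pc, regs):
    """Interpret one toy instruction. Returns (next_pc, taken_branch)."""
    op = program[pc]
    if op[0] == "addi":                      # ("addi", reg, imm)
        regs[op[1]] += op[2]
        return pc + 1, False
    if op[0] == "blt":                       # ("blt", reg, limit, target)
        if regs[op[1]] < op[2]:
            return op[3], True
        return pc + 1, False
    if op[0] == "jmp":                       # ("jmp", target)
        return op[1], True
    raise ValueError(f"unknown instruction {op!r}")


def run_fragment(program, fragment, regs):
    """Execute a single-entry, multi-exit fragment: a list of (pc, taken)
    pairs recorded from one hot path. Leave through a side exit as soon as
    a branch goes the other way than it did while recording."""
    for pc, recorded_taken in fragment:
        next_pc, taken = step(program, pc, regs)
        if taken != recorded_taken:
            return next_pc                   # off-trace: exit to the real target
    return next_pc                           # reached the end of the fragment


def dynamo_run(program, regs, threshold=HOT_THRESHOLD):
    cache = {}                   # fragment cache: trace-start address -> fragment (J)
    counters = defaultdict(int)  # one counter per start-of-trace address (D)
    stats = defaultdict(int)
    pc, buffer = 0, None         # buffer = hot trace buffer while recording (G)
    while program[pc][0] != "halt":
        next_pc, taken = step(program, pc, regs)           # (A) interpret
        stats["interpreted"] += 1
        if buffer is not None:                             # code generation mode
            buffer.append((pc, taken))
            backward = taken and next_pc <= pc
            if backward or len(buffer) >= MAX_TRACE_LEN:   # (H) end-of-trace
                cache[buffer[0][0]] = buffer               # (I)+(J), no real optimization
                buffer = None
            pc = next_pc
            continue
        branch_pc, pc = pc, next_pc
        if not taken:
            continue
        left_fragment = False
        while pc in cache:                                 # (B) hit -> (F)
            pc = run_fragment(program, cache[pc], regs)    # back-to-back runs model linking
            stats["fragment_runs"] += 1
            left_fragment = True
        if left_fragment or pc <= branch_pc:               # (C) start-of-trace?
            counters[pc] += 1                              # (D)
            if counters[pc] > threshold:                   # (E)
                buffer = []                                # (G) record from pc on
    return cache, stats


program = {
    0: ("addi", "i", 1),
    1: ("addi", "s", 2),
    2: ("blt", "i", 1000, 0),  # backward taken branch closes the loop
    3: ("halt",),
}
regs = {"i": 0, "s": 0}
cache, stats = dynamo_run(program, regs)
print(regs, dict(stats))  # {'i': 1000, 's': 2000} {'interpreted': 159, 'fragment_runs': 947}
```

With these made-up numbers, the loop is interpreted until its header counter exceeds the threshold, one iteration is recorded into the hot trace buffer, and almost all remaining iterations run from the fragment cache; the last run leaves through a side exit when the backward branch finally falls through. The program result is the same as pure interpretation, which is the "transparent" part.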
Startup & Initialization
Dynamo is a user-mode DLL (shared lib).
Entry point: the dynamo_exec routine, invoked by the app
  -> the remainder of the app code runs under Dynamo control
dynamo_exec:
  saves the app's context (machine regs, stack env, etc.) to app-context
  swaps the stack env to Dynamo's stack -> no interference with the app's runtime stack
Interpreter (A):
  starts interpreting app code from return-pc, using the context saved in app-context
  the interpreter never returns to dynamo_exec
With Dynamo installed, dynamo_start must be invoked prior to the jump to _start (the app's main entry point).
Dynamo maps & manages a separate area of memory:
  it contains all objects dynamically allocated by Dynamo's code
  access to this area is protected
Fragment Formation
Performance improvement opportunities:
  redundancies that cross static program boundaries: procedure calls, returns, virtual function calls, indirect branches & dynamically linked function calls
  instruction cache utilization: frequently executed instructions are often non-contiguous in the app binary
The unit of optimization is a trace.
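To make the instruction-cache point concrete, here is a small worked example with made-up numbers: a 64-byte cache line (an assumption, not a figure from the paper) and three hypothetical 16-byte hot blocks scattered across the binary. Placed as in the binary they occupy three cache lines; copied back to back into a trace-shaped fragment they fit in one.

```python
LINE_SIZE = 64  # hypothetical instruction-cache line size in bytes


def lines_touched(blocks):
    """Return the set of cache-line indices covered by (start, size) blocks."""
    touched = set()
    for start, size in blocks:
        first, last = start // LINE_SIZE, (start + size - 1) // LINE_SIZE
        touched.update(range(first, last + 1))
    return touched


# Hypothetical hot path: three 16-byte blocks scattered across the binary.
hot_blocks = [(0x1000, 16), (0x2450, 16), (0x8010, 16)]

# The same blocks copied back to back into a fragment at a line-aligned address.
packed, offset = [], 0x40000
for _, size in hot_blocks:
    packed.append((offset, size))
    offset += size

print(len(lines_touched(hot_blocks)), len(lines_touched(packed)))  # 3 1
```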
Trace selection
Use MRET (most recently executed tail) instead of a profile-based approach to speculate on which traces are hot
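A sketch of the difference, assuming a hypothetical stream of paths taken from one start-of-trace point (each path written as a tuple of block names). MRET keeps only one counter per start-of-trace and, once it exceeds the threshold, speculates that the very next tail executed is hot; the profile-based alternative, shown for contrast, counts every path and picks the most frequent one. The function names and the data are illustrative only.

```python
from collections import Counter


def mret_select(path_stream, threshold):
    """MRET: one counter per start-of-trace; once it exceeds the threshold,
    speculate that the tail executed next is the hot trace."""
    counter = 0
    for path in path_stream:       # each element = one arrival at the start-of-trace
        counter += 1
        if counter > threshold:
            return path
    return None                    # never became hot


def profile_select(path_stream):
    """Profile-based alternative: count every path, pick the most frequent."""
    return Counter(path_stream).most_common(1)[0][0]


# Hypothetical stream: path x is taken 90% of the time, path y every 10th time.
stream = [("h", "y", "t") if i % 10 == 0 else ("h", "x", "t") for i in range(200)]

print(mret_select(stream, 50))   # ('h', 'y', 't')
print(mret_select(stream, 51))   # ('h', 'x', 't')
print(profile_select(stream))    # ('h', 'x', 't')
```

With this stream, MRET's guess depends on which tail happens to follow the threshold crossing: at threshold 50 it selects the rare y path, at 51 the common x path, while the profile always finds x. What MRET buys in exchange is that it needs no per-path counters.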
Topic revision: r13 - 09 Apr 2011, ToanMai