An improvement for the compiler of Garp reconfigurable coprocessor.
Extension of techniques from software compilation (rather than high-level synthesis)
Uniform schema for pipelining the HW execution of a broad class of loops on coprocessor.
Based heavily on compilation techniques for VLIW architectures exploiting operation-level parallelism ~ adaptation of VLIW software pipelining techniques.
Why
Apps often contain a mix of computation:
spatial computation associated with hardware
temporal computation associated with fetch-and-execute model of microprocessor.
-> motivation for hybrid architectures.
As embedded apps grow in size, the productivity benefits of automatic compilation for general purpose computing also grow.
designers begin with automatic compilation
then focus efforts on improving loops specifically for their own hardware.
The Garp Architecture
2-d array of configurable logic blocks (CLBs)
has 4 32-bit data buses & 1 32-bit address bus.
while idle, processor can use the memory buses to load configurations or transfer data between processor regs & array regs
Novel feature: fixed clock frequency & simplified timing model
Timing model enumerates exactly which combination of routing & computation delay in series can be traversed in 1 clock cycle or less
example: a module can receive an operand over a short vertical bus & perform a carry chain computation within a single cycle.
Basic Compilation Approach
Garp compiler targets loops for acceleration using reconfigurable array.
One thread of control hops back and forth between the microprocessor & the array
When encountering a loop utilizing the array during the execution:
the correct configuration for that loop is loaded. That takes minimal time if it was the most recently used or resident in the configuration cache
Live values are transferred to the array
array is activated & microprocessor goes to sleep
When loop exits:
microprocessor wakes up & determines which exit was taken & retries live variables (from the array) appropriate for that exit
continues software execution.
Loops
Garp-specific part of the compiler first recognizes natural loops in the control flow graph (CFG)
Natural loop = strongly connected sub-graph of the CFG with exactly one entry block dominating all other blocks of the loop & back-reachable from them. Blocks inside a loop form its body.
Recognizes FOR, WHILE loops or even backwards GOTO statements in the C code.