Adapting software pipelining for reconfigurable computing, in CASES


  • What
    • An improvement for the compiler of Garp reconfigurable coprocessor.
      • Extension of techniques from software compilation (rather than high-level synthesis)
      • Uniform schema for pipelining the HW execution of a broad class of loops on coprocessor.
      • Based heavily on compilation techniques for VLIW architectures exploiting operation-level parallelism ~ adaptation of VLIW software pipelining techniques.
  • Why
    • Apps often contain a mix of computation:
      • spatial computation associated with hardware
      • temporal computation associated with fetch-and-execute model of microprocessor.
      • -> motivation for hybrid architectures.
    • As embedded apps grow in size, the productivity benefits of automatic compilation for general purpose computing also grow.
      • designers begin with automatic compilation
      • then focus efforts on improving loops specifically for their own hardware.

The Garp Architecture

  • 2-d array of configurable logic blocks (CLBs)
    • has 4 32-bit data buses & 1 32-bit address bus.
    • while idle, processor can use the memory buses to load configurations or transfer data between processor regs & array regs
  • Novel feature: fixed clock frequency & simplified timing model
    • Timing model enumerates exactly which combination of routing & computation delay in series can be traversed in 1 clock cycle or less
    • example: a module can receive an operand over a short vertical bus & perform a carry chain computation within a single cycle.

Basic Compilation Approach

  • Garp compiler targets loops for acceleration using reconfigurable array.
    • One thread of control hops back and forth between the microprocessor & the array
    • When encountering a loop utilizing the array during the execution:
      • the correct configuration for that loop is loaded. That takes minimal time if it was the most recently used or resident in the configuration cache
      • Live values are transferred to the array
      • array is activated & microprocessor goes to sleep
    • When loop exits:
      • microprocessor wakes up & determines which exit was taken & retries live variables (from the array) appropriate for that exit
      • continues software execution.


  • Garp-specific part of the compiler first recognizes natural loops in the control flow graph (CFG)
    • Natural loop = strongly connected sub-graph of the CFG with exactly one entry block dominating all other blocks of the loop & back-reachable from them. Blocks inside a loop form its body.
    • Recognizes FOR, WHILE loops or even backwards GOTO statements in the C code.
    • All are treated as WHILE

Hyperblock Selection

  • From each loop, form a hyperblock:
    • contains the commonly executed paths in the loop
    • only hyperblock is implemented on the array
Topic revision: r3 - 27 Mar 2011, ToanMai
This site is powered by FoswikiCopyright © by the contributing authors. All material on this collaboration platform is the property of the contributing authors.
Ideas, requests, problems regarding Foswiki? Send feedback