Exploiting loop-level parallelism on coarse-grained reconfigurable architectures using modulo scheduling

Introduction

  • What?
    • A modulo scheduling algorithm that exploits loop-level parallelism on coarse-grained reconfigurable architectures (CGRAs).
  • Why?
    • CGRAs have several advantages:
      • reduced delay, area, power and configuration time compared with FPGAs
      • predictable timing and small configuration storage space
      • can be combined with a general-purpose (GP) processor
      • partial connectivity makes them scalable while remaining cost- and power-efficient
    • CGRAs have become important, and automatic design and compilation tools are essential to their success.
  • How?
    • combine ideas from FPGA placement and routing (P&R) with modulo scheduling from VLIW compilation

The Target Architecture

  • a regular array of functional units (FUs) and registers
    • FUs:
      • execute heterogeneous operations
      • support predicated operations
        • so the architecture is applicable to different types of loops
        • essential for removing the loop-back operation
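
As a rough illustration of what predicated operations provide, the sketch below rewrites a branch as two guarded operations, so that both paths become straight-line code an FU array could execute. The helper `predicated` and the `abs_diff_*` functions are hypothetical names chosen for this example; they are not taken from the target architecture or the paper.

```python
def predicated(op, pred, old_value, *args):
    """Apply op(*args) only when pred is true; otherwise keep old_value.

    Models a predicated operation: the result is committed only if the
    guarding predicate holds (an assumed, simplified semantics).
    """
    return op(*args) if pred else old_value


def abs_diff_branchy(a, b):
    # Original form: control flow chooses between two operations.
    if a > b:
        return a - b
    return b - a


def abs_diff_predicated(a, b):
    # If-converted form: one compare produces the predicate,
    # then both subtractions are issued under complementary guards.
    p = a > b
    r = 0
    r = predicated(lambda x, y: x - y, p, r, a, b)
    r = predicated(lambda x, y: y - x, not p, r, a, b)
    return r
```

Both versions compute the same value; the predicated one contains no branch, which is the property that lets control flow inside a loop body be mapped onto the array.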

Modulo Scheduling

  • a combination of 3 sub-problems: placement, routing and scheduling
    • Placement: determine which FU each operation is placed on
    • Scheduling: determine the cycle in which each operation executes
    • Routing: connect the operations according to their data dependencies
  • Overlap different iterations of a loop -> exploit a higher degree of instruction-level parallelism (ILP)
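
The three sub-problems above can be sketched together in a toy form. This is a minimal greedy illustration under strong assumptions, not the algorithm from the paper: every operation takes one cycle, the array is a mesh in which an FU can read results only from itself or its four direct neighbours (standing in for routing), register lifetimes are ignored, and a new iteration starts every `ii` cycles (the initiation interval, a standard modulo scheduling term not defined in the notes above). All function and operation names are hypothetical.

```python
from itertools import product


def neighbours(fu, rows, cols):
    """FUs whose output a given FU may read: itself and its mesh neighbours."""
    r, c = fu
    cand = [(r, c), (r - 1, c), (r + 1, c), (r, c - 1), (r, c + 1)]
    return [(x, y) for x, y in cand if 0 <= x < rows and 0 <= y < cols]


def modulo_schedule(ops, deps, rows, cols, ii):
    """Greedily place and schedule ops for a fixed initiation interval.

    ops:  operation names in topological order
    deps: dict mapping an op to the list of ops it depends on
    Returns {op: (fu, cycle)} or None if this greedy pass fails.
    """
    mrt = set()   # modulo reservation table: occupied (fu, cycle % ii) slots
    placed = {}
    for op in ops:
        preds = deps.get(op, [])
        # Scheduling: an op may start one cycle after its latest predecessor.
        earliest = max((placed[p][1] + 1 for p in preds), default=0)
        # Routing (simplified): the FU must be adjacent to every producer.
        fus = [fu for fu in product(range(rows), range(cols))
               if all(fu in neighbours(placed[p][0], rows, cols) for p in preds)]
        # Placement: first free slot; slots repeat every ii cycles,
        # so scanning ii consecutive cycles covers every possibility.
        slot = next(((fu, t) for t in range(earliest, earliest + ii)
                     for fu in fus if (fu, t % ii) not in mrt), None)
        if slot is None:
            return None
        placed[op] = slot
        mrt.add((slot[0], slot[1] % ii))
    return placed


def find_min_ii(ops, deps, rows, cols, max_ii=16):
    """Try increasing initiation intervals until the greedy pass succeeds."""
    for ii in range(1, max_ii + 1):
        schedule = modulo_schedule(ops, deps, rows, cols, ii)
        if schedule is not None:
            return ii, schedule
    return None


# Hypothetical loop body: st(ld_a * ld_b + ...) mapped onto a 2x2 array.
OPS = ["ld_a", "ld_b", "mul", "add", "st"]
DEPS = {"mul": ["ld_a", "ld_b"], "add": ["mul"], "st": ["add"]}
RESULT = find_min_ii(OPS, DEPS, rows=2, cols=2)
```

Because the reservation table is indexed by `cycle % ii`, an operation placed at cycle `t` also claims its FU at `t + ii`, `t + 2*ii`, and so on, which is what allows successive loop iterations to overlap without resource conflicts. A realistic mapper would backtrack, route through registers and intermediate FUs, and account for differing operation latencies.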
Topic revision: r3 - 09 Apr 2011, ToanMai