Exploiting loop-level parallelism on coarse-grained reconfigurable architectures using modulo scheduling

Introduction

What?
- A modulo scheduling algorithm that exploits loop-level parallelism on coarse-grained reconfigurable architectures (CGRAs).
Why?
- CGRAs have several advantages:
  - delay, area, power & configuration time are reduced compared with FPGAs
  - predictable timing, small configuration storage space
  - can be combined with a general-purpose (GP) processor
  - partial connectivity makes them scalable but still cost- & power-efficient
- CGRAs have become important, and automatic design and compilation are essential to their success.
How?
- combine ideas from FPGA placement and routing (P&R) with modulo scheduling from VLIW compilation

The Target Architecture

A regular array of functional units (FUs) and registers
- FUs:
  - execute heterogeneous operations
  - support predicate operation
    - to be applicable to different types of loops
    - essential for removing the loop-back operation
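
To make "predicate operation" concrete, here is a minimal sketch of predicated execution in general, not of the FU design in the paper. The helper predicated_op and the abs-diff example are hypothetical names chosen for illustration. Each operation carries a predicate and commits its result only when that predicate is true, so a branch in the loop body can be replaced by straight-line operations.

```python
def predicated_op(pred, op, a, b, old):
    """Hypothetical predicated FU operation: commit op(a, b) only if pred
    is true, otherwise keep the previous value of the destination."""
    return op(a, b) if pred else old


def abs_diff_branching(a, b):
    """Reference version with control flow."""
    if a > b:
        return a - b
    return b - a


def abs_diff_predicated(a, b):
    """Same computation after if-conversion: no branch, two guarded ops."""
    p = a > b                                               # predicate-defining compare
    x = 0
    x = predicated_op(p, lambda u, v: u - v, a, b, x)       # executes under p
    x = predicated_op(not p, lambda u, v: v - u, a, b, x)   # executes under not p
    return x
```

Both guarded operations are always issued, which is what lets a loop body be laid out on the array as a fixed sequence of operations.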

Modulo Scheduling

On a CGRA, modulo scheduling is a combination of 3 sub-problems: placement, routing and scheduling
- Placement: determine which FU to put an operation on
- Scheduling: determine in which cycle to execute the operation
- Routing: connect the operations according to their data dependencies
Goal: overlap different iterations of a loop -> exploit a higher degree of instruction-level parallelism (ILP)
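
The interaction between placement and scheduling can be sketched with a modulo reservation table: with a new iteration started every II cycles (the initiation interval), an operation placed on FU f at cycle t occupies slot (f, t mod II) in every iteration. The sketch below is a simple greedy scheduler under strong assumptions, not the algorithm from the paper: all FUs are identical, every operation has unit latency, there are no dependencies between iterations, and routing is ignored. The names res_mii and modulo_schedule are hypothetical.

```python
from math import ceil


def res_mii(num_ops, num_fus):
    """Resource-constrained lower bound on the initiation interval (II)."""
    return ceil(num_ops / num_fus)


def modulo_schedule(ops, deps, num_fus, ii):
    """Greedy sketch. ops must be in topological order; deps maps an op to
    its predecessors within the same iteration (unit latency assumed).
    Returns {op: (fu, cycle)}, or None if some op finds no free slot."""
    mrt = {}      # modulo reservation table: (fu, cycle % ii) -> op
    placed = {}   # op -> (fu, cycle)
    for op in ops:
        earliest = max((placed[p][1] + 1 for p in deps.get(op, [])), default=0)
        # ii consecutive cycles cover every modulo slot exactly once
        for cycle in range(earliest, earliest + ii):
            fu = next((f for f in range(num_fus)
                       if (f, cycle % ii) not in mrt), None)
            if fu is not None:
                mrt[(fu, cycle % ii)] = op
                placed[op] = (fu, cycle)
                break
        else:
            return None   # all slots taken at this II
    return placed


ops = ["ld", "mul", "add", "st"]
deps = {"mul": ["ld"], "add": ["mul"], "st": ["add"]}
ii = res_mii(len(ops), num_fus=2)
schedule = modulo_schedule(ops, deps, num_fus=2, ii=ii)
```

For this 4-operation chain on 2 FUs the bound gives II = 2: one iteration still takes 4 cycles, but a new iteration can start every 2 cycles, so two iterations overlap in steady state. A real CGRA mapper would also have to check that a route exists between the FUs of each producer and consumer.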

Topic revision: 09 Apr 2011, ToanMai
