Exploiting loop-level parallelism on coarse-grained reconfigurable architectures using modulo scheduling
Introduction
The Target Architecture
Modulo Scheduling
Introduction
What?
A modulo scheduling algorithm that exploits loop-level parallelism for coarse-grained reconfigurable architectures (CGRAs).
Why?
CGRAs have several advantages:
delay, area, power & configuration time are reduced compared with FPGAs
predictable timing, small configuration storage space
can be combined with a general-purpose (GP) processor
partial connectivity makes them scalable but still cost- & power-efficient
CGRAs have become important, and automatic design and compilation are essential to their success.
How?
Combine ideas from FPGA placement and routing (P&R) with modulo scheduling from VLIW compilation.
The Target Architecture
a regular array of functional units (FUs) and registers
FUs:
execute heterogeneous operations
support predicated operations, so as to be applicable to different types of loops
essential to remove loop-back operation
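To illustrate what predicated operations make possible, the sketch below rewrites a loop body containing a branch into straight-line code in which an operation is guarded by a predicate. This is a generic Python model of if-conversion, not the instruction set of the architecture summarized here; the helper `predicated` and both example functions are hypothetical names introduced only for illustration.

```python
def predicated(pred, op, fallback, *args):
    """Execute op(*args) only if pred holds; otherwise pass fallback through."""
    return op(*args) if pred else fallback


def abs_sum_branching(xs):
    """Loop body with control flow: hard to map onto a fixed array of FUs."""
    total = 0
    for x in xs:
        if x < 0:
            total += -x
        else:
            total += x
    return total


def abs_sum_predicated(xs):
    """Same computation with the branch replaced by a guarded operation."""
    total = 0
    for x in xs:
        p = x < 0                              # operation that defines the predicate
        y = predicated(p, lambda v: -v, x, x)  # negation guarded by p
        total += y
    return total
```

Both functions compute the same sum, but the second has a branch-free loop body, which is the form a scheduler can lay out on FUs without handling control flow inside the loop.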
Modulo Scheduling
combination of 3 sub-problems: placement, routing and scheduling
Placement: determine which FU to put an operation on
Scheduling: determine in which cycle to execute the operation
Routing: connect the operations according to their data dependencies
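The three sub-problems can be made concrete with a small validity check. The sketch below is a simplified model under stated assumptions, not the algorithm from the paper: FUs sit on a 2D mesh, every operation has unit latency, and routing is reduced to "the consumer must be on the same FU or a direct neighbour of the producer". The names `mapping`, `deps`, `adjacent`, `is_valid` and the initiation interval `II` are all hypothetical.

```python
from itertools import combinations

II = 2  # assumed initiation interval: the schedule repeats every II cycles

# Placement and scheduling together: operation -> ((row, col) of the FU, start cycle).
mapping = {
    "load":  ((0, 0), 0),
    "mul":   ((0, 1), 1),
    "add":   ((1, 1), 2),
    "store": ((1, 0), 3),
}

# Data dependencies as (producer, consumer) pairs.
deps = [("load", "mul"), ("mul", "add"), ("add", "store")]


def adjacent(a, b):
    """True if two FUs are the same or direct mesh neighbours."""
    return abs(a[0] - b[0]) + abs(a[1] - b[1]) <= 1


def is_valid(mapping, deps, ii):
    # Resource check: an FU can host only one operation per modulo slot.
    for (_, (fu1, t1)), (_, (fu2, t2)) in combinations(mapping.items(), 2):
        if fu1 == fu2 and t1 % ii == t2 % ii:
            return False
    # Simplified routing check: the consumer starts after the producer
    # and sits on an FU the producer can reach directly.
    for prod, cons in deps:
        fu_p, t_p = mapping[prod]
        fu_c, t_c = mapping[cons]
        if t_c <= t_p or not adjacent(fu_p, fu_c):
            return False
    return True


print(is_valid(mapping, deps, II))
```

The point of the sketch is that the three decisions constrain each other: moving an operation to another FU can break a route, and moving it to another cycle can create a conflict in the same modulo slot, which is why they are solved together rather than one after another.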
Overlapping different iterations of a loop exploits a higher degree of instruction-level parallelism (ILP).
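How overlapping iterations pays off can be seen by starting a new iteration every II cycles (the initiation interval) instead of waiting for the previous one to finish. The sketch below assumes a hypothetical four-operation schedule with unit latencies; `modulo_timeline`, `op_cycles`, `II` and `N` are illustrative names, not taken from the paper.

```python
def modulo_timeline(op_cycles, ii, iterations):
    """Map each absolute cycle to the (iteration, op) pairs that start in it."""
    timeline = {}
    for it in range(iterations):
        for op, cycle in op_cycles.items():
            timeline.setdefault(it * ii + cycle, []).append((it, op))
    return timeline


# Hypothetical schedule of one iteration: operation -> cycle within the iteration.
op_cycles = {"load": 0, "mul": 1, "add": 2, "store": 3}
II = 2  # assumed initiation interval
N = 4   # number of loop iterations

timeline = modulo_timeline(op_cycles, II, N)
for cycle in sorted(timeline):
    print(cycle, timeline[cycle])

schedule_length = max(op_cycles.values()) + 1
overlapped = (N - 1) * II + schedule_length  # cycles needed with overlap
sequential = N * schedule_length             # cycles needed without overlap
print(overlapped, sequential)
```

With these assumed numbers the overlapped loop takes (N - 1) * II + schedule_length = 10 cycles against 16 when iterations run back to back, and in the steady state each cycle holds operations from two different iterations.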
Topic revision: 09 Apr 2011, ToanMai