Reducing communication overhead (1) — shared memory approach
In a computer system employing on-chip accelerator processors such as streaming co-processor or reconfigurable hardware co-processor, it is vital to minimize communication overhead lest the communication overhead cancel out any performance improvement by the co-processor accelerator.
Our first cut approach to this problem, tailored to reconfigurable hardware co-processors that we’ve been working with, is an on-chip shared memory that is little more intelligent than just a scratch-pad memory but less so than a cache. We call it Configurable Range Memory (CRM). Here is an article introducing its basic ideas.
Application-specific hardware and reconfigurable processors can dramatically speed up compute-intensive kernels of applications, offloading the burden of main processor. To minimize the communication overhead in such a coprocessor approach, the two processors can share an on-chip memory, which may be considered by each processor as a scratchpad memory. However, this setup poses a significant challenge to the main processor, which now must manage data on the scratchpad explicitly, often resulting in superfluous data copy. This paper presents an enhancement to scratchpad, called Configurable Range Memory (CRM), that can reduce the need for explicit management and thus reduce data copy and promote data reuse on the shared memory. Our experimental results using benchmarks from DSP and multimedia applications demonstrate that our CRM architecture can significantly reduce the communication overhead compared to the architecture without shared memory, while not requiring explicit data management.
Read the full text via this.