Archive

Archive for the ‘Projects’ Category

Runtime and Install-time Binary Translation for Reconfigurable Accelerators

January 28th, 2013 Comments off

In his masters thesis defense on Jan 9th, 2013, Toan presented an approach called Runtime Binary Translation (RBT) that can dynamically make use of the CGRA accelerator existing in a CGRA-based reconfigurable computing system to speedup the execution of application intermediate representation (IR) on top of a just-in-time (JIT) compiler.  A simpler approach called Install-time Binary Translation (IBT) based on RBT is also proposed. Here is the abstract of the thesis. 😀

Nowadays, softwares are often distributed in form of some machine-independent intermediate representation (IR), because compared to machine-dependent native binary, the IR is more portable across a wide range of architectures, has better security, and contains richer semantic information. However, the problem of making use of the accelerator in a target machine to speedup the execution of the IR on top of a just-in-time compiler (JIT) is challenging, mainly because the discovery of compute-intensive kernels and the partitioning of the application to the kernel and sequential parts must be done based on the IR alone, without the access to the program source code as well as the kernel information in the IR.

In this work, we propose a Runtime Binary Translation (RBT) technique that can dynamically identify and translate kernels IR to Coarse-Grained Reconfigurable Array (CGRA) accelerator configuration, and offload the execution of the kernels onto the accelerator. Also, we simplify the RBT approach to make the Install-time Binary Translation (IBT) approach, which does the partitioning and the translation right at the install-time instead of at the runtime. Experimental results show that our RBT and IBT techniques can improve the runtime of the application IR by 1.44 times and 1.61 times, respectively, compare to the runtime on the JIT that does not support making use of the accelerator.

 

Runtime Binary Translation Virtual Machine Optimized Design

Circular Stack Management for Scratchpad Memory

August 20th, 2010 Comments off
A nice Stack Management technique for a limited local memory architecture like IBM Cell, is presented at ASP-DAC and in IEEE Trans. CAD in 2009. The pointer problem was extremely difficult to handle correctly, but we managed to do it, at least from the correctness point of view.

A dynamic scratch pad memory (SPM) management scheme for program stack data with the objective of processor power  reduction is presented.  Basic  technique  does  not  need the SPM size at compile time, does not mandate any hardware changes, does not need profile information, and seamlessly integrates support for recursive functions. Stack frames are managed using a software SPM manager, integrated into the application binary, and shows average energy savings of 32% along with a performance improvement of 13%, on benchmarks from MiBench. SPM management  can  be further optimized and made pointer-safe, by knowing the SPM size.

Read the full paper: “A Software-Only Solution to Use Scratch Pads for Stack Data,” by Aviral Shrivastava, Arun Kannan, and Jongeun Lee*, published in IEEE Transactions on CAD, vol. 28, no. 11, pp. 1719-1727, November 2009.

Categories: Multicore, Publications Tags: , , , ,