Exploiting Java Through Binary Translation for Low Power Embedded Reconfigurable Systems

Introduction

  • What?
    • A binary translation algorithm that works entirely at run time to
      • detect sequences of instructions to be executed in a reconfigurable array coupled to an embedded Java processor.
      • speed up the system
      • reduce energy consumption
  • Why?
    • continuously growing demand for more {functional, portable, complex} appliances
      • -> embedded systems must have enough processing power.
    • Java
      • becoming more and more popular in embedded environments:
        • consumer electronics, industrial automation...
        • an estimated >721 million devices shipped with Java in 2005.
      • -> design goals must include a careful look at embedded Java architectures & performance vs. power tradeoffs.
  • Solutions:
    • Reconfigurable fabric:
      • Translate a sequence of operations into a combinational circuit -> speed up & reduce energy
      • However:
        • needs special tools & compilers -> precludes software portability.
      • -> coarse-grain array + dynamic binary translation (BT)

Java Architectures and The Reconfigurable Array

  • The architecture used is a Java processor with a 5-stage pipeline:
    • Instruction fetch
    • Instruction decoding
    • Operand fetch
    • Execution
    • Write back
  • Reconfigurable array:
    • tightly coupled
    • implemented as an ordinary functional unit in the execution stage.
    • divided into blocks (cells)

The BT Algorithm

  • Classifies a sequence of instructions that depend on each other as an operand block.
    • -> how to find an operand block?
      • when the stack pointer returns to the previously saved start address, an operand block is found.
  • When an operand block is found:
    • a write command is sent, saving the content of the buffer to the reconfiguration cache.
      • content of the buffer: the list of decoded instructions of the operand block.
      • the list is built in real time while the instructions are fetched from memory.
    • PC values must also be saved in a bitmap list.
      • so the detector knows which sequence of instructions corresponds to a configuration in the cache, and when to execute it.
    • the reconfiguration cache is fully associative -> future work: apply cache replacement techniques
  • Main advantage of run-time analysis:
    • if the sequence of instructions is detected again, it can be executed in the array.
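
The detection rule above can be illustrated with a minimal software sketch. It is not the hardware described in the paper: the class name OperandBlockDetector, the Instr type, and the use of a map keyed by the block's first PC (standing in for the bitmap list and the reconfiguration cache) are all assumptions made for illustration. Branches and cache replacement are not modelled.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Sketch of run-time operand-block detection on a stack machine (hypothetical names). */
final class OperandBlockDetector {

    /** Decoded instruction: address, mnemonic, and net change in stack depth. */
    static final class Instr {
        final int pc;
        final String op;
        final int stackDelta;

        Instr(int pc, String op, int stackDelta) {
            this.pc = pc;
            this.op = op;
            this.stackDelta = stackDelta;
        }
    }

    // Assumption: a map from start PC to decoded block stands in for the
    // reconfiguration cache and the PC bitmap list.
    private final Map<Integer, List<Instr>> cache = new HashMap<>();
    // Buffer of decoded instructions, filled while instructions are fetched.
    private final List<Instr> buffer = new ArrayList<>();
    private int sp = 0;       // modelled stack pointer
    private int savedSp = 0;  // stack pointer saved at the start of the current block

    /** Feeds one fetched instruction; returns true if it closes an operand block. */
    boolean fetch(Instr in) {
        if (buffer.isEmpty()) {
            savedSp = sp;     // a new block starts here
        }
        buffer.add(in);
        sp += in.stackDelta;
        if (sp != savedSp) {
            return false;     // stack pointer has not returned yet
        }
        // Stack pointer is back at the saved value: an operand block is found.
        // Models the "write command" that saves the buffer to the cache.
        cache.put(buffer.get(0).pc, new ArrayList<>(buffer));
        buffer.clear();
        return true;
    }

    /** Returns the saved block that starts at pc, or null if none was detected there. */
    List<Instr> lookup(int pc) {
        return cache.get(pc);
    }
}
```

For example, feeding iload_1 (+1), iload_2 (+1), iadd (-1) and istore_3 (-1) brings the modelled stack pointer back to its saved value on the fourth instruction, so those four instructions are stored as one block. A later lookup with the PC of iload_1 finds that block, which is the point at which the real system would execute the configuration in the array instead of the pipeline.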

Experiments

  • The tool used to provide data on energy consumption, memory usage and performance is a configurable, compiled-code, cycle-accurate simulator.
Topic revision: r5 - 30 Mar 2011, ToanMai