Archive for the ‘Research’ Category

Hardware-friendly quantization for efficient DNN accelerators

December 8th, 2021

Related Publications

Quarry: Quantization-based ADC Reduction for ReRAM-based Deep Neural Network Accelerators, Azat Azamat, Faaiz Asim and Jongeun Lee**, Proc. of International Conference on Computer-Aided Design (ICCAD), November, 2021.

Automated Log-Scale Quantization for Low-Cost Deep Neural Networks, Sangyun Oh, Hyeonuk Sim, Sugil Lee and Jongeun Lee**, Proc. of Conference on Computer Vision and Pattern Recognition (CVPR), June, 2021.

RRNet: Repetition-Reduction Network for Energy Efficient Depth Estimation, Sangyun Oh, Hye-Jin S. Kim, Jongeun Lee and Junmo Kim, IEEE Access, 8, pp. 106097-106108, IEEE, June, 2020.

Successive Log Quantization for Cost-Efficient Neural Networks Using Stochastic Computing, Sugil Lee, Hyeonuk Sim, Jooyeon Choi and Jongeun Lee**, Proc. of the 56th Annual ACM/IEEE Design Automation Conference (DAC), pp. 7:1-7:6, June, 2019.

Categories: Research

In-Memory Computing DNN Hardware Using Emerging Memory

December 8th, 2021

Related Publications

Fast and Low-Cost Mitigation of ReRAM Variability for Deep Learning Applications, Sugil Lee, Mohammed Fouda, Jongeun Lee**, Ahmed Eltawil and Fadi Kurdahi, Proc. of International Conference on Computer Design (ICCD), October, 2021.

Quarry: Quantization-based ADC Reduction for ReRAM-based Deep Neural Network Accelerators, Azat Azamat, Faaiz Asim and Jongeun Lee**, Proc. of International Conference on Computer-Aided Design (ICCAD), November, 2021.

Cost- and Dataset-free Stuck-at Fault Mitigation for ReRAM-based Deep Learning Accelerators, Giju Jung, Mohammed Fouda, Sugil Lee, Jongeun Lee**, Ahmed Eltawil and Fadi Kurdahi, Proc. of Design, Automation and Test in Europe (DATE), pp. 1733-1738, February, 2021.

IR-QNN Framework: An IR Drop-Aware Offline Training of Quantized Crossbar Arrays, Mohammed E. Fouda, Sugil Lee, Jongeun Lee, Gun Hwan Kim, Fadi Kurdahi and Ahmed Eltawil, IEEE Access, 8, pp. 228392-228408, IEEE, December, 2020.

Architecture-Accuracy Co-optimization of ReRAM-based Low-cost Neural Network Processor, Segi Lee, Sugil Lee, Jongeun Lee**, Jong-Moon Choi, Do-Wan Kwon, Seung-Kwang Hong and Kee-Won Kwon, Proc. of the 30th ACM Great Lakes Symposium on VLSI (GLSVLSI), pp. 427-432, September, 2020.

Learning to Predict IR Drop with Effective Training for ReRAM-based Neural Network Hardware, Sugil Lee, Mohammed Fouda, Jongeun Lee**, Ahmed Eltawil and Fadi Kurdahi, Proc. of the 57th Annual ACM/IEEE Design Automation Conference (DAC), pp. 1-6, July, 2020.

Categories: Research

Deep Neural Network based on Stochastic Computing

December 8th, 2021

Related Publications

Bitstream-based Neural Network for Scalable, Efficient and Accurate Deep Learning Hardware, Hyeonuk Sim and Jongeun Lee**, Frontiers in Neuroscience, 14, pp. 1198, Frontiers, December, 2020.

Cost-effective Stochastic MAC Circuits for Deep Neural Networks, Hyeonuk Sim and Jongeun Lee**, Neural Networks, 117, pp. 152-162, Elsevier, September, 2019.

Successive Log Quantization for Cost-Efficient Neural Networks Using Stochastic Computing, Sugil Lee, Hyeonuk Sim, Jooyeon Choi and Jongeun Lee**, Proc. of the 56th Annual ACM/IEEE Design Automation Conference (DAC), pp. 7:1-7:6, June, 2019.

Log-Quantized Stochastic Computing for Memory and Computation Efficient DNNs, Hyeonuk Sim and Jongeun Lee**, Proc. of the 24th Asia and South Pacific Design Automation Conference (ASP-DAC), pp. 280-285, January, 2019.

An Efficient and Accurate Stochastic Number Generator Using Even-distribution Coding, Aidyn Zhakatayev, Kyounghoon Kim, Jongeun Lee** and Kiyoung Choi, IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems (TCAD), 37(12), pp. 3056-3066, December, 2018.

DPS: Dynamic Precision Scaling for Stochastic Computing-Based Deep Neural Networks, Hyeonuk Sim, Saken Kenzhegulov and Jongeun Lee**, Proc. of the 55th Annual ACM/IEEE Design Automation Conference (DAC), pp. 13:1-13:6, June, 2018.

Sign-Magnitude SC: Getting 10X Accuracy for Free in Stochastic Computing for Deep Neural Networks, Aidyn Zhakatayev, Sugil Lee, Hyeonuk Sim and Jongeun Lee**, Proc. of the 55th Annual ACM/IEEE Design Automation Conference (DAC), pp. 158:1-158:6, June, 2018.

FPGA Implementation of Convolutional Neural Network Based on Stochastic Computing, Daewoo Kim, Mansureh S. Moghaddam, Hossein Moradian, Hyeonuk Sim, Jongeun Lee** and Kiyoung Choi, Proc. of IEEE International Conference on Field-Programmable Technology (FPT), pp. 287-290, December, 2017.

Accurate and Efficient Stochastic Computing Hardware for Convolutional Neural Networks, Joonsang Yu, Kyounghoon Kim, Jongeun Lee* and Kiyoung Choi, Proc. of IEEE International Conference on Computer Design (ICCD), pp. 105-112, November, 2017.

A New Stochastic Computing Multiplier with Application to Deep Convolutional Neural Networks, Hyeonuk Sim and Jongeun Lee**, Proc. of the 54th Annual ACM/IEEE Design Automation Conference (DAC), pp. 29:1-29:6, June, 2017.

Scalable Stochastic-Computing Accelerator for Convolutional Neural Networks, Hyeonuk Sim, Dong Nguyen, Jongeun Lee** and Kiyoung Choi, Proc. of the 22nd Asia and South Pacific Design Automation Conference (ASP-DAC), pp. 696-701, January, 2017.

Categories: Research

Architecture Optimization for AI Accelerators

December 7th, 2021

Related Publications

Specializing CGRAs for Light-Weight Convolutional Neural Networks, Jungi Lee and Jongeun Lee**, IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems (TCAD), November, 2021. (online publication)

NP-CGRA: Extending CGRAs for Efficient Processing of Light-weight Deep Neural Networks, Jungi Lee and Jongeun Lee**, Proc. of Design, Automation and Test in Europe (DATE), pp. 1408-1413, February, 2021.

Architecture-Accuracy Co-optimization of ReRAM-based Low-cost Neural Network Processor, Segi Lee, Sugil Lee, Jongeun Lee**, Jong-Moon Choi, Do-Wan Kwon, Seung-Kwang Hong and Kee-Won Kwon, Proc. of the 30th ACM Great Lakes Symposium on VLSI (GLSVLSI), pp. 427-432, September, 2020.

SparTANN: Sparse Training Accelerator for Neural Networks with Threshold-based Sparsification, Hyeonuk Sim, Jooyeon Choi and Jongeun Lee**, Proc. of ACM/IEEE International Symposium on Low Power Electronics and Design (ISLPED), pp. 211-216, August, 2020.

Double MAC on a DSP: Boosting the Performance of Convolutional Neural Networks on FPGAs, Sugil Lee, Daewoo Kim, Dong Nguyen and Jongeun Lee**, IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems (TCAD), 38(5), pp. 888-897, May, 2019.

Efficient FPGA Implementation of Local Binary Convolutional Neural Network, Aidyn Zhakatayev and Jongeun Lee**, Proc. of the 24th Asia and South Pacific Design Automation Conference (ASP-DAC), pp. 699-704, January, 2019.

XOMA: Exclusive On-Chip Memory Architecture for Energy-Efficient Deep Learning Acceleration, Hyeonuk Sim, Jason H. Anderson and Jongeun Lee**, Proc. of the 24th Asia and South Pacific Design Automation Conference (ASP-DAC), pp. 651-656, January, 2019.

FPGA Prototyping of Local Binary Convolutional Neural Network, Segi Lee, Aidyn Zhakatayev and Jongeun Lee**, Proc. of the 26th Korean Conference on Semiconductors, February, 2019.

Design Space Exploration of FPGA Accelerators for Convolutional Neural Networks, Atul Rahman, Sangyun Oh, Jongeun Lee** and Kiyoung Choi, Proc. of Design, Automation and Test in Europe (DATE), pp. 1147-1152, March, 2017.

Double MAC: Doubling the Performance of Convolutional Neural Networks on Modern FPGAs, Dong Nguyen, Daewoo Kim and Jongeun Lee**, Proc. of Design, Automation and Test in Europe (DATE), pp. 890-893, March, 2017.

Efficient FPGA Acceleration of Convolutional Neural Networks Using Logical-3D Compute Array, Atul Rahman, Jongeun Lee** and Kiyoung Choi, Proc. of Design, Automation and Test in Europe (DATE), pp. 1393-1398, March, 2016.

Categories: Research

What do we do about Deep Learning / AI?

May 5th, 2020

Prof. Jongeun Lee participates in the UNIST AI Graduate School. If you are curious about the AI Graduate School's curriculum, career paths, participating faculty, and so on, please see the UNIST AI Graduate School homepage: http://aigs.unist.ac.kr/.

Prof. Lee is an expert in design automation, and actively collaborates with the research labs of major companies such as Samsung Electronics as well as with universities in Korea and abroad.

Our main research theme is applying design automation techniques to AI hardware. Unlike most AI research, we have mainly studied algorithm-hardware co-design, which considers not only the accuracy of AI algorithms but also the characteristics of their hardware implementation.
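As one concrete instance of hardware-aware algorithm design, log-scale (power-of-two) quantization, which several of the publications above study, turns multiplications into bit shifts. The Python sketch below shows only the basic round-in-the-exponent idea, with made-up clamping bounds; it is not the exact method of any paper listed on this page.

    # Minimal sketch of log-scale (power-of-two) quantization: each weight is
    # replaced by a signed power of two, so a multiply becomes a bit shift.
    # Illustration only; exponent bounds are made-up parameters.
    import math

    def log2_quantize(w, exp_min=-8, exp_max=0):
        """Round w to the nearest signed power of two in [2^exp_min, 2^exp_max]."""
        if w == 0.0:
            return 0.0
        sign = 1.0 if w > 0 else -1.0
        exp = round(math.log2(abs(w)))          # round in the exponent (log) domain
        exp = max(exp_min, min(exp_max, exp))   # clamp to the representable range
        return sign * (2.0 ** exp)

    weights = [0.3, -0.07, 0.5, -0.9]
    print([log2_quantize(w) for w in weights])  # [0.25, -0.0625, 0.5, -1.0]

Because every quantized weight is a signed power of two, a MAC unit can use a shifter instead of a multiplier, which is what makes this style of quantization hardware-friendly.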

We have also recently developed a deep neural network processor based on stochastic computing, demonstrating world-class efficiency and accuracy.
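For context, stochastic computing encodes a number in [0, 1] as the fraction of 1s in a random bitstream, so a single AND gate multiplies two such numbers. The Python sketch below shows just this core idea; it is an illustration, not a description of our processor.

    # Minimal sketch of stochastic computing: a value in [0, 1] is encoded as
    # the fraction of 1s in a random bitstream; one AND gate then multiplies
    # two such values. Illustration only.
    import random

    def to_bitstream(p, n, rng):
        return [1 if rng.random() < p else 0 for _ in range(n)]

    def sc_multiply(p1, p2, n=4096, seed=0):
        rng = random.Random(seed)
        s1 = to_bitstream(p1, n, rng)
        s2 = to_bitstream(p2, n, rng)
        product = [a & b for a, b in zip(s1, s2)]  # AND gate acts as multiplier
        return sum(product) / n                    # decode: fraction of 1s

    print(sc_multiply(0.5, 0.8))  # close to 0.4; accuracy grows with n

Accuracy improves with stream length, and managing that cost-accuracy trade-off is exactly what the stochastic computing publications on this page address.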


Most recently, we have also been expanding into research that applies AI to design automation.

If you have questions or are interested in joining our research, please leave a note here.

Categories: Research

Heterogeneous Parallel Computing

February 10th, 2014

Heterogeneous Parallel Computing (HPC) is about utilizing a set of very different processors, such as CPUs and GPUs, both efficiently and with ease. The HPC Research Group in the ICCL Lab is actively addressing the following research questions to help realize better heterogeneous parallel computing platforms and applications.

Research Agenda

  • the communication problem
    • between the CPU and accelerators (GPU, VLIW, loop accelerators, ASIC, etc.); a first-order sketch follows this list
  • the memory organization and management problem
    • shared memory? cache? scratchpad memory?
  • domain-specific architectures
    • for example, for computer vision or machine learning
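To make the communication problem concrete, here is a back-of-the-envelope Python model of when offloading a kernel to an accelerator pays off. The link bandwidth, kernel times, and speedup below are made-up parameters for illustration, not measurements.

    # First-order model of the CPU-accelerator communication problem:
    # offloading pays off only when the data-transfer cost does not swamp
    # the accelerator's compute advantage. All numbers are made up.

    def should_offload(bytes_moved, cpu_time_s, speedup, link_gbps=16.0):
        transfer_s = bytes_moved / (link_gbps * 1e9)  # host <-> device copies
        accel_s = cpu_time_s / speedup                # kernel time on accelerator
        return transfer_s + accel_s < cpu_time_s

    # A 10x-faster accelerator still loses on a 4 GB transfer for a short kernel:
    print(should_offload(bytes_moved=4e9, cpu_time_s=0.2, speedup=10.0))  # False
    print(should_offload(bytes_moved=4e9, cpu_time_s=5.0, speedup=10.0))  # True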

Publications

  • Fast Shared On-Chip Memory Architecture for Efficient Hybrid Computing with CGRAs, Jongeun Lee, Yeonghun Jeong, and Sungsok Seo, Proc. of Design, Automation and Test in Europe (DATE ’13), March, 2013.
  • Software-Managed Automatic Data Sharing for Coarse-Grained Reconfigurable Coprocessors, Toan X. Mai and Jongeun Lee*, Proc. of International Conference on Field-Programmable Technology (FPT ’12), pp. 277-284, December, 2012.
  • CRM: Configurable Range Memory for Fast Reconfigurable Computing, Jongkyung Paek, Jongeun Lee*, and Kiyoung Choi, Proc. of Reconfigurable Architecture Workshop (RAW ’11), pp. 158-165, May, 2011.

Categories: Research

Reconfigurable Computing

December 9th, 2013

Multi-core and even many-core processors have been used successfully in many domains. Reconfigurable array processors, meanwhile, have been actively researched and used as on-chip accelerators for stream-processing applications and embedded processors, thanks to their extremely low-power yet high-performance execution compared to general-purpose processors or even DSPs (digital signal processors).

However, the main challenge with such accelerator-type reconfigurable processors is compilation: how to map applications onto the architecture. At the heart of this problem is 2D placement-and-routing, traditionally recognized as a CAD problem, which is why it is often discussed in the design automation community. Still, the problem needs more research and development effort (such as mature tool chains) before the architecture can be adopted more widely.
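To show why mapping resembles 2D place-and-route, here is a toy greedy placer in Python that assigns dataflow-graph operations to a small grid of processing elements while keeping dependent operations close in Manhattan distance. Real CGRA mappers must also schedule operations in time and route over limited wires; the op names, grid size, and cost function here are made-up illustrations, not any published algorithm.

    # Toy placement half of the CGRA mapping problem: put each dataflow-graph
    # operation on a 2D PE grid so that data dependences travel short
    # Manhattan distances. Assumes len(ops) <= rows * cols.
    from itertools import product

    def greedy_place(ops, deps, rows=2, cols=2):
        """ops: op names in topological order; deps: (producer, consumer) edges."""
        place, free = {}, set(product(range(rows), range(cols)))
        for op in ops:
            preds = [place[p] for p, c in deps if c == op and p in place]
            def cost(cell):
                return sum(abs(cell[0] - r) + abs(cell[1] - c) for r, c in preds)
            best = min(free, key=cost)  # cheapest free PE for this op
            place[op] = best
            free.remove(best)
        return place

    # a*b + c: 'mul' and 'ld_c' both feed 'add'
    print(greedy_place(["mul", "ld_c", "add"], [("mul", "add"), ("ld_c", "add")]))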

The ICCL Lab is actively pursuing research on this topic, with a few specific goals in mind. We have two funded projects in this area, partially in collaboration with other labs.

Figure 1: An accelerator-type reconfigurable processor architecture (Bougard et al. ’08).

Research Questions

  • how can we compile the usual (“legacy”) C programs onto coarse-grained reconfigurable architectures?
  • can there be good architectural solutions (such as architecture extensions) that make it much easier to map programs onto these architectures (“compiler-friendly architectures”)?
  • what are the real bottlenecks to enhancing performance through these processors, and how can we address them?
    • the application-level mapping problem

Publications

  • Compiling Control-Intensive Loops for CGRAs with State-Based Full Predication, Kyuseung Han, Kiyoung Choi, and Jongeun Lee, Proc. of Design, Automation and Test in Europe (DATE ’13), March, 2013.
  • Architecture Customization of On-Chip Reconfigurable Accelerators, Jonghee W. Yoon, Jongeun Lee*, Sanghyun Park, Yongjoo Kim, Jinyong Lee, Yunheung Paek, and Doosan Cho, ACM Transactions on Design Automation of Electronic Systems (TODAES), 18(4), pp. 52:1-52:22, ACM, October, 2013.
  • Improving Performance of Nested Loops on Reconfigurable Array Processors, Yongjoo Kim, Jongeun Lee*, Toan X. Mai, and Yunheung Paek, ACM Transactions on Architecture and Code Optimization (TACO), 8(4), pp. 32:1-32:23, ACM, January, 2012.
  • Exploiting Both Pipelining and Data Parallelism with SIMD Reconfigurable Architecture, Yongjoo Kim, Jongeun Lee*, Jinyong Lee, Toan X. Mai, Ingoo Heo, and Yunheung Paek, Proc. of International Symposium on Applied Reconfigurable Computing (ARC ’12), Lecture Notes in Computer Science, vol. 7199, pp. 40-52, March, 2012.
  • High Throughput Data Mapping for Coarse-Grained Reconfigurable Architectures, Yongjoo Kim, Jongeun Lee*, Aviral Shrivastava, Jonghee W. Yoon, Doosan Cho, and Yunheung Paek, IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems (TCAD), 30(11), pp. 1599-1609, IEEE, November, 2011.
  • Memory Access Optimization in Compilation for Coarse Grained Reconfigurable Architectures, Yongjoo Kim, Jongeun Lee*, Aviral Shrivastava, and Yunheung Paek, ACM Transactions on Design Automation of Electronic Systems (TODAES), 16(4), pp. 42:1-42:27, ACM, October, 2011.

Categories: Research

Multi-core Computing

February 7th, 2011

Parallel processing, or multi-core computing, is more a challenge than a blessing, since it means the era of the free ride is over. When hardware performance kept doubling every 18 months, the same software would run twice as fast simply by running on new hardware. That is no longer true, unless the software can somehow exploit the increased parallelism of the new machine.

There are many challenges, including how to compile applications, in particular how to map code and data onto various heterogeneous as well as homogeneous processor cores, and how to manage them efficiently. The management must cover not only performance optimization but also thermal management, energy and power optimization (such as dynamic voltage and frequency scaling), and reliability (such as soft-error resilience). These multi-dimensional, multi-objective problems require innovative ideas and approaches at the architecture, compiler, computer-aided design, operating system, and algorithm levels.

In the ICCL Lab, we are particularly looking at data management problems for distributed memory architectures such as the Cell processor, where simple scratchpad memories are used instead of power-hungry caches to save power.
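Below is a minimal sketch, in Python, of what this style of software-managed data movement looks like: the program explicitly stages data through a small buffer that stands in for the scratchpad, and the explicit copies play the role of DMA transfers. The capacity and workload are made-up values for illustration, not a model of the actual Cell SPE.

    # Software-managed scratchpad sketch: compute proceeds tile by tile,
    # with explicit copies standing in for DMA between main memory and a
    # small on-chip scratchpad (SPM). Illustration only.
    SPM_WORDS = 4  # pretend scratchpad capacity, in array elements

    def process(data, f):
        out = []
        for base in range(0, len(data), SPM_WORDS):
            tile = data[base:base + SPM_WORDS]  # "DMA in": memory -> SPM
            tile = [f(x) for x in tile]         # compute entirely out of the SPM
            out.extend(tile)                    # "DMA out": SPM -> memory
        return out

    print(process(list(range(10)), lambda x: 2 * x))  # [0, 2, 4, ..., 18]

A real implementation would allocate two tiles and double-buffer, so that the DMA transfer for the next tile overlaps with computation on the current one.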

Figure 1: Sony/Toshiba/IBM Cell.

Publications

  • Software-based Register File Vulnerability Reduction for Embedded Processors, Jongeun Lee and Aviral Shrivastava, ACM Transactions on Embedded Computing Systems (TECS), 13(1s), pp. 38:1-38:20, ACM, November, 2013.
  • Return Data Interleaving for Multi-Channel Embedded CMPs Systems, Fei Hong, Aviral Shrivastava, and Jongeun Lee, IEEE Transactions on Very Large Scale Integration Systems (TVLSI), 20(7), pp. 1351-1354, IEEE, July, 2012.
  • Static Analysis of Register File Vulnerability, Jongeun Lee and Aviral Shrivastava, IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems (TCAD), 30(4), pp. 607-616, IEEE, April, 2011.
  • A Compiler-Microarchitecture Hybrid Approach to Soft Error Reduction for Register Files, Jongeun Lee and Aviral Shrivastava, IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems (TCAD), 29(7), pp. 1018-1027, IEEE Press, July, 2010.
  • Cache Vulnerability Equations for Protecting Data in Embedded Processor Caches from Soft Errors, Aviral Shrivastava, Jongeun Lee*, and Reiley Jeyapaul, ACM SIGPLAN Notices (LCTES ’10), 45(4), pp. 143-152, April, 2010.
  • A Software-Only Solution to Use Scratch Pads for Stack Data, Aviral Shrivastava, Arun Kannan, and Jongeun Lee*, IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems (TCAD), 28(11), pp. 1719-1727, November, 2009.

Categories: Research