EEE326 Projects

Table of Contents

Fall 2024
Instructor: Jongeun Lee
TAs (officially): Minsang Yu, Tairali Assylbekov


Note 1: See Useful Links below for links to tools and documentation. See Final Project.
Note 2: The Labs description below is based on a previous offering of the same course, and may change during the semester.

1. Lab 1: Learning Pynq Z2 board using FFT example <2024-09-13 Fri>

1.1. Step 1

  • Install Vivado
    • Remember to add pinmap file

1.2. Step 2: generate the bitstream file for FFT design

  • Open the top-level design, run synthesis and implementation, and generate the bitstream file

1.3. Step 3

  • Upload the bitstream file to the Pynq-Z2 via web browser

1.4. Step 4 (jupyter)

  • Run the FFT on the Pynq FPGA, and compare the runtime of SW numpy vs. Pynq FPGA acceleration

1.5. Report

  • Draw a graph showing runtime vs. array size for (i) SW numpy and (ii) Pynq FPGA. You may plot both graphs in a single figure.
  • Verify the functionality of the FFT by feeding a sine function as the input, and check that the output is as expected.
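The sine-input check above can be prototyped off-board first. Below is a minimal sketch in plain Python, where a naive DFT stands in for both numpy's FFT and the FPGA implementation; the array size N and the bin k0 are arbitrary illustrative choices:

```python
import math
import cmath

def dft(x):
    """Naive O(N^2) DFT, a stand-in for the FFT under test."""
    N = len(x)
    return [sum(x[n] * cmath.exp(-2j * cmath.pi * k * n / N) for n in range(N))
            for k in range(N)]

# A pure sine at bin k0 should put all its energy at bins k0 and N - k0.
N, k0 = 64, 5
x = [math.sin(2 * math.pi * k0 * n / N) for n in range(N)]
mags = [abs(v) for v in dft(x)]
peaks = sorted(sorted(range(N), key=lambda k: -mags[k])[:2])
print(peaks)  # [5, 59]
```

On the board, the same input vector can be fed to both numpy and the FFT overlay, and the two magnitude spectra compared bin by bin.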

1.5.1. How to submit a report

  • Submit one report per group via Bb
  • Specify the contribution of each person in the report

2. Lab 2: Vivado HLS - Scalar Add <2024-09-20 Fri>

2.1. Dependencies

  • Make sure you have installed Vivado HLS

2.2. Procedure

  1. Write a module in Vivado HLS
  2. Write test software
  3. Run C-simulation
  4. Verify the result of C-simulation
  5. Run C-synthesis
  6. Understand the result of C-synthesis
  7. Run Cosim
  8. Export the design as an IP
  9. Import the IP in the Vivado design environment
  10. Generate a bitstream
  11. Create an overlay, and make a driver in Python

2.3. What to submit

  1. (a) A compressed zip file of the generated IP files, (b) a bitstream file, (c) a Python Jupyter Notebook file;
  2. A report, which should include a top-level diagram and the results of C-simulation, C-synthesis, and Cosimulation.

3. Lab 3: Vivado HLS - Vector Add <2024-09-27 Fri>

3.1. Description

  • This week's lab is similar to the previous week's. The main difference is that we use vectors for input and output.

3.2. Dependencies

  • Vivado HLS
  • Vivado
  • Pynq board

3.3. Procedure

3.3.1. Part 1

  1. Create a new Vivado HLS project
  2. Write vadd module and testbench (vadd.c, vadd.h, tb_vadd.cpp)
  3. Run C-simulation
  4. Run C-synthesis
  5. Observe the error message
  6. Fix the error by adding INTERFACE pragma (run C-synthesis again)
  7. Observe the synthesis result, esp. latency
  8. Add the UNROLL pragma, and run C-synthesis again
  9. Observe how latency is changed
  10. Think about how to reduce latency
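For steps 8-10, a toy latency model helps build intuition about what UNROLL does. The formula and numbers below are illustrative assumptions (II=1 pipeline, one add per unrolled copy), not C-synthesis results; the actual figures come from the synthesis report:

```python
# Back-of-the-envelope loop latency model (assumed, not from synthesis):
# an II=1 pipelined loop of `iterations` iterations and pipeline depth
# `depth` takes roughly (iterations - 1) * II + depth cycles, and
# unrolling by U divides the trip count by U.
def loop_latency(trip_count, unroll=1, ii=1, depth=4):
    iterations = -(-trip_count // unroll)   # ceil division
    return (iterations - 1) * ii + depth

base = loop_latency(1024)               # no unroll
unrolled = loop_latency(1024, unroll=4) # UNROLL factor 4
print(base, unrolled)                   # 1027 259
```

The model suggests latency drops roughly in proportion to the unroll factor, until memory ports become the bottleneck; that is exactly what step 10 asks you to think about.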

3.3.2. Part 2

  1. Make sure that your vadd module has an interrupt port

    #pragma HLS INTERFACE s_axilite port=return bundle=control
    
  2. Create a new Vivado project
  3. Instantiate the vadd module; observe that it has interrupt output port
  4. Instantiate AXI Interconnects (and other necessary modules) and complete the top-level design
  5. Make sure you configure IPs as necessary (AXI Interrupt Controller, AXI Interconnects, Processing System, etc.)
  6. Validate the design, create HDL Wrapper, and generate bitstream

3.3.3. Part 3

  1. Open Jupyter notebook
  2. Load overlay and write Python code using the overlay (without using interrupt)
  3. See if the result is correct (will it be correct even if computation takes much longer?)
  4. Write Python code using interrupt as demonstrated in the lab
  5. Check if the result is correct even when computation takes long
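The difference between steps 2 and 4 above is busy-waiting vs. sleeping until the interrupt fires. The sketch below shows the interrupt-style control flow using a stdlib asyncio.Event as a stand-in for the board's interrupt line (pynq exposes the real interrupt as an awaitable; the delay value here is arbitrary):

```python
import asyncio

async def kernel(done, delay):
    """Models a long-running vadd in hardware; setting the event models
    the interrupt firing when ap_done is reached."""
    await asyncio.sleep(delay)
    done.set()

async def run():
    done = asyncio.Event()
    asyncio.create_task(kernel(done, 0.05))
    await done.wait()        # sleep until the "interrupt", instead of polling
    return "result ready"

status = asyncio.run(run())
print(status)
```

With polling, a result read too early may still look correct for short computations but will be wrong once the kernel takes longer than the software round-trip; the interrupt version stays correct regardless of how long the computation takes.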

3.4. Note

  • The previous lab was reviewed.

3.5. Assignment

  • Do the SAXPY example. Use the following function prototype (You may hard-code SIZE)

    void saxpy_hw(float *x, float *y, float a);     // y[i] = y[i] + a * x[i]
    
  • Explore various methods to improve latency, and explain your findings in the report
  • Submit the report and other files as usual
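A software golden model of the required prototype is useful both in the testbench and for checking the board result. A minimal Python sketch (SIZE hard-coded, as the assignment allows; the test vectors are arbitrary):

```python
SIZE = 8

def saxpy_sw(x, y, a):
    """Golden model: y[i] = y[i] + a * x[i], matching the prototype's semantics."""
    return [yi + a * xi for xi, yi in zip(x, y)]

x = [float(i) for i in range(SIZE)]
y = [1.0] * SIZE
out = saxpy_sw(x, y, 2.0)
print(out)  # [1.0, 3.0, 5.0, 7.0, 9.0, 11.0, 13.0, 15.0]
```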

4. Lab 4: Vivado HLS - Vector Add with DMA <2024-10-11 Fri>

4.1. Description

  • Improve the performance of vadd using DMA
  • Learn to use the hls::stream<> class

4.3. Procedure

  1. Study the vadd example in Ref. 1 above (see also Ref. 2 for HLS stream library). Note: This code may not run on the Pynq board.
  2. To make it run on Pynq, we mainly define and use push_stream and pop_stream functions, along with some minor modifications to the code.
  3. Refer to Ref. 3 above for an example of how to use the ap_axiu type, which is defined in "ap_axi_sdata.h".
  4. The rest of the flow is the same as in the previous lab.
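The push_stream/pop_stream flow in step 2 can be modelled behaviourally before writing any HLS code. In the sketch below a deque stands in for hls::stream<>; the real functions additionally manage the ap_axiu side channels (TLAST, TKEEP, etc.), which this model omits:

```python
from collections import deque

def push_stream(stream, value):
    """Model of pushing one word into an AXI stream (side channels omitted)."""
    stream.append(value)

def pop_stream(stream):
    """Model of popping one word from an AXI stream."""
    return stream.popleft()

def vadd_dma(in_a, in_b, out, n):
    """vadd consuming two input streams and producing one output stream."""
    for _ in range(n):
        push_stream(out, pop_stream(in_a) + pop_stream(in_b))

a, b, c = deque([1, 2, 3]), deque([10, 20, 30]), deque()
vadd_dma(a, b, c, 3)
print(list(c))  # [11, 22, 33]
```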

4.4. Assignment

  • Implement the following SAXPY kernel in hardware using DMA as we learned in the lab.

    void saxpy_hw(float *z, float *x, float *y, float a);     // z[i] = a * x[i] + y[i]
    
    1. Note the order of the parameters.
    2. Follow the advice given by Jaewoo (TA) about performing buffer allocation in the constructor of a Python class.
  • Submit source code (including testbench and header files if any and Python/Jupyter files) as well as your implementation files (IP, bitstream).
  • Submit also a report, which should include the top-level diagram and the results of C-simulation, C-synthesis, and Cosimulation.
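The buffer-allocation advice in the assignment can be sketched as follows. The class below shows the shape of the driver only: on the board the buffers would come from pynq.allocate and the transfers would go through the AXI DMA; here plain lists stand in, and the "hardware" computation is done in software so the sketch runs anywhere:

```python
class SaxpyDriver:
    """Driver sketch: DMA buffers are allocated once, in the constructor,
    rather than on every call (the point of the TA's advice)."""
    def __init__(self, size):
        self.size = size
        # On the board: self.in_buf = allocate(shape=(2*size,), dtype='float32'), etc.
        self.in_buf = [0.0] * (2 * size)
        self.out_buf = [0.0] * size

    def __call__(self, x, y, a):
        self.in_buf[:self.size] = x        # stage operands into the reused buffer
        self.in_buf[self.size:] = y
        # A DMA send/receive would happen here; compute in software instead.
        for i in range(self.size):
            self.out_buf[i] = a * self.in_buf[i] + self.in_buf[self.size + i]
        return list(self.out_buf)          # z[i] = a * x[i] + y[i]

z = SaxpyDriver(4)([1.0, 2.0, 3.0, 4.0], [10.0] * 4, 2.0)
print(z)  # [12.0, 14.0, 16.0, 18.0]
```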

5. Lab 5: Vivado HLS - Matrix Multiplication <2024-10-18 Fri> (take-home lab)

5.1. Description

  • Write a basic matrix multiplication kernel and verify its functionality on a Pynq board using the Python layer
  • The use of DMA as in the previous lab is required.
  • Optimization of the kernel (such as loop unrolling and the use of buffers) is NOT required.

5.2. Assignment

  • Implement a matrix multiplication kernel in hardware using Vivado HLS

    void matmul_hw(float *C, float *A, float *B);    // C = A * B
    
    • The sizes of the matrices are fixed as follows (you may hard-code them): float C[64][64], A[64][128], B[128][64] (revised from float C[128][128], A[128][256], B[256][128])
    • Note: the top-level function of the HLS module should have a slightly different prototype
  • Use DMA as in the previous lab. Use the DMA function provided: dma_template.h.
    • The use of different template parameters in ap_axiu<32,4,5,5> is to avoid compatibility issues.
    • Since we're using DMA, the interface of the top-level function is very similar to that of the previous lab, but internally you'll have to change how you handle the values you receive via AXIS. Specifically, you won't be able to use hls::stream<> as in the previous lab; instead, store the values in a local memory, due to the complex access pattern required in the compute loop.
  • As usual, submit your source code as well as your implementation files; Submit also a report, which should include the top-level diagram and the results of C-simulation, C-synthesis, and Cosimulation.
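The buffer-then-compute structure described above can be modelled behaviourally. In the sketch below the streamed words are first drained into local arrays (BRAM on the FPGA), and only then does the triple loop run; a deque stands in for the AXI stream, and the tiny sizes are illustrative (hard-code the lab's real sizes):

```python
from collections import deque

M, K, N = 2, 3, 2   # illustrative; the lab fixes these sizes

def matmul_hw_model(stream_in, stream_out):
    """Model of the kernel: buffer A and B locally, then compute C = A * B."""
    # Drain the stream into local memories first (complex access pattern
    # in the compute loop rules out on-the-fly consumption as in vadd).
    A = [[stream_in.popleft() for _ in range(K)] for _ in range(M)]
    B = [[stream_in.popleft() for _ in range(N)] for _ in range(K)]
    for i in range(M):
        for j in range(N):
            acc = 0.0
            for k in range(K):
                acc += A[i][k] * B[k][j]
            stream_out.append(acc)

s_in = deque([1, 2, 3, 4, 5, 6,    # A, row-major 2x3
              1, 0, 0, 1, 1, 1])   # B, row-major 3x2
s_out = deque()
matmul_hw_model(s_in, s_out)
print(list(s_out))  # [4.0, 5.0, 10.0, 11.0]
```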

6. Lab 6: Customizing a Python-based Neural Network Framework <2024-11-01 Fri>

6.1. Description

Learn how to use a Python (PyTorch)-based neural network framework and how to customize it so (some) inference computation can be offloaded to hardware.

6.2. Procedure

  1. MNIST neural network inference using PyTorch. (mnist.py)
    • Training included
  2. Consider how to use HW MatMul for the MNIST neural network inference.
  3. Write Python-based MNIST inference code. (numpymnist.py) (numpy_layer.py)
  4. Override Linear layer with a function that calls MatMul HW.
  5. In order to use integer MatMul HW, we have to change the type of input operands to MatMul. Modify the Python code accordingly (recall the quantization method discussed in lecture).
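Step 5 can be sketched with a simple symmetric quantization scheme. The details below are assumptions for illustration (per-tensor scales, signed range ±(2^(N-1) − 1)); the exact scheme should follow the quantization method discussed in lecture:

```python
def quantize(t, n_bits):
    """Symmetric per-tensor quantization to signed n_bits integers."""
    qmax = 2 ** (n_bits - 1) - 1
    scale = max(abs(v) for v in t) / qmax or 1.0   # guard against all-zero input
    return [round(v / scale) for v in t], scale

def int_dot(qx, qw, sx, sw):
    """Integer dot product (what the MatMul HW computes), then dequantize."""
    return sum(a * b for a, b in zip(qx, qw)) * sx * sw

x = [0.5, -1.0, 0.25]
w = [1.0, 2.0, -0.5]
qx, sx = quantize(x, 8)
qw, sw = quantize(w, 8)
approx = int_dot(qx, qw, sx, sw)
exact = sum(a * b for a, b in zip(x, w))
print(approx, exact)
```

Sweeping n_bits downward and re-measuring classification accuracy is one way to find the minimal "N" asked for in the assignment.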

6.3. Assignment

  • Port the MNIST inference code to the Pynq board, by offloading the second layer to the hardware. For this, you'll need to generate N-bit integer MatMul hardware. (ARM-Python is already included in Pynq.)
  • Try to minimize the parameter "N", which is the precision of activation and weight tensors.
  • As usual, submit your source code as well as your implementation files. Submit also a report as usual (see the next item for the contents of the report).
  • In the next lab, each team will give a presentation on their work, which should include the Pynq board execution result, classification accuracy, total execution time, and the value of the parameter "N".

7. Lab 7: Presentation + Optimization Methods <2024-11-08 Fri>

7.1. Presentation of Lab 6 Results

Integrating the Pynq board and the hardware accelerator running on it with the application software is an important milestone. Hence we'll have a presentation session in the first half of the lab. Please prepare to present your Lab 6 results; ideally, everyone presents.

7.2. Optimization Methods Description

(i) Learn to use array partition and array reshape via a simple matrix multiplication example. (ii) Support arbitrarily large matrices.

7.3. Procedure

  1. Observe the inefficiency in a simple matrix multiplication example
  2. Apply array partition and observe the difference in synthesis results
  3. Consider how to support matrix multiplications with arbitrarily large size matrices
  4. Implement your idea and verify the results
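One way to approach steps 3-4 is tiling: the fixed-size hardware kernel handles T×T tiles, and the host loops over tiles, accumulating partial products. The sketch below shows this scheme in Python; tile_matmul stands in for the hardware kernel, and T and the test sizes are illustrative:

```python
T = 2  # hardware tile size (stands in for the kernel's hard-coded dimension)

def tile_matmul(A, B):
    """Fixed-size T x T matmul: the part that would run on the FPGA."""
    return [[sum(A[i][k] * B[k][j] for k in range(T)) for j in range(T)]
            for i in range(T)]

def big_matmul(A, B, n):
    """n x n matmul built from T x T tiles (assumes n is a multiple of T)."""
    C = [[0] * n for _ in range(n)]
    for i0 in range(0, n, T):
        for j0 in range(0, n, T):
            for k0 in range(0, n, T):
                At = [row[k0:k0 + T] for row in A[i0:i0 + T]]
                Bt = [row[j0:j0 + T] for row in B[k0:k0 + T]]
                Ct = tile_matmul(At, Bt)        # offload one tile product
                for i in range(T):
                    for j in range(T):
                        C[i0 + i][j0 + j] += Ct[i][j]   # accumulate on the host
    return C

n = 4
A = [[i * n + j for j in range(n)] for i in range(n)]
I = [[1 if i == j else 0 for j in range(n)] for i in range(n)]
C = big_matmul(A, I, n)   # multiplying by the identity should return A
```

Accumulation could also be moved into the kernel's local memory to cut DMA traffic; which side accumulates is one of the design choices step 4 asks you to explore.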

7.4. Assignment

  • Final project (due: TBD)

8. Final Project (TBD)

9. Useful Links

9.1. Installation Procedure (maybe not up-to-date)

Here's the installation procedure.

  1. System requirements: Windows 10 or Ubuntu 18.04 only. (WARNING: Ubuntu 18.04.05 may NOT work.) Note: large HDD space is required (the installation file alone is >10GB).
  2. Visit https://www.xilinx.com/support/download/index.html/content/xilinx/en/downloadNav/vivado-design-tools/archive.html and download the installation file
  3. Apply patch: https://support.xilinx.com/s/article/76960. Otherwise, it will cause problems with Vivado HLS.
  4. Apply PYNQ-Z2 board file: pynq-z2.zip. Unzip the file at your_vivado_path/Vivado/2020.1/data/boards/board_files. After unzipping, you should have data/boards/board_files/pynq-z2/A.0

It literally takes hours to complete the installation, so start installing Vivado today!

Post your questions on BlackBoard if you have any!

Author: Jongeun Lee

Date: 2024-09-20 Fri 17:38

Homepage: ICCL.UNIST.AC.KR