EEE326 Projects

Table of Contents

Fall 2024
Instructor: Jongeun Lee
TAs (officially): Minsang Yu, Tairali Assylbekov


Note 1: See Useful Links below for links to tools and documentation. See Final Project.
Note 2: The Labs description below is based on a previous offering of the same course, and may change during the semester.

1. Lab 1: Learning Pynq Z2 board using FFT example <2024-09-13 Fri>

1.1. Step 1

  • Install Vivado
    • Remember to add pinmap file

1.2. Step 2: generate the bitstream file for FFT design

  • Open the top-level design, run synthesis and implementation, and generate the bitstream file

1.3. Step 3

  • Upload the bitstream file to the Pynq-Z2 via web browser

1.4. Step 4 (jupyter)

  • Run the FFT on the Pynq FPGA, and compare the runtime of SW numpy vs. Pynq FPGA acceleration

1.5. Report

  • Draw a graph showing runtime vs. array size for (i) SW numpy and (ii) Pynq FPGA. You may plot both graphs in a single figure.
  • Verify the functionality of the FFT by feeding a sine function as the input, and check that the output is as expected.
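The sine-input check above can be prototyped off-board first. Below is a minimal sketch in plain Python, where a naive DFT stands in for both numpy's FFT and the FPGA implementation; the array size N and the bin k0 are arbitrary illustrative choices:

```python
import math
import cmath

def dft(x):
    """Naive O(N^2) DFT, a stand-in for the FFT under test."""
    N = len(x)
    return [sum(x[n] * cmath.exp(-2j * cmath.pi * k * n / N) for n in range(N))
            for k in range(N)]

# A pure sine at bin k0 should put all its energy at bins k0 and N - k0.
N, k0 = 64, 5
x = [math.sin(2 * math.pi * k0 * n / N) for n in range(N)]
mags = [abs(v) for v in dft(x)]
peaks = sorted(sorted(range(N), key=lambda k: -mags[k])[:2])
print(peaks)  # [5, 59]
```

On the board, the same input vector can be fed to both numpy and the FFT overlay, and the two magnitude spectra compared bin by bin.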

1.5.1. How to submit a report

  • Submit one report per group via Bb
  • Specify the contribution of each person in the report

2. Lab 2: Vivado HLS - Scalar Add <2024-09-20 Fri>

2.1. Dependencies

  • Make sure you have installed Vivado HLS

2.2. Procedure

  1. Write a module in Vivado HLS
  2. Write test software
  3. Run C-simulation
  4. Verify the result of C-simulation
  5. Run C-synthesis
  6. Understand the result of C-synthesis
  7. Run Cosim
  8. Export the design as an IP
  9. Import the IP in the Vivado design environment
  10. Generate a bitstream
  11. Create an overlay, and make a driver in Python

2.3. What to submit

  1. (a) A compressed zip file of the generated IP files, (b) a bitstream file, (c) a Python Jupyter Notebook file;
  2. A report, which should include a top-level diagram and the results of C-simulation, C-synthesis, and Cosimulation.

3. Lab 3: Vivado HLS - Vector Add <2024-09-27 Fri>

3.1. Description

  • This week's lab is similar to the previous week's. The main difference is that we use vectors for input and output.

3.2. Dependencies

  • Vivado HLS
  • Vivado
  • Pynq board

3.3. Procedure

3.3.1. Part 1

  1. Create a new Vivado HLS project
  2. Write vadd module and testbench (vadd.c, vadd.h, tb_vadd.cpp)
  3. Run C-simulation
  4. Run C-synthesis
  5. Observe the error message
  6. Fix the error by adding INTERFACE pragma (run C-synthesis again)
  7. Observe the synthesis result, esp. latency
  8. Add the UNROLL pragma, and run C-synthesis again
  9. Observe how latency is changed
  10. Think about how to reduce latency
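For steps 8-10, a toy latency model helps build intuition about what UNROLL does. The formula and numbers below are illustrative assumptions (II=1 pipeline, one add per unrolled copy), not C-synthesis results; the actual figures come from the synthesis report:

```python
# Back-of-the-envelope loop latency model (assumed, not from synthesis):
# an II=1 pipelined loop of `iterations` iterations and pipeline depth
# `depth` takes roughly (iterations - 1) * II + depth cycles, and
# unrolling by U divides the trip count by U.
def loop_latency(trip_count, unroll=1, ii=1, depth=4):
    iterations = -(-trip_count // unroll)   # ceil division
    return (iterations - 1) * ii + depth

base = loop_latency(1024)               # no unroll
unrolled = loop_latency(1024, unroll=4) # UNROLL factor 4
print(base, unrolled)                   # 1027 259
```

The model suggests latency drops roughly in proportion to the unroll factor, until memory ports become the bottleneck; that is exactly what step 10 asks you to think about.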

3.3.2. Part 2

  1. Make sure that your vadd module has an interrupt port

    #pragma HLS INTERFACE s_axilite port=return bundle=control
    
  2. Create a new Vivado project
  3. Instantiate the vadd module; observe that it has interrupt output port
  4. Instantiate AXI Interconnects (and other necessary modules) and complete the top-level design
  5. Make sure you configure IPs as necessary (AXI Interrupt Controller, AXI Interconnects, Processing System, etc.)
  6. Validate the design, create HDL Wrapper, and generate bitstream

3.3.3. Part 3

  1. Open Jupyter notebook
  2. Load overlay and write Python code using the overlay (without using interrupt)
  3. See if the result is correct (will it be correct even if computation takes much longer?)
  4. Write Python code using interrupt as demonstrated in the lab
  5. Check if the result is correct even when computation takes long
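The difference between steps 2 and 4 above is busy-waiting vs. sleeping until the interrupt fires. The sketch below shows the interrupt-style control flow using a stdlib asyncio.Event as a stand-in for the board's interrupt line (pynq exposes the real interrupt as an awaitable; the delay value here is arbitrary):

```python
import asyncio

async def kernel(done, delay):
    """Models a long-running vadd in hardware; setting the event models
    the interrupt firing when ap_done is reached."""
    await asyncio.sleep(delay)
    done.set()

async def run():
    done = asyncio.Event()
    asyncio.create_task(kernel(done, 0.05))
    await done.wait()        # sleep until the "interrupt", instead of polling
    return "result ready"

status = asyncio.run(run())
print(status)
```

With polling, a result read too early may still look correct for short computations but will be wrong once the kernel takes longer than the software round-trip; the interrupt version stays correct regardless of how long the computation takes.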

3.4. Note

  • The previous lab was reviewed.

3.5. Assignment

  • Do the SAXPY example. Use the following function prototype (You may hard-code SIZE)

    void saxpy_hw(float *x, float *y, float a);     // y[i] = y[i] + a * x[i]
    
  • Explore various methods to improve latency, and explain your findings in the report
  • Submit the report and other files as usual
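A software golden model of the required prototype is useful both in the testbench and for checking the board result. A minimal Python sketch (SIZE hard-coded, as the assignment allows; the test vectors are arbitrary):

```python
SIZE = 8

def saxpy_sw(x, y, a):
    """Golden model: y[i] = y[i] + a * x[i], matching the prototype's semantics."""
    return [yi + a * xi for xi, yi in zip(x, y)]

x = [float(i) for i in range(SIZE)]
y = [1.0] * SIZE
out = saxpy_sw(x, y, 2.0)
print(out)  # [1.0, 3.0, 5.0, 7.0, 9.0, 11.0, 13.0, 15.0]
```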

4. Lab 4: Vivado HLS - Vector Add with DMA <2024-10-11 Fri>

4.1. Description

  • Improve the performance of vadd using DMA
  • Learn to use the hls::stream<> class

4.3. Procedure

  1. Study the vadd example in Ref. 1 above (see also Ref. 2 for HLS stream library). Note: This code may not run on the Pynq board.
  2. To make it run on Pynq, we mainly define and use push_stream and pop_stream functions, along with some minor modifications to the code.
  3. Refer to Ref. 3 above for an example of how to use the ap_axiu type, which is defined in "ap_axi_sdata.h".
  4. The rest of the flow is the same as in the previous lab.
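The push_stream/pop_stream flow in step 2 can be modelled behaviourally before writing any HLS code. In the sketch below a deque stands in for hls::stream<>; the real functions additionally manage the ap_axiu side channels (TLAST, TKEEP, etc.), which this model omits:

```python
from collections import deque

def push_stream(stream, value):
    """Model of pushing one word into an AXI stream (side channels omitted)."""
    stream.append(value)

def pop_stream(stream):
    """Model of popping one word from an AXI stream."""
    return stream.popleft()

def vadd_dma(in_a, in_b, out, n):
    """vadd consuming two input streams and producing one output stream."""
    for _ in range(n):
        push_stream(out, pop_stream(in_a) + pop_stream(in_b))

a, b, c = deque([1, 2, 3]), deque([10, 20, 30]), deque()
vadd_dma(a, b, c, 3)
print(list(c))  # [11, 22, 33]
```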

4.4. Assignment

  • Implement the following SAXPY kernel in hardware using DMA as we learned in the lab.

    void saxpy_hw(float *z, float *x, float *y, float a);     // z[i] = a * x[i] + y[i]
    
    1. Note the order of the parameters.
    2. Follow the advice given by Jaewoo (TA) about performing buffer allocation in the constructor of a Python class.
  • Submit source code (including testbench and header files if any and Python/Jupyter files) as well as your implementation files (IP, bitstream).
  • Submit also a report, which should include the top-level diagram and the results of C-simulation, C-synthesis, and Cosimulation.
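The buffer-allocation advice in the assignment can be sketched as follows. The class below shows the shape of the driver only: on the board the buffers would come from pynq.allocate and the transfers would go through the AXI DMA; here plain lists stand in, and the "hardware" computation is done in software so the sketch runs anywhere:

```python
class SaxpyDriver:
    """Driver sketch: DMA buffers are allocated once, in the constructor,
    rather than on every call (the point of the TA's advice)."""
    def __init__(self, size):
        self.size = size
        # On the board: self.in_buf = allocate(shape=(2*size,), dtype='float32'), etc.
        self.in_buf = [0.0] * (2 * size)
        self.out_buf = [0.0] * size

    def __call__(self, x, y, a):
        self.in_buf[:self.size] = x        # stage operands into the reused buffer
        self.in_buf[self.size:] = y
        # A DMA send/receive would happen here; compute in software instead.
        for i in range(self.size):
            self.out_buf[i] = a * self.in_buf[i] + self.in_buf[self.size + i]
        return list(self.out_buf)          # z[i] = a * x[i] + y[i]

z = SaxpyDriver(4)([1.0, 2.0, 3.0, 4.0], [10.0] * 4, 2.0)
print(z)  # [12.0, 14.0, 16.0, 18.0]
```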

5. Lab 5: Vivado HLS - Matrix Multiplication <2024-10-18 Fri> (take-home lab)

5.1. Description

  • Write a basic matrix multiplication kernel and verify its functionality on a Pynq board using the Python layer
  • The use of DMA as in the previous lab is required.
  • Optimization of the kernel (such as loop unrolling and the use of buffers) is NOT required.

5.2. Assignment

  • Implement a matrix multiplication kernel in hardware using Vivado HLS

    void matmul_hw(float *C, float *A, float *B);    // C = A * B
    
    • The sizes of the matrices are fixed as follows (you may hard-code them): float C[64][64], A[64][128], B[128][64] (revised from float C[128][128], A[128][256], B[256][128])
    • Note: the top-level function of the HLS module should have a slightly different prototype
  • Use DMA as in the previous lab. Use the DMA function provided: dma_template.h.
    • The use of different template parameters in ap_axiu<32,4,5,5> is to avoid compatibility issues.
    • Since we're using DMA, the interface of the top-level function is very similar to that of the previous lab, but internally you'll have to change how you handle the values you receive via AXIS. Specifically, you won't be able to use hls::stream<> as in the previous lab; instead, store the values in a local memory, due to the complex access pattern required in the compute loop.
  • As usual, submit your source code as well as your implementation files; Submit also a report, which should include the top-level diagram and the results of C-simulation, C-synthesis, and Cosimulation.
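The buffer-then-compute structure described above can be modelled behaviourally. In the sketch below the streamed words are first drained into local arrays (BRAM on the FPGA), and only then does the triple loop run; a deque stands in for the AXI stream, and the tiny sizes are illustrative (hard-code the lab's real sizes):

```python
from collections import deque

M, K, N = 2, 3, 2   # illustrative; the lab fixes these sizes

def matmul_hw_model(stream_in, stream_out):
    """Model of the kernel: buffer A and B locally, then compute C = A * B."""
    # Drain the stream into local memories first (complex access pattern
    # in the compute loop rules out on-the-fly consumption as in vadd).
    A = [[stream_in.popleft() for _ in range(K)] for _ in range(M)]
    B = [[stream_in.popleft() for _ in range(N)] for _ in range(K)]
    for i in range(M):
        for j in range(N):
            acc = 0.0
            for k in range(K):
                acc += A[i][k] * B[k][j]
            stream_out.append(acc)

s_in = deque([1, 2, 3, 4, 5, 6,    # A, row-major 2x3
              1, 0, 0, 1, 1, 1])   # B, row-major 3x2
s_out = deque()
matmul_hw_model(s_in, s_out)
print(list(s_out))  # [4.0, 5.0, 10.0, 11.0]
```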

6. Lab 6: Customizing a Python-based Neural Network Framework <2024-11-01 Fri>

6.1. Description

Learn how to use a Python (PyTorch)-based neural network framework and how to customize it so (some) inference computation can be offloaded to hardware.

6.2. Procedure

  1. MNIST neural network inference using PyTorch. (mnist.py)
    • Training included
  2. Consider how to use HW MatMul for the MNIST neural network inference.
  3. Write Python-based MNIST inference code. (numpymnist.py) (numpy_layer.py)
  4. Override Linear layer with a function that calls MatMul HW.
  5. In order to use integer MatMul HW, we have to change the type of input operands to MatMul. Modify the Python code accordingly (recall the quantization method discussed in lecture).
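Step 5 can be sketched with a simple symmetric quantization scheme. The details below are assumptions for illustration (per-tensor scales, signed range ±(2^(N-1) − 1)); the exact scheme should follow the quantization method discussed in lecture:

```python
def quantize(t, n_bits):
    """Symmetric per-tensor quantization to signed n_bits integers."""
    qmax = 2 ** (n_bits - 1) - 1
    scale = max(abs(v) for v in t) / qmax or 1.0   # guard against all-zero input
    return [round(v / scale) for v in t], scale

def int_dot(qx, qw, sx, sw):
    """Integer dot product (what the MatMul HW computes), then dequantize."""
    return sum(a * b for a, b in zip(qx, qw)) * sx * sw

x = [0.5, -1.0, 0.25]
w = [1.0, 2.0, -0.5]
qx, sx = quantize(x, 8)
qw, sw = quantize(w, 8)
approx = int_dot(qx, qw, sx, sw)
exact = sum(a * b for a, b in zip(x, w))
print(approx, exact)
```

Sweeping n_bits downward and re-measuring classification accuracy is one way to find the minimal "N" asked for in the assignment.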

6.3. Assignment

  • Port the MNIST inference code to the Pynq board, by offloading the second layer to the hardware. For this, you'll need to generate N-bit integer MatMul hardware. (ARM-Python is already included in Pynq.)
  • Try to minimize the parameter "N", which is the precision of activation and weight tensors.
  • As usual, submit your source code as well as your implementation files. Submit also a report as usual (see the next item for the contents of the report).
  • In the next lab, each team will give a presentation on their work, which should include the Pynq board execution result, classification accuracy, total execution time, and the value of the parameter "N".

7. Lab 7: Presentation + Optimization Methods <2024-11-08 Fri>

7.1. Presentation of Lab 6 Results

Integrating the Pynq board and the hardware accelerator running on it with the application software is an important milestone. Hence we'll have a presentation session in the first half of the lab. Please prepare to present your Lab 6 results; ideally, everyone presents.

7.2. Optimization Methods Description

(i) Learn to use array partition and array reshape via a simple matrix multiplication example. (ii) Support arbitrarily large matrices.

7.3. Procedure

  1. Observe the inefficiency in a simple matrix multiplication example
  2. Apply array partition and observe the difference in synthesis results
  3. Consider how to support matrix multiplications with arbitrarily large size matrices
  4. Implement your idea and verify the results
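One way to approach steps 3-4 is tiling: the fixed-size hardware kernel handles T×T tiles, and the host loops over tiles, accumulating partial products. The sketch below shows this scheme in Python; tile_matmul stands in for the hardware kernel, and T and the test sizes are illustrative:

```python
T = 2  # hardware tile size (stands in for the kernel's hard-coded dimension)

def tile_matmul(A, B):
    """Fixed-size T x T matmul: the part that would run on the FPGA."""
    return [[sum(A[i][k] * B[k][j] for k in range(T)) for j in range(T)]
            for i in range(T)]

def big_matmul(A, B, n):
    """n x n matmul built from T x T tiles (assumes n is a multiple of T)."""
    C = [[0] * n for _ in range(n)]
    for i0 in range(0, n, T):
        for j0 in range(0, n, T):
            for k0 in range(0, n, T):
                At = [row[k0:k0 + T] for row in A[i0:i0 + T]]
                Bt = [row[j0:j0 + T] for row in B[k0:k0 + T]]
                Ct = tile_matmul(At, Bt)        # offload one tile product
                for i in range(T):
                    for j in range(T):
                        C[i0 + i][j0 + j] += Ct[i][j]   # accumulate on the host
    return C

n = 4
A = [[i * n + j for j in range(n)] for i in range(n)]
I = [[1 if i == j else 0 for j in range(n)] for i in range(n)]
C = big_matmul(A, I, n)   # multiplying by the identity should return A
```

Accumulation could also be moved into the kernel's local memory to cut DMA traffic; which side accumulates is one of the design choices step 4 asks you to explore.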

7.4. Assignment

  • Final project (due: TBD)

8. Final Project (TBD)

9. Useful Links

9.1. Installation Procedure (maybe not up-to-date)

Here's the installation procedure.

  1. System requirements: Windows 10 or Ubuntu 18.04 only. (WARNING: Ubuntu 18.04.05 may NOT work.) Note: large HDD space is required (the installation file alone is >10GB).
  2. Visit https://www.xilinx.com/support/download/index.html/content/xilinx/en/downloadNav/vivado-design-tools/archive.html and download the installation file
  3. Apply patch: https://support.xilinx.com/s/article/76960. Otherwise, it will cause problems with Vivado HLS.
  4. Apply PYNQ-Z2 board file: pynq-z2.zip. Unzip the file at your_vivado_path/Vivado/2020.1/data/boards/board_files. After unzipping, you should have data/boards/board_files/pynq-z2/A.0

It literally takes hours to complete the installation, so start installing Vivado today!

Post your questions on BlackBoard if you have any!

Author: Jongeun Lee

Date: 2024-09-20 Fri 17:38

Homepage: ICCL.UNIST.AC.KR