Abstract:
Most hardware accelerators communicate with the processor via custom instructions. Since custom instructions are not standardized, each accelerator requires a different compiler and different user code, which is tedious for the user. To reduce this burden, we propose a parallel programming framework called SIMDify, which generates single-instruction-multiple-data (SIMD) processors that achieve SIMD processing without using custom instructions. SIMDify takes application machine code compiled for the scalar RISC-V ISA and simulates it to determine the SIMD processing regions. Then, SIMDify configures and generates an application-specific SIMD processor that executes scalar RISC-V instructions concurrently on the SIMD datapath. The SIMD processor consists of a single master and multiple slave processing elements (PEs). The slaves focus on SIMD-level tasks, whereas the master is responsible for central control. The proposed architecture is the first SIMD-capable RISC-V processor designed in HLS and can operate at a higher clock frequency than the existing SISD RISC-V HLS cores. SIMDify relieves the user from using custom instructions with rigid programming models and offers a flexible solution. The processor is designed and tested in Vivado High-Level Synthesis 19.2. It operates at 78 MHz on a Zynq Zedboard FPGA. The master PE uses 5% and each slave PE uses 3.5% of FPGA resources. Test results show that execution time can be improved by 8.5x with 9 slaves and by 19x with 29 slaves.
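
To put the reported speedups in perspective, the short sketch below computes a per-slave parallel efficiency from the two figures quoted above. It assumes, purely for illustration, that the ideal speedup equals the number of slave PEs; the function name `parallel_efficiency` and this definition of efficiency are our own and are not taken from the SIMDify framework.

```python
def parallel_efficiency(speedup: float, num_slaves: int) -> float:
    """Return speedup divided by slave count.

    Assumption (not from the paper): the ideal speedup equals the
    number of slave PEs, so 1.0 means perfect scaling.
    """
    if num_slaves <= 0:
        raise ValueError("num_slaves must be positive")
    return speedup / num_slaves


# Speedups reported in the abstract, keyed by number of slave PEs.
reported = {9: 8.5, 29: 19.0}

for slaves, speedup in reported.items():
    eff = parallel_efficiency(speedup, slaves)
    print(f"{slaves} slaves: speedup {speedup}x, efficiency {eff:.1%}")
```

Under this assumption, the 9-slave configuration scales at roughly 94% per slave and the 29-slave configuration at roughly 66%, which suggests that the benefit of each additional slave diminishes as the slave count grows.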