Description
Developers have proposed various hardware accelerators to improve CNN inference performance on embedded platforms. Recently, Xilinx announced its first 7-nm FPGA accelerator, the Versal ACAP, a high-performance heterogeneous computing platform that can be adapted to application requirements. However, because early studies focused on the most common CNN architectures, the implementation and performance analysis of customized CNN architectures on the Versal ACAP remain largely unexplored.
In this study, we implement one of the CNN architectures under consideration at the European XFEL and compare its performance with that of a state-of-the-art GPU and another FPGA generation. In addition, we assess whether quantization methods are suitable for critical regression applications. We also present a detailed analysis of the results based on device time traces and provide recommendations for configuring runtime parameters.
The experimental results confirm the superior latency and throughput of the Versal ACAP.