Abstract
Field-programmable gate arrays (FPGAs) offer an attractive platform for machine learning models; however, creating these models is a time-consuming process that demands specialized knowledge. High-level synthesis (HLS) offers the potential to streamline this hardware development, but at the cost of no longer being able to hand-tune the design to exploit the specific resources and performance capabilities available. The popular open-source Python library HLS4ML produces low-latency HLS machine learning models. However, it is unclear how much design quality is lost by adopting an HLS design flow. In this thesis, we create carefully optimized SystemVerilog versions of identical HLS4ML designs, allowing us to evaluate the quality of the designs that HLS4ML produces. Our research reveals that HLS designs are highly competitive with hand-optimized designs for basic layers with standard parameters, especially given the capabilities of the Vivado HLS tools. However, when advanced parameters such as reuse factor and stride are altered, our hand-written design achieves a 2x to 3x increase in performance over HLS4ML’s. Through this evaluation, we identified weaknesses in the existing tools and developed workarounds that help provide significant quality improvements.