Machine learning algorithms are highly compute-intensive. Training can be performed in data centers, where racks of server-class processors can be employed. Inferencing, which may be done on embedded systems (edge nodes), is less compute-intensive, but certain applications, such as real-time video image processing, may stress the capabilities of even the fastest Arm processors. One way to address this is to move parts of the inferencing algorithm into accelerators implemented in hardware. This session will explore the use of High-Level Synthesis (HLS) to create machine learning accelerators tailored to a specific implementation. HLS enables greater trade-offs among power, area, latency, and throughput, which are needed to meet demanding power and performance goals. It also allows implementation to be done much more quickly than traditional hardware design methodologies, which is essential in a field like machine learning, where algorithms are continually evolving. This session will also cover integrating the accelerators into an Arm processor subsystem, in both hardware and software.