While AI was originally targeted at data centers and the cloud, it has been moving rapidly toward the edge of the network, where fast, critical decisions must be made locally, closer to the end user. Training can still be done in the cloud, but in applications such as autonomous driving, it is important that time-sensitive decision making (spotting a car or a pedestrian) happens close to the end user (the driver). After all, edge systems can make decisions on images arriving at up to 60 frames per second, enabling quick action.
These systems are made possible by edge inference accelerators that have emerged to replace CPUs, GPUs and FPGAs, delivering much higher throughput/$ and throughput/watt.
The ability to do AI inferencing closer to the end user is opening up a whole new world of markets and applications. In fact, IDC just reported that the market for AI software, hardware, and services is expected to break the $500 billion mark by 2024, with a five-year compound annual growth rate (CAGR) of 17.5% and total revenues reaching an impressive $554.3 billion.
This rapid growth is likely due to the fact that AI is expanding from "just a high-end functionality" into products closer to consumers, essentially bringing AI capabilities to the masses. In addition, recently announced products have started to break the cost barriers typically associated with AI inference, enabling designers to incorporate AI into a wider range of affordable products.
While the autonomous driving example above is the one people most commonly think of when they think of edge AI inference, there are many other markets that exist today or are close to becoming reality. Below I will highlight just a few, but the possibilities are endless as the technology evolves and the market starts benefiting from volume shipments and manufacturing of AI accelerators.
- Edge servers – Many edge servers are deployed in factories, hospitals, retail stores, financial institutions and other enterprises. In many cases, sensors in the form of cameras are already connected to these servers, but they are only recording what happens in case of an accident or a theft. Now these servers can be super-charged with low-cost PCIe boards that add AI inference capabilities. For example, in the industrial space, AI can help manage inventories, detect defects or even predict defects before they happen. In the retail space it can enable capabilities such as pose estimation, which uses computer vision to detect and analyze human posture. The data from this analysis can help stores better understand shopper behavior and foot traffic, enabling them to lay out the store in a way that maximizes sales and customer satisfaction. While these are just two specific examples, edge servers powered by AI inference open up a wide range of applications, including surveillance, facial recognition, genomics/gene sequencing, industrial inspection, medical imaging and more.
- High volume, high accuracy/quality imaging – There is strong demand for high-accuracy, high-quality imaging in applications such as robotics, industrial automation/inspection, medical imaging, scientific imaging, cameras for surveillance and object recognition, photonics, etc. Because the sensors capture 0.5 to 6 megapixels and "getting it right" is critical, customers want to use the best models (for example, YOLOv3, a heavy model with 62 million weights that requires >300 billion MACs to process a 2-megapixel image) and the largest image size they can (just as humans recognize people better in a large, crisp image than in a small one).
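As a rough sanity check on the compute these workloads demand, the YOLOv3 figure above can be turned into a sustained-rate requirement. The MACs-per-frame number comes from the text; the frame rates are illustrative:

```python
# Back-of-the-envelope compute requirement for edge vision inference.
# ~300 billion MACs per 2-megapixel YOLOv3 frame (figure from the text).
macs_per_frame = 300e9

for fps in (30, 60):  # common camera frame rates
    required_macs_per_s = macs_per_frame * fps
    print(f"{fps} fps -> {required_macs_per_s / 1e12:.1f} trillion MACs/s sustained")
```

At 30 fps this already works out to 9 trillion MACs per second of sustained compute, which makes clear why these workloads outgrow general-purpose processors.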
- Voice and lower throughput inference – Applications such as Amazon Echo have high adoption and are expected to continue to advance. Voice processing requires only billions of MACs per second, and even less for simple keyword recognition. These workloads are well suited to edge AI inference accelerators.
- Mobile phones – Almost all smartphone application processors include an AI module on the SoC for local processing of simple neural network models, making mobile phones the highest-unit-volume AI deployment at the edge today.
When it comes to edge AI inference, there are four key requirements for customers not only in the markets mentioned above, but also in the many markets that will emerge to take advantage of these accelerators.
The first is low latency. In all edge applications, latency is the top priority, which means batch size is almost always 1: batching frames improves hardware utilization, but it forces each image to wait for the rest of its batch before processing can even begin.
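The connection between batch size and latency can be made concrete with a little arithmetic. The sketch below uses assumed, illustrative timings (the 30 fps camera interval and the per-batch compute time are not from the article):

```python
# Worst-case latency for the first frame in a batch: it must wait for
# the whole batch to arrive before inference can even begin.
def worst_case_latency_ms(batch_size, frame_interval_ms, compute_ms):
    # (batch_size - 1) frame intervals of queuing, plus one inference pass.
    # Compute time is held constant for simplicity; real batched inference
    # takes longer per pass, so this understates the batching penalty.
    return (batch_size - 1) * frame_interval_ms + compute_ms

FRAME_INTERVAL_MS = 33.3  # 30 fps camera (illustrative)
COMPUTE_MS = 20.0         # assumed inference time per pass

print(worst_case_latency_ms(1, FRAME_INTERVAL_MS, COMPUTE_MS))  # 20.0 ms
print(worst_case_latency_ms(8, FRAME_INTERVAL_MS, COMPUTE_MS))  # ~253 ms
```

Even with these generous assumptions, a batch of 8 multiplies the worst-case decision latency by more than 12x, which is unacceptable when the decision is "brake for that pedestrian."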
The second is support for both BF16 and INT8. An inference accelerator that can do both floating point and INT8 lets customers start quickly with BF16 and shift seamlessly to INT8 once they are ready to make the investment in quantization.
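To illustrate what that quantization investment involves, here is a minimal sketch of symmetric per-tensor INT8 post-training quantization. This is illustrative only: real toolchains also calibrate activations and often quantize per channel, and the function names here are my own:

```python
# Minimal sketch of symmetric per-tensor INT8 quantization.
def quantize_int8(weights):
    """Map float weights to INT8 codes with a single per-tensor scale."""
    max_abs = max(abs(w) for w in weights)
    scale = max_abs / 127.0 if max_abs else 1.0
    # Round to nearest integer and clamp to the signed 8-bit range.
    return [max(-128, min(127, round(w / scale))) for w in weights], scale

def dequantize(codes, scale):
    """Reconstruct approximate float values from INT8 codes."""
    return [c * scale for c in codes]

weights = [0.5, -1.27, 0.03, 1.27]
codes, scale = quantize_int8(weights)
print(codes)                      # small integer codes, one byte each
print(dequantize(codes, scale))   # approximate reconstruction
```

The payoff is that each weight shrinks from 2 bytes (BF16) to 1 byte (INT8) and the multiply-accumulate hardware gets simpler and cheaper; the cost is the per-value rounding error bounded by half the scale, which is why moving to INT8 requires validating accuracy on the customer's own model.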
The third is high throughput. Almost every application wants to process megapixel images (1, 2 or 4) at frame rates of 30 or even 60 frames/second. Customers today already have models that work for their applications, and how a solution runs their model is all that matters to them; they care about delivered throughput, not synthetic benchmarks such as TOPS.
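A quick calculation shows why peak TOPS is a misleading proxy: what matters is the frames per second a given model actually achieves. Both accelerators below are hypothetical, and the utilization figures are my own assumptions for illustration:

```python
# Why peak TOPS alone is a poor predictor of delivered throughput.
macs_per_frame = 300e9  # YOLOv3 on a 2 MP image (figure from the text)

def frames_per_second(peak_tops, utilization):
    """Delivered fps = usable MACs/s divided by MACs per frame.
    1 TOPS = 1e12 ops/s; 1 MAC counts as 2 ops (multiply + add)."""
    usable_macs_per_s = peak_tops * 1e12 / 2 * utilization
    return usable_macs_per_s / macs_per_frame

# Two hypothetical accelerators: big paper spec vs. efficient design.
print(frames_per_second(100, 0.15))  # 100 TOPS at 15% utilization -> 25 fps
print(frames_per_second(40, 0.60))   # 40 TOPS at 60% utilization -> ~40 fps
```

Under these assumptions the 40-TOPS part outruns the 100-TOPS part on the model the customer actually cares about, which is exactly why benchmarking the real model beats comparing datasheet TOPS.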
The fourth is efficiency. Customers want more throughput per dollar and per watt for a given image size, so that new applications become possible at the low end of the market, where volumes are exponentially larger. Solutions have emerged that deliver on this efficiency requirement.
There is much innovation happening around inference at the edge because of the vast number of new markets and applications that can benefit from its throughput efficiency and accuracy. Prices are coming down while matching or beating the performance of higher-priced systems. This will drive AI inference capabilities into applications we have not even thought of yet. I believe this is going to be one of the most exciting application areas of our time.
— Geoff Tate is CEO of Flex Logix