The future of AI processing
Artificial intelligence (AI) is emerging in everyday use cases, thanks to advances in foundation models, more powerful chip technology, and abundant data. To become truly embedded and seamless, AI computation must now be distributed, and much of it will take place on devices and at the edge.
To support this evolution, AI workloads must be allocated to the right hardware based on a range of factors, including performance, latency, and power efficiency. Heterogeneous compute enables organizations to distribute workloads dynamically across different computing cores, such as central processing units (CPUs), graphics processing units (GPUs), neural processing units (NPUs), and other AI accelerators. By assigning each workload to the processor best suited to it, organizations can better balance latency, security, and energy usage across their systems.
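To make workload placement concrete, the following sketch shows a toy dispatcher that routes each job to a CPU, GPU, or NPU based on its latency, compute, and power profile. The Workload fields, device labels, and selection rules are simplified assumptions made for illustration, not a description of any particular vendor's scheduler.

```python
# Illustrative only: a toy dispatcher that routes an AI workload to the
# processor class best matched to its constraints. The fields and the
# rule-of-thumb hierarchy below are assumptions for this sketch.
from dataclasses import dataclass

@dataclass
class Workload:
    name: str
    latency_sensitive: bool   # e.g., real-time voice or camera pipelines
    compute_heavy: bool       # e.g., large-batch training or offline analytics
    power_constrained: bool   # e.g., battery-powered edge devices

def choose_processor(w: Workload) -> str:
    """Pick a processor class using a simple rule-of-thumb hierarchy."""
    if w.power_constrained and w.latency_sensitive:
        return "NPU"   # on-device accelerator: low power, low latency
    if w.compute_heavy:
        return "GPU"   # highly parallel throughput for large models
    return "CPU"       # general-purpose fallback and control logic

if __name__ == "__main__":
    jobs = [
        Workload("keyword spotting", latency_sensitive=True,
                 compute_heavy=False, power_constrained=True),
        Workload("nightly model retraining", latency_sensitive=False,
                 compute_heavy=True, power_constrained=False),
        Workload("report generation", latency_sensitive=False,
                 compute_heavy=False, power_constrained=False),
    ]
    for job in jobs:
        print(f"{job.name} -> {choose_processor(job)}")
```

Real schedulers weigh many more signals, such as memory footprint, thermal headroom, and data locality, but the principle of matching each workload to the best-suited processor is the same.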

Key findings from the report are as follows:
More AI is moving to inference and the edge. As AI technology advances, inference (a model's ability to make predictions based on its training) can now be run closer to users, not just in the cloud. This has advanced the deployment of AI to a range of edge devices, including smartphones, cars, and the industrial internet of things (IIoT). Edge processing reduces reliance on the cloud, offering faster response times and enhanced privacy (a minimal sketch of on-device inference follows these findings). Going forward, hardware for on-device AI will only improve in areas like memory capacity and energy efficiency.
To deliver pervasive AI, organizations are adopting heterogeneous compute. To commercialize the full panoply of AI use cases, processing must run on the right hardware. A heterogeneous approach provides a solid, adaptable foundation for deploying and advancing AI use cases in everyday life, work, and play. It also allows organizations to prepare for the future of distributed AI in a way that is reliable, efficient, and secure. But there are many trade-offs between cloud and edge computing that require careful consideration based on industry-specific needs.

Companies face challenges in managing system complexity and ensuring current architectures can adapt to future needs. Despite progress in chip design, such as the latest high-performance CPU architectures optimized for AI, software and tooling must also improve to deliver a compute platform that supports pervasive machine learning, generative AI, and new specializations. Experts stress the importance of developing adaptable architectures that meet current machine learning demands while leaving room for technological shifts. The benefits of distributed compute must outweigh its downsides in complexity across platforms.
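As a concrete illustration of the on-device inference described in the first finding, the sketch below runs a model locally with ONNX Runtime instead of calling a cloud endpoint. The model file name, input tensor name, and input shape are assumptions made for this example; a real deployment would use whichever model and runtime the device supports.

```python
# Illustrative sketch: local (on-device) inference with ONNX Runtime.
# Assumes a pre-exported "model.onnx" whose single input is named "input"
# and expects a 1x3x224x224 float32 tensor; adjust to the actual model.
import numpy as np
import onnxruntime as ort

# The CPU provider is universally available; hardware-specific providers
# (for GPUs or NPUs) can be listed first so the runtime picks the best
# available backend and falls back gracefully.
session = ort.InferenceSession("model.onnx", providers=["CPUExecutionProvider"])

# Run the model entirely on the device -- no round trip to a cloud API.
dummy_input = np.random.rand(1, 3, 224, 224).astype(np.float32)
outputs = session.run(None, {"input": dummy_input})
print(outputs[0].shape)
```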
This content was produced by Insights, the custom content arm of MIT Technology Review. It was not written by MIT Technology Review's editorial staff.
This content was researched, designed, and written entirely by human writers, editors, analysts, and illustrators. This includes the writing of surveys and collection of data for surveys. AI tools that may have been used were limited to secondary production processes that passed thorough human review.