


From cloud to the edge: On-device artificial intelligence boosts performance

AI can boost performance, security, and cost savings—but building any AI-enabled product requires careful use of optimized computing.
May 16, 2019

Produced in association with Arm

If artificial intelligence (AI) goes according to plan, we’ll barely notice it taking hold. As a result, and despite the hyperbole, AI may be the quietest major computing revolution the world has ever known. What’s happening at one of the world’s leading children’s hospitals is a great example.

Great Ormond Street Hospital (GOSH) clinicians see more than 300,000 children every year, many of them with critical care needs. To ensure its patients receive the best possible care in a safe and secure environment, GOSH began testing an AI-based person recognition system where medical staff, patients, and authorized visitors receive access to certain secure areas of the hospital while any unauthorized entrants are either stopped or flagged by the system. The solution uses a network of AI-enabled smart cameras to examine each person’s face, body structure, and gait. The system then automatically cross-checks facial features against a database of registered people. The system has increased hospital security and has clinical benefits, too. For example, if a child requires immediate care, an emergency room doctor can quickly be located and notified, ensuring the team is ready to spring into action when needed.


In the past, running such a sophisticated system would have required a sprawling data center and its associated costs. But the AI revolution has sparked a movement to perform AI computing differently. Instead of being sent over a cloud link, data generated by GOSH’s innovative cameras is processed locally on the cameras themselves using a tiny chip. Not only does this “AI at the edge” system process data faster and more cost-efficiently, but the data also never leaves the confines of the hospital.

The Next Era of Computing: Machine Learning for Every Device

The way we interact with machines is changing. Arm’s Project Trillium will transform our lives through a new class of advanced and ultra-efficient machine learning processors purpose-built to redefine device capabilities.

Machine learning in action

A branch of AI, machine learning (ML) uses sophisticated algorithms in models that can learn from data and identify important patterns. By uncovering connections, ML helps businesses make better decisions without the need for human input.

Today, ML is powering all kinds of applications, many of which are mobile, as smartphone users climb to an anticipated 3.8 billion by 2021. Examples range from fingerprint recognition and photo-sorting to more innovative use cases, including:

Smart inhalers: AI-powered inhalers run real-time ML algorithms that calculate a patient’s lung capacity and breathing patterns. This data is then interpreted on the device itself and sent to a smartphone app, enabling healthcare professionals to personalize regimens for asthma sufferers based on detailed sensor data.


Robot companions: An AI-driven social robot for senior citizens uses ML to understand the preferences, behavior, and personality of its owner. Based on these interactions, the robot can automatically connect older adults to stimulating digital content, such as music or audiobooks, as well as recommend activities, remind the user about upcoming appointments, or connect to family and friends through social media. And unlike most AI systems, which require voice activation, the robot proactively communicates with its user. For example, if a senior citizen has been sitting for an extended period of time, the robot can automatically recommend calling a friend or taking a walk.  

Reindeer cam: A smart camera system detects herds of reindeer through ML algorithms as they approach train tracks in remote parts of Norway where the animals are often needlessly killed. By processing information on the device itself, the system can warn train operators in real-time to reduce speeds when the animals are present, thereby preventing accidents and train delays.

The edge advantage

Hardware vendors are taking note and increasingly equipping devices with ML-capable chips. As a result, these devices are capturing and processing data in real time, providing instantaneous situational analysis, identifying patterns, and supporting quick AI-enabled decision making.

Edge AI devices mainly run ML inference workloads, in which a trained model is applied to real-world data. The models they use are mostly built in the cloud because training an AI model requires heavy compute. However, edge devices are also starting to be used for training as they learn in real-world environments.
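To make the split between cloud training and edge inference concrete, here is a minimal Python sketch. It assumes a tiny logistic-regression model whose parameters were already trained elsewhere. The weight values, the sample reading, and the `predict` helper are all hypothetical and chosen only for illustration; they do not come from any product described in this article.

```python
import math

# Hypothetical parameters, assumed to have been trained in the cloud
# and then copied onto the device. The values are illustrative only.
WEIGHTS = [0.8, -1.2, 0.5]
BIAS = -0.1


def predict(features):
    """Run inference on the device: apply the trained model to new sensor data."""
    # Weighted sum of the inputs plus the bias term.
    score = BIAS + sum(w * x for w, x in zip(WEIGHTS, features))
    # The logistic function maps the score to a probability between 0 and 1.
    return 1.0 / (1.0 + math.exp(-score))


# A new reading captured locally; nothing is sent over the network.
reading = [1.0, 0.2, 0.4]
probability = predict(reading)
print(f"probability of a positive detection: {probability:.2f}")
```

The expensive step, finding good values for the weights, happens once in the cloud. The device only performs a handful of multiplications and additions per reading, which is why inference fits on a small chip.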

The timing couldn’t be better. We’ve reached a critical mass of compute resources in the cloud. Around 29 billion connected devices are predicted by 2022, of which nearly 18 billion will be related to the Internet of Things. At the same time, the average consumer will own 13 connected devices by 2021 as autonomous vehicles populate our roads and sensors spread from factory floors to rural farms, each vying for precious compute power.

“With today’s exponential explosion of intelligent devices, there simply aren’t enough data centers in the world to send all of the data to the cloud,” says Ian Bratt, an engineering fellow at Arm Limited, where he leads the machine learning technology group.

Moving ML workloads to the edge can provide a number of key advantages. These include:

Heightened speed and performance: Whether searching for a weather update or driving directions to a restaurant, today’s mobile users demand fast access to critical information. But sending data back and forth to the cloud adds latency, which can hurt time-critical applications. On-device ML, by contrast, delivers increased responsiveness for immediate insights.

Enhanced privacy and security: The global average cost of a data breach is $3.86 million, according to a study by IBM and the Ponemon Institute. Unfortunately, shipping data to the cloud for processing creates an opportunity for that information to get hijacked by a cybercriminal. One way to reduce that exposure is to make sure sensitive data never leaves a device. On-device ML also provides decentralization, making an attack more difficult to launch than one against a single, centralized server.

Cost savings: A 2018 Frost & Sullivan survey reveals that 49 percent of IT decision-makers struggle to manage costs associated with running cloud workloads. And a TECHnalysis Research report on AI reveals that 50 percent of non-AI users cite cost as a chief concern. ML at the edge can alleviate the financial burden by reducing reliance on costly cloud services and infrastructure.

Together, these advantages, combined with an exponential growth in AI compute, are spurring an on-device ML revolution.
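The speed advantage described above comes down to simple arithmetic: a cloud request pays for the network trip in both directions, while an on-device request does not. The Python sketch below illustrates this under stated assumptions. The function names and every millisecond figure are hypothetical, picked for illustration and not measured from any real system.

```python
def cloud_latency_ms(uplink_ms, cloud_inference_ms, downlink_ms):
    """Total time for a request that travels to the cloud and back."""
    return uplink_ms + cloud_inference_ms + downlink_ms


def edge_latency_ms(device_inference_ms):
    """Total time when the model runs on the device; there is no network hop."""
    return device_inference_ms


# Hypothetical figures for illustration only, not measurements.
cloud = cloud_latency_ms(uplink_ms=40, cloud_inference_ms=10, downlink_ms=40)
edge = edge_latency_ms(device_inference_ms=30)

print(f"cloud: {cloud} ms, edge: {edge} ms")
```

In this made-up scenario the device is three times slower than the data center at the inference step itself, yet the on-device path still finishes first because it avoids the round trip.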

Sussing Silicon: How to select the microprocessor that’s right for your task

Yet with the amount of compute power used in AI-training models doubling every hundred days, many organizations are questioning where ML workloads should be performed—on a central processor (CPU), a graphics processor (GPU), or a neural processor (NPU)?

The answer hinges on a number of factors, including how quickly tasks need to be executed, the compute performance required, and whether it’s worth the extra cost of adding an NPU or GPU to a system design.

Some training and inference algorithms are so complex, and the data sets so large, that faster computation solutions, such as a GPU or NPU, are worth considering as co-processors to the CPU that every system already has. But it’s also critical that organizations weigh the benefit of including custom silicon at a time when CPUs are becoming increasingly AI-capable.

“A lot of AI is still being done on CPUs,” says Bob O’Donnell, founder and chief analyst of TECHnalysis Research. “Yes, it’s great that there are other kinds of chips that can be used for certain AI tasks and workloads. However, because every single device out there has a CPU, it’s an excellent baseline.”

Dean Wampler agrees. A vice president at Lightbend, which provides an open-source platform for developing cloud-native applications, Wampler says companies are rethinking how they use GPU resources with an eye toward trying “to minimize the compute overhead required” for a task.

Instead, he says, more and more clients are realizing they “can be clever and exploit the compute power that they already have” in a CPU. The result: minimal overhead and less strain on resources for maximum throughput.

When CPU isn’t enough

That’s not to suggest, however, that there’s not enormous value in GPUs and NPUs. “In some of today’s more advanced, forward-thinking applications, GPUs are incredibly important,” says O’Donnell. Although best known for graphics, video, and photo processing, GPUs are gaining favor in the finance and scientific research sectors for accelerating computational workloads.

NPUs, on the other hand, can be several times faster than GPUs on neural network workloads, making them best suited for compute-intensive tasks and heavy workloads, while the general availability and programmability of CPUs make them an excellent default option for mobile inference.

The key is to take advantage of all three forms of computational power depending on the ML-related task at hand. Certainly, the CPU may be the first choice for ML processing, but in cases where responsiveness or power efficiency is imperative, it helps to complement a CPU with a dedicated NPU, which offers greater efficiency and higher performance. In many situations, a layer of software that lives on the CPU can play real-time traffic cop, determining the right processor for the right task.
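The “traffic cop” idea can be sketched in a few lines of Python. This is an illustration under stated assumptions, not how any particular vendor’s software works: the `Task` fields, the `choose_processor` function, and the decision rules are all hypothetical. In particular, routing highly parallel work to the GPU is an assumption added here, while the NPU and CPU rules follow the reasoning in the paragraph above.

```python
from dataclasses import dataclass


@dataclass
class Task:
    name: str
    latency_critical: bool   # must the result arrive almost immediately?
    power_constrained: bool  # is the device on a tight energy budget?
    highly_parallel: bool    # does the work split into many independent operations?


def choose_processor(task, has_npu=True, has_gpu=True):
    """Pick a processor for a task using simple, hypothetical rules."""
    # Responsiveness or power efficiency favors a dedicated NPU when one exists.
    if has_npu and (task.latency_critical or task.power_constrained):
        return "NPU"
    # Highly parallel work without those constraints can go to the GPU.
    if has_gpu and task.highly_parallel:
        return "GPU"
    # The CPU is the default because every system has one.
    return "CPU"


tasks = [
    Task("face unlock", latency_critical=True, power_constrained=False, highly_parallel=True),
    Task("photo sorting", latency_critical=False, power_constrained=False, highly_parallel=True),
    Task("text suggestion", latency_critical=False, power_constrained=False, highly_parallel=False),
]

for task in tasks:
    print(f"{task.name}: {choose_processor(task)}")
```

Because the function falls back to the CPU whenever no accelerator is available, the same rules still work on a device that ships without an NPU or GPU.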

A look forward

These are early days for intelligent devices. “We’re in the first phase of the AI and machine learning revolution,” says O’Donnell. But use cases are fast evolving from voice recognition and photo filters to life-saving devices, driving the demand for unprecedented compute power. Moving ML workloads to the edge can help improve performance and efficiency. But carefully considering which programming approach is best, and on which platform, is what will ultimately keep organizations in the game.
