A desert robot depicts AI’s vast opportunities
The next chapter in AI development will be defined by two trends: increased accessibility and increased technical maturity.
When Hongzhi Gao was young, he lived with his family in Gansu, a province located in the center of northern China by the Tengger Desert. Thinking back to his childhood, he recalls the constant, steady wind of dirt outside their house. During most months of the year, it took less than a minute after stepping outside for sand to fill any empty space and creep into his pockets, his boots, and his mouth. The monotony of the desert stuck in his head for years, and at university he turned that memory into an idea: to build a machine that can bring plant life to the desert landscape.
Efforts to stop desertification—the process by which fertile land becomes desert—have been primarily focused on expensive manual solutions. Hongzhi designed a robot with deep learning technology to automate the process of tree planting: from identifying optimal spots to planting tree seedlings to watering. Despite having no experience with AI, as an undergraduate student Hongzhi used Baidu’s deep learning platform PaddlePaddle to stitch together different modules to build a robot with better object detection capability than similar machines already available in the market. It took less than one year for Hongzhi and his friends to spin up the final product and put it to work.
Hongzhi’s desert robot serves as a telling example of the increasing accessibility of artificial intelligence.
Today, more than four million developers are using Baidu’s open source AI technology to build solutions that can improve the lives of people in their communities, and many of them have little to no technical expertise in the field. “Within the next decade, AI will be the source of changes taking place across every fabric of our society, transforming how industries and businesses operate. The technology will expand the human experience by taking us on a deeper dive into the digital world,” said Baidu CEO Robin Li at Baidu Create 2021, an AI developer conference.
As we enter a new chapter in the evolution of AI, Haifeng Wang, CTO of Baidu, has identified two key trends that underpin the industry’s path forward. First, AI will continue to mature and grow in technical complexity. Second, the cost of deployment and the barrier to entry will decrease, benefiting both enterprises building AI-powered solutions at scale and software developers exploring the world of AI.
Merging of knowledge and data with deep learning
The integration of knowledge and data with deep learning has significantly improved the efficiency and accuracy of AI models. Since 2011, Baidu’s AI infrastructure has been acquiring and integrating new information into a large-scale knowledge graph. Currently, this knowledge graph has more than 550 billion facts, covering all aspects of everyday life, as well as industry-specific topics, including manufacturing, pharmaceuticals, law, financial services, technology, and media and entertainment.
This knowledge graph and the massive data points together make up the building blocks of Baidu’s newly released pre-trained language model PCL-BAIDU Wenxin (version ERNIE 3.0 Titan). The model outperforms other language models without knowledge graphs on 60 natural language processing (NLP) tasks, including reading comprehension, text classification, and semantic similarity.
Learnings across modalities
Cross-modal learning is a new area of AI research that seeks to improve machines’ cognitive understanding and to better mimic the adaptive behavior of humans. Examples of research efforts in this area include automatic text-to-image synthesis, where a model is trained to generate images from text descriptions alone, as well as algorithms built to understand visual content and express that understanding with words. The challenge with these tasks is for the machines to build semantic connections across different types of datasets (e.g., images, text) and understand the interdependencies between them.
The next step for AI is merging AI technologies like computer vision, speech recognition, and natural language processing to create a multi-modal system.
On this front, Baidu has rolled out a variant of its NLP models that ties together language and visual semantic understanding. Examples of real-world applications for this type of model include digital avatars that can perceive their surroundings like human beings and handle customer support for businesses, and algorithms that can “draw” pieces of art and compose poems based on their understanding of the generated artworks.
There are even more creative, impactful potential outcomes for this technology. The PaddlePaddle platform can build semantic connections across vision and language, which led a group of master’s students in China to create a dictionary to preserve endangered languages in regions like Yunnan and Guangxi by more easily translating them into simplified Chinese.
AI integration across software and hardware, and into industry-specific use cases
As AI systems are applied to solve increasingly complex and industry-specific problems, a greater emphasis is placed on optimizing the software (deep learning framework) and hardware (AI chip) as a whole, instead of optimizing each individually, taking into consideration factors such as computing power, power consumption, and latency.
Further, tremendous innovation is taking place at the platform layer of Baidu’s AI infrastructure, where third-party developers are using the deep learning capabilities to build new applications tailored to specific use cases. The PaddlePaddle platform has a series of APIs to support AI applications in newer technologies such as quantum computing, life sciences, computational fluid mechanics, and molecular dynamics.
AI has practical uses as well. For example, in Shouguang, a small city in Shandong Province, AI is being used to streamline the fruit and vegetable industry. It takes only two people and one app to manage dozens of vegetable sheds.
This is notable, says Wang: “Despite the increased complexity of AI technology, open-source deep learning platform brings together the processor and applications like an operating system, reducing barriers to entry for companies and individuals looking to incorporate AI into their business.”
Reduced barrier to entry for developers and end users
On the technology front, pre-trained large models like PCL-BAIDU Wenxin (version ERNIE 3.0 Titan) have solved many common bottlenecks faced by traditional models. For instance, these general-purpose models have helped lay the foundation for running different types of downstream NLP tasks, such as text classification and question answering, in one consolidated place, whereas in the past, each type of task would have to be solved by a separate model.
PaddlePaddle also has a series of developer-friendly tools, such as model compression technologies to tweak the general-purpose models to fit more specific use cases. The platform provides an officially supported library of industrial-grade models with more than 400 models, ranging from large to small, which retain only a fraction of the general-purpose models’ size but can achieve comparable performance, reducing model development and deployment costs.
Today, Baidu’s open source deep learning technology supports a community of more than four million AI developers who have collectively created 476,000 models, contributing to the AI-driven transformation of 157,000 businesses and institutions. The examples enumerated above are a result of innovations happening across all layers of the Baidu AI infrastructure, which integrates technologies such as voice recognition, computer vision, AR/VR, knowledge graphs, and pre-trained large models that are one step closer to perceiving the world like humans.
In its current state, AI has reached a level of maturity that allows it to perform remarkable tasks. For example, the recent launch of the metaverse XiRang would not have been possible without the PaddlePaddle platform, which was used to create digital avatars that let participants around the world connect from their devices. Further, future breakthroughs in areas like quantum computing could significantly improve the performance of metaverses. This goes to show how Baidu’s different offerings are interwoven and interdependent.
In a few years, AI will be near the core of our human experience. It will be to our society what steam power, electricity, and the internet were to previous generations. As AI becomes more complex, developers like Hongzhi will be working more in the capacity of artists and designers, given the creative freedom to explore use cases previously considered only theoretically possible. The sky is the limit.
This content was produced by Baidu. It was not written by MIT Technology Review’s editorial staff.