The cloud is getting smarter by the minute. In fact, it will soon know more about the photos you’ve uploaded than you do.
Cloud storage company Box announced today that it is adding computer-vision technology from Google to its platform. Users will be able to search through photos, images, and other documents by their visual content rather than by file name or tag. “As more and more data goes into the cloud, we’re seeing they need more powerful ways to organize and understand their content,” says CEO Aaron Levie.
Computer-vision technology has improved remarkably over the past few years thanks to a machine-learning approach known as deep learning (see “10 Breakthrough Technologies 2013: Deep Learning”). A deep neural network—loosely inspired by the way neurons process and store information—can learn to recognize categories of objects, such as a “red sweater” or a “pickup truck.” Ongoing research, including work from Google’s researchers, is improving the ability of algorithms to describe what’s happening in images.
Box’s computer-vision feature could be a good way for companies to dip their toes into AI and machine learning. It removes the need to manually annotate thousands of images, and it will make it possible to search through older files in ways that might not have occurred to anyone during tagging. Levie says one company testing the technology is using it to search images for particular people.
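To make the idea concrete, here is a minimal sketch of how automatically generated labels could support this kind of search. It is not Box's implementation, which the article does not describe. The file names, the labels, and the helpers `build_index` and `search` are all invented for illustration, and the labels stand in for whatever an image-recognition service might return.

```python
from collections import defaultdict

# Hypothetical output of an image-labeling service: file name -> detected labels.
# These file names and labels are made up for illustration.
auto_labels = {
    "IMG_0041.jpg": ["pickup truck", "road", "tree"],
    "scan_17.png": ["red sweater", "person"],
    "site_visit.jpg": ["pickup truck", "person", "building"],
}


def build_index(labels_by_file):
    """Map each label to the set of files in which it was detected."""
    index = defaultdict(set)
    for filename, labels in labels_by_file.items():
        for label in labels:
            index[label.lower()].add(filename)
    return index


def search(index, *terms):
    """Return files whose labels include every search term, sorted by name."""
    matches = [index.get(term.lower(), set()) for term in terms]
    if not matches:
        return []
    return sorted(set.intersection(*matches))


index = build_index(auto_labels)
print(search(index, "pickup truck"))            # files showing a pickup truck
print(search(index, "pickup truck", "person"))  # files showing both
```

Because the labels are produced by software rather than typed in by hand, an index like this can be built over files that nobody ever tagged, which is what makes older archives searchable.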
The announcement is the latest sign that cloud computing is being reinvented through machine learning and artificial intelligence. AI is already the weapon of choice in the battle to dominate cloud computing, with companies that offer on-demand computing—Google, Amazon, and Microsoft among them—all increasingly touting added machine-learning features.
Fei-Fei Li, chief scientist of Google Cloud and a professor at Stanford University who specializes in computer vision and machine learning, said in a statement that the announcement shows how broadly available AI technology is becoming. “Ultimately it will democratize AI for more people and businesses,” Li said.
Levie says his company is looking at adding machine learning for other types of content. This could include audio and video, but also text, for which an algorithm could add semantic analysis, making it possible to search by the meaning of a document rather than specific keywords.
It’s also significant that Box is relying on computer vision from Google, rather than technology developed in-house. This reflects the fact that a few big players have come to dominate the more fundamental aspects of AI like computer vision, voice recognition, and natural-language processing. “If you think about the strength that Google has in image recognition, it would just be strategically unwise for us to try to compete with them,” Levie says. He says his company’s researchers are exploring ways of applying machine learning to the behavior of its customers. This process might reveal ways to optimize the Box service, or help identify tasks that could be ripe for automation, Levie says.
Google’s Cloud Vision API can recognize many thousands of everyday objects in images. However, some customers might need to recognize and search through specialized types of images, such as medical or architectural ones. Box’s researchers are therefore exploring ways for customers to train their own vision systems.
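As a rough illustration of what asking such a service for labels involves, the sketch below builds a request body in the shape used by the Cloud Vision API's public REST interface (the `images:annotate` method with a `LABEL_DETECTION` feature) and filters the labels in a response by confidence score. It makes no network call. The helper names, the placeholder image bytes, and the sample response are invented for illustration, and nothing here reflects how Box itself integrates the API.

```python
import base64
import json


def build_label_request(image_bytes, max_results=10):
    """Build the JSON body for a label-detection request on one image."""
    return {
        "requests": [
            {
                "image": {"content": base64.b64encode(image_bytes).decode("ascii")},
                "features": [{"type": "LABEL_DETECTION", "maxResults": max_results}],
            }
        ]
    }


def labels_above(response, min_score):
    """Return label descriptions from the first response with score >= min_score."""
    annotations = response["responses"][0].get("labelAnnotations", [])
    return [a["description"] for a in annotations if a["score"] >= min_score]


# Placeholder bytes stand in for a real image file.
body = build_label_request(b"abc", max_results=5)
print(json.dumps(body, indent=2))

# Invented sample response in the documented shape, not real API output.
sample_response = {
    "responses": [
        {
            "labelAnnotations": [
                {"description": "truck", "score": 0.97},
                {"description": "road", "score": 0.62},
            ]
        }
    ]
}
print(labels_above(sample_response, 0.8))
```

A general-purpose label set like this is what falls short for specialized images: a medical scan might come back with only generic labels, which is one reason a customer might want to train a model of its own.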