Rejoice, Disorganized Workers: This Smart Cloud Looks After Your Files For You

The storage company Box ratchets up its competition with Dropbox.
October 28, 2016

Box, a popular cloud storage company for businesses, now employs machine learning to allow its service to peek into files and figure out what they contain.

The idea is to let users search for, say, every press release related to a particular product, or the minutes from a string of important meetings, or even photos of the CEO dancing at the last office holiday party.

This would be a compelling new way for individuals and companies to search for and organize their data. It also shows the potential for cutting-edge AI techniques to change the nature of everyday office work. Machine learning is already altering the way many online services and consumer devices work, from Google’s search to smartphone assistants, and it may now be poised to bring similar improvements to the office.

“The state of search in the enterprise today sucks,” says Aaron Levie, Box’s energetic CEO, who argues that the techniques his researchers are working on could fundamentally change things. “I could be General Electric, say, and I could say ‘Show me every press release about our jet engines.’ That is a query that would be useless today.”

Currently, Levie points out, if you want to be sure to find a document on your computer, you’ll most likely name it carefully and store it in a particular place. Keeping everything organized is harder still when files are stored by different people across an organization. But just as computers have been trained to recognize the objects in images and the words in audio, they may be able to identify résumés, financial reports, or memos, and even find other documents related to them.

Levie says his company is, in effect, building “an ImageNet of files,” referring to a vast database of tagged images created by researchers at Stanford University that has helped advance computer vision in recent years. ImageNet has been used extensively to train machine-learning algorithms to recognize different objects or scenes (see “Next Big Test for AI: Making Sense of the World”).

Box CEO Aaron Levie wants to use artificial intelligence to make it easier to search documents across an entire company.

Levie says researchers at his company have labeled thousands of different types of files and are feeding this information into a very large “deep learning” neural network. He adds that Box’s algorithms would find connections only between documents within a particular company; because of privacy and security concerns, they would not connect information across different customers.

“Box has to differentiate from the competition, and this seems like a good move,” says Oren Etzioni, director of the Allen Institute for AI, a nonprofit in Seattle that is dedicated to advancing the state of the art in artificial intelligence. “The challenge, of course, is achieving a high degree of accuracy.”

It may prove tricky for Box to set itself apart. Competitors with expertise in machine learning, such as Google, Microsoft, and Amazon, are sure to use similar techniques to make it easier for customers to organize their data. It is already possible to search through images stored on Google by their contents rather than by file name, and it seems inevitable that before long it will be possible to search for documents the same way. Meanwhile, one of Box’s main direct rivals, Dropbox, has acquired numerous startups, several with expertise in machine learning.

Levie says his company’s researchers are also thinking about ways that Box might help automate routine office work. Some companies already offer ways to automate certain routine office tasks, like filling out forms, although the steps usually have to be carefully programmed into a computer. It’s less clear if computers could learn to perform a wide variety of tasks reliably, but a company like Box would have an advantage in being able to analyze a lot of user data.

If Box can get this to work, then it would embed AI into many offices. “If we watch that a user continues to do a process over and over again within our system,” says Levie, “at some point we can say, ‘We see you’re trying to do “X,” shall we turn this into a bot?’”
