From Amazon’s product recommendations to Pandora’s ability to find us new songs we like, the smartest Web services around rely on machine learning: algorithms that enable software to learn how to respond with a degree of intelligence to new information or events.
Now Google has launched a service that could bring such smarts to many more apps. Google Prediction API provides a simple way for developers to create software that learns how to handle incoming data. For example, the Google-hosted algorithms could be trained to sort e-mails into categories for “complaints” and “praise” using a dataset that provides many examples of both kinds. Future e-mails could then be screened by software using that API, and handled accordingly.
Currently just “hundreds” of developers have access to the service, says Travis Green, Google’s product manager for Prediction API, “but already we can see people doing some amazing things.” Users range from developers of mobile and Web apps to oil companies, he says. “Many want to do product recommendation, and there are also interesting NGO use cases with ideas such as extracting emergency information from Twitter or other sources online.”
Machine learning is not an easy feature to build into software. Different algorithms and mathematical techniques work best for different kinds of data, and specialized knowledge of machine learning is typically needed even to consider using it in a product, says Green.
Google’s service provides a kind of machine-learning black box: data goes in one end, and predictions come out the other. There are three basic commands: one to upload a collection of data, another to tell the service to learn what it can from that data, and a third to submit new data for the system to react to based on what it has learned.
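The upload, train, and predict workflow can be pictured with a toy stand-in. The sketch below is not Google's API: the class `ToyPredictionService`, its method names, and the word-counting classifier are all assumptions made up for illustration, and everything runs in memory rather than on a hosted service.

```python
from collections import Counter, defaultdict


class ToyPredictionService:
    """Hypothetical in-memory stand-in for a hosted prediction service."""

    def __init__(self):
        self._datasets = {}  # dataset name -> list of (label, text)
        self._models = {}    # dataset name -> {label: Counter of words}

    def upload(self, name, examples):
        """Command 1: store a labeled dataset under a name."""
        self._datasets[name] = list(examples)

    def train(self, name):
        """Command 2: learn per-label word counts from the stored dataset."""
        counts = defaultdict(Counter)
        for label, text in self._datasets[name]:
            counts[label].update(text.lower().split())
        self._models[name] = dict(counts)

    def predict(self, name, text):
        """Command 3: return the label whose training words match best."""
        words = text.lower().split()
        scores = {
            label: sum(word_counts[w] for w in words)
            for label, word_counts in self._models[name].items()
        }
        return max(scores, key=scores.get)


service = ToyPredictionService()
service.upload("emails", [
    ("complaint", "my order arrived broken and late"),
    ("complaint", "terrible support and a broken product"),
    ("praise", "great product and fast delivery"),
    ("praise", "thank you for the great support"),
])
service.train("emails")
label = service.predict("emails", "the product arrived broken")
print(label)
```

The caller never chooses or sees the learning method, which is the sense in which the service behaves as a black box. A real hosted service would use far stronger algorithms than this word tally.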
“Developers can deploy it on their site or app within 20 minutes,” says Green. “We’re trying to provide a really easy service that doesn’t require them to spend month after month trying different algorithms.” Google’s black box actually contains a whole suite of different algorithms. When data is uploaded, all of the algorithms are automatically applied to find out which works best for a particular job, and the best algorithm is then used to handle any new information submitted.
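The idea of trying every algorithm and keeping the best one can be sketched as follows. The article does not say which algorithms Google runs or how it compares them, so the three candidates here (a majority-class baseline, nearest neighbor, and nearest centroid) and the use of accuracy on held-out examples are assumptions chosen only to illustrate the selection step.

```python
import math
from collections import Counter


def train_majority(rows):
    """Baseline: always predict the most common training label."""
    top = Counter(label for _, label in rows).most_common(1)[0][0]
    return lambda x: top


def train_nearest_neighbor(rows):
    """Copy the label of the closest training point."""
    return lambda x: min(rows, key=lambda r: math.dist(r[0], x))[1]


def train_nearest_centroid(rows):
    """Predict the label whose mean feature vector is closest."""
    groups = {}
    for features, label in rows:
        groups.setdefault(label, []).append(features)
    centroids = {
        label: [sum(col) / len(points) for col in zip(*points)]
        for label, points in groups.items()
    }
    return lambda x: min(centroids, key=lambda lab: math.dist(centroids[lab], x))


def accuracy(model, rows):
    """Fraction of rows whose predicted label matches the true label."""
    return sum(model(x) == y for x, y in rows) / len(rows)


def select_best(candidates, train_rows, holdout_rows):
    """Train every candidate and keep the one with the best held-out accuracy.

    Ties go to the candidate listed first.
    """
    scored = []
    for name, trainer in candidates.items():
        model = trainer(train_rows)
        scored.append((accuracy(model, holdout_rows), name, model))
    best_score, best_name, best_model = max(scored, key=lambda s: s[0])
    return best_name, best_score, best_model


train_rows = [
    ((1.0, 1.0), "a"), ((1.2, 0.8), "a"), ((0.9, 1.1), "a"),
    ((3.0, 3.0), "b"), ((3.2, 2.9), "b"),
]
holdout_rows = [((1.1, 0.9), "a"), ((2.9, 3.1), "b"), ((3.1, 3.0), "b")]

candidates = {
    "majority": train_majority,
    "nearest_neighbor": train_nearest_neighbor,
    "nearest_centroid": train_nearest_centroid,
}
best_name, best_score, best_model = select_best(candidates, train_rows, holdout_rows)
print(best_name, best_score)
```

Scoring on examples that were held back from training matters here: a method that merely memorizes its training data would otherwise look better than it is.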
“Getting machine learning to a Google scale is significant,” says Joel Confino, a software developer in Philadelphia who builds large-scale Web apps for banks and pharmaceutical companies, and a member of the preview program. He used Prediction API to quickly develop a simple yet effective spam e-mail filter, and he says the service has clear commercial potential.
For example, a bank or credit-card company wanting to use machine learning to build systems that make decisions based on historical transactions is unlikely to have the specialized staff and necessary infrastructure for what is a computationally intensive approach. “This API could be a way to get a capability cheaply that would cost a huge amount through a traditional route.”
Google’s new service may also be more palatable to businesses wary of handing over their data to cloud providers, says Confino. “The data can be completely obfuscated, and you can still use this service. Google doesn’t have to know if those numbers you are sending it are stock prices or housing prices.”
Google does, however, get some information that it can use to improve its machine-learning algorithms. “We don’t look at users’ data, but we do see the same metrics on prediction quality that they do, to help us improve the service,” says Green. The engineers running Prediction API will know if a particular algorithm is rarely used, or if a new one needs to be added to the mix to better process certain types of data.
Prediction API has the potential to be a leveler between established companies and smaller startups, says Pete Warden, an ex-Apple engineer now working on his own startup OpenHeatMap.com. “That’s been a competitive advantage for large companies like Amazon, whose product recommendation is built on machine learning,” he explains. “Now you still have to have a decent set of training data, but you don’t have to have the same level of expertise.”
Warden has yet to gain access to Prediction API, but has plans to use it to improve a service he built that shows where people using a particular word or phrase on Twitter are located. “It would be really interesting to also see where they are saying positive and negative things on a subject,” says Warden. Prediction API could be trained to distinguish between positive and negative tweets to do that, he says.
Chris Bates, a data scientist with online music service Grooveshark and a member of the preview program, agrees that Google’s black box will enable wider use of machine learning, but he contends that the service needs to mature. “Today it is good at predicting which language text is in and also sentiment analysis, for example to pick out positive and negative reviews,” he says.
Ultimately, though, a service that does not let users inspect the inner workings of the algorithms and fine-tune them for a specific use may have its limits. “It’s good for cases that are not mission-critical, where you can afford a few false positives,” Bates says. For example, a spam filter that occasionally lets through a junk message could still be usable, but a credit-card company might be less able to accept any errors.