As digital data piles up at ever faster rates, the potential is growing for smart algorithms to dig out insights the human brain never could. IBM’s head of analytics research, Chid Apte, directs a team intent on realizing that potential. His group is developing algorithms and other techniques that can extract meaning from data, and it is trying to find ways to use these methods to solve business challenges. Apte talked about his group’s priorities with Tom Simonite, the IT editor for hardware and software at Technology Review.
TR: IBM has been creating and selling analytics products for decades. What’s new?
Apte: Historically, analytics has been about using well-organized past data from inside an enterprise. Now we have two new and different sources of data. One is unstructured data from customer interactions, such as e-mails to support, or call transcripts. The other is social information that we get by tapping into the Web—the world of Twitter and feeds.
My group is working directly with clients to get a better handle on how these sources can be used on the problems businesses are seeing in the trenches.
Can you give an example of such a project and how it can help a business?
We worked with a [consumer packaged-goods] company that makes sports beverages. They were interested in the sentiment—feeling—in the marketplace about their drink. We developed technology to find the exact blogs talking about their product and started extracting the conversations about their sports drink for analysis. We made it possible to judge the sentiment being expressed and also to identify who the influencers are. We want to find the people an enterprise should target with new messages so the social network will take care of the rest and [the messages] will spread widely.
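IBM's actual pipeline is not public, but the two steps described here, scoring the sentiment of product mentions and ranking influencers by their position in the social graph, can be sketched in a few lines. Everything below (the word lists, the posts, the degree-centrality heuristic) is illustrative, not the production system:

```python
# Hypothetical sketch of the two analytics steps described above:
# (1) lexicon-based sentiment scoring of blog posts mentioning a product,
# (2) a simple influencer ranking by follower count (in-degree centrality).
# All names and data are made up for illustration.
from collections import Counter

POSITIVE = {"great", "love", "refreshing", "best"}
NEGATIVE = {"bad", "hate", "flat", "worst"}

def sentiment_score(text):
    """Return (#positive - #negative) / #words for one post."""
    words = [w.strip(".,!?").lower() for w in text.split()]
    if not words:
        return 0.0
    pos = sum(w in POSITIVE for w in words)
    neg = sum(w in NEGATIVE for w in words)
    return (pos - neg) / len(words)

def top_influencers(follows, k=3):
    """Rank users by in-degree in a who-follows-whom edge list.

    follows: iterable of (follower, followed) pairs.
    """
    indegree = Counter(followed for _, followed in follows)
    return [user for user, _ in indegree.most_common(k)]

posts = {
    "alice": "Love this sports drink, great after a run!",
    "bob":   "Tastes flat, the worst batch yet.",
}
graph = [("bob", "alice"), ("carol", "alice"), ("alice", "bob")]

scores = {user: sentiment_score(text) for user, text in posts.items()}
print(top_influencers(graph, k=1))  # alice has the most followers
```

A real system would replace the word lists with a trained classifier and the in-degree heuristic with a richer network measure, but the shape of the pipeline, extract mentions, score sentiment, rank by network position, is the same.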
This technology will form the basis of a new product that we will be able to offer to all of IBM's big customers in the future.

Will your analytics technologies interpret more than just numbers?
We have already developed technology that can actually tell you what plan you should execute. It uses techniques called reinforcement learning and Markov decision processes, and we developed a system that uses it with the New York State Department of Taxation and Finance. The system automatically generates a plan for dealing with individual tax delinquents. It tells you what to do to maximize the chance of recovery and minimize your costs.
When you train the system, it doesn’t look at the data as a big table; it maps out a directed graph of sequential decisions. From that it can derive an optimal plan of action.
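The idea of deriving a plan from a graph of sequential decisions can be illustrated with value iteration, a standard algorithm for solving Markov decision processes. The tiny model below is entirely hypothetical (the states, actions, costs, and recovery probabilities are invented for illustration); the production system's model is far richer:

```python
# Hypothetical sketch: value iteration on a tiny MDP for choosing
# collection actions. States, actions, and numbers are illustrative.
#
# States: a "new" delinquency, a "contacted" one, and "closed" (absorbing).
# Each (state, action) pair maps to outcomes (probability, next_state, reward),
# where reward = expected recovery minus the cost of the action.
TRANSITIONS = {
    ("new", "letter"):     [(0.3, "closed", 90), (0.7, "contacted", -10)],
    ("new", "call"):       [(0.5, "closed", 70), (0.5, "contacted", -30)],
    ("contacted", "call"): [(0.6, "closed", 70), (0.4, "contacted", -30)],
    ("contacted", "levy"): [(0.9, "closed", 20), (0.1, "contacted", -80)],
}

def value_iteration(gamma=0.95, iters=100):
    """Compute state values, then read off the best action per state."""
    states = {"new", "contacted", "closed"}
    V = {s: 0.0 for s in states}
    for _ in range(iters):
        for s in states - {"closed"}:
            V[s] = max(
                sum(p * (r + gamma * V[s2]) for p, s2, r in outcomes)
                for (st, a), outcomes in TRANSITIONS.items() if st == s
            )
    policy = {}
    for s in states - {"closed"}:
        policy[s] = max(
            ((a, sum(p * (r + gamma * V[s2]) for p, s2, r in outcomes))
             for (st, a), outcomes in TRANSITIONS.items() if st == s),
            key=lambda item: item[1],
        )[0]
    return policy, V

policy, V = value_iteration()
print(policy)  # one recommended action per non-terminal state
```

With these made-up numbers, the cheap letter wins for a new delinquency and the phone call wins once contact is established; changing the costs or probabilities changes the plan, which is exactly why the policy is learned from data rather than fixed by hand.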
What about technology like what Watson used on Jeopardy!—technology that would let you pose a question as you would to a colleague?
We see a lot of opportunities for what we call deep QA in solving business problems. Watson was built primarily by IBM’s natural-language understanding team, but they collaborated very closely with my colleagues on the machine learning involved. We continue to work closely with them.
The basic technology relies on a huge unstructured corpus, like what Watson used. For business, some of the more traditional analytics solutions need to be brought together with the deep QA approach, and we are working on that.
What is the biggest challenge to analytics in the near future?
We need a better way to handle large-scale data. Historically it’s the Internet companies that have been out there with petabytes of data, but now it’s moving out into the enterprise in general: telecoms with call detail records, government getting into analyzing large volumes of data, health-care companies pulling together patient records. Instead of analyzing a few dozen factors, we are getting into spaces with hundreds of factors that you need to analyze at the same time.
We’re developing a whole new kind of infrastructure for this world. That includes things like architectures for distributed and parallel machine learning that exploit new hardware. We need to scale up analytics.