Finding Business Insights in Text

Software that scans documents and online posts can uncover correlations or reveal what customers really think.

Tom Simonite

February 17, 2011

Looked at from one angle, the social Web is a fizzing arena of human expression and collaboration. From another, it is a kind of immense and freely available focus group that can reveal valuable insights into what consumers think and want. Increasingly, businesses are turning to technologies that can extract the signal from all that noise.

Text mining software from Collective Intellect, a company based in Boulder, Colorado, scrutinizes posts on Twitter, Facebook, MySpace, blogs, and message boards to help businesses learn what’s being said about their products. The company’s software uses a suite of algorithms to scan text and work out what a person is referring to and with what emotional tone.

Crucially, it doesn’t just perform a Google-like keyword search; it also tries to pin down meaning. “ ‘Apple’ is a good example,” says Greg Greenstreet, the company’s chief technology officer. “It can mean a tech company, or it can mean fruit.” Collective Intellect’s software tries to distinguish between such meanings by looking at other usages of a word in similar contexts. As a result, it can make judgments and associations much as a human reader would. “If I say ‘Steve Jobs,’ you and our software can both know which company I’m talking about,” says Greenstreet.
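To make the idea of telling meanings apart by context concrete, here is a minimal Python sketch. It is not Collective Intellect's algorithm, which the article does not describe in detail; it assumes a simple word-overlap approach in which each sense of a word has a few example sentences, and a new post is assigned to the sense whose examples share the most surrounding words with it. The example sentences, the stopword list, and the function names are all hypothetical.

```python
import re
from collections import Counter

# Hypothetical example usages for each sense of "apple" (illustrative only).
SENSE_EXAMPLES = {
    "company": [
        "apple unveiled a new iphone at the keynote",
        "steve jobs returned to lead apple",
        "apple stock rose after the mac launch",
    ],
    "fruit": [
        "she baked an apple pie with cinnamon",
        "the apple orchard had a good harvest",
        "he ate a crisp green apple for lunch",
    ],
}

# A tiny, assumed stopword list; a real system would use a fuller one.
STOPWORDS = {"a", "an", "the", "at", "to", "for", "with", "had",
             "he", "she", "i", "after", "of", "and", "in", "on"}


def tokenize(text):
    """Lowercase the text and keep alphabetic tokens that are not stopwords."""
    return [w for w in re.findall(r"[a-z']+", text.lower()) if w not in STOPWORDS]


def build_profiles(examples, target):
    """Count the context words seen with each sense, ignoring the target word."""
    return {
        sense: Counter(w for s in sentences for w in tokenize(s) if w != target)
        for sense, sentences in examples.items()
    }


def disambiguate(text, profiles, target):
    """Return (best_sense, scores); best_sense is None when the top score is tied."""
    words = [w for w in tokenize(text) if w != target]
    scores = {sense: sum(profile[w] for w in words) for sense, profile in profiles.items()}
    best = max(scores, key=scores.get)
    if list(scores.values()).count(scores[best]) > 1:
        return None, scores
    return best, scores


profiles = build_profiles(SENSE_EXAMPLES, "apple")
print(disambiguate("Steve Jobs showed the new apple iphone", profiles, "apple")[0])
print(disambiguate("I picked an apple from the orchard for a pie", profiles, "apple")[0])
```

With these toy examples the first post is scored toward the company sense and the second toward the fruit sense. A production system would need far more example text and a more robust measure of contextual similarity.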

A user of Collective Intellect’s software begins by defining a few keywords of interest. The tool returns several clusters of results that can be accepted or rejected to teach the system what the user is interested in. Someone exploring consumer feeling about the brand Crocs, for example, would reject clusters of results about crocodiles and accept those about shoes. After that, Collective Intellect will search for online discussion related to the defined topic and send regular, detailed reports. An online dashboard shows posting activity over time and the volume of positive, neutral, and negative posts.
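The dashboard view described above amounts to counting labeled posts per time period. The short Python sketch below shows one way such a tally could be computed. It assumes each post has already been assigned a date and a sentiment label by some upstream classifier; the sample posts and the function name are invented for illustration and do not come from Collective Intellect.

```python
from collections import Counter, defaultdict
from datetime import date

# Hypothetical posts, already labeled upstream as (date, sentiment).
POSTS = [
    (date(2011, 2, 14), "positive"),
    (date(2011, 2, 14), "negative"),
    (date(2011, 2, 14), "positive"),
    (date(2011, 2, 15), "neutral"),
    (date(2011, 2, 15), "negative"),
]


def sentiment_volume_by_day(posts):
    """Group posts by day and count how many carry each sentiment label."""
    daily = defaultdict(Counter)
    for day, sentiment in posts:
        daily[day][sentiment] += 1
    return dict(daily)


volume = sentiment_volume_by_day(POSTS)
for day in sorted(volume):
    counts = volume[day]
    # A Counter returns 0 for labels that never appeared on a given day.
    print(day.isoformat(), counts["positive"], counts["neutral"], counts["negative"])
```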

MTV provides one example of a new strategy that such tools make possible. “Traditional market research is too slow for them,” says Greenstreet. “They want to know, the moment it happened, whether people thought it was cool when that girl got punched on Jersey Shore.” Online chatter about TV shows can also reveal new advertising opportunities, he adds: “I can prove to, say, Clorox that the people that watch a particular show care about a particular product.”

Recently Collective Intellect has been working on more private data—for example, processing transcripts of calls made to customer service lines, or transcripts of doctor-patient consultations. “In that case the drug company wants to know what the themes of discussion are when a drug gets prescribed,” says Greenstreet. He predicts that such tools will start being used on internal data much more frequently. “There’s a lot of data in the enterprise that they don’t have a handle on,” he says.

David Steier, a director of information management with Deloitte’s consulting arm in Palo Alto, California, agrees that many businesses are unknowingly sitting on top of valuable data. His team makes it possible to automatically extract insights from collections of documents that would otherwise have to be read individually.

“Auto insurance claims, for example, have text descriptions of the accident that would usually be read and interpreted by a person processing the claim,” he says. “We can have software read that text, and other information on the form, to create a risk score for the claim automatically.” Deloitte’s system was trained on a batch of manually processed claims and was able to learn which words in the written description signaled high-cost accidents. “Motorcycle” and “flighted” (as in to a hospital) proved to be particularly strong indications that a claim would be expensive.
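The article does not say which model Deloitte used, so the following Python sketch is only an assumption about how word-level signals could be learned from manually processed claims. It uses smoothed log-odds: a word that appears mostly in high-cost claims gets a positive weight, a word that appears mostly in low-cost claims gets a negative one, and a claim's score is the sum of the weights of its words. The training claims are made up, and the function names are hypothetical.

```python
import math
import re
from collections import Counter

# Hypothetical, made-up training data: (accident description, was_high_cost).
TRAINING_CLAIMS = [
    ("motorcycle rider hit at intersection and flighted to hospital", True),
    ("driver flighted after highway rollover", True),
    ("motorcycle slid on wet road rider injured", True),
    ("minor scratch on bumper in parking lot", False),
    ("rear bumper dent at low speed", False),
    ("cracked windshield from road debris", False),
]


def tokenize(text):
    """Lowercase the text and split it into alphabetic words."""
    return re.findall(r"[a-z]+", text.lower())


def train_word_weights(claims):
    """Learn a log-odds weight per word from labeled claims (add-one smoothing)."""
    high, low = Counter(), Counter()
    for text, is_high in claims:
        # Count each word once per claim, in the matching class.
        (high if is_high else low).update(set(tokenize(text)))
    n_high = sum(1 for _, is_high in claims if is_high)
    n_low = len(claims) - n_high
    vocab = set(high) | set(low)
    return {
        w: math.log((high[w] + 1) / (n_high + 2)) - math.log((low[w] + 1) / (n_low + 2))
        for w in vocab
    }


def risk_score(text, weights):
    """Sum the learned weights of the distinct words in a claim; unseen words add 0."""
    return sum(weights.get(w, 0.0) for w in set(tokenize(text)))


weights = train_word_weights(TRAINING_CLAIMS)
print(round(weights["motorcycle"], 3), round(weights["bumper"], 3))
print(risk_score("motorcycle crash, passenger flighted", weights)
      > risk_score("small dent on bumper", weights))
```

In this toy data "motorcycle" and "flighted" end up with positive weights, echoing the article's example, while "bumper" ends up negative. A real system would also draw on the other fields of the claim form and would be trained on far more claims.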

“You can use the score we generate to automatically route complex claims to the most experienced adjustors,” says Steier. A similar approach can use the text of support requests and call transcripts to make predictions about which customers are likely to switch to a competing product, giving companies a chance to develop innovative strategies for targeting people before they defect. “Companies need to pay attention to the fact that there is a lot of high value in this unstructured data that it is easy to overlook,” says Steier.
