Mining for Meaning

Software

Claire Tristram

July 1, 2001

Online newsgroups are popular gathering spots; over the years they’ve logged millions of opinions on topics ranging from politics to appliances. The largest newsgroup network, Usenet, boasts 500 million messages posted since 1995. Unlike postings in chat rooms and online forums, such messages tend to be both uncensored and preserved.

All these postings add up to a trove of public opinion that sociologists, linguists and market researchers would love to analyze; and software projects at IBM and the University of California at Berkeley are beginning to develop the analytical tools they’ll need. Unlike Web search engines, which try to find the best matches for any one query, these efforts focus on understanding how communities of individuals interact online, and how their opinions evolve.

To begin taking on this difficult task, IBM’s Babble software depicts conversations as dynamic circular graphs in which icons representing frequent talkers cluster at the center, and less chatty participants move toward the circumference. “People do in fact cluster together when talking, then drift apart,” says Thomas Erickson, research analyst at IBM.
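The circular layout described above can be illustrated with a small sketch. This is not IBM’s Babble code. It is a hypothetical function, `radial_layout`, that assumes activity is measured as a simple post count per participant and places the busiest poster at the center of a unit circle, with quieter participants pushed toward the rim. A dynamic display like Babble’s would presumably recompute these positions as the counts change.

```python
import math


def radial_layout(post_counts: dict[str, int]) -> dict[str, tuple[float, float]]:
    """Hypothetical sketch: map each participant to an (x, y) point on a unit disc.

    The busiest poster gets radius 0 (the center); participants with fewer
    posts get a radius closer to 1 (the circumference).
    """
    if not post_counts:
        return {}
    busiest = max(post_counts.values())
    names = sorted(post_counts)
    positions = {}
    for i, name in enumerate(names):
        # Radius shrinks as a participant's count approaches the top count.
        radius = 1.0 - post_counts[name] / busiest if busiest else 1.0
        # Spread participants evenly around the circle so icons don't overlap.
        angle = 2 * math.pi * i / len(names)
        positions[name] = (radius * math.cos(angle), radius * math.sin(angle))
    return positions


layout = radial_layout({"ann": 40, "bob": 10, "cy": 2})
for name, (x, y) in layout.items():
    print(f"{name}: distance from center {math.hypot(x, y):.2f}")
```

Here "ann", with the most posts, sits at the center, while "bob" and "cy" fall progressively closer to the edge.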

But that’s only a first step. Beyond charting the chatters lies the task of examining what they’re saying. At the University of California, Berkeley, computational linguist Warren Sack’s software maps how often words or phrases appear, and how close they are to one another. “In effect you’re building a thesaurus of terms that relate directly to the conversation being studied,” says Sack. “You can see constellations of conversations, and see which topics are being discussed more than others.” One test of this Conversation Map tool helped pinpoint when online participants began thinking of Gulf War syndrome as a “disease” rather than a cluster of symptoms.
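The word-frequency and word-proximity mapping Sack describes can be approximated with simple co-occurrence counting. The following is a minimal sketch, not the Conversation Map implementation. The tokenizer, the fixed five-word window, and the helper names `cooccurrence` and `related_terms` are all illustrative assumptions. The code counts how often each word appears and how often two words fall within a few tokens of each other. It then lists a term’s most frequent neighbors, a rough stand-in for the conversation-specific "thesaurus."

```python
import re
from collections import Counter


def cooccurrence(messages: list[str], window: int = 5) -> tuple[Counter, Counter]:
    """Count word frequencies and how often two words occur within
    `window` tokens of each other in the same message (assumed scheme)."""
    word_counts: Counter = Counter()
    pair_counts: Counter = Counter()
    for text in messages:
        tokens = re.findall(r"[a-z']+", text.lower())
        word_counts.update(tokens)
        for i, left in enumerate(tokens):
            for right in tokens[i + 1 : i + 1 + window]:
                if left != right:
                    # Sort each pair so (a, b) and (b, a) share one counter.
                    pair_counts[tuple(sorted((left, right)))] += 1
    return word_counts, pair_counts


def related_terms(term: str, pair_counts: Counter, top: int = 3) -> list[str]:
    """Return the words that most often appear near `term`."""
    scores: Counter = Counter()
    for (a, b), n in pair_counts.items():
        if a == term:
            scores[b] += n
        elif b == term:
            scores[a] += n
    return [word for word, _ in scores.most_common(top)]


posts = [
    "gulf war syndrome is a disease",
    "is gulf war syndrome a disease or symptoms",
]
words, pairs = cooccurrence(posts)
print(words["syndrome"], pairs[("disease", "syndrome")])
print(related_terms("syndrome", pairs, top=7))
```

Tracking how a term’s neighbors shift across messages posted at different dates is one way such counts could reveal when a phrase like "disease" starts clustering with "syndrome."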

Sack and others say they’re still years away from a commercial product. When the software is available, though, market researchers just might be the customers: with the right tools, they could turn newsgroups containing millions of opinions into the ultimate focus group.
