Data Mining Reveals the Secret to Getting Good Answers

If you want a good answer, ask a decent question. That’s the startling conclusion to a study of online Q&As.

If you spend any time programming, you’ll probably have come across the question and answer site Stack Overflow. The site allows anybody to post a question related to programming and receive answers from the community.

And it has been hugely successful. According to Alexa, the site is the 3rd most popular Q&A site in the world and 79th most popular website overall.

But this success has naturally led to a problem: the sheer number of questions and answers the site has to deal with. To help filter this information, users can rank both the questions and the answers, gaining a reputation for themselves as they contribute.

Nevertheless, Stack Overflow still struggles to weed out off-topic and irrelevant questions and answers. This requires considerable input from experienced moderators. So an interesting question is whether it is possible to automate the process of weeding out the less useful questions and answers as they are posted.

Today we get an answer of sorts thanks to the work of Yuan Yao at the State Key Laboratory for Novel Software Technology in China and a team of buddies who say they’ve developed an algorithm that does the job.

And they say their work reveals an interesting insight: if you want good answers, ask a decent question. That may sound like a truism, but these guys point out that there has been no evidence to support this insight, until now.

“To the best of our knowledge, we are the first to quantitatively validate the correlation between the question quality and its associated answer quality,” say Yuan and co.

These guys began their work by studying the entire corpus of questions and answers on Stack Overflow between July 2008 and August 2011. That’s some 2 million questions from 800,000 people who produced over 4 million answers and 7 million comments. They also considered metadata, such as the number of upvotes and downvotes for each entry.

Until now, most attempts to evaluate the quality of user input have looked only at the votes associated with questions or the votes associated with answers. For example, a good answer has more upvotes than downvotes and the bigger the difference, the better the result.
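
The vote-difference measure described above can be sketched in a few lines. This is an illustrative sketch only: the function names and the dictionary keys "up" and "down" are assumptions made for this example, not anything defined by Stack Overflow or by Yuan and co.

```python
def vote_score(upvotes: int, downvotes: int) -> int:
    """Net score of a post: upvotes minus downvotes."""
    return upvotes - downvotes


def rank_by_score(posts):
    """Return posts ordered from the highest net score to the lowest.

    Each post is a dict with hypothetical keys "up" and "down".
    """
    return sorted(posts, key=lambda p: vote_score(p["up"], p["down"]), reverse=True)


# Made-up answers, used only to show the ordering.
answers = [
    {"id": "a", "up": 3, "down": 1},
    {"id": "b", "up": 10, "down": 2},
    {"id": "c", "up": 1, "down": 4},
]
ranked_ids = [p["id"] for p in rank_by_score(answers)]
```

Under this measure an answer with 10 upvotes and 2 downvotes outranks one with 3 upvotes and 1 downvote, because the difference is larger.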

But Yuan and co dug a little deeper. They looked at the correlation between well-received questions and answers. And they discovered that these are strongly correlated.
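
One common way to quantify such a relationship is the Pearson correlation coefficient. The article does not say which statistic the team used, so the sketch below is an assumption chosen for illustration, and the scores in it are invented rather than taken from the study.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length, at least 2 items")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / sqrt(var_x * var_y)


# Invented data: the net score of each question, paired with the
# average net score of the answers it received.
question_scores = [0, 2, 5, 9, 14]
mean_answer_scores = [1, 1, 4, 6, 11]
r = pearson(question_scores, mean_answer_scores)
```

A value of r close to 1 would mean that highly rated questions tend to attract highly rated answers. A value near 0 would mean no linear relationship.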

A number of factors turn out to be important. These include the reputation of the person asking or answering the question, the number of previous questions or answers they have posted, and the popularity of their recent input, along with measurements such as the length of the question and its title.

Put all this into a number cruncher and the system is able to predict the quality score of the question and its expected answers. That allows it to find the best questions and answers and, indirectly, the worst ones.
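
The article does not describe the model itself, so the following is not the team’s algorithm. It is a minimal sketch of the general idea, assuming the simplest possible number cruncher: an ordinary least-squares fit that maps a few features of a post to its eventual score. The function names, the choice of features and the training numbers are all hypothetical.

```python
def fit_linear(rows, targets):
    """Least-squares fit of target ~ w0 + w1*f1 + ... via the normal equations."""
    X = [[1.0] + [float(v) for v in row] for row in rows]  # prepend intercept term
    k = len(X[0])
    # Augmented matrix [X^T X | X^T y].
    A = [
        [sum(x[i] * x[j] for x in X) for j in range(k)]
        + [sum(x[i] * t for x, t in zip(X, targets))]
        for i in range(k)
    ]
    # Gauss-Jordan elimination with partial pivoting.
    for col in range(k):
        pivot = max(range(col, k), key=lambda r: abs(A[r][col]))
        if abs(A[pivot][col]) < 1e-12:
            raise ValueError("features are linearly dependent")
        A[col], A[pivot] = A[pivot], A[col]
        for r in range(k):
            if r != col:
                factor = A[r][col] / A[col][col]
                A[r] = [a - factor * b for a, b in zip(A[r], A[col])]
    return [A[i][k] / A[i][i] for i in range(k)]


def predict(weights, features):
    """Predicted score for one post from its feature values."""
    return weights[0] + sum(w * float(v) for w, v in zip(weights[1:], features))


# Hypothetical training data: (prior posts by the asker, title length in words)
# for five past questions, with the net score each one eventually received.
features = [(0, 0), (1, 0), (0, 1), (1, 1), (2, 1)]
scores = [1, 3, 4, 6, 8]
weights = fit_linear(features, scores)
estimate = predict(weights, (3, 2))
```

Sorting new posts by their predicted score would then surface the likely best questions and answers at the top and the likely worst at the bottom.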

There are limitations to this approach, of course. First, it can only make its prediction after the first 24 hours of responses to a question. That’s not so useful to Stack Overflow since it needs to find ways of filtering out the lower quality questions before they reach the broader community. So Yuan and co say they are working on ways to filter out the worst questions more quickly.

Second, Yuan and co rely on the impressive amount of metadata that Stack Overflow collects for both questions and answers. That’s in stark contrast to many Q&A sites that allow users only to vote on answers. The moral for these sites may be to collect more data on the questions.

In the meantime, users of Q&A sites can learn a significant lesson from this work. If you want good answers, first formulate a good question. That’s something that can take time and experience.

Perhaps the most interesting immediate application of this new work might be as a teaching tool to help with this learning process and to boost the quality of questions and answers in general.

Ref: http://arxiv.org/abs/1311.6876: Want a Good Answer? Ask a Good Question First!
