Now There’s an IQ Test for Siri and Friends

A new artificial intelligence test could tell us which digital assistant is the smartest.

Emerging Technology from the arXiv

October 13, 2017

Psychologists have never agreed precisely on the nature of human intelligence. And without that agreement, intelligence is hard to measure. Indeed, the most famous measurement method—the intelligence quotient or IQ test—has generated ferocious controversies.

The nature of machine intelligence is just as thorny. But it’s a topic that psychologists and computer scientists are being forced to confront as intelligent machines become more common and powerful.

This raises a number of questions. How intelligent are these machines, and how do they compare with humans in this respect?

Today we get an answer of sorts thanks to the work of Feng Liu at the Chinese Academy of Sciences in Beijing and a couple of pals, who have developed an intelligence test that both machines and humans can take. These guys have used the test to rank intelligent assistants such as Google Assistant and Siri on the same scale used for humans.

Their test is based on what the researchers call the “standard intelligence model.” In this new model, systems must have a way of obtaining data from the outside world; they must be able to transform the data into a form that they can process; they must be able to use this knowledge in an innovative way; and, finally, they must feed the resultant knowledge back into the outside world.

This boils down to being able to gather data, master it, exercise creativity over it, and then produce an output.

“If a system has [these] characteristics, it can be defined as a standard intelligence system,” say Feng and co.
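To make the four requirements concrete, here is a toy sketch of such a system as a four-stage pipeline. Every name in it (StandardIntelligenceSystem, acquire, master, create, feed_back) is an assumption made for illustration and does not come from the paper, and the stand-in functions are deliberately trivial.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class StandardIntelligenceSystem:
    """Hypothetical four-stage pipeline; all names are illustrative only."""
    acquire: Callable[[], str]                 # obtain data from the outside world
    master: Callable[[str], List[str]]         # turn the data into a processable form
    create: Callable[[List[str]], List[str]]   # use the knowledge in a new way
    feed_back: Callable[[List[str]], str]      # return the result to the outside world

    def run(self) -> str:
        raw = self.acquire()
        knowledge = self.master(raw)
        novel = self.create(knowledge)
        return self.feed_back(novel)


# Trivial stand-ins for each stage, chosen only so the example runs.
toy = StandardIntelligenceSystem(
    acquire=lambda: "b a c",
    master=lambda text: text.split(),
    create=lambda items: sorted(items, reverse=True),
    feed_back=lambda items: ", ".join(items),
)

result = toy.run()
print(result)
```

A system missing any one of the four stages could not complete run(), which is roughly the sense in which the model treats all four as necessary.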

The researchers developed a test that measures a machine’s (or a human’s) ability to do all these things. This is where the details become a little fuzzy. They do not say what kind of tests they use.

But they go on to say they have been testing intelligent assistants since 2014. These machines include those from Google, Baidu, Sogou, Apple, and Microsoft. They’ve also tested humans in the same way, and this allows them to rank them all on the same scale.

The 2016 ranking—the most recent test they publish—is as follows.

Test subject              Score
Human 18 years old        97
Human 12 years old        84.5
Human 6 years old         55.5
Google                    47.28
Baidu’s Duer              37.2
Baidu                     32.92
Sogou                     32.25
Bing                      31.98
Microsoft’s Xiaobing      24.48
Apple’s Siri              23.94

So on this scale, even a six-year-old human outperforms the most advanced digital assistant, which in this case is Google’s. Apple props up this ranking and is seemingly outperformed by all its main competitors.
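For readers who want to check the comparison, the short sketch below uses the published 2016 scores to pick out the best and worst assistants and the gap to the six-year-old. The variable names are ours, and the scores are simply copied from the ranking; nothing here reflects how the researchers computed them.

```python
# 2016 scores as reported in the ranking.
humans = {
    "Human 18 years old": 97.0,
    "Human 12 years old": 84.5,
    "Human 6 years old": 55.5,
}
assistants = {
    "Google": 47.28,
    "Baidu Duer": 37.2,
    "Baidu": 32.92,
    "Sogou": 32.25,
    "Bing": 31.98,
    "Microsoft Xiaobing": 24.48,
    "Apple Siri": 23.94,
}

best = max(assistants, key=assistants.get)
worst = min(assistants, key=assistants.get)

# Gap between the youngest human group and the top-scoring assistant.
gap = round(humans["Human 6 years old"] - assistants[best], 2)

print(f"Best assistant:  {best} ({assistants[best]})")
print(f"Worst assistant: {worst} ({assistants[worst]})")
print(f"Gap to a six-year-old: {gap} points")
```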

An important point is that machine intelligence is improving rapidly. In 2014, Google’s assistant scored 26.4 in this test. Only two years later, it scored 47.28, within about eight points of a six-year-old. That’s a gain of 20.88 points, or roughly 79 percent.
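The size of that jump is easy to work out from the two reported scores. The snippet below is just that arithmetic, with variable names of our own choosing; it makes no claim about future scores.

```python
score_2014 = 26.4    # Google's reported 2014 score
score_2016 = 47.28   # Google's reported 2016 score

# Absolute and relative improvement over the two years.
gain = score_2016 - score_2014
percent_gain = gain / score_2014 * 100

print(f"Gain: {gain:.2f} points ({percent_gain:.1f} percent)")
```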

Since then, Google has made a number of improvements to its assistant, which increase its utility. It’ll be interesting to see how these machines perform in the 2017 ranking.

But only if this method gains in credibility. Feng and co’s secrecy may be understandable in the sense that revealing the tests would make it straightforward to game them. On the other hand, without knowing how the testing works, it’s hard to believe in its utility.

Feng and co would be wise to publish more details—perhaps a set of sample questions, even. Without that, the prevailing view of this work will be skeptical.

Ref: arxiv.org/abs/1709.10242 : Intelligence Quotient and Intelligence Grade of Artificial Intelligence
