A system for automatically screening phone calls has been developed by researchers at Microsoft. It works by analyzing characteristics of a caller’s voice and word usage to figure out how urgent a call is and whether the caller is a friend, family member, colleague, or stranger. Then the call can be either put through or sent to voice mail.
Called V-Priorities, the system was originally part of a larger effort to ensure that urgent calls get through when an individual is busy or in a meeting. But, according to Eric Horvitz, a senior researcher at Microsoft Research in Redmond, WA, who created the system, it could also prove to be useful for filtering the growing number of spam phone calls.
In preliminary tests, the prototype system was 90 percent accurate at judging whether or not calls were unsolicited, says Horvitz. Similarly, its ability to judge the personal “closeness” of the caller was 84 percent accurate, while it could distinguish business calls from personal calls 75 percent of the time.
Extracting such information is quite feasible, says Bill Keller, a computer scientist who specializes in natural-language processing at the University of Sussex in England. There’s already been progress in using programs to screen automated call-center calls, he says. “They are trying to spot when people are getting agitated.” Corpora, based outside London, has also had some success in analyzing speech for signs of sentiment.
At the moment, voice spam is still relatively rare, says Raimund Genes, chief technology officer of anti-malware at Trend Micro, an anti-virus company in Munich, Germany. But the growing popularity of Voice over Internet Protocol (VoIP) phone calls is likely to increase the amount of such spam because of the reduced cost of making calls, he says.
Furthermore, VoIP doesn’t just allow cheaper and easier spamming; it could also make many VoIP-based corporate phone networks more vulnerable to other sorts of attacks, says Scott Sobers, director of the Service Provider Market for IBM Tivoli in Washington, DC. In the future, such networks could be vulnerable to new types of computer viruses and other malicious software, which in theory could be introduced to the network merely by answering a phone, he says.
V-Priorities works on three levels, says Microsoft’s Horvitz. One level of analysis examines the prosody – rhythm, syllabic rate, pitch, and length of pauses – of a caller’s voice. In a second level, rudimentary word and phrase recognition is done to spot target words that could indicate the nature of a call. It’s a simple but effective approach, says Horvitz. “How often does your wife say ‘my name is’?” he asks. The third level of analysis involves metadata, such as the time and length of a message. “These things are all working together,” he says.
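As a rough illustration of how the three levels could feed one decision, the Python sketch below combines prosody, phrase spotting, and metadata into a single score. It is not Microsoft's implementation: the `Message` fields, the `STRANGER_PHRASES` list, the logistic scoring, and the weights are all assumptions made for the example, and a real system would learn its weights from labeled messages.

```python
import math
from dataclasses import dataclass

# Hypothetical phrases that a stranger is more likely to use than a relative.
STRANGER_PHRASES = ("my name is", "on behalf of", "special offer")


@dataclass
class Message:
    transcript: str           # rough output of word and phrase recognition
    syllables_per_sec: float  # prosody: syllabic rate
    mean_pause_sec: float     # prosody: average length of pauses
    hour_of_day: int          # metadata: when the message arrived (0-23)
    duration_sec: float       # metadata: length of the message


def features(msg: Message) -> dict:
    """Turn one message into numeric features, one group per level."""
    text = msg.transcript.lower()
    return {
        # Level 1: prosody
        "syllabic_rate": msg.syllables_per_sec,
        "mean_pause": msg.mean_pause_sec,
        # Level 2: target words and phrases
        "stranger_phrases": float(sum(p in text for p in STRANGER_PHRASES)),
        # Level 3: metadata
        "office_hours": 1.0 if 9 <= msg.hour_of_day < 17 else 0.0,
        "duration": msg.duration_sec,
    }


def unsolicited_score(msg: Message, weights: dict, bias: float = 0.0) -> float:
    """Logistic score in (0, 1); higher means more likely unsolicited.

    The weights here are supplied by hand for illustration only.
    """
    f = features(msg)
    z = bias + sum(w * f[name] for name, w in weights.items())
    return 1.0 / (1.0 + math.exp(-z))


if __name__ == "__main__":
    weights = {"stranger_phrases": 2.0, "office_hours": 0.5}  # made-up values
    cold_call = Message("Hello, my name is Bob from Acme", 4.5, 0.2, 11, 25.0)
    print(round(unsolicited_score(cold_call, weights), 3))
```

Keeping the three feature groups in one dictionary reflects the point that the levels work together: no single cue decides the outcome.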
The machine-learning algorithm that powers the crude prototype of V-Priorities was trained on 207 voice-mail messages collected from a single recipient over an eight-month period. Voice mail was used for the sake of convenience. In a final version, the system would be designed to answer calls and ask callers to identify themselves, before determining whether to put the call through or divert it to voice mail.
In principle it’s the same sort of “challenge-response” approach used to deal with e-mail spam, says Genes. “It definitely works well with e-mail, but it’s not a popular technology,” he says. People tend to find it annoying and offensive.
There are some challenge-response systems designed for phone-call screening, says IBM’s Sobers. But they’re not automated and require the person receiving the call to listen to a recording of the caller identifying themselves before deciding whether or not to take the call.
Checking a caller’s ID is, of course, one way to screen a call, says Horvitz. But often businesses or individuals block their IDs. Also, sometimes it’s not just the caller’s identity that dictates whether you might want to take the call. For example, you may wish to take a call from a colleague only if it’s urgent, such as when they’re running late. To do this, the automated system could look for phrases such as “running late,” “traffic,” or “missed the train.”
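The colleague example can be sketched in a few lines of Python. This is only an illustration of phrase spotting on a transcript, assuming the speech has already been converted to text. The function names and the routing rule for non-colleagues (always voice mail) are hypothetical; only the three phrases come from the example above.

```python
# Phrases taken from the example above; a real list would be longer.
URGENT_PHRASES = ("running late", "traffic", "missed the train")


def urgent_phrases_found(transcript: str) -> list:
    """Return the urgent phrases that occur in the transcript."""
    # Lowercase and collapse whitespace so matching ignores case and spacing.
    text = " ".join(transcript.lower().split())
    return [phrase for phrase in URGENT_PHRASES if phrase in text]


def route_call(caller_is_colleague: bool, transcript: str) -> str:
    """Put a colleague through only when the message sounds urgent.

    Assumption for this sketch: every other call goes to voice mail.
    """
    if caller_is_colleague and urgent_phrases_found(transcript):
        return "put through"
    return "voice mail"


if __name__ == "__main__":
    print(route_call(True, "Hi, I'm running late, stuck in traffic"))
    print(route_call(True, "Just calling about lunch next week"))
```

Plain substring matching is easy to fool and easy to trip by accident, which is one reason a system like this would weigh phrases alongside other cues rather than rely on them alone.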
But, according to Trend Micro’s Genes, applying an automated challenge-response approach to voice spam is likely to be less effective than it is with e-mail. It could be easy for someone to fool the software, by pretending to know the person they’re calling and by using more familiar language, he says.
The low cost of labor in developing countries means that voice spam calls are as likely to be made by humans as machines – but on a scale far beyond old-fashioned telemarketing, says Genes. And this will make such calls much tougher to filter out, because people are smarter than machines. “If a person is really determined to reach you, they will find a way,” he says.