Singing at your computer is the new way to identify that song that’s been stuck in your head, at least if it’s one of the roughly 3,000 songs currently in the database of Tunebot, a prototype “query-by-humming” search engine brought to you by Arefin Huq, Mark Cartwright and Bryan Pardo of the Interactive Audio Lab at Northwestern University.
Tunebot, the guts of which are described in a paper just released at the 2010 Sound and Music Computing Conference, tackles a problem so difficult that it makes beating the world’s greatest chess grandmaster look like a party trick: how to transform the off-key caterwauling of a visitor to your website (or the user of a forthcoming iPhone app) into a melodic signature that can be compared to a database of thousands (and eventually millions) of songs, yielding something that even approximates the result your visitor was looking for.
Just populating the requisite database of hummed or sung tunes is a challenge. You could either have a small army of volunteers sing multiple renditions of all three million tracks in the iTunes database, or you could turn the task into a hit game.
At least, Huq et al. are hoping it will be a hit:
Karaoke Callout is a game in which a player has to sing, a cappella, a random song chosen by the phone. The player’s rendition is run through a melody extractor, which determines the fundamental frequency of the note they’re singing once every 20 milliseconds. The resulting pitch track is then compared to the nearest match in the database for that song, and the performance is rated by how close it is to the “real thing.”
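To make the melody-extraction step concrete, here is a minimal sketch of one common way to estimate a fundamental frequency every 20 milliseconds: autocorrelation over a short window. This is not the extractor described in the Tunebot paper. The function names, the 40 ms analysis window, and the 80 to 800 Hz search range are all assumptions chosen for illustration.

```python
import math

def estimate_f0(frame, sample_rate, fmin=80.0, fmax=800.0):
    """Estimate the fundamental frequency (Hz) of one frame by autocorrelation.

    Returns None for (near-)silent frames. fmin/fmax are assumed bounds
    for a singing voice, not values taken from the article.
    """
    n = len(frame)
    mean = sum(frame) / n
    x = [s - mean for s in frame]            # remove any DC offset
    if sum(s * s for s in x) < 1e-9:         # treat very low energy as silence
        return None
    min_lag = max(1, int(sample_rate / fmax))
    max_lag = min(int(sample_rate / fmin), n - 1)
    best_lag, best_score = None, 0.0
    for lag in range(min_lag, max_lag + 1):
        # Correlate the frame with a copy of itself shifted by `lag` samples.
        score = sum(x[i] * x[i + lag] for i in range(n - lag))
        if score > best_score:
            best_lag, best_score = lag, score
    return sample_rate / best_lag if best_lag else None

def track_pitch(samples, sample_rate, hop_ms=20, window_ms=40):
    """Return one pitch estimate per 20 ms hop (hypothetical helper)."""
    hop = int(sample_rate * hop_ms / 1000)
    window = int(sample_rate * window_ms / 1000)
    return [
        estimate_f0(samples[start:start + window], sample_rate)
        for start in range(0, len(samples) - window + 1, hop)
    ]

# Usage: a synthetic 200 Hz tone standing in for a sung note.
sample_rate = 16000
tone = [math.sin(2 * math.pi * 200 * i / sample_rate) for i in range(3200)]
pitches = track_pitch(tone, sample_rate)
```

Real singing is far messier than a pure tone, so a production extractor would need to handle noise, breathiness, and octave errors, none of which this sketch attempts.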
Karaoke Callout is what’s known as a Game With a Purpose, which means it’s a way for a computer scientist to outsource a huge amount of manual training by transforming the task into a leisure-time activity.
Huq et al. are confident that a straightforward comparison of the melodic signature of songs sung or hummed into their search engine with songs already collected from Karaoke Callout is an adequate solution to their problem.
It’s also an elegant solution to the problem of melody recognition. Rather than trying to solve the much harder problem of comparing a hummed tune to an existing recording or some simplified version of a song, such as a MIDI encoding, Huq et al. are comparing apples and apples: two different human beings singing the same song.
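The article does not say how Tunebot actually scores one sung rendition against another, so the following is only a sketch of a generic approach, not the authors' method. It assumes each rendition has already been reduced to a list of pitch estimates in Hz (with None for silence), shifts each into semitones relative to the singer's own median pitch so that key does not matter, and uses dynamic time warping so that tempo does not matter either. All names are hypothetical.

```python
import math
from statistics import median

def to_relative_semitones(f0_track):
    """Convert Hz values to semitones relative to the track's median pitch.

    Unvoiced frames (None or 0) are dropped. Centering on the median makes
    two renditions comparable even when they are sung in different keys.
    """
    semis = [12.0 * math.log2(f) for f in f0_track if f]
    if not semis:
        return []
    center = median(semis)
    return [s - center for s in semis]

def dtw_distance(a, b):
    """Length-normalized dynamic-time-warping distance between two contours."""
    if not a or not b:
        return float("inf")
    inf = float("inf")
    prev = [inf] * (len(b) + 1)
    prev[0] = 0.0                      # empty prefix aligns with empty prefix
    for x in a:
        curr = [inf] * (len(b) + 1)
        for j, y in enumerate(b, start=1):
            cost = abs(x - y)          # pitch difference in semitones
            curr[j] = cost + min(prev[j], prev[j - 1], curr[j - 1])
        prev = curr
    return prev[len(b)] / (len(a) + len(b))

# Usage: the same tune sung higher and at half speed should still match.
tune = [220.0, 220.0, 247.0, 262.0, 262.0, 220.0]
same_tune = [f * 1.5 for f in tune for _ in range(2)]
other_tune = [220.0, 330.0, 220.0, 330.0, 220.0, 330.0]

d_same = dtw_distance(to_relative_semitones(tune), to_relative_semitones(same_tune))
d_other = dtw_distance(to_relative_semitones(tune), to_relative_semitones(other_tune))
```

A lower distance means a closer match, so a query would be attributed to whichever song's stored renditions yield the smallest value.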
And the more people who sing a particular track into Karaoke Callout, the more accurately Tunebot will be able to identify that song. In other words, whatever hit rate Tunebot ultimately achieves for Margaritaville is as good as it gets.