How iTunes Genius Really Works

An Apple engineer discloses how the company’s premier recommendation engine parses millions of iTunes libraries.

Christopher Mimsarchive page

June 2, 2010

Ever since the feature debuted in 2008, there’s been a lot of speculation about how iTunes Genius accomplishes its playlist-building magic. Now an engineer at Apple that works on the iTune Genius team has revealed some tantalizing clues–a rare disclosure for the infamously secretive company.

Recapitulating what Steve Jobs has said previously about iTunes Genius, Apple engineer Erik Goldman writes in his post on Quora that the starting point for the Genius service is a packet of usage data–what songs a user has in his or her library (and, presumably, how often he or she plays them)–sent from the iTunes application which is “folded into a larger database of users and songs.”

Basically, your library of tracks is compared to all the other Genius users’ libraries of tracks. Apple then runs a set of previously secret algorithms, which Goldman described as straightforward recommendation algorithms similar to those used by other services like Netflix when it suggests movies for a user to watch now or add to his quene, to generate statistics for each song. “These statistics are computed globally at regular intervals and stored in a cache,” notes Goldman, because data on the similarity of any two songs changes slowly–it’s assumed the only reason it changes at all is because of the changing tastes of the listening public, and the introduction of new tracks and artists.

Goldman jokes that if he told you how Genius works, he’d have to kill you (or at the least, have a squad of police officers raid your brain to retrieve Apple’s rightful property), but he continues to describe how the program works anyway.

To uncover part of how iTunes Genius works, says Goldman, “look at information retrieval algorithms, especially those that leverage the vector-space model.” But before you can compare factors, such as the frequency of a particular artist or genre in a user’s library or playlists, across iTunes libraries via a Vector-Space model, you need a clever way to define the factor that gives more weight to the things that really matter.

A simple way to properly weight factors for comparison is what’s known as term frequency-inverse document frequency (tf-idf). It’s simply a way to compare how often a particular factor occurs in a single document (or song, or library) to how often that factor occurs in a larger body such as the sum of all iTunes libraries stored by the Genius servers. Thus, a factor that occurs pretty often in a given user’s library–for example, an affinity for an obscure indy band–will tend to be a more powerful determinant, unless it also happens to occur quite often in the total set of data–as would be the case if the factor was an affinity for the Beatles.

Once you’ve got your tf-idf weights sorted, you can represent them in a vector space model as vectors.

In this example (courtesy Wikipedia) two different documents (or songs) have all their various tf-idf weights represented as a single vector (e.g. d1) which can then be compared to a second document / vector (e.g. d2) and a query (q)–such as “which of these two songs is most like the one for which I’ve just clicked the ‘genius’ button.” Whichever one is closer in angle to your query vector is more similar.

Digging deeper into iTune Genius system, Goldman talks about its use of latent-factor algorithms. “Latent-factor algorithms, in particular, tend to work very well on huge data sets with an enormous number of dimensions and a lot of noise,” says Goldman.

Latent factors are what shakes out when you do a particular kind of statistical analysis, called a factor analysis, on a set of data, looking for the hidden, unseen variables that cause the variation in all the different variables you’re examining. Let’s say that the variability in a dozen different variables turns out to be caused by just four or five “hidden” variables–those are your latent factors. They cause many other variables to move in more or less lock-step.

Discovering the hidden or “latent” factors in your data set is a handy way to reduce the size of the problem that you have to compute, and it works because humans are predictable: people who like Emo music are sad, and sad people also like the soundtracks to movie versions of vampire novels that are about yearning, etc. You might think of it as the mathematical expression of a stereotype–only it works.

If you want to go really deep on this subject, Goldman suggests you read the papers that came out of the million-dollar Netflix Prize, which was won by a combination of teams led by engineers from AT&T. Their challenge was to improve Netflix’s recommendation engine, and one of their primary innovations was reducing the computational intensity of the algorithms used in recommendation engines.

Previously, the amount of computation required to do a pairwise comparison of any two items in Netflix’s (and presumably Apple’s) library scaled as a quadratic function of the number of comparisons to be performed. But the AT&T team figured out how to re-write a fundamental algorithm to make the problem scale only linearly with the amount of data involved. So, whatever Apple’s new data center is for, it’s probably not to calculate Genius results.

Keep Reading

Large language models can do jaw-dropping things. But nobody knows exactly why.

And that's a problem. Figuring it out is one of the biggest scientific puzzles of our time and a crucial step towards controlling more powerful future models.