The Urban Dictionary is a crowdsourced website that records new words and their meanings. It began life in 1999 as a parody of Dictionary.com but has since become an important resource on the Web. Indeed, judges in the U.K. famously used the site in 2005 to help them understand slang used by two rappers involved in a dispute.
Part of Urban Dictionary’s appeal is its informal approach, which allows both definitions and descriptions of words. It even allows opinions, which can sometimes be offensive. It captures new words quickly and registers many of the variations that emerge over time. A voting system allows users to show admiration or disdain, revealing words’ popularity.
Today, many millions of users rely on the site to keep them up to date with slang, common usage, and popular culture.
Of course, Urban Dictionary has its shortcomings. In the absence of style guides, editors, and moderators, the content can be vague and inaccurate. Also, little is known about the people who post new words and whether the entries reflect real changes in the language or just those that affect a small subset of people.
So just how good is the Urban Dictionary at capturing new words, and how does it compare with more conventional approaches to producing online dictionaries?
Today, we get an answer of sorts thanks to the work of Dong Nguyen at the Alan Turing Institute in London and a few pals, who compare the Urban Dictionary and its content with Wiktionary, another crowdsourced dictionary. “To the best of our knowledge, this is the first systematic study of Urban Dictionary at this scale,” they say.
Wiktionary is an interesting comparison because it takes a much more formal approach to crowdsourcing. This is a sister site to Wikipedia, run by the same Wikimedia organization. It records only word definitions and employs guidelines about how these should be compiled. It also guides users as to what constitutes a definition. Moderators edit the content, control vandalism, and aim to generate high-quality results. Unsurprisingly, Wiktionary has also become an important online resource, one that researchers increasingly use for natural-language processing and so on.
Nguyen and co begin by analyzing the Urban Dictionary content in the broadest terms. They say it records 2,661,625 definitions for 1,620,438 words and phrases. Most words have just one definition, but a few have upwards of 1,000.
The word with the highest number of definitions is emo, with 1,204. And the top definition is this:
- A terribly misconstrued and misused word. In contemporary culture it is utilized as a broad term to describe a multitude of children and teenagers who straighten their hair, have their hair in their face, perhaps dye it black, and wear tight clothing. Unfortunately this is completely inaccurate. Actual “emo” music existed in the late 80’s and was a subgenre of hardcore punk rock, after all, “emo” is a shortening of “Emotional hardcore punk rock.” The people in early emo bands dressed like regular people, everyday guys/girls who just played music that they enjoyed. Sadly, since the formulation and ongoing existence of Hot Topic, the term emo has been incorrectly characterized for a little more than a decade. You have to wonder how the original bands feel about the slandering and mass misunderstanding and misuse of their originality with those of the unoriginal.
By contrast, Wiktionary lists five definitions for emo:
1. A particular style of hardcore punk rock.
2. An individual or group of people associated with that subculture and musical style.
3. Any form of guitar-driven alternative rock that is particularly or notably emotional.
4. An individual or group of people associated with a fashion or stereotype of that style of rock.
5. A young person who is considered to be over-emotional or stereotypically emo.
The word with the next highest number of definitions on Urban Dictionary is love, with 1,140. The other words in the top 10 by number of definitions are: god, urban dictionary, chode, Canada’s history, sex, school, cunt, and scene.
In terms of popularity, upvotes slightly outnumber downvotes. But, say Nguyen and co, “there is a wide variation among the definitions, with some having more than ten times more up votes than down votes and some the other way around.”
The team also compare the lexical coverage of Urban Dictionary and Wiktionary. It turns out that the overlap is surprisingly small—72 percent of the words on Urban Dictionary are not recorded on Wiktionary.
However, the team note that many words on Urban Dictionary are relevant to only a small subset of users. Many are nicknames or proper names such as Dan Taylor, defined as “A very wonderful man that cooks the best beef stew in the whole wide world.” These usually have only one meaning.
So to study more common words, the team also compared only those words that have two or more definitions. In that case, the overlap is much larger: just 25 percent of the definitions appear only on Urban Dictionary. For example, the word phased appears on both dictionaries as something being done bit by bit—in phases.
However, Urban Dictionary also describes several other meanings, such as “A word that is used when your asking if someone wants to fight” and “to be ‘buzzed.’ when you arent drunk, but arent sober.”
In this analysis, many more words appear only on Wiktionary, some 69 percent of them. Nguyen and co say that many of these are encyclopedic entries such as acacetins, dramaturge, and Shakespearean sonnets.
That leads the team to a clear conclusion. “In general, we can say that the overlap between the two dictionaries is small,” they say.
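For readers who want to see how a lexical-overlap comparison of this kind can be computed, here is a minimal sketch in Python. It is not the team's code. The function name `coverage_shares`, the toy word lists, the lowercasing step, and the choice to express each share as a fraction of the combined vocabulary are all assumptions made for illustration; the article does not say exactly how the published percentages were normalized.

```python
from collections import Counter


def coverage_shares(ud_headwords, wiktionary_words, min_definitions=1):
    """Return the share of words found only in one dictionary or in both.

    ud_headwords: one headword per Urban Dictionary definition, so a word
        with three definitions appears three times (hypothetical input shape).
    wiktionary_words: iterable of Wiktionary headwords.
    min_definitions: keep only Urban Dictionary words with at least this
        many definitions (2 mimics the "more common words" filter).
    """
    # Count definitions per word; lowercasing is an assumption of this sketch.
    counts = Counter(word.lower() for word in ud_headwords)
    ud = {word for word, n in counts.items() if n >= min_definitions}
    wk = {word.lower() for word in wiktionary_words}

    union = ud | wk
    if not union:
        return {"ud_only": 0.0, "wiktionary_only": 0.0, "both": 0.0}

    # Shares are fractions of the combined vocabulary (an assumption).
    return {
        "ud_only": len(ud - wk) / len(union),
        "wiktionary_only": len(wk - ud) / len(union),
        "both": len(ud & wk) / len(union),
    }


# Toy data only; these lists are invented and are not taken from the study.
toy_ud_headwords = ["emo", "emo", "love", "love", "phased", "phased", "Dan Taylor"]
toy_wiktionary_words = ["emo", "phased", "dramaturge", "acacetins"]

print(coverage_shares(toy_ud_headwords, toy_wiktionary_words))
print(coverage_shares(toy_ud_headwords, toy_wiktionary_words, min_definitions=2))
```

Raising `min_definitions` to 2 drops single-definition entries such as nicknames before the comparison, which is the same kind of filtering described above for studying more common words.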
Urban Dictionary meanings also include opinions, unlike those on Wiktionary. One definition of beer is this: “Possibly the best thing ever to be invented ever. I MEAN IT.”
To work out what proportion of definitions these make up, the team had crowdworkers assess each to determine whether it was an opinion or a meaning and whether they were familiar with it.
They found that up to 50 percent of meanings for proper nouns were opinions and that the workers were unfamiliar with the majority of these uses. They also found definitions such as coffee, “a person who is coughed upon.”
In addition, crowdworkers found that much of the Urban Dictionary content was offensive, but that this content tended to get lower votes.
“We also found that words with more definitions tended to be more familiar to crowdworkers, suggesting that Urban Dictionary content does reflect broader trends in language use to some extent,” say Nguyen and co.
The work provides a unique window into a website that has come to play an important role in popular culture. That should set the scene for other studies. In particular, an interesting question is whether online dictionaries not only record linguistic change but actually drive it, as some linguists suggest.
Perhaps something for a future research project.
Ref: arxiv.org/abs/1712.08647 : “Emo, Love, and God: Making Sense of Urban Dictionary, a Crowd-Sourced Online Dictionary”