MIT Technology Review

Words most commonly used by women. The researchers could predict users’ genders with 92 percent accuracy.

What do your status updates really say about you?

In a study published last week in PLOS ONE, scientists at the University of Pennsylvania examined the language used in 75,000 Facebook profiles. They found differences across ages, genders, and certain personality traits. These differences allowed the group, led by computer and information scientist H. Andrew Schwartz, to predict characteristics of each user from language alone.

The researchers found that they could predict a user’s gender with 92 percent accuracy. They could also guess a user’s age within three years more than half of the time.
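To make the two figures above concrete, here is a minimal sketch of how such scores are typically computed: plain accuracy for a categorical label such as gender, and the share of numeric predictions that land within a tolerance for age. The function names and the toy values are hypothetical and are not taken from the study.

```python
def accuracy(predicted, actual):
    """Fraction of predictions that exactly match the true labels."""
    if len(predicted) != len(actual) or not actual:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(p == a for p, a in zip(predicted, actual)) / len(actual)


def fraction_within(predicted, actual, tolerance):
    """Fraction of numeric predictions within `tolerance` of the true value."""
    if len(predicted) != len(actual) or not actual:
        raise ValueError("inputs must be non-empty and of equal length")
    hits = sum(abs(p - a) <= tolerance for p, a in zip(predicted, actual))
    return hits / len(actual)


# Hypothetical toy values for illustration only.
gender_pred = ["f", "m", "f", "f"]
gender_true = ["f", "m", "m", "f"]
age_pred = [21, 35, 48, 19]
age_true = [23, 30, 47, 25]

gender_accuracy = accuracy(gender_pred, gender_true)
age_within_three = fraction_within(age_pred, age_true, tolerance=3)
```

In this toy data, three of the four gender guesses match and two of the four age guesses fall within three years.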

To date, this is the largest study of its kind. Its magnitude allowed the researchers to use an “open-vocabulary approach”—that is, they let the data drive which words or phrases were considered most important. Most studies rely on a closed-vocabulary approach, using previously established lists of related words. That technique forces researchers to look at trait markers they already know, rather than discover new ones.

“Automatically clustering words into coherent topics allows one to potentially discover categories that might not have been anticipated,” the authors wrote. “[Open vocabulary approaches] consider all words encountered and thus are able to adapt well to the evolving language in social media or other genres.”
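The contrast between the two approaches can be sketched in a few lines. In this hedged illustration, a closed-vocabulary count only sees words on a predefined list, while an open-vocabulary count keeps every token it encounters. The tokenizer, the word list, and the sample posts are all hypothetical, and the study's actual pipeline (which also handled phrases and topics) is more involved.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase and split into word-like tokens (a deliberately crude tokenizer)."""
    return re.findall(r"[a-z0-9']+", text.lower())


def closed_vocab_counts(posts, lexicon):
    """Count only the tokens that appear in a predefined word list."""
    counts = Counter()
    for post in posts:
        counts.update(t for t in tokenize(post) if t in lexicon)
    return counts


def open_vocab_counts(posts):
    """Count every token encountered, with no predefined list."""
    counts = Counter()
    for post in posts:
        counts.update(tokenize(post))
    return counts


# Hypothetical toy data, not from the study.
posts = [
    "Great party 2day, so happy",
    "watching anime all night",
    "happy happy vacation",
]
lexicon = {"happy", "sad", "party"}  # hypothetical closed word list

closed = closed_vocab_counts(posts, lexicon)
open_ = open_vocab_counts(posts)
```

Here the closed list never registers "anime" or "2day" because nobody put them on the list in advance, while the open count keeps them available for later analysis.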

The group was particularly interested in using this approach to determine users’ characteristics. Each participant filled out a questionnaire, scoring themselves on the “Big Five” personality traits: extraversion, agreeableness, conscientiousness, neuroticism, and openness. The researchers then searched the profile updates for language that tracked the participants’ test scores, grouping the most strongly associated words and phrases into word clouds. (Some of this data is publicly available at The World Well-Being Project.)
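One simple way to look for language that tracks a trait score is to correlate how often each user uses a word with that user's questionnaire score. The sketch below uses a plain Pearson correlation as a simplified stand-in; it is not presented as the authors' exact method, and the users, texts, and scores are hypothetical.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length numeric sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    if spread_x == 0 or spread_y == 0:
        return 0.0  # correlation is undefined for constant input; report 0 here
    return cov / (spread_x * spread_y)


def relative_frequency(word, text):
    """Share of a user's tokens that are exactly `word`."""
    tokens = text.lower().split()
    return tokens.count(word) / len(tokens) if tokens else 0.0


# Hypothetical users: (concatenated status updates, extraversion score).
users = [
    ("party party tonight", 4.5),
    ("reading at home", 2.0),
    ("big party with friends", 4.0),
    ("quiet night in", 1.5),
]

freqs = [relative_frequency("party", text) for text, _ in users]
scores = [score for _, score in users]
r = pearson(freqs, scores)
```

Repeating this for every word in the vocabulary and keeping the strongest correlations would give the raw material for a word cloud like the ones described above.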

Some of the language was consistent with previous psychological findings. For example, extroverts were far more likely than introverts to use the word “party,” and neurotic people were more likely to use the word “depressed.”

But other discoveries were more novel. Introverts were more likely to talk about Japanese media like “anime” and “manga,” and people who were less neurotic mentioned social activities like “vacation,” “church,” and “sports” more often. Users who scored as less open were more likely to use shorthand like “2day” or “ur.”

The researchers hope to use their findings to provide more insight into what behavior sets different types of people apart.

“When I ask myself,” co-author Martin Seligman said in a press release, “ ‘What’s it like to be an extrovert?’ ‘What’s it like to be a teenage girl?’ ‘What’s it like to be schizophrenic or neurotic?’ or ‘What’s it like to be 70 years old?’ these word clouds come much closer to the heart of the matter than do all the questionnaires in existence.”

Credit: Word cloud image by University of Pennsylvania

Tagged: Computing, Communications, Web, Mobile, Facebook
