How Data Mining Facebook Messages Can Reveal Substance Abusers
“A substance use disorder (SUD) is a condition in which recurrent use of substances such as alcohol, drugs and tobacco causes clinically and functionally significant impairment in an individual’s daily life.” So begin Warren Bickel from the Addiction Recovery Research Center in Roanoke, Virginia, and a couple of pals, who study this condition.
Substance abuse is a serious concern. Around one in 10 Americans are sufferers, and the disorder costs the American economy more than $700 billion a year in lost productivity, crime, and health-care costs. So a better way to identify people suffering from the disorder, and those at risk of succumbing to it, would be hugely useful.
Bickel and co say they have developed just such a technique, which allows them to spot sufferers simply by looking at their social media messages such as Facebook posts. The technique even provides new insights into the way abuse of different substances influences people’s social media messages.
The new technique comes from the analysis of data collected between 2007 and 2012 as part of a project that ran on Facebook called myPersonality. Users who signed up were offered various psychometric tests and given feedback on their scores. Many also agreed to allow the data to be used for research purposes.
One of these tests asked over 13,000 users with an average age of 23 about the substances they used. In particular, it asked how often they used tobacco, alcohol, or other drugs, and assessed each participant’s level of use. The users were then divided into groups according to their level of substance abuse.
This data set is important because it acts as a kind of ground truth, recording each person’s self-reported level of substance use.
The team next gathered two other Facebook-related data sets. The first was 22 million status updates posted by more than 150,000 Facebook users. The other was even larger: the “like” data associated with 11 million Facebook users.
Finally, the team worked out how these data sets overlapped. They found almost 1,000 users who were in all the data sets, just over 1,000 who were in the substance abuse and status update data sets, and 3,500 who were in the substance abuse and likes data sets.
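The overlap step amounts to intersecting sets of user identifiers. Here is a minimal Python sketch, assuming each data set can be reduced to a collection of user IDs. The function name and the toy IDs are hypothetical and not taken from the study, and the article does not say whether the pairwise counts it quotes include the users found in all three sets.

```python
def overlap_counts(sud_ids, status_ids, like_ids):
    """Count users shared between the questionnaire, status and likes data sets."""
    sud, status, likes = set(sud_ids), set(status_ids), set(like_ids)
    return {
        "all_three": len(sud & status & likes),
        "sud_and_status": len(sud & status),  # includes users who are in all three
        "sud_and_likes": len(sud & likes),    # includes users who are in all three
    }


# Toy identifiers for illustration only.
counts = overlap_counts(
    sud_ids=[1, 2, 3, 4, 5],
    status_ids=[2, 3, 4, 9],
    like_ids=[3, 4, 5, 7],
)
print(counts)
```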
These users with overlapping data sets provide rich pickings for data miners. If people with substance use disorders have certain unique patterns of behavior, it may be possible to spot these in their Facebook status updates or in their patterns of likes.
So Bickel and co got to work first by text mining most of the Facebook status updates and then data mining most of the likes data set. Any patterns they found, they then tested by looking for people with similar patterns in the remaining data and seeing if they also had the same level of substance use.
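The train-then-test procedure described here is a standard holdout evaluation. The sketch below illustrates the idea in plain Python under several assumptions: the word-weighting scheme is a deliberately simple stand-in and not the team's model, the 20 percent test fraction is arbitrary, plain accuracy is used as the score, and all function names are hypothetical.

```python
import random
from collections import Counter


def split_users(user_ids, test_fraction=0.2, seed=0):
    """Shuffle users reproducibly and hold out a fraction for testing."""
    ids = sorted(user_ids)
    random.Random(seed).shuffle(ids)
    n_test = max(1, int(len(ids) * test_fraction))
    return ids[n_test:], ids[:n_test]


def word_weights(posts, labels, train_ids):
    """Weight each word by how much more often substance users wrote it."""
    pos, neg = Counter(), Counter()
    n_pos = n_neg = 0
    for uid in train_ids:
        words = set(posts[uid].lower().split())
        if labels[uid]:
            n_pos += 1
            pos.update(words)
        else:
            n_neg += 1
            neg.update(words)
    vocab = set(pos) | set(neg)
    return {w: pos[w] / max(n_pos, 1) - neg[w] / max(n_neg, 1) for w in vocab}


def predict(text, weights):
    """Predict substance use when the summed word weights are positive."""
    return sum(weights.get(w, 0.0) for w in set(text.lower().split())) > 0


def holdout_accuracy(posts, labels, test_fraction=0.2, seed=0):
    """Learn patterns on the training users, then score the held-out users."""
    train_ids, test_ids = split_users(posts, test_fraction, seed)
    weights = word_weights(posts, labels, train_ids)
    hits = sum(predict(posts[u], weights) == labels[u] for u in test_ids)
    return hits / len(test_ids)


# Toy data: odd-numbered users are labeled as substance users.
toy_posts = {i: ("alpha common" if i % 2 else "beta common") for i in range(10)}
toy_labels = {i: bool(i % 2) for i in range(10)}
print(holdout_accuracy(toy_posts, toy_labels))
```

The key design point is that the held-out users play no part in learning the word weights, so the score reflects how well the patterns generalize to unseen users and not how well they were memorized.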
The results make for interesting reading. The team says its technique was hugely successful. “Our best models achieved 86% for predicting tobacco use, 81% for alcohol use and 84% for drug use, all of which significantly outperformed existing methods,” say Bickel and co.
The technique also identified a wide range of keywords that people with substance abuse disorder are more likely to use in social media posts. “Swear words such as ‘fuck’ and ‘shit,’ sexual words such as ‘horny’ and ‘sex,’ words related to biological process such as ‘blood’ and ‘pain’ are positively correlated with all three types of substance use disorder,” say Bickel and co, referring to tobacco, alcohol and drug use. “In addition, female references such as ‘girl’ and ‘woman,’ prepositions, space reference words such as ‘up’ and ‘down’ are positively correlated with alcohol use, while words related to anger such as ‘hate’ and ‘kill,’ words related to health such as ‘clinic’ and ‘pill’ are positively correlated with drug use.”
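A "positively correlated" keyword can be pictured as a word whose rate of use rises with a user's level of substance use. The article does not say which statistic the team computed, so the sketch below assumes a plain Pearson correlation between a per-user word rate and a numeric use level. The helper names and the toy posts are hypothetical.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient; returns 0.0 if either series is constant."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    if vx == 0 or vy == 0:
        return 0.0
    return cov / sqrt(vx * vy)


def word_rate(text, word):
    """Fraction of a user's tokens that match the given word."""
    tokens = text.lower().split()
    return tokens.count(word) / len(tokens) if tokens else 0.0


def word_correlation(posts, use_levels, word):
    """Correlate how often each user writes a word with that user's use level."""
    rates = [word_rate(p, word) for p in posts]
    return pearson(rates, use_levels)


# Toy data: one concatenated post string and one use level per user.
toy_texts = ["pain pain today", "nice day", "pain again", "lovely walk outside"]
toy_levels = [2, 0, 1, 0]
print(word_correlation(toy_texts, toy_levels, "pain"))
print(word_correlation(toy_texts, toy_levels, "nice"))
```

In this toy data, "pain" comes out positively correlated with use level and "nice" negatively. A correlation of this kind says nothing about causation.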
The data shows correlations both ways. “A preference for movies such as V for Vendetta and Boondock Saints is positively correlated with alcohol use, while having a hobby, liking cartoons and shows favored by kids or liking movies and brands favored by girls are negatively correlated with drug, alcohol and tobacco use respectively,” say the team.
There are also some surprising correlations. “For example, female references such as ‘girl’ and ‘woman’ are positively related to alcohol use while male references such as ‘man’ and ‘boy’ are negatively related to drug use,” say Bickel and co. This is probably because references to females are more likely to be made by males, who are also more likely to use alcohol.
That’s interesting work that immediately suggests a way to identify people who are at risk of substance use disorder—simply look at their Facebook posts and likes. “We believe social media is a promising platform for both studying SUD-related human behaviors as well as engaging the public for substance abuse prevention and screening,” say Bickel and co.
Ref: arxiv.org/abs/1705.05633: Social Media-based Substance Use Prediction