Instagram users have used hashtags to classify billions of images, unintentionally creating great labels to train the company’s computer vision algorithms.
The news: Facebook created a data set of 3.5 billion pictures and 17,000 hashtags pulled from public Instagram accounts to improve how well it can recognize objects in images, the company announced on stage at its annual F8 developer conference today. Using a subset of that data set, Facebook was able to label 85.4 percent of photos correctly, the highest level the company has achieved to date.
But: This free labor has drawbacks. Some hashtags, like #photooftheday or #tbt (short for throwback Thursday), don’t describe what’s in the image and can confuse the algorithm.
Why it matters: More accurate computer vision could do everything from surfacing more relevant content to helping keep abusive posts off the site. Of course, Mark Zuckerberg has already had to answer for how users’ data gets handled, and not everyone will be happy to find out that their vacation photos are being put to work.