How to Build a 100-Million-Image Database

The next generation of image-search algorithms must be evaluated using a database big enough to test their mettle.

We take some 80 billion photographs each year, which would require around 400 petabytes of storage if they were all saved. Finding your cherished shot of Aunt Marjory’s 80th birthday party among that lot is going to take a special kind of search algorithm. And of course, various groups are working on just how to solve this problem.

But if you want to build the next generation of image-search algorithms, you need a database on which to test them, say Andrea Esuli and pals at the Institute of Information Science and Technologies in Pisa, Italy. And they have one: a database of 100 million high-quality digital images taken from Flickr. For each image they have extracted five descriptive features, such as colours, shape, and texture, as defined by the MPEG-7 image standard.

That’s no mean feat. Esuli and co point out that such an image database would normally require the download and processing of up to 50 TB of data, something that would take about 12 years on a standard PC and about 2 years using a high-end multi-core PC. Instead, they simply decided to crawl the Flickr site, where the pictures are already stored, taking what data they need as descriptors. Their paper describes the trials and tribulations of building such a database.
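To get a feel for the figures quoted above, here is a back-of-the-envelope sketch in Python that spreads the 50 TB and the 12-year and 2-year estimates across 100 million images. It assumes decimal terabytes and a 365.25-day year, and the function name `per_image_budget` is made up for illustration; none of this comes from the paper itself.

```python
# Rough per-image budget implied by the article's figures.
# Assumptions (not from the paper): 1 TB = 10**12 bytes, 1 year = 365.25 days.

SECONDS_PER_YEAR = 365.25 * 24 * 3600


def per_image_budget(total_bytes, total_years, n_images):
    """Return (bytes per image, seconds per image) for a given workload."""
    bytes_each = total_bytes / n_images
    seconds_each = total_years * SECONDS_PER_YEAR / n_images
    return bytes_each, seconds_each


N_IMAGES = 100_000_000   # size of the collection
TOTAL_BYTES = 50e12      # "up to 50 TB", so this is an upper bound

std_bytes, std_secs = per_image_budget(TOTAL_BYTES, 12, N_IMAGES)  # standard PC
_, fast_secs = per_image_budget(TOTAL_BYTES, 2, N_IMAGES)          # multi-core PC

print(f"Data per image:        up to {std_bytes / 1e3:.0f} KB")
print(f"Standard PC:           {std_secs:.2f} s per image")
print(f"High-end multi-core:   {fast_secs:.2f} s per image")
```

On these assumptions, each image accounts for up to about 500 KB of data and just under four seconds of work on a standard PC, which is why the total balloons into years when multiplied by 100 million.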

Esuli and co also announce that the resulting collection is now open to the research community for experiments and comparisons. So if you’re testing the next generation of image-search algorithms, this is the database you need to set them loose on.

Finding Aunt Marjory may not be the lost cause we had thought.

Ref: http://arxiv.org/abs/0905.4627: CoPhIR: a Test Collection for Content-Based Image Retrieval
