Technology Review - Published By MIT

arXiv blog

The Physics arXiv Blog produces daily coverage of the best new ideas from an online forum called the Physics arXiv on which scientists post early versions of their latest ideas. Contact me at KentuckyFC @ arxivblog.com

Monday, June 01, 2009

How to Build a 100-Million-Image Database

The next generation of image-search algorithms must be evaluated using a database big enough to test their mettle.

We take some 80 billion photographs each year, which would require around 400 petabytes to store if they were all saved. Finding your cherished shot of Aunt Marjory's 80th birthday party among that lot is going to take a special kind of search algorithm. And of course, various groups are working on how to solve this problem.
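As a quick sanity check on those figures, the short calculation below works out the average photo size they imply. It is only a back-of-envelope sketch: it assumes decimal units (1 petabyte = 10^15 bytes), and the variable names are ours, not from any source.

```python
# Back-of-envelope check of the storage estimate quoted in the text.
# Assumption: decimal units, i.e. 1 PB = 10**15 bytes and 1 MB = 10**6 bytes.
PHOTOS_PER_YEAR = 80e9        # "some 80 billion photographs each year"
TOTAL_STORAGE_BYTES = 400e15  # "around 400 petabytes"

# Average size per photo implied by the two figures above.
bytes_per_photo = TOTAL_STORAGE_BYTES / PHOTOS_PER_YEAR
megabytes_per_photo = bytes_per_photo / 1e6

print(f"Implied average size: {megabytes_per_photo:.1f} MB per photo")
```

The two numbers together imply an average of about 5 MB per photograph.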

But if you want to build the next generation of image search algorithms, you need a database on which to test them, say Andrea Esuli and pals at the Institute of Information Science and Technologies in Pisa, Italy. And they have one: a database of 100 million high-quality digital images taken from Flickr. For each image they have extracted five descriptive features such as colours, shape, and texture, as defined by the MPEG-7 image standard.
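To give a rough idea of how descriptors like these get used, here is a minimal sketch of similarity search: each image is reduced to a vector of numbers, and a query is answered by ranking stored images by their distance to the query vector. Everything in it is an assumption made for illustration. The four-number vectors, the image IDs, and the choice of Euclidean distance are invented here; they are not the actual MPEG-7 descriptors or the format of the collection described above.

```python
import math

def l2_distance(a, b):
    """Euclidean distance between two equal-length feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def nearest(query, collection, k=2):
    """Return the IDs of the k images whose vectors lie closest to the query."""
    ranked = sorted(collection.items(),
                    key=lambda item: l2_distance(query, item[1]))
    return [image_id for image_id, _ in ranked[:k]]

# Hypothetical toy descriptors (NOT real MPEG-7 features).
collection = {
    "img_a": [0.90, 0.10, 0.00, 0.20],
    "img_b": [0.10, 0.80, 0.10, 0.70],
    "img_c": [0.85, 0.15, 0.05, 0.25],
}
query = [0.88, 0.12, 0.02, 0.22]

print(nearest(query, collection))  # → ['img_a', 'img_c']
```

A linear scan like this is fine for three images but hopeless for 100 million, which is exactly why new search algorithms need a large collection to be tested against.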

That's no mean feat. Esuli and co point out that such an image database would normally require the download and processing of up to 50 TB of data, something that would take about 12 years on a standard PC and about 2 years using a high-end multi-core PC. Instead, they simply decided to crawl the Flickr site, where the pictures are already stored, taking what data they need as descriptors. Their paper describes the trials and tribulations of building such a database.
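Those estimates are easier to picture per image. The sketch below divides the quoted totals by the 100 million images in the collection. It assumes decimal units and a 365.25-day year, and since the 50 TB figure is an upper bound ("up to"), the per-image size is an upper bound too. The names are ours, chosen for illustration.

```python
# Per-image cost implied by the figures quoted in the text.
# Assumptions: 1 TB = 10**12 bytes, 1 KB = 10**3 bytes, 1 year = 365.25 days.
SECONDS_PER_YEAR = 365.25 * 24 * 3600

NUM_IMAGES = 100e6          # 100 million images
TOTAL_DATA_BYTES = 50e12    # "up to 50 TB" (an upper bound)
YEARS_STANDARD_PC = 12      # "about 12 years on a standard PC"
YEARS_MULTICORE_PC = 2      # "about 2 years using a high-end multi-core PC"

def seconds_per_image(years, num_images=NUM_IMAGES):
    """Average processing time per image if the whole job takes `years`."""
    return years * SECONDS_PER_YEAR / num_images

kilobytes_per_image = TOTAL_DATA_BYTES / NUM_IMAGES / 1e3
standard = seconds_per_image(YEARS_STANDARD_PC)
multicore = seconds_per_image(YEARS_MULTICORE_PC)

print(f"Up to {kilobytes_per_image:.0f} KB of data per image")
print(f"Standard PC:   {standard:.2f} s per image")
print(f"Multi-core PC: {multicore:.2f} s per image")
```

That works out to as much as 500 KB of data and just under 4 seconds of work per image on a standard PC, which sounds modest until it is multiplied by 100 million.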

Esuli and co also announce that the resulting collection is now open to the research community for experiments and comparisons. So if you're testing the next generation of image search algorithms, this is the database you need to set them loose on.

Finding Aunt Marjory may not be the lost cause we had thought.

Ref: http://arxiv.org/abs/0905.4627 : CoPhIR: a Test Collection for Content-Based Image Retrieval

Comments

  • Video database
    Perhaps you should have a look at the algorithms behind Blinkx which has indexed about 35 million hours of video, frame by frame. It is doing for online video and film what Google has done for web pages.

    yewlodge
    06/02/2009
MIT Massachusetts Institute of Technology © 2010 Technology Review. All Rights Reserved.