Technology Review - Published By MIT

arXiv blog

The Physics arXiv Blog produces daily coverage of the best new ideas from an online forum called the Physics arXiv on which scientists post early versions of their latest ideas. Contact me at KentuckyFC @ arxivblog.com

Monday, June 01, 2009

How to Build a 100-Million-Image Database

The next generation of image-search algorithms must be evaluated using a database big enough to test their mettle.

We take some 80 billion photographs each year, which would require around 400 petabytes to store if they were all saved. Finding your cherished shot of Aunt Marjory's 80th birthday party among that lot is going to take a special kind of search algorithm. And of course, various groups are working on just how to solve this problem.
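Taken together, the two headline figures imply an average of roughly 5 MB per photo. The quick check below is a sketch that assumes decimal units (1 PB = 10^15 bytes, 1 MB = 10^6 bytes); the constant and function names are illustrative, not from the paper.

```python
# Back-of-envelope check of the storage estimate quoted above.
# Assumption: decimal units (1 PB = 10**15 bytes, 1 MB = 10**6 bytes).
PHOTOS_PER_YEAR = 80e9   # "some 80 billion photographs each year"
STORAGE_BYTES = 400e15   # "around 400 petabytes"


def average_bytes_per_photo(total_bytes: float, n_photos: float) -> float:
    """Average file size implied by a total storage figure and a photo count."""
    return total_bytes / n_photos


avg = average_bytes_per_photo(STORAGE_BYTES, PHOTOS_PER_YEAR)
print(f"{avg / 1e6:.1f} MB per photo")  # prints "5.0 MB per photo"
```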

But if you want to build the next generation of image-search algorithms, you need a database on which to test them, say Andrea Esuli and pals at the Institute of Information Science and Technologies in Pisa, Italy. And they have one: a database of 100 million high-quality digital images taken from Flickr. For each image they have extracted five descriptive features, covering properties such as colour, shape, and texture, as defined by the MPEG-7 image standard.
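The MPEG-7 descriptors used for the collection are far more elaborate, but the basic idea behind content-based image retrieval can be sketched with a toy example: reduce each image to a fixed-length feature vector, then answer a query by finding the stored vectors closest to the query's. In the sketch below, the function names, the simple RGB colour histogram (a hypothetical stand-in, not the MPEG-7 Scalable Colour descriptor), and the choice of L1 distance are all assumptions made for illustration. The brute-force search scores every stored image, which is fine for a handful of pictures but expensive at 100 million, and that cost is exactly what a collection of this size lets researchers measure.

```python
from typing import Dict, List, Sequence, Tuple

Pixel = Tuple[int, int, int]  # (r, g, b), each channel in 0..255


def colour_histogram(pixels: Sequence[Pixel], bins_per_channel: int = 4) -> List[float]:
    """Toy global colour descriptor: a normalised joint RGB histogram.

    Hypothetical, simplified stand-in; NOT the MPEG-7 Scalable Colour descriptor.
    """
    hist = [0.0] * bins_per_channel ** 3
    for r, g, b in pixels:
        # Map each 0..255 channel value to a bin index in 0..bins_per_channel-1.
        ri, gi, bi = (c * bins_per_channel // 256 for c in (r, g, b))
        hist[(ri * bins_per_channel + gi) * bins_per_channel + bi] += 1.0
    total = float(len(pixels)) or 1.0  # avoid dividing by zero for an empty image
    return [h / total for h in hist]


def l1_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Sum of absolute differences between two descriptors of equal length."""
    return sum(abs(x - y) for x, y in zip(a, b))


def nearest(query: Sequence[float], index: Dict[str, List[float]], k: int = 3) -> List[str]:
    """Brute-force k-nearest-neighbour search: scores every stored descriptor."""
    return sorted(index, key=lambda image_id: l1_distance(query, index[image_id]))[:k]


# Usage: index two tiny synthetic "images" (flat pixel lists), then query with a third.
index = {
    "red": colour_histogram([(250, 10, 10)] * 100),
    "blue": colour_histogram([(10, 10, 250)] * 100),
}
query = colour_histogram([(200, 20, 20)] * 100)
print(nearest(query, index, k=1))  # prints ['red']
```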

That's no mean feat. Esuli and co point out that building such an image database would normally require downloading and processing up to 50 TB of data, something that would take about 12 years on a standard PC and about 2 years on a high-end multi-core PC. Instead, they decided to crawl the Flickr site, where the pictures are already stored, and take the data they need as descriptors. The paper describes the trials and tribulations of building such a database.
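Taking the quoted figures at face value, they work out to at most about 500 kB of data per image, and roughly 3.8 seconds of download-and-processing time per image on a standard PC versus about 0.63 seconds on the multi-core machine, a sixfold speed-up. The sketch below assumes decimal units, a 365.25-day year, and an even spread of the data across all 100 million images; the names are illustrative, not from the paper.

```python
# Back-of-envelope arithmetic on the figures Esuli and co quote.
# Assumptions: decimal units (1 TB = 10**12 bytes), a 365.25-day year, and
# that "up to 50 TB" is spread evenly over all 100 million images.
N_IMAGES = 100e6
DATA_BYTES = 50e12
SECONDS_PER_YEAR = 365.25 * 24 * 3600


def per_image(total: float, n_images: float = N_IMAGES) -> float:
    """Share of a total (bytes or seconds) attributable to one image."""
    return total / n_images


bytes_each = per_image(DATA_BYTES)                # at most about 500 kB per image
secs_standard = per_image(12 * SECONDS_PER_YEAR)  # standard PC, 12 years in total
secs_multicore = per_image(2 * SECONDS_PER_YEAR)  # high-end multi-core PC, 2 years

print(f"{bytes_each / 1e3:.0f} kB per image")                  # prints "500 kB per image"
print(f"{secs_standard:.2f} s per image on a standard PC")     # prints "3.79 s per image on a standard PC"
print(f"{secs_multicore:.2f} s per image on a multi-core PC")  # prints "0.63 s per image on a multi-core PC"
```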

Esuli and co also announce that the resulting collection is now open to the research community for experiments and comparisons. So if you're testing the next generation of image-search algorithms, this is the database to set them loose on.

Finding Aunt Marjory may not be the lost cause we had thought.

Ref: http://arxiv.org/abs/0905.4627: CoPhIR: a Test Collection for Content-Based Image Retrieval

Comments

  • Video database
    Perhaps you should have a look at the algorithms behind Blinkx, which has indexed about 35 million hours of video, frame by frame. It is doing for online video and film what Google has done for web pages.

    yewlodge
    06/02/2009

MIT Massachusetts Institute of Technology © 2009 Technology Review. All Rights Reserved.