Technology Review - Published By MIT

Thursday, April 19, 2007

How to Speed Up Movie Downloads

Continued from page 1

By Brendan Borrell


One challenge in devising a distribution system that can locate similar files is that the system must search not just for each file but also for every chunk within that file. A 700-megabyte video clip may be divided into 40,000 chunks, which means that the system must make several billion comparisons. SET is a hybrid system that first locates users with identical files before searching for requested chunks in file variants. SET's innovation in the latter task is what the researchers call handprinting, which efficiently identifies similar files using a constant number of search queries regardless of the file size. SET divides the requested file into 16-kilobyte chunks, which are then distilled into 160-bit chunk hashes, or fingerprints. These fingerprints are sorted by numeric value, and the system selects the first few to form the handprint. Comparing handprints, says Andersen, "gives you a 90 percent chance of discovering a file that is 10 percent or more similar."
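
The handprinting step described above can be illustrated with a minimal Python sketch. It is not the researchers' released code. The article gives only the chunk size (16 kilobytes) and the hash width (160 bits); the use of SHA-1, the handprint size of 30, and the function names are assumptions made for illustration.

```python
import hashlib

CHUNK_SIZE = 16 * 1024  # 16-kilobyte chunks, as described in the article
HANDPRINT_SIZE = 30     # assumed value; the article says only "the first few"


def chunk_hashes(data: bytes, chunk_size: int = CHUNK_SIZE) -> list[bytes]:
    """Split data into fixed-size chunks and fingerprint each one.

    SHA-1 is an assumption: it yields 160-bit digests, matching the
    hash width the article mentions, but the article names no algorithm.
    """
    return [
        hashlib.sha1(data[i:i + chunk_size]).digest()
        for i in range(0, len(data), chunk_size)
    ]


def handprint(data: bytes, k: int = HANDPRINT_SIZE,
              chunk_size: int = CHUNK_SIZE) -> set[bytes]:
    """Return the k numerically smallest distinct chunk fingerprints.

    Equal-length digests compared as bytes sort in the same order as
    the big-endian integers they encode, so sorted() gives numeric order.
    """
    distinct = sorted(set(chunk_hashes(data, chunk_size)))
    return set(distinct[:k])


def maybe_similar(handprint_a: set[bytes], handprint_b: set[bytes]) -> bool:
    """Flag two files as candidate variants if their handprints overlap."""
    return bool(handprint_a & handprint_b)
```

Because a handprint holds at most k fingerprints no matter how large the file is, a peer looking for variants needs at most k lookups, which is the constant query count the article refers to. Two files that share many chunks are likely to share some of their smallest fingerprints as well, so an overlap between handprints is a cheap signal that a full chunk-by-chunk comparison is worth doing.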

Locating that file with just 10 percent similarity could speed up downloads by 8 percent. For music files with greater than 90 percent similarity, a five-minute download on BitTorrent would take just over two minutes with SET. For a single user, the savings could be even greater if he or she happens to be downloading an unpopular variant of a common file. Andersen proposes a scenario in which a U.S.-based user downloads a German version of a popular movie. Currently, the movie would most likely be transferred from a slower overseas connection. But with SET, users could take advantage of faster local sources for video and receive only the audio from German peers.

"It's a very clever scheme for finding the chunks in common," says Sirer. However, he says that "for the most popular content, [SET] won't make too much of a difference because there are already plenty of other peers who host that content. But I can imagine that other content which would otherwise be slow to get from a single swarm might actually be easier to download."

Although the researchers have released the source code for the SET system, they have no plans to build a graphical user interface for it or to deploy it in current file-sharing networks. "The math behind it was complex to analyze," Andersen says, "but the idea is relatively straightforward, and the implementation won't be bad." He says he wouldn't be surprised if someone deployed the SET system in the next year.


Comments

  • Brilliant..
    stradric on 04/19/2007 at 12:53 PM
    Posts:
    20
    Avg Rating:
    4/5
    This is pretty brilliant.  Couldn't the same technique be applied to file storage for things like backups?  It may exist already, but I'm not aware.

    I take one issue with this article, however.  I rarely have a negative experience with my bittorrent downloads, and very often I get speeds over 700kb/sec for the duration of the download.  Perhaps you're not visiting the right bit torrent sites.
    • Brilliant... but...
      maggie on 04/19/2007 at 1:26 PM
      Posts:
      1
      That's because this statement

      "Let's face it: peer-to-peer file transfers on the Internet are slow. More than half of all downloads fail, and the average transfer time for a 100-megabyte file is more than 24 hours."

      relies on a study done over 4 years ago. A study focusing on Kazaa and Gnutella, not Bittorrent, and highlighting their non-Zipf characteristics... a characteristic BitTorrent _does_ have.

      This method, as complicated as it would be to put into practice,  could be made to 'enhance' BitTorrent or a similar system, but it is not something to replace it.

      Sorry, Brendan, you don't earn the research cookie today.
    • Re: Brilliant..
      ableix on 04/20/2007 at 7:04 PM
      Posts:
      1
      This technology is already being used in backups, as you suggest. EMC's Avamar "commonality factoring" technology does exactly this. It divides all the files that it backs up into chunks and hashes them. For subsequent backups of any file, it compares the hashes of the chunks it has already backed up with those of the file it is currently backing up, and eliminates the redundancy. It claims to achieve up to a 95% reduction in backup size and time.
  • [no subject]
    anton on 04/22/2007 at 1:07 AM
    Posts:
    7
    Avg Rating:
    5/5
    "Let's face it: peer-to-peer file transfers on the Internet are slow."

    No they are not. With bittorrent, downloads from a typical swarm with a few hundred participants easily max out any broadband connection you might have. As the author apparently has never used bittorrent or other current p2p-apps he should at least have checked the literature on theoretical models and practical measurements on current filesharing systems like bittorrent.
    • Re:
      bug_me_not on 04/22/2007 at 7:46 AM
      Posts:
      6
      Avg Rating:
      3/5
      This is true for popular and well-seeded torrents. Rarer, older or larger torrents dry out quite frequently. This is because bittorrent was designed for fast mass leeching, not for seeding many files over a long period. Similar files, or even the same file on other, more sustainable networks (ed2k) could be a good way to heal suffering torrents. I've done this by hand already.
  • Logic?
    bug_me_not on 04/22/2007 at 7:39 AM
    Posts:
    6
    Avg Rating:
    3/5
    "discovered that similar files typically shared anywhere between 20 and 99 percent of their content" This looks like perfect circular logic to me. Isn't similarity defined by shared content? Please rephrase what you were trying to say in more scientific terms.

    And by "user-defined header labels" in music files you probably mean ID3 metadata? Um, they sit at the _tail_ of the file.