Technology Review - Published By MIT

How to Speed Up Movie Downloads


By Brendan Borrell

Thursday, April 19, 2007


One challenge in devising a distribution system that can locate similar files is that the system must search not just for each file but also for every chunk within that file. A 700-megabyte video clip may be divided into more than 40,000 chunks, which means that the system must make several billion comparisons. SET is a hybrid system: it first locates users with identical files and then searches for the requested chunks in file variants. SET's innovation in the latter task is what the researchers call handprinting, which identifies similar files using a constant number of search queries regardless of file size. SET divides the requested file into 16-kilobyte chunks, each of which is distilled into a 160-bit hash, or fingerprint. These fingerprints are sorted by numeric value, and the system selects the first few (the smallest) to form the handprint. Comparing handprints, says Andersen, "gives you a 90 percent chance of discovering a file that is 10 percent or more similar."
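The handprinting step can be sketched in a few lines of Python. This is a minimal illustration, not the researchers' released code: it assumes chunks are cut at fixed 16-kilobyte offsets, uses SHA-1 as the 160-bit hash, and picks a hypothetical handprint size of K = 30, since the article says only "the first few." The `detection_probability` and `smallest_k` helpers are a rough back-of-envelope estimate (each handprint entry treated as an independent draw that lands on a shared chunk with probability equal to the similarity), not the paper's analysis. Whatever the file size, only K fingerprints are compared.

```python
import hashlib
import math

CHUNK_SIZE = 16 * 1024  # 16-kilobyte chunks, as the article describes
K = 30                  # hypothetical handprint size; the article says only "the first few"


def fingerprints(data: bytes, chunk_size: int = CHUNK_SIZE) -> list[int]:
    """Cut data at fixed offsets (a simplifying assumption) and return each
    chunk's 160-bit SHA-1 digest as an integer."""
    return [
        int.from_bytes(hashlib.sha1(data[i:i + chunk_size]).digest(), "big")
        for i in range(0, len(data), chunk_size)
    ]


def handprint(data: bytes, k: int = K) -> list[int]:
    """Sort the distinct fingerprints by numeric value and keep the k smallest."""
    return sorted(set(fingerprints(data)))[:k]


def looks_similar(a: bytes, b: bytes, k: int = K) -> bool:
    """Treat two files as candidate variants if their handprints share any entry."""
    return not set(handprint(a, k)).isdisjoint(handprint(b, k))


def detection_probability(similarity: float, k: int) -> float:
    """Rough estimate: chance that at least one of k independent draws hits a
    shared chunk when each draw does so with probability `similarity`."""
    return 1 - (1 - similarity) ** k


def smallest_k(similarity: float, target: float) -> int:
    """Smallest k whose estimated detection probability reaches `target`."""
    return math.ceil(math.log(1 - target) / math.log(1 - similarity))
```

Under that rough model, `smallest_k(0.1, 0.9)` returns 22, so a handprint of a couple dozen fingerprints is already in the range Andersen describes for files that are 10 percent similar.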

Locating that file with just 10 percent similarity could speed up downloads by 8 percent. For music files with greater than 90 percent similarity, a five-minute download on BitTorrent would take just over two minutes with SET. For a single user, the savings could be even greater if he or she happens to be downloading an unpopular variant of a common file. Andersen proposes a scenario in which a U.S.-based user downloads a German version of a popular movie. Currently, the movie would most likely be transferred over a slower overseas connection. But with SET, users could take advantage of faster local sources for the video and receive only the audio from German peers.
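Andersen's German-movie scenario amounts to choosing a source for each chunk rather than for the whole file. The sketch below assumes that each peer advertises the chunk fingerprints it holds and a measured bandwidth (both hypothetical fields, not SET's actual protocol), and it assigns every chunk of the target file to the fastest peer that has it. Shared video chunks therefore come from a nearby peer, and only chunks unique to the German release come from the distant one.

```python
from dataclasses import dataclass, field


@dataclass
class Peer:
    name: str
    bandwidth_kbps: float  # hypothetical: measured throughput to this peer
    chunks: set[str] = field(default_factory=set)  # fingerprints this peer can serve


def plan_download(target_chunks: list[str], peers: list[Peer]) -> dict[str, str]:
    """Map each chunk of the target file to the name of the fastest peer holding it."""
    plan: dict[str, str] = {}
    for chunk in target_chunks:
        holders = [p for p in peers if chunk in p.chunks]
        if not holders:
            raise KeyError(f"no peer holds chunk {chunk!r}")
        # Prefer the fastest source; a real client would also spread load across peers.
        plan[chunk] = max(holders, key=lambda p: p.bandwidth_kbps).name
    return plan
```

With a fast U.S. peer holding the English release and a slower German peer holding the German one, only the German audio chunks are planned from the German peer.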

"It's a very clever scheme for finding the chunks in common," says Sirer. However, he says that "for the most popular content, [SET] won't make too much of a difference because there are already plenty of other peers who host that content. But I can imagine that other content which would otherwise be slow to get from a single swarm might actually be easier to download."

Although the researchers have released the source code for the SET system, they have no plans to build a graphical user interface for it or to deploy it in current file-sharing networks. "The math behind it was complex to analyze," Andersen says, "but the idea is relatively straightforward, and the implementation won't be bad." He says he wouldn't be surprised if someone deployed the SET system in the next year.

Comments

  • Brilliant..
    This is pretty brilliant. Couldn't the same technique be applied to file storage for things like backups? It may exist already, but I'm not aware of it.

    I take one issue with this article, however. I rarely have a negative experience with my BitTorrent downloads, and very often I get speeds over 700kb/sec for the duration of the download. Perhaps you're not visiting the right BitTorrent sites.

    stradric
    04/19/2007
    • Brilliant... but...
      That's because this statement

      "Let's face it: peer-to-peer file transfers on the Internet are slow. More than half of all downloads fail, and the average transfer time for a 100-megabyte file is more than 24 hours."

      relies on a study done over 4 years ago: a study focusing on Kazaa and Gnutella, not BitTorrent, and highlighting their non-Zipf characteristics... a characteristic BitTorrent _does_ have.

      This method, as complicated as it would be to put into practice, could be made to 'enhance' BitTorrent or a similar system, but it is not something to replace it.

      Sorry, Brendan, you don't earn the research cookie today.

      maggie
      04/19/2007
    • Re: Brilliant..
      This technology is already being used in backups, as you suggest. EMC's Avamar "commonality factoring" technology does exactly this. It divides all the files it backs up into chunks and hashes them; on subsequent backups, it compares the hashes of previously backed-up chunks with those of the file it is currently backing up and eliminates the redundancy. It claims to achieve up to a 95% reduction in backup size and time.

      ableix
      04/20/2007
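The commonality-factoring idea described in the reply above is chunk-level deduplication. Here is a toy sketch of that general technique, not EMC's implementation: it assumes fixed 4-kilobyte chunks and SHA-256 digests (both arbitrary choices for illustration), stores each unique chunk once, and records each backup as a list of digests from which the data can be reassembled.

```python
import hashlib

CHUNK_SIZE = 4096  # hypothetical fixed chunk size


class DedupStore:
    """Toy content-addressed backup store: each unique chunk is kept once,
    keyed by its SHA-256 digest; a backup is recorded as a list of digests."""

    def __init__(self, chunk_size: int = CHUNK_SIZE) -> None:
        self.chunk_size = chunk_size
        self.chunks: dict[str, bytes] = {}

    def backup(self, data: bytes) -> list[str]:
        recipe = []
        for i in range(0, len(data), self.chunk_size):
            chunk = data[i:i + self.chunk_size]
            digest = hashlib.sha256(chunk).hexdigest()
            self.chunks.setdefault(digest, chunk)  # store the chunk only if unseen
            recipe.append(digest)
        return recipe

    def restore(self, recipe: list[str]) -> bytes:
        """Reassemble a backup from its list of chunk digests."""
        return b"".join(self.chunks[d] for d in recipe)

    def stored_bytes(self) -> int:
        """Total bytes actually kept after deduplication."""
        return sum(len(c) for c in self.chunks.values())
```

Backing up a four-chunk file and then a copy with only its last chunk changed stores five chunks in total rather than eight, which is where the savings in backup size come from.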
  • [no subject]
    "Let's face it: peer-to-peer file transfers on the Internet are slow."

    No, they are not. With BitTorrent, downloads from a typical swarm with a few hundred participants easily max out any broadband connection you might have. As the author has apparently never used BitTorrent or other current P2P apps, he should at least have checked the literature on theoretical models and practical measurements of current file-sharing systems like BitTorrent.

    anton
    04/22/2007
    • Re:
      This is true for popular and well-seeded torrents. Rarer, older, or larger torrents dry out quite frequently. This is because BitTorrent was designed for fast mass leeching, not for seeding many files over a long period. Similar files, or even the same file on other, more sustainable networks (ed2k), could be a good way to heal suffering torrents. I've done this by hand already.

      bug_me_not
      04/22/2007
  • Logic?
    "discovered that similar files typically shared anywhere between 20 and 99 percent of their content" This looks like perfect circular logic to me. Isn't similarity defined by shared content? Please rephrase what you were trying to say in more scientific terms.

    And by "user-defined header labels" in music files you probably mean ID3 metadata? Um, they sit at the _tail_ of the file.

    bug_me_not
    04/22/2007

MIT Massachusetts Institute of Technology © 2009 Technology Review. All Rights Reserved.