Technology Review - Published By MIT
How to Speed Up Movie Downloads

Researchers have designed a new way to get the most out of peer-to-peer file-sharing networks, decreasing the time it takes to download movies and music.

By Brendan Borrell

Thursday, April 19, 2007

Let's face it: peer-to-peer file transfers on the Internet are slow. More than half of all downloads fail, and the average transfer time for a 100-megabyte file is more than 24 hours. But now, a team of computer scientists led by Himabindu Pucha at Purdue University, in Indiana, says that it can double the speed of these transfers by taking advantage of overlap in the data chunks contained within nonidentical multimedia files posted on peer-to-peer distribution networks. Doing so would also make these transfers more likely to succeed.

Speedy downloads: A new file-sharing approach can decrease the time it takes to download movies and music on peer-to-peer networks.
Credit: Technology Review

Peer-to-peer distribution networks such as BitTorrent and Kazaa allow people to download individual files from others' computers. These systems first locate the copies of the requested file in the network's global lookup table using its "hash"--a unique identifier computed from the file's data sequence. Then, the file is divided into chunks so that each user's computer only has to upload a small piece of it. This technique speeds up file transfers because home users typically have greater bandwidth allocated to downloads compared with uploads. Of course, the overall speed of the transfer will depend on the number of file sources and how much spare upload capacity they have. The more popular a file is, the faster it is to download and the greater the chance of success.
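The two steps described above, identifying a file by a hash of its data and splitting it into chunks that different peers can upload, can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the choice of SHA-1, the tiny chunk size, the round-robin assignment, and the peer names are all hypothetical and are not taken from BitTorrent, Kazaa, or the researchers' work.

```python
import hashlib

CHUNK_SIZE = 4  # bytes; deliberately tiny for illustration (assumption)


def file_id(data: bytes) -> str:
    """Whole-file identifier: a hash computed over the file's entire byte sequence."""
    return hashlib.sha1(data).hexdigest()


def split_into_chunks(data: bytes, size: int = CHUNK_SIZE) -> list:
    """Cut the file into consecutive fixed-size pieces (the last may be shorter)."""
    return [data[i:i + size] for i in range(0, len(data), size)]


def assign_chunks(num_chunks: int, sources: list) -> dict:
    """Spread chunk indices over the sources round-robin (hypothetical policy)."""
    plan = {source: [] for source in sources}
    for index in range(num_chunks):
        plan[sources[index % len(sources)]].append(index)
    return plan


data = b"example movie bytes"
chunks = split_into_chunks(data)
plan = assign_chunks(len(chunks), ["peer_a", "peer_b", "peer_c"])

# Each peer uploads only its share; the downloader stitches the pieces together.
reassembled = b"".join(chunks)
print(file_id(data), plan, reassembled == data)
```

Note that the whole-file hash changes if even one byte of the file changes, so under this scheme two nearly identical files look like unrelated files to the lookup table.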

David Andersen, a professor of computer science at Carnegie Mellon University, worked with the Purdue group to develop similarity-enhanced transfer (SET), a way to increase the size of the pool of uploaders. The approach takes advantage of the multiple variants of the same music files, video clips, and software that are often floating around file-distribution networks. "We hope that SET gives you access to a larger pool of people to download from," says Andersen. "And by doing so, we think you're more likely to find one of these people who have more spare capacity."

Before Andersen and his colleagues conducted their study, it was not at all clear how much redundancy existed in file-sharing networks and whether it could be exploited, says Cornell University computer scientist Emin Gün Sirer, who was not involved in the study. The SET team analyzed almost two terabytes of music and video files from file-sharing networks, and it discovered that similar files typically shared anywhere between 20 and 99 percent of their content. With music files, even misspellings in user-defined header labels that identify artist and song titles are enough to throw off BitTorrent, despite the fact that 99 percent of the file is the same. Similarly, multiple versions of the same video are often available with different language tracks.
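To see how two files can have different whole-file hashes yet share most of their chunks, consider the toy comparison below. It is a rough sketch, not the SET team's method: the article does not say how SET divides files or detects overlap, so the fixed-size chunks, the SHA-1 hashes, the 8-byte label, and the names `song_a` and `song_b` are all assumptions made for illustration.

```python
import hashlib

CHUNK_SIZE = 8  # bytes; tiny on purpose (assumption)


def chunk_hashes(data: bytes, size: int = CHUNK_SIZE) -> list:
    """Hash each fixed-size chunk of the file separately."""
    return [hashlib.sha1(data[i:i + size]).hexdigest()
            for i in range(0, len(data), size)]


def shared_fraction(target: bytes, other: bytes) -> float:
    """Fraction of target's chunks whose hash also appears among other's chunks."""
    target_hashes = chunk_hashes(target)
    if not target_hashes:
        return 0.0
    other_hashes = set(chunk_hashes(other))
    matches = sum(1 for h in target_hashes if h in other_hashes)
    return matches / len(target_hashes)


audio = bytes(range(72))        # stand-in for identical audio data (9 chunks)
song_a = b"TITLE=AB" + audio    # hypothetical 8-byte label
song_b = b"TITLE=BA" + audio    # same song, label typed differently

same_file = hashlib.sha1(song_a).digest() == hashlib.sha1(song_b).digest()
print(same_file)                        # False: whole-file hashes differ
print(shared_fraction(song_a, song_b))  # 0.9: nine of ten chunks match
```

The two labels here have the same length, so the chunk boundaries line up. A label of a different length would shift every later byte, and a fixed-size scheme like this one would then find no matching chunks; a real system would need a chunking method that tolerates such shifts.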

Comments

  • Brilliant..
    This is pretty brilliant.  Couldn't the same technique be applied to file storage for things like backups?  It may exist already, but I'm not aware.

    I take one issue with this article, however.  I rarely have a negative experience with my bittorrent downloads, and very often I get speeds over 700kb/sec for the duration of the download.  Perhaps you're not visiting the right bit torrent sites.

    stradric
    04/19/2007
    Posts:30
    Avg Rating:
    4/5
    • Brilliant... but...
      That's because this statement

      "Let's face it: peer-to-peer file transfers on the Internet are slow. More than half of all downloads fail, and the average transfer time for a 100-megabyte file is more than 24 hours."

      relies on a study done over 4 years ago. A study focusing on Kazaa and Gnutella, not Bittorrent, and highlighting their non-Zipf characteristics... a characteristic BitTorrent _does_ have.

      This method, as complicated as it would be to put into practice,  could be made to 'enhance' BitTorrent or a similar system, but it is not something to replace it.

      Sorry, Brendan, you don't earn the research cookie today.

      maggie
      04/19/2007
      Posts:1
    • Re: Brilliant..
      This technology is already being used in backups, as you suggest. EMC's Avamar "commonality factoring" technology does exactly this. It divides all the files that it backs up into chunks and hashes them. On subsequent backups of any file, it compares the hashes of the chunks it is about to back up with the hashes of chunks it has already stored, and eliminates the redundancy. It claims to achieve up to a 95% reduction in backup size and time.

      ableix
      04/20/2007
      Posts:1
  • [no subject]
    "Let's face it: peer-to-peer file transfers on the Internet are slow."

    No, they are not. With bittorrent, downloads from a typical swarm with a few hundred participants easily max out any broadband connection you might have. As the author apparently has never used bittorrent or other current p2p-apps, he should at least have checked the literature on theoretical models and practical measurements of current filesharing systems like bittorrent.

    anton
    04/22/2007
    Posts:7
    Avg Rating:
    5/5
    • Re:
      This is true for popular and well-seeded torrents. Rarer, older or larger torrents dry out quite frequently. This is because bittorrent was designed for fast mass leeching, not for seeding many files over a long period. Similar files, or even the same file on other, more sustainable networks (ed2k) could be a good way to heal suffering torrents. I've done this by hand already.

      bug_me_not
      04/22/2007
      Posts:6
      Avg Rating:
      2/5
  • Logic?
    "discovered that similar files typically shared anywhere between 20 and 99 percent of their content" This looks like perfect circular logic to me. Isn't similarity defined by shared content? Please rephrase what you were trying to say in more scientific terms.

    And by "user-defined header labels" in music files you probably mean ID3 metadata? Um, they sit at the _tail_ of the file.

    bug_me_not
    04/22/2007
    Posts:6
    Avg Rating:
    2/5

MIT Massachusetts Institute of Technology © 2009 Technology Review. All Rights Reserved.