Technology Review - Published By MIT

Thursday, April 19, 2007

How to Speed Up Movie Downloads

Researchers have designed a new way to get the most out of peer-to-peer file-sharing networks, decreasing the time it takes to download movies and music.

By Brendan Borrell

Speedy downloads: A new file-sharing approach can decrease the time it takes to download movies and music on peer-to-peer networks.
Credit: Technology Review

Let's face it: peer-to-peer file transfers on the Internet are slow. More than half of all downloads fail, and the average transfer time for a 100-megabyte file is more than 24 hours. But now a team of computer scientists led by Himabindu Pucha at Purdue University, in Indiana, says that it can double the speed of these transfers by exploiting the overlap in data chunks among nonidentical multimedia files posted on peer-to-peer distribution networks. Faster transfers would, in turn, be more likely to succeed.

Peer-to-peer distribution networks such as BitTorrent and Kazaa allow people to download individual files from others' computers. These systems first locate copies of the requested file in the network's global lookup table using its "hash," a unique identifier computed from the file's data. The file is then divided into chunks so that each source computer only has to upload a small piece of it. Splitting the work this way speeds up transfers because home connections typically offer much more download bandwidth than upload bandwidth, so a downloader can fill its connection only by pulling chunks from several uploaders at once. Of course, the overall speed of the transfer depends on the number of sources and how much spare upload capacity they have. The more popular a file is, the faster it downloads and the greater the chance of success.
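The paragraph above describes two mechanisms: a whole-file hash that names a file, and chunking that spreads the upload work across many sources. A minimal Python sketch of both follows. It is illustrative only: SHA-1, the 256 KiB chunk size, and the helper names (`file_id`, `split_into_chunks`, `chunk_hashes`, `assign_chunks`) are our own assumptions, not the wire format of BitTorrent, Kazaa, or any other real network, and the round-robin assignment stands in for whatever scheduling a real client uses.

```python
import hashlib

CHUNK_SIZE = 256 * 1024  # hypothetical chunk size chosen for illustration


def file_id(data: bytes) -> str:
    """Whole-file identifier: a SHA-1 digest over every byte of the file."""
    return hashlib.sha1(data).hexdigest()


def split_into_chunks(data: bytes, chunk_size: int = CHUNK_SIZE) -> list[bytes]:
    """Cut the file into fixed-size chunks; the last chunk may be shorter."""
    return [data[i:i + chunk_size] for i in range(0, len(data), chunk_size)]


def chunk_hashes(data: bytes, chunk_size: int = CHUNK_SIZE) -> list[str]:
    """One digest per chunk, so each piece can be verified independently."""
    return [hashlib.sha1(c).hexdigest() for c in split_into_chunks(data, chunk_size)]


def assign_chunks(num_chunks: int, sources: list[str]) -> dict[str, list[int]]:
    """Hand out chunk indices round-robin so each source uploads only a small share."""
    plan: dict[str, list[int]] = {s: [] for s in sources}
    for i in range(num_chunks):
        plan[sources[i % len(sources)]].append(i)
    return plan
```

For example, a 1 MiB file splits into four chunks, and `assign_chunks(4, ["peer-a", "peer-b"])` asks each of two hypothetical peers for two of them: `{"peer-a": [0, 2], "peer-b": [1, 3]}`. Changing even one byte of the file changes `file_id`, which is why near-identical files look unrelated to a lookup keyed on the whole-file hash.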

David Andersen, a professor of computer science at Carnegie Mellon University, worked with the Purdue group to develop similarity-enhanced transfer (SET), a way to enlarge the pool of uploaders. The approach takes advantage of the multiple variants of the same music files, video clips, and software that often float around file-distribution networks. "We hope that SET gives you access to a larger pool of people to download from," says Andersen. "And by doing so, we think you're more likely to find one of these people who have more spare capacity."
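The core idea of a larger uploader pool can be sketched by indexing sources per chunk rather than per file. This is a simplified illustration of the general idea, not SET's actual protocol: it assumes every published file advertises a list of chunk hashes, and the function names, file names, and chunk labels below are hypothetical.

```python
from collections import defaultdict


def build_chunk_index(published: dict[str, list[str]]) -> dict[str, set[str]]:
    """Invert {file_name: [chunk_hash, ...]} into {chunk_hash: {file_name, ...}}."""
    index: dict[str, set[str]] = defaultdict(set)
    for name, hashes in published.items():
        for h in hashes:
            index[h].add(name)
    return index


def exact_match_sources(target: list[str], published: dict[str, list[str]]) -> set[str]:
    """Whole-file matching: only files whose chunk list is identical to the target."""
    return {name for name, hashes in published.items() if hashes == target}


def sources_per_chunk(target: list[str], index: dict[str, set[str]]) -> list[set[str]]:
    """Chunk-level matching: for each target chunk, every file that contains it."""
    return [index.get(h, set()) for h in target]


if __name__ == "__main__":
    published = {
        "song.mp3": ["c1", "c2", "c3"],
        "song_misspelled.mp3": ["c1", "c2", "c9"],  # same audio, different tag
        "other.mp3": ["c7"],
    }
    target = ["c1", "c2", "c3"]
    print(exact_match_sources(target, published))
    print(sources_per_chunk(target, build_chunk_index(published)))
```

In this toy example, whole-file matching finds a single source, while chunk-level matching finds two sources for each of the first two chunks. The misspelled variant cannot supply the last chunk, but it still adds upload capacity for most of the file.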

Before Andersen and his colleagues conducted their study, it was not at all clear how much redundancy existed in file-sharing networks or whether it could be exploited, says Cornell University computer scientist Emin Gün Sirer, who was not involved in the study. The SET team analyzed almost two terabytes of music and video files from file-sharing networks and found that similar files, meaning different variants of the same song or video, typically shared anywhere between 20 and 99 percent of their content. With music files, even a misspelling in the user-defined header labels that identify the artist and song title changes the file's hash, so BitTorrent treats the two files as unrelated even though 99 percent of their content is the same. Similarly, multiple versions of the same video are often available with different language tracks.
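The misspelled-tag case can be made concrete by comparing the whole-file hash with the fraction of shared chunks. The sketch below is hypothetical throughout: the "audio" is synthetic bytes, the tag mimics an ID3v1-style 128-byte block appended at the end of the file, and `shared_fraction` with a 4 KiB chunk size is our own simple similarity measure, not the one used in the SET study.

```python
import hashlib


def chunk_hashes(data: bytes, chunk_size: int) -> list[str]:
    """SHA-1 digest of each fixed-size chunk (the last chunk may be shorter)."""
    return [hashlib.sha1(data[i:i + chunk_size]).hexdigest()
            for i in range(0, len(data), chunk_size)]


def shared_fraction(a: bytes, b: bytes, chunk_size: int = 4096) -> float:
    """Fraction of a's chunks whose digest also appears among b's chunks."""
    ha = chunk_hashes(a, chunk_size)
    hb = set(chunk_hashes(b, chunk_size))
    return sum(h in hb for h in ha) / len(ha)


# Synthetic "audio": 100 distinct 4 KiB chunks of deterministic bytes.
audio = bytes(i % 251 for i in range(100 * 4096))

# Two copies that differ only in a 128-byte trailing tag (one letter misspelled).
song_a = audio + b"TAG" + b"Artist: Beatles".ljust(125)
song_b = audio + b"TAG" + b"Artist: Beetles".ljust(125)

if __name__ == "__main__":
    print(hashlib.sha1(song_a).hexdigest() == hashlib.sha1(song_b).hexdigest())
    print(shared_fraction(song_a, song_b))
```

Here the whole-file hashes disagree, yet 100 of the 101 chunks match. A tag that grows or shrinks at the start of a file (as ID3v2 tags can) would shift every fixed-size chunk boundary and defeat this measure; content-defined chunking, which places boundaries based on the bytes themselves, avoids that problem.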


Comments

  • Brilliant..
    stradric on 04/19/2007 at 12:53 PM
    Posts: 19, Avg Rating: 4/5
    This is pretty brilliant.  Couldn't the same technique be applied to file storage for things like backups?  It may exist already, but I'm not aware.

    I take one issue with this article, however.  I rarely have a negative experience with my BitTorrent downloads, and very often I get speeds over 700kb/sec for the duration of the download.  Perhaps you're not visiting the right BitTorrent sites.
    • Brilliant... but...
      maggie on 04/19/2007 at 1:26 PM
      Posts: 1
      That's because this statement

      "Let's face it: peer-to-peer file transfers on the Internet are slow. More than half of all downloads fail, and the average transfer time for a 100-megabyte file is more than 24 hours."

      relies on a study done over four years ago, one focusing on Kazaa and Gnutella rather than BitTorrent and highlighting their non-Zipf characteristics... a characteristic BitTorrent _does_ have.

      This method, as complicated as it would be to put into practice, could be made to 'enhance' BitTorrent or a similar system, but it is not something to replace it.

      Sorry, Brendan, you don't earn the research cookie today.
    • Re: Brilliant..
      ableix on 04/20/2007 at 7:04 PM
      Posts: 1
      This technology is already being used in backups, as you suggest. EMC's Avamar "commonality factoring" technology does exactly this. It divides all the files that it backs up into chunks and hashes them; on each subsequent backup, it compares the hashes of the chunks it has already stored with those of the files it is currently backing up and eliminates the redundancy. It claims to achieve up to a 95% reduction in backup size and time.
  • [no subject]
    anton on 04/22/2007 at 1:07 AM
    Posts: 7, Avg Rating: 5/5
    "Let's face it: peer-to-peer file transfers on the Internet are slow."

    No, they are not. With BitTorrent, downloads from a typical swarm with a few hundred participants easily max out any broadband connection you might have. Since the author has apparently never used BitTorrent or other current P2P apps, he should at least have checked the literature on theoretical models and practical measurements of current file-sharing systems like BitTorrent.
    • Re:
      bug_me_not on 04/22/2007 at 7:46 AM
      Posts: 6, Avg Rating: 3/5
      This is true for popular and well-seeded torrents. Rarer, older, or larger torrents dry up quite frequently. This is because BitTorrent was designed for fast mass leeching, not for seeding many files over a long period. Similar files, or even the same file on other, more sustainable networks (ed2k), could be a good way to heal suffering torrents. I've done this by hand already.
  • Logic?
    bug_me_not on 04/22/2007 at 7:39 AM
    Posts:
    6
    Avg Rating:
    3/5
    "discovered that similar files typically shared anywhere between 20 and 99 percent of their content" This looks like perfect circular logic to me. Isn't similarity defined by shared content? Please rephrase what you were trying to say in more scientific terms.

    And by "user-defined header labels" in music files you probably mean ID3 metadata? Um, they sit at the _tail_ of the file.
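Commenter ableix's description of backup deduplication (split files into chunks, hash them, keep only chunks not seen before) can be sketched in Python as below. This is a generic illustration, not EMC Avamar's actual "commonality factoring" algorithm. The content-defined chunker (a polynomial rolling hash over a 16-byte window, cutting wherever its low 10 bits are zero), its parameters, and the `ChunkStore` class are all our own hypothetical choices. Content-defined boundaries also bear on the point bug_me_not raises about where music metadata sits: because boundaries depend on the bytes rather than on fixed offsets, an edit at the start of a file does not shift every later chunk.

```python
import hashlib

WINDOW = 16          # bytes covered by the rolling hash (illustrative)
MASK = 0x3FF         # cut when the low 10 bits are zero: roughly 1 KiB per cut
MIN_CHUNK = 256      # never cut before this many bytes (must be >= WINDOW)
BASE = 257
MOD = (1 << 61) - 1  # Mersenne prime modulus for the polynomial hash


def cdc_chunks(data: bytes) -> list[bytes]:
    """Content-defined chunking: cut where a rolling hash of the last WINDOW bytes matches MASK."""
    chunks, start, h = [], 0, 0
    top = pow(BASE, WINDOW - 1, MOD)  # weight of the byte about to leave the window
    for i, byte in enumerate(data):
        if i >= WINDOW:
            h = (h - data[i - WINDOW] * top) % MOD  # drop the oldest byte
        h = (h * BASE + byte) % MOD                 # add the newest byte
        # MIN_CHUNK >= WINDOW guarantees the window is full before any cut.
        if i + 1 - start >= MIN_CHUNK and (h & MASK) == 0:
            chunks.append(data[start:i + 1])
            start = i + 1
    if start < len(data):
        chunks.append(data[start:])
    return chunks


class ChunkStore:
    """Keeps one copy of each distinct chunk, keyed by its SHA-256 digest."""

    def __init__(self) -> None:
        self.chunks: dict[str, bytes] = {}

    def backup(self, data: bytes) -> tuple[list[str], int]:
        """Store a file; return its recipe (list of chunk digests) and the bytes newly written."""
        recipe, new_bytes = [], 0
        for chunk in cdc_chunks(data):
            digest = hashlib.sha256(chunk).hexdigest()
            if digest not in self.chunks:  # only unseen chunks cost storage
                self.chunks[digest] = chunk
                new_bytes += len(chunk)
            recipe.append(digest)
        return recipe, new_bytes

    def restore(self, recipe: list[str]) -> bytes:
        """Reassemble a file from its recipe."""
        return b"".join(self.chunks[d] for d in recipe)
```

Backing up a file and then a copy with a few bytes prepended typically writes only the first chunk or two a second time, since every later chunk is already in the store. A minimum chunk size keeps the store from filling with tiny fragments; production systems usually add a maximum size as well.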