Let’s face it: peer-to-peer file transfers on the Internet are slow. More than half of all downloads fail, and the average transfer time for a 100-megabyte file is more than 24 hours. But now, a team of computer scientists led by Himabindu Pucha at Purdue University, in Indiana, says that it can double the speed of these transfers by taking advantage of overlap in the data chunks contained within nonidentical multimedia files posted on peer-to-peer distribution networks. Faster transfers would also be more likely to succeed.
Peer-to-peer distribution networks such as BitTorrent and Kazaa allow people to download individual files from others’ computers. These systems first locate copies of the requested file in the network’s global lookup table using its “hash,” a unique identifier computed from the file’s data sequence. Then, the file is divided into chunks so that each user’s computer only has to upload a small piece of it. This technique speeds up file transfers because home users typically have more bandwidth allocated to downloads than to uploads. Of course, the overall speed of the transfer will depend on the number of file sources and how much spare upload capacity they have. The more popular a file is, the faster it downloads and the greater the chance of success.
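To make the hash-and-chunk idea concrete, here is a minimal Python sketch. It is an illustration only, not the code of BitTorrent, Kazaa, or any other real client: the function names, the 16-kilobyte chunk size, and the use of SHA-1 are all assumptions chosen for the example.

```python
import hashlib

# Hypothetical chunk size; real networks choose their own values.
CHUNK_SIZE = 16 * 1024


def file_id(data: bytes) -> str:
    """Return an identifier computed from every byte of the file."""
    return hashlib.sha1(data).hexdigest()


def split_into_chunks(data: bytes, chunk_size: int = CHUNK_SIZE) -> list:
    """Cut the file into fixed-size pieces; the last piece may be shorter."""
    return [data[i:i + chunk_size] for i in range(0, len(data), chunk_size)]


# A made-up 102,400-byte "file" standing in for a real download.
data = bytes(range(256)) * 400
chunks = split_into_chunks(data)

print("file id:", file_id(data))
print("number of chunks:", len(chunks))
```

Because the identifier depends on every byte, changing even one byte of the file produces a different identifier, while most of the individual chunks stay exactly the same.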
David Andersen, a professor of computer science at Carnegie Mellon University, worked with the Purdue group to develop similarity-enhanced transfer (SET), a way to increase the size of the pool of uploaders. The approach takes advantage of the multiple variants of the same music files, video clips, and software that are often floating around file-distribution networks. “We hope that SET gives you access to a larger pool of people to download from,” says Andersen. “And by doing so, we think you’re more likely to find one of these people who have more spare capacity.”
Before Andersen and his colleagues conducted their study, it was not at all clear how much redundancy existed in file-sharing networks and whether it could be exploited, says Cornell University computer scientist Emin Gün Sirer, who was not involved in the study. The SET team analyzed almost two terabytes of music and video files from file-sharing networks, and it discovered that similar files typically shared anywhere between 20 and 99 percent of their content. With music files, even misspellings in user-defined header labels that identify artist and song titles are enough to throw off BitTorrent, despite the fact that 99 percent of the file is the same. Similarly, multiple versions of the same video are often available with different language tracks.
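The music-file case can be sketched in a few lines of Python. This is a toy illustration under stated assumptions, not a description of how SET itself detects similarity: the two "songs" are invented, the 1-kilobyte chunk size and the helper names are hypothetical, and the header is padded to a fixed length so that a misspelling does not shift the bytes that follow it. Files whose differences change their length would need a more careful way of choosing chunk boundaries.

```python
import hashlib

# Hypothetical chunk size, kept small so the example stays tiny.
CHUNK_SIZE = 1024


def chunk_hashes(data: bytes, chunk_size: int = CHUNK_SIZE) -> list:
    """Hash each fixed-size chunk of the file separately."""
    return [
        hashlib.sha1(data[i:i + chunk_size]).hexdigest()
        for i in range(0, len(data), chunk_size)
    ]


def shared_fraction(a: bytes, b: bytes) -> float:
    """Fraction of a's chunks whose hash also appears among b's chunks."""
    hashes_a = chunk_hashes(a)
    hashes_b = set(chunk_hashes(b))
    if not hashes_a:
        return 0.0
    return sum(h in hashes_b for h in hashes_a) / len(hashes_a)


# Invented audio payload: 99 distinct chunks of counter bytes.
payload = b"".join(i.to_bytes(4, "big") for i in range(25_344))

# Two headers that differ only by a misspelling, padded to one chunk each.
header_a = b"Artist: The Beatles".ljust(CHUNK_SIZE, b"\x00")
header_b = b"Artist: The Beetles".ljust(CHUNK_SIZE, b"\x00")

file_a = header_a + payload
file_b = header_b + payload

whole_a = hashlib.sha1(file_a).hexdigest()
whole_b = hashlib.sha1(file_b).hexdigest()

print("whole-file hashes match:", whole_a == whole_b)
print("shared chunk fraction:", shared_fraction(file_a, file_b))
```

In this toy setup the whole-file hashes differ, so a lookup keyed on them treats the two files as unrelated, yet 99 of the 100 chunks are identical and could in principle be fetched from someone holding either version.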