Technology Review - Published By MIT
Advertisement

Fingerprinting Your Files

Continued from page 1

By Simson Garfinkel

8/04/2004

smaller text tool iconmedium text tool iconlarger text tool icon

The Incredibly Useful Hash

Cryptographic hash functions are one of the fundamental building blocks of todays digital economy. Nevertheless they remain in many ways a mysteryboth to the cryptographers who create them, and to the general public that uses them every day.

Hash functions are sometimes called fingerprint functions because they can be used to create a unique fingerprint of a digital file. The fingerprints are usually 128-bit or 160-bit numbers that are displayed as a sequence of hexadecimal digits. The fingerprint of my name using the MD5 system, for example, is c55bbe0f3ba258f5b1cb6d5b62b0b360. Hash functions are designed so that, in theory at least, no two files will ever hash to the same value.

So that you can get an idea of how these fingerprinting functions work, we've embedded a JavaScript-based MD5 calculator below. Just type in some text and you can see the MD5 hash. Notice how it completely changes every time you add, remove, or change a letter. The manner that the fingerprint changes is unpredictableindeed, if we could predict how it changes, then file fingerprints wouldn't be very useful.

Enter your text below:

The MD5 is:

Most of the hash functions used today are based on a technique developed by MIT professor Ron Rivest in the 1980s. (Rivest is probably best known for being the R in the RSA encryption algorithm, the public key encryption algorithm thats built into practically every Web browser.) At the time, Rivest and other mathematicians were working out the details of the basic cryptographic operations that we now take for granted. The hash functions were envisioned as a kind of cryptographic compression systema way to take a large file and crunch it down to a short string of letters and numbers.

The idea was to use these fingerprints as a kind of surrogate for the files themselves. Instead of digitally signing the entire file, Rivest and others reasoned, you could digitally sign the hash. Because public-key cryptography involves a lot of heavy-duty math, hash functions make it almost as fast to sign an extremely long file as to sign a short file.

One of the most basic things that you can do with a hash function is to find out if a file has changed: just calculate the hash of a file and write it down. Later on you calculate the hash again. If the hash hasnt changed, then the odds are overwhelming that the file hasnt changed either.

For example, say that you keep the finances of your small business using QuickBooks and you want to go on vacation for a few days: people need to use your computer but you want to make sure that nobody modifies the QuickBooks data. One simple thing that you can do is compute the cryptographic hash of the file before you leave and write the number on an index card. When you come back from vacation, just re-compute the hash. If the two values dont match, you know that the file has been tampered with.

Of course, you dont need to stop with just one file. You could compute the cryptographic hash of every file on your computer and put them all into a new filecall that file hashes.txt. You could then compute the hash of hashes.txt and write this fingerprint on your note card. Repeat the process when you come back from vacation and youll have a fast way of knowing if any file on your entire computer has changed. (You wont have any way of knowing which file has changed, but thats a different problem.)

This idea of computing the hash of a hash is the basis of an intrusion detection system called Tripwire that Purdue University computer science professor Gene Spafford and his graduate student Gene Kim invented back in the early 1990s. (Spafford and I have co-authored five books on computer science.) Today, many different programs use this Tripwire approach to assure the integrity of computer files and databases.

Computing hashes of hashes is also the basis of a secure timestamp service invented by Stuart Haber and Scott Stornetta while the two were at Bellcore in 1990. The service, called Surety, makes it possible to generate a cryptographically secure and unforgeable proof that a given document, photograph, or other file existed at a particular time on a particular date and that it hasnt been changed since.

The Surety technique works by computing a hash tree based on the hash codes of every document being time-stamped. The root of the tree is then published in a well-known locationit could, for example, be printed in a classified advertisement in the New York Times. You can prove that your document existed on the day in question by showing that your documents fingerprint was needed to generate the fingerprint-of-fingerprints that appeared in the newspaper.

Other companies and even the U.S. Postal Service have since created their own electronic time-stamp service. But all of these systems rely on an organization that acts as a trusted third-party that in effect signs your document using their private key. The problem with this approach is that the third-party needs to be completely trustworthy: if that third-party decides to create a signature with the wrong date, or some hacker manages to steal the third-partys private key, there is no way to tell a fraudulent signature from a valid one. Its also possible to create fraudulent Surety signatures, of course, but you would need to either go back in time and change what was printed in The New York Times, or else travel all over the world, find every copy that was printed, and change the old fingerprint-of-fingerprints to the new one.

Comments

  • file
    a javascript file md5 generator?
    Rate this comment: 12345
    Guest (8cf14c1ca9280af0e8525011007c2404)
    12/30/2005
    Posts:1
  • file
    a javascript file md5 generator?
    Rate this comment: 12345
    Guest (8cf14c1ca9280af0e8525011007c2404)
    12/30/2005
    Posts:1

Log In

Forgot your password?     Register »
Advertisement

Videos

White Matter
Technology Review November/December 2009

Current Issue

Natural Gas Changes the Energy Map
The United States has vast supplies of this cleaner fossil fuel. But how should we use it?
Featured Content
Sponsored by:
White Papers

Twelve ways to reduce costs with SQL Server 2008
Find out how to reduce costs and get more efficient

Download

Total Economic Impact of SQL Server 2008 Upgrade
Forrester reports on increasing productivity and management capabilities

Download 

Achieving Cost and Resource Savings with UC
How Office Communications Server R2 and Exchange Server can make your business smarter and more efficient

Download 

The Compelling Case for Conferencing
Read how you can improve workload support and find IT efficiencies

Download

How Windows Server 2008 R2 Helps Optimize IT and Save you Money
Read how you can improve workload support and find IT efficiencies

Download

Windows Server 2008 R2 Hyper-V Live Migration
See how Windows Server 2008 R2 and Hyper-V enable virtualization and Live Migration

Download
Advertisement
Subscribe to Technology Review's daily e-mail update. Enter your e-mail address

TECHNOLOGY RESOURCES
Advertisement
MIT Massachusetts Institute of Technology © 2009 Technology Review. All Rights Reserved.