Faster Computation Will Damage the Internet’s Integrity

An Intel researcher warns that SHA-1, a commonly used cryptographic algorithm, will be easy to crack by 2018.

Glenn Fleishmanarchive page

October 8, 2012

A security researcher says that the end is near for a cryptographic routine commonly used to protect the integrity of secure Web transactions, stored passwords, and hundreds of other purposes. By 2018, writes Intel’s Jesse Walker on a mailing list devoted to this form of cryptographic protection, a criminal organization could easily afford the cost of, in essence, forging the signature on critical security documents using commodity computing hardware. By 2021, he says, an academic group could afford to own or rent the necessary processing time. The good news is that there’s ample time for the thousands of bits of software and millions of organizations to move to a better process.

The algorithm in question is SHA-1, where SHA stands for Secure Hash Algorithm, the second in a series of such routines published by the National Institute for Standards and Technology. A hash has a critical purpose for digital certificates, which validate Web servers and other secure services, and for protecting passwords.

A hash results from taking a set of input text, which can be as short as a password, and running through an enormous number of mathematical operations. The short sequence of digits that makes up a hash has no seeming relationship to the starting text, although feed the same text in and the hash is always the same. Vary the text slightly, changing just a single bit anywhere in it (such as the letter “B” becoming the letter “C”), and the hash is dramatically and unpredictably different.

For instance, take the phrase “My dog has fleas” and run it through a program that creates SHA-1 hashes, and you get: “bea0ce15a981ca655bef4679e2dbe91d24a378d6”. Change “dog” to “cog” and the output is “0dfba958c7ae2fca75aaaf75d300fb846acccd5c”.

A hash is designed to be irreversible: once the text goes into the grinder, the output hash can’t be used to determine the starting text. Brute force methods do work, as has been shown many times in the last few years, and most recently with the leak of a LinkedIn password database. Crackers can take enormous dictionaries of words that commonly occur in passwords, run them one at a time through a SHA-1 generator, and then compare the resulting hashes against the hashes that were stored in a database. (There are ways to add randomness to text before it’s hashed that well-designed password systems rely on, and which LinkedIn didn’t employ.)

Brute force only works if the starting text can be guessed, or, as with passwords, the starting text is insufficiently random. Choices like “password” or “12345678” may be guessed, but combinations like “3y3balls” (where the “3” replaces an “e”) are easily matched using similar substitutions as well. A longer password (often 9 to 12 characters, depending on other factors) that doesn’t contain dictionary words or even has a single piece of punctuation in it becomes impossible to crack by trying possibilities. The digital certificates used by Web servers, for instance, rely on public-key cryptography, which rely on keys that can’t be brute-forced.

But there’s another approach that renders cracking a password unnecessary—and this is the issue Intel’s Walker is worried about. In this alternate approach, a cracker tries to arrange a “collision,” which is the discovery of another set of input text that, when it passes through the hashing algorithm, produces exactly the same string. It would be as if “My dog has fleas” and “Yo momma insufficiently thin” resulted in the same stream of SHA-1 digits. Such a weakness in SHA-1 has been known since 2005, but Walker reckons it’ll be practical to exploit collisions by 2018.

The utility of a collision is that a cracker can perform an enormous amount of calculation trying different input texts until the resulting output hash is the same, and then use that collided text to gain access. If my password for a service is “ads09f8a0ds9f80adflka!!” and “klioeuroiur&&730_” works just as well, the cracker simply enters the latter, the hash matches, and he or she gains access.

Passwords aren’t the brass ring here, though. What criminals and nation-states would use collisions for is to impersonate Web and other secured servers. The digital certificates used for SSL/TLS transactions for HTTPS, for instance, rely on public-key encryption, the gold standard for protecting a private key, and hashes that can only be produced by the possessor of that private key.

In certificate attacks in the last few years, parties that may be allied with central governments have used social engineering or flaws in security to obtain Web-server documents that are absolutely legitimate, but were generated illegitimately. With a collision, an impersonator doesn’t need to go that far. A document could be generated that a Web browser would accept as correct because the input text provided, when run through the SHA-1 routine, produced the same hash. A collision weakness with an earlier hashing routine, MD5, that sadly remains in wide use within companies, allowed the Flame espionage malware discovered earlier this year to use exactly this exploit to infect computers.

The knowledge of weaknesses in SHA-1 led the National Security Agency (NSA) to develop and NIST to release SHA-2, which so far hasn’t been shown to be susceptible to the same flaws. On October 2, NIST announced the selection for the SHA-3, designed by four European cryptographers, which came from an open call in 2007 for competing algorithms. NIST recommends, but the U.S. government doesn’t yet require, switching to SHA-2.

Walker’s post spells out the timeframe left in which it is too expensive for all but governments to create collisions in SHA-1, and that’s just a few years. His calculations include a number of assumptions that make it conservative, too: new weaknesses, a faster pace than Moore’s Law in an increase in calculation speed, or a reduction in price by Amazon as a bellwether for on-demand computing time would bring SHA-1’s effective demise even closer. (Walker’s details are in a post on an open list which strangely has locked archives. Security guru Bruce Schneier reproduced it on his blog.)

It would seem logical that Web site operators and software makers would have moved to use SHA-2 everywhere already, just as everyone moved from MD5 and SHA-0 (two older, broken methods) when that was necessary. The trouble has been backwards compatibility. Windows XP didn’t add support for any form of SHA until Service Pack 3 in 2008; Java 1.4.2 is needed for SHA-2 on servers; and other proprietary software may still lag behind.

There’s a related problem for passwords, too. Because passwords are hashed and not stored in plain text, a site that wants to upgrade its database from SHA-1 to SHA-2 (or to a more sensible function that has even better properties for secure password storage, like bcrypt) has to have every user log in to make the swap. (A site could offer “hash amnesty,” asking all users to log in over, say, 60 days, and then delete the passwords and require a reset for those who want to access their data after that point.)

All digital certificates have to be reissued, too. For certificates signed by external parties, like those used with Web and email servers, they tend to be renewed every year or two, making upgrades easier.

The end of SHA-1 will likely come with a bang, not a whimper. Walker’s rough path to SHA-1 becoming affordable to crack will likely be interrupted suddenly with a breakthrough that challenges all systems and software that didn’t make the leap. His post should be heeded as a call to action, not a distant kazoo.

Keep Reading

Large language models can do jaw-dropping things. But nobody knows exactly why.

And that's a problem. Figuring it out is one of the biggest scientific puzzles of our time and a crucial step towards controlling more powerful future models.