SHA-1 Explained: The Algorithm and SHAttered Break

How SHA-1 works internally — 80 rounds, message expansion, Merkle–Damgård — and how the 2017 SHAttered collision finally broke it. Why SHA-1 is deprecated.

SHA-1 was the workhorse hash function of the early web — embedded in TLS certificates, code-signing pipelines, and the very plumbing of Git. Today it is comprehensively broken, yet it remains worth understanding in depth: its internal structure is the clearest classical example of the Merkle–Damgård construction, and its downfall is one of the most instructive stories in applied cryptography. This article walks through how SHA-1 actually computes a digest, then traces the slow collapse that culminated in the 2017 SHAttered collision.

Origins: NSA, NIST, and FIPS 180-1

SHA-1 was designed by the U.S. National Security Agency and published by NIST in 1995 as part of FIPS 180-1. It was a corrected revision of the original SHA (now called SHA-0, from FIPS 180 in 1993), which the NSA quietly amended after spotting a weakness — the only change being a single one-bit rotation added to the message schedule. That tiny tweak meaningfully improved resistance to differential attacks, foreshadowing both the algorithm's strengths and the eventual nature of its break.

For more than a decade SHA-1 was the default general-purpose cryptographic hash. If you want the broader conceptual grounding before diving into the internals, the pillar guide on how cryptographic hashing works covers the properties — preimage resistance, second-preimage resistance, and collision resistance — that any hash function aims to provide.

The Big Picture: 160 Bits From Arbitrary Input

SHA-1 maps a message of any length (up to 2⁶⁴ − 1 bits) to a fixed 160-bit digest, usually written as 40 hexadecimal characters. Internally it processes the message in 512-bit blocks and maintains a 160-bit state expressed as five 32-bit working variables, conventionally labelled a, b, c, d, and e.

The state is initialized to five fixed 32-bit constants (h0 through h4). Each 512-bit block is fed through a compression function that scrambles the current state, and the result is added back into the running state. After the last block, the concatenation of the five variables is the digest.
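As a quick sanity check of these sizes, the sketch below uses Python's standard `hashlib` module. The initial values are the ones listed in FIPS 180; the names `INITIAL_STATE`, `state_bits`, `digest_bits`, and `hex_digest` are illustrative only.

```python
import hashlib

# Initial chaining values h0..h4 as listed in FIPS 180 (five 32-bit words).
INITIAL_STATE = (0x67452301, 0xEFCDAB89, 0x98BADCFE, 0x10325476, 0xC3D2E1F0)
state_bits = 32 * len(INITIAL_STATE)

digest = hashlib.sha1(b"abc").digest()
digest_bits = len(digest) * 8
hex_digest = digest.hex()

print(state_bits, digest_bits)  # 160 160
print(hex_digest)               # a9993e364706816aba3e25717850c26c9cd0d89d
print(len(hex_digest))          # 40
```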

Merkle–Damgård and Padding

SHA-1 follows the Merkle–Damgård construction: a fixed-input-size compression function is iterated over the message blocks, chaining the output of one block in as the input state of the next. This design lets a single 512-bit compression function handle messages of any length, and it transfers the collision resistance of the compression function to the full hash — provided the function itself stays secure.

Before processing, the message is padded so its length is a multiple of 512 bits. The padding scheme is deterministic and unambiguous:

  1. Append a single 1 bit.
  2. Append 0 bits until the length is congruent to 448 modulo 512.
  3. Append the original message length as a 64-bit big-endian integer, filling the final 64 bits of the block.
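The three steps above can be sketched in Python as follows. This is a minimal illustration that assumes the message is a whole number of bytes (the standard also allows arbitrary bit lengths); the function name `sha1_pad` is hypothetical.

```python
import struct

def sha1_pad(message: bytes) -> bytes:
    """Pad a byte string to a multiple of 64 bytes (512 bits)."""
    bit_length = len(message) * 8
    # Step 1: a single 1 bit, which for byte-aligned input is the byte 0x80.
    padded = message + b"\x80"
    # Step 2: zero bytes until the length is 56 mod 64 (448 mod 512 bits).
    padded += b"\x00" * ((56 - len(padded) % 64) % 64)
    # Step 3: the original length in bits as a 64-bit big-endian integer.
    padded += struct.pack(">Q", bit_length)
    return padded

print(len(sha1_pad(b"abc")))       # 64
print(len(sha1_pad(b"a" * 56)))    # 128: the length field no longer fits in one block
```

A 55-byte message is the longest that still fits in a single block; one more byte forces a second block that holds only padding and the length.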

Encoding the length in the padding (often called Merkle–Damgård strengthening) is what blocks trivial extension and fixed-point tricks. Note, however, that the Merkle–Damgård structure still leaves SHA-1 vulnerable to length-extension attacks: given H(m) and the length of m, an attacker can compute H(m ‖ pad ‖ x) without knowing m. SHA-1 shares this property with MD5 and SHA-256, and it is one reason keyed constructions such as HMAC are used instead of a bare H(key ‖ message).

The Message Schedule: Expanding 16 Words to 80

Each 512-bit block is split into sixteen 32-bit words, W[0] through W[15]. The compression function needs eighty words, so the remaining sixty-four are derived from the first sixteen. This is the message schedule, and it is precisely where SHA-1 differs from its broken predecessor.

Each new word is the XOR of four earlier words, rotated left by one bit. That single rotation (absent in SHA-0) is what spreads differences across word positions and was the NSA's fix:

for t = 16 to 79:
    W[t] = ROTL_1( W[t-3] XOR W[t-8] XOR W[t-14] XOR W[t-16] )

ROTL_1 is a circular left shift by one position within the 32-bit word. The recurrence guarantees that every expanded word depends, directly or transitively, on many of the original sixteen, diffusing input changes throughout the eighty rounds.
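A runnable version of the recurrence might look like the sketch below. The helper names `rotl` and `expand_block` are illustrative, and the block is assumed to be exactly 64 bytes read as big-endian words.

```python
import struct

def rotl(x: int, n: int) -> int:
    """Rotate a 32-bit word left by n bits."""
    return ((x << n) | (x >> (32 - n))) & 0xFFFFFFFF

def expand_block(block: bytes) -> list:
    """Expand one 64-byte block into the 80-word message schedule."""
    if len(block) != 64:
        raise ValueError("SHA-1 blocks are exactly 64 bytes")
    w = list(struct.unpack(">16I", block))  # W[0..15], big-endian
    for t in range(16, 80):
        w.append(rotl(w[t - 3] ^ w[t - 8] ^ w[t - 14] ^ w[t - 16], 1))
    return w

# Set only the lowest bit of W[0] and count how many schedule words it touches.
schedule = expand_block((1).to_bytes(4, "big") + bytes(60))
print(schedule[16])                    # 2: W[0] rotated left by one bit
print(sum(1 for w in schedule if w))   # number of non-zero words out of 80
```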

Eighty Rounds in Four Groups of Twenty

The compression function runs 80 rounds, divided into four groups of 20. Each group uses its own round function and its own round constant K. The round functions are:

  • Rounds 0–19, Ch (choose): (b AND c) OR ((NOT b) AND d). Variable b selects bitwise between c and d.
  • Rounds 20–39, Parity: b XOR c XOR d. A pure bitwise parity with no nonlinearity.
  • Rounds 40–59, Maj (majority): (b AND c) OR (b AND d) OR (c AND d). Each output bit is the majority vote of the three inputs.
  • Rounds 60–79, Parity again: b XOR c XOR d.

Each group has its own 32-bit additive constant K, four in total. Each constant is the integer part of 2³⁰ times the square root of 2, 3, 5, or 10 respectively, giving "nothing-up-my-sleeve" values whose simple derivation leaves little room for a hidden backdoor. The sequence Ch, Parity, Maj, Parity alternates nonlinear and linear round functions across the 80 rounds.
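The round functions and constants can be sketched in a few lines of Python. The names `f`, `K`, and `MASK` are illustrative; the constants are computed here as the integer part of 2³⁰ times each square root, which reproduces the hexadecimal values listed in FIPS 180. `math.isqrt` requires Python 3.8 or later.

```python
import math

MASK = 0xFFFFFFFF

def f(t: int, b: int, c: int, d: int) -> int:
    """Round function for round t (0..79), applied to 32-bit words."""
    if t < 20:                       # Ch: b selects between c and d
        return ((b & c) | (~b & d)) & MASK
    if 40 <= t < 60:                 # Maj: bitwise majority vote
        return (b & c) | (b & d) | (c & d)
    return b ^ c ^ d                 # Parity (rounds 20-39 and 60-79)

# isqrt(n * 2**60) is exactly floor(sqrt(n) * 2**30), with no float rounding.
K = [math.isqrt(n << 60) for n in (2, 3, 5, 10)]

print([hex(k) for k in K])
# ['0x5a827999', '0x6ed9eba1', '0x8f1bbcdc', '0xca62c1d6']
```

The constant for round t is then `K[t // 20]`.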

The Round Operation: Rotate and Add

Within each round the five working variables are updated using the current message word W[t], the round function f, and the round constant K. The defining operation is a rotate-and-add followed by a shift of the variables:

temp = ROTL_5(a) + f(b, c, d) + e + W[t] + K
e = d
d = c
c = ROTL_30(b)
b = a
a = temp

All additions are modulo 2³². ROTL_5(a) mixes a rotated copy of the most recently computed variable into the new value, while ROTL_30(b) rotates b before it becomes the new c. The remaining variables simply shift down one slot. Because only a is freshly computed each round, a value that enters at a needs four more rounds to reach e; the cryptanalytic difficulty lies in tracking exactly how differences flow through these rotations and additions.
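To make the variable shuffle concrete, here is a minimal sketch of a single round as a pure function. The name `sha1_round` and the choice to pass the round-function output, message word, and constant as arguments are assumptions made for illustration.

```python
MASK = 0xFFFFFFFF

def rotl(x: int, n: int) -> int:
    """Rotate a 32-bit word left by n bits."""
    return ((x << n) | (x >> (32 - n))) & MASK

def sha1_round(state, f_out: int, w_t: int, k_t: int):
    """Apply one round to (a, b, c, d, e) and return the new tuple."""
    a, b, c, d, e = state
    temp = (rotl(a, 5) + f_out + e + w_t + k_t) & MASK  # addition mod 2**32
    # Only the first slot is new; the rest shift down, with b rotated by 30.
    return (temp, a, rotl(b, 30), c, d)

# With f, W[t] and K all zero, the shift pattern is easy to see.
print(sha1_round((1, 2, 3, 4, 5), 0, 0, 0))
# (37, 1, 2147483648, 3, 4)
```

In the printed result, 37 is ROTL_5(1) + 5, the old a has moved to b, and the old b reappears in c rotated by 30 bits.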

Feed-Forward: Chaining the Blocks

After the 80th round, the compression function performs its feed-forward: it adds the five working variables back into the block's starting state, modulo 2³².

h0 = h0 + a
h1 = h1 + b
h2 = h2 + c
h3 = h3 + d
h4 = h4 + e

This addition of the input state to the output is the Davies–Meyer-style step that makes the function hard to invert: without it, the round transformation would be a reversible permutation. The updated h0 through h4 become the input state for the next block. After the final block, concatenating h0 ‖ h1 ‖ h2 ‖ h3 ‖ h4 yields the 160-bit digest.
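Putting padding, message schedule, rounds, and feed-forward together gives a complete, if slow, SHA-1. The sketch below is for study only and assumes byte-aligned input; the names `compress` and `sha1` are illustrative, the constants are the FIPS 180 values, and the result is checked against Python's `hashlib`.

```python
import hashlib
import struct

MASK = 0xFFFFFFFF
H0 = (0x67452301, 0xEFCDAB89, 0x98BADCFE, 0x10325476, 0xC3D2E1F0)
K = (0x5A827999, 0x6ED9EBA1, 0x8F1BBCDC, 0xCA62C1D6)

def rotl(x, n):
    return ((x << n) | (x >> (32 - n))) & MASK

def compress(state, block):
    """Process one 64-byte block and return the updated 5-word state."""
    w = list(struct.unpack(">16I", block))
    for t in range(16, 80):                      # message schedule
        w.append(rotl(w[t - 3] ^ w[t - 8] ^ w[t - 14] ^ w[t - 16], 1))
    a, b, c, d, e = state
    for t in range(80):
        if t < 20:
            f = (b & c) | (~b & d)               # Ch
        elif 40 <= t < 60:
            f = (b & c) | (b & d) | (c & d)      # Maj
        else:
            f = b ^ c ^ d                        # Parity
        temp = (rotl(a, 5) + (f & MASK) + e + w[t] + K[t // 20]) & MASK
        a, b, c, d, e = temp, a, rotl(b, 30), c, d
    # Feed-forward: add the working variables into the incoming state.
    return tuple((h + v) & MASK for h, v in zip(state, (a, b, c, d, e)))

def sha1(message: bytes) -> bytes:
    padded = message + b"\x80"
    padded += b"\x00" * ((56 - len(padded) % 64) % 64)
    padded += struct.pack(">Q", len(message) * 8)
    state = H0
    for i in range(0, len(padded), 64):          # Merkle-Damgard chaining
        state = compress(state, padded[i:i + 64])
    return struct.pack(">5I", *state)

for sample in (b"", b"abc", b"a" * 1000):
    assert sha1(sample) == hashlib.sha1(sample).digest()
print(sha1(b"abc").hex())  # a9993e364706816aba3e25717850c26c9cd0d89d
```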

Cracks Appear: 2005 Onward

Theoretical trouble arrived early. In 2005, Xiaoyun Wang, Yiqun Lisa Yin, and Hongbo Yu published differential cryptanalysis showing a collision could be found in roughly 2⁶⁹ operations — far below the 2⁸⁰ expected from a 160-bit hash's birthday bound. Follow-up work pushed the estimate lower still, to around 2⁶³.
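The gap between these exponents is easy to underestimate, so a short calculation may help. The variable names are illustrative, and the figures are the ones quoted above.

```python
DIGEST_BITS = 160

birthday_log2 = DIGEST_BITS // 2   # generic collision search: about 2**80
wang_2005_log2 = 69                # estimate from the 2005 attack
refined_log2 = 63                  # later refined estimate

speedup_2005 = 2 ** (birthday_log2 - wang_2005_log2)
speedup_refined = 2 ** (birthday_log2 - refined_log2)

print(speedup_2005)     # 2048
print(speedup_refined)  # 131072
```

In other words, the 2005 result was about two thousand times cheaper than a generic birthday search, and the refined estimate roughly 131,000 times cheaper.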

These were theoretical attacks: no actual collision had been produced, and the computation remained out of practical reach for years. But the message to standards bodies was unambiguous — SHA-1's collision resistance was fundamentally compromised, and migration needed to begin. NIST formally deprecated SHA-1 for digital signatures in 2011.

SHAttered: The First Real Collision (2017)

In February 2017, researchers at CWI Amsterdam and Google announced SHAttered, the first published SHA-1 collision. They produced two distinct PDF files with identical SHA-1 digests but different displayed content. This was an identical-prefix collision: the two files share a common prefix, followed by carefully crafted near-collision blocks that drive the internal state to converge.

The computation was enormous: on the order of 2⁶³ SHA-1 evaluations, run on large CPU and GPU clusters. The team estimated the effort as equivalent to roughly 6,500 CPU-years plus 110 GPU-years of computation. While far beyond a hobbyist's budget, it was firmly within reach of a well-resourced organization, proving SHA-1 collisions were no longer hypothetical. The two colliding PDFs became the canonical demonstration that the algorithm was dead for any integrity purpose.

A Shambles: Chosen-Prefix Collisions (2020)

SHAttered's identical-prefix collision is powerful but limited: both files must share a fixed prefix. The far more dangerous variant is the chosen-prefix collision, where an attacker picks two arbitrary, different prefixes and appends colliding suffixes. This is the type of collision that made forged MD5-signed certificates possible a decade earlier.

In 2020, Gaëtan Leurent and Thomas Peyrin published "SHA-1 is a Shambles," the first practical chosen-prefix collision for SHA-1, at a cost low enough (an estimated tens of thousands of dollars in GPU time) to be realistic for many attackers. They demonstrated concrete impact against the PGP/GnuPG web of trust, crafting two PGP identity certificates that collided — allowing an attacker to forge a trusted signature. GnuPG responded by rejecting SHA-1 identity signatures created after a cutoff date.

Deprecation Everywhere

By the time SHAttered landed, the industry had already been moving away from SHA-1, but these attacks accelerated and finalized the retreat:

  • TLS certificates — Major browsers stopped trusting SHA-1-signed certificates from public certificate authorities by 2017; CAs had ceased issuing them.
  • Code signing — Platforms migrated signing and timestamping to SHA-256 and stronger; SHA-1 signatures are rejected or flagged.
  • Git — Git historically used SHA-1 to name every object and commit. SHAttered prompted Git to add collision-detection hardening (rejecting inputs that match known attack patterns) and to begin an ongoing transition to a SHA-256 object format. The migration is non-trivial precisely because the hash is woven into Git's content-addressable core.
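To see how directly the hash is tied to Git's object names, the sketch below reproduces a blob ID in Python for the classic SHA-1 object format. It assumes a blob is hashed as the header `blob <size>` followed by a NUL byte and then the content; the function name `git_blob_id` is hypothetical.

```python
import hashlib

def git_blob_id(content: bytes) -> str:
    """Return the SHA-1 object name Git assigns to a blob with this content."""
    header = b"blob %d\x00" % len(content)   # object type, size, NUL separator
    return hashlib.sha1(header + content).hexdigest()

print(git_blob_id(b""))  # e69de29bb2d1d6434b8b29ae775ad8c2e48c5391 (the empty blob)
```

In a repository using the SHA-1 object format, the output should match `git hash-object` for the same bytes.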

NIST has set 2030 as the deadline to fully retire SHA-1 across federal systems. For a side-by-side comparison of the modern alternatives and why they hold up, see MD5 vs SHA-256 vs SHA-3.

Recommendation: Do Not Use SHA-1 for Security

The verdict is unambiguous: SHA-1 must not be used for any security purpose. Both collision variants are practical, and chosen-prefix collisions enable real forgery against signatures and certificates. Use SHA-256 or SHA-3 for new work, and HMAC for authentication. SHA-1 survives only in narrow, non-security legacy contexts — Git object names guarded by collision detection, or checksums where an adversary is genuinely absent — and even there, migration is the right long-term answer.
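For new code, the recommended replacements are available in Python's standard library. The sketch below uses `hashlib.sha256` for integrity and `hmac` with SHA-256 for authentication; the function names and the sample key are illustrative assumptions, and real keys should come from a secure source.

```python
import hashlib
import hmac

def content_digest(data: bytes) -> str:
    """Integrity digest with SHA-256 instead of SHA-1."""
    return hashlib.sha256(data).hexdigest()

def make_tag(key: bytes, message: bytes) -> bytes:
    """Authentication tag: HMAC-SHA-256 rather than a bare hash of key + message."""
    return hmac.new(key, message, hashlib.sha256).digest()

def verify_tag(key: bytes, message: bytes, tag: bytes) -> bool:
    """Constant-time comparison avoids leaking how many bytes matched."""
    return hmac.compare_digest(make_tag(key, message), tag)

key = b"example-key-for-illustration-only"
tag = make_tag(key, b"release-1.0.tar.gz")
print(verify_tag(key, b"release-1.0.tar.gz", tag))   # True
print(verify_tag(key, b"release-1.0.tar.gz!", tag))  # False
```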

That said, you will still encounter SHA-1 when verifying legacy artifacts or interoperating with older systems. You can generate a SHA-1 hash in your browser with our free tool — it runs entirely client-side via Rust compiled to WebAssembly, so nothing you type is ever uploaded to a server.

Conclusion

SHA-1 is a beautifully clear illustration of Merkle–Damgård design: a 512-bit block, five 32-bit variables, an 80-word message schedule built from XORs and a single rotation, and eighty rounds of rotate-and-add across four round functions. It is also a cautionary tale. A weakness that was merely theoretical in 2005 became the SHAttered PDF collision in 2017 and a practical chosen-prefix attack by 2020. The lesson holds for every cryptographic primitive: "no known attack" is not "no possible attack," and deprecation timelines exist to get ahead of the curve. To experiment with SHA-1 and modern hashes safely and privately, try the in-browser hash generator — fast, local, and nothing leaves your machine.

Related articles

A deep, developer-focused guide to how cryptographic hash functions work — properties, Merkle–Damgård vs sponge constructions, the birthday bound, and where each family fits.
MD5 vs SHA-256 vs SHA-3 compared — output size, internal construction, speed, security status, and a clear decision guide for integrity, security, and password use cases.
How the MD5 hash algorithm works internally — Merkle–Damgård, the 64-step compression function, padding — and why MD5 is cryptographically broken yet still used for checksums.