BLAKE2 and BLAKE3 Explained: Fast, Modern Hashing

Most developers reach for SHA-256 out of habit and MD5 out of laziness. But there is a family of hash functions that is faster than MD5 on modern CPUs while offering security margins comparable to SHA-3: the BLAKE lineage. If you do content addressing, deduplication, or high-throughput integrity checks, BLAKE2 and BLAKE3 deserve a hard look. This post explains how they work, where their speed comes from, and when to choose them over the SHA standards.

A Short Lineage: From SHA-3 Finalist to BLAKE3

BLAKE began as a candidate in the NIST SHA-3 competition (2008–2012). It was one of the five finalists, alongside Keccak (which ultimately won and became SHA-3), Grøstl, JH, and Skein. BLAKE was widely praised for its strong security analysis and clean design, but Keccak's sponge construction offered a more dramatic departure from the Merkle–Damgård world that SHA-2 already occupied, and NIST valued that diversity.

The design was not abandoned. In 2012 a team led by BLAKE co-designer Jean-Philippe Aumasson released BLAKE2, a version optimized for speed in software on commodity CPUs. It trimmed the round count (from 16 to 12 for the 64-bit variant and from 14 to 10 for the 32-bit variant), simplified padding, removed some conservative-but-slow elements, and added practical features like a built-in keyed mode. In 2020 a team of Jean-Philippe Aumasson, Samuel Neves, Zooko Wilcox-O'Hearn, and Jack O'Connor released BLAKE3, a ground-up restructuring around a Merkle tree that unlocks parallelism and SIMD on a scale the earlier designs could not reach.

The ARX Core: Borrowing From ChaCha

The heart of BLAKE is its G function, a mixing step built on the ARX paradigm: Add, Rotate, XOR. These three operations are cheap on every CPU, run in constant time (no data-dependent branches or table lookups, so no cache-timing side channels), and compose into strong diffusion when iterated.

BLAKE's G function is derived from the quarter-round of the ChaCha stream cipher, which was designed by Daniel J. Bernstein. ChaCha's quarter-round mixes four words using only additions, fixed rotations, and XORs. BLAKE adapts this to absorb message words between the mixing operations. The general shape looks like this:

G(a, b, c, d, m0, m1):
    a = a + b + m0
    d = rotate_right(d XOR a, R1)
    c = c + d
    b = rotate_right(b XOR c, R2)
    a = a + b + m1
    d = rotate_right(d XOR a, R3)
    c = c + d
    b = rotate_right(b XOR c, R4)

(The rotation constants R1–R4 differ between BLAKE2b and BLAKE2s and are chosen for good diffusion — the sketch above shows the structure, not exact values.) A compression round applies G across the columns of a 4×4 state matrix, then across its diagonals. Mixing columns and then diagonals means every word influences every other word within a couple of rounds, which is exactly the avalanche behavior a hash function needs.
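
To make the structure concrete, here is a minimal Python sketch of G and of one column-then-diagonal round, using BLAKE2s's 32-bit words and its rotation amounts (16, 12, 8, 7). It is an illustration, not a full compression function: the helper names and the `diffusion_demo` function are made up for this post, and the message words are fed in a fixed order, whereas real BLAKE2 permutes them differently in every round.

```python
MASK32 = 0xFFFFFFFF  # BLAKE2s works on 32-bit words; additions wrap modulo 2**32


def rotr32(x, n):
    """Rotate a 32-bit word right by n bits."""
    return ((x >> n) | (x << (32 - n))) & MASK32


def g(v, a, b, c, d, m0, m1):
    """Mix state words v[a], v[b], v[c], v[d] in place, absorbing m0 and m1."""
    v[a] = (v[a] + v[b] + m0) & MASK32
    v[d] = rotr32(v[d] ^ v[a], 16)
    v[c] = (v[c] + v[d]) & MASK32
    v[b] = rotr32(v[b] ^ v[c], 12)
    v[a] = (v[a] + v[b] + m1) & MASK32
    v[d] = rotr32(v[d] ^ v[a], 8)
    v[c] = (v[c] + v[d]) & MASK32
    v[b] = rotr32(v[b] ^ v[c], 7)


def round_fn(v, m):
    """One round over a 16-word state: four column mixes, then four diagonal mixes."""
    # Columns of the 4x4 matrix (state stored row by row).
    g(v, 0, 4, 8, 12, m[0], m[1])
    g(v, 1, 5, 9, 13, m[2], m[3])
    g(v, 2, 6, 10, 14, m[4], m[5])
    g(v, 3, 7, 11, 15, m[6], m[7])
    # Diagonals.
    g(v, 0, 5, 10, 15, m[8], m[9])
    g(v, 1, 6, 11, 12, m[10], m[11])
    g(v, 2, 7, 8, 13, m[12], m[13])
    g(v, 3, 4, 9, 14, m[14], m[15])


def diffusion_demo(rounds=2):
    """Flip one input bit and count how many of the 16 state words end up different."""
    base = list(range(16))
    flipped = list(base)
    flipped[0] ^= 1
    m = [0] * 16
    for _ in range(rounds):
        round_fn(base, m)
        round_fn(flipped, m)
    return sum(x != y for x, y in zip(base, flipped))
```

Calling `diffusion_demo()` shows the avalanche effect in miniature: a single flipped bit in one word spreads across the state once columns and diagonals have both been mixed.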

The HAIFA Counter and Length-Extension Resistance

Classic Merkle–Damgård hashes like MD5, SHA-1, and SHA-256 suffer from length-extension attacks: knowing H(m) and the length of m lets an attacker compute H(m || padding || suffix) without knowing m itself. This is why HMAC exists: to wrap such hashes safely for authentication.

BLAKE2 uses a simplified form of the HAIFA construction: every call to the compression function takes a counter (the number of bytes hashed so far) and a finalization flag that is set only for the last block. BLAKE3 applies the same idea with a chunk counter and flags that mark chunk boundaries and the root. Because the final compression is computed differently from intermediate ones, the digest is not an internal state an attacker can resume from. The result: BLAKE2 and BLAKE3 are not vulnerable to length extension, which is part of why they can offer a native keyed mode instead of requiring HMAC.
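
The effect of a finalization flag is easy to see in a toy model. The sketch below is not BLAKE's compression function: `toy_compress` is a stand-in built on `hashlib.blake2s`, and every name in it is invented for illustration. It only demonstrates the argument above, namely that a digest produced with the "final" flag set differs from the intermediate chaining value an attacker would need in order to extend the message.

```python
import hashlib

BLOCK_LEN = 64  # toy block size in bytes


def toy_compress(h, block, counter, is_final):
    """Stand-in compression function that binds a byte counter and a final flag."""
    data = h + block + counter.to_bytes(8, "little") + bytes([int(is_final)])
    return hashlib.blake2s(data).digest()


def toy_haifa_hash(msg):
    """Chain blocks HAIFA-style: counter on every block, flag on the last one only."""
    h = bytes(32)  # all-zero starting value, purely for the toy
    blocks = [msg[i:i + BLOCK_LEN] for i in range(0, len(msg), BLOCK_LEN)] or [b""]
    counter = 0
    for i, block in enumerate(blocks):
        counter += len(block)
        is_final = i == len(blocks) - 1
        h = toy_compress(h, block.ljust(BLOCK_LEN, b"\0"), counter, is_final)
    return h


def extension_attempt(digest, known_len, suffix):
    """What an attacker can compute from the digest alone: one more block on top."""
    block = suffix.ljust(BLOCK_LEN, b"\0")
    return toy_compress(digest, block, known_len + len(suffix), True)


secret = b"k" * BLOCK_LEN
suffix = b"s" * BLOCK_LEN
forged = extension_attempt(toy_haifa_hash(secret), len(secret), suffix)
genuine = toy_haifa_hash(secret + suffix)
print(forged == genuine)  # the forgery does not match the genuine digest
```

In the genuine computation of `secret + suffix`, the first block is compressed with the flag cleared, so its chaining value never equals the published digest of `secret`.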

BLAKE2b vs BLAKE2s

BLAKE2 ships in two main flavors tuned to different word sizes:

BLAKE2b uses 64-bit words and is optimized for 64-bit platforms. It produces digests up to 512 bits (64 bytes) and is the default choice on modern servers and desktops.
BLAKE2s uses 32-bit words, targets 8- to 32-bit platforms (embedded, older hardware), and produces digests up to 256 bits.
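
Both flavors ship in Python's standard `hashlib`, which makes the size limits easy to check. This is a small sketch; the input string is arbitrary.

```python
import hashlib

data = b"hello, BLAKE2"

full_b = hashlib.blake2b(data).digest()  # BLAKE2b at its default (maximum) size
full_s = hashlib.blake2s(data).digest()  # BLAKE2s at its default (maximum) size
short_b = hashlib.blake2b(data, digest_size=20).digest()  # a 160-bit BLAKE2b digest

print(len(full_b), len(full_s), len(short_b))
print(hashlib.blake2b.MAX_DIGEST_SIZE, hashlib.blake2s.MAX_DIGEST_SIZE)

# The requested length is part of BLAKE2's parameters, so a shorter digest
# is a distinct output rather than a prefix of the longer one.
print(short_b == full_b[:20])
```

One design consequence worth knowing: because the digest length is mixed into the initial state, you cannot truncate a 64-byte digest after the fact and expect it to match what `digest_size=20` would have produced.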

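Both have parallel variants (BLAKE2bp, which is 4-way parallel, and BLAKE2sp, which is 8-way parallel) that split work across cores or SIMD lanes, foreshadowing BLAKE3's design. In practice, BLAKE2b on a 64-bit CPU comfortably outpaces a software implementation of SHA-256 and rivals or beats MD5, while remaining cryptographically sound. On CPUs with dedicated SHA instructions, hardware-accelerated SHA-256 can close or even reverse that gap, so it is worth measuring on your own hardware.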
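
Speed claims are best checked on the machine you actually deploy to. The following is a rough, hedged benchmark sketch using only the standard library; the function name and buffer size are arbitrary choices, and the numbers depend heavily on your CPU, your Python build, and whether the underlying library uses hardware SHA instructions.

```python
import hashlib
import time


def throughput_mb_s(name, size=8 * 1024 * 1024, repeats=3):
    """Return the best observed one-shot hashing throughput in MB/s for `name`."""
    data = bytes(size)  # zero-filled buffer; the content does not affect timing
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        hashlib.new(name, data).digest()
        best = min(best, time.perf_counter() - start)
    return size / best / 1e6


if __name__ == "__main__":
    for name in ("sha256", "blake2b", "blake2s"):
        print(f"{name:8s} {throughput_mb_s(name):10.1f} MB/s")
```

Treat the output as a relative comparison on one machine rather than an absolute figure; a single-buffer microbenchmark says little about small inputs or multi-threaded workloads.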

Built-In Keyed Mode: A MAC Without HMAC

One of BLAKE2's most useful features is a native keyed mode. You pass a secret key during initialization, and the function becomes a message authentication code directly — no HMAC wrapper needed.

This works precisely because BLAKE resists length extension. HMAC's nested H((key ⊕ opad) || H((key ⊕ ipad) || m)) structure exists to neutralize the length-extension weakness of Merkle–Damgård hashes. BLAKE simply does not have that weakness, so a single keyed pass is secure. The result is a MAC that is simpler to implement correctly and faster, since it avoids the second, outer hash pass that HMAC requires.
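
In Python, the keyed mode is exposed through the `key` argument of `hashlib.blake2b`. The sketch below puts it next to HMAC-SHA-256 for comparison; the function names, key, and message are illustrative only.

```python
import hashlib
import hmac
import secrets


def blake2_mac(key, message):
    """MAC via BLAKE2b's native keyed mode: one keyed pass, 32-byte tag."""
    return hashlib.blake2b(message, key=key, digest_size=32).digest()


def hmac_sha256(key, message):
    """The traditional construction, shown for comparison."""
    return hmac.new(key, message, hashlib.sha256).digest()


def verify(key, message, tag):
    """Check a BLAKE2 tag using a constant-time comparison."""
    return hmac.compare_digest(blake2_mac(key, message), tag)


key = secrets.token_bytes(32)  # BLAKE2b accepts keys of up to 64 bytes
message = b"transfer 100 credits to alice"
tag = blake2_mac(key, message)

print(verify(key, message, tag))
print(verify(key, b"transfer 900 credits to mallory", tag))
```

Whichever construction you use, compare tags with `hmac.compare_digest` rather than `==` so that verification time does not leak how many leading bytes matched.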

BLAKE2 also exposes salt and personalization parameters baked into the initialization, plus arbitrary output length up to the maximum. Personalization lets you domain-separate hashes (so the same input under two different application contexts yields unrelated digests) without inventing your own prefixing scheme.
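
Here is a short sketch of personalization and salting with `hashlib.blake2b`. The context strings and the `tagged_hash` helper are hypothetical names chosen for this example.

```python
import hashlib
import os


def tagged_hash(data, context):
    """Hash `data` under an application-specific personalization string."""
    # BLAKE2b personalization strings are limited to PERSON_SIZE (16) bytes.
    if len(context) > hashlib.blake2b.PERSON_SIZE:
        raise ValueError("personalization string too long for BLAKE2b")
    return hashlib.blake2b(data, person=context, digest_size=32).hexdigest()


data = b"same input bytes"
file_id = tagged_hash(data, b"myapp-file-v1")
user_id = tagged_hash(data, b"myapp-user-v1")
print(file_id != user_id)  # same input, unrelated digests

# A random salt gives per-use randomization; it must be stored with the digest.
salt = os.urandom(hashlib.blake2b.SALT_SIZE)
salted = hashlib.blake2b(data, salt=salt, digest_size=32).hexdigest()
```

Versioning the context string (the `-v1` suffix here) is a cheap way to leave room for changing the scheme later without digests from the old and new schemes ever colliding by construction.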

BLAKE3: A Binary Merkle Tree Over Chunks

BLAKE3 keeps the ARX core but reorganizes everything above it. Instead of processing the message as one long serial chain, BLAKE3 splits the input into 1 KiB chunks, hashes each chunk independently, and combines the chunk results in a binary Merkle tree.

          root
         /    \
      node      node
      /  \      /  \
   chunk chunk chunk chunk
   (1KiB)(1KiB)(1KiB)(1KiB)

Each chunk and each internal node is produced by the same compression function with appropriate flags marking its role (chunk start, chunk end, parent, root). Because chunks are independent, BLAKE3 offers unbounded parallelism: a multi-core machine can hash different subtrees on different threads, and a single core can process multiple chunks at once using SIMD instructions (AVX2, AVX-512, NEON). Within a single compression, the four column mixes are independent of one another, as are the four diagonal mixes, so they can be vectorized as well. This is the structural reason BLAKE3 throughput scales with available hardware rather than being capped by a serial dependency chain.
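
The tree shape can be sketched in a few lines of Python. To be clear about the assumptions: this is not BLAKE3 and will not produce BLAKE3 digests. It uses `hashlib.blake2s` as a stand-in node hash, encodes the chunk index and the node's role through BLAKE2s's salt and personalization fields, and all function names are invented here. What it does mirror is the structure: 1 KiB chunks, a left subtree holding the largest power-of-two number of chunks that still leaves something for the right, and a distinct marking for the root.

```python
import hashlib

CHUNK_LEN = 1024  # chunk size from the text above


def leaf_hash(chunk, index, is_root):
    """Stand-in chunk hash: binds the chunk index and the node's role."""
    role = b"leaf+rt" if is_root else b"leaf"
    salt = index.to_bytes(8, "little")  # plays the part of a chunk counter
    return hashlib.blake2s(chunk, salt=salt, person=role).digest()


def parent_hash(left, right, is_root):
    """Stand-in parent node: hashes the two child values under a role tag."""
    role = b"node+rt" if is_root else b"node"
    return hashlib.blake2s(left + right, person=role).digest()


def left_chunk_count(n):
    """Largest power of two strictly less than n (for n >= 2)."""
    return 1 << ((n - 1).bit_length() - 1)


def subtree_hash(chunks, first_index, is_root):
    """Recursively hash a run of chunks; each subtree is independent of its sibling."""
    if len(chunks) == 1:
        return leaf_hash(chunks[0], first_index, is_root)
    k = left_chunk_count(len(chunks))
    left = subtree_hash(chunks[:k], first_index, False)
    right = subtree_hash(chunks[k:], first_index + k, False)
    return parent_hash(left, right, is_root)


def tree_hash(data):
    """Split into chunks and reduce them to a single 32-byte root value."""
    chunks = [data[i:i + CHUNK_LEN] for i in range(0, len(data), CHUNK_LEN)] or [b""]
    return subtree_hash(chunks, 0, True)
```

Because `left` and `right` never depend on each other, the two recursive calls could be handed to different threads or SIMD lanes, which is exactly the property the tree layout is meant to expose.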

BLAKE3 also reduces the round count relative to BLAKE2: 7 rounds, versus 10 in BLAKE2s and 12 in BLAKE2b (the designers argue the security margin remains ample). Combined with the tree structure, this makes it routinely faster than MD5 and SHA-1 on modern CPUs, while being a current, cryptographically strong design rather than a broken legacy one.

One Algorithm, Four Modes

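BLAKE3 is a single construction that serves four purposes. The first three are selected by changing only the initial key words and a few domain-separation flags; the fourth is a property of the output stage and is available in every mode: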

Hash — the default digest function.
Keyed hash (MAC) — supply a 256-bit key; the keyed mode is a secure MAC, again with no HMAC needed.
Key derivation (KDF) — derive subkeys from key material and a context string, giving built-in domain separation.
Extendable-output function (XOF) — request any number of output bytes. The output is a stream you can read as far as you need, which is ideal for generating keystreams, multiple keys, or variable-length identifiers.

Having one primitive cover hashing, authentication, key derivation, and arbitrary-length output dramatically shrinks the cryptographic surface area an application has to reason about.
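
To show how little changes between modes, here is a sketch of the setup step only. The flag values and the initialization vector are the ones given in the BLAKE3 specification as I understand it; the `mode_setup` helper and its mode names are hypothetical, and nothing here performs any actual hashing.

```python
import struct

# Domain-separation flags, one bit each.
CHUNK_START = 1 << 0
CHUNK_END = 1 << 1
PARENT = 1 << 2
ROOT = 1 << 3
KEYED_HASH = 1 << 4
DERIVE_KEY_CONTEXT = 1 << 5
DERIVE_KEY_MATERIAL = 1 << 6

# Default key words (the same constants SHA-256 and BLAKE2s use as their IV).
IV = (
    0x6A09E667, 0xBB67AE85, 0x3C6EF372, 0xA54FF53A,
    0x510E527F, 0x9B05688C, 0x1F83D9AB, 0x5BE0CD19,
)


def mode_setup(mode, key=None):
    """Return (key_words, base_flags) for a mode; every later step is shared."""
    if mode == "hash":
        return IV, 0
    if mode == "derive_key_context":
        # First KDF pass: hash the context string under the default key words.
        return IV, DERIVE_KEY_CONTEXT
    if mode in ("keyed_hash", "derive_key_material"):
        # keyed_hash takes the caller's 32-byte key; the second KDF pass takes
        # the 32-byte output of the context pass as its key.
        if key is None or len(key) != 32:
            raise ValueError(f"{mode} needs a 32-byte key")
        flag = KEYED_HASH if mode == "keyed_hash" else DERIVE_KEY_MATERIAL
        return struct.unpack("<8I", key), flag
    raise ValueError(f"unknown mode: {mode}")
```

The base flags returned here are combined with the per-node flags (chunk start, chunk end, parent, root) on every compression, which is how outputs from different modes stay unrelated even for identical input.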

Security: Strong Margins, No Practical Attacks

BLAKE received some of the most thorough cryptanalysis of any SHA-3 candidate, and that analysis carries forward. For BLAKE2 and BLAKE3:

There are no practical collision, preimage, or second-preimage attacks. Published cryptanalysis reaches only a fraction of the full round count, leaving a comfortable security margin.
The counter and finalization flags rule out length extension, as discussed.
The ARX design is constant-time by construction, avoiding the cache-timing pitfalls of table-driven ciphers.

It is worth being precise: "fast" here does not mean "weak." BLAKE3's speed comes from parallelism and an efficient core, not from cutting security to the bone. This is the opposite of MD5, whose speed today is a liability because the function is broken.

When to Choose BLAKE2/3 vs SHA-2/SHA-3

Pick the tool for the job:

Choose BLAKE2 or BLAKE3 for high-throughput integrity checking, deduplication, content-addressed storage (think file syncers, build caches, version control internals), and any place you control both ends and want maximum speed with strong security. BLAKE3's XOF and KDF modes are also attractive when you need flexible output.
Choose SHA-256 or SHA-3 when a standard or compliance regime requires it — FIPS 140 validation, government contracts, protocol specifications (TLS, certificates), or interoperability with systems that only speak the SHA families. BLAKE2 and BLAKE3 are excellent but are not NIST-standardized hash functions.

If you are unsure how compression functions, Merkle–Damgård chaining, and sponge constructions fit together, start with our pillar explainer on how hashing works, then come back to compare the constructions.

Try It Yourself

You can generate BLAKE2 and BLAKE3 hashes in your browser with our tool. The hashing runs entirely client-side via Rust compiled to WebAssembly — nothing is uploaded to a server, so you can paste sensitive data or hash local files without anything leaving your machine.

Conclusion

BLAKE2 and BLAKE3 show that modern cryptographic hashing does not force a trade-off between speed and security. By building on ChaCha's ARX core, adding a HAIFA counter that kills length extension, and — in BLAKE3 — restructuring the whole thing as a parallel Merkle tree, the BLAKE family delivers MD5-class speed with serious cryptographic margins, plus native keying, KDF, and XOF modes that eliminate a lot of bolt-on machinery. When you control the stack and want throughput, they are hard to beat. Reach for SHA-2 or SHA-3 when standards demand it, and reach for BLAKE when speed and flexibility win. Either way, you can hash anything privately in your browser to see the digests for yourself.