PBKDF2 and scrypt Explained: Key Derivation Functions

A raw hash like SHA-256 is built for speed, which is exactly wrong for two jobs developers reach for it anyway: storing passwords and turning a passphrase into an encryption key. A key derivation function (KDF) fixes that by being deliberately, tunably slow. This article walks through the two most widely deployed password-based KDFs — PBKDF2 and scrypt — how their internals actually work, and where each fits in 2026.

What a key derivation function does

A KDF takes a low-entropy secret (usually a human password), mixes in a unique salt, and produces a fixed-length output after a measured amount of work. It serves two related but distinct jobs:

Slow password hashing. Store the KDF output instead of the password. An attacker who steals the database must spend the full per-guess cost on every candidate, which makes offline cracking expensive.
Key derivation. Stretch a passphrase into one or more cryptographic keys for symmetric encryption — disk encryption, password vaults, encrypted backups.

Both rely on the same primitives: a unique salt to defeat precomputation (rainbow tables) and force per-target attacks, and a deliberate work factor to make each guess costly. The difference between a fast hash and a KDF is not the algorithm family — it is the intentional cost. For the foundations of digest functions underneath all of this, see how hashing works.

PBKDF2: iterating a PRF

PBKDF2 (Password-Based Key Derivation Function 2) comes from RFC 2898 (PKCS #5 v2.0) and is the elder statesman of password KDFs. Its design is almost aggressively simple: take a pseudorandom function (PRF), feed it the password and salt, and apply it over and over for a configurable iteration count.

The PRF is almost always HMAC wrapping an underlying hash — PBKDF2-HMAC-SHA256 and PBKDF2-HMAC-SHA1 being the common pairings. HMAC keyed with the password is what gives PBKDF2 its security properties; PBKDF2 itself just decides how many times to call it and how to combine the results.

Parameters

PBKDF2 takes four knobs:

PRF — the underlying keyed function, typically HMAC-SHA256.
Salt — a unique, random per-password value (16 bytes is a sound default).
Iteration count (c) — how many times the PRF runs per output block; this is the work factor.
Output length (dkLen) — how many bytes of derived key you want.

To produce more output than one hash block, PBKDF2 generates blocks T_1, T_2, … by appending a 32-bit big-endian block index to the salt, then concatenates and truncates them to dkLen.

The block formula

Each output block T is built by chaining the PRF and XORing every iteration together:

DK = T_1 || T_2 || ... || T_dklen/hlen   (truncated to dkLen)

For each block index i:
  U_1 = PRF(Password, Salt || INT_32_BE(i))
  U_2 = PRF(Password, U_1)
  U_3 = PRF(Password, U_2)
  ...
  U_c = PRF(Password, U_{c-1})
  T_i = U_1 XOR U_2 XOR U_3 XOR ... XOR U_c

The XOR of all intermediate values means every iteration contributes to the result — you cannot shortcut to the answer; you must walk the whole chain. The chaining (U_n depends on U_{n-1}) makes the work strictly sequential per block.

Why PBKDF2 is simple but GPU-friendly

PBKDF2's strength is also its weakness. Each guess needs only a few hash-state-sized buffers of memory — a handful of bytes. That tiny footprint is irrelevant on a CPU but enormous on parallel hardware: a GPU has thousands of cores and gigabytes of memory, so an attacker can run tens of thousands of PBKDF2 guesses concurrently with no memory pressure at all. ASICs and FPGAs do even better. PBKDF2 raises the serial cost per guess but does almost nothing to stop massively parallel cracking.

What it has going for it is ubiquity and standing: PBKDF2 is FIPS-approved (NIST SP 800-132), implemented everywhere, and well understood. That is why it shows up in WPA/WPA2 Wi-Fi (PBKDF2-HMAC-SHA1 deriving the PMK from the passphrase and SSID), disk encryption schemes like LUKS, password vaults, and countless protocols that derive keys from secrets. If your problem is "I need a KDF that the compliance auditor recognizes," PBKDF2 is the answer.

scrypt: making memory the bottleneck

Colin Percival designed scrypt in 2009 (later standardized as RFC 7914) precisely to neutralize the parallel-hardware advantage. The insight: if each guess must hold a large array in memory, an attacker can no longer pack thousands of instances onto one GPU — memory, not arithmetic, becomes the limiting resource. This property is called memory-hardness.

The pipeline

scrypt is a sandwich. PBKDF2 sits on the outside; a memory-hard core called ROMix sits in the middle.

Expand. Run PBKDF2-HMAC-SHA256 with one iteration to stretch the password and salt into p blocks of working data, each 128 * r bytes.
Mix (ROMix). For each of the p blocks, run ROMix — the step that fills and randomly reads a large array. This is where the memory cost lives.
Finish. Feed the mixed blocks back through PBKDF2-HMAC-SHA256 (one iteration again) as the salt to produce the final derived key.

ROMix and BlockMix

ROMix is the heart of the algorithm and runs in two phases:

ROMix(B, N):
  X = B
  # Fill: store N successive states in array V
  for i in 0 .. N-1:
    V[i] = X
    X = BlockMix(X)
  # Mix: N pseudo-random reads back into V
  for i in 0 .. N-1:
    j = Integerify(X) mod N    # j depends on X's contents
    X = BlockMix(X XOR V[j])
  return X

The first loop writes N evolving states into the array V, consuming the memory. The second loop reads from V at indices j that depend on the data itself, so the accesses are unpredictable. An attacker who tries to save memory by not storing all of V must recompute missing entries on demand — and because the reads are data-dependent and pseudo-random, that recomputation explodes the time cost. This is the time-memory tradeoff scrypt forces: skimp on memory and pay dearly in time.

BlockMix is the internal shuffle, built on the Salsa20/8 core (a reduced, eight-round variant of the Salsa20 stream-cipher core). It takes the 128 * r-byte block, runs Salsa20/8 over its sub-blocks, and permutes them. Salsa20/8 is fast and not cryptographically strong on its own, but inside ROMix it only needs to diffuse bits, not resist cryptanalysis.

Parameters: N, r, p

scrypt exposes three tuning knobs:

N — CPU/memory cost. Must be a power of two. It sets the number of array entries in ROMix, so memory scales roughly as 128 * r * N bytes. With r = 8, N = 2^15 (32768) uses about 32 MiB per hash; N = 2^17 uses about 128 MiB.
r — block size. Scales the size of each BlockMix block (128 * r bytes), tuning memory-bandwidth usage and how N translates to total memory.
p — parallelism. Independent mixing instances; raises CPU cost without raising the memory of a single instance. Usually left at 1.

N is the dial that matters most: it drives both how long a hash takes and how much RAM it needs. Because an attacker must provision real memory per concurrent guess, doubling N roughly doubles their hardware cost — the parallel advantage that crushes PBKDF2 is exactly what scrypt taxes.

Choosing between them

Memory-hardness is a genuine advantage, but it is not the end of the story. Here is how to decide in 2026:

Reach for Argon2id first. Argon2 won the 2015 Password Hashing Competition and is the current default recommendation. It is memory-hard like scrypt but with independent, well-separated cost parameters (memory, iterations, parallelism) and a more thoroughly analyzed design.
Use scrypt when you want a memory-hard function and Argon2 is not available in your stack, or when you are interoperating with systems that already use it (it is widely deployed, including in some cryptocurrency and storage software).
Use PBKDF2 only when compliance or interoperability demands it — FIPS environments, an existing protocol (WPA2, LUKS), or a library that offers nothing better. If you must use PBKDF2, use HMAC-SHA256 and push the iteration count high (hundreds of thousands and rising over time).

For the full decision framework — salts, peppers, parameter tuning, migration, and why you should never hand-roll any of this — read the password hashing guide.

Salts and parameter upgrades

Whichever KDF you pick, two operational rules hold. First, every password gets a unique, random salt stored alongside the output; this defeats rainbow tables and forces attackers to crack each hash independently. Second, cost parameters are not set-and-forget — hardware gets faster, so raise iteration counts (PBKDF2) or N (scrypt) over time. Modern password-hash encodings (PHC string format) embed the parameters with the hash, so you can verify old hashes at their original cost and re-hash at higher cost on the next successful login.

Try it in your browser

Want to see how iteration count or N changes the output and the time it takes? You can generate PBKDF2 and scrypt outputs in your browser with Hash Generator. Everything runs locally via Rust compiled to WebAssembly — your passwords and salts never leave the page and nothing is uploaded to a server, so you can experiment with real secrets safely.

Conclusion

PBKDF2 and scrypt solve the same problem from two angles. PBKDF2 iterates HMAC to raise the serial cost per guess — simple, FIPS-approved, and everywhere, but cheap to attack in parallel because it uses almost no memory. scrypt wraps PBKDF2 around a memory-hard ROMix core so that each guess demands real RAM, taxing the parallel attackers PBKDF2 ignores. For new systems, prefer Argon2id; choose scrypt when you need memory-hardness without it; and keep PBKDF2 for compliance and interop. To experiment with the parameters yourself, open Hash Generator and run them client-side.