CRC32 and Checksums: Error Detection, Not Security

Every time a file crosses a network or sits on a disk, something has to confirm it arrived intact. That job usually falls to a checksum like CRC32 — a tiny, blisteringly fast value that flags accidental corruption. But developers reach for CRC32 in the wrong places constantly, treating it like a security primitive. It is not. This post explains exactly what CRC32 and friends compute, why they are so good at catching bit flips, and why they are catastrophically bad at resisting an attacker.

Two Different Jobs: Detection vs Security

A checksum and a cryptographic hash both turn arbitrary data into a fixed-size value, which is why they get confused. But they are engineered to solve opposite problems.

A checksum or CRC answers: did this data change by accident? It is designed against noise — a flipped bit on a wire, a flaky disk sector, a dropped byte. It assumes nature is the adversary, and nature is not clever.

A cryptographic hash answers: did anyone — including a motivated attacker — change this data? It must make finding two inputs with the same output computationally infeasible. If you want the deep version of how that property is built, see the pillar guide on how hashing works.

CRC32 is brilliant at the first job and useless at the second. The rest of this article is mostly about why.

CRC as Polynomial Division in GF(2)

The "CR" in CRC stands for cyclic redundancy, and the math behind it is polynomial division — but in a peculiar number system called GF(2), the field with two elements, 0 and 1.

In GF(2), addition and subtraction are both just XOR, and there are no carries. This is the crucial simplification: arithmetic that would be painful with normal integers becomes pure bit manipulation.

To compute a CRC, you treat the message as the coefficients of a giant polynomial. A byte 0b10110001 becomes:

x⁷ + x⁵ + x⁴ + x⁰

You then divide this message polynomial, shifted left by the CRC width (that is, multiplied by x³² for a 32-bit CRC), by a fixed generator polynomial. In its pure form, the CRC is simply the remainder of that division. Because subtraction is XOR, the whole long-division process reduces to a sequence of conditional XORs and shifts, with no carries and no multiply or divide instructions required.
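To make the division concrete, here is a small Python sketch that encodes each polynomial as an integer (bit i is the coefficient of xⁱ) and performs long division over GF(2). The helper names gf2_mod and terms are our own; the constant 0x104C11DB7 is the CRC-32 generator 0x04C11DB7 with its implicit x³² term written out.

```python
def gf2_mod(dividend: int, divisor: int) -> int:
    """Remainder of polynomial long division over GF(2).

    Each int encodes a polynomial: bit i is the coefficient of x^i.
    """
    dlen = divisor.bit_length()
    while dividend.bit_length() >= dlen:
        # Line up the divisor's leading term under the dividend's leading term,
        # then "subtract", which in GF(2) is XOR. This clears the top bit.
        dividend ^= divisor << (dividend.bit_length() - dlen)
    return dividend


def terms(poly: int) -> str:
    """Render an integer-encoded polynomial as a sum of powers of x."""
    return " + ".join(f"x^{i}" for i in reversed(range(poly.bit_length())) if poly >> i & 1)


G = 0x104C11DB7        # CRC-32 generator, including the x^32 term
message = 0b10110001   # the example byte

print(terms(message))                  # x^7 + x^5 + x^4 + x^0
remainder = gf2_mod(message << 32, G)  # divide M(x) * x^32 by G(x)
# Appending the remainder yields a codeword that G(x) divides exactly.
assert gf2_mod((message << 32) ^ remainder, G) == 0
```

The remainder here is the "pure" CRC of that byte: zero initial value, no reflection, no final XOR. The parameters discussed further on adjust those details, but the division underneath stays the same.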

The Shift-and-XOR Algorithm

The bit-by-bit form of CRC is the polynomial long division written out as a loop. You walk through the message one bit at a time, and whenever the high bit of your running register is set, you XOR in the polynomial.

crc = INIT
for each byte b in message:
    crc ^= (b << (WIDTH - 8))      # bring byte into the top
    repeat 8 times:
        if (crc & TOP_BIT) != 0:
            crc = (crc << 1) ^ POLY
        else:
            crc = crc << 1
crc ^= FINAL_XOR
return crc & WIDTH_MASK

That rule, "if the high bit is set, shift and XOR in the polynomial; otherwise just shift," is the entire heart of CRC. Everything else is bookkeeping.
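Here is the pseudocode translated into runnable Python, as a sketch. The function name crc_msb_first and its keyword defaults are our own: they plug in the CRC-32 polynomial 0x04C11DB7 with INIT and FINAL_XOR of 0xFFFFFFFF but without reflection, the combination usually catalogued as CRC-32/BZIP2. Unlike the pseudocode, it masks the register after every shift so Python's unbounded integers stay WIDTH bits wide; the result is the same.

```python
import zlib


def crc_msb_first(data: bytes, width: int = 32, poly: int = 0x04C11DB7,
                  init: int = 0xFFFFFFFF, final_xor: int = 0xFFFFFFFF) -> int:
    """Bit-at-a-time CRC, most-significant bit first (assumes width >= 8)."""
    top_bit = 1 << (width - 1)
    mask = (1 << width) - 1
    crc = init
    for b in data:
        crc ^= b << (width - 8)  # bring the byte into the top of the register
        for _ in range(8):
            if crc & top_bit:
                crc = ((crc << 1) ^ poly) & mask  # high bit set: shift, then XOR the polynomial
            else:
                crc = (crc << 1) & mask           # high bit clear: just shift
    return (crc ^ final_xor) & mask


def reflect(value: int, bits: int) -> int:
    """Reverse the order of the lowest `bits` bits of value."""
    return int(f"{value:0{bits}b}"[::-1], 2)


data = b"123456789"
# Reflecting every input byte and the final register turns this into the
# reflected CRC-32 that zlib computes.
reflected_input = bytes(reflect(b, 8) for b in data)
assert reflect(crc_msb_first(reflected_input), 32) == zlib.crc32(data)
```

The final assertion is a quick way to see that the left-shifting loop and the right-shifting CRC-32 in zlib are the same division, just with the bit order mirrored.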

The Table-Driven Optimization

Processing one bit at a time is slow. The classic optimization precomputes a 256-entry lookup table: one entry per possible byte value, holding the result of running that byte through the eight-step inner loop, computed once in advance.

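With the table, the per-byte work collapses to a single table lookup and an XOR. In the reflected, right-shifting form that CRC-32 uses (see the reflection parameter below), the update is: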

crc = (crc >> 8) ^ table[(crc ^ byte) & 0xFF]

This byte-at-a-time table is the classic form that zlib and many other CRC libraries were built on. Faster variants use slicing-by-8 or slicing-by-16 tables to process several bytes per iteration, and on x86 CPUs with SSE4.2, CRC-32C has a dedicated hardware instruction (more on that below).
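A sketch of the table-driven approach in Python for the reflected CRC-32 (polynomial 0xEDB88320). The names make_table, TABLE and crc32_table are hypothetical; the optional running-value argument mirrors how zlib.crc32(data, value) lets you feed data in chunks, and zlib.crc32 serves as the reference.

```python
import zlib


def make_table(poly_reflected: int = 0xEDB88320) -> list[int]:
    """Precompute the CRC of every possible byte value (reflected, right-shifting form)."""
    table = []
    for n in range(256):
        c = n
        for _ in range(8):  # the same eight-step inner loop, run once per byte value
            c = (c >> 1) ^ poly_reflected if c & 1 else c >> 1
        table.append(c)
    return table


TABLE = make_table()


def crc32_table(data: bytes, crc: int = 0) -> int:
    """Table-driven CRC-32; pass a previous result as `crc` to continue a running checksum."""
    crc ^= 0xFFFFFFFF  # apply INIT (crc=0) or undo the previous final XOR
    for b in data:
        crc = (crc >> 8) ^ TABLE[(crc ^ b) & 0xFF]  # one lookup and one XOR per byte
    return crc ^ 0xFFFFFFFF  # final XOR


assert crc32_table(b"123456789") == zlib.crc32(b"123456789") == 0xCBF43926
```

Feeding data in pieces gives the same answer: crc32_table(b"6789", crc32_table(b"12345")) equals crc32_table(b"123456789"). The value 0xCBF43926 is the standard CRC-32 check value for the ASCII string "123456789".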

The Four Parameters That Define a CRC

Two implementations can both be "CRC-32" and produce completely different values, because a CRC is defined by a small set of parameters:

Polynomial — the generator divisor. This single constant determines the error-detection strength.
Initial value (INIT) — what the register starts at. CRC-32 starts at 0xFFFFFFFF so that leading zero bytes still affect the result.
Reflection (input/output bit order) — whether bits are processed least-significant-first. CRC-32 reflects both input and output, which is why real implementations often shift right, not left.
Final XOR — a constant XORed into the result. CRC-32 uses 0xFFFFFFFF.

Get any one of these wrong and your CRC will disagree with the reference. This is the single most common source of "my CRC doesn't match the spec" bugs.
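To see how the parameters interact, here is a sketch of a 32-bit CRC driven entirely by them. The function crc32_param and its argument names are our own labels for the four parameters above. With the CRC-32 settings it reproduces zlib.crc32, and it shows concretely why INIT matters: when the register starts at zero, a leading zero byte leaves no trace.

```python
import zlib


def reflect(value: int, bits: int) -> int:
    """Reverse the order of the lowest `bits` bits of value."""
    return int(f"{value:0{bits}b}"[::-1], 2)


def crc32_param(data: bytes, poly: int, init: int, refin: bool,
                refout: bool, xorout: int) -> int:
    """32-bit CRC defined purely by its parameters (MSB-first register)."""
    crc = init
    for b in data:
        if refin:
            b = reflect(b, 8)   # process each byte least-significant bit first
        crc ^= b << 24
        for _ in range(8):
            crc = ((crc << 1) ^ poly) if crc & 0x80000000 else (crc << 1)
            crc &= 0xFFFFFFFF
    if refout:
        crc = reflect(crc, 32)  # reflect the register before the final XOR
    return crc ^ xorout


CRC32 = dict(poly=0x04C11DB7, init=0xFFFFFFFF, refin=True, refout=True, xorout=0xFFFFFFFF)
NO_INIT = dict(CRC32, init=0)  # same CRC, but the register starts at zero

msg = b"hello"
assert crc32_param(msg, **CRC32) == zlib.crc32(msg)
# With INIT = 0, prefixing a zero byte does not change the result...
assert crc32_param(b"\x00" + msg, **NO_INIT) == crc32_param(msg, **NO_INIT)
# ...whereas with INIT = 0xFFFFFFFF it does.
assert crc32_param(b"\x00" + msg, **CRC32) != crc32_param(msg, **CRC32)
```

Flipping refin and refout to False while keeping everything else gives a different, equally valid "CRC-32" value for the same input, which is exactly the mismatch this section warns about.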

CRC-32 vs CRC-32C

The classic CRC-32 uses the IEEE 802.3 polynomial 0x04C11DB7 (or 0xEDB88320 reflected). It is everywhere: Ethernet frames, the PNG image format, ZIP archives, gzip, and the zlib library all rely on it. When someone says "the CRC32," this is almost always what they mean.

CRC-32C (Castagnoli) uses a different polynomial, 0x1EDC6F41. It was chosen for measurably better error-detection properties at typical message lengths, and it is used by iSCSI, the ext4 and Btrfs filesystems, and SCTP. Its decisive practical advantage: Intel's SSE4.2 instruction set includes a CRC32 hardware instruction that computes CRC-32C directly, making it far faster than a table-driven loop. If you control both ends and want speed, CRC-32C is usually the better modern choice.
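Python's standard library exposes only the classic CRC-32 (zlib.crc32 and binascii.crc32), so here is a minimal bitwise sketch of CRC-32C. The helper crc32_reflected is hypothetical. Like CRC-32, CRC-32C is defined with reflected input and output, so the loop uses 0x82F63B78, which is 0x1EDC6F41 with its bits reversed; swapping in 0xEDB88320 gives back the classic CRC-32, which lets us check the helper against zlib.

```python
import zlib


def crc32_reflected(data: bytes, poly_reflected: int) -> int:
    """Bitwise reflected 32-bit CRC with INIT and final XOR of 0xFFFFFFFF."""
    crc = 0xFFFFFFFF
    for b in data:
        crc ^= b  # reflected form: bytes enter at the low end of the register
        for _ in range(8):
            crc = (crc >> 1) ^ poly_reflected if crc & 1 else crc >> 1
    return crc ^ 0xFFFFFFFF


CRC32_IEEE = 0xEDB88320  # 0x04C11DB7, reflected
CRC32C = 0x82F63B78      # 0x1EDC6F41 (Castagnoli), reflected

check = b"123456789"
assert crc32_reflected(check, CRC32_IEEE) == zlib.crc32(check) == 0xCBF43926
assert crc32_reflected(check, CRC32C) == 0xE3069283
```

Only the polynomial differs between the two calls. This software loop is for illustration; the speed advantage described above comes from libraries that can use the hardware instruction.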

Why CRCs Catch Errors So Well

The reason CRCs dominate error detection is that their guarantees are provable, not statistical, for the common failure modes:

All single-bit errors are detected, as long as the polynomial has at least two non-zero terms.
All burst errors up to the CRC width — any contiguous run of flipped bits no longer than the CRC's bit length (32 for CRC-32) is guaranteed to be caught. This matters because real-world corruption tends to come in bursts.
Good random-error detection — for errors that escape the guaranteed categories, the chance of an undetected error is roughly 1 in 2³² for a 32-bit CRC.

These are exactly the failure patterns of noisy channels and degrading storage, which is why CRC was designed the way it was.
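These guarantees are easy to spot-check. The sketch below flips bits in an arbitrary 64-byte test message and confirms that zlib.crc32 notices every single-bit error, every solid run of 1 to 32 flipped bits at every position, and a seeded sample of random bursts up to 32 bits long. The helper names, the message and the sample size are arbitrary choices; a passing run is a sanity check, not a proof, since the guarantee itself comes from the algebra of the generator polynomial.

```python
import random
import zlib


def flip_bits(msg: bytes, error: int) -> bytes:
    """XOR an error pattern into msg.

    Bit i of the little-endian integer is bit (i % 8) of byte i // 8, which is the
    least-significant-bit-first order in which the reflected CRC-32 consumes bits,
    so a contiguous run of bits here is a contiguous burst in processing order.
    """
    value = int.from_bytes(msg, "little") ^ error
    return value.to_bytes(len(msg), "little")


def undetected(msg: bytes, errors: list[int]) -> list[int]:
    """Return the error patterns that leave the CRC-32 unchanged."""
    good = zlib.crc32(msg)
    return [e for e in errors if zlib.crc32(flip_bits(msg, e)) == good]


msg = bytes(range(64))  # arbitrary 64-byte (512-bit) test message
nbits = len(msg) * 8

single = [1 << i for i in range(nbits)]

# Solid bursts: every run of 1..32 flipped bits at every position.
bursts = [((1 << length) - 1) << start
          for length in range(1, 33)
          for start in range(nbits - length + 1)]

# Random bursts: first and last bit flipped, arbitrary bits in between.
rng = random.Random(0)
for _ in range(5000):
    length = rng.randint(2, 32)
    start = rng.randrange(nbits - length + 1)
    inner = rng.getrandbits(length - 2) if length > 2 else 0
    bursts.append(((1 << (length - 1)) | (inner << 1) | 1) << start)

assert undetected(msg, single) == []
assert undetected(msg, bursts) == []
```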

The Fatal Flaw: CRCs Are Linear

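Here is the property that makes CRC useless for security: it is linear over GF(2) (strictly speaking affine, because of the initial value and final XOR). Concretely, for any two messages a and b of the same length, with 0 standing for the all-zero message of that length: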

CRC(a XOR b) = CRC(a) XOR CRC(b) XOR CRC(0)

Because of this linearity, an attacker who wants to flip specific bits in a message can compute exactly which other bits to flip to keep the CRC unchanged. There is no searching, no brute force, no luck involved — it is a closed-form calculation. You can patch a file's contents and trivially restore its original CRC, or craft two different messages with the same CRC on demand.
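Here is a sketch of that closed-form attack against the standard CRC-32 (zlib.crc32). It first checks the linearity identity on random equal-length messages. Then a hypothetical forge helper rewrites the last four bytes of a tampered message so its CRC matches the original's: each of those 32 bits flips the CRC by a fixed, independent vector, so plain Gaussian elimination over GF(2) finds the patch directly, with no search. The messages are made-up examples.

```python
import os
import zlib


def crc(data: bytes) -> int:
    return zlib.crc32(data)


# 1. Linearity: CRC(a XOR b) == CRC(a) XOR CRC(b) XOR CRC(0) for equal lengths.
a, b = os.urandom(32), os.urandom(32)
a_xor_b = bytes(x ^ y for x, y in zip(a, b))
assert crc(a_xor_b) == crc(a) ^ crc(b) ^ crc(bytes(32))


def forge(original: bytes, tampered: bytes) -> bytes:
    """Overwrite the last 4 bytes of `tampered` so its CRC-32 equals original's.

    Both messages must have the same length, at least 4 bytes.
    """
    n = len(tampered)
    assert len(original) == n >= 4
    base = tampered[:-4] + bytes(4)     # the last 4 bytes are the unknowns
    crc0 = crc(bytes(n))
    target = crc(base) ^ crc(original)  # CRC change the patch must produce

    # Each patch bit j changes the CRC by a fixed vector; build a GF(2) basis.
    basis: dict[int, tuple[int, int]] = {}  # pivot bit -> (CRC delta, patch bits used)
    for j in range(32):
        e = bytearray(n)
        e[n - 4 + j // 8] = 1 << (j % 8)
        vec, combo = crc(bytes(e)) ^ crc0, 1 << j
        for bit in range(31, -1, -1):
            if not vec >> bit & 1:
                continue
            if bit in basis:
                vec ^= basis[bit][0]
                combo ^= basis[bit][1]
            else:
                basis[bit] = (vec, combo)
                break

    # 2. Solve for the patch bits whose combined CRC change equals the target.
    patch = 0
    for bit in range(31, -1, -1):
        if target >> bit & 1:
            target ^= basis[bit][0]
            patch ^= basis[bit][1]
    return base[:-4] + patch.to_bytes(4, "little")


original = b"PAY ALICE $100.00 ----"
tampered = b"PAY MALLORY $9999 ----"
forged = forge(original, tampered)
assert forged[:-4] == tampered[:-4]
assert crc(forged) == crc(original)
```

Every step is deterministic linear algebra over 32 unknowns, which is why CRC offers no resistance once someone decides to attack it.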

A cryptographic hash has no such structure; that lack of structure is the whole point. This is why a CRC tells you "the wire didn't corrupt this" but can never tell you "an adversary didn't tamper with this."

Adler-32 and the Family of Non-Crypto Hashes

CRC is not the only fast checksum. Adler-32, also used in zlib (it protects the zlib data stream, while CRC-32 protects the gzip container), works completely differently: it maintains two running sums modulo 65521, one accumulating the bytes and the other accumulating the running total of the first. In software it is typically even faster than a table-driven CRC-32 because it needs only additions, but it is weaker on short messages. The first sum can reach at most 1 + 255n for an n-byte message, so for messages shorter than about 257 bytes it never wraps around the modulus; much of the output space goes unused, distribution is poor, and collisions are easy to construct.
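A direct Python transcription of the two running sums, checked against zlib.adler32 (the function name adler32 here is ours). The last assertion illustrates the short-message weakness: for three bytes x, y, z the sums are A = 1 + x + y + z and B = 3 + 3x + 2y + z, so nudging the bytes by (+1, -2, +1) leaves both sums unchanged, and "abc" collides with "b`d".

```python
import zlib

MOD_ADLER = 65521  # largest prime below 2**16


def adler32(data: bytes) -> int:
    a, b = 1, 0
    for byte in data:
        a = (a + byte) % MOD_ADLER  # running sum of the bytes (starts at 1)
        b = (b + a) % MOD_ADLER     # running sum of the running sums
    return (b << 16) | a


assert adler32(b"abc") == zlib.adler32(b"abc")
# Short messages never wrap the modulus, so the sums are easy to balance by hand.
assert adler32(b"abc") == adler32(b"b`d")
```

Both strings produce 0x024D0127 (B = 589, A = 295), a collision found with pencil-and-paper arithmetic.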

Other fast non-cryptographic hashes live in the same category: xxHash and FNV are excellent for hash-table keys and quick integrity sampling. But like CRC and Adler-32, they are built for speed and distribution, not collision resistance against an adversary. None of them belong in a security context.

When to Use What

A simple rule keeps you out of trouble:

Use CRC32 / Adler-32 for: detecting accidental corruption in transit or storage — network frames, archive integrity bits, disk-block verification, cache validation. Anywhere the only adversary is noise.

Never use them for: integrity against a malicious party, digital signatures, message authentication, password handling, or deduplication where a collision would let one file silently masquerade as another. For all of those, use a real cryptographic hash like SHA-256 or the faster, modern BLAKE3. And note that even historically "cryptographic" hashes can fall — MD5 is now considered broken for collision resistance, which is its own cautionary tale about using the wrong tool.

If your decision hinges on "what if someone wants this to collide?", you need cryptography, not a checksum.
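A sketch of that rule in Python using only the standard library: zlib.crc32 when the only adversary is noise, SHA-256 from hashlib compared against a digest obtained from a trusted source, and HMAC-SHA256 (the usual way to turn a hash into a keyed message-authentication code) when an attacker might alter the data. The three function names and the example payload and key are illustrative; BLAKE3 is not part of Python's standard library, so it is left out.

```python
import hashlib
import hmac
import zlib


def looks_uncorrupted(data: bytes, expected_crc: int) -> bool:
    """Noise-only check: catches accidental corruption, not tampering."""
    return zlib.crc32(data) == expected_crc


def matches_published_digest(data: bytes, expected_hex: str) -> bool:
    """Compare against a SHA-256 digest obtained over a channel you trust."""
    return hmac.compare_digest(hashlib.sha256(data).hexdigest(), expected_hex.lower())


def is_authentic(data: bytes, key: bytes, tag: bytes) -> bool:
    """Message authentication with HMAC-SHA256 and a shared secret key."""
    expected = hmac.new(key, data, hashlib.sha256).digest()
    return hmac.compare_digest(expected, tag)


payload = b"example payload"
assert looks_uncorrupted(payload, zlib.crc32(payload))

key = b"shared secret"
tag = hmac.new(key, payload, hashlib.sha256).digest()
assert is_authentic(payload, key, tag)
assert not is_authentic(payload + b"!", key, tag)
```

The CRC check is fine for a flaky disk, but anyone who can change the payload can also recompute (or forge) its CRC; without the secret key, they cannot produce a matching HMAC tag.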

Try It Yourself

You can compute CRC32 and other checksums in your browser with Hash Generator. The computation runs entirely client-side via Rust compiled to WebAssembly — nothing you type or drop in is ever uploaded to a server, which makes it safe even for sensitive files.

Conclusion

CRC32 is a masterpiece of engineering for the job it was built to do: catching the bit flips and bursts that real channels and disks produce, fast, with mathematically guaranteed coverage. Its linearity is a feature for that purpose and a fatal weakness for any other. Reach for CRC32 or Adler-32 to detect accidents, reach for SHA-256 or BLAKE3 to resist attackers, and never confuse the two. When you need to check a value quickly, run it through Hash Generator right in your browser.