What does differential privacy add that k-anonymity does not?

A mathematically provable bound on how much any one individual's data can change what is released, tunable via the ε parameter

Your team wants to publish customer-level analytics to a partner. Which composition gives the strongest privacy posture?

Tokenise direct IDs, generalise quasi-identifiers to satisfy k-anonymity, *and* serve aggregates through a differentially-private query API

Encrypt the file with a shared key

Anonymisation Techniques

Masking, generalisation, k-anonymity, differential privacy — none are silver bullets.

Overview

Why it matters

Each technique gives up a different kind of utility in exchange for privacy. Differential privacy (DP) gives a tunable, provable bound, but at a real utility cost.
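For reference, the "provable bound" is usually stated as follows. This is the standard ε-DP definition, not something specific to this lesson; the symbols M (the randomised release mechanism), D and D' (two datasets differing in one person's record) and S (any set of possible outputs) are introduced here for illustration.

```latex
\Pr[\,M(D) \in S\,] \;\le\; e^{\varepsilon} \cdot \Pr[\,M(D') \in S\,]
\quad \text{for all neighbouring } D, D' \text{ and all output sets } S
```

A smaller ε forces the two probabilities closer together, so one person's presence or absence changes the output distribution less. That is the tunable part, and it is also where the utility cost comes from.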

Going deeper

A rough decision table for the four techniques:

Technique	Best for	Breaks when
Masking / tokenisation	Operational data still keyed by id	The token vault leaks
Generalisation	Lookup-style analytics; coarse dashboards	Joined with a richer external dataset
k-anonymity (+ l-diversity)	Microdata release	Sensitive attribute is homogeneous in a cell
Differential privacy	Public statistics, query interfaces	ε is set too high, or queries are unbounded and the privacy budget runs out
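The k-anonymity row can be made concrete with a small check. The sketch below is illustrative only: the function name `k_and_l`, the column names and the sample rows are all made up for this example. It measures the smallest equivalence class (k) and the smallest number of distinct sensitive values in any class (l).

```python
from collections import defaultdict

def k_and_l(rows, quasi_identifiers, sensitive):
    """Return (k, l): the size of the smallest equivalence class and the
    smallest number of distinct sensitive values found in any class."""
    classes = defaultdict(list)
    for row in rows:
        key = tuple(row[q] for q in quasi_identifiers)
        classes[key].append(row[sensitive])
    k = min(len(values) for values in classes.values())
    l = min(len(set(values)) for values in classes.values())
    return k, l

# Hypothetical, already-generalised microdata.
rows = [
    {"birth_year": 1980, "district": "N1", "diagnosis": "flu"},
    {"birth_year": 1980, "district": "N1", "diagnosis": "flu"},
    {"birth_year": 1980, "district": "N1", "diagnosis": "flu"},
    {"birth_year": 1975, "district": "E2", "diagnosis": "asthma"},
    {"birth_year": 1975, "district": "E2", "diagnosis": "diabetes"},
    {"birth_year": 1975, "district": "E2", "diagnosis": "flu"},
]

k, l = k_and_l(rows, ["birth_year", "district"], "diagnosis")
print(k, l)  # 3 1
```

The data is 3-anonymous, yet l is 1: everyone in the (1980, N1) cell shares the same diagnosis, which is exactly the homogeneity failure named in the table.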

In production you typically layer these: tokenise direct identifiers, generalise the quasi-identifiers, and (for any public release or wide-audience dataset) wrap aggregate metrics in a DP query interface with an enforced ε budget per analyst.
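One possible shape for that layering is sketched below, using only the Python standard library. Everything named here is an assumption made for illustration: the record fields, `SECRET_KEY`, the `DPCounter` class and its per-analyst budget rule (simple additive composition of ε) are not taken from any particular product. A real deployment would keep the key in a vault and would probably use a vetted DP library rather than hand-rolled floating-point noise.

```python
import hashlib
import hmac
import random

SECRET_KEY = b"demo-only-key"  # assumption: in practice this lives in a vault, not in code

def tokenise(customer_id: str) -> str:
    """Keyed hash of a direct identifier (HMAC-SHA256, truncated for readability)."""
    return hmac.new(SECRET_KEY, customer_id.encode(), hashlib.sha256).hexdigest()[:16]

def generalise(record: dict) -> dict:
    """Coarsen quasi-identifiers: exact DOB -> year, full postcode -> district."""
    return {
        "token": tokenise(record["customer_id"]),
        "birth_year": int(record["dob"][:4]),
        "district": record["postcode"].split()[0],
        "spend": record["spend"],
    }

class BudgetExceeded(Exception):
    """Raised when an analyst's cumulative epsilon would pass the cap."""

class DPCounter:
    """Answers counting queries with Laplace noise under a per-analyst epsilon cap."""

    def __init__(self, records, total_epsilon, rng=None):
        self.records = records
        self.total_epsilon = total_epsilon
        self.spent = {}  # analyst name -> epsilon used so far
        self.rng = rng or random.Random()

    def count(self, analyst, predicate, epsilon):
        if epsilon <= 0:
            raise ValueError("epsilon must be positive")
        used = self.spent.get(analyst, 0.0)
        if used + epsilon > self.total_epsilon:
            raise BudgetExceeded(f"{analyst} has used {used} of {self.total_epsilon}")
        self.spent[analyst] = used + epsilon
        true_count = sum(1 for r in self.records if predicate(r))
        # A count changes by at most 1 per person, so the Laplace scale is 1/epsilon.
        # The difference of two Exp(rate=epsilon) draws is Laplace(0, 1/epsilon).
        noise = self.rng.expovariate(epsilon) - self.rng.expovariate(epsilon)
        return true_count + noise

# Hypothetical raw rows.
raw = [
    {"customer_id": "c-1", "dob": "1980-04-12", "postcode": "N1 9GU", "spend": 120.0},
    {"customer_id": "c-2", "dob": "1975-11-02", "postcode": "E2 8AA", "spend": 80.0},
]
records = [generalise(r) for r in raw]
api = DPCounter(records, total_epsilon=1.0)
print(api.count("analyst-a", lambda r: r["district"] == "N1", epsilon=0.5))
```

The budget check runs before any noise is drawn, so a refused query reveals nothing about the data. Once an analyst's ε is used up, further calls raise `BudgetExceeded` instead of quietly returning less private answers.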

Analogy

Anonymisation is noise-cancelling for personal data — and like noise cancelling, every dB of privacy gain costs some signal.

Masking is putting tape over the name on a form: cheap, but anyone holding the original can lift the tape.
Generalisation is blurring the photo: zoom out from exact DOB to year, from postcode to district. The face is gone but the silhouette remains.
k-anonymity is hiding in a crowd of at least k people who all look the same on the quasi-identifiers — strong against single-record lookups, weak against homogeneity attacks (‘everyone in this cell has the same diagnosis’).
Differential privacy is adding calibrated noise to every released statistic. The mathematician's gift: a tunable, provable bound on what any single person's participation can reveal — paid for in genuine accuracy loss.
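To put a rough number on "paid for in genuine accuracy loss", the sketch below adds Laplace noise to a count and measures the average error for a few values of ε. It is a toy under stated assumptions: a counting query with sensitivity 1, noise of scale 1/ε, and helper names (`laplace_noise`, `noisy_count`, `mean_abs_error`) invented for this example. Floating-point Laplace sampling like this is fine for building intuition but is not considered safe for real releases.

```python
import random

def laplace_noise(scale, rng):
    """Laplace(0, scale) sample, built as the difference of two exponential draws."""
    return rng.expovariate(1 / scale) - rng.expovariate(1 / scale)

def noisy_count(true_count, epsilon, rng):
    """Counting query with sensitivity 1, so the noise scale is 1/epsilon."""
    return true_count + laplace_noise(1.0 / epsilon, rng)

def mean_abs_error(epsilon, trials=20000, seed=0):
    """Average absolute error of the noisy count over many simulated releases."""
    rng = random.Random(seed)
    total = sum(abs(noisy_count(100, epsilon, rng) - 100) for _ in range(trials))
    return total / trials

for eps in (0.1, 1.0, 10.0):
    # The expected absolute error of Laplace noise equals its scale, i.e. about 1/eps.
    print(eps, round(mean_abs_error(eps), 2))
```

Cutting ε by a factor of ten makes the bound ten times tighter and the typical error on a count about ten times larger. That trade is the one the consuming team has to sign off on.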

There is no free lunch. The honest deliverable for any anonymisation request is a privacy bound + a utility budget the consuming team has explicitly signed off on.

Make it stick

Use the prompts below to anchor anonymisation techniques to something you actually own.

›Pick a dataset you've called 'anonymised' in the past. Which class of attack (linkage, homogeneity, background-knowledge) is it actually robust against — and which is it not?
›For a candidate DP release in your org, what ε would the legal team accept and what utility loss would the analytics team accept? The intersection is your real operating point.
›Where could a *k-anonymous* dataset in your warehouse silently become re-identifying after a future column is added upstream?
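For the last prompt, a small check like the one below can be run whenever the schema changes. The rows and the `device` column are hypothetical, and `smallest_class` is a name made up for this sketch.

```python
from collections import Counter

def smallest_class(rows, quasi_identifiers):
    """Size of the smallest group of rows sharing the same quasi-identifier values."""
    counts = Counter(tuple(r[q] for q in quasi_identifiers) for r in rows)
    return min(counts.values())

# Hypothetical warehouse rows; "device" stands in for a column added upstream later.
rows = [
    {"birth_year": 1980, "district": "N1", "device": "ios"},
    {"birth_year": 1980, "district": "N1", "device": "android"},
    {"birth_year": 1980, "district": "N1", "device": "ios"},
    {"birth_year": 1975, "district": "E2", "device": "ios"},
    {"birth_year": 1975, "district": "E2", "device": "ios"},
    {"birth_year": 1975, "district": "E2", "device": "ios"},
]

before = smallest_class(rows, ["birth_year", "district"])
after = smallest_class(rows, ["birth_year", "district", "device"])
print(before, after)  # 3 1
```

The release was 3-anonymous on the original columns. Once `device` is treated as a quasi-identifier, one row sits alone in its class, so k drops to 1 even though no existing value changed.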
