Anonymisation Techniques

Masking, generalisation, k-anonymity, differential privacy — none are silver bullets.

Why it matters

Each technique trades a different axis of utility for privacy. Differential privacy (DP) gives a tunable, provable bound on what any one person's data can reveal, at a real cost in accuracy.

Going deeper

A rough decision table for the four techniques:

| Technique | Best for | Breaks when |
| --- | --- | --- |
| Masking / tokenisation | Operational data still keyed by id | The token vault leaks |
| Generalisation | Lookup-style analytics; coarse dashboards | Joined with a richer external dataset |
| k-anonymity (+ l-diversity) | Microdata release | Sensitive attribute is homogeneous in a cell |
| Differential privacy | Public statistics, query interfaces | ε chosen too loosely, or queries are unbounded so the privacy budget burns out |
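
The k-anonymity row can be made concrete with a small check. The sketch below is illustrative only: the function name `cell_stats`, the column names, and the sample rows are all hypothetical, and it measures the simplest ("distinct") form of l-diversity, which is only one of several definitions.

```python
from collections import defaultdict

def cell_stats(rows, quasi_identifiers, sensitive):
    """Return (k, l) for a table.

    k = size of the smallest group of rows sharing the same quasi-identifier values.
    l = smallest number of distinct sensitive values found in any such group.
    """
    cells = defaultdict(list)
    for row in rows:
        key = tuple(row[q] for q in quasi_identifiers)
        cells[key].append(row[sensitive])
    k = min(len(values) for values in cells.values())
    l = min(len(set(values)) for values in cells.values())
    return k, l

# Hypothetical sample data, already generalised to year and district.
rows = [
    {"birth_year": 1980, "district": "N1", "diagnosis": "flu"},
    {"birth_year": 1980, "district": "N1", "diagnosis": "asthma"},
    {"birth_year": 1975, "district": "E2", "diagnosis": "diabetes"},
    {"birth_year": 1975, "district": "E2", "diagnosis": "diabetes"},
]

k, l = cell_stats(rows, ["birth_year", "district"], "diagnosis")
print(k, l)  # 2 1
```

The table is 2-anonymous, yet l is 1: everyone in the (1975, E2) cell has the same diagnosis, so knowing someone is in that cell reveals it. That is the homogeneity failure named in the table.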

In production you typically layer these: tokenise direct identifiers, generalise the quasi-identifiers, and (for any public release or wide-audience dataset) wrap aggregate metrics in a DP query interface with an enforced ε budget per analyst.
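
As a rough sketch of that layering, under several assumptions: the field names (`email`, `dob`, `postcode`), the hard-coded key, and the `BudgetedCounter` class are all hypothetical. Keyed hashing stands in for a token vault here (a leaked key plays the role of a leaked vault), and the floating-point Laplace sampler is for illustration only; a real release would normally rely on a vetted DP library.

```python
import hashlib
import hmac
import random

SECRET_KEY = b"demo-key"  # hypothetical; in practice this lives in a secrets manager

def tokenise(identifier: str) -> str:
    """Layer 1: replace a direct identifier with a keyed token."""
    digest = hmac.new(SECRET_KEY, identifier.encode(), hashlib.sha256)
    return digest.hexdigest()[:16]

def generalise(record: dict) -> dict:
    """Layer 2: coarsen quasi-identifiers (full DOB to year, postcode to district)."""
    return {
        "token": tokenise(record["email"]),
        "birth_year": int(record["dob"][:4]),
        "district": record["postcode"].split()[0],
    }

class BudgetedCounter:
    """Layer 3: noisy counts for one analyst, refused once the epsilon budget is spent."""

    def __init__(self, rows, total_epsilon, seed=None):
        self._rows = rows
        self._remaining = total_epsilon
        self._rng = random.Random(seed)

    def count(self, predicate, epsilon):
        if epsilon <= 0:
            raise ValueError("epsilon must be positive")
        if epsilon > self._remaining:
            raise PermissionError("privacy budget exhausted")
        self._remaining -= epsilon
        true_count = sum(1 for row in self._rows if predicate(row))
        scale = 1.0 / epsilon  # a counting query changes by at most 1 per person
        # The difference of two exponential draws is Laplace-distributed.
        noise = self._rng.expovariate(1 / scale) - self._rng.expovariate(1 / scale)
        return true_count + noise

raw = [
    {"email": "a@example.com", "dob": "1980-04-17", "postcode": "N1 9GU"},
    {"email": "b@example.com", "dob": "1975-11-02", "postcode": "E2 8AA"},
]
released = [generalise(r) for r in raw]
counter = BudgetedCounter(released, total_epsilon=1.0, seed=0)
print(counter.count(lambda r: r["district"] == "N1", epsilon=0.5))
```

Each call spends part of the analyst's budget, so two queries at ε = 0.5 exhaust a budget of 1.0 and a third is refused rather than answered with weaker protection.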

Analogy

Anonymisation is noise-cancelling for personal data — and like noise cancelling, every dB of privacy gain costs some signal.

  • Masking is putting tape over the name on a form: cheap, but anyone holding the original can lift the tape.
  • Generalisation is blurring the photo: zoom out from exact DOB to year, from postcode to district. The face is gone but the silhouette remains.
  • k-anonymity is hiding in a crowd of at least k people who all look the same on the quasi-identifiers — strong against single-record lookups, weak against homogeneity attacks (‘everyone in this cell has the same diagnosis’).
  • Differential privacy is adding calibrated noise to every released statistic. The mathematician's gift: a tunable, provable bound on what any single person's participation can reveal — paid for in genuine accuracy loss.
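
The "tunable, provable bound" in the last bullet can be written out. What follows is the standard textbook definition of ε-differential privacy and the Laplace mechanism for a scalar query, included as a reference sketch; the symbols M, D, D', S and f are notation introduced here, not taken from the text above.

```latex
% epsilon-differential privacy: for every pair of neighbouring datasets D, D'
% (differing in one person's record) and every set of outputs S,
\Pr[M(D) \in S] \;\le\; e^{\varepsilon} \, \Pr[M(D') \in S]

% Sensitivity of a numeric query f: the most one person can change its answer
\Delta f \;=\; \max_{D \sim D'} \lvert f(D) - f(D') \rvert

% Laplace mechanism: add noise with scale \Delta f / \varepsilon
M(D) \;=\; f(D) + \mathrm{Lap}\!\left(\frac{\Delta f}{\varepsilon}\right)

% Accuracy cost: the expected absolute error equals the noise scale
\mathbb{E}\,\bigl\lvert M(D) - f(D) \bigr\rvert \;=\; \frac{\Delta f}{\varepsilon}
```

For a simple count, Δf = 1, so ε = 1 costs about one unit of expected error per released statistic and ε = 0.1 costs about ten. Halving ε doubles the noise, which is the accuracy loss the bullet refers to.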

There is no free lunch. The honest deliverable for any anonymisation request is a privacy bound plus a utility budget that the consuming team has explicitly signed off on.

Make it stick

Use the prompts below to anchor anonymisation techniques to something you actually own.

  • Pick a dataset you've called 'anonymised' in the past. Which class of attack (linkage, homogeneity, background-knowledge) is it actually robust against — and which is it not?
  • For a candidate DP release in your org, what ε would the legal team accept and what utility loss would the analytics team accept? The intersection is your real operating point.
  • Where could a k-anonymous dataset in your warehouse silently become re-identifying after a future column is added upstream?
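
One way to explore the last prompt is to recompute k whenever the released schema changes. The sketch below is a toy illustration; `smallest_cell`, the `device` column, and the sample rows are hypothetical.

```python
from collections import Counter

def smallest_cell(rows, columns):
    """Return k: the size of the smallest group sharing the same values on `columns`."""
    counts = Counter(tuple(row[c] for c in columns) for row in rows)
    return min(counts.values())

# Hypothetical release; "device" is a column added upstream at a later date.
rows = [
    {"birth_year": 1980, "district": "N1", "device": "ios"},
    {"birth_year": 1980, "district": "N1", "device": "android"},
    {"birth_year": 1975, "district": "E2", "device": "ios"},
    {"birth_year": 1975, "district": "E2", "device": "ios"},
]

k_before = smallest_cell(rows, ["birth_year", "district"])
k_after = smallest_cell(rows, ["birth_year", "district", "device"])
print(k_before, k_after)  # 2 1
```

Nothing about the original columns changed, but the new one splits a cell of two into two cells of one, so anyone who can observe the device type can now single out those records.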
