Overview
PII Classification
PII falls into three classes: direct identifiers, quasi-identifiers, and sensitive data. Each class needs different controls.
Why it matters
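A name alone often isn't a re-identification risk, because many people share it; a name combined with DOB and postcode usually is. Quasi-identifiers, which look harmless individually but single people out in combination, are why anonymisation is harder than it looks.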
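The effect of combining fields can be seen by counting how many records share each combination of values, which is the idea behind k-anonymity. The sketch below is illustrative only: the records are invented toy data, and `smallest_group` is a hypothetical helper name, not part of any library.

```python
from collections import Counter

# Invented toy records, for illustration only.
records = [
    {"name": "Alex Smith", "dob": "1990-04-12", "postcode": "SW1A 1AA"},
    {"name": "Alex Smith", "dob": "1985-11-03", "postcode": "M1 2AB"},
    {"name": "Alex Smith", "dob": "1990-04-12", "postcode": "LS1 4DY"},
    {"name": "Sam Jones", "dob": "1990-04-12", "postcode": "SW1A 1AA"},
    {"name": "Sam Jones", "dob": "1972-06-30", "postcode": "M1 2AB"},
]

def smallest_group(rows, columns):
    """Size of the smallest group of rows that share the same values in `columns`.

    A result of 1 means at least one row is unique on those columns.
    """
    counts = Counter(tuple(row[c] for c in columns) for row in rows)
    return min(counts.values())

# Every name is shared by at least two records...
k_name = smallest_group(records, ["name"])
# ...but name + DOB + postcode makes every record unique.
k_combo = smallest_group(records, ["name", "dob", "postcode"])
```

In this toy dataset every name is shared by at least two people, while the three-column combination isolates each record. Real datasets behave differently in detail, but the direction is the same: each added column shrinks the groups.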
Going deeper
A practical PII classification policy attaches a class label and a control set to every column:
| Class | Examples | Default controls |
|---|---|---|
| Direct PII | email, phone, SSN, passport | Encrypt at rest; tokenise for analytics; access logged + reviewed |
| Quasi-identifier | postcode, DOB, gender, IP, device-id | Generalise (DOB → year, postcode → first 3 chars) before joining into wide tables |
| Sensitive | health, biometrics, religion, sexuality, finances | Smallest-possible audience, purpose-bound, explicit lawful basis |
| Non-PII metadata | product SKU, server hostname | Default access |
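The generalisation step in the quasi-identifier row might look like the following minimal sketch. The function names `generalise_dob` and `generalise_postcode` are hypothetical, and the postcode handling assumes a plain string prefix is acceptable. A real policy may need country-specific rules.

```python
from datetime import date

def generalise_dob(dob: date) -> int:
    """Reduce a full date of birth to the year only."""
    return dob.year

def generalise_postcode(postcode: str, keep: int = 3) -> str:
    """Keep only the first `keep` characters of a postcode.

    Assumption: a simple prefix is coarse enough. Whitespace left at the
    end of the prefix (as in "M1 2AB") is stripped.
    """
    return postcode.strip().upper()[:keep].strip()

year = generalise_dob(date(1990, 4, 12))
area = generalise_postcode("sw1a 1aa")
```

Coarser values put more people into each group, which is what reduces the re-identification risk when the columns are later joined with others.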
The classification is metadata that lives next to the schema (in the catalog, in dbt tags, in column-level lineage), so DSAR (data subject access request) pipelines, masking rules, and access reviews can all be driven from one source of truth instead of a separate convention per team.
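As a rough sketch of what "driven from one source of truth" can mean, the example below keeps a single column-to-class mapping and derives both a masking rule and a DSAR column list from it. Everything here is an assumption made for illustration: the `COLUMN_CLASSES` catalog, the column names, the `mask_row` and `dsar_columns` helpers, and the use of a keyed hash as a stand-in for a real tokenisation service.

```python
import hashlib
import hmac
from enum import Enum

class PiiClass(Enum):
    DIRECT = "direct"
    QUASI = "quasi"
    SENSITIVE = "sensitive"
    NON_PII = "non_pii"

# Hypothetical catalog: in practice this would be read from the data
# catalog or dbt tags rather than hard-coded.
COLUMN_CLASSES = {
    "email": PiiClass.DIRECT,
    "postcode": PiiClass.QUASI,
    "diagnosis": PiiClass.SENSITIVE,
    "product_sku": PiiClass.NON_PII,
}

# Placeholder key for the sketch; a real key would come from a secrets manager.
TOKEN_KEY = b"example-key-not-for-production"

def mask_row(row, catalog):
    """Apply a per-class masking rule to one row for analytics use."""
    masked = {}
    for column, value in row.items():
        # Assumption: unclassified columns get the most restrictive treatment.
        pii_class = catalog.get(column, PiiClass.SENSITIVE)
        if pii_class is PiiClass.NON_PII:
            masked[column] = value
        elif pii_class is PiiClass.QUASI:
            masked[column] = str(value)[:3]  # crude generalisation
        elif pii_class is PiiClass.DIRECT:
            # Keyed hash as a stand-in for tokenisation.
            digest = hmac.new(TOKEN_KEY, str(value).encode(), hashlib.sha256)
            masked[column] = digest.hexdigest()[:16]
        # SENSITIVE columns are dropped entirely.
    return masked

def dsar_columns(catalog):
    """Columns a DSAR export must cover: everything that is not non-PII."""
    return sorted(c for c, cls in catalog.items() if cls is not PiiClass.NON_PII)

row = {
    "email": "alex@example.com",
    "postcode": "SW1A 1AA",
    "diagnosis": "asthma",
    "product_sku": "SKU-1",
}
masked = mask_row(row, COLUMN_CLASSES)
export_columns = dsar_columns(COLUMN_CLASSES)
```

Because both functions read the same mapping, reclassifying a column changes the masking behaviour and the DSAR scope together, so the two cannot drift apart.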