Canonicalisation & Signed Graphs

Hashing RDF is harder than it looks — and signing it is the point.

Theory

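Two semantically identical graphs can serialise very differently: blank-node labels, whitespace, prefix choices, and statement order can all change without changing the meaning. So you can't just SHA-256 the Turtle string. URDNA2015, since standardised by the W3C as RDFC-1.0 (RDF Dataset Canonicalization), is the canonicalisation algorithm: it relabels blank nodes deterministically and produces the same canonical N-Quads output regardless of how you wrote the graph.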
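To make the problem concrete, here is a minimal sketch using only the Python standard library. It hashes two N-Triples serialisations of the same graph, first naively and then after a toy canonicalisation step. The example IRIs and the `toy_canonicalise` helper are assumptions made up for illustration. The helper only handles graphs without blank nodes, and it is not an implementation of URDNA2015: deterministic blank-node relabelling is the hard part, and that needs a real library.

```python
import hashlib

# Two serialisations of the same graph (hypothetical example data).
# Only statement order and whitespace differ; there are no blank nodes.
DOC_A = """\
<http://example.org/alice> <http://xmlns.com/foaf/0.1/name> "Alice" .
<http://example.org/alice> <http://xmlns.com/foaf/0.1/knows> <http://example.org/bob> .
"""
DOC_B = """\
<http://example.org/alice>   <http://xmlns.com/foaf/0.1/knows>   <http://example.org/bob> .

<http://example.org/alice> <http://xmlns.com/foaf/0.1/name> "Alice" .
"""


def naive_hash(text: str) -> str:
    """Hash the serialised text exactly as written."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def toy_canonicalise(ntriples: str) -> str:
    """Toy canonical form for blank-node-free N-Triples.

    Collapses whitespace runs, drops blank lines and duplicates, and sorts
    the statements. Collapsing whitespace would corrupt literals that contain
    repeated spaces, so a real implementation must parse the statements.
    """
    lines = {" ".join(line.split()) for line in ntriples.splitlines() if line.strip()}
    return "".join(line + "\n" for line in sorted(lines))


# Same graph, different bytes: the naive hashes disagree.
print(naive_hash(DOC_A) == naive_hash(DOC_B))  # False

# After canonicalisation both serialisations hash to the same value.
print(naive_hash(toy_canonicalise(DOC_A)) == naive_hash(toy_canonicalise(DOC_B)))  # True
```

As soon as a graph contains blank nodes, sorting alone is not enough, because two parties may label the same node `_:b0` and `_:x`. Closing that gap is what the canonicalisation algorithm is for.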

Once canonicalised you can hash and sign. Consumers re-canonicalise and verify.

Visualization

Canonicalise → hash → sign → publish. Consumers re-canonicalise, re-hash, verify the signature.

The whole pipeline only works because canonicalisation is deterministic: every party that re-canonicalises the same logical graph produces the identical byte sequence, no matter how it was originally serialised. Skip canonicalisation and verification fails the first time someone reformats the Turtle.
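The pipeline above can be sketched end to end with the Python standard library. Two loud assumptions: `canonicalise` is a toy stand-in that only works for graphs without blank nodes (it is not URDNA2015), and HMAC-SHA256 with a shared key stands in for the signature step. An HMAC only proves that someone holding the shared key produced the tag; a real deployment would use a public-key signature so that consumers can verify without being able to sign. The key, the example data, and all function names are hypothetical.

```python
import hashlib
import hmac

KEY = b"demo-shared-secret"  # hypothetical key, for illustration only

PUBLISHED = """\
<http://example.org/alice> <http://xmlns.com/foaf/0.1/name> "Alice" .
<http://example.org/alice> <http://xmlns.com/foaf/0.1/knows> <http://example.org/bob> .
"""
# Same graph, reordered and re-spaced by some downstream tool.
REFORMATTED = """\
<http://example.org/alice>  <http://xmlns.com/foaf/0.1/knows>  <http://example.org/bob> .
<http://example.org/alice>  <http://xmlns.com/foaf/0.1/name>  "Alice" .
"""
# Different graph: the object of the second statement was changed.
TAMPERED = PUBLISHED.replace("example.org/bob", "example.org/mallory")


def canonicalise(ntriples: str) -> bytes:
    """Toy canonical form: normalise whitespace, dedupe, sort (no blank nodes)."""
    lines = {" ".join(line.split()) for line in ntriples.splitlines() if line.strip()}
    return "".join(line + "\n" for line in sorted(lines)).encode("utf-8")


def sign(ntriples: str, key: bytes) -> str:
    """Canonicalise, hash, then tag the digest (HMAC as a signature stand-in)."""
    digest = hashlib.sha256(canonicalise(ntriples)).digest()
    return hmac.new(key, digest, hashlib.sha256).hexdigest()


def verify(ntriples: str, key: bytes, signature: str) -> bool:
    """Re-canonicalise, re-hash, and compare tags in constant time."""
    return hmac.compare_digest(sign(ntriples, key), signature)


signature = sign(PUBLISHED, KEY)           # publisher side
print(verify(REFORMATTED, KEY, signature))  # True: reformatting is harmless
print(verify(TAMPERED, KEY, signature))     # False: the content changed
```

If `sign` hashed the raw text instead of the canonical bytes, the first check would fail too, which is the failure mode described above.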

Reflect

Where in your pipeline would a signed graph matter most? Think: data marketplaces, training-set provenance, regulator audits, AI agent claims.

The unifying pattern: somebody downstream needs to act on the data and also defend that decision later. Signing turns the graph into evidence. ACLs only answer "who can read this now?"; signatures answer "who said this, and has it been tampered with since?"

  • Name one workflow where a downstream consumer needs to *prove* a graph came from you.
  • What attack does signing prevent that ACLs alone don't?
