Why isn't `sha256(turtle_text)` a valid graph fingerprint?

Whitespace, prefixes and blank-node labels can vary while the graph stays identical.

Canonicalisation & Signed Graphs

Hashing RDF is harder than it looks — and signing it is the point.


Theory

Two semantically identical graphs can serialise very differently (blank-node labels, whitespace, prefix choices, triple order), so you can't just sha256 the Turtle string. URDNA2015, since standardised by the W3C as RDF Dataset Canonicalization (RDFC-1.0), solves this: it produces a deterministic N-Quads form regardless of how you wrote the graph.

Once canonicalised you can hash and sign. Consumers re-canonicalise and verify.

Visualization

Canonicalise → hash → sign → publish. Consumers re-canonicalise, re-hash, verify the signature.

The whole pipeline only works because canonicalisation is deterministic: every party that re-canonicalises the same logical graph produces the identical byte sequence, no matter how it was originally serialised. Skip canonicalisation and verification fails the first time someone reformats the Turtle.
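The pipeline can be sketched with the standard library alone. This is a minimal illustration under stated assumptions, not a production design: the canonical N-Quads bytes are assumed to come from a real canonicalisation library, the `http://example.org/` IRIs and the names `sign_graph`, `verify_graph` and `DEMO_KEY` are hypothetical, and HMAC (a symmetric MAC) stands in for a true asymmetric signature such as Ed25519, because Python's standard library has no public-key signing.

```python
import hashlib
import hmac

# Hypothetical shared key; a real deployment would use an asymmetric key pair.
DEMO_KEY = b"demo-key-not-for-production"

def sign_graph(canonical_nquads: bytes, key: bytes) -> bytes:
    """Hash the canonical bytes, then authenticate the digest."""
    digest = hashlib.sha256(canonical_nquads).digest()
    return hmac.new(key, digest, hashlib.sha256).digest()

def verify_graph(canonical_nquads: bytes, tag: bytes, key: bytes) -> bool:
    """Recompute the tag from the consumer's own canonical bytes and compare."""
    expected = sign_graph(canonical_nquads, key)
    return hmac.compare_digest(expected, tag)

# Publisher side: canonical N-Quads assumed to come from a canonicalisation library.
published = (
    b'<http://example.org/alice> <http://example.org/knows> _:c14n0 .\n'
    b'_:c14n0 <http://example.org/name> "Bob" .\n'
)
tag = sign_graph(published, DEMO_KEY)

# Consumer side: the same logical graph, re-canonicalised, yields the same bytes.
recanonicalised = published
tampered = published.replace(b'"Bob"', b'"Eve"')

print(verify_graph(recanonicalised, tag, DEMO_KEY))  # True
print(verify_graph(tampered, tag, DEMO_KEY))         # False
```

With a real signature scheme the consumer would hold only the public key; the hash-then-verify shape of the two functions stays the same.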

Worked example — canonicalise before you hash

Worked example — why two identical graphs hash differently.

These two files assert the exact same logical graph, yet a naive sha256 of the text gives two different digests:

# File A
:alice :knows _:b0 . _:b0 :name "Bob" .

# File B — different blank-node label, extra spacing, same meaning
:alice   :knows  _:x .
_:x :name "Bob" .
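The mismatch is easy to reproduce. The sketch below hashes the two snippets exactly as written above (prefix declarations omitted, as in the files shown); the variable names are illustrative only.

```python
import hashlib

# The two Turtle snippets from above, byte for byte.
file_a = ':alice :knows _:b0 . _:b0 :name "Bob" .\n'
file_b = ':alice   :knows  _:x .\n_:x :name "Bob" .\n'

digest_a = hashlib.sha256(file_a.encode("utf-8")).hexdigest()
digest_b = hashlib.sha256(file_b.encode("utf-8")).hexdigest()

print(digest_a)
print(digest_b)
print("same digest?", digest_a == digest_b)  # False: the bytes differ
```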

Canonicalisation (URDNA2015 / RDF Dataset Canonicalization) relabels blank nodes deterministically and emits sorted N-Quads, so both files reduce to the identical canonical byte string — and therefore the identical hash:

<...alice> <...knows> _:c14n0 .
_:c14n0 <...name> "Bob" .

Only now is it safe to hash and sign.
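To see the idea end to end without any dependencies, here is a toy canonicaliser. It is not URDNA2015 / RDFC-1.0: it simply tries every possible blank-node relabelling and keeps the lexicographically smallest sorted output, which is deterministic but takes factorial time, so it only suits tiny graphs. The `http://example.org/` IRIs are placeholders for the elided IRIs above, and `canonical_nquads` is a hypothetical helper name. For real data, use a library that implements the standard algorithm.

```python
import hashlib
from itertools import permutations

def canonical_nquads(triples):
    """Toy canonical form: try every blank-node relabelling, keep the smallest output."""
    bnodes = sorted({term for triple in triples for term in triple if term.startswith("_:")})
    labels = [f"_:c14n{i}" for i in range(len(bnodes))]
    candidates = []
    for perm in permutations(labels):
        mapping = dict(zip(bnodes, perm))
        # Relabel blank nodes, drop duplicate statements, sort the lines.
        lines = sorted({
            " ".join(mapping.get(term, term) for term in triple) + " ."
            for triple in triples
        })
        candidates.append("\n".join(lines) + "\n")
    return min(candidates)

EX = "http://example.org/"  # placeholder namespace, an assumption for this sketch

# File A and File B as parsed triples: different labels, different statement order.
graph_a = [
    (f"<{EX}alice>", f"<{EX}knows>", "_:b0"),
    ("_:b0", f"<{EX}name>", '"Bob"'),
]
graph_b = [
    ("_:x", f"<{EX}name>", '"Bob"'),
    (f"<{EX}alice>", f"<{EX}knows>", "_:x"),
]

canon_a = canonical_nquads(graph_a)
canon_b = canonical_nquads(graph_b)
hash_a = hashlib.sha256(canon_a.encode("utf-8")).hexdigest()
hash_b = hashlib.sha256(canon_b.encode("utf-8")).hexdigest()

print(canon_a, end="")
print("same canonical form?", canon_a == canon_b)  # True
print("same digest?", hash_a == hash_b)            # True
```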

Theory

Going deeper — what a signature does and does not protect

Signing a canonicalised graph proves two things: integrity (the triples haven't changed since signing) and authenticity (a specific key-holder signed them). It does not prove the triples are true, fresh, or authorised — those are separate concerns:

Freshness needs a signed timestamp or nonce; otherwise an old, validly signed graph can be replayed.
Truth is not cryptographic — a key-holder can sign nonsense. Pair signatures with PROV-O so a verifier knows who vouched for the claim.
Revocation matters: signatures outlive keys, so verifiers need a way to check the signing key wasn't later compromised.
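The freshness point can be made concrete with a small sketch. It assumes one possible design, not a standard one: the signer binds the graph hash to an issue time and a nonce, and the verifier rejects envelopes that are too old or whose nonce it has already seen. All names (`make_envelope`, `check_envelope`, `MAX_AGE_SECONDS`, `DEMO_KEY`) are hypothetical, timestamps are passed in explicitly to keep the example deterministic, and HMAC again stands in for a real asymmetric signature.

```python
import hashlib
import hmac
import json

DEMO_KEY = b"demo-key-not-for-production"  # hypothetical key for this sketch
MAX_AGE_SECONDS = 300                       # assumed freshness window

def _tag(payload: dict, key: bytes) -> str:
    # Serialise with sorted keys so signer and verifier hash the same bytes.
    message = json.dumps(payload, sort_keys=True).encode("utf-8")
    return hmac.new(key, message, hashlib.sha256).hexdigest()

def make_envelope(canonical_nquads: bytes, key: bytes, now: int, nonce: str) -> dict:
    payload = {
        "graph_sha256": hashlib.sha256(canonical_nquads).hexdigest(),
        "issued_at": now,
        "nonce": nonce,
    }
    return {**payload, "tag": _tag(payload, key)}

def check_envelope(envelope: dict, canonical_nquads: bytes, key: bytes,
                   now: int, seen_nonces: set) -> str:
    payload = {k: v for k, v in envelope.items() if k != "tag"}
    if not hmac.compare_digest(_tag(payload, key), envelope["tag"]):
        return "bad signature"
    if payload["graph_sha256"] != hashlib.sha256(canonical_nquads).hexdigest():
        return "graph mismatch"
    if now - payload["issued_at"] > MAX_AGE_SECONDS:
        return "stale"
    if payload["nonce"] in seen_nonces:
        return "replayed"
    seen_nonces.add(payload["nonce"])
    return "ok"

graph = b'<http://example.org/alice> <http://example.org/knows> _:c14n0 .\n'
seen = set()
envelope = make_envelope(graph, DEMO_KEY, now=1_000, nonce="n-1")

print(check_envelope(envelope, graph, DEMO_KEY, now=1_060, seen_nonces=seen))   # ok
print(check_envelope(envelope, graph, DEMO_KEY, now=1_120, seen_nonces=seen))   # replayed
print(check_envelope(envelope, graph, DEMO_KEY, now=9_000, seen_nonces=set()))  # stale
```

Revocation would need an additional lookup of the signing key's status at verification time, which this sketch leaves out.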

This is the foundation under Verifiable Credentials secured with Data Integrity proofs, where the credential itself is a signed RDF graph.

Reflect

Where in your pipeline would a signed graph matter most? Think: data marketplaces, training-set provenance, regulator audits, AI agent claims.

The unifying pattern: somebody downstream needs to act on the data and also defend that decision later. Signing turns the graph into evidence. ACLs only answer "who can read this now?"; signatures answer "who said this, and has it been tampered with since?"

›Name one workflow where a downstream consumer needs to *prove* a graph came from you.
›What attack does signing prevent that ACLs alone don't?
