Which citation design best supports auditability?

A single generic citation block after the full answer.

Per-claim support linking specific graph edges and source spans, stored as structured fields.

No citations when model confidence is high.

An internal confidence score the user never sees.

Provenance, citations, and claim support — Semantic Web Academy

Overview

Hard-wiring auditability by mapping LLM assertions back to source text spans.

Why it matters

A sophisticated graph means little if users can't verify the final synthesis. By mapping every extracted node and edge back to its original chunk and text span, you create a direct paper trail. The generation layer can then ground each claim with an inline citation that resolves to a specific span, turning a black-box model into a system that can be audited.

How it actually works

In a regulated or high-trust setting, an answer is only as good as its evidence trail. Professional GraphRAG attaches support to each claim, not one citation block bolted onto the end.

{
  "answer": "Alice approved the exception under Policy P-12.",
  "claims": [
    {"text": "Alice approved the exception",
     "support": {"graph_edges": ["Alice-approved-Exception42"], "chunks": ["doc://approvals/42#L10-18"]}},
    {"text": "Exception42 is governed by P-12",
     "support": {"graph_edges": ["Exception42-governedBy-P12"], "chunks": ["doc://policy/p12#L3-11"]}}
  ]
}
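To check a claim, an auditor has to turn a chunk reference into the actual source text. The sketch below shows one way to do that. It assumes that a reference such as doc://approvals/42#L10-18 means "lines 10 to 18, 1-indexed and inclusive, of the document approvals/42"; the source does not define the format, so treat this reading as an assumption. The names SourceSpan, parse_chunk_ref, resolve_span and the in-memory documents dictionary are hypothetical.

```python
import re
from typing import NamedTuple


class SourceSpan(NamedTuple):
    doc_path: str
    start_line: int
    end_line: int


# Assumed reference format: doc://<path>#L<start>-<end> (1-indexed, inclusive).
_REF_PATTERN = re.compile(r"^doc://(?P<path>[^#]+)#L(?P<start>\d+)-(?P<end>\d+)$")


def parse_chunk_ref(ref: str) -> SourceSpan:
    """Split a chunk reference into its document path and line range."""
    match = _REF_PATTERN.match(ref)
    if match is None:
        raise ValueError(f"unrecognised chunk reference: {ref!r}")
    start, end = int(match["start"]), int(match["end"])
    if start < 1 or end < start:
        raise ValueError(f"invalid line range in {ref!r}")
    return SourceSpan(match["path"], start, end)


def resolve_span(span: SourceSpan, documents: dict[str, str]) -> str:
    """Return the cited lines from a hypothetical in-memory document store."""
    lines = documents[span.doc_path].splitlines()
    if span.end_line > len(lines):
        raise ValueError("span extends past the end of the document")
    # Convert the 1-indexed inclusive range into a Python slice.
    return "\n".join(lines[span.start_line - 1 : span.end_line])
```

Failing loudly on a malformed or out-of-range reference is deliberate: a citation that cannot be resolved should be treated as missing support, not silently ignored.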

Why per-claim, not per-answer. A single citation block lets one unsupported sentence hide among three supported ones. Per-claim support makes every sentence independently verifiable — and makes the unsupported claim stick out, so a critic node (or a human) can reject exactly it instead of the whole answer.
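A minimal sketch of how a critic step might single out the unsupported claim, using the payload shape from the JSON example. The function name unsupported_claims is hypothetical, and the rule that one graph edge or one chunk is enough to count as support is an assumption; a stricter system might require both. The third claim in the sample payload is invented purely to show a failing case.

```python
import json


def unsupported_claims(answer: dict) -> list[str]:
    """Return the text of every claim that carries no support at all."""
    flagged = []
    for claim in answer.get("claims", []):
        support = claim.get("support") or {}
        # Assumption: one graph edge OR one source chunk counts as support.
        if not (support.get("graph_edges") or support.get("chunks")):
            flagged.append(claim["text"])
    return flagged


payload = json.loads("""
{
  "answer": "Alice approved the exception under Policy P-12.",
  "claims": [
    {"text": "Alice approved the exception",
     "support": {"graph_edges": ["Alice-approved-Exception42"],
                 "chunks": ["doc://approvals/42#L10-18"]}},
    {"text": "Exception42 is governed by P-12",
     "support": {"graph_edges": ["Exception42-governedBy-P12"],
                 "chunks": ["doc://policy/p12#L3-11"]}},
    {"text": "The exception expires in 90 days",
     "support": {"graph_edges": [], "chunks": []}}
  ]
}
""")

flagged = unsupported_claims(payload)
```

Because the check runs per claim, only the invented third claim lands in flagged; the two supported claims can still be returned to the user.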

Store it as structured fields, not markdown. Fields such as claim_id, source_id, edge_path, and timestamp are queryable; a markdown footnote string is not. Structured provenance lets you run audits ('show me every answer that cited the now-retracted doc'), prefer the newest valid evidence, and demonstrate compliance.
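To make "queryable" concrete, here is a sketch that stores provenance rows in an in-memory SQLite table and answers the retracted-document audit question. The table name claim_support, the extra answer_id column, the function answers_citing, and all row values (identifiers and timestamps) are assumptions made for illustration, not a prescribed schema.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE claim_support (
        answer_id TEXT NOT NULL,   -- hypothetical: which answer the claim belongs to
        claim_id  TEXT NOT NULL,
        source_id TEXT NOT NULL,
        edge_path TEXT NOT NULL,
        timestamp TEXT NOT NULL    -- ISO 8601 string, so it sorts lexicographically
    )
    """
)

# Made-up sample rows for illustration only.
rows = [
    ("ans-1", "c1", "doc://approvals/42", "Alice-approved-Exception42", "2024-01-10T09:00:00Z"),
    ("ans-1", "c2", "doc://policy/p12", "Exception42-governedBy-P12", "2024-01-10T09:00:00Z"),
    ("ans-2", "c1", "doc://approvals/42", "Alice-approved-Exception42", "2024-02-01T12:00:00Z"),
    ("ans-3", "c1", "doc://policy/p12", "Exception42-governedBy-P12", "2024-03-05T08:30:00Z"),
]
conn.executemany("INSERT INTO claim_support VALUES (?, ?, ?, ?, ?)", rows)


def answers_citing(db: sqlite3.Connection, source_id: str) -> list[str]:
    """List every answer that has at least one claim citing source_id."""
    cursor = db.execute(
        "SELECT DISTINCT answer_id FROM claim_support "
        "WHERE source_id = ? ORDER BY answer_id",
        (source_id,),
    )
    return [row[0] for row in cursor.fetchall()]


affected = answers_citing(conn, "doc://policy/p12")
```

The same question asked of markdown footnotes would need string parsing across every stored answer; with structured rows it is a single parameterised query.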

Pair it with a refusal contract. If a claim has no support, the system must say so or refuse — provenance is what makes 'I don't have evidence for that' a deterministic behaviour rather than a hope.
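One way to make the refusal contract deterministic is to enforce it in the rendering step, so that no sentence reaches the user without either a citation or an explicit notice. The sketch below is an assumption about how that could look: Claim, render_answer, and the wording of the notices are hypothetical, and the unsupported claim in the usage example is invented.

```python
from dataclasses import dataclass, field

REFUSAL = "Insufficient evidence to answer."


@dataclass
class Claim:
    text: str
    graph_edges: list[str] = field(default_factory=list)
    chunks: list[str] = field(default_factory=list)

    def citations(self) -> list[str]:
        # Assumption: either a source chunk or a graph edge counts as support.
        return self.chunks + self.graph_edges


def render_answer(claims: list[Claim]) -> str:
    """Emit supported claims with citations; flag or refuse the rest."""
    supported = [c for c in claims if c.citations()]
    if not supported:
        # Nothing can be grounded, so refuse the whole answer.
        return REFUSAL
    lines = [f"{c.text} [{'; '.join(c.citations())}]" for c in supported]
    for c in claims:
        if not c.citations():
            # Say so explicitly instead of presenting the claim as fact.
            lines.append(f"Insufficient evidence for: {c.text}")
    return "\n".join(lines)


rendered = render_answer([
    Claim("Alice approved the exception", chunks=["doc://approvals/42#L10-18"]),
    Claim("The exception expires in 90 days"),  # invented, deliberately unsupported
])
```

Here the first claim is rendered with its chunk reference and the second is replaced by an "Insufficient evidence for" line; if no claim had support, the function would return the refusal string alone.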

Analogy

Per-claim citations are receipts for each line item, not one lump total at the bottom of the bill. With line-item receipts an auditor can challenge a single charge; with only a total, one padded charge hides inside a plausible sum.

Pitfalls & how to avoid them

One citation block per answer. Symptom: an unsupported sentence hides among supported ones. Fix: attach support per claim.
Citations as markdown text. Symptom: provenance cannot be queried or audited. Fix: store structured fields.
No refusal when support is missing. Symptom: confident answers with evidence gaps. Fix: add an explicit 'insufficient evidence' path.
Stale evidence preferred. Symptom: answers cite superseded or retracted sources. Fix: attach timestamps; prefer the newest valid source.
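The stale-evidence fix can be sketched as a small selection rule. This assumes that "valid" means "not retracted" and that each piece of evidence carries a timestamp; the Evidence class, the retracted flag, the newest_valid function, and the sample source identifiers and dates are all hypothetical.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Optional


@dataclass(frozen=True)
class Evidence:
    source_id: str
    timestamp: datetime
    retracted: bool = False  # assumption: validity is modelled as a retraction flag


def newest_valid(candidates: list[Evidence]) -> Optional[Evidence]:
    """Pick the most recent evidence that has not been retracted."""
    valid = [e for e in candidates if not e.retracted]
    # default=None covers the case where no valid evidence remains.
    return max(valid, key=lambda e: e.timestamp, default=None)


# Made-up candidates for illustration only.
candidates = [
    Evidence("doc://policy/p12-v1", datetime(2023, 1, 5)),
    Evidence("doc://policy/p12-v2", datetime(2024, 3, 1)),
    Evidence("doc://policy/p12-v3", datetime(2024, 6, 1), retracted=True),
]
chosen = newest_valid(candidates)
```

The newest candidate is retracted, so the rule falls back to the second version. When it returns None, the claim has no usable support and should go down the 'insufficient evidence' path.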

Apply it to your system

Audit one real answer your system produced.

›Can every sentence be traced to a specific edge or source span today?
›Which provenance fields are missing for a real audit?
›What should the system do when a claim has no supporting evidence?
