2 · The lightweight JSON ontology — build it, then justify JSON over OWL/RDF

Build the deployable JSON ontology + pipeline contract, see how each part corrects Swiss German, and understand exactly what JSON trades away vs OWL/RDF.

Theory — the ontology, block by block

Anatomy of the lightweight ontology

Open the artefact below. It is small on purpose, and every block maps to one of the four ontology jobs from Lesson 1:

Block | Ontology job | What it fixes in Swiss German
concepts.*.canonical_values | Name the concepts | Gives extraction a closed target set instead of free text
concepts.*.surface_forms | Synonyms / lexical layer | Collapses hagelschade / es het ghaglet → hagelschaden
concepts.IncidentType.broader | Taxonomy (SKOS-style) | Lets hagelschaden roll up to elementarschaden for reporting
spoken_number_forms | Datatype normalisation | Maps zwöitusig → 2000
PolicyNumber.pattern | Value constraint | Rejects a misheard policy number outright
constraints.Claim | Shape / SHACL-lite | States what a valid claim must contain

This is genuinely an ontology — concepts, labels, synonyms, a broader hierarchy, and constraints — just serialised as JSON instead of Turtle. The MVP pipeline (second artefact) consumes this: it feeds surface_forms keys as STT hotwords, uses surface_forms as the normaliser dictionary, and enforces constraints before anything reaches the CRM.
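To make the blocks concrete, here is a minimal Python sketch of how a pipeline might load such a file and use each block. The block names follow the table above; the exact nesting, the required-field list, and the policy-number pattern are assumptions made for illustration, not the contents of the course artefact.

```python
import json
import re

# Hypothetical ontology file content. Block names follow the lesson's table;
# the nesting, the required fields and the regex are illustrative assumptions.
ONTOLOGY = json.loads("""
{
  "concepts": {
    "IncidentType": {
      "canonical_values": ["hagelschaden", "elementarschaden"],
      "surface_forms": {
        "hagelschade": "hagelschaden",
        "es het ghaglet": "hagelschaden"
      },
      "broader": {"hagelschaden": "elementarschaden"}
    },
    "PolicyNumber": {"pattern": "^[A-Z]{2}-[0-9]{6}$"}
  },
  "spoken_number_forms": {"zwöitusig": 2000},
  "constraints": {"Claim": {"required": ["incident_type", "policy_number"]}}
}
""")

INCIDENT = ONTOLOGY["concepts"]["IncidentType"]

# The surface_forms keys double as the hotword list handed to the STT engine.
HOTWORDS = sorted(INCIDENT["surface_forms"])


def normalise_incident(text):
    """Map a transcribed phrase to a canonical IncidentType, or None."""
    phrase = text.strip().lower()
    if phrase in INCIDENT["canonical_values"]:
        return phrase
    return INCIDENT["surface_forms"].get(phrase)


def normalise_number(token):
    """Map a spoken number form to its numeric value, or None if unknown."""
    return ONTOLOGY["spoken_number_forms"].get(token.strip().lower())


def validate_claim(claim):
    """Return a list of violations; an empty list means the claim passes."""
    errors = []
    for field in ONTOLOGY["constraints"]["Claim"]["required"]:
        if not claim.get(field):
            errors.append(f"missing {field}")
    number = claim.get("policy_number")
    pattern = ONTOLOGY["concepts"]["PolicyNumber"]["pattern"]
    if number and not re.fullmatch(pattern, number):
        errors.append("policy_number does not match pattern")
    incident = claim.get("incident_type")
    if incident and incident not in INCIDENT["canonical_values"]:
        errors.append("incident_type not canonical")
    return errors
```

In this sketch a claim reaches the CRM only when validate_claim returns an empty list, which mirrors the "enforce constraints first" step described above.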

Theory — JSON now, OWL/RDF later (and why)

JSON vs OWL/RDF — the honest trade-off

Compare the two artefacts below: the same three claim rules as JSON validation and as SHACL on a graph. They encode identical intent. So why start with JSON?

What JSON-first buys you (why it's right for the MVP):

  • Ship speed. No triplestore, no reasoner, no IRI scheme — a file in the repo.
  • Ownership. Claims-ops can edit surface_forms in a pull request; they would never hand-edit Turtle.
  • Determinism & auditability. A regex either matches or it doesn't; routing is reproducible from (scores, thresholds). Easy to defend to a regulator.
  • Cheap to change. The accuracy loop is 'add dialect rows weekly' — a data edit, not an engineering project.
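The determinism point can be illustrated with a small routing function. This is only a sketch: the threshold names, their values, the three route labels, and the "weakest field decides" rule are all hypothetical, chosen to show that the same (scores, thresholds) pair always yields the same route.

```python
from typing import Mapping

# Hypothetical thresholds; real values would be tuned on your own data.
THRESHOLDS = {"auto_accept": 0.90, "review": 0.60}


def route(scores: Mapping[str, float],
          thresholds: Mapping[str, float] = THRESHOLDS) -> str:
    """Pick a route from per-field confidence scores.

    Pure function: no randomness, no hidden state, so any past decision
    can be replayed from the logged scores and thresholds alone.
    """
    if not scores:
        return "reject"
    weakest = min(scores.values())  # the least certain field decides
    if weakest >= thresholds["auto_accept"]:
        return "crm"
    if weakest >= thresholds["review"]:
        return "human_review"
    return "reject"
```

Logging the scores and the threshold version alongside each decision is what would let an auditor re-run the function and get the identical answer.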

What you give up (and when it starts to hurt):

  • No reasoning / inference. JSON can't infer hagelschaden ⊑ elementarschaden ⊑ naturEreignis and propagate it; you'd hand-maintain every rollup.
  • No cross-system identity. RDF gives every concept a global IRI so the same policy lines up across claims, fraud and workshop systems; JSON keys are local.
  • Constraints don't travel with the data. SHACL ships with the graph; JSON validation lives in one service and must be re-implemented elsewhere.
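The first cost is easy to underestimate, so here is what "hand-maintaining the rollup" might look like in practice. JSON can store the broader links, but your own code has to walk them; a reasoner would derive the same chain from the axioms. The function name and the flat broader map are assumptions for this sketch.

```python
# Broader links as they might appear in the JSON file (illustrative data,
# taken from the hierarchy named in the bullet above).
BROADER = {
    "hagelschaden": "elementarschaden",
    "elementarschaden": "naturEreignis",
}


def ancestors(concept, broader=BROADER):
    """Return every broader concept, nearest first.

    This loop is the part an OWL reasoner would give you for free; here
    every consuming service needs its own copy of it.
    """
    chain = []
    seen = {concept}  # guards against an accidental cycle in the data
    current = broader.get(concept)
    while current is not None and current not in seen:
        chain.append(current)
        seen.add(current)
        current = broader.get(current)
    return chain
```

A dozen lines is cheap for one hierarchy in one service. The pain described above starts when several systems each carry their own version and the copies drift apart.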

The rule of thumb: start where the value is (lexical normalisation + constraints = JSON), graduate to OWL/RDF only when reasoning or cross-system identity becomes the bottleneck (Lesson 4). Choosing OWL on day 1 is paying an architecture tax for capabilities the MVP doesn't use yet.

Reflect

Make the trade-off concrete for your own system.

  • Which of the four ontology jobs (name / canonicalise / taxonomy / constrain) is delivering the most accuracy for you today?
  • Write the one inference you'd want that JSON can't do — that sentence is your future case for moving to OWL.
  • Who owns surface_forms in your org chart? If the answer is 'engineering', your dialect coverage will grow too slowly.