4 · RML & Streaming — Kafka events become triples

Where the Kafka track and the ontology track meet: declarative RML mappings lift JSON events to RDF, which is then loaded into the triplestore.

Theory — RML in production

The mapping problem

The vendor's SCADA system emits JSON events such as:

{ "turbine_id":"T-042",
  "sensor_tag":"GBX-TEMP",
  "value": 87.4,
  "unit":"C",
  "ts":"2026-05-12T08:00:00Z" }

Our ontology is written in Turtle and models observations with SOSA. A mapping language closes the gap between the JSON events and that RDF model. The two dominant choices in 2026:

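  • RML (RDF Mapping Language): a generalization of the W3C R2RML recommendation to non-relational sources, maintained by the W3C Knowledge Graph Construction Community Group; mature for both SQL and document sources.
  • SPARQL-Generate / SPARQL-Anything: a better fit when the team already works primarily in SPARQL.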

We chose RML because the data team is JSON-native and CARML (a Java RML engine) integrates cleanly with our Kafka Streams app: it reads JSON in and emits RDF out to a Kafka topic that GraphDB consumes via its Kafka connector.
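To make the "JSON in, RDF out" step concrete, here is a minimal sketch in plain Python of what a template-based mapping does to the sample event. It is not RML syntax and not CARML's API. The `https://example.org/wind/` namespace, the IRI layout, and the choice of SOSA properties are assumptions made for illustration; the real mapping file decides all three.

```python
import json
from urllib.parse import quote

SOSA = "http://www.w3.org/ns/sosa/"
XSD = "http://www.w3.org/2001/XMLSchema#"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
BASE = "https://example.org/wind/"  # hypothetical namespace, not from the lesson


def iri(value):
    """Wrap an absolute IRI in N-Triples angle brackets."""
    return f"<{value}>"


def literal(value, datatype):
    """Render a typed literal in N-Triples syntax."""
    return f'"{value}"^^<{XSD}{datatype}>'


def seg(value):
    """Percent-encode a value for use in an IRI template, roughly as
    R2RML-style templates do (':' becomes '%3A')."""
    return quote(str(value), safe="")


def event_to_ntriples(raw):
    """Map one SCADA JSON event to a list of N-Triples lines."""
    e = json.loads(raw)
    turbine_id, tag, ts = seg(e["turbine_id"]), seg(e["sensor_tag"]), seg(e["ts"])

    # Assumed IRI templates; a real mapping would declare these per subject.
    obs = iri(f"{BASE}observation/{turbine_id}/{tag}/{ts}")
    sensor = iri(f"{BASE}sensor/{turbine_id}/{tag}")
    turbine = iri(f"{BASE}turbine/{turbine_id}")

    triples = [
        (obs, iri(RDF_TYPE), iri(SOSA + "Observation")),
        (obs, iri(SOSA + "madeBySensor"), sensor),
        (obs, iri(SOSA + "hasFeatureOfInterest"), turbine),
        (obs, iri(SOSA + "hasSimpleResult"), literal(e["value"], "double")),
        (obs, iri(SOSA + "resultTime"), literal(e["ts"], "dateTime")),
    ]
    return [f"{s} {p} {o} ." for s, p, o in triples]


if __name__ == "__main__":
    sample = ('{"turbine_id":"T-042","sensor_tag":"GBX-TEMP",'
              '"value":87.4,"unit":"C","ts":"2026-05-12T08:00:00Z"}')
    for line in event_to_ntriples(sample):
        print(line)
```

The sketch ignores the `unit` field on purpose; whether units become a plain literal or a link to a unit vocabulary is a modelling decision the mapping has to make explicitly.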

What this lesson does NOT do

It does NOT re-teach Kafka. If brokers, topics, consumer groups and exactly-once semantics feel fuzzy, take the Apache Kafka & Streaming track first; that is its job. Here we focus on the RDF-shaped half of the pipeline.

Reflect

The mapping file is where most production ontology projects spend the majority of their day-2 time. Treat it as first-class code, not a deployment afterthought.

  • Where do you handle vendor-specific tag aliases — in the mapping or in a SKOS code list?
  • How would you back-fill historical observations if you change the IRI template?
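One way to start exploring both questions is to compute, for each historical event, the IRI it was minted under and the IRI it would get today. The sketch below is plain Python under stated assumptions: both templates, the `TAG_ALIASES` table, and the idea that the old mapping used the raw vendor tag are hypothetical. The resulting pairs could feed a rename job or a set of equivalence links, depending on what the team decides.

```python
from urllib.parse import quote

# Hypothetical templates: the one historical data was minted with, and its replacement.
OLD_TEMPLATE = "https://example.org/wind/observation/{turbine_id}/{sensor_tag}/{ts}"
NEW_TEMPLATE = "https://example.org/wind/obs/{turbine_id}/{sensor_tag}/{ts}"

# Hypothetical vendor alias table; in production this might live in a SKOS code list.
TAG_ALIASES = {"GEARBOX_TEMP": "GBX-TEMP"}


def expand(template, event):
    """Fill a {field} template with percent-encoded event values."""
    safe = {key: quote(str(val), safe="") for key, val in event.items()}
    return template.format(**safe)


def rename_pairs(events):
    """Yield (old_iri, new_iri) for every event whose IRI changes."""
    for event in events:
        canonical_tag = TAG_ALIASES.get(event["sensor_tag"], event["sensor_tag"])
        normalized = dict(event, sensor_tag=canonical_tag)
        old_iri = expand(OLD_TEMPLATE, event)        # raw tag, old layout
        new_iri = expand(NEW_TEMPLATE, normalized)   # canonical tag, new layout
        if old_iri != new_iri:
            yield old_iri, new_iri


if __name__ == "__main__":
    history = [{"turbine_id": "T-042", "sensor_tag": "GEARBOX_TEMP",
                "ts": "2026-05-12T08:00:00Z"}]
    for old_iri, new_iri in rename_pairs(history):
        print(old_iri, "->", new_iri)
```

Replaying the original events through both templates only works if those events are still retained; if they are not, the pairs have to be derived from the triples already in the store.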
