Theory — RML in production
The mapping problem
The vendor's SCADA system produces JSON readings like this:
{
  "turbine_id": "T-042",
  "sensor_tag": "GBX-TEMP",
  "value": 87.4,
  "unit": "C",
  "ts": "2026-05-12T08:00:00Z"
}
Our ontology speaks Turtle and SOSA. The gap between them is closed by a mapping language. The two dominant choices in 2026:
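- RML (RDF Mapping Language): a generalization of R2RML, the W3C Recommendation for relational sources, extended to JSON, XML and CSV and maintained in the W3C Knowledge Graph Construction Community Group; mature for both SQL and document sources.
- SPARQL-Generate / SPARQL-Anything: a better fit when the team already lives in SPARQL.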
We chose RML for two reasons: the data team is JSON-native, and Carml (Carve-RML) integrates cleanly with our Kafka Streams app. The app reads JSON in and emits RDF out to a Kafka topic, which GraphDB consumes via its Kafka connector.
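To make the mapping idea concrete, here is a minimal sketch of an RML mapping for the sample reading above. It is an illustration under assumptions, not our production mapping: the `ex:` namespace, the IRI templates and the file name `reading.json` are hypothetical, and a streaming deployment would bind the logical source to the incoming message in an engine-specific way instead of to a file. The `unit` field is left out to keep the sketch short.

```turtle
@prefix rr:   <http://www.w3.org/ns/r2rml#> .
@prefix rml:  <http://semweb.mmlab.be/ns/rml#> .
@prefix ql:   <http://semweb.mmlab.be/ns/ql#> .
@prefix sosa: <http://www.w3.org/ns/sosa/> .
@prefix xsd:  <http://www.w3.org/2001/XMLSchema#> .
@prefix ex:   <http://example.org/> .   # hypothetical namespace

ex:ObservationMap a rr:TriplesMap ;

  # Where the data comes from and how to walk it.
  # "reading.json" is a placeholder; "$" iterates over the single root object.
  rml:logicalSource [
    rml:source "reading.json" ;
    rml:referenceFormulation ql:JSONPath ;
    rml:iterator "$"
  ] ;

  # One sosa:Observation per reading; the IRI pattern is an assumption.
  rr:subjectMap [
    rr:template "http://example.org/obs/{turbine_id}/{sensor_tag}/{ts}" ;
    rr:class sosa:Observation
  ] ;

  # "value" becomes the simple result, typed as a double.
  rr:predicateObjectMap [
    rr:predicate sosa:hasSimpleResult ;
    rr:objectMap [ rml:reference "value" ; rr:datatype xsd:double ]
  ] ;

  # "ts" becomes the result time.
  rr:predicateObjectMap [
    rr:predicate sosa:resultTime ;
    rr:objectMap [ rml:reference "ts" ; rr:datatype xsd:dateTime ]
  ] ;

  # The sensor IRI is minted from the turbine and tag (hypothetical pattern).
  rr:predicateObjectMap [
    rr:predicate sosa:madeBySensor ;
    rr:objectMap [ rr:template "http://example.org/sensor/{turbine_id}/{sensor_tag}" ]
  ] .
```

Each `rr:predicateObjectMap` pairs one JSON field with one SOSA property, so the mapping stays declarative: adding a field means adding a block, not changing application code. Note that values substituted into an `rr:template` are percent-encoded where needed, so the colons in `ts` will not appear literally in the observation IRI.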
What this lesson does NOT do
It does NOT re-teach Kafka. If brokers, topics, consumer groups and exactly-once semantics feel fuzzy, take the Apache Kafka & Streaming track first; that's its job. Here we focus on the RDF-shaped half of the pipeline.