Theory — The brief
The Domain Context: Pharmacovigilance (PV)
MedaCore is a (fictional) mid-size pharmaceutical company. Like any drug manufacturer, its Pharmacovigilance (PV) team is legally and ethically responsible for tracking adverse events (AEs): untoward medical occurrences that patients experience while taking a MedaCore drug, whether or not the drug turns out to be the cause. These events must be reported to regulators (such as the EMA in Europe or the FDA in the US). Crucially, the clock is ticking: submissions for serious events must be made within 15 calendar days, and they must strictly follow complex regulatory message structures (such as the ICH E2B(R3) standard).
The Pain Points (Stakeholder Interview)
To understand why we need an ontology, listen to the people dealing with the data:
Head of PV: "Every single quarter, we risk missing our 15-day submission window. Why? Because our IT system can't tell us which cases are legally reportable until a human manually reads them. The business rule is completely deterministic (a 'serious' event combined with a WHO-UMC causality of 'possible' or higher, meaning possible, probable or certain), but that rule lives buried in a 40-page PDF standard operating procedure (SOP), not in our database."
Data Engineering Lead: "We receive data from three intake channels: doctor forms, a patient mobile app, and partner hospital feeds. They all use slightly different, incompatible vocabularies for the seriousness of an event (e.g., 'severe', 'high severity', 'grade 3'). By the time we write SQL scripts to clean and deduplicate the data, the reporting deadline is practically yesterday."
Regulatory Compliance Officer: "When regulators audit us, they don't just want the final report. They want to see the exact logical breadcrumbs that promoted a raw case to 'reportable' status. A black-box machine learning model or a tangled Python script in a Jupyter notebook is simply not auditable. We need transparent, mathematical certainty."
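To see how small the rule described by the Head of PV really is, here is a minimal sketch in plain Python. It is only an illustration of the rule's determinism, not the ontology-based solution this brief argues for. The names `Causality`, `SERIOUS_TERMS`, `normalise_seriousness` and `is_reportable` are hypothetical, the numeric ranking of the WHO-UMC categories is an assumption made for this sketch, and so is the decision to treat each of the three channel terms from the interview as meaning "serious".

```python
from enum import IntEnum


class Causality(IntEnum):
    """WHO-UMC causality categories.

    The numeric ranks are an assumption of this sketch: they only exist
    so that "possible or higher" can be written as a comparison.
    """
    UNASSESSABLE = 0
    CONDITIONAL = 1
    UNLIKELY = 2
    POSSIBLE = 3
    PROBABLE = 4
    CERTAIN = 5


# Hypothetical mapping of channel-specific terms onto one shared flag.
# The terms come from the interview; treating all of them as "serious"
# is an assumption, not MedaCore's documented policy.
SERIOUS_TERMS = {"severe", "high severity", "grade 3"}


def normalise_seriousness(raw_term: str) -> bool:
    """Map a channel-specific term to a single boolean 'serious' flag."""
    return raw_term.strip().lower() in SERIOUS_TERMS


def is_reportable(serious: bool, causality: Causality) -> bool:
    """Rule from the interview: serious AND causality of possible or higher."""
    return serious and causality >= Causality.POSSIBLE


if __name__ == "__main__":
    flag = normalise_seriousness(" Grade 3 ")
    print(is_reportable(flag, Causality.PROBABLE))
    print(is_reportable(flag, Causality.UNLIKELY))
```

Even in this tiny form, the rule depends on two agreements that no single intake channel owns: what counts as "serious" and how the causality categories are ordered. Those shared definitions are what the rest of this brief is about.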
Translating the Brief into Ontology Requirements
Looking beyond the complaints, what MedaCore is actually asking for is a knowledge-graph architecture with four capabilities:
- Semantic Integration: A shared, machine-readable definition of core concepts like serious adverse event and reportable case, onto which the vocabularies of all three intake channels are mapped.
- Automated Reasoning: An inference capability that classifies new cases automatically by applying the reportability rule from the SOP, removing the human bottleneck.
- Data Quality at the Gate: A validation layer that strictly rejects malformed or incomplete intake data BEFORE it pollutes the golden record in the data warehouse.
- Explainability: An audit trail guaranteed by formal logic, letting a regulator see exactly why the system classified a case as reportable.
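The last two requirements, validation at the gate and an explainable decision, can be pictured with a small Python sketch. This is an assumption-laden illustration of the two ideas, not a knowledge-graph implementation: the record schema (`case_id`, `serious`, `causality`), the `Decision` class and the functions `validate` and `classify` are all hypothetical.

```python
from dataclasses import dataclass, field

# Hypothetical minimal schema for an incoming case record.
REQUIRED_FIELDS = ("case_id", "serious", "causality")
# "Possible or higher" on the WHO-UMC scale.
REPORTABLE_CAUSALITIES = {"possible", "probable", "certain"}


def validate(record: dict) -> list[str]:
    """Return the problems found; an empty list means the record may enter."""
    errors = [f"missing field: {name}"
              for name in REQUIRED_FIELDS if record.get(name) is None]
    if errors:
        return errors
    if not isinstance(record["serious"], bool):
        errors.append("serious must be a boolean")
    if not isinstance(record["causality"], str):
        errors.append("causality must be a string")
    return errors


@dataclass
class Decision:
    case_id: str
    reportable: bool
    trail: list[str] = field(default_factory=list)  # human-readable audit steps


def classify(record: dict) -> Decision:
    """Reject invalid records, then classify and record why."""
    errors = validate(record)
    if errors:
        raise ValueError("; ".join(errors))
    serious = record["serious"]
    causal = record["causality"].lower() in REPORTABLE_CAUSALITIES
    trail = [
        f"serious = {serious}",
        f"causality = {record['causality']!r} (possible or higher: {causal})",
        "rule: reportable if serious AND causality is possible or higher",
    ]
    return Decision(record["case_id"], serious and causal, trail)


if __name__ == "__main__":
    decision = classify({"case_id": "C-1", "serious": True, "causality": "Probable"})
    print(decision.reportable)
    for step in decision.trail:
        print(" -", step)
```

The point of the sketch is the shape of the output: every decision carries the facts and the rule that produced it, and a malformed record never reaches the classification step.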