Theory
The most-used biological ontology, up close
The Gene Ontology (GO) is the flagship OBO ontology and one of the most widely cited resources in the life sciences. It is openly licensed (CC BY 4.0). Lesson 1 placed it in the OBO Foundry; here we work with its actual structure and the way biologists use it: annotation.
Three orthogonal aspects
GO is really three sub-ontologies, and every term belongs to exactly one:
- Molecular Function (MF) — what a gene product does at the molecular level (e.g. catalytic activity).
- Biological Process (BP) — the larger program it contributes to (e.g. cell division).
- Cellular Component (CC) — where it acts (e.g. mitochondrion).
A DAG, not a tree — with typed edges
GO is a directed acyclic graph: a term can have several parents. The two workhorse relations are is_a (subtype) and part_of (the classic partonomy from Ontology Engineering). Because it's a DAG, a single specific term rolls up to multiple general ancestors along different paths.
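To make the multi-parent structure concrete, here is a minimal sketch of a DAG with typed edges. The graph is a simplified toy fragment written for illustration, not copied from a GO release, and the names PARENTS and ancestors are hypothetical.

```python
from collections import deque

# Toy fragment: child term -> list of (relation, parent term).
# Simplified for illustration; not taken from a GO release.
PARENTS = {
    "mitochondrial inner membrane": [
        ("is_a", "organelle inner membrane"),
        ("part_of", "mitochondrial envelope"),
    ],
    "organelle inner membrane": [("is_a", "membrane")],
    "mitochondrial envelope": [("part_of", "mitochondrion")],
    "membrane": [("is_a", "cellular_component")],
    "mitochondrion": [("is_a", "cellular_component")],
    "cellular_component": [],
}


def ancestors(term, parents=PARENTS):
    """Return every term reachable by following is_a/part_of edges upward."""
    seen = set()
    queue = deque([term])
    while queue:
        current = queue.popleft()
        for _relation, parent in parents.get(current, []):
            if parent not in seen:
                seen.add(parent)
                queue.append(parent)
    return seen


# One specific term, two direct parents, two distinct paths to the root.
print(sorted(ancestors("mitochondrial inner membrane")))
```

In a tree the starting term would have a single parent and one path to the root; here it has two parents and reaches the root by both an is_a path and a part_of path.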
The True Path Rule
GO's defining invariant: if a gene product is annotated to a term, that annotation must also hold for every ancestor reached by following is_a and part_of edges up to the root. Annotate a gene to mitochondrial inner membrane and it is, by entailment, also in membrane (via is_a) and in mitochondrion (via part_of). If some ancestor on the path does not hold, either the ontology structure or the annotation is wrong and must be corrected. It's the biological cousin of transitive subsumption.
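The rule can be read as a closure operation: a direct annotation implies an annotation to every ancestor. The sketch below assumes a toy parent map with is_a and part_of merged into one set per term; PARENT_TERMS, propagate, and the gene names are hypothetical.

```python
# Hypothetical toy data: child term -> parent terms (is_a and part_of merged).
PARENT_TERMS = {
    "mitochondrial inner membrane": {
        "organelle inner membrane",
        "mitochondrial envelope",
    },
    "organelle inner membrane": {"membrane"},
    "mitochondrial envelope": {"mitochondrion"},
    "membrane": set(),
    "mitochondrion": set(),
}


def propagate(direct_annotations, parent_terms=PARENT_TERMS):
    """Expand each gene's direct terms to include all ancestor terms."""
    propagated = {}
    for gene, terms in direct_annotations.items():
        closure = set()
        stack = list(terms)
        while stack:
            term = stack.pop()
            if term in closure:
                continue  # already reached along another path
            closure.add(term)
            stack.extend(parent_terms.get(term, ()))
        propagated[gene] = closure
    return propagated


direct = {
    "geneA": {"mitochondrial inner membrane"},
    "geneB": {"membrane"},
}
full = propagate(direct)
print(sorted(full["geneA"]))
```

After propagation, geneA counts toward membrane and mitochondrion even though it was annotated only to the most specific term, while geneB still counts toward membrane alone.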
Annotation: the GAF file
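Genes are linked to GO terms in a GAF (GO Annotation File), a tab-separated file in which each line states that this gene product has this GO term, backed by an evidence code. For example, IDA (Inferred from Direct Assay) marks experimental evidence, while IEA (Inferred from Electronic Annotation) marks a computational assignment that no curator has reviewed. The evidence code is first-class: a conclusion is only as trustworthy as the evidence behind it.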
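A minimal sketch of reading such a file, assuming the 17-column GAF 2.x layout in which comment lines start with "!". The sample rows are made up for illustration (the database prefix EX, the gene identifiers, and the reference PMID:0000000 are placeholders), and the helper names are hypothetical.

```python
import io

# 0-based positions of the columns used here, following the GAF 2.x column order.
FIELDS = {
    "db_object_id": 1,
    "symbol": 2,
    "qualifier": 3,
    "go_id": 4,
    "evidence": 6,
    "aspect": 8,
}


def make_row(object_id, symbol, qualifier, go_id, evidence, aspect):
    """Build one 17-column GAF-style line from made-up values."""
    cols = [
        "EX", object_id, symbol, qualifier, go_id, "PMID:0000000",
        evidence, "", aspect, "", "", "protein", "taxon:9606",
        "20240101", "EX", "", "",
    ]
    return "\t".join(cols)


SAMPLE_GAF = "\n".join([
    "!gaf-version: 2.2",
    make_row("EX0001", "GENE1", "located_in", "GO:0005743", "IDA", "C"),
    make_row("EX0002", "GENE2", "located_in", "GO:0016020", "IEA", "C"),
]) + "\n"


def read_gaf(handle):
    """Yield one dict per annotation line, skipping comments and blank lines."""
    for line in handle:
        if line.startswith("!") or not line.strip():
            continue
        cols = line.rstrip("\n").split("\t")
        yield {name: cols[index] for name, index in FIELDS.items()}


records = list(read_gaf(io.StringIO(SAMPLE_GAF)))
# Treat the evidence code as first-class: drop unreviewed electronic annotations.
curated = [r for r in records if r["evidence"] != "IEA"]
print([(r["symbol"], r["go_id"], r["evidence"]) for r in curated])
```

Whether to exclude IEA annotations is an analysis choice rather than a rule; the point is that the evidence column makes the choice possible.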
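Use Case Example: A lab runs an experiment yielding 300 over-expressed genes. GO enrichment analysis asks 'which biological processes are over-represented in this list compared with what chance would produce?' The answer depends on the true-path rule, because per-term gene counts propagate up the DAG: a gene annotated to a specific term must also count toward each of its ancestors. Without correct propagation, general terms would be under-counted and their enrichment p-values could not be trusted.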
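One common way to score over-representation for a single term is a one-sided hypergeometric test, sketched below with only the standard library. This is an illustrative assumption about the statistic, not a description of any particular enrichment tool; the function name and the example counts other than the 300-gene list are hypothetical. Real analyses also correct for testing many terms at once.

```python
from math import comb


def hypergeom_enrichment_p(population, annotated, sample, hits):
    """Probability of seeing at least `hits` annotated genes in the sample.

    population: genes in the background set
    annotated:  background genes carrying the term (after propagation)
    sample:     genes in the study list
    hits:       study genes carrying the term (after propagation)
    """
    upper = min(annotated, sample)
    total = comb(population, sample)
    tail = sum(
        comb(annotated, k) * comb(population - annotated, sample - k)
        for k in range(hits, upper + 1)
    )
    return tail / total


# Hypothetical counts: 20000 background genes, 400 annotated to the term,
# 300 genes in the study list, 20 of which carry the term.
p_value = hypergeom_enrichment_p(20000, 400, 300, 20)
print(p_value)
```

Both annotated and hits should be counted from propagated annotations; if they are counted from direct annotations only, general terms receive too few genes and the test is run on the wrong numbers.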