Gene Ontology — Structure & Annotation (GAF)

GO's three aspects, the is_a/part_of DAG, the true-path rule, and annotating genes with evidence.

Theory

The most-used biological ontology, up close

The Gene Ontology (GO) is the flagship OBO ontology and one of the most widely cited resources in the life sciences. It is openly licensed (CC BY 4.0). Lesson 1 placed it in the OBO Foundry; here we work with its actual structure and the way biologists use it: annotation.

Three orthogonal aspects

GO is really three sub-ontologies, and every term belongs to exactly one:

  • Molecular Function (MF) — what a gene product does at the molecular level (e.g. catalytic activity).
  • Biological Process (BP) — the larger program it contributes to (e.g. cell division).
  • Cellular Component (CC) — where it acts (e.g. mitochondrion).

A DAG, not a tree — with typed edges

GO is a directed acyclic graph: a term can have several parents. The two workhorse relations are is_a (subtype) and part_of (the classic partonomy from Ontology Engineering). Because it's a DAG, a single specific term rolls up to multiple general ancestors along different paths.
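To make "several parents, several paths" concrete, here is a minimal sketch of a DAG with typed edges. The edge list is a hand-simplified toy fragment (intermediate GO terms are omitted and plain labels stand in for GO IDs), and the names EDGES and paths_to_root are hypothetical helpers for illustration, not part of any GO library.

```python
from collections import defaultdict

# Toy, simplified fragment: (child, relation, parent).
# Intermediate terms are omitted, so this is NOT the exact GO structure.
EDGES = [
    ("mitochondrial inner membrane", "is_a", "organelle inner membrane"),
    ("mitochondrial inner membrane", "part_of", "mitochondrial envelope"),
    ("organelle inner membrane", "is_a", "membrane"),
    ("mitochondrial envelope", "part_of", "mitochondrion"),
    ("membrane", "is_a", "cellular_component"),
    ("mitochondrion", "is_a", "cellular_component"),
]

# Index the edges: child -> list of (relation, parent).
parents = defaultdict(list)
for child, rel, parent in EDGES:
    parents[child].append((rel, parent))


def paths_to_root(term, root="cellular_component"):
    """Return every upward path from term to root as a list of term names."""
    if term == root:
        return [[term]]
    found = []
    for _rel, parent in parents[term]:
        for tail in paths_to_root(parent, root):
            found.append([term] + tail)
    return found


paths = paths_to_root("mitochondrial inner membrane")
for path in paths:
    print(" -> ".join(path))
```

In this fragment the leaf has two parents, one reached by is_a and one by part_of, so it reaches the root along two distinct paths. A tree could only represent one of them.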

The True Path Rule

GO's defining invariant: if a gene is annotated to a term, that annotation must hold for every term on every is_a/part_of path up to the root. Annotate a gene to mitochondrial inner membrane and it is, by entailment, also annotated to membrane and to mitochondrion. If an annotation fails to hold for some ancestor, the ontology structure itself is wrong and must be fixed; it's the biological cousin of transitive subsumption.
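One way to see the rule operationally is to compute the ancestor closure of each direct annotation. The sketch below assumes a toy parent map (simplified, with intermediate terms omitted) and a made-up gene name GENE_A; ancestors and propagate are hypothetical helper names, and both relations are treated alike when walking upward.

```python
# Toy parent map: term -> parents reached via is_a or part_of.
# Simplified for illustration; not the exact GO structure.
PARENTS = {
    "mitochondrial inner membrane": ["organelle inner membrane", "mitochondrial envelope"],
    "organelle inner membrane": ["membrane"],
    "mitochondrial envelope": ["mitochondrion"],
    "membrane": ["cellular_component"],
    "mitochondrion": ["cellular_component"],
    "cellular_component": [],
}


def ancestors(term):
    """All terms reachable from term by following parent edges upward."""
    seen = set()
    stack = [term]
    while stack:
        for parent in PARENTS.get(stack.pop(), []):
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


def propagate(direct):
    """Map each gene to its direct terms plus every entailed ancestor."""
    return {
        gene: set(terms).union(*(ancestors(t) for t in terms))
        for gene, terms in direct.items()
    }


# GENE_A is a made-up gene with a single direct annotation.
direct = {"GENE_A": {"mitochondrial inner membrane"}}
closed = propagate(direct)
print(sorted(closed["GENE_A"]))
```

A single direct annotation at the leaf yields an annotation to every term in the fragment, which is exactly what the true-path rule requires to be biologically true.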

Annotation: the GAF file

Genes are linked to GO terms in a GAF (GO Annotation File): a tab-separated file in which each line is one record stating that this gene product has this GO term, backed by an evidence code (e.g. IDA, Inferred from Direct Assay, for experimental results; IEA, Inferred from Electronic Annotation, for automated predictions). The evidence code is first-class: a conclusion is only as trustworthy as the evidence behind it.
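A GAF line can be read with nothing more than a tab split. The sketch below assumes the 17-column GAF 2.2 layout, where lines starting with "!" are header comments; the field names are my own labels for those columns, and the sample record (database ExampleDB, gene geneA, reference PMID:0000000) is invented purely for illustration.

```python
# Column labels for the 17 GAF 2.2 columns (names are this sketch's own).
GAF_COLUMNS = [
    "db", "db_object_id", "db_object_symbol", "qualifier", "go_id",
    "db_reference", "evidence_code", "with_from", "aspect",
    "db_object_name", "db_object_synonym", "db_object_type",
    "taxon", "date", "assigned_by", "annotation_extension",
    "gene_product_form_id",
]

# The aspect column holds a one-letter code for the three sub-ontologies.
ASPECTS = {
    "F": "Molecular Function",
    "P": "Biological Process",
    "C": "Cellular Component",
}


def parse_gaf(lines):
    """Yield one dict per annotation line, skipping blanks and '!' comments."""
    for line in lines:
        if not line.strip() or line.startswith("!"):
            continue
        fields = line.rstrip("\n").split("\t")
        # Pad short lines so every record has all 17 keys.
        fields += [""] * (len(GAF_COLUMNS) - len(fields))
        yield dict(zip(GAF_COLUMNS, fields))


# Invented sample data: a header line plus one made-up annotation.
sample = [
    "!gaf-version: 2.2",
    "\t".join([
        "ExampleDB", "X0001", "geneA", "located_in", "GO:0005743",
        "PMID:0000000", "IDA", "", "C", "", "", "protein",
        "taxon:9606", "20240101", "ExampleDB", "", "",
    ]),
]

records = list(parse_gaf(sample))
# Keep only annotations that are not purely electronic inferences.
curated = [r for r in records if r["evidence_code"] != "IEA"]
for r in curated:
    print(r["db_object_symbol"], r["go_id"], r["evidence_code"], ASPECTS[r["aspect"]])
```

Filtering on the evidence code, as the last lines do, is the simplest way a downstream consumer can decide how much of the file to trust.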

Use Case Example: A lab runs an experiment yielding 300 over-expressed genes. GO enrichment analysis asks 'which biological processes are over-represented among them compared with chance?', and the answer depends on the true-path rule, because counts propagate up the DAG. Without correct propagation, the counts for broader terms would be too low and their enrichment p-values would be wrong.
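Over-representation of a single term is commonly tested with a one-sided hypergeometric test; the text does not name a specific test, so treat that choice as an assumption. In the sketch below only the 300 study genes come from the example above. The background size (20,000), the number of background genes annotated to the term after propagation (400), and the overlap (15) are made-up numbers, and hypergeom_pvalue is a hypothetical helper name.

```python
from math import comb


def hypergeom_pvalue(N, K, n, k):
    """P(X >= k) when drawing n genes from N, of which K carry the term."""
    total = comb(N, n)
    upper = min(K, n)
    hits = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return hits / total


N = 20_000  # background genes (made-up)
K = 400     # background genes annotated to the term, after propagation (made-up)
n = 300     # study set: the over-expressed genes from the example
k = 15      # study genes annotated to the term (made-up)

expected = n * K / N  # overlap expected by chance
p = hypergeom_pvalue(N, K, n, k)
print(f"expected overlap {expected:.1f}, observed {k}, p = {p:.3g}")
```

Both K and k must be counted on propagated annotations. If a gene annotated only to a specific child term is left out of its ancestors' counts, K and k shrink for every broader term and the resulting p-values no longer mean what they claim. In practice many terms are tested at once, so a multiple-testing correction would normally follow.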

Analogy

The true-path rule is org-chart escalation. If a task is assigned to a junior on the database team, it's implicitly the responsibility of that team's lead, the engineering director and the CTO — accountability propagates up every reporting line. GO annotations escalate the same way: tag a gene with a very specific function and it automatically counts for every broader category above it. Enrichment analysis is just counting how busy each level of the org chart really is.

Annotation propagates up every path

A GO DAG fragment with the true-path rule

A specific component term rolls up through part_of and is_a to broader ancestors; an annotation at the leaf entails all of them.

Reflect

GO works because two simple relations (is_a, part_of) plus one invariant (the true-path rule) make millions of annotations computable. Evidence codes add the humility: every inference carries how much you should believe it.

  • Where would an explicit 'evidence code' on your data change how much downstream consumers trust it?
  • Does any roll-up report you maintain quietly assume a true-path-style propagation that isn't enforced?
