Aggregation — GROUP BY, COUNT, HAVING

Collapse many rows into per-group summaries.

0/3 done

Theory

Aggregation collapses a bag of solutions into one row per group. The moving parts:

  • GROUP BY ?key — partition rows by a key.
  • Aggregate functionsCOUNT, SUM, AVG, MIN, MAX, SAMPLE, GROUP_CONCAT.
  • HAVING(expr) — like FILTER, but applied after grouping, on aggregate values.
SELECT ?master (COUNT(?ninja) AS ?students) WHERE {
  ?master :teaches ?ninja .
}
GROUP BY ?master
HAVING (COUNT(?ninja) >= 2)
ORDER BY DESC(?students)

The rules that cause 'not a valid aggregate' errors

  1. Every non-aggregated projected variable must appear in GROUP BY. You can't SELECT ?master ?ninja (COUNT...) and group only by ?master; ?ninja must be inside an aggregate or in the GROUP BY.
  2. COUNT(?x) counts bound values; COUNT(DISTINCT ?x) de-dupes; COUNT(*) counts rows. Picking the wrong one is the classic 'my numbers are too high' bug — usually you want DISTINCT.
  3. Filter rows with FILTER (before grouping), filter groups with HAVING (after). Putting an aggregate in a FILTER is illegal.

Worked example — GROUP_CONCAT

Worked example — GROUP_CONCAT for a one-row-per-group list.

SELECT ?master (GROUP_CONCAT(?ninja; SEPARATOR=", ") AS ?roster)
WHERE { ?master :teaches ?ninja . }
GROUP BY ?master

GROUP_CONCAT is how you turn the many ?ninja rows of each master into a single comma-separated string — the SPARQL equivalent of SQL's string_agg. Use SAMPLE(?x) when you need any one value of a grouped variable without caring which.

Reading in progress · 0 of 3 activities done