Theory
Three joins, two failure modes
Broadcast (BHJ) — small side fits in memory; Spark ships it to every executor and joins locally. Fast. Default trigger is spark.sql.autoBroadcastJoinThreshold (10MB default — often worth raising).
Sort-Merge (SMJ) — both sides large; both are shuffled, sorted, then merged. The workhorse for big joins.
Shuffle Hash (SHJ) — middle ground, rarely chosen manually.
The two ways big joins blow up:
- Skew — one join key dominates. Fix with AQE skew join (Spark 3.x auto-splits skewed partitions) or manual key salting.
- Cartesian explosion — a join condition is missing or non-equi; row count multiplies. Always inspect
explain()forCartesianProduct.