Joins

The source code discussed in this chapter can be found in the logical-plan and physical-plan modules of the KQuery project.

Joins combine rows from two tables based on a condition. They are fundamental to relational databases and often the most expensive operation in a query. This chapter covers join types and algorithms, with a focus on hash joins.

Join Types

Inner Join

An inner join returns rows where the join condition matches in both tables:

SELECT e.name, d.dept_name
FROM employees e
INNER JOIN departments d ON e.dept_id = d.id

If an employee has no matching department, that employee is excluded from the results. If a department has no employees, it is also excluded.

Left Outer Join

A left outer join returns all rows from the left table, with matching rows from the right table where available:

SELECT e.name, d.dept_name
FROM employees e
LEFT JOIN departments d ON e.dept_id = d.id

Employees without a matching department still appear in the results, with NULL for dept_name.

Right Outer Join

A right outer join is the mirror of left join: all rows from the right table, with matches from the left:

SELECT e.name, d.dept_name
FROM employees e
RIGHT JOIN departments d ON e.dept_id = d.id

Departments without employees appear with NULL for employee columns.

Full Outer Join

A full outer join returns all rows from both tables, matching where possible:

SELECT e.name, d.dept_name
FROM employees e
FULL OUTER JOIN departments d ON e.dept_id = d.id

Both unmatched employees and unmatched departments appear, with NULLs for missing columns.

Cross Join

A cross join returns every combination of rows from both tables (the Cartesian product):

SELECT e.name, d.dept_name
FROM employees e
CROSS JOIN departments d

If employees has 100 rows and departments has 10 rows, the result has 1,000 rows. Cross joins are rarely useful on their own but sometimes appear in query plans as intermediate steps.

Semi Join

A semi join returns rows from the left table where at least one match exists in the right table, but does not include columns from the right table:

SELECT e.name
FROM employees e
WHERE EXISTS (SELECT 1 FROM departments d WHERE d.id = e.dept_id)

Semi joins are not directly expressible in standard SQL syntax but arise from EXISTS subqueries. The Subqueries chapter covers this in detail.

Anti Join

An anti join returns rows from the left table where no match exists in the right table:

SELECT e.name
FROM employees e
WHERE NOT EXISTS (SELECT 1 FROM departments d WHERE d.id = e.dept_id)

Anti joins arise from NOT EXISTS or NOT IN subqueries.

Join Conditions

Equi-joins

Most joins use equality conditions:

ON employees.dept_id = departments.id

These are called equi-joins. Query engines optimize heavily for equi-joins because hash-based algorithms work well with equality comparisons.

Non-equi Joins

Some joins use inequality or range conditions:

SELECT *
FROM events e
JOIN time_ranges t ON e.timestamp BETWEEN t.start_time AND t.end_time

Non-equi joins cannot use hash-based algorithms and typically require nested loop or specialized range join implementations.

Join Algorithms

The choice of join algorithm dramatically affects performance. The three main approaches are nested loop join, sort-merge join, and hash join.

Nested Loop Join

The simplest algorithm: for each row in the left table, scan the entire right table looking for matches.

for each row L in left_table:
    for each row R in right_table:
        if matches(L, R):
            emit(L, R)

Time complexity: O(n × m) where n and m are the table sizes.

Nested loop join is simple but slow for large tables. It is useful when:

One table is very small
An index exists on the join column of the inner table
The join condition is not an equality (non-equi join)

With an index, the inner loop becomes an index lookup rather than a full scan, dramatically improving performance.

Sort-Merge Join

Sort both tables by the join key, then merge them:

sort left_table by join_key
sort right_table by join_key

while both tables have rows:
    if left.key == right.key:
        emit all matching combinations
        advance both
    else if left.key < right.key:
        advance left
    else:
        advance right

Time complexity: O(n log n + m log m) for sorting, plus O(n + m) for merging.

Sort-merge join is efficient when:

Data is already sorted by the join key
The result of the join needs to be sorted anyway
Memory is limited (external sort can spill to disk)

Hash Join

Build a hash table from one table, then probe it with the other:

// Build phase
hash_table = {}
for each row R in build_table:
    key = R.join_column
    hash_table[key].append(R)

// Probe phase
for each row L in probe_table:
    key = L.join_column
    for each match in hash_table[key]:
        emit(L, match)

Time complexity: O(n + m) assuming good hash distribution.

Hash join is usually the fastest algorithm for equi-joins when:

The smaller table fits in memory
The join condition uses equality

Hash Join in Detail

Hash join is the workhorse of modern query engines. Let us examine it more closely.

Choosing the Build Side

The build side should be the smaller table. Building a hash table from 1,000 rows and probing with 1,000,000 rows is much faster than the reverse.

The query optimizer estimates table sizes and chooses the build side. With statistics, it can account for filters that reduce table sizes:

SELECT *
FROM large_table l
JOIN small_table s ON l.id = s.id
WHERE s.category = 'active'

Even if small_table has more rows than large_table, after filtering it might be smaller.

Hash Table Structure

For each unique join key, the hash table stores all rows from the build side with that key. The simplest structure is a hash map from key to list of rows:

val hashTable = HashMap<Any, MutableList<RecordBatch>>()

In practice, implementations optimize memory layout for cache efficiency.

Handling Hash Collisions

When different keys hash to the same bucket, we must compare actual key values during the probe phase:

fun probe(key: Any): List<Row> {
    val bucket = hashTable[key.hashCode()]
    return bucket.filter { it.joinKey == key }
}

Good hash functions minimize collisions, but the probe must always verify equality.

KQuery’s Hash Join Implementation

KQuery implements hash join in HashJoinExec. The implementation supports inner, left, and right joins:

class HashJoinExec(
    val left: PhysicalPlan,
    val right: PhysicalPlan,
    val joinType: JoinType,
    val leftKeys: List<Int>,
    val rightKeys: List<Int>,
    val schema: Schema,
    val rightColumnsToExclude: Set<Int>
) : PhysicalPlan {

    override fun execute(): Sequence<RecordBatch> {
        // Build phase: load all right-side rows into a hash table
        val hashTable = HashMap<List<Any?>, MutableList<List<Any?>>>()

        right.execute().forEach { batch ->
            for (rowIndex in 0 until batch.rowCount()) {
                val key = rightKeys.map { keyIndex ->
                    normalizeValue(batch.field(keyIndex).getValue(rowIndex))
                }
                val row = (0 until batch.columnCount()).map {
                    batch.field(it).getValue(rowIndex)
                }
                hashTable.getOrPut(key) { mutableListOf() }.add(row)
            }
        }

        // Probe phase: iterate through left side and find matches
        return sequence {
            left.execute().forEach { leftBatch ->
                val outputRows = mutableListOf<List<Any?>>()

                for (leftRowIndex in 0 until leftBatch.rowCount()) {
                    val probeKey = leftKeys.map { keyIndex ->
                        normalizeValue(leftBatch.field(keyIndex).getValue(leftRowIndex))
                    }
                    val leftRow = (0 until leftBatch.columnCount()).map {
                        leftBatch.field(it).getValue(leftRowIndex)
                    }
                    val matchedRows = hashTable[probeKey]

                    when (joinType) {
                        JoinType.Inner -> {
                            if (matchedRows != null) {
                                for (rightRow in matchedRows) {
                                    outputRows.add(combineRows(leftRow, rightRow))
                                }
                            }
                        }
                        JoinType.Left -> {
                            if (matchedRows != null) {
                                for (rightRow in matchedRows) {
                                    outputRows.add(combineRows(leftRow, rightRow))
                                }
                            } else {
                                // No match: include left row with nulls for right columns
                                val nullRightRow = List(rightSchema.fields.size) { null }
                                outputRows.add(combineRows(leftRow, nullRightRow))
                            }
                        }
                        // Right join handled after probe phase...
                    }
                }

                if (outputRows.isNotEmpty()) {
                    yield(createBatch(outputRows))
                }
            }
        }
    }
}

Key aspects of this implementation:

The build phase loads the entire right table into a hash table keyed by the join columns. Each key maps to a list of rows (to handle duplicate keys).

The probe phase iterates through the left table, looking up each row’s key in the hash table. For inner joins, rows without matches are skipped. For left joins, unmatched rows are emitted with NULLs for the right columns.

The rightColumnsToExclude parameter handles the common case where join keys have the same name on both sides. Without this, the output would have duplicate columns.

Outer Joins

KQuery’s implementation handles left and right outer joins:

For left outer join, when a probe row has no match in the hash table, we emit the left row combined with NULLs for all right columns. This happens inline during the probe phase.

For right outer join, we need to track which build (right) rows were matched. After the probe phase completes, we emit unmatched right rows with NULLs for the left columns. This requires either a second pass or tracking matched keys during the probe.

Full outer join combines both approaches: emit unmatched left rows during probing, then emit unmatched right rows after.

Memory Considerations

The build side must fit in memory for a simple hash join. For large tables, query engines use techniques like:

Grace hash join: Partition both tables by hash value, then join matching partitions. Each partition is smaller and more likely to fit in memory.

Hybrid hash join: Keep as much of the build side in memory as possible, spill the rest to disk, then process spilled partitions separately.

Adaptive execution: Start with hash join, switch to sort-merge if memory pressure is detected.

Join Ordering

For queries joining multiple tables, the order matters enormously:

SELECT *
FROM orders o
JOIN customers c ON o.customer_id = c.id
JOIN products p ON o.product_id = p.id

This could execute as:

(orders JOIN customers) JOIN products
(orders JOIN products) JOIN customers
(customers JOIN products) JOIN orders (usually bad)

The optimizer evaluates costs and chooses the best order. Generally, joins that produce smaller intermediate results should happen first.

Bloom Filters

A Bloom filter is a probabilistic data structure that can quickly test whether an element might be in a set. Query engines use Bloom filters to speed up joins:

Build a Bloom filter from the build side keys
Before probing, check if the probe key might exist
Skip rows that definitely have no match

Bloom filters have false positives (might say “yes” when the answer is “no”) but no false negatives. This means some unnecessary probes happen, but no matches are missed.

For selective joins where most probe rows have no match, Bloom filters significantly reduce work.

Summary

Joins are complex and performance-critical. Key points:

Hash join is typically fastest for equi-joins
The build side should be the smaller table
Join ordering affects performance dramatically
Memory constraints may require spilling to disk
Query optimizers use statistics to make good choices

KQuery implements hash join for inner, left, and right joins. The implementation demonstrates the core algorithm: build a hash table from one side, probe with the other. Production systems add optimizations like spilling to disk and Bloom filters, but the fundamental approach remains the same.

This book is also available for purchase in ePub, MOBI, and PDF format from https://leanpub.com/how-query-engines-work

Keyboard shortcuts

How Query Engines Work