tutorial · 2026-06-13 · duckbert · 15 min read

Looking Through the WINDOW: OVER, Frames, and Ranking in DuckDB

How does each duck stack up against its team, its rivals, and its own past? Let's learn window functions to answer them — without collapsing a single row.

Pondville's weekend duck league is three game days into its season — four ducks, two teams, twelve scores so far. Ask SQL for the team totals with GROUP BY team and the engine collapses everything to one row per team. Eider's individual 92 is gone, folded into a single Mallards number. You can no longer see her score and her team's average side by side, because Eider as a row no longer exists.

Window functions fix that. Same aggregates, but every input row stays put and gets its own view of the surrounding rows — its partition, its frame, the rows just before or after it. The seam where all of this happens is a single SQL clause: OVER.

The block below seeds Pondville's league table — Eider, Duckbert, Pippin, and Mira; Mallards and Pintails; three game days. Every interactive block after it runs against that table. Edit any block, change a column or a PARTITION BY, hit Run, and watch the shape of the result change.

bootstrap.sql

CREATE OR REPLACE TABLE players(name, team, score, played_on) AS
SELECT * FROM (VALUES
  ('Eider',    'Mallards', 92, DATE '2026-04-01'),
  ('Duckbert', 'Mallards', 88, DATE '2026-04-01'),
  ('Pippin',   'Pintails', 79, DATE '2026-04-01'),
  ('Mira',     'Pintails', 95, DATE '2026-04-01'),
  ('Eider',    'Mallards', 88, DATE '2026-04-08'),
  ('Duckbert', 'Mallards', 94, DATE '2026-04-08'),
  ('Pippin',   'Pintails', 90, DATE '2026-04-08'),
  ('Mira',     'Pintails', 88, DATE '2026-04-08'),
  ('Eider',    'Mallards', 90, DATE '2026-04-15'),
  ('Duckbert', 'Mallards', 91, DATE '2026-04-15'),
  ('Pippin',   'Pintails', 85, DATE '2026-04-15'),
  ('Mira',     'Pintails', 99, DATE '2026-04-15')
) t(name, team, score, played_on);

GROUP BY collapses, OVER preserves

Want the total across all twelve rows? SELECT SUM(score) FROM players gives you 1079 — one row, one number. With OVER () you get the same 1079, but attached to every row. Each player's individual score sits right next to the global figure.

total-via-over.sql

SELECT name, team, score, played_on, SUM(score) OVER () AS total_score
FROM players
ORDER BY team, name, played_on;

That trailing empty () is the entire trick. It tells the engine: compute this aggregate over a window of rows. When the window is empty parentheses, the window is "all rows in the result set."

Because that window is the whole table, empty OVER () is the cleanest way to express shares and percentages — divide each row's value by the window total and you have its fraction of the whole.

pct-of-total.sql

SELECT name, played_on, score,
       SUM(score) OVER () AS total,
       ROUND(score::DOUBLE / SUM(score) OVER () * 100, 1) AS pct_of_total
FROM players
ORDER BY played_on, team, name;

Each row now knows what fraction of the global total it represents — the window SUM(score) OVER () ran once and broadcast to every row.

PARTITION BY — windows by group

Team averages next to each duck's individual score — that's the PARTITION BY question. PARTITION BY carves the window into independent buckets — one per partition — and re-runs the aggregate inside each bucket. To any given row, the window is now "all the rows that share my partition key."

PARTITION BY is one of three things you compose inside OVER () — ORDER BY and a frame arrive section by section (the frame itself carries an EXCLUDE modifier, left to the further-reading links). The WINDOW clause is separate: a query-level clause that names a spec once so OVER can reuse it by alias — that one comes later too.

team-average.sql

SELECT name, team, played_on, score,
       ROUND(AVG(score) OVER (PARTITION BY team), 1) AS team_avg
FROM players
ORDER BY team, played_on, name;

Every Mallards row shows 90.5, every Pintails row shows 89.3 — two team averages, copied back to each team's ducks. Same row count as the input; two distinct windows underneath. The key can be multi-column, too: PARTITION BY team, played_on would carve a separate window for each team on each game day, not just each team.

Flip the toggle below and watch the two team windows collapse into one global 89.9.

ORDER BY inside OVER — the row's frame of reference

How many points has each team racked up by game three, not just in total? Add ORDER BY to the OVER and each row sees itself plus every row before it: that's a running total. Cumulative counts, "points so far this season," anything that needs a row to feel the weight of its history works the same way.

running-total.sql

SELECT played_on, team, name, score,
       SUM(score) OVER (PARTITION BY team ORDER BY played_on, name) AS running_team_total
FROM players
ORDER BY team, played_on, name;

Per team, the running sum climbs across the season. Mallards stack 88 → 180 → 274 → 362 → 453 → 543; Pintails 95 → 174 → 262 → 352 → 451 → 536. No GROUP BY, no JOIN, no self-correlated subquery — just one window with an ORDER BY inside it.

Hover any step in the chart below to see the rows in its window.

Frames: ROWS and RANGE — sliding windows

We just touched a frame implicitly — that default RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW. Here it is on purpose. A frame is the slice of the partition each row actually sees, and the explicit form ROWS BETWEEN N PRECEDING AND CURRENT ROW is what gives you a sliding window — moving averages, last-k counts, rolling sums.

DuckDB supports three frame units: ROWS (count rows), RANGE (the unit behind that default frame), and GROUPS (count peer-tied buckets). ROWS is the workhorse; RANGE shines when the order key is a date or numeric value and you want a value-based window; GROUPS matters when ties in the order key should count as a single step.

moving-average.sql

SELECT name, played_on, score,
       ROUND(AVG(score) OVER (
         PARTITION BY name ORDER BY played_on
         ROWS BETWEEN 1 PRECEDING AND CURRENT ROW
       ), 1) AS moving_avg
FROM players
ORDER BY name, played_on;

Each row's moving_avg is the average of this game and the previous one, scoped to that duck — a coach's two-game read on form. The first game per duck sees only itself, so its moving average equals its raw score.

Swap ROWS for RANGE and the frame counts by value distance in the order key instead of by row count. With dates, that means time-based windows — "last 7 days" regardless of how many rows fell in that span:

range-7-days.sql

SELECT name, played_on, score,
       SUM(score) OVER (
         PARTITION BY name ORDER BY played_on
         RANGE BETWEEN INTERVAL '7' DAY PRECEDING AND CURRENT ROW
       ) AS last_7_days
FROM players
ORDER BY name, played_on;

Pondville plays exactly seven days apart, so last_7_days happens to equal "this game + previous game" — same shape as the ROWS BETWEEN 1 PRECEDING average above. The payoff is when gaps vary: with RANGE, an absent week stops contributing automatically; with ROWS, you'd silently pull in a game from a month ago to fill the frame.

To get a feel for how N changes the shape of the frame, here's a richer slice — a synthetic 14-game season for Eider — with a slider that re-computes the rolling average live. Slide N down to 0 and the average collapses to the game itself; then push it up and see how far a frame can reach.

The ranking family — ROW_NUMBER vs RANK vs DENSE_RANK

Frames slice a window; ranking labels the rows inside it. Same OVER shape, different question. Pondville needs a season leaderboard. When you order rows and assign positions, three nearly-identical functions disagree on how to handle ties. Two of our scores land on 90 and three more on 88, so the disagreement flares up twice as you read down the leaderboard. The block below ranks every score across the whole table so you can see all three behaviors side by side.

rankings.sql

SELECT name, team, played_on, score,
       ROW_NUMBER() OVER (ORDER BY score DESC, name) AS row_num,
       RANK()       OVER (ORDER BY score DESC) AS rnk,
       DENSE_RANK() OVER (ORDER BY score DESC) AS dense_rnk
FROM players
ORDER BY score DESC, name;

ROW_NUMBER never ties — it just keeps counting. We add , name to its OVER clause as a tiebreaker so its values are deterministic across runs; without it, the three rows tied at 88 could land in any order.

RANK and DENSE_RANK are different: their values depend only on the score key, so they're deterministic without a tiebreaker. RANK gives tied rows the same rank, then skips ahead. DENSE_RANK gives them the same rank, then doesn't skip.

Pick the function whose definition of "rank" matches the question — who finished in the top three is DENSE_RANK (no gaps); what position did they finish in is RANK (gaps count). Tap a function below to spotlight how each one walks the tied scores.

LAG and LEAD — peek across rows

Ranking labels rows by position; LAG and LEAD go fetch values from them. How does Eider's game-two 88 compare to her opening 92? That's a job for the previous row: LAG reaches backwards by N rows inside the same window; LEAD reaches forwards. Both return NULL when there's nothing to reach — at the start or end of a partition. They're the no-self-join way to compare a row to its neighbors.

lag-lead.sql

SELECT name, played_on, score,
       LAG(score)  OVER (PARTITION BY name ORDER BY played_on) AS prev_score,
       LEAD(score) OVER (PARTITION BY name ORDER BY played_on) AS next_score,
       score - LAG(score) OVER (PARTITION BY name ORDER BY played_on) AS delta_from_prev
FROM players
ORDER BY name, played_on;

The delta_from_prev column is the single most useful pattern you'll get out of LAG: subtract the previous row from the current row and you have the per-step change for free. The NULL deltas are the first game of each player's season — there's nothing to subtract from.

Got NULL gaps in a column? Append IGNORE NULLS and LAG / LEAD / FIRST_VALUE / LAST_VALUE / NTH_VALUE hop over them to the nearest non-null value in their direction instead of returning NULL. Here's the contrast on a synthetic Eider season with two missed games:

ignore-nulls.sql

WITH eider_season(played_on, score) AS (
  VALUES
    (DATE '2026-04-01', 92),
    (DATE '2026-04-08', NULL),
    (DATE '2026-04-15', 90),
    (DATE '2026-04-22', NULL),
    (DATE '2026-04-29', 95)
)
SELECT played_on, score,
       LAG(score)              OVER (ORDER BY played_on) AS lag_default,
       LAG(score IGNORE NULLS) OVER (ORDER BY played_on) AS lag_ignore_nulls
FROM eider_season
ORDER BY played_on;

The contrast lands on 04-15: lag_default returns NULL because the previous row is NULL; lag_ignore_nulls skips the gap and hands back Eider's 92 from 04-01. Why does plain LAG default to returning NULL? Because returning something else would lie about what's there. IGNORE NULLS makes the lie opt-in, which is exactly the contract you want when you've already decided gaps are noise.

Click any game card below to make it the current row. The partition is per-player, so LAG and LEAD never reach across a duck's season boundary — no borrowing from another bird.

Three SQL ergonomics: WINDOW, QUALIFY, FILTER

Top two per team used to mean wrapping the whole query in a subquery just to filter on RANK(). Three clauses retire that kind of boilerplate; each kills a different chore.

WINDOW lets you name a window once and reference it by alias — useful when several functions share the same PARTITION BY / ORDER BY.

QUALIFY filters on a window function's output the same way WHERE filters on raw columns and HAVING filters on aggregates — no wrapping subquery needed.

FILTER (WHERE …) tags onto any windowed aggregate to restrict which rows feed in — cleaner than wrapping the expression in a CASE inside the aggregate.

WINDOW + QUALIFY together:

qualify-window.sql

SELECT name, team, played_on, score,
       RANK() OVER w     AS rnk,
       SUM(score) OVER w AS running_total
FROM players
WINDOW w AS (PARTITION BY team ORDER BY score DESC, name)
QUALIFY rnk <= 2
ORDER BY team, rnk;

Both RANK and SUM reuse the named window w — change the partition or order once, both functions follow. QUALIFY rnk <= 2 keeps only the top two scores per team. Mira holds both Pintails slots — the slots belong to scores, not ducks. Try swapping rnk <= 2 for running_total >= 150 and the filter switches from a rank predicate to an aggregate one.

And FILTER (WHERE …) in action — each duck's "late half" total (games 2 + 3 only), without folding the rest into a CASE:

filter-window.sql

SELECT name, played_on, score,
       SUM(score) FILTER (WHERE played_on >= '2026-04-08') OVER (PARTITION BY name) AS late_total
FROM players
ORDER BY name, played_on;

Every row keeps its individual score, and late_total repeats the same per-duck post-opening-day sum across all three of that duck's rows.

The full toolkit

That's OVER, PARTITION BY, ORDER BY-inside-OVER, frames, ranking, row-to-row peeks, IGNORE NULLS, and the WINDOW / QUALIFY / FILTER ergonomics — about eight moving pieces that compose into most of the analytics queries you'll ever write. The DuckDB docs go further with FIRST_VALUE, LAST_VALUE, NTH_VALUE, NTILE (split a partition into N buckets — quartiles, deciles, percentiles), PERCENT_RANK, and CUME_DIST — different window functions inside the same OVER clause. Some (like RANK, LAG, NTILE) only exist with OVER; others are ordinary aggregates worn over a window.

Drill any of these into long-term memory: the athletic challenge series builds harder problems on a sport-scoring schema very similar to this one. Level 24: The ROW_NUMBER Awakens is the first window-function level; LAG/LEAD and SUM…OVER continue at the next two levels, all auto-graded.

Pondville only has four ducks. Drop in your own table — a longer roster, a different sport, the sales numbers on your desk — and watch what OVER does with it.