
Intro to Window Functions in SQL
How to use Window functions to perform operations across a set of rows
23 August, 2022
11
11
0
Contributors
Window functions can help you run operations on a selection of rows and return a value from that original query.
The term window describes the set of rows on which the function operates. A window function uses values from the rows in a window to calculate the returned values.
Some common uses of window function include calculating cumulative sums, moving average, ranking, and more. (Chartio)
Window functions are initiated with the OVER clause, and are configured using three concepts:
•
window partition (PARTITION BY) - groups rows into partitions
•
window ordering (ORDER BY) defines the order or sequence of rows within each window
•
window frame (ROWS) - defines the window by use of an offset from the specified row
For this tutorial, we will cover PARTITION and ORDER BY. I will assume you have basic to intermediate SQL experience. If you don’t, here are some great resources to get started.
Window Function vs. Group By:
•
Window functions don't reduce the number of rows in the output.
•
Window functions can retrieve values from other rows, whereas GROUP BY functions cannot.
•
Window functions can calculate running totals and moving averages, whereas GROUP BY functions cannot.
Getting Started
I will be working with an Olympic Medalist table called summer_medal from Datacamp. (If you are a student with an edu email, and want to get three months of free Datacamp visit — GitHub Student Developer Pack).
Here is an overview of the table.

The table represents the Olympic games from 1896 to 2010, containing every medal winner from each country, sport, event, gender, and discipline.
ROW_NUMBER()
As mentioned earlier, using OVER() identifies the window function. The first function in this tutorial is ROW_NUMBER(). This function assigns a number to each record in the row.

Output
We can see that we use the ROW_NUMBER() to create and assign a row number to selected variables. We alias the window function as Row_Number and sort it so we can get the first-row number on the top.
ROW NUMBER() with ORDER BY()
We can combine ORDER BY and ROW_NUMBER to determine which column should be used for the row number assignment. Let’s find the DISTINCT sports, and assign them row numbers based on alphabetical order.

PARTITION BY
Using PARTITION BY you can split a table based on a unique value from a column. It is useful when we have to perform a calculation on individual rows of a group using other rows of that group. This clause works on windows functions only, like- LAG(), LEAD(), RANK(), etc.
What is LAG()?
SQL LAG() is a window function that outputs a row that comes before the current row.
Here is an excellent example of how it relates to our data. Let’s find the players separated by gender, who won the gold medal in singles for tennis and who won the year before from 2004 onwards.
There are several steps to this problem. First, we would want to create a CTE, which allows you to define a temporary named result set that available temporarily in the execution scope of a statement — if you’re stuck here, visit my other post to learn more.
Since we would want our results to have the winner from the year before we can use LAG().

Output
We can see that the results for both males and females are outputted in a single column — this is how partition helped. We recognize there are 3 winners for males and 3 for females. The first winner for both genders was in 2004, and if we look at the right, we see a NULL, because there is no winner before this since we started in 2004. Using LAG and PARTITION BY helps achieve this.
Let’s try using LEAD().
What is LEAD()?
SQL LEAD() is a window function that outputs a row that comes after the current row — essentially the opposite to LAG().
Let’s use the same question from the tennis example, but instead, find the future champion, not the past champion.

Output
We only changed LAG to LEAD and altered the alias to future champion, and we can achieve the opposite result.
RANK()
SQL RANK is similar to ROW_NUMBER except it will assign the same number to rows with identical values, skipping over the following number. There is also DENSE_RANK which assigns a number to a row with equal values but doesn’t skip over a number. If this all seems confusing, don’t worry. See below for a side by side comparison of what that would look like.

Notice the 3 different types of columns.
•
Row_number - nothing new here, we are merely adding value for each row in the query.
•
Rank_number - Here, we give a ranking based on the values but notice we do not have the rank 3. Instead, we have 2s, and we skip to rank 4.
•
Dense_rank - Similar to rank_number but instead of skipping the rank 3, we include it.
That is the main difference between RANK and DENSE_RANK. One includes a rank preceding a jointly ranked number, and one doesn’t.
Here is the code I used to get the table above.
Final Outcomes
And that concludes this introduction to window functions. There are many more functionalities to windows functions including a ROWS, NTILE, as well as aggregate functions (SUM, MAX, MIN, etc.). I will be posting tutorials on how to utilize window functions more in SQL, so be sure to stay tuned for my latest posts.
sql