If you regularly use the Structured Query Language (SQL) to extract data from databases and produce reports, it is essential that you learn to use window functions. In this article, I’ll describe 15 types of window functions as clearly and understandably as possible.
What is a Window Function?
Window functions allow you to create partitions in tables while enabling rows to retain their separate identities.
This is an official definition from PostgreSQL’s documentation:
A window function performs a calculation across a set of table rows that are somehow related to the current row. This is comparable to the type of calculation that can be done with an aggregate function. But unlike regular aggregate functions, use of a window function does not cause rows to become grouped into a single output row — the rows retain their separate identities.
What is a “Window?”
The term “window” refers to an individual group or partition within a result set from a SQL query.
What is the PARTITION BY Clause?
Most window functions have a PARTITION BY clause. The PARTITION BY clause divides a result set into partitions and performs a computation on each partitioned data subset.
What is the OVER Clause?
A distinguishing feature of window functions is the presence of an OVER clause. The OVER clause indicates that a function will be applied to all rows returned by a query.
Use Cases for Window Functions
Some use cases include:
- Ranking results within specific windows (e.g., ranking sales in each partition)
- Accessing results from another row within specific windows (e.g., displaying a value associated with highest sales within a partition, displaying sales from a previous row, etc.)
- Aggregation within a specific window (e.g., sales totals, running sales totals, running sales averages, etc.)
- Percentiles within a specific window (e.g., sales percentiles within a group)
- Display individual date differences over time (e.g., displaying the number of days between each ship date in a purchase order table)
The Three Categories of Window Functions
There are three categories of window functions:
- Ranking – displays ranking information associated with rows and numerical columns
- Value – displays assigned values that are associated with each partition
- Aggregation – displays aggregate values from numerical columns
Below, you’ll find fifteen types of window functions divided by these variations. Tableau’s sample “Super Store Data” was used as the data source. The code and table examples were created in SQL Server.
Ranking Window Functions
ROW_NUMBER
This window function assigns a sequential integer to rows within each partition with the first row starting at 1.
SELECT
Country
,State
,Category
,[Sub-Category]
,[Sales]
,ROW_NUMBER() OVER (PARTITION BY [Sub-Category] ORDER BY Sales DESC) AS Row_Number
FROM [SuperStoreSampleData].[dbo].SuperStoreData
The table above is partitioned by Sub-Category. When it comes to the Binders Sub-Category, the row numbers are shown in the last column. Row Numbers are assigned based on descending sales values.
RANK
This window function assigns a rank value to each row within each ordered partition of a result set.
SELECT
Country
,State
,Category
,[Sub-Category]
,[Sales]
,RANK() OVER (PARTITION BY [Sub-Category] ORDER BY [Sales] DESC) AS Sales_Rank
FROM [SuperStoreSampleData].[dbo].SuperStoreData
The table above is partitioned by Sub-Category. When it comes to the Binders Sub-Category, the sales ranks are shown in the last column. Sales ranks are assigned based on descending sales values.
DENSE_RANK
This window function assigns a rank value to each row within each partition of a result, but it does not skip ranks if there is a tie between ranks.
SELECT
Country
,State
,Category
,[Sub-Category]
,avg([Days to Ship Actual]) as DaysToShip_Avg
,DENSE_RANK() OVER (PARTITION BY [Sub-Category] ORDER BY avg([Days to Ship Actual]) DESC) AS DaysToShip_Avg_Dense_Rank
FROM [SuperStoreSampleData].[dbo].SuperStoreData
GROUP BY [Sub-Category], Category, State, Country
The table above is partitioned by Sub-Category. When it comes to the Binders Sub-Category, the ranks of average days to ship are shown in the last column. Ranks of average days to ship are assigned based on descending sales values. As you can see, there’s a tie between the top two ranks for the Binder sub-category, so the ranks do not skip.
PERCENT_RANK
This window function calculates the percent rank of a value in a set of values based on the number of values in a partition.
SELECT
Country
,State
,Category
,[Sub-Category]
,Sales
,FORMAT(PERCENT_RANK() OVER (PARTITION BY [Sub-Category] ORDER BY Sales DESC), ‘p’) as Sales_Percent_Rank
FROM [SuperStoreSampleData].[dbo].SuperStoreData
The table above is partitioned by Sub-Category. When it comes to the Binders Sub-Category, the sales percent ranks are shown in the last column. Sales percent ranks are assigned based on descending sales values.
CUME_DIST
This window function calculates the cumulative distribution of a value in a set of values. In other words, it calculates the relative position of a value in a partition of values.
SELECT
Country
,State
,Category
,[Sub-Category]
,[Sales]
,CUME_DIST() OVER (PARTITION BY [Sub-Category] ORDER BY [Sales] DESC) AS Sales_Cumulative_Distribution
FROM [SuperStoreSampleData].[dbo].SuperStoreData
The table above is partitioned by Sub-Category. When it comes to the Binders Sub-Category, the sales cumulative distributions are shown in the last column. Sales cumulative distributions are assigned based on descending sales values.
NTILE
This window function divides ordered rows in a partition into a specified number of ranked partitions of as equal size as possible and returns the partition that each row falls into.
SELECT
Country
,State
,Category
,[Sub-Category]
,Sales
,NTILE(4) OVER (ORDER BY [Sales] DESC) AS Sales_NTILE
FROM [SuperStoreSampleData].[dbo].SuperStoreData
The table above is ordered by descending sales. In the last column, four rank numbers are evenly assigned based on sales.
Value Window Functions
FIRST_VALUE
This window function displays a value with respect to the first row in a window frame.
Here is an example of the FIRST_VALUE window function:
SELECT
Country
,State
,Category
,[Sub-Category]
,Sales
,FIRST_VALUE(State) OVER (PARTITION BY [Sub-Category] ORDER BY [Sales] DESC) AS State_First_Value
FROM [SuperStoreSampleData].[dbo].SuperStoreData
The table above is partitioned by Sub-Category. When it comes to the Binders Sub-Category, the first value of State is shown in the last column.
LAST_VALUE
This window function displays a value with respect to the last row in a window frame.
SELECT
Country
,State
,Category
,[Sub-Category]
,Sales
,LAST_VALUE(State) OVER (PARTITION BY [Sub-Category] ORDER BY [Sales] DESC
RANGE BETWEEN
UNBOUNDED PRECEDING AND
UNBOUNDED FOLLOWING
) AS State_Last_Value
FROM [SuperStoreSampleData].[dbo].SuperStoreData
The table above is partitioned by Sub-Category. When it comes to the Binders Sub-Category, the last value of State is shown in the last column.
LAG
This window function displays a value for a row before a current row in a partition. If no row exists, NULL is displayed.
SELECT
Country
,State
,Category
,[Sub-Category]
,[Order Date]
,LAG([Order Date], 1) OVER (PARTITION BY [Sub-Category] ORDER BY [Order Date]) AS Order_Date_Lag
FROM [SuperStoreSampleData].[dbo].SuperStoreData
The table above is partitioned by Sub-Category. When it comes to the Binders Sub-Category, the value of Order Date in the previous row is shown in the last column.
LEAD
This window function displays a value for a row after a current row in a partition. If no row exists, NULL is displayed.
SELECT
Country
,State
,Category
,[Sub-Category]
,[Order Date]
,LEAD([Order Date], 1) OVER (PARTITION BY [Sub-Category] ORDER BY [Order Date]) AS Order_Date_Lead
FROM [SuperStoreSampleData].[dbo].SuperStoreData
The table above is partitioned by Sub-Category. When it comes to the Binders Sub-Category, the value of Order Date in the next row is shown in the last column.
Aggregation Window Functions
SUM
This window function returns the sum of numerical values in each partition.
SELECT
Country
,State
,Category
,[Sub-Category]
,Sales
,SUM(Sales) OVER (PARTITION BY [Sub-Category]) AS Sales_Sum
FROM [SuperStoreSampleData].[dbo].SuperStoreData
The table above is partitioned by Sub-Category. When it comes to the Binders Sub-Category, the sum of sales is shown in the last column.
AVG
This window function returns the average of numerical values in each partition.
SELECT
Country
,State
,Category
,[Sub-Category]
,Sales
,AVG(Sales) OVER (PARTITION BY [Sub-Category]) AS Sales_Avg
FROM [SuperStoreSampleData].[dbo].SuperStoreData
The table above is partitioned by Sub-Category. When it comes to the Binders Sub-Category, the average of sales is shown in the last column.
COUNT
This window function displays the count of rows in each partition.
SELECT
Country
,State
,Category
,[Sub-Category]
,Sales
,COUNT(*) OVER (PARTITION BY [Sub-Category]) AS Sales_Count
FROM [SuperStoreSampleData].[dbo].SuperStoreData
The table above is partitioned by Sub-Category. When it comes to the Binders Sub-Category, the count of rows is shown in the last column.
MAX
This window function returns the maximum of numerical values in each partition.
SELECT
Country
,State
,Category
,[Sub-Category]
,Sales
,MAX(Sales) OVER (PARTITION BY [Sub-Category]) AS Sales_Max
FROM [SuperStoreSampleData].[dbo].SuperStoreData
The table above is partitioned by Sub-Category. When it comes to the Binders Sub-Category, the maximum sales value is shown in the last column.
MIN
This window function returns the minimum of numerical values in each partition.
SELECT
Country
,State
,Category
,[Sub-Category]
,Sales
,MIN(Sales) OVER (PARTITION BY [Sub-Category]) AS Sales_Min
FROM [SuperStoreSampleData].[dbo].SuperStoreData
The table above is partitioned by Sub-Category. When it comes to the Binders Sub-Category, the minimum sales value is shown in the last column.
Resources:
Udacity SQL Nanodegree:
🎓 https://imp.i115008.net/3nLAk
“T-SQL Window Functions: For data analysis and beyond” – Microsoft Press:
“T-SQL Querying (Developer Reference)” – Microsoft Press:
“SQL Practice Problems: 57 beginning, intermediate, and advanced challenges for you to solve using a “learn-by-doing” approach” – Sylvia Moestl Vasilik:
“The Complete SQL Bootcamp 2020: Go from Zero to Hero” – Jose Portilla (Udemy):
References:
- http://drill.apache.org/docs/sql-window-functions-introduction/
- https://www.postgresql.org/docs/9.1/tutorial-window.html
- https://www.sqlservertutorial.net/sql-server-window-functions/
- https://www.sqlservertutorial.net/sql-server-window-functions/sql-server-cume_dist-function/
- https://mode.com/sql-tutorial/sql-window-functions/
- https://www.sqlshack.com/sql-partition-by-clause-overview/
- https://docs.microsoft.com/en-us/sql/t-sql/queries/select-over-clause-transact-sql?view=sql-server-ver15
- https://chartio.com/resources/tutorials/using-window-functions/
- https://docs.aws.amazon.com/redshift/latest/dg/r_WF_NTILE.html
Very helpful, thanks for putting this together.