An introduction to MySQL window functions
1. Overview
Since version 8.0, MySQL supports a list of window functions that for each row from a query, perform a calculation using rows related to that row.
Here is the list of window functions:
Name | Description |
---|---|
CUME_DIST | Cumulative distribution value |
DENSE_RANK | Rank of current row within its partition, without gaps |
FIRST_VALUE | Value of argument from first row of window frame |
LAG | Value of argument from row lagging current row within partition |
LAST_VALUE | Value of argument from last row of window frame |
LEAD | Value of argument from row leading current row within partition |
NTH_VALUE | Value of argument from N-th row of window frame |
NTILE | Bucket number of current row within its partition. |
PERCENT_RANK | Percentage rank value |
RANK | Rank of current row within its partition, with gaps |
ROW_NUMBER | Number of current row within its partition |
Other than that, most aggregate functions can also be used as window functions.
2. How to use MySQL Window Function
Similar to the aggregate functions, window functions can divide the table into multiple subsets of data and process them separately while it doesn’t reduce the number of rows returned, thus boost the query performance.
Here is the generic syntax of a window function in MySQL:
window_function_name (expression)
OVER (
[partition_defintion]
[order_definition]
[frame_definition]
)
partition_defintion
Here is the syntax for the partition clause:
PARTITION BY <expression>[{,<expression>...}]
The PARTITION BY clause divines the table into several partitions, then the window function will perform within each partition and restart in the other partitions. It’s similar to a grouping.
order_definition
Here is the syntax for the order clause:
ORDER BY <expression> [ASC|DESC], [{,<expression>...}]
The ORDER BY clause defines how the rows within a partition to be sorted.
frame_definition
Here is the generic syntax for the frame description:
frame_unit {<frame_start>|<frame_between>}
A frame is a subset of the current partition and the frame clause specifies how to define the subset. It’s determined with respect to the current row, which enables a frame to move within a partition depending on the location of the current row within its partition.
For example, a frame can be defined to be all rows from the current rows to the end of the current partition, or it can be defined to be n rows on either side of the current row.
3. Example
We have this table hr_dept:
emp_no | salary | from_date | to_date |
---|---|---|---|
10001 | 60117 | 1986-06-26 | 1987-06-26 |
10001 | 62102 | 1987-06-26 | 1988-06-25 |
10001 | 66074 | 1988-06-25 | 1989-06-25 |
10002 | 69366 | 1999-08-03 | 2000-08-02 |
10002 | 71963 | 2000-08-02 | 2001-08-02 |
10003 | 43636 | 1998-12-02 | 1999-12-02 |
10003 | 43478 | 1999-12-02 | 2000-12-01 |
10004 | 42542 | 1988-11-30 | 1989-11-30 |
10004 | 46065 | 1989-11-30 | 1990-11-30 |
10004 | 48271 | 1990-11-30 | 1991-11-30 |
10004 | 50594 | 1991-11-30 | 1992-11-29 |
10004 | 52119 | 1992-11-29 | 1993-11-29 |
10004 | 54693 | 1993-11-29 | 1994-11-29 |
Let’s rank the records based on the value of salary for each employee:
SELECT
emp_no,
salary,
from_date,
to_date,
DENSE_RANK() OVER(PARTITION BY emp_no ORDER BY salary) AS rank_salary
FROM hr_dept;
The results will be:
emp_no | salary | from_date | to_date | rank_salary |
---|---|---|---|---|
10001 | 60117 | 1986-06-26 | 1987-06-26 | 1 |
10001 | 62102 | 1987-06-26 | 1988-06-25 | 2 |
10001 | 66074 | 1988-06-25 | 1989-06-25 | 3 |
10002 | 69366 | 1999-08-03 | 2000-08-02 | 1 |
10002 | 71963 | 2000-08-02 | 2001-08-02 | 2 |
10003 | 43478 | 1999-12-02 | 2000-12-01 | 1 |
10003 | 43636 | 1998-12-02 | 1999-12-02 | 2 |
10004 | 42542 | 1988-11-30 | 1989-11-30 | 1 |
10004 | 46065 | 1989-11-30 | 1990-11-30 | 2 |
10004 | 48271 | 1990-11-30 | 1991-11-30 | 3 |
10004 | 50594 | 1991-11-30 | 1992-11-29 | 4 |
10004 | 52119 | 1992-11-29 | 1993-11-29 | 5 |
10004 | 54693 | 1993-11-29 | 1994-11-29 | 6 |
To calculate the average salary of each employee up to the current rows in the table, we define the frame:
SELECT
emp_no,
salary,
from_date,
to_date,
AVG(salary) OVER(PARTITION BY emp_no ORDER BY salary ROWS UNBOUNDED PRECEDING) AS cur_avg_salary
FROM hr_dept;
The results will be:
emp_no | salary | from_date | to_date | cur_avg_salary |
---|---|---|---|---|
10001 | 60117 | 1986-06-26 | 1987-06-26 | 60117.0000 |
10001 | 62102 | 1987-06-26 | 1988-06-25 | 61109.5000 |
10001 | 66074 | 1988-06-25 | 1989-06-25 | 62764.3333 |
10002 | 69366 | 1999-08-03 | 2000-08-02 | 69366.0000 |
10002 | 71963 | 2000-08-02 | 2001-08-02 | 70664.5000 |
10003 | 43478 | 1999-12-02 | 2000-12-01 | 43478.0000 |
10003 | 43636 | 1998-12-02 | 1999-12-02 | 43557.0000 |
10004 | 42542 | 1988-11-30 | 1989-11-30 | 42542.0000 |
10004 | 46065 | 1989-11-30 | 1990-11-30 | 44303.5000 |
10004 | 48271 | 1990-11-30 | 1991-11-30 | 45626.0000 |
10004 | 50594 | 1991-11-30 | 1992-11-29 | 46868.0000 |
10004 | 52119 | 1992-11-29 | 1993-11-29 | 47918.2000 |
10004 | 54693 | 1993-11-29 | 1994-11-29 | 49047.3333 |
Need a good GUI tool for databases? TablePlus provides a native client that allows you to access and manage Oracle, MySQL, SQL Server, PostgreSQL, and many other databases simultaneously using an intuitive and powerful graphical interface.
Not on Mac? Download TablePlus for Windows.
On Linux? Download TablePlus for Linux
Need a quick edit on the go? Download TablePlus for iOS