Technical writings of Shkrt
There are plenty of posts on web, describing PostgreSQL window functions, and I decided to make another one, to share my view and my approach to understanding them.
Queries using window functions usually look like this one:
SELECT depname, empno, salary, avg(salary) OVER (PARTITION BY depname) FROM empsalary;
depname | empno | salary | avg
-----------+-------+--------+-----------------------
develop | 11 | 5200 | 5020.0000000000000000
develop | 7 | 4200 | 5020.0000000000000000
develop | 9 | 4500 | 5020.0000000000000000
develop | 8 | 6000 | 5020.0000000000000000
develop | 10 | 5200 | 5020.0000000000000000
personnel | 5 | 3500 | 3700.0000000000000000
personnel | 2 | 3900 | 3700.0000000000000000
sales | 3 | 4800 | 4866.6666666666666667
sales | 1 | 5000 | 4866.6666666666666667
sales | 4 | 4800 | 4866.6666666666666667
(10 rows)
copied from official documentation
To follow along, you can use the following setup:
createdb window_test --host=localhost --user=postgres
create table empsalary(
depname character varying,
empno integer,
salary integer);
insert into empsalary(depname, empno, salary) values
('develop', 11, 5200),
('develop', 7, 4200),
('develop', 9, 4500),
('develop', 8, 6000),
('develop', 10, 5200),
('personnel', 5, 3500),
('personnel', 2, 3900),
('sales', 3, 4800),
('sales', 1, 5000),
('sales', 4, 4800);
Window function part is starting from avg(salary)
. The window functions introduce the concepts of window frames
In the above example, avg(salary)
is the function, that is being applied to every row in the output.
The frame is representing the relation that is available to avg(salary)
on the current row. In the example, the frame is
defined by PARTITION BY depname
part. The whole relation available to window function initially, after filtering clauses,
is being partitioned by depname
column’s value. It results in
(5200 + 4200 + 4500 + 6000 + 5200)/5 = 5020.0
And we get the value of 5020.0
in every row related to develop
depname. The same applies to personnel
and sales
depnames.
The frame is defined either by partition by
or order by
clauses, otherwise, the defaults are applied.
when partition by
is used - the frame consists of the whole partition
when order by
is used - the frame includes all records up to current row. Further records are omitted
SELECT depname, empno, salary, avg(salary) OVER (ORDER BY empno) FROM empsalary;
depname │ empno │ salary │ avg
═══════════╪═══════╪════════╪═══════════════════════
sales │ 1 │ 5000 │ 5000.0000000000000000
personnel │ 2 │ 3900 │ 4450.0000000000000000
sales │ 3 │ 4800 │ 4566.6666666666666667
sales │ 4 │ 4800 │ 4625.0000000000000000
personnel │ 5 │ 3500 │ 4400.0000000000000000
develop │ 7 │ 4200 │ 4366.6666666666666667
develop │ 8 │ 6000 │ 4600.0000000000000000
develop │ 9 │ 4500 │ 4587.5000000000000000
develop │ 10 │ 5200 │ 4655.5555555555555556
develop │ 11 │ 5200 │ 4710.0000000000000000
(10 rows)
Here we see the illustration of frame scoping to all the rows up to the current row. We have ordered the relation by empno
column value. On the first row, avg(salary)
takes into account only first row, on the second row - first row + second row, and so on.
SELECT depname, empno, salary, avg(salary) OVER () FROM empsalary;
depname │ empno │ salary │ avg
═══════════╪═══════╪════════╪═══════════════════════
develop │ 11 │ 5200 │ 4710.0000000000000000
develop │ 7 │ 4200 │ 4710.0000000000000000
develop │ 9 │ 4500 │ 4710.0000000000000000
develop │ 8 │ 6000 │ 4710.0000000000000000
develop │ 10 │ 5200 │ 4710.0000000000000000
personnel │ 5 │ 3500 │ 4710.0000000000000000
personnel │ 2 │ 3900 │ 4710.0000000000000000
sales │ 3 │ 4800 │ 4710.0000000000000000
sales │ 1 │ 5000 │ 4710.0000000000000000
sales │ 4 │ 4800 │ 4710.0000000000000000
avg
is now applied to the whole table.
The avg
function used in the previous example is an aggregation function.
Window functions can be used with aggregate functions, or with so-called built-in window functions.
List of built-in window functions can be viewed here
postgres@window_test=# SELECT depname, empno, salary, rank() OVER (partition by depname order by salary desc) FROM empsalary;
depname │ empno │ salary │ rank
═══════════╪═══════╪════════╪══════
develop │ 8 │ 6000 │ 1
develop │ 10 │ 5200 │ 2
develop │ 11 │ 5200 │ 2
develop │ 9 │ 4500 │ 4
develop │ 7 │ 4200 │ 5
personnel │ 2 │ 3900 │ 1
personnel │ 5 │ 3500 │ 2
sales │ 1 │ 5000 │ 1
sales │ 3 │ 4800 │ 2
sales │ 4 │ 4800 │ 2
(10 rows)
Another thing to note is that window functions are evaluated after filtering and grouping clauses, therefore we cannot use grouping and filtering over windowing results.
Suggested reading:
[postgresql
]