Window Functions

The Data Virtuality Server supports ANSI SQL 2003 window functions. A window function allows an aggregate function to be applied to a subset of the result set without needing a GROUP BY clause. A window function is similar to an aggregate function, but requires the use of an OVER clause or window specification.

Usage

CODE

aggregate|ranking OVER ([PARTITION BY ...] [ORDER BY ...] [<FRAME-CLAUSE>])

aggregate can be any aggregate function. Ranking can be one of ROW_NUMBER(), RANK(), DENSE_RANK().

Syntax Rules

Window functions can only appear in the SELECT and ORDER BY clauses of a query expression;
Window functions cannot be nested in one another;
Partitioning and order by expressions cannot contain subqueries or outer references;
The ranking (ROW_NUMBER, RANK, DENSE_RANK) functions require the use of the window specification ORDER BY clause;
An XMLAGG ORDER BY clause cannot be used when windowed;
The window specification ORDER BY clause cannot reference alias names or use positional ordering;
Windowed aggregates may not use DISTINCT if the window specification is ordered;
The Frame Clause can only be used in combination with the Partition Clause (at least one of PARTITION BY or ORDER BY should be present).

Frame Clause

The frame clause allows the user to specify a dynamic group of rows, or a range, inside the partition (analogous to a sliding frame in a window). It can be used in two ways:

```
[ RANGE | ROWS ] frame_start
```

[ RANGE | ROWS ] BETWEEN frame_start AND frame_end

Where frame_start and frame_end can be one of:

```
UNBOUNDED PRECEDING
```
```
value PRECEDING (ROWS only)
```
```
CURRENT ROW
```
```
value FOLLOWING (ROWS only)
```
UNBOUNDED FOLLOWING

Restrictions are that frame_start cannot be UNBOUNDED FOLLOWING, frame_end cannot be UNBOUNDED PRECEDING, and the frame_end choice cannot appear earlier in the above list then the frame_start choice. For example, RANGE BETWEEN CURRENT ROW AND value PRECEDING is not allowed.

Examples of Usage

Let's assume we have the following table called my_table:

x	y
1	1
2	1
3	1
4	2
5	3

Examples of Frame Clause usage

SQL

SELECT x, SUM(y) OVER (PARTITION BY y ROWS BETWEEN 2 PRECEDING AND CURRENT ROW) AS window_column 
FROM data_source.my_table;
 
-- Result:
-- |-----------|---------------|
-- |     x     | window_column |
-- |-----------|---------------|
-- |     1     |       1       |
-- |     2     |       2       |
-- |     3     |       3       |
-- |     4     |       2       |
-- |     5     |       3       |
-- |-----------|---------------|
 
 
SELECT x, COUNT(y) OVER (PARTITION BY y RANGE BETWEEN CURRENT ROW AND UNBOUNDED FOLLOWING) AS window_column 
FROM data_source.my_table;
 
-- Result:
-- |-----------|---------------|
-- |     x     | window_column |
-- |-----------|---------------|
-- |     1     |       3       |
-- |     2     |       3       |
-- |     3     |       3       |
-- |     4     |       1       |
-- |     5     |       1       |
-- |-----------|---------------|
 
 
SELECT x, y*100/SUM(y) OVER (PARTITION BY y RANGE BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING) AS window_column 
FROM data_source.my_table;
 
-- Result:
-- |-----------|---------------|
-- |     x     | window_column |
-- |-----------|---------------|
-- |     1     |       33      |
-- |     2     |       33      |
-- |     3     |       33      |
-- |     4     |       100     |
-- |     5     |       100     |
-- |-----------|---------------|

Function Definitions

Function	Description
`ROW_NUMBER()`	Functionally same as `COUNT(*)` with the same window specification. Assigns a number to each row in a partition starting at 1
`RANK()`	Assigns a number to each unique ordering value within each partition starting at 1, such that the next rank is equal to the count of prior rows
`DENSE_RANK()`	Assigns a number to each unique ordering value within each partition starting at 1, such that the next rank is sequential
`LEAD(scalar_expression [, offset [, default]])`	Returns `scalar_expression` evaluated at the row that is offset rows before the current row within the partition; if there is no such row, instead returns default (which must be of the same type as `scalar_expression`). Both offset and default are evaluated for the current row. If omitted, offset defaults to 1 and default to null
`LAG(scalar_expression [, offset [, default]])`	Returns `scalar_expression` evaluated at the row that is offset rows after the current row within the partition; if there is no such row, instead returns default (which must be of the same type as `scalar_expression`). Both offset and default are evaluated for the current row. If omitted, offset defaults to 1 and default to null
`FIRST_VALUE(scalar_expression)`	Returns `scalar_expression` evaluated at the row that is the first row of the window frame
`LAST_VALUE(scalar_expression)`	Returns `scalar_expression` evaluated at the row that is the last row of the window frame

Processing

Window functions are logically processed just before creating the output from the SELECT clause. Window functions can use nested aggregates if a GROUP BY clause is present. The is no guaranteed effect on the output ordering from the presence of window functions. The SELECT statement must have an ORDER BY clause to have a predictable ordering.

The Data Virtuality Server will process all window functions with the same window specification together. Generally, a full pass over the row values coming into the SELECT clause will be required for each unique window specification. For each window specification, the values will be grouped according to the PARTITION BY clause. If no PARTITION BY clause is specified, the entire input is treated as a single partition. The output value is determined based upon the current row value, its peers (that is, rows that are the same for their ordering), and all prior row values based upon ordering in the partition. The ROW_NUMBER function will assign a unique value to every row regardless of the number of peers.

Example Windowed Results

SQL

SELECT name, salary, max(salary) over (partition by name) as max_sal,
          rank() over (order by salary) as rank, dense_rank() over (order by salary) as dense_rank,
          row_number() over (order by salary) as row_num 
FROM data_source.employees

name	salary	max_sal	rank	dense_rank	row_num
John	100000	100000	2	2	2
Henry	50000	50000	5	4	5
John	60000	100000	3	3	3
Suzie	60000	150000	3	3	4
Suzie	150000	150000	1	1	1