Window Aggregation Analytics (Spark)
The Window Aggregation analytics plugin for Spark is available in the Hub.
Plugin version: 1.1.0
Specify a window over which functions should be applied. Supported functions: Rank, Dense Rank, Percent Rank, N tile, Row Number, Median, Continuous Percentile, Lead, Lag, First, Last, Cumulative distribution, Accumulate.
Use this plugin to compute aggregations over partitions of your data, similar to what a window function does in SQL.
BigQuery ELT Transformation Pushdown (6.9.0+)
Window aggregation stages are eligible to execute in BigQuery when BigQuery ELT Transformation Pushdown is enabled in a pipeline. A window aggregation stage is executed in BigQuery when a preceding stage has already been executed in BigQuery (such as a Join operation or another aggregation stage) or when the source is BigQuery. All of the functions listed above are supported in BigQuery.
Configuration
Property | Macro Enabled? | Description |
---|---|---|
Partition fields | Yes | Required. Specifies a list of fields, comma-separated, to partition the data by. At least 1 field must be provided. Records with the same value for all these fields will be grouped together. |
Order | Yes | Optional. Specifies key-value pairs containing the ordering field, and the order (ascending or descending). All data types are allowed, except when |
Frame Type | Yes | Optional. Selects the type of window frame to create within each partition. Options can be Default is |
Unbounded preceding | Yes | Optional. Whether to use an unbounded start boundary for a frame. Default is |
Unbounded following | Yes | Optional. Whether to use an unbounded end boundary for a frame. Default is |
Preceding | Yes | Optional. Specifies the number of preceding rows in the window frame. When Frame Type is |
Following | Yes | Optional. Specifies the number of following rows to include in the window frame. When Frame Type is |
Aggregates | Yes | Required. Specifies a list of functions to run on the selected window. Supported aggregate functions are |
Number of partitions | Yes | Optional. Number of partitions to use when grouping fields. If not specified, the execution framework will decide on the number to use. |
Clause Constraints
Function | Partition fields | Order | Frame Type |
---|---|---|---|
rank | Required | Required | Not supported |
dense_rank | Required | Required | Not supported |
percent_rank | Required | Required | Not supported |
n_tile | Required | Required | Not supported |
row_number | Required | Required | Not supported |
continous_percentile | Required | Not supported | Not supported |
lead | Required | Required | Not supported |
lag | Required | Required | Not supported |
first | Required | Required | Optional |
last | Required | Required | Optional |
cumulative_distribution | Required | Required | Not supported |
accumulate | Required | Optional | Optional |
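The constraints in the table above can be checked before a pipeline is deployed. The following is a minimal sketch, not part of the plugin: the names `CLAUSE_CONSTRAINTS` and `check_clauses` are hypothetical, and the rules are transcribed from the table as written (including the `continous_percentile` spelling). Partition fields are required for every function, so only the Order and Frame Type clauses are modelled.

```python
# Clause rules transcribed from the "Clause Constraints" table.
# Each value is one of "required", "optional", or "unsupported".
# Hypothetical helper for illustration; the plugin does its own validation.
CLAUSE_CONSTRAINTS = {
    "rank": {"order": "required", "frame": "unsupported"},
    "dense_rank": {"order": "required", "frame": "unsupported"},
    "percent_rank": {"order": "required", "frame": "unsupported"},
    "n_tile": {"order": "required", "frame": "unsupported"},
    "row_number": {"order": "required", "frame": "unsupported"},
    "continous_percentile": {"order": "unsupported", "frame": "unsupported"},
    "lead": {"order": "required", "frame": "unsupported"},
    "lag": {"order": "required", "frame": "unsupported"},
    "first": {"order": "required", "frame": "optional"},
    "last": {"order": "required", "frame": "optional"},
    "cumulative_distribution": {"order": "required", "frame": "unsupported"},
    "accumulate": {"order": "optional", "frame": "optional"},
}


def check_clauses(function, has_order, has_frame):
    """Return a list of problems; an empty list means the combination is allowed."""
    rules = CLAUSE_CONSTRAINTS[function]
    problems = []
    for clause, present in (("order", has_order), ("frame", has_frame)):
        rule = rules[clause]
        if rule == "required" and not present:
            problems.append(f"{function}: {clause} is required")
        elif rule == "unsupported" and present:
            problems.append(f"{function}: {clause} is not supported")
    return problems
```

For example, `check_clauses("first", has_order=True, has_frame=True)` returns an empty list, while the same call for `rank` reports that a frame is not supported.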
Functions with Arguments
A few functions require a field, an argument, or both, using the syntax alias:function(field,encoded(arguments),ignoreNulls). If a function doesn't require the field or the argument, that value is ignored.
Function | field | argument |
---|---|---|
rank | | |
dense_rank | | |
percent_rank | | |
n_tile | | Required: an integer greater than 0 |
row_number | | |
continous_percentile | Required | Required: a number between 0 and 1 (both inclusive) |
lead | Required | Required: a non-negative integer |
lag | Required | Required: a non-negative integer |
first | Required | |
last | Required | |
cumulative_distribution | | |
accumulate | Required | |
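To make the syntax and the table above concrete, here is a rough sketch of how an aggregate entry could be parsed and checked. It is illustrative only: `parse_aggregate` and `validate_aggregate` are hypothetical names, the argument is treated as a plain string (how the plugin encodes arguments is not described here), and the field and argument rules are copied from the table.

```python
import re

# alias:function(field,argument,ignoreNulls); whitespace around parts is tolerated.
SPEC_PATTERN = re.compile(
    r"^\s*(?P<alias>\w+)\s*:\s*(?P<function>\w+)\s*\((?P<body>.*)\)\s*$"
)

# Functions whose "field" column is marked Required in the table.
FIELD_REQUIRED = {"continous_percentile", "lead", "lag", "first", "last", "accumulate"}


def parse_aggregate(spec):
    """Split one aggregate entry into its parts (hypothetical helper)."""
    match = SPEC_PATTERN.match(spec)
    if match is None:
        raise ValueError(f"not in alias:function(field,argument,ignoreNulls) form: {spec!r}")
    parts = [part.strip() for part in match.group("body").split(",")]
    if len(parts) != 3:
        raise ValueError(f"expected 3 comma-separated values, got {len(parts)}: {spec!r}")
    field, argument, ignore_nulls = parts
    return {
        "alias": match.group("alias"),
        "function": match.group("function"),
        "field": field or None,        # empty slot means "not provided"
        "argument": argument or None,
        "ignore_nulls": ignore_nulls.lower() == "true",
    }


def validate_aggregate(parsed):
    """Return a list of problems based on the field/argument table."""
    function, argument = parsed["function"], parsed["argument"]
    problems = []
    if function in FIELD_REQUIRED and parsed["field"] is None:
        problems.append(f"{function}: field is required")
    is_integer = argument is not None and re.fullmatch(r"[0-9]+", argument) is not None
    if function == "n_tile" and not (is_integer and int(argument) > 0):
        problems.append("n_tile: argument must be an integer greater than 0")
    elif function in ("lead", "lag") and not is_integer:
        problems.append(f"{function}: argument must be a non-negative integer")
    elif function == "continous_percentile":
        try:
            in_range = argument is not None and 0 <= float(argument) <= 1
        except ValueError:
            in_range = False
        if not in_range:
            problems.append("continous_percentile: argument must be between 0 and 1")
    return problems
```

Calling `validate_aggregate(parse_aggregate("next_value:lead(age,1,false)"))` returns an empty list, while `lead(,1,false)` would be reported as missing its field.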
Sample Pipeline
Input Records
name | age | location |
---|---|---|
peter | 20 | US |
foo | 22 | US |
rajeev | 24 | US |
john | 28 | US |
alex | 30 | US |
ravi | 20 | INDIA |
kenny | 30 | INDIA |
Window Aggregations Configuration
Partition fields : location
Order : age:ascending
Frame Type : None
Aggregates :
my_rank:rank(,,true)
next_value:lead(age,1,false)
Output Records
name | age | location | my_rank | next_value |
---|---|---|---|---|
peter | 20 | US | 1 | 22 |
foo | 22 | US | 2 | 24 |
rajeev | 24 | US | 3 | 28 |
john | 28 | US | 4 | 30 |
alex | 30 | US | 5 | |
ravi | 20 | INDIA | 1 | 30 |
kenny | 30 | INDIA | 2 | |
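The sample output can be reproduced by hand with a few lines of plain Python. This is only a sketch of the semantics and not how the plugin runs on Spark: the function name `window_rank_and_lead` is hypothetical, rank is assumed to follow the usual SQL behaviour (ties share a rank and leave a gap), and a missing lead value is represented as `None`.

```python
# Input records from the sample pipeline.
records = [
    {"name": "peter", "age": 20, "location": "US"},
    {"name": "foo", "age": 22, "location": "US"},
    {"name": "rajeev", "age": 24, "location": "US"},
    {"name": "john", "age": 28, "location": "US"},
    {"name": "alex", "age": 30, "location": "US"},
    {"name": "ravi", "age": 20, "location": "INDIA"},
    {"name": "kenny", "age": 30, "location": "INDIA"},
]


def window_rank_and_lead(rows, partition_field, order_field, offset=1):
    """Add my_rank (rank) and next_value (lead by `offset`) per partition."""
    # Group rows by the partition field, keeping first-seen partition order.
    partitions = {}
    for row in rows:
        partitions.setdefault(row[partition_field], []).append(row)

    output = []
    for partition in partitions.values():
        # Order: ascending on the order field.
        ordered = sorted(partition, key=lambda r: r[order_field])
        rank = 0
        previous = object()  # sentinel that never equals a real value
        for position, row in enumerate(ordered, start=1):
            if row[order_field] != previous:
                rank = position  # ties keep the rank of the first tied row
                previous = row[order_field]
            lead_index = position - 1 + offset
            next_value = (
                ordered[lead_index][order_field] if lead_index < len(ordered) else None
            )
            output.append({**row, "my_rank": rank, "next_value": next_value})
    return output


result = window_rank_and_lead(records, "location", "age")
for row in result:
    print(row["name"], row["age"], row["location"], row["my_rank"], row["next_value"])
```

The last row of each partition has no following row, which is why next_value is empty for alex and kenny in the output table.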
Created in 2020 by Google Inc.