ELT Window Functions

In this article

An account for the Snap

You must define an account for this Snap to communicate with your target CDW. Click the account specific to your target CDW below for more information:

Overview

You can use this Snap to add window functions such as rank-relation functions, aggregate functions, and bitwise functions to the table's data. See Snowflake Window FunctionsRedshift Window FunctionsAzure Synapse Window FunctionsAzure Databricks Window Functions, or BigQuery Expressions, functions, and operators for details on these functions.

Prerequisites

None.

Limitation

ELT Snap Pack  does not support Legacy SQL dialect of Google BigQuery. We recommend that you use only the BigQuery's Standard SQL dialect in this Snap.

Known Issues

In any of the supported target databases, this Snap does not appropriately identify nor render column references beginning with an _ (underscore) inside SQL queries/statements that use the following constructs and contexts (the Snap works as expected in all other scenarios):

  • WHERE clause (ELT Filter Snap)
  • WHEN clause
  • ON condition (ELT Join, ELT Merge Into Snaps)
  • HAVING clause
  • QUALIFY clause
  • Insert expressions (column names and values in ELT Insert Select, ELT Load, and ELT Merge Into Snaps)
  • Update expressions list (column names and values in ELT Merge Into Snap)
  • Secondary AND condition
  • Inside SQL query editor (ELT Select and ELT Execute Snaps)

Workaround

As a workaround while using these SQL query constructs, you can:

  • Precede this Snap with an ELT Transform Snap to re-map the '_' column references to suitable column names (that do not begin with an _ ) and reference the new column names in the next Snap, as needed.
  • In case of Databricks Lakehouse Platform where CSV files do not have a header (column names), a simple query like SELECT * FROM CSV.`/mnt/csv1.csv` returns default names such as _c0, _c1, _c2 for the columns which this Snap cannot interpret. To avoid this scenario, you can:
    • Write the data in the CSV file to a DLP table beforehand, as in: CREATE TABLE csvdatatable (a1 int, b1 int,…) USING CSV `/mnt/csv1.csv` where a1, b1, and so on are the new column names.
    • Then, read the data from this new table (with column names a1, b1, and so on) using a simple SELECT statement.
  • In case of Databricks Lakehouse Platform, all ELT Snaps' preview data (during validation) contains a value with precision higher than that of the actual floating point value (float data type) stored in the Delta. For example, 24.123404659344 instead of 24.1234. However, the Snap reflects the exact values during Pipeline executions.

Snap Input and Output

Input/OutputType of ViewNumber of ViewsExamples of Upstream and Downstream SnapsDescription
Input 

Document

  • Min: 1
  • Max: 1
  • ELT Select
  • ELT Copy
The data on which to add the window functions.
Output

Document

  • Min: 1
  • Max: 1
  • ELT Transform
  • ELT Insert-Select

The input data modified with the window functions. If the Pass through check box is selected, the output also includes the original input data.

Snap Settings

SQL Functions and Expressions for ELT

You can use the SQL Expressions and Functions supported for ELT to define your Snap or Account settings with the Expression symbol = enabled, where available. This list is common to all target CDWs supported. You can also use other expressions/functions that your target CDW supports.

Parameter NameData TypeDescriptionDefault ValueExample 
LabelString
Specify a name for the Snap. You can modify this to be more specific, especially if you have more than one of the same Snap in your pipeline.
ELT Window FunctionsAggregate Salary Data
Get preview dataCheck box

Select this checkbox to include a preview of the query's output. The Snap performs limited execution and generates a data preview during Pipeline validation.

In the case of ELT Pipelines, only the SQL query flows through the Snaps but not the actual source data. Hence, the preview data for a Snap is the result of executing the SQL query that the Snap has generated in the Pipeline.

The number of records displayed in the preview (upon validation) is the smaller of the following:

  • Number of records available upon execution of the SQL query generated by the Snap.

  • The value set in the Preview Data Count setting (default is 50 records).

Rendering Complex Data Types in Databricks Lakehouse Platform

Based on the data types of the fields in the input schema, the Snap renders the complex data types like map and struct as object data type and array as an array data type. It renders all other incoming data types as-is except for the values in binary fields are displayed as a base64 encoded string and as string data type.

Not selectedSelected
Pass throughCheck boxSelect to include the original data along with the aggregated data in the output. Otherwise, the output has only aggregated data. Not selectedSelected

Rank Related Functions List

Use this field set to enter the rank-related function to apply to the table's data. You must also specify an alias for each rank-related function. Specify each rank-related function as a separate row. Click  to add a new row. 

This field set contains the following fields:

  • Rank Function
  • Alias Name
Rank FunctionString/Suggestion

Enter the rank-related function to apply to the table data. Alternatively, click  to view and select a rank function from the suggestion list. The list displays the functions supported by the database that you selected in the Account settings. An error is displayed if an account is not configured. See the Troubleshooting section for details.  

The following functions are available for Snowflake, Redshift, Azure Synapse and Databricks Lakehouse Platform:

  • CUME_DIST
  • DENSE_RANK
  • PERCENT_RANK
  • ROW_NUMBER
  • RANK

The Snap fails during Pipeline validation/execution if you specify an incorrect function name. 

For more information on the ranking functions, refer to the following: 

N/ACUME_DIST
Alias NameStringEnter the alias name to associate with the specified rank function.N/ATOTAL
General Window Functions List

Use this field set to apply the aggregate function on the specified field in the table's data. You must also specify an alias for each aggregate function. Specify each aggregate function as a separate row. Click  to add a new row. 

This field set contains the following fields:

  • Aggregate Function
  • Field
  • Alias Name
Aggregate FunctionString/Suggestion

Enter the name of the aggregate function to apply to the table data. Alternatively, click  to view and select an aggregate function from the suggestion list. The list displays the functions supported by the database that you select in the Account settings. An error is displayed if an account is not configured. See the Troubleshooting section for details.

The available functions are:

SnowflakeRedshiftAzure Synapse
  • COUNT
  • SUM
  • MIN
  • MAX
  • AVG
  • ANY_VALUE
  • MEDIAN
  • MODE
  • STDDEV_POP
  • STDDEV_SAMP
  • VAR_POP
  • VAR_SAMP
  • CONDITIONAL_CHANGE_EVENT
  • CONDITIONAL_TRUE_EVENT
  • COUNT_IF
  • RATIO_TO_REPORT
  • NTILE
  • SKEW
  • KURTOSIS

See Snowflake Window Functions for more information. 

  • COUNT
  • SUM
  • MIN
  • MAX
  • AVG
  • MEDIAN
  • STDDEV_POP
  • STDDEV_SAMP
  • VAR_POP
  • VAR_SAMP
  • RATIO_TO_REPORT
  • NTILE
  • SKEW
  • MODE
  • KURTOSIS

See Redshift Window Functions for more information. 

  • COUNT
  • COUNT_BIG
  • SUM
  • MIN
  • MAX
  • AVG
  • STDEV
  • STDDEV
  • VAR
  • VARP
  • NTILE
  • SKEW
  • MODE
  • KURTOSIS

See Synapse Window Functions for more information. 

Databricks Lakehouse PlatformBigQuery
  • COUNT
  • SUM
  • MIN
  • MAX
  • AVG
  • STDDEV_POP
  • STDDEV_SAMP
  • VAR_POP
  • VAR_SAMP
  • NTILE
  • SKEW
  • MODE
  • KURTOSIS

See Azure Databricks Window Functions for more information.

  • ANY_VALUE
  • ARRAY_AGG
  • AVG
  • CORR
  • COUNT
  • COUNTIF
  • COVAR_POP
  • COVAR_SAMP
  • MAX
  • MIN
  • NTILE
  • STDDEV_POP
  • STDDEV_SAMP
  • STRING_AGG
  • SUM
  • VAR_POP
  • VAR_SAMP
  • SKEW
  • MODE
  • KURTOSIS

See BigQuery Aggregate Functions and Statistical Aggregate Functions for more information.


The Snap fails during Pipeline validation/execution if you specify an incorrect function name. 

N/ACOUNT
FieldString/ExpressionEnter the name of the field in the input table on which to apply the aggregate function.N/ASALARY
Alias NameStringEnter the alias name to associate with the specified aggregate function.N/ATOTAL_SALARY
Bitwise Aggregate Functions List (Snowflake Only)

Specify the bitwise aggregate function to apply to the specified field in the table's data. You must also specify an alias for each bitwise aggregate function. Specify each bitwise aggregate function as a separate row. Click  to add a new row. 

Applicable only to the Snowflake database.

This field set contains the following fields:

  • Bitwise Aggregate Function
  • Field
  • Alias Name
Bitwise Aggregate FunctionString/Suggestion

Enter the name of the bitwise aggregate function to apply to the table data. Alternatively, click the  button to view and select a bitwise aggregate function from the list of supported bitwise aggregate functions. 

Available options are:

The Snap fails during Pipeline validation/execution if you specify an incorrect function name. 

N/ABITOR_AGG
FieldString/ExpressionEnter the name of the field in the input table upon which to apply the bitwise aggregate function.N/ARANGE
Alias NameStringEnter the alias name to associate with the specified bitwise aggregate function.N/ABITOR_RANGE
Boolean Aggregate Functions List (Snowflake Only)

Specify the boolean aggregate function to apply to the specified field in the table's data. You must also specify an alias for each boolean aggregate function. Specify each boolean aggregate function as a separate row. Click  to add a new row. 

Applicable only to the Snowflake database.

This field set contains the following fields:

  • Boolean Aggregate Function
  • Field
  • Alias Name
Boolean Aggregate FunctionString/Suggestion

Enter the name of the boolean aggregate function to apply to the table data. Alternatively, click the  button to view and select a boolean aggregate function from the list of supported bitwise aggregate functions. 

Available options are:

The Snap fails during Pipeline validation/execution if you specify an incorrect function name. 

N/ABOOLOR_AGG
FieldString/ExpressionEnter the name of the field in the input table upon which to apply the boolean aggregate function.N/ARANGE
Alias NameStringEnter the alias name to associate with the specified boolean aggregate function.N/ABOOLOR_RANGE
Value Based Window Functions List

Use this field set to specify the columns for which to apply value-based Window functions. Each Window function must be specified as a new row. 
Click  to add a row.

This field set contains the following fields:

  • Value Window Function
  • Field
  • nth Value
  • Alias Name
Value Window FunctionString/Suggestion

Required. Enter the name of the window function to apply to the table data. You must ensure that the function is spelled correctly, else the Snap displays an error. See the Troubleshooting section for details.

Alternatively, click  to view and select a function from the list of window functions. The list displays the functions supported by the database that you select in the Account settings. 

The available functions are:

Snowflake
Redshift
Azure Synapse
  • FIRST_VALUE_IGNORE_NULLS
  • FIRST_VALUE_RESPECT_NULLS
  • LAST_VALUE_IGNORE_NULLS
  • LAST_VALUE_RESPECT_NULLS
  • NTH_VALUE_FROM_FIRST_IGNORE_NULLS
  • NTH_VALUE_FROM_FIRST_RESPECT_NULLS
  • NTH_VALUE_FROM_LAST_IGNORE_NULLS
  • NTH_VALUE_FROM_LAST_RESPECT_NULLS

See