ELT Window Functions
In this article
An account for the Snap
You must define an account for this Snap to communicate with your target CDW. Click the account specific to your target CDW below for more information:
Overview
You can use this Snap to add window functions such as rank-relation functions, aggregate functions, and bitwise functions to the table's data. See Snowflake Window Functions, Redshift Window Functions, Azure Synapse Window Functions, Azure Databricks Window Functions, or BigQuery Expressions, functions, and operators for details on these functions.
Prerequisites
None.
Limitation
ELT Snap Pack does not support Legacy SQL dialect of Google BigQuery. We recommend that you use only the BigQuery's Standard SQL dialect in this Snap.
Known Issues
In any of the supported target databases, this Snap does not appropriately identify nor render column references beginning with an _ (underscore) inside SQL queries/statements that use the following constructs and contexts (the Snap works as expected in all other scenarios):
WHERE
clause (ELT Filter Snap)WHEN
clauseON
condition (ELT Join, ELT Merge Into Snaps)HAVING
clauseQUALIFY
clause- Insert expressions (column names and values in ELT Insert Select, ELT Load, and ELT Merge Into Snaps)
- Update expressions list (column names and values in ELT Merge Into Snap)
- Secondary
AND
condition Inside SQL query editor (ELT Select and ELT Execute Snaps)
Workaround
As a workaround while using these SQL query constructs, you can:
- Precede this Snap with an ELT Transform Snap to re-map the '_' column references to suitable column names (that do not begin with an _ ) and reference the new column names in the next Snap, as needed.
- In case of Databricks Lakehouse Platform where CSV files do not have a header (column names), a simple query like
SELECT * FROM CSV.`/mnt/csv1.csv`
returns default names such as _c0, _c1, _c2 for the columns which this Snap cannot interpret. To avoid this scenario, you can:- Write the data in the CSV file to a DLP table beforehand, as in:
CREATE TABLE csvdatatable (a1 int, b1 int,…) USING CSV `/mnt/csv1.csv`
where a1, b1, and so on are the new column names. - Then, read the data from this new table (with column names a1, b1, and so on) using a simple SELECT statement.
- Write the data in the CSV file to a DLP table beforehand, as in:
- In case of Databricks Lakehouse Platform, all ELT Snaps' preview data (during validation) contains a value with precision higher than that of the actual floating point value (float data type) stored in the Delta. For example, 24.123404659344 instead of 24.1234. However, the Snap reflects the exact values during Pipeline executions.
Snap Input and Output
Input/Output | Type of View | Number of Views | Examples of Upstream and Downstream Snaps | Description |
---|---|---|---|---|
Input | Document |
|
| The data on which to add the window functions. |
Output | Document |
|
| The input data modified with the window functions. If the Pass through check box is selected, the output also includes the original input data. |
Snap Settings
SQL Functions and Expressions for ELT
You can use the SQL Expressions and Functions supported for ELT to define your Snap or Account settings with the Expression symbol = enabled, where available. This list is common to all target CDWs supported. You can also use other expressions/functions that your target CDW supports.
Parameter Name | Data Type | Description | Default Value | Example | ||||||||||||
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
Label | String | Specify a name for the Snap. You can modify this to be more specific, especially if you have more than one of the same Snap in your pipeline. | ELT Window Functions | Aggregate Salary Data | ||||||||||||
Get preview data | Check box | Select this checkbox to include a preview of the query's output. The Snap performs limited execution and generates a data preview during Pipeline validation. In the case of ELT Pipelines, only the SQL query flows through the Snaps but not the actual source data. Hence, the preview data for a Snap is the result of executing the SQL query that the Snap has generated in the Pipeline. The number of records displayed in the preview (upon validation) is the smaller of the following:
Rendering Complex Data Types in Databricks Lakehouse Platform Based on the data types of the fields in the input schema, the Snap renders the complex data types like map and struct as object data type and array as an array data type. It renders all other incoming data types as-is except for the values in binary fields are displayed as a base64 encoded string and as string data type. | Not selected | Selected | ||||||||||||
Pass through | Check box | Select to include the original data along with the aggregated data in the output. Otherwise, the output has only aggregated data. | Not selected | Selected | ||||||||||||
Rank Related Functions List | Use this field set to enter the rank-related function to apply to the table's data. You must also specify an alias for each rank-related function. Specify each rank-related function as a separate row. Click to add a new row. This field set contains the following fields:
| |||||||||||||||
Rank Function | String/Suggestion | Enter the rank-related function to apply to the table data. Alternatively, click to view and select a rank function from the suggestion list. The list displays the functions supported by the database that you select The following functions are available for Snowflake, Redshift, Azure Synapse and Databricks Lakehouse Platform:
The Snap fails during Pipeline validation/execution if you specify an incorrect function name. For more information on the ranking functions, refer to the following: | N/A | CUME_DIST | ||||||||||||
Alias Name | String | Enter the alias name to associate with the specified rank function. | N/A | TOTAL | ||||||||||||
General Window Functions List | Use this field set to apply the aggregate function on the specified field in the table's data. You must also specify an alias for each aggregate function. Specify each aggregate function as a separate row. Click to add a new row. This field set contains the following fields:
| |||||||||||||||
Aggregate Function | String/Suggestion | Enter the name of the aggregate function to apply to the table data. Alternatively, click to view and select an aggregate function from the suggestion list. The list displays the functions supported by the database that you select in the Account settings. An error is displayed if an account is not configured. See the Troubleshooting section for details. The available functions are:
The Snap fails during Pipeline validation/execution if you specify an incorrect function name. | N/A | COUNT | ||||||||||||
Field | String/Expression | Enter the name of the field in the input table on which to apply the aggregate function. | N/A | SALARY | ||||||||||||
Alias Name | String | Enter the alias name to associate with the specified aggregate function. | N/A | TOTAL_SALARY | ||||||||||||
Bitwise Aggregate Functions List (Snowflake Only) | Specify the bitwise aggregate function to apply to the specified field in the table's data. You must also specify an alias for each bitwise aggregate function. Specify each bitwise aggregate function as a separate row. Click to add a new row. Applicable only to the Snowflake database. This field set contains the following fields:
| |||||||||||||||
Bitwise Aggregate Function | String/Suggestion | Enter the name of the bitwise aggregate function to apply to the table data. Alternatively, click the button to view and select a bitwise aggregate function from the list of supported bitwise aggregate functions. Available options are: The Snap fails during Pipeline validation/execution if you specify an incorrect function name. | N/A | BITOR_AGG | ||||||||||||
Field | String/Expression | Enter the name of the field in the input table upon which to apply the bitwise aggregate function. | N/A | RANGE | ||||||||||||
Alias Name | String | Enter the alias name to associate with the specified bitwise aggregate function. | N/A | BITOR_RANGE | ||||||||||||
Boolean Aggregate Functions List (Snowflake Only) | Specify the boolean aggregate function to apply to the specified field in the table's data. You must also specify an alias for each boolean aggregate function. Specify each boolean aggregate function as a separate row. Click to add a new row. Applicable only to the Snowflake database. This field set contains the following fields:
| |||||||||||||||
Boolean Aggregate Function | String/Suggestion | Enter the name of the boolean aggregate function to apply to the table data. Alternatively, click the button to view and select a boolean aggregate function from the list of supported bitwise aggregate functions. Available options are: The Snap fails during Pipeline validation/execution if you specify an incorrect function name. | N/A | BOOLOR_AGG | ||||||||||||
Field | String/Expression | Enter the name of the field in the input table upon which to apply the boolean aggregate function. | N/A | RANGE | ||||||||||||
Alias Name | String | Enter the alias name to associate with the specified boolean aggregate function. | N/A | BOOLOR_RANGE | ||||||||||||
Value Based Window Functions List | Use this field set to specify the columns for which to apply value-based Window functions. Each Window function must be specified as a new row. This field set contains the following fields:
| |||||||||||||||
Value Window Function | String/Suggestion | Required. Enter the name of the window function to apply to the table data. You must ensure that the function is spelled correctly, else the Snap displays an error. See the Troubleshooting section for details. Alternatively, click to view and select a function from the list of window functions. The list displays the functions supported by the database that you select in the Account settings. The available functions are:
Redshift does not natively support the NTH_VALUE_FROM_LAST_RESPECT_NULL and NTH_VALUE_FROM_LAST_IGNORE_NULL functions. But, the ELT Window Functions Snap supports them through special rewrites (using internal SQL rewrites) on Redshift. | N/A | FIRST_VALUE_IGNORE_NULLS | ||||||||||||
Field | String/Expression | Required. Apply the window function to the target column. | N/A | QTY | ||||||||||||
nth Value | String/Expression | Enter a number for the nth value function. This field is ignored by the FIRST_VALUE and the LAST_VALUE functions during the execution. | N/A | 3 | ||||||||||||
Alias Name | String | Required. Specify the column alias name in which to display the result of the aggregate function. You can also reference this name in downstream Snaps to process the data further. | N/A | |||||||||||||
LEAD and LAG Window Functions List | Use this field set to add LEAD and LAG Window functions. Specify each function as a new row. Click to add a row. This field set contains the following fields:
| |||||||||||||||
Lead Lag Window Function | String/Suggestion | Required. Enter the name of the Lead Lag function to use. You must ensure that it is spelled correctly, else the Snap displays an error. See the Troubleshooting section for details.
Redshift does not natively support the following default value expressions in the LEAD and LAG syntax which is supported by Snowflake and Azure Synapse. But, the ELT Window Functions Snap supports these functions through special rewrites:
| N/A | LEAD_IGNORE_NULLS | ||||||||||||
Field | String/Expression | Required. Apply the Lead Lag Window function to the target column. | N/A | QTY | ||||||||||||
Offset Value | String/Expression | Enter the number the LEAD or LAG function uses to fetch the leading or lagging value from the input table partition for the given input field. | N/A | 2 | ||||||||||||
Default Value | String/Expression | Enter a SQL expression the LEAD or LAG functions use when the offset goes out of the bounds of the window. | N/A | 5 | ||||||||||||
Alias Name | String | Specify the column alias name in which to display the result of the aggregate function. You can also reference this name in downstream Snaps to process the data further. | N/A | lead_rn | ||||||||||||
Linear Regression Related Functions List | Use this field set to add Linear Regression Related Window functions. Specify each function as a new row. Click to add a row. This field set contains the following fields:
| |||||||||||||||
Window function | String/Suggestion | Required. Enter the name of the Linear Regression function to use. You must ensure that it is spelled correctly, else the Snap displays an error. See the Troubleshooting section for details.
See Snowflake Window Functions for more information. Linear Regression Window Functions in Redshift, Azure Synapse, Databricks Lakehouse Platform, and BigQuery These databases do not natively support the window functions listed above. But, the ELT Window Functions Snap supports all of these functions on Redshift, Azure Synapse, Databricks Lakehouse Platform, and BigQuery databases through special rewrites. Accuracy in Output Values The Scale and Precision that the different databases (Snowflake/Redshift/Azure Synapse) allow for calculation of the values has a direct bearing on the accuracy of the Window Function values that the Snap returns in the output. | N/A | REGR_AVGY | ||||||||||||
First Field | String/Expression | Required. Specify the first field (argument) for the selected function. | N/A | QTY | ||||||||||||
Second Field | String/Expression | Required. Specify the second field (argument) for the selected function. | N/A | 2 | ||||||||||||
Alias Name | String | Required. Specify the column alias name in which to display the result of the Window function. You can also reference this name in downstream Snaps to process the data further. | N/A | WinFunc_RegrValue | ||||||||||||
Aggregate Concatenation Functions List (Not valid for Azure Synapse, Databricks Lakehouse Platform) | This field set enables you to specify the list of aggregate concatenation functions to be used for aggregation. The list displays the functions supported by the database that you select in the Account settings. Each function must be specified as a new row. Click to add a row. This field set contains the following fields:
| |||||||||||||||
Aggregate Concatenation Function | String/Expression/Suggestion | Required. Select the aggregate concatenation function to use. Click to retrieve the list of the supported functions. The list displays the functions supported by the database that you select in the Account settings. An error is displayed if an account is not configured. See the Troubleshooting section for details. Alternatively, you can also enter the name of the aggregate concatenation function to use; you must, however, ensure that it is spelled correctly. Otherwise, the Snap displays an error. You can use the following aggregate functions:
See Snowflake Aggregate Functions, Redshift SQL Functions Reference, Aggregate Functions (Transact-SQL), or BigQuery Aggregate Functions in Standard SQL for more information on the respective aggregate function. | N/A | LISTAGG | ||||||||||||
Field Name | String/Expression | Required. Specify the column name to apply the concatenation function. | N/A | NEW_LOCATIONS | ||||||||||||
Alias Name | String | Required. Specify the column in which to display the result of the concatenation function. You can also reference this name in downstream Snaps to process the data further. | N/A | UNIQUE_LOCS_LIST | ||||||||||||
Delimiter | String/Expression | Specify the delimiting character (string constant) to be used to separate the concatenated values. | , (comma) | ; (semi-colon) | ||||||||||||
Percentile Distribution Functions List (Not valid for Azure Synapse, Databricks Lakehouse Platform) | This field set enables you to specify the list of percentile distribution functions to be used for aggregation. The list displays the functions supported by the database that you select in the Account settings. Each function must be specified as a new row. Click to add a row. This field set contains the following fields:
| |||||||||||||||
Percentile Distribution Function | String/Expression/Suggestion | Required. Select the percentile distribution function to use. Click to retrieve the list of the supported functions. The list displays the functions supported by the database that you select in the Account settings. An error is displayed if an account is not configured. See the Troubleshooting section for details. Alternatively, you can also enter the name of the percentile distribution function to use; you must, however, ensure that it is spelled correctly. Otherwise, the Snap displays an error. You can use the following aggregate functions:
| N/A | PERCENTILE_DISC | ||||||||||||
Percentile | String/Expression | Required. Specify the percentile value to be considered for applying the percentile distribution function. | N/A | 0.8 (80th percentile) | ||||||||||||
Alias Name | String | Required. Specify the column in which to display the result of the percentile distribution function. You can also reference this name in downstream Snaps to process the data further. | N/A | REVENUE_CPERCENTILE | ||||||||||||
PARTITION BY Fields List | Enter the fields in the table to use to partition the output data. Specify each field in a separate row. Click to add a new row. This field set contains the following fields:
You cannot apply the window framing clause, window ordering clause, or window partitioning clause to different functions in the same Snap. You must add another ELT Window Functions Snap and configure it with the target functions to do so. | |||||||||||||||
Partition By Field | String/Expression | Enter the name of the field to partition the data with. | N/A | TOTAL_SALARY | ||||||||||||
Order By Fields List | Specify the field to use to sort the output data along with the sort order preference and the placement for rows with null values in the specified field. Specify each field in a separate row. Click to add a new row. This field set contains the following fields:
You cannot apply the window framing clause, window ordering clause, or window partitioning clause to different functions in the same Snap. You must add another ELT Window Functions Snap and configure it with the target functions to do so. | |||||||||||||||
Order By Field | String | Enter the name of the field/column in the table based on which to sort the rows. | N/A | TOTAL_SALARY | ||||||||||||
Sort Order | String | Select the order in which to sort the rows. Available options are:
| ASC | DESC | ||||||||||||
Null Value Sort Preference | Drop-down list | Select whether to keep rows with null values first or last after the sort operation. Available options are:
| NULLS FIRST | NULLS LAST | ||||||||||||
Window Frame | Use this field set to specify the range of the rows in the input table upon which to apply the window functions. This field set contains the following fields:
| |||||||||||||||
Start of Window Frame | Drop-down list | Select the policy/condition to indicate the first row in the window frame. Available options are:
| UNBOUNDED PRECEDING | CURRENT ROW | ||||||||||||
Bounded Start Value | Integer | Activates when you select the BOUNDED VALUE PRECEDING or BOUNDED VALUE FOLLOWING option in the Start of Window Frame field. Enter the row value from which to start the window frame. | N/A | 2 | ||||||||||||
End of Window Frame | Drop-down list | Select the policy/condition to indicate the last row in the window frame. Available options are:
| UNBOUNDED FOLLOWING | BOUNDED VALUE FOLLOWING | ||||||||||||
Bounded End Value | Integer | Activates when you select the BOUNDED VALUE PRECEDING or BOUNDED VALUE FOLLOWING option in the End of Window Frame field. Enter the row value at which to end the window frame. | N/A | 8 |
Troubleshooting
Error | Reason | Resolution |
---|---|---|
Set an account for the Snap in the Accounts tab. | Account configuration is mandatory in the ELT Window Function Snap. An account has not been selected/configured in the Snap. | Select or configure an account for the Snap. See Configuring ELT Database Accounts for details. |
Examples
Calculating Aggregate Data for an Organization
We want to calculate the aggregate scores such as, MIX, MAX, and MEDIAN and also want to apply bitwise and boolean aggregate functions to a table in the Snowflake database. This Pipeline shows how you can use the ELT Windows Function Snap to accomplish this task.
First, we use the ELT Select Snap to build a query to retrieve all records from the target table.
Upon execution, this Snap builds the query as shown below:
We want to perform the following functions on the above table:
- Apply rank-related functions.
- Calculate aggregate data for the columns RANGE and QTY.
- Apply bitwise aggregate functions to the data in the RANGE and QTY columns.
- Apply boolean aggregate functions to the data in the LOCAL column.
- Partition the output by the data in the DEPT column.
- Sort the output based on the data in the LOCAL in descending order and keep rows with null values in this column at the last.
To do so, we add the ELT Window Functions Snap and configure it as follows:
Upon successful execution, the ELT Window Functions Snap performs the specified calculations upon the input data and shows the following output:
We can add an ELT Insert-Select or ELT Merge Into Snap downstream of the ELT Window Function Snap and write the above output into another table.
Downloads
Important Steps to Successfully Reuse Pipelines
- Download and import the Pipeline into SnapLogic.
- Configure Snap accounts as applicable.
- Provide Pipeline parameters as applicable.
Snap Pack History
See Also
- ELT Snap Pack
- Best Practices for Using the ELT Snap Pack
- Snowflake Window Functions
- Snowflake Rank Functions
- Redshift Window Functions
- Getting Started with SnapLogic
Have feedback? Email documentation@snaplogic.com | Ask a question in the SnapLogic Community
© 2017-2024 SnapLogic, Inc.