ELT Window Functions
In this article
An account for the Snap
You must define an account for this Snap to communicate with your target CDW. Click the account specific to your target CDW below for more information:
Overview
You can use this Snap to add window functions such as rank-relation functions, aggregate functions, and bitwise functions to the table's data. See Snowflake Window Functions, Redshift Window Functions, Azure Synapse Window Functions, Azure Databricks Window Functions, or BigQuery Expressions, functions, and operators for details on these functions.
Prerequisites
None.
Limitation
ELT Snap Pack does not support Legacy SQL dialect of Google BigQuery. We recommend that you use only the BigQuery's Standard SQL dialect in this Snap.
Known Issues
In any of the supported target databases, this Snap does not appropriately identify nor render column references beginning with an _ (underscore) inside SQL queries/statements that use the following constructs and contexts (the Snap works as expected in all other scenarios):
WHERE
clause (ELT Filter Snap)WHEN
clauseON
condition (ELT Join, ELT Merge Into Snaps)HAVING
clauseQUALIFY
clause- Insert expressions (column names and values in ELT Insert Select, ELT Load, and ELT Merge Into Snaps)
- Update expressions list (column names and values in ELT Merge Into Snap)
- Secondary
AND
condition Inside SQL query editor (ELT Select and ELT Execute Snaps)
Workaround
As a workaround while using these SQL query constructs, you can:
- Precede this Snap with an ELT Transform Snap to re-map the '_' column references to suitable column names (that do not begin with an _ ) and reference the new column names in the next Snap, as needed.
- In case of Databricks Lakehouse Platform where CSV files do not have a header (column names), a simple query like
SELECT * FROM CSV.`/mnt/csv1.csv`
returns default names such as _c0, _c1, _c2 for the columns which this Snap cannot interpret. To avoid this scenario, you can:- Write the data in the CSV file to a DLP table beforehand, as in:
CREATE TABLE csvdatatable (a1 int, b1 int,…) USING CSV `/mnt/csv1.csv`
where a1, b1, and so on are the new column names. - Then, read the data from this new table (with column names a1, b1, and so on) using a simple SELECT statement.
- Write the data in the CSV file to a DLP table beforehand, as in:
- In case of Databricks Lakehouse Platform, all ELT Snaps' preview data (during validation) contains a value with precision higher than that of the actual floating point value (float data type) stored in the Delta. For example, 24.123404659344 instead of 24.1234. However, the Snap reflects the exact values during Pipeline executions.
Snap Input and Output
Input/Output | Type of View | Number of Views | Examples of Upstream and Downstream Snaps | Description |
---|---|---|---|---|
Input | Document |
|
| The data on which to add the window functions. |
Output | Document |
|
| The input data modified with the window functions. If the Pass through check box is selected, the output also includes the original input data. |
Snap Settings
SQL Functions and Expressions for ELT
You can use the SQL Expressions and Functions supported for ELT to define your Snap or Account settings with the Expression symbol = enabled, where available. This list is common to all target CDWs supported. You can also use other expressions/functions that your target CDW supports.
Parameter Name | Data Type | Description | Default Value | Example | ||||||||||||
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
Label | String | Specify a name for the Snap. You can modify this to be more specific, especially if you have more than one of the same Snap in your pipeline. | ELT Window Functions | Aggregate Salary Data | ||||||||||||
Get preview data | Check box | Select this checkbox to include a preview of the query's output. The Snap performs limited execution and generates a data preview during Pipeline validation. In the case of ELT Pipelines, only the SQL query flows through the Snaps but not the actual source data. Hence, the preview data for a Snap is the result of executing the SQL query that the Snap has generated in the Pipeline. The number of records displayed in the preview (upon validation) is the smaller of the following:
Rendering Complex Data Types in Databricks Lakehouse Platform Based on the data types of the fields in the input schema, the Snap renders the complex data types like map and struct as object data type and array as an array data type. It renders all other incoming data types as-is except for the values in binary fields are displayed as a base64 encoded string and as string data type. | Not selected | Selected | ||||||||||||
Pass through | Check box | Select to include the original data along with the aggregated data in the output. Otherwise, the output has only aggregated data. | Not selected | Selected | ||||||||||||
Rank Related Functions List | Use this field set to enter the rank-related function to apply to the table's data. You must also specify an alias for each rank-related function. Specify each rank-related function as a separate row. Click to add a new row. This field set contains the following fields:
| |||||||||||||||
Rank Function | String/Suggestion | Enter the rank-related function to apply to the table data. Alternatively, click to view and select a rank function from the suggestion list. The list displays the functions supported by the database that you select The following functions are available for Snowflake, Redshift, Azure Synapse and Databricks Lakehouse Platform:
The Snap fails during Pipeline validation/execution if you specify an incorrect function name. For more information on the ranking functions, refer to the following: | N/A | CUME_DIST | ||||||||||||
Alias Name | String | Enter the alias name to associate with the specified rank function. | N/A | TOTAL | ||||||||||||
General Window Functions List | Use this field set to apply the aggregate function on the specified field in the table's data. You must also specify an alias for each aggregate function. Specify each aggregate function as a separate row. Click to add a new row. This field set contains the following fields:
| |||||||||||||||
Aggregate Function | String/Suggestion | Enter the name of the aggregate function to apply to the table data. Alternatively, click to view and select an aggregate function from the suggestion list. The list displays the functions supported by the database that you select in the Account settings. An error is displayed if an account is not configured. See the Troubleshooting section for details. The available functions are:
The Snap fails during Pipeline validation/execution if you specify an incorrect function name. | N/A | COUNT | ||||||||||||
Field | String/Expression | Enter the name of the field in the input table on which to apply the aggregate function. | N/A | SALARY | ||||||||||||
Alias Name | String | Enter the alias name to associate with the specified aggregate function. | N/A | TOTAL_SALARY | ||||||||||||
Bitwise Aggregate Functions List (Snowflake Only) | Specify the bitwise aggregate function to apply to the specified field in the table's data. You must also specify an alias for each bitwise aggregate function. Specify each bitwise aggregate function as a separate row. Click to add a new row. Applicable only to the Snowflake database. This field set contains the following fields:
| |||||||||||||||
Bitwise Aggregate Function | String/Suggestion | Enter the name of the bitwise aggregate function to apply to the table data. Alternatively, click the button to view and select a bitwise aggregate function from the list of supported bitwise aggregate functions. Available options are: The Snap fails during Pipeline validation/execution if you specify an incorrect function name. | N/A | BITOR_AGG | ||||||||||||
Field | String/Expression | Enter the name of the field in the input table upon which to apply the bitwise aggregate function. | N/A | RANGE | ||||||||||||
Alias Name | String | Enter the alias name to associate with the specified bitwise aggregate function. | N/A | BITOR_RANGE | ||||||||||||
Boolean Aggregate Functions List (Snowflake Only) | Specify the boolean aggregate function to apply to the specified field in the table's data. You must also specify an alias for each boolean aggregate function. Specify each boolean aggregate function as a separate row. Click to add a new row. Applicable only to the Snowflake database. This field set contains the following fields:
| |||||||||||||||
Boolean Aggregate Function | String/Suggestion | Enter the name of the boolean aggregate function to apply to the table data. Alternatively, click the button to view and select a boolean aggregate function from the list of supported bitwise aggregate functions. Available options are: The Snap fails during Pipeline validation/execution if you specify an incorrect function name. | N/A | BOOLOR_AGG | ||||||||||||
Field | String/Expression | Enter the name of the field in the input table upon which to apply the boolean aggregate function. | N/A | RANGE | ||||||||||||
Alias Name | String | Enter the alias name to associate with the specified boolean aggregate function. | N/A | BOOLOR_RANGE | ||||||||||||
Value Based Window Functions List | Use this field set to specify the columns for which to apply value-based Window functions. Each Window function must be specified as a new row. This field set contains the following fields:
| |||||||||||||||
Value Window Function | String/Suggestion | Required. Enter the name of the window function to apply to the table data. You must ensure that the function is spelled correctly, else the Snap displays an error. See the Troubleshooting section for details. Alternatively, click to view and select a function from the list of window functions. The list displays the functions supported by the database that you select in the Account settings. The available functions are: |