ELT Aggregate

ELT Aggregate

Overview

Use this Snap to add aggregate functions such as COUNT, SUM, MIN, and MAX along with the GROUP BY clause in the incoming SQL query. GROUP BY clauses are used to group table records based on columns that contain classifying data and are optional in this Snap. This Snap also allows you to preview the result of the output query. You can validate the modified query using this preview functionality. The supported aggregate functions vary based on the account configuration. See the description of the Aggregate Function field in the Snap Settings section for details.

Prerequisites

None.

Limitations

  • Only those columns that are referenced in this Snap are passed to the output. The rest are dropped. 

 

Known Issues

  • If you define a HAVING predicate in the HAVING Predicate List field set of the ELT Aggregate Snap (to apply on the Snap’s output data), the Snap connected downstream of this ELT Aggregate Snap returns an error corresponding to this predicate during the Pipeline execution.

  • Due to an issue with Databricks Runtime Version 11 and above, the Snap fails to calculate the value for the linear regression aggregate function REGR_R2 for the target DLP instance and returns a cast exception. As a workaround, you can revert your Databricks Runtime Version to 10.5 or below.

  • When running without Sub-Query Pushdown Optimization (SPDO), ELT Pipelines that contain an ELT Aggregate Snap and configured with one or more GROUP BY ROLLUP fields, do not verify the column data types while inserting the Snap output values in the target table. This may lead to incorrect data written to the target table. However, as long as SPDO is on, the same Pipeline runs without this issue.

Snap Input and Output

 

Input/Output

Type of View

Number of Views

Examples of Upstream and Downstream Snaps

Description

Input/Output

Type of View

Number of Views

Examples of Upstream and Downstream Snaps

Description

Input 

Document

  • Min: 1

  • Max: 1

  • ELT Select

  • ELT Transform

The SQL query in which to add the aggregate functions and the optional GROUP BY clause.

Output

Document

  • Min: 1

  • Max: 1

  • ELT Insert-Select

  • ELT Limit

The modified SQL query with the aggregate functions and GROUP BY clause. 

Snap Settings

SQL Functions and Expressions for ELT

You can use the SQL Expressions and Functions supported for ELT to define your Snap or Account settings with the Expression symbol = enabled, where available. This list is common to all target CDWs supported. You can also use other expressions/functions that your target CDW supports.

 

Parameter Name

Data Type

Description

Default Value

Example 

Parameter Name

Data Type

Description

Default Value

Example 

Label

String

ELT Aggregate

Aggregate Revenue

Get preview data

Checkbox

Not selected

Selected

General Aggregate Functions List

This field set enables you to specify the columns for which to add the aggregate functions. Each aggregate function must be specified as a new row. Click

to add a row.

This field set contains the following fields:

  • Function

  • Argument

  • Eliminate duplicates

  • Alias Name

Function*

String/Expression/Suggestion

Required. Select the aggregate function to use. Click

 to retrieve the list of the supported aggregate functions. The list displays the functions supported by the database that you select in the Account settings. An error is displayed if an account is not configured. See the Troubleshooting section for details. 

Alternatively, you can also enter the name of the aggregate function to use; you must, however, ensure that it is spelled correctly. Otherwise, the Snap displays an error. 

You can use the following aggregate functions:

  • AVG

  • COUNT

  • COUNT_IF

  • MAX

  • MIN

  • SUM

  • SKEW

  • MODE

  • KURTOSIS

Additionally, the following aggregate functions are available based on the database type:

Snowflake

Redshift

Azure Synapse

  • ANY_VALUE

  • BITAND_AGG

  • BITOR_AGG

  • BITXOR_AGG 

  • BOOLAND_AGG 

  • BOOLOR_AGG

  • BOOLXOR_AGG

  • MEDIAN

  • STDDEV

  • STDDEV_POP

  • STDDEV_SAMP

  • VAR_POP

  • VAR_SAMP

  • APPROXIMATE.COUNT

  • BIT_AND 

  • BIT_OR

  • BOOL_AND

  • BOOL_OR

  • MEDIAN

  • STDDEV_POP

  • STDDEV_SAMP

  • VAR_POP

  • VAR_SAMP

  • APPROX_COUNT_DISTINCT

  • COUNT_BIG

  • GROUPING

  • STDEV

  • STDEVP

  • VAR

  • VARP

Databricks Lakehouse Platform

BigQuery

  • ANY

  • BIT_OR

  • BIT_XOR

  • BOOL_AND

  • BOOL_OR

  • COLLECT_LIST

  • COLLECT_SET

  • EVERY

  • FIRST_VALUE_RESPECT_NULLS

  • FIRST_VALUE_IGNORE_NULLS

  • LAST_VALUE_RESPECT_NULLS

  • LAST_VALUE_IGNORE_NULLS

  • SOME

  • SKEWNESS

  • STDDEV

  • STDDEV_POP

  • STDDEV_SAMP

  • VAR_POP

  • VAR_SAMP

  • ANY_VALUE

  • BIT_AND

  • BIT_OR

  • BIT_XOR 

  • LOGICAL_AND

  • LOGICAL_OR

See Snowflake Aggregate Functions, Redshift SQL Functions Reference, Aggregate Functions (Transact-SQL), Databricks Built-in Functions Reference, or BigQuery Aggregate Functions in Standard SQL for more information on the respective aggregate function.

N/A

AVG

Argument*

String/Expression

Required. Enter the field name or expression on which you want to apply the general aggregate function.

N/A

REVENUE

Eliminate duplicates

Check box

Select to apply DISTINCT to the column specified in the Field field. This means that the aggregate function is applied only to the unique values in the column. 

Behavior with COUNT_IF aggregate function

Selecting this checkbox while using COUNT_IF aggregate function does not eliminate duplicate records in case of Snowflake and BigQuery databases, as there is no native support. However, it eliminates the duplicates from the list of records when used with a Redshift, Azure Synapse, or Databricks Lakehouse Platform (DLP) instance.

Not selected

Selected

Alias Name*

String

Required. Specify the column in which to display the result of the aggregate function. You can also reference this name in downstream Snaps to process the data further. 

N/A

TOTAL_REVENUE

Linear Regression Aggregate Functions List 

This field set enables you to specify the columns for which to apply the linear regression aggregate functions.  Each function must be specified as a new row. Click 

to add a row.

This field set contains the following fields:

  • Function

  • Argument 1

  • Argument 2

  • Alias Name

Function*

String/Expression/Suggestion

Required. Select the aggregate function to use. Click

 to retrieve the list of the supported linear regression aggregate functions. The list displays the functions supported by the database. An error is displayed if an account is not configured. See the Troubleshooting section for details. 

Alternatively, you can also enter the name of an aggregate function; you must, however, ensure that it is spelled correctly. Otherwise, the Snap displays an error. 

You can use the following aggregate functions:

  • CORR

  • COVAR_POP

  • COVAR_SAMP

  • REGR_AVGX

  • REGR_AVGY

  • REGR_COUNT

  • REGR_INTERCEPT

  • REGR_R2

  • REGR_SLOPE

  • REGR_SXX

  • REGR_SXY

  • REGR_SYY

  • MINHASH (for Snowflake only)

  • OBJECT_AGG (for Snowflake only)

See Snowflake Aggregate FunctionsDatabricks Built-in Functions Reference, or BigQuery Aggregate Functions in Standard SQL for more information on the respective aggregate function.

Though not supported natively

BigQuery does not natively support the above list of Linear Regression Aggregate Functions. However, SnapLogic provides you with the ability to use these functions with BigQuery through a series of internal query rewrites.

N/A

CORR

Argument 1*

String/Expression

Required. Enter the first field name or expression on which you want to apply the Aggregate function.

N/A

REVENUE_LOC1

Argument 2*

String/Expression

Required. Enter the second field name or expression on which you want to apply the Aggregate function.

N/A

REVENUE_LOC2

Alias Name*

String

Required. Specify the column in which to display the result of the aggregate function. You can also reference this name in downstream Snaps to process the data further.

N/A

LINREGAGG_REVENUE

Aggregate Concatenation Functions List (Not valid for Databricks Lakehouse Platform)

This field set enables you to specify the list of aggregate concatenation functions for aggregation. The list displays the functions supported by the database that you select in the Account settings. Each function must be specified as a new row. Click 

to add a row.

This field set contains the following fields:

  • Aggregate Concatenation Function

  • Field Name

  • Alias Name

  • Delimiter

Function*

String/Expression/Suggestion

Required. Select the aggregate concatenation function to use. Click

 to retrieve the list of the supported functions. The list displays the functions supported by the database that you select in the Account settings. An error is displayed if an account is not configured. See the Troubleshooting section for details. 

Alternatively, you can also enter the name of the aggregate concatenation function to use; you must, however, ensure that it is spelled correctly. Otherwise, the Snap displays an error. 

You can use the following aggregate functions:

Snowflake

Redshift

Azure Synapse

  • LISTAGG

  • LISTAGG_DISTINCT

  • ARRAY_AGG

  • ARRAY_AGG_DISTINCT

  • LISTAGG

  • DISTINCT_LISTAGG

  • STRING_AGG

BigQuery





  • ARRAY_AGG

  • ARRAY_AGG_DISTINCT

  • ARRAY_CONCAT_AGG

  • STRING_AGG

  • STRING_AGG_DISTINCT





See Snowflake Aggregate Functions, Redshift SQL Functions ReferenceAggregate Functions (Transact-SQL), or BigQuery Aggregate Functions in Standard SQL for more information on the respective aggregate function.

N/A

LISTAGG

Argument*

String/Expression

RequiredEnter the field name or expression on which you want to apply the Aggregate Concatenation function.

N/A

NEW_LOCATIONS

Alias Name*

String

Required. Specify the column in which to display the result of the concatenation function. You can also reference this name in downstream Snaps to process the data further.

N/A

UNIQUE_LOCS_LIST

Delimiter

String/Expression

Specify the delimiting character (string constant) to be used to separate the concatenated values.

, (comma)

; (semi-colon)

Percentile Distribution Functions List (Not valid for Azure Synapse, Databricks Lakehouse Platform, BigQuery)

This field set enables you to specify the list of percentile distribution functions to be used for aggregation. The list displays the functions supported by the database that you select in the Account settings. Each function must be specified as a new row. Click 

to add a row.

This field set contains the following fields:

  • Function

  • Percentile

  • Alias Name

 Function*

String/Expression/Suggestion

Required. Select the percentile distribution function to use. Click

 to retrieve the list of the supported functions. The list displays the functions supported by the database that you select in the Account settings. An error is displayed if an account is not configured. See the Troubleshooting section for details. 

Alternatively, you can also enter the name of the percentile distribution function to use; you must, however, ensure that it is spelled correctly. Otherwise, the Snap displays an error. 

You can use the following aggregate functions:

Snowflake

Redshift

  • PERCENTILE_DISC

  • PERCENTILE_CONT

  • APPROXIMATE_PERCENTILE_DISC

  • PERCENTILE_CONT

See Snowflake Aggregate Functions or Redshift SQL Functions Reference for more information on the respective aggregate function.