ELT Merge Into

Overview

You can use this Snap to perform the MERGE INTO operation from Amazon S3 buckets or Azure Cloud Storage containers to a Snowflake, Azure Synapse database, or a Databricks Lakehouse Platform (DLP) instance. You can also use this Snap to merge data from:

  • Amazon S3 buckets in to the Amazon Redshift database.

  • Amazon Redshift or Google Cloud Storage in to BigQuery tables. 

  • A DBFS folder in to a DLP table.

Refer to ELT Source-Target-Hosting Support Matrix for complete information. 

The MERGE INTO operation updates, inserts, and deletes rows in/from the target table based on values in a source table. You can set match conditions in the Snap based on which the operation is carried out. The Snap creates a table and inserts the data if the target table does not exist.

After successfully validating the Pipeline with this Snap, you can use this Snap's output view to verify whether the data would be inserted correctly in the target table.

Snap Type

The ELT Merge Into Snap is a write-type Snap that writes data from the source file to matching rows in an existing table in the target CDW.

Prerequisites

  • Valid accounts and access permissions to connect to one source and one target in the following source and target locations (Refer to the ELT Source-Target-Hosting Support Matrix for the supported source-target combinations):

    • Source: Amazon S3, Azure Cloud Storage, DBFS location, or Google Cloud Storage

    • Target: Snowflake, Redshift, Azure Synapse, DLP, or BigQuery

  • Appropriate database permissions to read, insert, update, or delete records.

  • Appropriate database permissions to create a table.

  • Appropriate read permissions for the source table that you want to use in the MERGE INTO operation. 

Limitations

  • Though Snowflake allows any number of merge (insert, update, and delete) actions in any combination, you can configure a maximum of three Merge-into Statements in this Snap. The ELT Merge Into Snap returns an error when you specify more than 3 actions, more than 2 WHEN MATCHED actions, or more than 1 WHEN NOT MATCHED action.

Known Issues

Snap Views

Type

Format

Number of Views

Examples of Upstream and Downstream Snaps

Description

Type

Format

Number of Views

Examples of Upstream and Downstream Snaps

Description

Input 

Document

  • Min: 1

  • Max: 1

  • ELT Transform

  • ELT Select

  • ELT Copy

The source table you want to compare with the target table before updating, inserting, or deleting the values in the target table. 

Output

Document

  • Min: 0

  • Max: 1

  • ELT Transform

  • ELT Select

A document containing the SQL query executed on the target database and the rows data inserted in the target table (after running the Pipeline successfully). 

Error

Error handling is a generic way to handle errors without losing data or failing the Snap execution. You can handle the errors that the Snap might encounter when running the Pipeline by choosing one of the following options from the When errors occur list under the Views tab:

  • Stop Pipeline Execution: Stops the current Pipeline execution if the Snap encounters an error.

  • Discard Error Data and Continue: Ignores the error, discards that record, and continues with the remaining records.

  • Route Error Data to Error View: Routes the error data to an error view without stopping the Snap execution.

Learn more about Error handling in Pipelines.

Snap Settings

  • Asterisk ( * ): Indicates a mandatory field.

  • Suggestion icon (): Indicates a list that is dynamically populated based on the configuration.

  • Expression icon ( ): Indicates the value is an expression (if enabled) or a static value (if disabled). Learn more about SQL Expressions and Functions supported for ELT. This list is common to all target CDWs supported. You can also use other expressions/functions that your target CDW supports.

  • Add icon ( ): Indicates that you can add fields in the fieldset.

  • Remove icon ( ): Indicates that you can remove fields from the fieldset.

  • Upload icon ( ): Indicates that you can upload files.

Field Name

Field Type

Field Dependency

Description

Field Name

Field Type

Field Dependency

Description

Label*

 

Default ValueELT Merge Into
ExampleMerge Sales Data

String

None.

Specify a name for the Snap. You can modify this to be more specific, especially if you have more than one of the same Snap in your Pipeline.

 

Get preview data

 

Default ValueNot selected
ExampleSelected

Checkbox

None.

Select this checkbox to include a preview of the query's output. The Snap performs limited execution and generates a data preview during Pipeline validation.

In case of ELT Pipelines, only the SQL query flows through the Snaps but not the actual source data. Hence, the preview data for a Snap is the result of executing the SQL query that the Snap has generated in the Pipeline.

The number of records displayed in the preview (upon validation) is the smaller of the following:

  • Number of records available upon execution of the SQL query generated by the Snap.

  • The value set in the Preview Data Count setting (default is 50 records).

Rendering Complex Data Types in DLP

Based on the data types of the fields in the input schema, the Snap renders the complex data types like map and struct as object data type and array as an array data type. It renders all other incoming data types as-is except for the values in binary fields are displayed as a base64 encoded string and as string data type.

Database Name

 

Default Value: N/A
Example: TESTDB

String/Expression/Suggestion

None.

Specify the name of the database in which the target table exists. Leave this blank if you want to use the database name specified in the Default Database Name field in the account settings.

Schema Name*

 

Default Value: N/A
Example: SALES

String/Expression/Suggestion

None.

Specify the name of the database schema. In case it is not defined, then the suggestions for the schema name retrieves all schema names in the specified database when you click the Suggestion icon.

Target Table Name*

 

Default Value: N/A
Example: SALES_ORDERS

String/Expression/Suggestion

None.

Specify the name of the table or view in which you want to perform the MERGE INTO operation. 

Target Table Hash Distribution Column (Azure Synapse Only)

 

Default Value: N/A
Example: var_table

String/Expression

None.

Specify the Hash distribution column name for the target table in Azure Synapse, if you configure INSERT action in the Snap.

 

Target Table Alias

 

Default Value: N/A
Example: tgt

String/Expression

None.

Specify the alias name for the target table.
You can use the alias names in MERGE INTO statements, instead of using the lengthy target table names.

Input Source Alias

 

Default Value: N/A
Example: src

String/Expression

None.

Enter the alias name that you want to use for the source table.

ON Condition*

 

Default Value: N/A
Example: src.PROD_ID=tgt.PROD_ID

String/Expression

None.

The condition based on which you want to check the values in the target table. You can also use Pipeline parameters in this field to bind values. However, you must be careful to avoid SQL injection. See Preventing SQL Injection for details.

Enable default settings for merge into operation

 

Default ValueNot selected
ExampleSelected

Checkbox

None.

Select this checkbox to run the merge into operation with the default settings and assumptions (the same set of column names are used in the source and target tables in the same order) in the following scenarios.

  • For an Update operation: To update all columns of the target table with the corresponding source table column values.

    • Extra columns in the target table, if any, undergo no changes.

  • For an Insert operation: To insert all the corresponding source column values into the target table.

    • Extra columns in the target table, if any, are populated with nulls.

    • Extra columns in the source table, if any, result in failure with the appropriate error message.

Selecting this checkbox disables all the fields below except Merge-into Statements fieldset. Proceed to define your Merge-into Statements.

Enable automatic source table alias prefix

 

Default ValueNot selected
ExampleSelected

Checkbox

None.

Select this checkbox to prefix the Input Source Alias value to all the source column names inside the expressions defined (without an alias) in the Insert and Update Expression List fieldsets below.

For example: colA+colB-colC in the Insert value field is replaced with src.colA+src.colB-src.colC when the Snap prepares the final MERGE INTO SQL statement.

Proceed to define your Merge-into Statements and the appropriate expression lists.

Enable update all columns operation

 

Default ValueNot selected
ExampleSelected

Checkbox

None.

Select this checkbox to perform a WHEN MATCHED update operation on all columns in the target table. You can choose the update operation type (by source table column name (default) or by the order of source table columns) for this update.

  • Selecting this checkbox disables the Update Expression List - When Matched fieldset and enables the Merge Into All Columns Operations Type field.

Proceed to define your Merge-into Statements and the appropriate expression lists.

Enable update all columns by source operation (Only for supported CDWs)

 

Default ValueNot selected
ExampleSelected

Checkbox

Target CDW is Azure Synapse, DLP, or Google BigQuery.

 

Select this checkbox to perform a WHEN NOT MATCHED BY SOURCE update operation on all columns in the target table. You can choose the update operation type (by source table column name (default) or by the order of source table columns) for this update.

  • Selecting this checkbox disables the Update Expression List - When Not Matched by Source (Only for Supported CDWs) fieldset and enables the Merge Into All Columns Operations Type field.

Enable insert all columns operation

 

Default ValueNot selected
ExampleSelected

Checkbox

None.

Select this checkbox to perform a WHEN NOT MATCHED insert operation on all columns in the target table. You can choose the insert operation type (by source table column name (default) or by the order of source table columns) for this operation.

  • Selecting this checkbox disables the Insert Expression List fieldset and enables the Merge Into All Columns Operations Type field.

Merge Into All Columns Operations Type

 

Default ValueSource and target table column names are identical
ExampleSource table column order

Drop-down list

Enable update all columns operation checkbox, Enable update all columns by source operation (Only for supported CDWs) checkbox, or Enable insert all columns operation checkbox is selected.

Select one of the following scenarios associated with the selected operation type for the Merge-Into operation.

  • Source and target table column names are identical. When the source column names exactly match with the target table column names (though not in the same order).

  • Source table column order. When the order of your source column names does not match with the order of target table column names and hence the Merge Into operation shall follow the order of the source table column names while performing the INSERT and UPDATE operations defined in the Merge-into Statements fieldset.

Merge-into Statements

Use this fieldset to specify the conditions that activate the MERGE INTO operation and the additional conditions that must be met. Specify each condition in a separate row.

This fieldset contains the following fields:

  • When Clause

  • Condition

  • Action

When Clause

Default Value: N/A
ExampleWHEN MATCHED

String/Expression/Suggestion

 

Specify the matching condition based on the outcome of the ON Condition. Alternatively, select a clause from the suggestion list.

Available options are:

  • WHEN MATCHED: Applies the specified condition and action when the source data matches with the target.

  • WHEN NOT MATCHED: Applies the specified condition and action when the source data does not match with the target.

  • WHEN NOT MATCHED BY SOURCE: (Available only for Azure Synapse, DLP, and BigQuery.) Applies the specified condition and action when the target rows do not match the source rows.

Condition

Default Value: N/A
ExampleTOTAL_ORDERS > 0

String/Expression

None.

Specify the additional criteria that must be specified.
The action associated for the specified condition is not performed if the condition's criteria is not fulfilled. It can be a combination of both source and target tables, source table only, target table only, or may not contain references to any table at all.

Having this additional condition allows the Snap to identify whether the UPDATE or DELETE action must be performed (since both the actions correspond to the WHEN MATCHED clause).

You can also use Pipeline parameters in this field to bind values. However, you must be careful to avoid SQL injection. See Preventing SQL Injection for details.

Action

 

Default ValueINSERT
ExampleDELETE

Drop-down list

None.

Choose the action to apply on the condition.

Available options are:

  • INSERT

  • UPDATE

  • DELETE

With Snowflake, Amazon Redshift, and DLP, you can specify up to three Merge Into statements with one of the above options for each WHEN clause.

With Azure Synapse and Google BigQuery, you can specify up to five Merge Into statements with one of the above options for each WHEN clause.

Insert Expression List

Use this fieldset to specify the column in which you want to insert the data and also the values that you want to insert into them. Each column must be specified in a separate row.

This fieldset contains the following fields:

  • Insert Column

  • Insert Value

Insert Column

 

Default Value: N/A
ExamplePRODUCT_ID

String/Expression/Suggestion

None.

Specify the column in which the value must be inserted. Alternatively, select a column from the suggestion list.

Insert Value

 

Default Value: N/A
Examplesrc.PRODUCT_ID

String/Expression

None.

Specify the value to be inserted for the column. This field can be an SQL expression that references the source table's columns only or none. It cannot reference the target table's columns.

You can also use Pipeline parameters in this field to bind values. However, you must be careful to avoid SQL injection. See Preventing SQL Injection for details.

Update Expression List - When Matched

Use this fieldset to add expressions whose values must be updated when the the Merge Into condition matches the source data. Specify each column to be updated in a separate row.

This fieldset contains the following fields:

  • Update Column

  • Update Value

Update Column

 

Default Value: N/A
ExampleTOTAL_ORDERS

String/Expression/Suggestion

None.

Specify the column in which the value must be updated. Alternatively, select a column name from the suggestion list.

 

Update Value

 

Default Value: N/A
Examplesrc.TOTAL_ORDERS + 1

String/Expression

None.

Specify the new value to be inserted when the Merge Into condition matches the source.
This field can be an SQL expression that may reference columns in the source table, the target table, both source and target, or neither. 
You can also use Pipeline parameters in this field to bind values. However, you must be careful to avoid SQL injection. See Preventing SQL Injection for details.

Update Expression List - When Not Matched By Source (Only for Supported CDWs)

Applicable only for Azure Synapse, DLP, and BigQuery, and is used only when you choose the WHEN NOT MATCHED BY SOURCE as the When clause. 
Use this fieldset to add expressions listing out columns whose values must be updated or deleted if the source rows do not match the target rows in the Merge Into operation. 

This fieldset contains the following fields:

  • Update Column

  • Update Value

Update Column

 

Default Value: N/A
ExampleTOTAL_ORDERS

String/Expression/Suggestion

None.

Specify the column name in which the value must be updated or deleted as specified in the conditions. Alternatively, select a column from the suggestion list.

Update Value

 

Default Value: N/A
Examplesrc.TOTAL_ORDERS + 1

String/Expression

None.

Specify the new value to be inserted when the Merge Into condition matches the source.
This field can be an SQL expression that may reference columns in the source table, the target table, both source and target, or neither. 
You can also use Pipeline parameters in this field to bind values. However, you must be careful to avoid SQL injection. See Preventing SQL Injection for details.

Different Flavors of Merge Into Operation

Merge Into Operation

Snap Settings needed (in addition to the mandatory fields)

Merge Into Operation

Snap Settings needed (in addition to the mandatory fields)

Default merge into operation

(Choose this operation to update and insert all the differential data from the source table to the target table)

  • Select the Enable default settings for merge into operation checkbox.

Use expressions based on source column names to specify the Insert values for the Insert Expression List

  • Select the Enable automatic source table alias prefix checkbox.

  • Select the type of merge into operation as Source and target table column names are identical or Source table column order.

  • Specify the Merge-into Statements.

  • Define the list of insert expressions for individual columns in the Insert Expression List.

Update all columns but insert values only in specific columns

  • Select the Enable update all columns (by target) operation checkbox.

  • Select the type of merge into operation as Source and target table column names are identical or Source table column order.

  • Specify the Merge-into Statements.

  • Define the lists of insert and update expressions for individual columns in the respective Expression Lists. Ensure that you use appropriate table aliases both for source and target columns.

Update all columns by source (Target CDW is Azure Synapse or BigQuery) but insert values only in specific columns

  • Select the Enable update all columns by source operation (Only for ) checkbox.

  • Select the type of merge into operation as Source and target table column names are identical or Source table column order.

  • Specify the Merge-into Statements.

  • Provide the values for each of these columns where the source is not updated (WHEN NOT MATCHED BY SOURCE) in the Insert Expression List fieldset.

  • Define the list of update expressions for individual columns in the Update Expression List - When Matched fieldset. Ensure that you use appropriate table aliases both for source and target columns.

Insert all columns but update values only in specific columns

  • Select the Enable insert all columns operation checkbox.

  • Select the type of merge into operation as Source and target table column names are identical or Source table column order.

  • Specify the Merge-into Statements.

  • Define the lists of update expressions (when matched and when not matched by source) for individual columns in the respective Update Expression Lists. Ensure that you use appropriate table aliases both for source and target columns.

To update the target table without specifying any column/field-wise insert or update expressions.

  • Select the last four checkboxes (all except the Enable default settings for merge into operation checkbox).

  • Select the type of merge into operation as Source and target table column names are identical or Source table column order.

  • Specify the Merge-into Statements.

Preventing SQL Injection

You can pass Pipeline parameters as values in an SQL expression; however, if you do not properly phrase the expression it can lead to the parameter's name being bound as a value in the database. This potentially incorrect information being inserted into the database is known as SQL injection. It is thus necessary to take precautions when including Pipeline parameters in your SQL expression to prevent SQL injection. Based upon the intended use of the Pipeline parameter, use one or both of the following methods to prevent accidental SQL injection:

Method 1: Simple Substitutions

You can reference the Pipeline parameter directly with a JSON-path without enabling expressions.

For example, if you want to use the Pipeline parameter, name, which contains the value of a column in the ON Condition, ConditionInsert Value, or Update Value fields: 

ON Condition or Condition field: colname = _name Insert Value or Update Value field: _name

Method 2: Dynamic Substitutions

You must enable expressions when using Pipeline parameters for dynamic substitutions. Format the SQL expression, except the Pipeline parameter's reference, as a string. 

For example, if you want to use the Pipeline parameter, name, which contains the value of a column in the ON Condition, ConditionInsert Value, or Update Value fields: 

ON Condition or Condition field: _columnname + “= _name” Insert Value or Update Value field: “_name”

The Snap evaluates the expression and also carries out path substitutions.

Here is how it works

The Snap pre-processes the query to extract any JSON-Paths and converts them to bound parameters. For example, consider the following query:

_columnname + “= _name”

The Snap converts this query into the following before turning it into a prepared statement for the database:

The Snap evaluates the JSON-Path to get the value to bind the Pipeline parameter in the prepared statement. 

Pipeline Execution Statistics

As a Pipeline executes, the Snap shows the following statistics updating periodically. You can monitor the progress of the Pipeline as each Snap performs executes.

  • Records Added

  • Records Updated

  • Records Deleted

You can view more information when clicking the Download Query Details link.

Troubleshooting

Example

Retrieving and Transforming Records and Merging Them Into A Table

In this example, we want to retrieve records from two tables. First, we transform the retrieved records using conditions and use a JOIN clause to join the queries from the two tables. We then perform transformations on the records and apply aggregate function. Finally, we merge the records by applying conditions and actions. This example demonstrates how we can accomplish this task.

First, we use two ELT Select Snaps to build SELECT queries to retrieve all records from the tables (ORDERS_34R_2KB and CUSTOMER_50R_3KB). The configuration and query output is as shown below:

ELT Select Snaps

Output

ELT Select Snaps

Output

The tables have several columns. But, we want to retrieve only the CUST_CODE and ORD_AMOUNT columns. Therefore, we configure the ELT Transform Snap with the following settings, which builds the query as shown below:

ELT Transform

Output

ELT Transform

Output

The SELECT queries in the ELT Select and ELT Transform Snaps form the inputs for the ELT Join Snap. Now, we apply an inner join based on matching values of the CUST_CODE column in the tables. Hence, we configure the ELT Join Snap and validate. Upon validation, the ELT Join Snap combines both the incoming SELECT queries and adds the JOIN clause. The output preview is as shown below:

ELT Join

Output

ELT Join

Output

We use the ELT Transform Snap to retrieve only the CUST_NAME and ORD_AMOUNT columns after joining the queries. We configure the Snap with the following setting which builds the query as shown below:

ELT Transform Snap

Output

ELT Transform Snap

Output

Subsequently, we use an ELT Aggregate Snap to apply an aggregate function on the values in the ORD_AMOUNT column. Here, we want to calculate the SUM for the ORD_AMOUNT column.


The Snap builds the query as shown below:


Finally, we use the ELT Merge Into Snap to merge the records into the Target table MERGE_INTO_OUT_01 based on the ON Condition, apply Merge-into statements, update actions, and update expressions.
The ON condition checks for the customer's name - if the customer's name matches, it applies the Merge-into statements and applies conditions as configured below. In this case, if the customer's name matches and if the condition (SUM_COLUMN<5000) is met, then the Snap updates the record with the customer's name. If the customer's name does not match and if the condition (SrcAlias.SUM_COLUMN<5000) is met, then it inserts the customer's name into the record.

Upon successful execution, the Pipeline adds and updates customer records in the Target database table in the configured database.

Download this Pipeline.

Downloads