ELT Merge Into

Overview

You can use this Snap to perform the MERGE INTO operation from Amazon S3 buckets or Azure Cloud Storage containers into a Snowflake database, an Azure Synapse database, or a Databricks Lakehouse Platform (DLP) instance. You can also use this Snap to merge data from:

  • Amazon S3 buckets into the Amazon Redshift database.

  • Amazon Redshift or Google Cloud Storage into BigQuery tables.

  • A DBFS folder into a DLP table.

Refer to ELT Source-Target-Hosting Support Matrix for complete information. 

The MERGE INTO operation updates, inserts, and deletes rows in the target table based on values in a source table. You can set match conditions in the Snap that determine which operation is carried out. If the target table does not exist, the Snap creates the table and inserts the data.
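
For reference, a statement of the kind this Snap generates typically looks like the following sketch. The table, alias, and column names are illustrative (they reuse example values from the Snap Settings below), and src_query stands for the query produced by the upstream ELT Snap; the actual statement depends on your Snap and account settings.

    MERGE INTO SALES_ORDERS tgt
    USING src_query src                               -- src_query: hypothetical placeholder for the upstream query
    ON src.PROD_ID = tgt.PROD_ID                      -- ON Condition
    WHEN MATCHED THEN
      UPDATE SET TOTAL_ORDERS = src.TOTAL_ORDERS      -- update existing target rows
    WHEN NOT MATCHED THEN
      INSERT (PROD_ID, TOTAL_ORDERS)
      VALUES (src.PROD_ID, src.TOTAL_ORDERS);         -- insert new rows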

After successfully validating the Pipeline with this Snap, you can use this Snap's output view to verify whether the data would be inserted correctly in the target table.

Snap Type

The ELT Merge Into Snap is a write-type Snap that writes data from the source file to matching rows in an existing table in the target CDW.

Prerequisites

  • Valid accounts and access permissions to connect to one source and one target in the following source and target locations (Refer to the ELT Source-Target-Hosting Support Matrix for the supported source-target combinations):

    • Source: Amazon S3, Azure Cloud Storage, DBFS location, or Google Cloud Storage

    • Target: Snowflake, Redshift, Azure Synapse, DLP, or BigQuery

  • Appropriate database permissions to read, insert, update, or delete records.

  • Appropriate database permissions to create a table.

  • Appropriate read permissions for the source table that you want to use in the MERGE INTO operation. 

Limitations

  • Though Snowflake allows any number of merge (insert, update, and delete) actions in any combination, you can configure a maximum of three Merge-into Statements in this Snap. The ELT Merge Into Snap returns an error when you specify more than three actions, more than two WHEN MATCHED actions, or more than one WHEN NOT MATCHED action.

Known Issues

Snap Views

Type

Format

Number of Views

Examples of Upstream and Downstream Snaps

Description

Input 

Document

  • Min: 1

  • Max: 1

  • ELT Transform

  • ELT Select

  • ELT Copy

The source table you want to compare with the target table before updating, inserting, or deleting the values in the target table. 

Output

Document

  • Min: 0

  • Max: 1

  • ELT Transform

  • ELT Select

A document containing the SQL query executed on the target database and the row data inserted into the target table (after the Pipeline runs successfully). 

Error

Error handling is a generic way to handle errors without losing data or failing the Snap execution. You can handle the errors that the Snap might encounter when running the Pipeline by choosing one of the following options from the When errors occur list under the Views tab:

  • Stop Pipeline Execution: Stops the current Pipeline execution if the Snap encounters an error.

  • Discard Error Data and Continue: Ignores the error, discards that record, and continues with the remaining records.

  • Route Error Data to Error View: Routes the error data to an error view without stopping the Snap execution.

Learn more about Error handling in Pipelines.

Snap Settings

  • Asterisk (*): Indicates a mandatory field.

  • Suggestion icon: Indicates a list that is dynamically populated based on the configuration.

  • Expression icon: Indicates the value is an expression (if enabled) or a static value (if disabled). Learn more about SQL Expressions and Functions supported for ELT. This list is common to all supported target CDWs. You can also use other expressions/functions that your target CDW supports.

  • Add icon: Indicates that you can add fields in the fieldset.

  • Remove icon: Indicates that you can remove fields from the fieldset.

  • Upload icon: Indicates that you can upload files.

Field Name

Field Type

Field Dependency

Description

Label*

 

Default Value: ELT Merge Into
Example: Merge Sales Data

String

None.

Specify a name for the Snap. You can modify this to be more specific, especially if you have more than one of the same Snap in your Pipeline.

 

Get preview data

 

Default Value: Not selected
Example: Selected

Checkbox

None.

Select this checkbox to include a preview of the query's output. The Snap performs limited execution and generates a data preview during Pipeline validation.

In ELT Pipelines, only the SQL query flows through the Snaps, not the actual source data. Hence, the preview data for a Snap is the result of executing the SQL query that the Snap has generated in the Pipeline.

The number of records displayed in the preview (upon validation) is the smaller of the following:

  • Number of records available upon execution of the SQL query generated by the Snap.

  • The value set in the Preview Data Count setting (default is 50 records).

Rendering Complex Data Types in DLP

Based on the data types of the fields in the input schema, the Snap renders complex data types such as map and struct as the object data type and array as the array data type. It renders all other incoming data types as-is, except binary fields, whose values are displayed as base64-encoded strings with the string data type.

Database Name

 

Default Value: N/A
Example: TESTDB

String/Expression/Suggestion

None.

Specify the name of the database in which the target table exists. Leave this blank if you want to use the database name specified in the Default Database Name field in the account settings.

Schema Name*

 

Default Value: N/A
Example: SALES

String/Expression/Suggestion

None.

Specify the name of the database schema. If it is not defined, clicking the Suggestion icon retrieves all the schema names in the specified database.

Target Table Name*

 

Default Value: N/A
Example: SALES_ORDERS

String/Expression/Suggestion

None.

Specify the name of the table or view in which you want to perform the MERGE INTO operation. 

Target Table Hash Distribution Column (Azure Synapse Only)

 

Default Value: N/A
Example: var_table

String/Expression

None.

Specify the hash distribution column name for the target table in Azure Synapse, if you configure an INSERT action in the Snap.

 

Target Table Alias

 

Default Value: N/A
Example: tgt

String/Expression

None.

Specify the alias name for the target table.
You can use this alias in MERGE INTO statements instead of the lengthy target table name.

Input Source Alias

 

Default Value: N/A
Example: src

String/Expression

None.

Enter the alias name that you want to use for the source table.

ON Condition*

 

Default Value: N/A
Example: src.PROD_ID=tgt.PROD_ID

String/Expression

None.

Specify the condition based on which the Snap matches the source rows against the target table. You can also use Pipeline parameters in this field to bind values. However, you must be careful to avoid SQL injection. See Preventing SQL Injection for details.

Enable default settings for merge into operation

 

Default Value: Not selected
Example: Selected

Checkbox

None.

Select this checkbox to run the merge into operation with the default settings and assumptions (the same set of column names is used in the source and target tables, in the same order) in the following scenarios.

  • For an Update operation: To update all columns of the target table with the corresponding source table column values.

    • Extra columns in the target table, if any, undergo no changes.

  • For an Insert operation: To insert all the corresponding source column values into the target table.

    • Extra columns in the target table, if any, are populated with nulls.

    • Extra columns in the source table, if any, result in failure with the appropriate error message.

Selecting this checkbox disables all the fields below except the Merge-into Statements fieldset. Proceed to define your Merge-into Statements.
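
For example, assuming the source and target both contain only the columns PROD_ID and TOTAL_ORDERS (hypothetical names) and this checkbox is selected, the generated statement is conceptually equivalent to the following sketch (src_query stands for the query produced by the upstream Snap):

    MERGE INTO SALES_ORDERS tgt
    USING src_query src
    ON src.PROD_ID = tgt.PROD_ID
    WHEN MATCHED THEN
      UPDATE SET PROD_ID = src.PROD_ID,               -- every target column is updated from the
                 TOTAL_ORDERS = src.TOTAL_ORDERS      -- corresponding source column
    WHEN NOT MATCHED THEN
      INSERT (PROD_ID, TOTAL_ORDERS)                  -- every source column value is inserted; any extra
      VALUES (src.PROD_ID, src.TOTAL_ORDERS);         -- target columns would be populated with nulls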

Enable automatic source table alias prefix

 

Default Value: Not selected
Example: Selected

Checkbox

None.

Select this checkbox to prefix the Input Source Alias value to all the source column names inside the expressions defined (without an alias) in the Insert and Update Expression List fieldsets below.

For example: colA+colB-colC in the Insert value field is replaced with src.colA+src.colB-src.colC when the Snap prepares the final MERGE INTO SQL statement.

Proceed to define your Merge-into Statements and the appropriate expression lists.
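
In the generated statement, the rewritten expression then appears in the corresponding clause, for example (column names are hypothetical):

    WHEN NOT MATCHED THEN
      INSERT (colD)
      VALUES (src.colA + src.colB - src.colC)   -- the Input Source Alias prefix is added to each unaliased source column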

Enable update all columns operation

 

Default Value: Not selected
Example: Selected

Checkbox

None.

Select this checkbox to perform a WHEN MATCHED update operation on all columns in the target table. You can choose the update operation type (by source table column name (default) or by the order of source table columns) for this update.

  • Selecting this checkbox disables the Update Expression List - When Matched fieldset and enables the Merge Into All Columns Operations Type field.

Proceed to define your Merge-into Statements and the appropriate expression lists.

Enable update all columns by source operation (Only for supported CDWs)

 

Default Value: Not selected
Example: Selected

Checkbox

Target CDW is Azure Synapse, DLP, or Google BigQuery.

 

Select this checkbox to perform a WHEN NOT MATCHED BY SOURCE update operation on all columns in the target table. You can choose the update operation type (by source table column name (default) or by the order of source table columns) for this update.

  • Selecting this checkbox disables the Update Expression List - When Not Matched by Source (Only for Supported CDWs) fieldset and enables the Merge Into All Columns Operations Type field.

Enable insert all columns operation

 

Default Value: Not selected
Example: Selected

Checkbox

None.

Select this checkbox to perform a WHEN NOT MATCHED insert operation on all columns in the target table. You can choose the insert operation type (by source table column name (default) or by the order of source table columns) for this operation.

  • Selecting this checkbox disables the Insert Expression List fieldset and enables the Merge Into All Columns Operations Type field.

Merge Into All Columns Operations Type

 

Default Value: Source and target table column names are identical
Example: Source table column order

Drop-down list

Enable update all columns operation checkbox, Enable update all columns by source operation (Only for supported CDWs) checkbox, or Enable insert all columns operation checkbox is selected.

Select one of the following scenarios associated with the selected operation type for the Merge-Into operation.

  • Source and target table column names are identical: Select this option when the source column names exactly match the target table column names (even if they are not in the same order).

  • Source table column order: Select this option when the order of your source columns does not match the order of the target table columns; the Merge Into operation then follows the order of the source table columns while performing the INSERT and UPDATE operations defined in the Merge-into Statements fieldset.

Merge-into Statements

Use this fieldset to specify the conditions that activate the MERGE INTO operation and the additional conditions that must be met. Specify each condition in a separate row.

This fieldset contains the following fields:

  • When Clause

  • Condition

  • Action

When Clause

Default Value: N/A
Example: WHEN MATCHED

String/Expression/Suggestion

 

Specify the matching condition based on the outcome of the ON Condition. Alternatively, select a clause from the suggestion list.

Available options are:

  • WHEN MATCHED: Applies the specified condition and action when the source data matches with the target.

  • WHEN NOT MATCHED: Applies the specified condition and action when the source data does not match with the target.

  • WHEN NOT MATCHED BY SOURCE: (Available only for Azure Synapse, DLP, and BigQuery.) Applies the specified condition and action when the target rows do not match the source rows.
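
For instance, on a target that supports WHEN NOT MATCHED BY SOURCE (Azure Synapse, DLP, or BigQuery), a statement combining all three clauses could look like this sketch; the table and column names reuse the example values on this page, and src_query stands for the query generated by the upstream Snap:

    MERGE INTO SALES_ORDERS tgt
    USING src_query src
    ON src.PROD_ID = tgt.PROD_ID
    WHEN MATCHED THEN
      UPDATE SET TOTAL_ORDERS = src.TOTAL_ORDERS      -- source row matches a target row
    WHEN NOT MATCHED THEN
      INSERT (PROD_ID, TOTAL_ORDERS)
      VALUES (src.PROD_ID, src.TOTAL_ORDERS)          -- source row has no matching target row
    WHEN NOT MATCHED BY SOURCE THEN
      DELETE;                                         -- target row has no matching source row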

Condition

Default Value: N/A
Example: TOTAL_ORDERS > 0

String/Expression

None.

Specify the additional criteria that must be met for the associated action to be performed.
If the criteria are not met, the action associated with the specified condition is not performed. The condition can reference columns from both the source and target tables, the source table only, the target table only, or no table at all.

Having this additional condition allows the Snap to identify whether the UPDATE or DELETE action must be performed (since both actions correspond to the WHEN MATCHED clause).

You can also use Pipeline parameters in this field to bind values. However, you must be careful to avoid SQL injection. See Preventing SQL Injection for details.
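
For example, two WHEN MATCHED rows with different conditions let the Snap decide between an UPDATE and a DELETE, as in the following sketch (it reuses the example values on this page; src_query stands for the query generated by the upstream Snap):

    MERGE INTO SALES_ORDERS tgt
    USING src_query src
    ON src.PROD_ID = tgt.PROD_ID
    WHEN MATCHED AND src.TOTAL_ORDERS > 0 THEN
      UPDATE SET TOTAL_ORDERS = src.TOTAL_ORDERS      -- first condition met: update the matched row
    WHEN MATCHED AND src.TOTAL_ORDERS = 0 THEN
      DELETE;                                         -- second condition met: delete the matched row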

Action

 

Default Value: INSERT
Example: DELETE

Drop-down list

None.

Choose the action to apply when the condition is met.

Available options are:

  • INSERT

  • UPDATE

  • DELETE

With Snowflake, Amazon Redshift, and DLP, you can specify up to three Merge Into statements with one of the above options for each WHEN clause.

With Azure Synapse and Google BigQuery, you can specify up to five Merge Into statements with one of the above options for each WHEN clause.

Insert Expression List

Use this fieldset to specify the columns into which you want to insert data and the values to insert into them. Specify each column in a separate row.

This fieldset contains the following fields:

  • Insert Column

  • Insert Value

Insert Column

 

Default Value: N/A
Example: PRODUCT_ID

String/Expression/Suggestion

None.

Specify the column in which the value must be inserted. Alternatively, select a column from the suggestion list.

Insert Value

 

Default Value: N/A
Example: src.PRODUCT_ID

String/Expression

None.

Specify the value to insert into the column. This field can be an SQL expression that references only the source table's columns, or no columns at all. It cannot reference the target table's columns.

You can also use Pipeline parameters in this field to bind values. However, you must be careful to avoid SQL injection. See Preventing SQL Injection for details.
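
For instance, two rows in this fieldset with Insert Column values PRODUCT_ID and PRODUCT_NAME and Insert Value values src.PRODUCT_ID and src.PRODUCT_NAME (hypothetical names) would contribute an insert clause of the following form to the generated statement:

    WHEN NOT MATCHED THEN
      INSERT (PRODUCT_ID, PRODUCT_NAME)
      VALUES (src.PRODUCT_ID, src.PRODUCT_NAME)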