ELT Merge Into
Overview
You can use this Snap to perform the MERGE INTO operation from Amazon S3 buckets or Azure Cloud Storage containers to a Snowflake, Azure Synapse database, or a Databricks Lakehouse Platform (DLP) instance. You can also use this Snap to merge data from:
Amazon S3 buckets in to the Amazon Redshift database.
Amazon Redshift or Google Cloud Storage in to BigQuery tables.
A DBFS folder in to a DLP table.
Refer to ELT Source-Target-Hosting Support Matrix for complete information.
The MERGE INTO operation updates, inserts, and deletes rows in/from the target table based on values in a source table. You can set match conditions in the Snap based on which the operation is carried out. The Snap creates a table and inserts the data if the target table does not exist.
After successfully validating the Pipeline with this Snap, you can use this Snap's output view to verify whether the data would be inserted correctly in the target table.
Snap Type
The ELT Merge Into Snap is a write-type Snap that writes data from the source file to matching rows in an existing table in the target CDW.
Prerequisites
Valid accounts and access permissions to connect to one source and one target in the following source and target locations (Refer to the ELT Source-Target-Hosting Support Matrix for the supported source-target combinations):
Source: Amazon S3, Azure Cloud Storage, DBFS location, or Google Cloud Storage
Target: Snowflake, Redshift, Azure Synapse, DLP, or BigQuery
Appropriate database permissions to read, insert, update, or delete records.
Appropriate database permissions to create a table.
Appropriate read permissions for the source table that you want to use in the MERGE INTO operation.
Limitations
Though Snowflake allows any number of merge (insert, update, and delete) actions in any combination, you can configure a maximum of three Merge-into Statements in this Snap. The ELT Merge Into Snap returns an error when you specify more than 3 actions, more than 2 WHEN MATCHED actions, or more than 1 WHEN NOT MATCHED action.
Known Issues
Snap Views
Type | Format | Number of Views | Examples of Upstream and Downstream Snaps | Description |
---|---|---|---|---|
Input | Document |
|
| The source table you want to compare with the target table before updating, inserting, or deleting the values in the target table. |
Output | Document |
|
| A document containing the SQL query executed on the target database and the rows data inserted in the target table (after running the Pipeline successfully). |
Error | Error handling is a generic way to handle errors without losing data or failing the Snap execution. You can handle the errors that the Snap might encounter when running the Pipeline by choosing one of the following options from the When errors occur list under the Views tab:
Learn more about Error handling in Pipelines. |
Snap Settings
Asterisk ( * ): Indicates a mandatory field.
Suggestion icon (): Indicates a list that is dynamically populated based on the configuration.
Expression icon ( ): Indicates the value is an expression (if enabled) or a static value (if disabled). Learn more about SQL Expressions and Functions supported for ELT. This list is common to all target CDWs supported. You can also use other expressions/functions that your target CDW supports.
Add icon ( ): Indicates that you can add fields in the fieldset.
Remove icon ( ): Indicates that you can remove fields from the fieldset.
Upload icon ( ): Indicates that you can upload files.
Field Name | Field Type | Field Dependency | Description | |
---|---|---|---|---|
Label*
Default Value: ELT Merge Into | String | None. | Specify a name for the Snap. You can modify this to be more specific, especially if you have more than one of the same Snap in your Pipeline.
| |
Get preview data
Default Value: Not selected | Checkbox | None. | Select this checkbox to include a preview of the query's output. The Snap performs limited execution and generates a data preview during Pipeline validation. In case of ELT Pipelines, only the SQL query flows through the Snaps but not the actual source data. Hence, the preview data for a Snap is the result of executing the SQL query that the Snap has generated in the Pipeline. The number of records displayed in the preview (upon validation) is the smaller of the following:
Rendering Complex Data Types in DLP Based on the data types of the fields in the input schema, the Snap renders the complex data types like map and struct as object data type and array as an array data type. It renders all other incoming data types as-is except for the values in binary fields are displayed as a base64 encoded string and as string data type. | |
Database Name
Default Value: N/A | String/Expression/Suggestion | None. | Specify the name of the database in which the target table exists. Leave this blank if you want to use the database name specified in the Default Database Name field in the account settings. | |
Schema Name*
Default Value: N/A | String/Expression/Suggestion | None. | Specify the name of the database schema. In case it is not defined, then the suggestions for the schema name retrieves all schema names in the specified database when you click the Suggestion icon. | |
Target Table Name*
Default Value: N/A | String/Expression/Suggestion | None. | Specify the name of the table or view in which you want to perform the MERGE INTO operation. | |
Target Table Hash Distribution Column (Azure Synapse Only)
Default Value: N/A | String/Expression | None. | Specify the Hash distribution column name for the target table in Azure Synapse, if you configure INSERT action in the Snap.
| |
Target Table Alias
Default Value: N/A | String/Expression | None. | Specify the alias name for the target table. | |
Input Source Alias
Default Value: N/A | String/Expression | None. | Enter the alias name that you want to use for the source table. | |
ON Condition*
Default Value: N/A | String/Expression | None. | The condition based on which you want to check the values in the target table. You can also use Pipeline parameters in this field to bind values. However, you must be careful to avoid SQL injection. See Preventing SQL Injection for details. | |
Enable default settings for merge into operation
Default Value: Not selected | Checkbox | None. | Select this checkbox to run the merge into operation with the default settings and assumptions (the same set of column names are used in the source and target tables in the same order) in the following scenarios.
Selecting this checkbox disables all the fields below except Merge-into Statements fieldset. Proceed to define your Merge-into Statements. | |
Enable automatic source table alias prefix
Default Value: Not selected | Checkbox | None. | Select this checkbox to prefix the Input Source Alias value to all the source column names inside the expressions defined (without an alias) in the Insert and Update Expression List fieldsets below. For example: Proceed to define your Merge-into Statements and the appropriate expression lists. | |
Enable update all columns operation
Default Value: Not selected | Checkbox | None. | Select this checkbox to perform a WHEN MATCHED update operation on all columns in the target table. You can choose the update operation type (by source table column name (default) or by the order of source table columns) for this update.
Proceed to define your Merge-into Statements and the appropriate expression lists. | |
Enable update all columns by source operation (Only for supported CDWs)
Default Value: Not selected | Checkbox | Target CDW is Azure Synapse, DLP, or Google BigQuery.
| Select this checkbox to perform a WHEN NOT MATCHED BY SOURCE update operation on all columns in the target table. You can choose the update operation type (by source table column name (default) or by the order of source table columns) for this update.
| |
Enable insert all columns operation
Default Value: Not selected | Checkbox | None. | Select this checkbox to perform a WHEN NOT MATCHED insert operation on all columns in the target table. You can choose the insert operation type (by source table column name (default) or by the order of source table columns) for this operation.
| |
Merge Into All Columns Operations Type
Default Value: Source and target table column names are identical | Drop-down list | Enable update all columns operation checkbox, Enable update all columns by source operation (Only for supported CDWs) checkbox, or Enable insert all columns operation checkbox is selected. | Select one of the following scenarios associated with the selected operation type for the Merge-Into operation.
| |
Merge-into Statements | Use this fieldset to specify the conditions that activate the MERGE INTO operation and the additional conditions that must be met. Specify each condition in a separate row. This fieldset contains the following fields:
| |||
When Clause Default Value: N/A | String/Expression/Suggestion |
| Specify the matching condition based on the outcome of the ON Condition. Alternatively, select a clause from the suggestion list. Available options are:
| |
Condition Default Value: N/A | String/Expression | None. | Specify the additional criteria that must be specified. Having this additional condition allows the Snap to identify whether the UPDATE or DELETE action must be performed (since both the actions correspond to the WHEN MATCHED clause). You can also use Pipeline parameters in this field to bind values. However, you must be careful to avoid SQL injection. See Preventing SQL Injection for details. | |
Action
Default Value: INSERT | Drop-down list | None. | Choose the action to apply on the condition. Available options are:
With Snowflake, Amazon Redshift, and DLP, you can specify up to three Merge Into statements with one of the above options for each WHEN clause. With Azure Synapse and Google BigQuery, you can specify up to five Merge Into statements with one of the above options for each WHEN clause. | |
Insert Expression List | Use this fieldset to specify the column in which you want to insert the data and also the values that you want to insert into them. Each column must be specified in a separate row. This fieldset contains the following fields:
| |||
Insert Column
Default Value: N/A | String/Expression/Suggestion | None. | Specify the column in which the value must be inserted. Alternatively, select a column from the suggestion list. | |
Insert Value
Default Value: N/A | String/Expression | None. | Specify the value to be inserted for the column. This field can be an SQL expression that references the source table's columns only or none. It cannot reference the target table's columns. You can also use Pipeline parameters in this field to bind values. However, you must be careful to avoid SQL injection. See Preventing SQL Injection for details. | |
Update Expression List - When Matched | Use this fieldset to add expressions whose values must be updated when the the Merge Into condition matches the source data. Specify each column to be updated in a separate row. This fieldset contains the following fields:
| |||
Update Column
Default Value: N/A | String/Expression/Suggestion | None. | Specify the column in which the value must be updated. Alternatively, select a column name from the suggestion list.
| |
Update Value
Default Value: N/A | String/Expression | None. | Specify the new value to be inserted when the Merge Into condition matches the source. | |
Update Expression List - When Not Matched By Source (Only for Supported CDWs) | Applicable only for Azure Synapse, DLP, and BigQuery, and is used only when you choose the WHEN NOT MATCHED BY SOURCE as the When clause. This fieldset contains the following fields:
| |||
Update Column
Default Value: N/A | String/Expression/Suggestion | None. | Specify the column name in which the value must be updated or deleted as specified in the conditions. Alternatively, select a column from the suggestion list. | |
Update Value
Default Value: N/A | String/Expression | None. | Specify the new value to be inserted when the Merge Into condition matches the source. |
Different Flavors of Merge Into Operation
Merge Into Operation | Snap Settings needed (in addition to the mandatory fields) |
---|---|
Default merge into operation (Choose this operation to update and insert all the differential data from the source table to the target table) |
|
Use expressions based on source column names to specify the Insert values for the Insert Expression List |
|
Update all columns but insert values only in specific columns |
|
Update all columns by source (Target CDW is Azure Synapse or BigQuery) but insert values only in specific columns |
|
Insert all columns but update values only in specific columns |
|
To update the target table without specifying any column/field-wise insert or update expressions. |
|
Preventing SQL Injection
You can pass Pipeline parameters as values in an SQL expression; however, if you do not properly phrase the expression it can lead to the parameter's name being bound as a value in the database. This potentially incorrect information being inserted into the database is known as SQL injection. It is thus necessary to take precautions when including Pipeline parameters in your SQL expression to prevent SQL injection. Based upon the intended use of the Pipeline parameter, use one or both of the following methods to prevent accidental SQL injection:
Method 1: Simple Substitutions
You can reference the Pipeline parameter directly with a JSON-path without enabling expressions.
For example, if you want to use the Pipeline parameter, name, which contains the value of a column in the ON Condition, Condition, Insert Value, or Update Value fields:
ON Condition or Condition field:
colname = _name
Insert Value or Update Value field:
_name
Method 2: Dynamic Substitutions
You must enable expressions when using Pipeline parameters for dynamic substitutions. Format the SQL expression, except the Pipeline parameter's reference, as a string.
For example, if you want to use the Pipeline parameter, name, which contains the value of a column in the ON Condition, Condition, Insert Value, or Update Value fields:
ON Condition or Condition field:
_columnname + “= _name”
Insert Value or Update Value field:
“_name”
The Snap evaluates the expression and also carries out path substitutions.
Here is how it works
The Snap pre-processes the query to extract any JSON-Paths and converts them to bound parameters. For example, consider the following query:
_columnname + “= _name”
The Snap converts this query into the following before turning it into a prepared statement for the database:
The Snap evaluates the JSON-Path to get the value to bind the Pipeline parameter in the prepared statement.
Pipeline Execution Statistics
As a Pipeline executes, the Snap shows the following statistics updating periodically. You can monitor the progress of the Pipeline as each Snap performs executes.
Records Added
Records Updated
Records Deleted
You can view more information when clicking the Download Query Details link.
Troubleshooting
Example
Retrieving and Transforming Records and Merging Them Into A Table
In this example, we want to retrieve records from two tables. First, we transform the retrieved records using conditions and use a JOIN clause to join the queries from the two tables. We then perform transformations on the records and apply aggregate function. Finally, we merge the records by applying conditions and actions. This example demonstrates how we can accomplish this task.
First, we use two ELT Select Snaps to build SELECT queries to retrieve all records from the tables (ORDERS_34R_2KB and CUSTOMER_50R_3KB). The configuration and query output is as shown below:
ELT Select Snaps | Output |
---|---|
The tables have several columns. But, we want to retrieve only the CUST_CODE and ORD_AMOUNT columns. Therefore, we configure the ELT Transform Snap with the following settings, which builds the query as shown below:
ELT Transform | Output |
---|---|
The SELECT queries in the ELT Select and ELT Transform Snaps form the inputs for the ELT Join Snap. Now, we apply an inner join based on matching values of the CUST_CODE column in the tables. Hence, we configure the ELT Join Snap and validate. Upon validation, the ELT Join Snap combines both the incoming SELECT queries and adds the JOIN clause. The output preview is as shown below:
ELT Join | Output |
---|---|
We use the ELT Transform Snap to retrieve only the CUST_NAME and ORD_AMOUNT columns after joining the queries. We configure the Snap with the following setting which builds the query as shown below:
ELT Transform Snap | Output |
---|---|
Subsequently, we use an ELT Aggregate Snap to apply an aggregate function on the values in the ORD_AMOUNT column. Here, we want to calculate the SUM for the ORD_AMOUNT column.
The Snap builds the query as shown below:
Finally, we use the ELT Merge Into Snap to merge the records into the Target table MERGE_INTO_OUT_01 based on the ON Condition, apply Merge-into statements, update actions, and update expressions.
The ON condition checks for the customer's name - if the customer's name matches, it applies the Merge-into statements and applies conditions as configured below. In this case, if the customer's name matches and if the condition (SUM_COLUMN<5000) is met, then the Snap updates the record with the customer's name. If the customer's name does not match and if the condition (SrcAlias.SUM_COLUMN<5000) is met, then it inserts the customer's name into the record.
Upon successful execution, the Pipeline adds and updates customer records in the Target database table in the configured database.