ELT Merge Into
Overview
You can use this Snap to perform the MERGE INTO operation from Amazon S3 buckets or Azure Cloud Storage containers to a Snowflake, Azure Synapse database, or a Databricks Lakehouse Platform (DLP) instance. You can also use this Snap to merge data from:
Amazon S3 buckets in to the Amazon Redshift database.
Amazon Redshift or Google Cloud Storage in to BigQuery tables.
A DBFS folder in to a DLP table.
Refer to ELT Source-Target-Hosting Support Matrix for complete information.
The MERGE INTO operation updates, inserts, and deletes rows in/from the target table based on values in a source table. You can set match conditions in the Snap based on which the operation is carried out. The Snap creates a table and inserts the data if the target table does not exist.
After successfully validating the Pipeline with this Snap, you can use this Snap's output view to verify whether the data would be inserted correctly in the target table.
Snap Type
The ELT Merge Into Snap is a write-type Snap that writes data from the source file to matching rows in an existing table in the target CDW.
Prerequisites
Valid accounts and access permissions to connect to one source and one target in the following source and target locations (Refer to the ELT Source-Target-Hosting Support Matrix for the supported source-target combinations):
Source: Amazon S3, Azure Cloud Storage, DBFS location, or Google Cloud Storage
Target: Snowflake, Redshift, Azure Synapse, DLP, or BigQuery
Appropriate database permissions to read, insert, update, or delete records.
Appropriate database permissions to create a table.
Appropriate read permissions for the source table that you want to use in the MERGE INTO operation.
Limitations
Though Snowflake allows any number of merge (insert, update, and delete) actions in any combination, you can configure a maximum of three Merge-into Statements in this Snap. The ELT Merge Into Snap returns an error when you specify more than 3 actions, more than 2 WHEN MATCHED actions, or more than 1 WHEN NOT MATCHED action.
Known Issues
Snap Views
Type | Format | Number of Views | Examples of Upstream and Downstream Snaps | Description |
---|---|---|---|---|
Input | Document |
|
| The source table you want to compare with the target table before updating, inserting, or deleting the values in the target table. |
Output | Document |
|
| A document containing the SQL query executed on the target database and the rows data inserted in the target table (after running the Pipeline successfully). |
Error | Error handling is a generic way to handle errors without losing data or failing the Snap execution. You can handle the errors that the Snap might encounter when running the Pipeline by choosing one of the following options from the When errors occur list under the Views tab:
Learn more about Error handling in Pipelines. |
Snap Settings
Asterisk ( * ): Indicates a mandatory field.
Suggestion icon (): Indicates a list that is dynamically populated based on the configuration.
Expression icon ( ): Indicates the value is an expression (if enabled) or a static value (if disabled). Learn more about SQL Expressions and Functions supported for ELT. This list is common to all target CDWs supported. You can also use other expressions/functions that your target CDW supports.
Add icon ( ): Indicates that you can add fields in the fieldset.
Remove icon ( ): Indicates that you can remove fields from the fieldset.
Upload icon ( ): Indicates that you can upload files.
Field Name | Field Type | Field Dependency | Description | |
---|---|---|---|---|
Label*
Default Value: ELT Merge Into | String | None. | Specify a name for the Snap. You can modify this to be more specific, especially if you have more than one of the same Snap in your Pipeline.
| |
Get preview data
Default Value: Not selected | Checkbox | None. | Select this checkbox to include a preview of the query's output. The Snap performs limited execution and generates a data preview during Pipeline validation. In case of ELT Pipelines, only the SQL query flows through the Snaps but not the actual source data. Hence, the preview data for a Snap is the result of executing the SQL query that the Snap has generated in the Pipeline. The number of records displayed in the preview (upon validation) is the smaller of the following:
Rendering Complex Data Types in DLP Based on the data types of the fields in the input schema, the Snap renders the complex data types like map and struct as object data type and array as an array data type. It renders all other incoming data types as-is except for the values in binary fields are displayed as a base64 encoded string and as string data type. | |
Database Name
Default Value: N/A | String/Expression/Suggestion | None. | Specify the name of the database in which the target table exists. Leave this blank if you want to use the database name specified in the Default Database Name field in the account settings. | |
Schema Name*
Default Value: N/A | String/Expression/Suggestion | None. | Specify the name of the database schema. In case it is not defined, then the suggestions for the schema name retrieves all schema names in the specified database when you click the Suggestion icon. | |
Target Table Name*
Default Value: N/A | String/Expression/Suggestion | None. | Specify the name of the table or view in which you want to perform the MERGE INTO operation. | |
Target Table Hash Distribution Column (Azure Synapse Only)
Default Value: N/A | String/Expression | None. | Specify the Hash distribution column name for the target table in Azure Synapse, if you configure INSERT action in the Snap.
| |
Target Table Alias
Default Value: N/A | String/Expression | None. | Specify the alias name for the target table. | |
Input Source Alias
Default Value: N/A | String/Expression | None. | Enter the alias name that you want to use for the source table. | |
ON Condition*
Default Value: N/A | String/Expression | None. | The condition based on which you want to check the values in the target table. You can also use Pipeline parameters in this field to bind values. However, you must be careful to avoid SQL injection. See Preventing SQL Injection for details. | |
Enable default settings for merge into operation
Default Value: Not selected | Checkbox | None. | Select this checkbox to run the merge into operation with the default settings and assumptions (the same set of column names are used in the source and target tables in the same order) in the following scenarios.
Selecting this checkbox disables all the fields below except Merge-into Statements fieldset. Proceed to define your Merge-into Statements. | |
Enable automatic source table alias prefix
Default Value: Not selected | Checkbox | None. | Select this checkbox to prefix the Input Source Alias value to all the source column names inside the expressions defined (without an alias) in the Insert and Update Expression List fieldsets below. For example: Proceed to define your Merge-into Statements and the appropriate expression lists. | |
Enable update all columns operation
Default Value: Not selected | Checkbox | None. | Select this checkbox to perform a WHEN MATCHED update operation on all columns in the target table. You can choose the update operation type (by source table column name (default) or by the order of source table columns) for this update.
Proceed to define your Merge-into Statements and the appropriate expression lists. | |
Enable update all columns by source operation (Only for supported CDWs)
Default Value: Not selected | Checkbox | Target CDW is Azure Synapse, DLP, or Google BigQuery.
| Select this checkbox to perform a WHEN NOT MATCHED BY SOURCE update operation on all columns in the target table. You can choose the update operation type (by source table column name (default) or by the order of source table columns) for this update.
| |
Enable insert all columns operation
Default Value: Not selected | Checkbox | None. | Select this checkbox to perform a WHEN NOT MATCHED insert operation on all columns in the target table. You can choose the insert operation type (by source table column name (default) or by the order of source table columns) for this operation.
| |
Merge Into All Columns Operations Type
Default Value: Source and target table column names are identical | Drop-down list | Enable update all columns operation checkbox, Enable update all columns by source operation (Only for supported CDWs) checkbox, or Enable insert all columns operation checkbox is selected. | Select one of the following scenarios associated with the selected operation type for the Merge-Into operation.
| |
Merge-into Statements | Use this fieldset to specify the conditions that activate the MERGE INTO operation and the additional conditions that must be met. Specify each condition in a separate row. This fieldset contains the following fields:
| |||
When Clause Default Value: N/A | String/Expression/Suggestion |
| Specify the matching condition based on the outcome of the ON Condition. Alternatively, select a clause from the suggestion list. Available options are:
| |
Condition Default Value: N/A | String/Expression | None. | Specify the additional criteria that must be specified. Having this additional condition allows the Snap to identify whether the UPDATE or DELETE action must be performed (since both the actions correspond to the WHEN MATCHED clause). You can also use Pipeline parameters in this field to bind values. However, you must be careful to avoid SQL injection. See Preventing SQL Injection for details. | |
Action
Default Value: INSERT | Drop-down list | None. | Choose the action to apply on the condition. Available options are:
With Snowflake, Amazon Redshift, and DLP, you can specify up to three Merge Into statements with one of the above options for each WHEN clause. With Azure Synapse and Google BigQuery, you can specify up to five Merge Into statements with one of the above options for each WHEN clause. | |
Insert Expression List | Use this fieldset to specify the column in which you want to insert the data and also the values that you want to insert into them. Each column must be specified in a separate row. This fieldset contains the following fields:
| |||
Insert Column
Default Value: N/A | String/Expression/Suggestion | None. | Specify the column in which the value must be inserted. Alternatively, select a column from the suggestion list. | |
Insert Value
Default Value: N/A | String/Expression | None. | Specify the value to be inserted for the column. This field can be an SQL expression that references the source table's columns only or none. It cannot reference the target table's columns. You can also use Pipeline parameters in this field to bind values. However, you must be careful to avoid SQL injection. See Prevent |