...
COPY INTO - Enables loading data from staged files into an existing table.
CREATE TABLE [USING] - Enables loading data from external sources such as JDBC.
CREATE TABLE - Creates a table; in this case, a temporary table.
MERGE INTO - Inserts new rows, updates existing rows, and deletes rows based on a condition.
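To illustrate how these statements relate to each other, here is a minimal Databricks SQL sketch using hypothetical table names and a hypothetical S3 path: the incoming data is first staged with CREATE TABLE and COPY INTO, then applied to the target table with MERGE INTO.

```sql
-- Minimal sketch (hypothetical names and paths): stage incoming data in a
-- temporary table, then merge it into the existing target table.
CREATE TABLE IF NOT EXISTS tmp_orders_stage (id INT, amount DOUBLE, status STRING);

COPY INTO tmp_orders_stage
FROM 's3://my-bucket/staged/orders/'
FILEFORMAT = CSV
FORMAT_OPTIONS ('header' = 'true');

MERGE INTO orders AS t
USING tmp_orders_stage AS s
ON t.id = s.id
WHEN MATCHED THEN UPDATE SET t.amount = s.amount, t.status = s.status
WHEN NOT MATCHED THEN INSERT (id, amount, status) VALUES (s.id, s.amount, s.status);
```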
...
Snap Type
The Databricks - Merge Into Snap is a Write-type Snap that inserts and updates data in a DLP instance.
...
Field Name | Field Type | Field Dependency | Description
---|---|---|---
Label* Default Value: Databricks - Merge Into | String | None. | The name for the Snap. You can modify this to be more specific, especially if you have more than one of the same Snap in your Pipeline.
Database name Default Value: None. | String/Expression/Suggestion | None. | Enter the name of the database in which the target table exists. Leave this blank if you want to use the database name specified in the Database Name field in the account settings.
Table Name* Default Value: None. | String/Expression/Suggestion | None. | Enter the name of the table in which you want to perform the MERGE INTO operation.
Target Table Alias* Default Value: None. | String | None. | Enter an alias name for the target table to use in the MERGE INTO operation.
Input Source Alias* Default Value: None. | String | None. | Enter an alias name for the source table/data to use in the MERGE INTO operation.
ON Condition* Default Value: None. | String/Expression | None. | Specify the condition on which the Snap should update the target table with the data from the source table/files.
Merge-into Statements | | None. | You can use this fieldset to specify the conditions that activate the MERGE INTO operation and the additional conditions that must be met. Specify each condition in a separate row. See the example after this table for how these settings map onto the generated MERGE INTO statement. This fieldset contains the following fields:
When Clause Default Value: None. | String/Expression/Suggestion | None. | Specify the matching condition based on the outcome of the ON Condition. Alternatively, select a clause from the suggestion list. Available options are WHEN MATCHED and WHEN NOT MATCHED, which are the MERGE INTO operations that DLP supports.
Condition Default Value: None. | String | None. | Specify additional criteria, if needed. The action associated with the specified condition is not performed if the condition's criteria are not fulfilled. The condition can reference both the source and target tables, only the source table, only the target table, or no table at all. This additional condition allows the Snap to identify whether the UPDATE or the DELETE action must be performed (since both actions correspond to the WHEN MATCHED clause). You can also use Pipeline parameters in this field to bind values; however, be careful to avoid SQL injection.
Action Default Value: INSERT | Dropdown list | None. | Choose the action to apply for the condition. Available options are INSERT, UPDATE, and DELETE.
Source Type Default Value: Cloud Storage File | Dropdown list | None. | Select the type of source from which you want to update the data in your DLP instance. The available options are Cloud Storage File, Input View, and JDBC.
Source table name | String | Source Type is JDBC. | Enter the name of the source table. If you do not specify a database in this field, the Snap uses the default database configured in its JDBC account.
File format type Default Value: CSV | Dropdown list | Source Type is Cloud Storage file. | Select the file format of the source data file. It can be CSV, JSON, ORC, PARQUET, or TEXT.
File Format Option List | | Source Type is Cloud Storage file. | You can use this field set to choose the file format options to associate with the MERGE INTO operation, based on your source file format. Choose one file format option in each row.
File format option Default Value: None. | String/Expression/Suggestion | Source Type is Cloud Storage file. | Select a file format option from the available options and set appropriate values to suit your MERGE INTO needs, without affecting the syntax displayed in this field.
Files provider Default Value: File list | Dropdown list | Source Type is Cloud Storage file. | Select how you want to specify the list of source files: File list or pattern. Based on your selection, the corresponding field is displayed: the File list field set for File list, or the File pattern field for pattern.
File list | | Source Type is Cloud Storage file and Files provider is File list. | You can use this field set to specify the file paths to be used for the MERGE INTO operation. Choose one file path in each row.
File Default Value: None. | String | Source Type is Cloud Storage file and Files provider is File list. | Enter the path of the file to be used for the MERGE INTO operation.
File pattern Default Value: None. | String/Expression | Source Type is Cloud Storage file and Files provider is pattern. | Enter the regex pattern to use to match the file name and/or absolute path. You can specify this as a regular expression pattern string, enclosed in single quotes. Learn more: Examples of COPY INTO (Delta Lake on Databricks) for DLP.
Encryption type Default Value: None. | String | Source Type is Cloud Storage file. | Select the encryption type to use for decrypting the source data and/or files staged in the S3 buckets.
KMS key Default Value: None. | String/Expression | Source Type is Cloud Storage file and Encryption type is Server-Side KMS Encryption. | Enter the AWS Key Management Service (KMS) ID or ARN to use to decrypt the encrypted files from the S3 location. If your source files are in S3, see Loading encrypted files from Amazon S3 for more details.
Number of Retries Example: 3 Minimum value: 0 Default value: 0 | Integer | Source Type is Input View. | Specifies the maximum number of retry attempts when the Snap fails to write.
Retry Interval (seconds) Example: 3 Minimum value: 1 Default value: 1 | Integer | Source Type is Input View. | Specifies the minimum number of seconds the Snap must wait before each retry attempt.
Manage Queued Queries Default value: Continue to execute queued queries when pipeline is stopped or if it fails. Example: Cancel queued queries when pipeline is stopped or if it fails | Dropdown list | None. | Select whether the Snap should continue or cancel the execution of the queued Databricks SQL queries when you stop the Pipeline. If you select Cancel queued queries when pipeline is stopped or if it fails, the read queries under execution are cancelled, whereas the write queries under execution are not. Databricks internally determines which queries are safe to cancel and cancels only those. Note: Due to an issue with DLP, aborting an ELT Pipeline validation (with preview data enabled) aborts only the SQL statements that retrieve data using bind parameters, while all other static statements (that use values instead of bind parameters) persist. To avoid this issue, ensure that you always configure your Snap settings to use bind parameters inside its SQL queries.
Snap Execution Default Value: Execute only | Dropdown list | None. | Select one of the three modes in which the Snap executes. Available options are Validate & Execute, Execute only, and Disabled.
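The following sketch shows how the MERGE INTO settings above could translate into a Databricks SQL statement. All table and column names here are hypothetical, and the statement is only an approximation, not the Snap's exact generated output.

```sql
-- Hypothetical Snap settings (for illustration only):
--   Table Name: customers            Target Table Alias: t
--   Input Source Alias: s            ON Condition: t.customer_id = s.customer_id
--   Merge-into Statements:
--     WHEN MATCHED,     Condition: s.is_deleted = true,  Action: DELETE
--     WHEN MATCHED,     Condition: (none),               Action: UPDATE
--     WHEN NOT MATCHED, Condition: (none),               Action: INSERT
--
-- An approximation of the resulting statement:
MERGE INTO customers AS t
USING incoming_customers AS s
ON t.customer_id = s.customer_id
WHEN MATCHED AND s.is_deleted = true THEN DELETE
WHEN MATCHED THEN UPDATE SET *
WHEN NOT MATCHED THEN INSERT *;
```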
...