Databricks - Execute

Overview

You can use this Snap to run one or more Databricks SQL statements on your target Databricks Lakehouse Platform (DLP) instance. You can run the following types of queries using this Snap:

  • Data Definition Language (DDL) queries

  • Data Manipulation Language (DML) queries

  • Data Control Language (DCL) queries

Each row in the SQL Statements fieldset should contain a single query.

The Snap runs each statement as a single atomic unit, so the changes made by a statement can be rolled back if that statement fails during execution.
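The per-statement behavior can be illustrated with a small simulation. This is a minimal sketch, not the Snap's implementation: it uses Python's built-in sqlite3 module in autocommit mode as a stand-in for a DLP instance, and the helper name run_statements and the table names are hypothetical. It shows that a failed statement leaves no changes of its own, while statements that already succeeded stay committed.

```python
import sqlite3


def run_statements(conn, statements):
    """Run statements one by one; stop at the first failure (hypothetical helper)."""
    results = []
    for sql in statements:
        try:
            conn.execute(sql)
            results.append((sql, "ok"))
        except sqlite3.Error as exc:
            results.append((sql, f"failed: {exc}"))
            break
    return results


# isolation_level=None puts sqlite3 in autocommit mode:
# every statement is committed on its own as soon as it succeeds.
conn = sqlite3.connect(":memory:", isolation_level=None)

statements = [
    "CREATE TABLE employee (name TEXT, age INTEGER)",
    "INSERT INTO employee VALUES ('Ada', 36)",
    "INSERT INTO missing_table VALUES (1)",  # fails: the table does not exist
]

results = run_statements(conn, statements)

# The first two statements remain committed despite the later failure.
rows = conn.execute("SELECT name, age FROM employee").fetchall()
```

In this sketch, rows still contains the inserted record after the third statement fails, which mirrors the auto-commit behavior described under Limitations.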

Snap Type

The Databricks - Execute Snap is a Write-type Snap that can read, fetch, and write data and tables in a target DLP instance.

Prerequisites

  • Valid access credentials to a DLP instance with adequate access permissions to perform the action in context.

  • Valid access to the external source data in one of the following: Azure Blob Storage, ADLS Gen2, DBFS, GCP, AWS S3, or another database (JDBC-compatible).

Support for Ultra Pipelines  

Does not support Ultra Pipelines.

Limitations

  • This Snap does not support multi-statement transaction rollback.

  • Each statement is auto-committed upon successful execution. If a statement fails, the Snap can roll back only the updates made by that failed statement. Statements that ran successfully earlier in the same Pipeline execution are not rolled back.

  • You cannot run Data Query Language (DQL) queries, such as SELECT and WITH query constructs, using this Snap.
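A rough pre-check can tell the supported statement categories apart from DQL before a statement is added to the Snap. The following is a minimal sketch under stated assumptions: it classifies a statement only by its leading keyword, the keyword table is an illustrative subset rather than the full Databricks SQL grammar, and the names classify and is_supported are hypothetical. It does not handle leading comments.

```python
import re

# Illustrative subset of leading keywords; not an exhaustive grammar.
CATEGORY_BY_KEYWORD = {
    "CREATE": "DDL", "ALTER": "DDL", "DROP": "DDL", "TRUNCATE": "DDL",
    "INSERT": "DML", "UPDATE": "DML", "DELETE": "DML", "MERGE": "DML",
    "GRANT": "DCL", "REVOKE": "DCL",
    "SELECT": "DQL", "WITH": "DQL",
}


def classify(sql):
    """Return the category implied by the first keyword, or "UNKNOWN"."""
    match = re.match(r"\s*([A-Za-z]+)", sql)
    keyword = match.group(1).upper() if match else ""
    return CATEGORY_BY_KEYWORD.get(keyword, "UNKNOWN")


def is_supported(sql):
    """True for DDL, DML, and DCL statements; False for DQL and unknown ones."""
    return classify(sql) in {"DDL", "DML", "DCL"}


ddl_category = classify("create table employee (name String, age BigInt)")
select_supported = is_supported("SELECT * FROM employee")
```

Because the check looks only at the first keyword, treat it as a convenience filter, not as validation of the statement.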

Known Issues

None.

Snap Views

Type

Format

Number of Views

Examples of Upstream and Downstream Snaps

Description

Input 

Document

  • Min: 0

  • Max: 1

  • JSON Generator

  • Mapper

  • Copy

  • Databricks - Select

Input document is not mandatory. The Snap can fetch and apply values for parameterized queries from an upstream Snap output.

Output

Document

  • Min: 0

  • Max: 1

  • Databricks - Select

  • Databricks - Insert

A JSON document containing each SQL statement along with its execution status (or result).

Error

Error handling is a generic way to handle errors without losing data or failing the Snap execution. You can handle the errors that the Snap might encounter while running the Pipeline by choosing one of the following options from the When errors occur list under the Views tab. The available options are:

  • Stop Pipeline Execution: Stops the current pipeline execution when the Snap encounters an error.

  • Discard Error Data and Continue: Ignores the error, discards that record, and continues with the rest of the records.

  • Route Error Data to Error View: Routes the error data to an error view without stopping the Snap execution.

Learn more about Error handling in Pipelines.

Snap Settings

  • Asterisk ( * ): Indicates a mandatory field.

  • Suggestion icon: Indicates a list that is dynamically populated based on the configuration.

  • Expression icon: Indicates whether the value is an expression (if enabled) or a static value (if disabled). Learn more about Using Expressions in SnapLogic.

  • Add icon: Indicates that you can add fields in the fieldset.

  • Remove icon: Indicates that you can remove fields from the fieldset.

Field Name

Field Type

Description

Label*

 

Default Value: Databricks - Execute
Example: Db_MultiQuery

String

The name for the Snap. You can modify this to be more specific, especially if you have more than one of the same Snap in your Pipeline.

 

SQL Statements*

Use this fieldset to define your SQL statements, one in each row. You can add as many SQL statements as you need.

SQL statement*

Default Value: None.
Example: create table employee (name String, age BigInt)

String/Expression

Specify the Databricks SQL statement that you want the Snap to execute. We recommend that you add a single query in each SQL statement field. The statement must follow the SQL syntax supported by DLP.

Number of Retries

Default Value: 0
Example: 3

Minimum value: 0

 

Integer

Specify the maximum number of retry attempts that the Snap makes when a network failure prevents it from reading the target file. The request is terminated if none of the attempts results in a response.

  • If the Number of Retries value is 0 (the default value), retries are disabled. Any failure encountered during the database operation immediately fails the Pipeline, with no attempt to recover from the error.

  • If the Number of Retries value is greater than 0, the Snap initiates a download of the target file into a temporary local file. If any error occurs during the download, the Snap waits for the time specified in Retry Interval and then attempts to download the file again from the beginning. After the download is successful, the Snap streams the data from the temporary file to the downstream Pipeline. All temporary local files are deleted when they are no longer needed.

Ensure that the local drive has sufficient free disk space to store the temporary local file.

Retry Interval (Seconds)

Default value: 1
Example: 3

Minimum value: 1

 

Integer

Specify the minimum number of seconds the Snap must wait before each retry attempt.
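The interaction between Number of Retries and Retry Interval can be sketched as a generic retry loop. This is an illustration only, not the Snap's internal logic: the function run_with_retries, the FlakyOperation class, and the use of ConnectionError to represent a network failure are all assumptions, and the sketch does not reproduce the temporary-file download.

```python
import time


def run_with_retries(operation, number_of_retries=0, retry_interval=1, sleep=time.sleep):
    """Call operation(); on ConnectionError, wait and retry up to number_of_retries times."""
    retries_used = 0
    while True:
        try:
            return operation()
        except ConnectionError:
            if retries_used >= number_of_retries:
                raise  # retries disabled (0) or exhausted
            retries_used += 1
            sleep(retry_interval)  # wait before the next attempt


class FlakyOperation:
    """Hypothetical operation that fails a fixed number of times, then succeeds."""

    def __init__(self, failures):
        self.failures = failures
        self.calls = 0

    def __call__(self):
        self.calls += 1
        if self.calls <= self.failures:
            raise ConnectionError("simulated network failure")
        return "ok"


waits = []  # record the waits instead of actually sleeping
flaky = FlakyOperation(failures=2)
outcome = run_with_retries(flaky, number_of_retries=3, retry_interval=3, sleep=waits.append)
```

With two simulated failures and three retries allowed, the call succeeds on the third attempt after two waits. With number_of_retries left at 0, the first failure propagates immediately.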

Use Result Query

Checkbox

Select this checkbox to write the SQL statement execution result to the Snap's output view for each successful execution. The output of the Snap is enclosed in the key ResultQuery, and the value will be the actual output produced by the SQL statement. The example Pipeline below depicts the difference between the output previews when this checkbox is selected and when it is not.

This option allows you to effectively track the SQL statement's execution by clearly indicating the successful execution and the number of records affected, if any, after the execution.

Manage Queued Queries

Default Value: Continue to execute queued queries when pipeline is stopped or if it fails
Example: Cancel queued queries when pipeline is stopped or if it fails

Dropdown list

Select one of the following options from the dropdown list to handle queued SQL queries:

  • Continue to execute queued queries when pipeline is stopped or if it fails: Continues the execution of the queued Databricks SQL queries when the Pipeline is stopped or fails.

  • Cancel queued queries when pipeline is stopped or if it fails: Cancels the execution of the queued Databricks SQL queries when the Pipeline is stopped or fails.

Snap Execution

 

Default Value: Execute only
Example: Validate & Execute

Dropdown list

Select one of the three modes in which the Snap executes. Available options are:

  • Validate & Execute: Performs limited execution of the Snap, and generates a data preview during Pipeline validation. Subsequently, performs full execution of the Snap (unlimited records) during Pipeline runtime.

  • Execute only: Performs full execution of the Snap during Pipeline execution without generating preview data.

  • Disabled: Disables the Snap and all Snaps that are downstream from it.

Troubleshooting

Error

Reason

Resolution

Missing property value

You have not specified a value for the required field where this message appears.

Ensure that you specify valid values for all required fields.

Examples

Replacing old data in a DLP table with the latest data

Consider a scenario where the data in a DLP table becomes obsolete every few hours, so the table must be refreshed frequently. To do so, we can create a Pipeline that contains only the Databricks - Execute Snap.

Configure the Snap to run two Databricks SQL statements in a specific order: first, delete the existing table; then, create a new table with the same schema as the source file and populate it with the latest values. Ensure that the DLP account used with the Snap has the required permissions to perform the operations you specify in your SQL statements.
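The two-statement refresh pattern can be tried out locally before it is configured in the Snap. The sketch below is an analogy under stated assumptions, not the statements used in the downloadable Pipeline: it runs on Python's built-in sqlite3 module instead of DLP, the table source_data stands in for the external source file, and all table and column names are hypothetical.

```python
import sqlite3

# Autocommit mode: each statement is committed on its own.
conn = sqlite3.connect(":memory:", isolation_level=None)

# Stand-in for the external source file that holds the latest data.
conn.execute("CREATE TABLE source_data (name TEXT, age INTEGER)")
conn.execute("INSERT INTO source_data VALUES ('Ada', 36), ('Linus', 54)")

# The target table with obsolete contents.
conn.execute("CREATE TABLE employee (name TEXT, age INTEGER)")
conn.execute("INSERT INTO employee VALUES ('Old', 99)")

# The two statements, in the order they would be listed in the fieldset.
refresh_statements = [
    "DROP TABLE IF EXISTS employee",
    "CREATE TABLE employee AS SELECT name, age FROM source_data",
]
for sql in refresh_statements:
    conn.execute(sql)

refreshed = conn.execute("SELECT name, age FROM employee ORDER BY name").fetchall()
```

The order matters: because each statement is committed on its own, a failure in the second statement would leave the table dropped and not yet recreated.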

Upon successful validation, the Snap displays its output in the preview pane. This output contains each SQL statement we passed and the respective result of its execution.

 

Download this Pipeline

Downloads

  • File: Databricks_MultiExecute_FEP1.slp

  • Modified: Jul 14, 2022 by Anand Vedam