In this article

1 Overview
- 1.1 Snap Type
- 1.2 Prerequisites
- 1.3 Support for Ultra Pipelines
- 1.4 Limitations
- 1.5 Known Issues
2 Snap Views
3 Snap Settings
4 Troubleshooting
5 Examples
- 5.1 Unloading data from a table in DLP to a CSV file
- 5.2 Downloads
6 Snap Pack History

Overview

You can use this Snap to unload data from the tables in your Databricks Lakehouse Platform (DLP) instance to a file in the cloud location. You can save the file in CSV, JSON, PARQUET, ORC, or AVRO format and as multiple partitions if needed. Ensure that you configure a valid target location in the Snap’s account for saving the file in the preferred format.

Snap Type

Databricks - Unload Snap is a write-type Snap that writes data from a DLP table to a preferred file format.

Prerequisites

Valid access credentials to a DLP instance with adequate access permissions to perform the action in context.
Valid access to the external source data in one of the following: Azure Blob Storage, ADLS Gen2, DBFS, GCP, AWS S3, or another database (JDBC-compatible).

Support for Ultra Pipelines

Does not support Ultra Pipelines.

Limitations

Snaps in the Databricks Snap Pack do not support array, map, and struct data types in their input and output documents.

Known Issues

None.

Snap Views

Type	Format	Number of Views	Examples of Upstream and Downstream Snaps	Description

Type

Format

Number of Views

Examples of Upstream and Downstream Snaps

Description

Input

Document

Min: 0
Max: 1

Mapper
Copy
JSON Generator

A JSON document containing parameterized inputs for the Snap’s configuration, if needed.

Output

Document

Min: 0
Max: 1

JSON Parser
CSV Parser
Mapper
AVRO Parser

A JSON document containing the unload request details and the result of the unload operation.

Error

Error handling is a generic way to handle errors without losing data or failing the Snap execution. You can handle the errors that the Snap might encounter while running the Pipeline by choosing one of the following options from the When errors occur list under the Views tab. The available options are:

Stop Pipeline Execution: Stops the current pipeline execution when the Snap encounters an error.
Discard Error Data and Continue: Ignores the error, discards that record, and continues with the rest of the records.
Route Error Data to Error View: Routes the error data to an error view without stopping the Snap execution.

Learn more about Error handling in Pipelines.

Snap Settings

Asterisk ( * ): Indicates a mandatory field.
Suggestion icon (): Indicates a list that is dynamically populated based on the configuration.
Expression icon ( ): Indicates whether the value is an expression (if enabled) or a static value (if disabled). Learn more about Using Expressions in SnapLogic.
Add icon ( ): Indicates that you can add fields in the fieldset.
Remove icon ( ): Indicates that you can remove fields from the fieldset.

Field Name		Field Type	Field Dependency	Description

Field Name		Field Type	Field Dependency	Description
Label* Default Value: Databricks - Unload Example: ELT RS Account		String	None.	The name for the Snap. You can modify this to be more specific, especially if you have more than one of the same Snap in your Pipeline.
Use unity catalog Default state: Deselected		Checkbox	None	Select this checkbox to use the Unity catalog to access data from the catalog.
Catalog name Default value: hive_metastore Example: xyzcatalog		String/Expression/Suggestion	Appears when you select Use unity catalog.	Specify the name of the catalog for using the unity catalog.
Enable SQL query editor Default Value: Not selected Example: Selected		Checkbox	None.	Select this checkbox to write the SQL statement instead of specifying the Database name and Table name.
SQL Query Editor* Default Value: None Example: SELECT from cust_table1*		String/Expression	Appears when you Enable SQL query editor checkbox is selected.	Specify the SQL statement, usually the SELECT statement, to indicate the source data for the unload operation. Avoid specifying multiple SQL statements in this query editor field.
Database name Default Value: None. Example: customer_data		String/Expression/Suggestion	Appears when you Enable SQL query editor checkbox is not selected.	Specify the name of the source table’s database in plain text, as an expression, or using the suggestions list for the unload operation.
Table name* Default Value: None. Example: customer_data		String/Expression/Suggestion	Appears when you Enable SQL query editor checkbox is not selected.	Specify the name of the source table in plain text, as an expression, or using the suggestions list. The unload operation is performed on this table when your run the Pipeline.
File format* Default Value: ORC Example: AVRO		Dropdown list	None.	Select the file format for the file to which the source table data must be unloaded.
File Format Option List	You can use this fieldset to define the list of file format options to apply while unloading the data. Specify one file format option in each row. The list of file format options available vary based on the File format selected.
File Format Option List	File format option Default Value: None. Example: 'nullValue' = ''	String/Expression/Suggestion	None.	Enter the file format option with the right syntax or choose from the suggested list of file format options that is based on the file format selected. Ensure to edit the option without changing the expected syntax to suit your unloading requirement.
Overwrite File Default Value: Not selected Example: Selected		Checkbox	None.	Select this checkbox to overwrite the file if it exists in the specified target path.
Target Path Default Value: None. Example: /databricks/unloads/csv_files		String/Expression	None.	Enter the relative path for saving the target file that contains the unloaded data. Ensure that you have configured the Snap’s account with the details required to access this target location.
Repartition Default Value: None. Example: 4		String/Expression	None.	Specify the number of partitions (positive integers only) that you need to store the unloaded data file in the target location.
Manage Queued Queries Default value: Continue to execute queued queries when pipeline is stopped or if it fails. Example: Cancel queued queries when pipeline is stopped or if it fails		Dropdown list	None.	Select this property to determine whether the Snap should continue or cancel the execution of the queued Databricks SQL queries when you stop the Pipeline. If you select Cancel queued queries when pipeline is stopped or if it fails, then the read queries under execution are cancelled, whereas the write type of queries under execution are not cancelled. Databricks internally determines which queries are safe to be cancelled and cancels those queries. Due to an issue with DLP, aborting an ELT Pipeline validation (with preview data enabled) causes only those SQL statements that retrieve data using bind parameters to get aborted while all other static statements (that use values instead of bind parameters) persist. For example, `select * from a_table where id = 10` will not be aborted while `select * from test where id = ?` gets aborted. To avoid this issue, ensure that you always configure your Snap settings to use bind parameters inside its SQL queries.
Snap Execution Default Value: Validate & Execute Example: Execute only		Dropdown list	N/A	Select one of the three modes in which the Snap executes. Available options are: Validate & Execute: Performs limited execution of the Snap, and generates a data preview during Pipeline validation. Subsequently, performs full execution of the Snap (unlimited records) during Pipeline runtime. Execute only: Performs full execution of the Snap during Pipeline execution without generating preview data. Disabled: Disables the Snap and all Snaps that are downstream from it.

Troubleshooting

Error	Reason	Resolution

Error	Reason	Resolution
Missing property value	You have not specified a value for the required field where this message appears.	Ensure that you specify valid values for all required fields.

Examples

Unloading data from a table in DLP to a CSV file

Consider the scenario where we want to unload the employee data for an organization from a DLP table to a CSV file. Let us see how we can achieve this using the Databricks - Unload Snap.

Here is the configuration for the Snap and its account:

Snap Account Configuration	Snap Configuration

Snap Account Configuration	Snap Configuration

For the account, we configure the Databricks Account to connect to our DLP instance using the basic user credentials.
- As this is an unload operation, we choose the Target Location for saving the output CSV file as an AWS S3 location.
- We use the Location credentials authorization type (Access key ID and Secret Key) to access the target folder in this S3 location.
- We proceed with the default values for the fields in Advanced properties fieldset.
We specify the Database name and Table name (source data) for performing the unload operation. Alternatively, we can use the SQL query editor to specify these details in the form of an SQL statement.
The File Format is CSV as we want a CSV file at the end of the unload operation.
- We specify a File Format Option ‘header’ = ‘true’ to include the header (column names from the table) in the output file.
While specifying the target AWS S3 location path to store the CSV file, we also choose to:
- Not partition the data (all data in one CSV file) but
- Overwrite any existing files in that folder

Upon running this Pipeline, we can find the CSV in the specified target path without any partitions. The Snap output preview, if configured, displays the details of the unload request and the status/result of the operation (Finished/Failed).

Download this Pipeline.

Downloads

Download and import the Pipeline into SnapLogic.
Configure Snap accounts as applicable.
Provide Pipeline parameters as applicable.

	File	Modified
Labels No labels Preview View	File Databricks_FEP1_Unload.slp	Jul 19, 2022 by Anand Vedam

SnapLogic Documentation

Databricks - Unload