In this article
Overview
You can use this Snap to perform a bulk load operation on your DLP instance. The source of your data can be a file from a cloud storage location, an input view from an upstream Snap, or a source that can be accessed through a JDBC connection. The source data can be in a CSV, JSON, PARQUET, TEXT, or ORC file.
Snap Type
The Databricks - Bulk Load Snap is a Write-type Snap that loads data into your DLP instance.
Prerequisites
Valid access credentials to a DLP instance with adequate access permissions to perform the action in context.
Valid access to the external source data in one of the following: Azure Blob Storage, ADLS Gen2, DBFS, GCP, AWS S3, or another database (JDBC-compatible).
Support for Ultra Pipelines
Does not support Ultra Pipelines.
Limitations
Snaps in the Databricks Snap Pack do not support array, map, and struct data types in their input and output documents.
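Because nested types are not supported, input documents typically need to be flattened upstream (for example, in a Mapper or Script Snap) before they reach the Snap. The sketch below is a minimal, hypothetical illustration of that kind of flattening; it is not part of the Snap Pack:

```python
# Minimal sketch (assumption): flatten nested input upstream, since the
# Databricks Snap Pack does not support array, map, or struct data types.
def flatten(doc, prefix=""):
    """Flatten a nested document into dot-separated scalar keys.
    (Arrays would need similar handling, e.g. index-based keys.)"""
    flat = {}
    for key, value in doc.items():
        name = f"{prefix}{key}"
        if isinstance(value, dict):
            flat.update(flatten(value, prefix=f"{name}."))
        else:
            flat[name] = value
    return flat

record = {"id": 1, "customer": {"name": "Ada", "city": "Rome"}}
print(flatten(record))  # {'id': 1, 'customer.name': 'Ada', 'customer.city': 'Rome'}
```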
Known Issues
None.
Snap Views
Type | Format | Number of Views | Examples of Upstream and Downstream Snaps | Description |
---|---|---|---|---|
Input | Document | | | This Snap allows two input views: the first receives the source data documents, and the second (optional) receives a JSON document that contains the table schema (metadata) for creating the target table. |
Output | Document | | | A JSON document containing the bulk load request details and the result of the bulk load operation. |
Error | | | | Error handling is a generic way to handle errors without losing data or failing the Snap execution. You can handle the errors that the Snap might encounter while running the Pipeline by choosing one of the following options from the When errors occur list under the Views tab: Stop Pipeline Execution, Discard Error Data and Continue, or Route Error Data to Error View. Learn more about Error handling in Pipelines. |
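The exact format of the schema document for the second input view is defined by the Snap; purely as a hypothetical illustration, metadata of this general shape could be assembled upstream (the field names below are invented, not the Snap's documented contract):

```python
# Hypothetical shape of a table-schema (metadata) document for the second
# input view -- field names here are invented for illustration only and are
# NOT the Snap's documented contract.
table_metadata = {
    "table": "sales_orders",
    "columns": [
        {"name": "order_id", "type": "BIGINT"},
        {"name": "order_date", "type": "DATE"},
        {"name": "amount", "type": "DECIMAL(10,2)"},
    ],
}

# Such a document could be built by an upstream Mapper or Script Snap and
# routed to the Snap's second input view.
ddl_columns = ", ".join(f"{c['name']} {c['type']}" for c in table_metadata["columns"])
print(ddl_columns)  # order_id BIGINT, order_date DATE, amount DECIMAL(10,2)
```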
Snap Settings
Asterisk ( * ): Indicates a mandatory field.
Suggestion icon: Indicates a list that is dynamically populated based on the configuration.
Expression icon: Indicates whether the value is an expression (if enabled) or a static value (if disabled). Learn more about Using Expressions in SnapLogic.
Add icon: Indicates that you can add fields in the fieldset.
Remove icon: Indicates that you can remove fields from the fieldset.
Field Name | Field Type | Field Dependency | Description | |
---|---|---|---|---|
Label* Default Value: Databricks - Bulk Load | String | None. | The name for the Snap. You can modify this to be more specific, especially if you have more than one of the same Snap in your Pipeline. | |
Database name Default Value: None. | String/Expression/Suggestion | None. | Enter the name of the database in which the target table exists. Leave this blank if you want to use the database name specified in the Database Name field in the account settings. | |
Table Name* Default Value: None. | String/Expression/Suggestion | None. | Enter the name of the table in which you want to perform the bulk load operation. | |
Source Type Default Value: Cloud Storage File | Dropdown list | None. | Select the type of source from which you want to load the data into your DLP instance. The available options are: Cloud Storage File, Input View, and JDBC. | |
Load action* Default Value: Drop and create table | Dropdown list | None. | Select the load action you want to perform on the target table for this bulk load operation. | |
Source table name | String | Source Type is JDBC. | Enter the source table name. If you do not specify a value in this field, the default database configured in the Snap’s JDBC account is used. | |
Target Table Columns | | Source Type is Cloud Storage file or JDBC and Load action is Drop and create table. | Use this fieldset to specify the target table schema for creating a new table. Specify the Column Name and Data Type for as many columns as you need to load in the target table. | |
Column Default Value: None. | String | None. | Enter the name of the column that you want to load in the target table. | |
Data Type Default Value: None. | String | None. | Enter the data type of the values in the specified column. | |
File format type Default Value: CSV | Dropdown list | Source Type is Cloud Storage file. | Select the file format of the source data file. It can be CSV, JSON, ORC, PARQUET, or TEXT. | |
File Format Option List | | Source Type is Cloud Storage file. | Use this fieldset to choose the file format options to associate with the bulk load operation, based on your source file format. Choose one file format option in each row. | |
File format option Default Value: None. | String/Expression/Suggestion | Source Type is Cloud Storage file. | Select a file format option from the available options and set appropriate values to suit your bulk load needs, without affecting the syntax displayed in this field. | |
Files provider Default Value: File list | Dropdown list | Source Type is Cloud Storage file. | Select how you want to specify the source files: File list or pattern. Based on your selection, the corresponding field is displayed: the File list fieldset for File list, or the File pattern field for pattern. | |
File list | | Source Type is Cloud Storage file and Files provider is File list. | Use this fieldset to specify the file paths to be used for the bulk load operation. Choose one file path in each row. | |
File Default Value: None. | String | Source Type is Cloud Storage file and Files provider is File list. | Enter the path of the file to be used for the bulk upload operation. | |
File pattern Default Value: None. | String/Expression | Source Type is Cloud Storage file and Files provider is pattern. | Enter the regular expression pattern to match the file name and/or absolute path, specified as a regular expression pattern string enclosed in single quotes. Learn more: Examples of COPY INTO (Delta Lake on Databricks) for DLP. | |
Encryption type Default Value: None. | String | Source Type is Cloud Storage file. | Select the encryption type that you want to use for the loaded data and/or files. Server-side encryption is available only for S3 accounts. | |
KMS key Default Value: None. | String/Expression | Source Type is Cloud Storage file. | Enter the KMS key to use to encrypt the files. If your source files are in S3, see Loading encrypted files from Amazon S3 for more details. | |
Snap Execution Default Value: Execute only | Dropdown list | None. | Select one of the three modes in which the Snap executes: Validate & Execute, Execute only, or Disabled. | |
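As an illustration of the File pattern field, the sketch below shows how a regular-expression pattern selects a subset of files by name. The file names and pattern are hypothetical, and the snippet only demonstrates regex matching in general, not the Snap's internal implementation:

```python
import re

# Hypothetical file paths in a cloud storage folder (illustration only).
files = [
    "landing/orders_2023_01.csv",
    "landing/orders_2023_02.csv",
    "landing/orders_backup.json",
]

# A pattern of the kind you might enter in the File pattern field:
# match CSV files whose names end in a year and a two-digit month.
pattern = re.compile(r".*/orders_\d{4}_\d{2}\.csv")

matched = [f for f in files if pattern.fullmatch(f)]
print(matched)  # ['landing/orders_2023_01.csv', 'landing/orders_2023_02.csv']
```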
Troubleshooting
Error | Reason | Resolution |
---|---|---|
Missing property value | You have not specified a value for the required field where this message appears. | Ensure that you specify valid values for all required fields. |
Examples
Excluding Fields from the Input Data Stream
You can exclude unneeded fields from the input data stream by omitting them in the Input schema fieldset. This example demonstrates how to use the Databricks - Bulk Load Snap to achieve this result:
<screenshot of Pipeline/Snap and description>
Download this Pipeline.
Downloads
Download and import the Pipeline into SnapLogic.
Configure Snap accounts as applicable.
Provide Pipeline parameters as applicable.
Snap Pack History
Release | Snap Pack Version | Date | Type | Updates |
---|---|---|---|---|
May 2024 | 437patches26400 | | Latest | Fixed an invalid session handle issue with the Databricks Snap Pack that intermittently triggered an error message when the Snaps failed to connect with Databricks to execute the SQL statement. |
May 2024 | main26341 | | Stable | Updated the Delete Condition (Truncates a Table if empty) field in the Databricks - Delete Snap to Delete condition (deletes all records from a table if left blank) to indicate that all records are deleted from the table when this field is blank, but no truncate operation is performed. |
February 2024 | main25112 | | Stable | Updated and certified against the current SnapLogic Platform release. |
November 2023 | main23721 | | Stable | Updated and certified against the current SnapLogic Platform release. |
August 2023 | main22460 | | Stable | Updated and certified against the current SnapLogic Platform release. |
May 2023 | 433patches21630 | | Latest | Enhanced the performance of the Databricks - Insert Snap to reduce validation time. |
May 2023 | main21015 | | Stable | Upgraded with the latest SnapLogic Platform release. |
February 2023 | main19844 | | Stable | Upgraded with the latest SnapLogic Platform release. |
November 2022 | main18944 | | Stable | The Databricks - Insert Snap now creates the target table only from the table metadata of the second input view when certain conditions are met. |
September 2022 | 430patches18305 | | Latest | New fields were added to each Databricks Snap as part of an enhancement in this release. |
September 2022 | 430patches17796 | | Latest | The Manage Queued Queries property in the Databricks Snap Pack enables you to decide whether a given Snap should continue or cancel executing the queued Databricks SQL queries. |
August 2022 | main17386 | | Stable | Upgraded with the latest SnapLogic Platform release. |
4.29.2.0 | 42920rc17045 | | Latest | A new Snap Pack for the Databricks Lakehouse Platform (Databricks or DLP) introduces several new Snaps. |