Databricks - Select
Overview
You can use this Snap to fetch data from a database in a Databricks Lakehouse Platform (DLP) instance by providing the table name. You can then read and process this output data in a downstream Snap.
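In effect, the Snap runs a SQL SELECT statement against the table you specify and emits the retrieved rows on its output view. As a minimal sketch, a configuration that sets only the database and table name is equivalent to the following query; both names here are hypothetical:

```sql
-- Minimal sketch of the query the Snap generates when only the
-- database and table names are configured (both names hypothetical).
SELECT * FROM sales_db.orders;
```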
Snap Type
The Databricks - Select Snap is a Read-type Snap that retrieves data from a configured DLP instance.
Prerequisites
Valid access credentials to a DLP instance with adequate access permissions to perform the action in context.
Valid access to the external source data in one of the following: Azure Blob Storage, ADLS Gen2, DBFS, GCP, AWS S3, or another database (JDBC-compatible).
Support for Ultra Pipelines
Does not support Ultra Pipelines.
Limitations
Snaps in the Databricks Snap Pack do not support array, map, and struct data types in their input and output documents.
Known Issues
None.
Snap Views
Type | Format | Number of Views | Examples of Upstream and Downstream Snaps | Description
---|---|---|---|---
Input | Document | | | A JSON document containing the input values for the Snap's configuration.
Output | Document | | | The default output view contains the data fetched by running the SELECT statement on the source location. A second output view can be added to show the table's metadata as a Document. This metadata document can then be fed into the second input view of the Databricks - Merge Into or Bulk Load Snap so that the table is created in Databricks with a schema similar to that of the source table.
Error | Document | | | Error handling is a generic way to handle errors without losing data or failing the Snap execution. You can handle the errors that the Snap might encounter while running the Pipeline by choosing one of the following options from the When errors occur list under the Views tab: Stop Pipeline Execution, Discard Error Data and Continue, or Route Error Data to Error View. Learn more about Error handling in Pipelines.
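The schema metadata in the second output view describes the source table's columns and data types. Loosely speaking, it carries the same kind of information that Databricks SQL reports for a DESCRIBE TABLE statement, which you can use to preview the schema manually; the table name below is hypothetical:

```sql
-- DESCRIBE TABLE is standard Databricks SQL; it returns one row per
-- column with the column name, data type, and comment.
-- The table name here is hypothetical.
DESCRIBE TABLE sales_db.orders;
```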
Snap Settings
Asterisk (*): Indicates a mandatory field.
Suggestion icon: Indicates a list that is dynamically populated based on the configuration.
Expression icon: Indicates whether the value is an expression (if enabled) or a static value (if disabled). Learn more about Using Expressions in SnapLogic.
Add icon: Indicates that you can add fields in the fieldset.
Remove icon: Indicates that you can remove fields from the fieldset.
Field Name | Field Type | Description
---|---|---
Label* (Default: Databricks - Select) | String | The name for the Snap. You can modify this to be more specific, especially if you have more than one of the same Snap in your Pipeline.
Database name (Default: N/A) | String/Expression/Suggestion | Enter the DLP database name for the SELECT statement to read the data from.
Table name* (Default: N/A) | String/Expression/Suggestion | Enter the table name for the SELECT statement to read the data from.
Where clause (Default: N/A) | String/Expression/Suggestion | Specify the WHERE clause (condition) for the SELECT statement to apply while retrieving data from the DLP table. See the query sketch after this table.
Order by | Fieldset | Use this fieldset to define the order in which the retrieved rows are populated in the output. Click the Add fieldset row icon and specify one column name in each row to define your sort order.
Column names (Default: None) | String/Expression/Suggestion | Click the Suggest icon and select a column name from the suggested list of column names in the table specified above. If you do not specify a sort order in this fieldset, the Snap returns data in the same order as the DLP table.
Limit rows (Default: None) | String/Expression | Specify the number of rows of data you want the Snap to retrieve from the table. If you do not specify a limit, the Snap returns all the data in the DLP table during execution. See the Snap Execution field below for the limits that apply during Pipeline validation.
Output fields | Fieldset | Use this fieldset to define the list of columns to retrieve in the output. Click the Add fieldset row icon and specify one column name in each row.
Output field (Default: None) | String/Expression/Suggestion | Click the Suggest icon and select a column name from the suggested list of column names in the table specified above.
Fetch Output Fields In Schema (Default: Not selected) | Checkbox | Select this checkbox to retrieve schema information from the table for only the output fields specified above. If not selected, the Snap retrieves schema information for all fields available in the table.
Pass through (Default: Selected) | Checkbox | Select this checkbox to include the input document as-is under the original key in the output document.
Ignore empty result (Default: Not selected) | Checkbox | Select this checkbox to prevent the Snap from writing any document to its output view when no records are retrieved for the specified Snap configuration.
Match data types (Default: Not selected) | Checkbox | With at least one output field selected in the Output fields fieldset, select this checkbox to match the data types of the columns with those retrieved when no output field is specified (as in a SELECT * query).
Number of Retries (Default: 0; Example: 3; Minimum: 0) | Integer | Specify the maximum number of retry attempts the Snap makes when the read fails.
Retry Interval (seconds) (Default: 1; Example: 3; Minimum: 1) | Integer | Specify the minimum number of seconds the Snap must wait before each retry attempt.
Staging Mode (Default: In Memory; Example: On disk) | Dropdown list | Select the location where in-flight records are stored while the Retry option is in use. The available options are On disk (the records are stored on disk) and In Memory (the records are stored in internal memory).
Manage Queued Queries (Default: Continue to execute queued queries when pipeline is stopped or if it fails; Example: Cancel queued queries when pipeline is stopped or if it fails) | Dropdown list | Select whether the Snap should continue or cancel the execution of queued Databricks SQL queries when you stop the Pipeline.
Snap Execution (Default: Validate & Execute) | Dropdown list | Select one of the three modes in which the Snap executes: Validate & Execute (performs limited execution during Pipeline validation and full execution during Pipeline runtime), Execute only (performs full execution during Pipeline runtime without generating preview data during validation), or Disabled (disables the Snap and all Snaps downstream from it).
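Taken together, the Database name, Table name, Where clause, Order by, Limit rows, and Output fields settings determine the SELECT statement the Snap runs. As a sketch, a configuration that populates all of these fields would produce a query of roughly this shape; every name and value below is hypothetical:

```sql
-- Hypothetical values, annotated with the Snap setting each part maps to.
SELECT first_name, last_name, salary   -- Output fields
FROM hr_db.employees                   -- Database name and Table name
WHERE salary > 50000                   -- Where clause
ORDER BY last_name                     -- Order by: Column names
LIMIT 100;                             -- Limit rows
```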
Troubleshooting
Error | Reason | Resolution |
---|---|---|
Missing property value | You have not specified a value for the required field where this message appears. | Ensure that you specify valid values for all required fields. |
Examples
Inserting data from a Databricks table into a Snowflake table
Consider a scenario where we want to load the data available in a Databricks table into a new table in a Snowflake instance, keeping the table schema intact. This example demonstrates how we can use the Databricks - Select Snap to achieve this result:
In this Pipeline, we configure the Databricks - Select Snap with the appropriate account, zero input views, and two output views: one to capture the row data and the other to capture the table schema (as metadata). See the sample configuration of the Snap and its account below:
Screenshots: Databricks - Select Snap Settings and Snap Views.
Account Settings for Databricks - Select Snap | Notes
---|---
(screenshot) | We configure the Snap to fetch the table schema (output1) and the table data (rows) from the example_company_employees table in the DLP instance. The Snap uses a Databricks JDBC driver and the corresponding basic auth credentials to connect to the Databricks instance hosted on the AWS cloud.
Upon validation, the Snap displays the table schema and a sample set of records from the specified table in its two output views. When the Pipeline runs, the Snap retrieves all records from the example_company_employees table that match the Snap's configuration (WHERE conditions, LIMIT, and so on).
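With only the database and table name configured, the effective query for this example resembles the following; the database name is hypothetical, and any WHERE clause or row limit in the Snap settings would narrow the result accordingly:

```sql
-- Sketch of the effective query for this example; the database name
-- is hypothetical, the table name comes from the example above.
SELECT * FROM default.example_company_employees;
```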
Screenshots: Databricks - Select Snap output views for the table data and the table schema.
Similarly, we use a Snowflake - Insert Snap from the Snowflake Snap Pack with the appropriate account and two input views to consume the two outputs of the Databricks - Select Snap. See the sample configuration of the Snap and its account below:
Screenshots: Snowflake - Insert Snap Settings and Snap Views.
Account Settings for Snowflake - Insert Snap | Notes
---|---
(screenshot) | We configure the Snap to consume the table schema (output1) and the table data (rows) coming from the example_company_employees table in the DLP instance. The Snap uses a Snowflake JDBC driver and the corresponding basic auth credentials to connect to the target Snowflake instance hosted on the AWS cloud. The Snap is configured to perform the Insert operation only during Pipeline execution; hence, it does not display any results during validation. Upon Pipeline execution, the Snap creates a new example_company_employees table in the specified Snowflake database using the schema from the Databricks - Select Snap and populates this new table with the retrieved rows.
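Conceptually, the work the Snowflake - Insert Snap performs in this example amounts to creating the target table from the schema metadata and then inserting the retrieved rows, along the lines of the statements below. The column definitions and values are hypothetical; the Snap derives the actual ones from its two input views:

```sql
-- Conceptual sketch only; the columns and values shown are hypothetical.
CREATE TABLE example_company_employees (
    employee_id INTEGER,
    first_name  VARCHAR,
    last_name   VARCHAR,
    salary      NUMBER
);

INSERT INTO example_company_employees (employee_id, first_name, last_name, salary)
VALUES (1, 'Jane', 'Doe', 75000);
```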