...

Known Issues

None.

Snap Views

Type: Input
Format: Document
Number of Views: Min: 0, Max: 1
Examples of Upstream and Downstream Snaps: Mapper, Copy, Databricks - Bulk Load
Description: A JSON document containing the input values for the Snap’s configuration.

Type: Output
Format: Document
Number of Views: Min: 1, Max: 2
Examples of Upstream and Downstream Snaps: Databricks - Insert, Mapper, Router
Description: The default output view contains the data fetched by running the SELECT statement on the source location.

A second view can be added to show the metadata for the table as a Document. The metadata document can then be fed into the second input view of the Databricks - Merge Into or Bulk Load Snap so that the table is created in Databricks with a schema similar to that of the source table.

Error

Error handling is a generic way to handle errors without losing data or failing the Snap execution. You can handle errors that the Snap might encounter while running the Pipeline by choosing one of the following options from the When errors occur list under the Views tab:

  • Stop Pipeline Execution: Stops the current pipeline execution when the Snap encounters an error.

  • Discard Error Data and Continue: Ignores the error, discards that record, and continues with the rest of the records.

  • Route Error Data to Error View: Routes the error data to an error view without stopping the Snap execution.

Learn more about Error handling in Pipelines.

Snap Settings

Info
  • Asterisk (*): Indicates a mandatory field.

  • Suggestion icon: Indicates a list that is dynamically populated based on the configuration.

  • Expression icon: Indicates whether the value is an expression (if enabled) or a static value (if disabled). Learn more about Using Expressions in SnapLogic.

  • Add icon: Indicates that you can add fields in the fieldset.

  • Remove icon: Indicates that you can remove fields from the fieldset.

Field Name

Field Type

Description

Label*

Default Value: Databricks - Select
Example: DLP Select

String

The name for the Snap. You can modify this to be more specific, especially if you have more than one of the same Snap in your Pipeline.

Database name

Default Value: N/A
Example: SalesDB

String/Expression/Suggestion

Enter your DLP database name for the SELECT statement to read the data from.

Table name*

Default Value: N/A
Example: leads_to_sales_conversion

String/Expression/Suggestion

Enter your table name for the SELECT statement to read the data from.

Where clause

Default Value: N/A
Example: LEAD_CITY='Delaware'

String/Expression/Suggestion

Specify the WHERE clause (condition) for the SELECT statement to apply while retrieving the data from the DLP table.

Order by

You can use this fieldset to define the order in which the retrieved rows are populated in the output. Click the Add fieldset row icon and specify one column name in each row to define your sort order.

Column names

Default Value: None.
Example: LEAD_REGION

String/Expression/Suggestion

Click the Suggest icon and select a column name from the suggested list of column names in the table specified above.

Info

If you do not specify a sort order in this fieldset, the Snap returns data in the same order as the DLP table.

Limit rows

Default Value: None.
Example: 100

String/Expression

Specify the number of rows of data you want the Snap to retrieve from the table.

Info

If you do not specify a limit in this fieldset, the Snap returns all the data in the DLP table during execution. See the Snap Execution field below for the limits that apply during Pipeline validation.

Output fields

You can use this fieldset to define the list of columns to be retrieved in the output. Click the Add fieldset row icon and specify one column name in each row.

Output field

Default Value: None.
Example: LEAD_EXP_VALUE

String/Expression/Suggestion

Click the Suggest icon and select a column name from the suggested list of column names in the table specified above.

Info

If you do not specify any output fields, the Snap returns data from all the columns in the DLP table.
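Taken together, the Database name, Table name, Where clause, Order by, Limit rows, and Output fields settings compose the SELECT statement that the Snap runs against DLP. The Python sketch below illustrates that composition using the example values from the fields above; it is an illustration only, and the `build_select` function is hypothetical, not SnapLogic's internal implementation:

```python
def build_select(table, database=None, fields=None, where=None,
                 order_by=None, limit=None):
    """Compose the SELECT statement implied by the Snap settings.

    Hypothetical illustration of how the settings map to SQL; this is
    not the Snap's actual internal logic.
    """
    columns = ", ".join(fields) if fields else "*"     # Output fields
    target = f"{database}.{table}" if database else table
    sql = f"SELECT {columns} FROM {target}"
    if where:                                          # Where clause
        sql += f" WHERE {where}"
    if order_by:                                       # Order by
        sql += " ORDER BY " + ", ".join(order_by)
    if limit is not None:                              # Limit rows
        sql += f" LIMIT {limit}"
    return sql

# Using the example values shown in the fields above:
print(build_select(
    table="leads_to_sales_conversion",
    database="SalesDB",
    fields=["LEAD_EXP_VALUE"],
    where="LEAD_CITY='Delaware'",
    order_by=["LEAD_REGION"],
    limit=100,
))
# → SELECT LEAD_EXP_VALUE FROM SalesDB.leads_to_sales_conversion WHERE LEAD_CITY='Delaware' ORDER BY LEAD_REGION LIMIT 100
```

Leaving Output fields empty corresponds to `SELECT *`, and omitting the optional settings drops the matching clauses.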

Fetch Output Fields In Schema

Default Value: Not selected
Example: Selected

Checkbox

Select this checkbox to retrieve schema information from the table for only the output fields specified above. If not selected, the Snap retrieves schema information from all fields available in the table.

Pass through

Default Value: Selected
Example: Not selected

Checkbox

Select this checkbox to include the input document as-is under the original key of this Snap's output.

Ignore empty result

Default Value: Not selected
Example: Selected

Checkbox

Select this checkbox to prevent the Snap from writing any document to its output view when no records are retrieved for the specified Snap configuration.

Info

When no records are retrieved and the Pass through checkbox is selected, the Snap output contains the input document in the original key.

Match data types

Default Value: Not selected
Example: Selected

Checkbox

Select this checkbox (with at least one field specified in the Output fields fieldset) to match the data types of the retrieved columns with those returned when no output field is specified (as in SELECT * FROM <specified_table>). The output preview then appears in the same format as when SELECT * FROM <specified_table> is run and all the contents of the table are displayed.

Number of Retries

Default value: 0

Example: 3

Minimum value: 0

Integer

Specifies the maximum number of retry attempts the Snap makes when it fails to read the data.

If the value is larger than 0, the Snap first downloads the target file into a temporary local file. If any error occurs during the download, the Snap waits for the time specified in the Retry interval and attempts to download the file again from the beginning. When the download is successful, the Snap streams the data from the temporary file to the downstream Pipeline. All temporary local files are deleted when they are no longer needed.

Ensure that the local drive has sufficient free disk space to store the temporary local file.
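The retry behavior described above amounts to a bounded retry loop with a fixed wait between attempts. The sketch below is a simplified illustration, not the Snap's actual code; `fetch` stands in for the download of the result set:

```python
import time

def fetch_with_retries(fetch, retries=0, retry_interval=1):
    """Call fetch(); on failure, wait retry_interval seconds and retry,
    up to `retries` additional attempts.

    Illustrative sketch of the Number of Retries and Retry Interval
    settings; `fetch` is a stand-in for the Snap's download step.
    """
    attempt = 0
    while True:
        try:
            return fetch()          # stream the data once the download succeeds
        except Exception:
            if attempt >= retries:  # no retries left: surface the error
                raise
            attempt += 1
            time.sleep(retry_interval)
```

With the default Number of Retries of 0, a single failure is raised immediately; with a value of 3, the download is attempted up to four times in total.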

Retry Interval (seconds)

Default value: 1

Example: 3

Minimum value: 1

Integer

Specifies the minimum number of seconds the Snap must wait before each retry attempt.

Staging Mode

Default value: In Memory

Example: On disk

Dropdown list

Select the location where the in-flight records should be stored while using the Retry option. The available options are:

On disk: The records are stored on the disk.

In Memory: The records are stored in the internal memory.

Manage Queued Queries

Default value: Continue to execute queued queries when pipeline is stopped or if it fails

Example: Cancel queued queries when pipeline is stopped or if it fails

Dropdown list

Select whether the Snap should continue or cancel the execution of the queued Databricks SQL queries when you stop the Pipeline.

If you select Cancel queued queries when pipeline is stopped or if it fails, the read queries under execution are cancelled, whereas the write queries under execution are not. Databricks internally determines which queries are safe to cancel and cancels those queries.

Due to an issue with DLP, aborting an ELT Pipeline validation (with preview data enabled) causes only those SQL statements that retrieve data using bind parameters to get aborted while all other static statements (that use values instead of bind parameters) persist.

  • For example, select * from a_table where id = 10 will not be aborted while select * from test where id = ? gets aborted.

To avoid this issue, ensure that you always configure your Snap settings to use bind parameters inside its SQL queries.
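A bind parameter keeps the value out of the SQL text and supplies it at execution time. The contrast looks the same in any parameterized SQL client; the example below uses Python's built-in sqlite3 module purely for illustration (the Snap itself runs its queries through the Databricks JDBC driver):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE a_table (id INTEGER, name TEXT)")
conn.execute("INSERT INTO a_table VALUES (10, 'alpha')")

# Static statement: the literal value 10 is embedded in the SQL text.
static_rows = conn.execute("SELECT * FROM a_table WHERE id = 10").fetchall()

# Bind parameter: the value is passed separately and bound at execution time.
bound_rows = conn.execute("SELECT * FROM a_table WHERE id = ?", (10,)).fetchall()

# Both forms return the same result; per the note above, only the bound
# form is reliably aborted when a DLP Pipeline validation is stopped.
assert static_rows == bound_rows == [(10, "alpha")]
```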

Snap Execution

Default Value: Validate & Execute
Example: Validate & Execute

Dropdown list

Select one of the three modes in which the Snap executes. Available options are:

  • Validate & Execute: Performs limited execution of the Snap, and generates a data preview during Pipeline validation. Subsequently, performs full execution of the Snap (unlimited records) during Pipeline runtime.

  • Execute only: Performs full execution of the Snap during Pipeline execution without generating preview data.

  • Disabled: Disables the Snap and all Snaps that are downstream from it.

Troubleshooting

Error

Reason

Resolution

Missing property value

You have not specified a value for the required field where this message appears.

Ensure that you specify valid values for all required fields.

Examples

Inserting data from a Databricks table into a Snowflake table

...

In this Pipeline, we configure the Databricks - Select Snap with the appropriate account, zero input views, and two output views: one for capturing the row data and the other for capturing the table schema (as metadata). See the sample configuration of the Snap and its account below:

Databricks - Select Snap Settings

Databricks - Select Snap Views


Account Settings for Databricks - Select Snap


We configure the Snap to fetch the table schema (output1) and the table data (rows) from the example_company_employees table in the DLP instance.

The Snap uses a Databricks JDBC driver and the corresponding basic auth credentials to connect to the Databricks instance hosted on the AWS cloud.

Upon validation, the Snap displays the table schema and a sample set of records from the specified table in its two output views. During runtime, the Snap retrieves all data from the example_company_employees table that match the Snap’s configuration (WHERE conditions, LIMIT, and so on).

Databricks - Select Snap Output View for Table data

Databricks - Select Snap Output View for Table Schema


Similarly, we use a Snowflake - Insert Snap from the Snowflake Snap Pack with the appropriate account and two input views to consume the two outputs coming from the Databricks - Select Snap. See the sample configuration of the Snap and its account.

Snowflake - Insert Snap Settings

Snowflake - Insert Snap Views


Account Settings for Snowflake - Insert Snap


We configure the Snap to consume the table schema (output1) and the table data (rows) from the example_company_employees table in the DLP instance.

The Snap uses a Snowflake JDBC driver and the corresponding basic auth credentials to connect to the Snowflake (target) instance hosted on the AWS cloud.

The Snap is configured to perform the Insert operation only during Pipeline execution, and hence it does not display any results during validation. Upon Pipeline execution, the Snap creates a new table example_company_employees in the specified Snowflake database using the schema from the Databricks - Select Snap and populates the new table with the retrieved rows.

Download this Pipeline

Downloads

...