Databricks Account (Source: DBFS)

Overview

You can use this account type to connect Databricks Snaps with data sources that use a Databricks Account with a DBFS location as the source.

Prerequisites

  • A valid Databricks account.

  • Certified JDBC JAR File: databricks-jdbc-2.6.25-1.jar

Limitations and Known Issues

None.

Account Settings

 

  • Asterisk ( * ): Indicates a mandatory field.

  • Suggestion icon: Indicates a list that is dynamically populated based on the configuration.

  • Expression icon: Indicates whether the value is an expression (if enabled) or a static value (if disabled). Learn more about Using Expressions in SnapLogic.

  • Add icon: Indicates that you can add fields in the fieldset.

  • Remove icon: Indicates that you can remove fields from the fieldset.

Field Name

Field Type

Field Dependency

Description

Label*

 

Default Value: N/A
Example: STD DB Acc DeltaLake AWS ALD

String

None.

Specify a unique label for the account.

 

Account Properties*

Use this fieldset to configure the information required to establish a JDBC connection with the account.

This fieldset consists of the following fields:

  • Download JDBC Driver Automatically

  • JDBC URL

  • Use Token Based Authentication

  • Token

  • Database Name

  • Source/Target Location

  • DBFS Folder path (source for loading Databricks table)

Download JDBC Driver Automatically

 

Default Value: Not Selected

Example: Selected

Checkbox

None.

Select this checkbox to allow the Snap account to download the certified JDBC Driver for the Databricks Lakehouse Platform (DLP). The following fields are disabled when this checkbox is selected:

  • JDBC JAR(s) and/or ZIP(s): JDBC Driver

  • JDBC driver class

To use a JDBC Driver of your choice, clear this checkbox, upload the required JAR files to SLDB, and select them in the JDBC JAR(s) and/or ZIP(s): JDBC Driver field.

Use of Custom JDBC JAR version

You can use a JAR file version other than the recommended versions.

Spark JDBC and Databricks JDBC

If you do not select this checkbox and use an older JDBC JAR file (older than version 2.6.25), ensure that you use: 

  • The old format JDBC URL ( jdbc:spark:// ) instead of the new one ( jdbc:databricks:// )

    • For a JDBC driver prior to version 2.6.25, the JDBC URL starts with jdbc:spark://

    • For a JDBC driver version 2.6.25 or later, the JDBC URL starts with jdbc:databricks://

  • The older JDBC Driver Class com.simba.spark.jdbc.Driver instead of the new com.databricks.client.jdbc.Driver.
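The version rule above can be summarized as a small lookup. The following is a minimal sketch for illustration only and is not part of the Snap: the function name `jdbc_settings_for_driver` and the version parsing are assumptions, while the URL prefixes, driver class names, and the 2.6.25 cutoff come from the list above.

```python
def jdbc_settings_for_driver(version: str) -> dict:
    """Return the JDBC URL prefix and driver class for a driver version.

    Hypothetical helper: only the prefixes, class names, and the 2.6.25
    cutoff are taken from the documentation text.
    """
    # Drop a build suffix such as "-1" (for example, "2.6.25-1" -> "2.6.25").
    base = version.split("-")[0]
    parts = tuple(int(p) for p in base.split("."))
    if parts >= (2, 6, 25):
        return {
            "url_prefix": "jdbc:databricks://",
            "driver_class": "com.databricks.client.jdbc.Driver",
        }
    return {
        "url_prefix": "jdbc:spark://",
        "driver_class": "com.simba.spark.jdbc.Driver",
    }


new_driver = jdbc_settings_for_driver("2.6.25-1")
old_driver = jdbc_settings_for_driver("2.6.22")
```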

JDBC URL*

 

Default Value: N/A

Example: jdbc:spark://adb-2409532680880038.18.azuredatabricks.net:443/default;transportMode=http;ssl=1;httpPath=sql/protocolv1/o/2409532680880038/0326-212833-drier754;AuthMech=3;

String

None.

Enter the JDBC connection string for connecting to your DLP instance, using the syntax shown below. Learn more in Microsoft's JDBC and ODBC drivers and configuration parameters.

jdbc:spark://dbc-ede87531-a2ce.cloud.databricks.com:443/default;transportMode=http;ssl=1;httpPath=sql/protocolv1/o/6968995337014351/0521-394181-guess934;AuthMech=3;UID=token;PWD=<personal-access-token>
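To make the parts of this connection string easier to see, the sketch below assembles the same URL from its components. It is an illustration under stated assumptions: `build_jdbc_url` and its parameter names are hypothetical, the host and HTTP path are the sample values from the syntax above, and the token is left as a placeholder.

```python
def build_jdbc_url(host: str, http_path: str, token: str,
                   prefix: str = "jdbc:spark://",
                   port: int = 443, schema: str = "default") -> str:
    """Assemble a JDBC URL in the format shown in the syntax above.

    Hypothetical helper; the property names and their order mirror the
    documented sample URL.
    """
    return (
        f"{prefix}{host}:{port}/{schema}"
        ";transportMode=http;ssl=1"
        f";httpPath={http_path}"
        f";AuthMech=3;UID=token;PWD={token}"
    )


url = build_jdbc_url(
    host="dbc-ede87531-a2ce.cloud.databricks.com",
    http_path="sql/protocolv1/o/6968995337014351/0521-394181-guess934",
    token="<personal-access-token>",  # placeholder, not a real token
)
print(url)
```

For a driver version 2.6.25 or later, pass `prefix="jdbc:databricks://"` instead of the default.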

Use Token Based Authentication

 

Default value: Selected

Example: Not selected

Checkbox

None.

Select this checkbox to use token-based authentication for connecting to the target database (DLP) instance. Activates the Token field.

Token*

 

Default value: N/A

Example: <Encrypted>

String

The Use Token Based Authentication checkbox is selected.

Enter the token value for accessing the target database/folder path.

 

Database name*

 

Default value: N/A

Example: Default

String

None.

Enter the name of the database to use by default. This database is used if you do not specify one in the Databricks Select or Databricks Insert Snaps.

 

Source/Target Location*

 

Default value: DBFS

Example: JDBC

Dropdown list

None.

Select the source or target data warehouse into which the queries must be loaded. For this account type, select DBFS. Selecting DBFS activates the DBFS Folder path (source for loading Databricks table) field.

DBFS Folder path (source for loading Databricks table)

 

Default value: N/A

Example: /data_folder/path

String

Source/Target Location is DBFS.

Enter the folder path from which the source files are loaded. The path must begin with a forward slash (/).
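The leading-slash rule can be checked before saving the account. This is a minimal sketch only: `is_valid_dbfs_folder_path` is a hypothetical helper, and it checks nothing beyond the one rule stated above.

```python
def is_valid_dbfs_folder_path(path: str) -> bool:
    """Return True if the path begins with a forward slash.

    Hypothetical check; it does not verify that the folder exists in DBFS.
    """
    return path.startswith("/")


print(is_valid_dbfs_folder_path("/data_folder/path"))
print(is_valid_dbfs_folder_path("data_folder/path"))
```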

Advanced Properties

Use these properties to specify additional parameters for configuring the account.

URL properties

Use this fieldset to define the account parameter's name and its corresponding value. Click + to add the parameters and the corresponding values. Add each URL property-value pair in a separate row. 

URL property name

 

Default Value: N/A

Example: queryTimeout

N/A

None

Specify the name of the parameter for the URL property.

 

URL property value

 

Default Value: N/A

Example: 0

N/A

None

Specify the value for the URL property parameter.
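This page does not state how the account applies URL property pairs. One plausible reading, assuming they end up as semicolon-separated name=value pairs like the other settings in the sample JDBC URL, is sketched below. The helper `append_url_properties` and the shortened sample URL are hypothetical.

```python
def append_url_properties(jdbc_url: str, properties: dict) -> str:
    """Append name=value pairs to a JDBC URL, separated by semicolons.

    Assumption: URL properties are applied as ;name=value pairs. The
    actual behavior of the account is not described in this document.
    """
    base = jdbc_url.rstrip(";")  # avoid a doubled separator
    pairs = "".join(f";{name}={value}" for name, value in properties.items())
    return base + pairs


# Shortened, made-up URL used only to keep the example readable.
result = append_url_properties(
    "jdbc:spark://example-host:443/default;AuthMech=3;",
    {"queryTimeout": 0},
)
print(result)
```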

 

Batch size*

 

Default Value: N/A

Example: 3

Integer

None

Specify the number of queries that you want to execute at a time.

  • If the Batch size is one, the query is executed as-is; that is, the Snap skips batching (non-batch execution).

  • If the Batch size is greater than one, the Snap performs the regular batch execution.
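The effect of the two cases can be pictured as grouping queries into chunks. The sketch below is only an illustration of that grouping; `plan_batches` is a hypothetical function, and how the Snap batches queries internally is not described here.

```python
from typing import List


def plan_batches(queries: List[str], batch_size: int) -> List[List[str]]:
    """Group queries into chunks of at most batch_size items.

    With batch_size 1, every query ends up alone, which corresponds to
    non-batch execution. Hypothetical illustration, not the Snap's code.
    """
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    return [queries[i:i + batch_size]
            for i in range(0, len(queries), batch_size)]


queries = ["q1", "q2", "q3"]
single = plan_batches(queries, 1)   # each query on its own
batched = plan_batches(queries, 2)  # groups of up to two queries
```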

Fetch size*

 

Default Value: 100

Example: 12

Integer

None

Specify the number of rows a query must fetch for each execution.

Larger values could cause the server to run out of memory.

Min pool size*

 

Default Value: 3

Example: 0

Integer

None

Specify the minimum number of idle connections that you want the pool to maintain at a time. 

 

Max pool size*

 

Default Value: 15

Example: 0

Integer

None

Specify the maximum number of connections that you want the pool to maintain at a time.

 

Max life time*

 

Default Value: 60

Example: 50

Integer

None

Specify the maximum lifetime of a connection in the pool, in seconds:

  • Ensure that the value you enter is a few seconds shorter than any database or infrastructure-imposed connection time limit.

  • 0 (zero) indicates an infinite lifetime, subject to the Idle Timeout value.

  • An in-use connection is never retired. Connections are removed only after they are closed.

Minimum value: 0
Maximum value: No limit

Idle Timeout*

 

Default Value: 5

Example: 4

Integer

None

Specify the maximum amount of time in seconds that a connection is allowed to sit idle in the pool. 

0 (zero) indicates that idle connections are never removed from the pool.

Minimum value: 0
Maximum value: No limit

Checkout timeout*

 

Default Value: 10000

Example: 9000

Integer

None

Specify the maximum time in milliseconds you want the system to wait for a connection to become available when the pool is exhausted.

Minimum value: 0
Maximum value: No limit
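The connection pool fields and their defaults can be collected in one structure and checked together. The following is a minimal sketch: the `PoolSettings` class and its field names are hypothetical, the defaults and the minimum value of 0 come from the field descriptions above, and the check that the minimum pool size does not exceed the maximum is an assumption that this page does not state.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class PoolSettings:
    """Hypothetical container for the pool-related account fields."""
    min_pool_size: int = 3
    max_pool_size: int = 15
    max_life_time: int = 60         # seconds; 0 means infinite lifetime
    idle_timeout: int = 5           # seconds; 0 means never removed
    checkout_timeout: int = 10000   # milliseconds

    def problems(self) -> List[str]:
        """Return a list of human-readable issues; empty if none found."""
        issues = []
        for name in ("min_pool_size", "max_pool_size", "max_life_time",
                     "idle_timeout", "checkout_timeout"):
            if getattr(self, name) < 0:
                issues.append(f"{name} must be 0 or greater")
        # Assumption: a minimum above the maximum is treated as an error.
        if self.min_pool_size > self.max_pool_size:
            issues.append("min_pool_size exceeds max_pool_size")
        return issues


defaults = PoolSettings()
print(defaults.problems())
```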

Snap Pack History