...

Field Name

Field Type

Field Dependency

Description

Label*

Default Value: Databricks - Bulk Load
Example: Db_BulkLoad_FromS3

String

None.

The name for the Snap. You can modify this to be more specific, especially if you have more than one of the same Snap in your Pipeline.

Database name

Default Value: None.
Example: cust_db

String/Expression/Suggestion

None.

Enter the name of the database in which the target table exists. Leave this blank if you want to use the database name specified in the Database Name field in the account settings.

Table Name*

Default Value: None.
Example: cust_records

String/Expression/Suggestion

None.

Enter the name of the table in which you want to perform the bulk load operation. 

Source Type

Default Value: Cloud Storage File
Example: Input View

Dropdown list

None.

Select the type of source from which you want to load the data into your DLP instance. The available options are:

  • Cloud Storage File. A file from a cloud location like AWS S3, Azure, or GCS. You can configure a series of options for the bulk load operation as described in this document.

  • Input View. A JSON file coming from the preceding Snap’s output. You need to specify only the Load action.

  • JDBC. A table in another database that can be connected to using a JDBC connector. You can specify the Source table name to load the data from or the Target Table Columns to replace the existing target table with a new one.

Load action*

Default Value: Drop and create table
Example: Append rows to existing table

Dropdown list

None.

Select the load action to perform on the target table for this bulk load operation. The available options are:

  • Drop and create table. Removes the existing table in the specified database and creates a new table with the schema defined in the Snap (Target Table Columns), derived from the Input View, or taken from a JDBC-connected database table.

  • Append rows to existing table. Inserts new rows of data into an existing target table.

Source table name

String

Source Type is JDBC.

Enter the source table name. If you do not specify the database in this field, the Snap uses the default database configured in the Snap’s account for the JDBC Account type.

Target Table Columns

Source Type is Cloud Storage file or JDBC and Load action is Drop and create table.

Use this field set to specify the target table schema for creating a new table. Specify the Column Name and Data Type for as many columns as you need to load in the target table.

Column

Default Value: None.
Example: cust_ID

String

None.

Enter the name of the column that you want to load in the target table.

Data Type

Default Value: None.
Example: int, string

String

None.

Enter the data type of the values in the specified column.

File format type

Default Value: CSV
Example: PARQUET

Dropdown list

Source Type is Cloud Storage file.

Select the file format of the source data file. It can be CSV, JSON, ORC, PARQUET, or TEXT.

File Format Option List

Source Type is Cloud Storage file.

You can use this field set to choose the file format options to associate with the bulk load operation, based on your source file format. Choose one file format option in each row.

File format option

Default Value: None.
Example: cust_ID

String/Expression/Suggestion

Source Type is Cloud Storage file.

Select a file format option from the available options and set appropriate values to suit your bulk load needs, without affecting the syntax displayed in this field.

Files provider

Default Value: File list
Example: pattern

Dropdown list

Source Type is Cloud Storage file.

Select how you want to specify the list of source files: File list or pattern. The fields that follow change based on your selection: the File list field set appears for File list, and the File pattern field appears for pattern.

File list

Source Type is Cloud Storage file and Files provider is File list.

You can use this field set to specify the file paths to be used for the bulk load operation. Choose one file path in each row.

File

Default Value: None.
Example: cust_data.csv

String

Source Type is Cloud Storage file and Files provider is File list.

Enter the path of the file to be used for the bulk upload operation.

File pattern

Default Value: None.
Example: folder1/file_[a-g].csv

String/Expression

Source Type is Cloud Storage file and Files provider is pattern.

Enter the regex pattern to use to match the file name and/or absolute path. You can specify this as a regular expression pattern string, enclosed in single quotes. Learn more: Examples of COPY INTO (Delta Lake on Databricks) for DLP.

Encryption type

Default Value: None.
Example: Server-Side KMS Encryption

String

Source Type is Cloud Storage file.

Select the encryption type that you want to use for the loaded data and/or files.

Info

Server-side encryption is available only for S3 accounts.

KMS key

Default Value: None.
Example: MF96D-M9N47-XKV7X-C3GCQ-G5349

String/Expression

Source Type is Cloud Storage file.

Enter the KMS key to use to encrypt the files. If your source files are in S3, see Loading encrypted files from Amazon S3 for more details.

Snap Execution

Default Value: Execute only
Example: Validate & Execute

Dropdown list

None.

Select one of the three modes in which the Snap executes. Available options are:

  • Validate & Execute: Performs limited execution of the Snap, and generates a data preview during Pipeline validation. Subsequently, performs full execution of the Snap (unlimited records) during Pipeline runtime.

  • Execute only: Performs full execution of the Snap during Pipeline execution without generating preview data.

  • Disabled: Disables the Snap and all Snaps that are downstream from it.
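
To see how the settings above relate to each other, the following Python sketch assembles SQL text from values like those in the examples: a CREATE TABLE statement from the Target Table Columns rows, and a Databricks COPY INTO statement from the Table Name, File format type, File format option, and File list or File pattern values. This is an illustration only and is not necessarily what the Snap issues internally. The function names create_table_sql and copy_into_sql, the s3://example-bucket/data location, and the header option are assumptions made for this example.

```python
from typing import Dict, Optional, Sequence, Tuple


def create_table_sql(table: str, columns: Sequence[Tuple[str, str]]) -> str:
    """Build a CREATE TABLE statement from (Column, Data Type) rows.

    Hypothetical helper: mirrors the Target Table Columns field set.
    """
    cols = ", ".join(f"{name} {dtype}" for name, dtype in columns)
    return f"CREATE TABLE {table} ({cols})"


def copy_into_sql(
    table: str,
    source: str,
    file_format: str,
    files: Optional[Sequence[str]] = None,
    pattern: Optional[str] = None,
    format_options: Optional[Dict[str, str]] = None,
) -> str:
    """Build a COPY INTO statement from bulk load settings.

    Hypothetical helper: 'files' stands for the File list field set and
    'pattern' for the File pattern field. Only one of them may be given,
    matching the Files provider choice.
    """
    if files and pattern:
        raise ValueError("Specify either a file list or a pattern, not both")

    parts = [
        f"COPY INTO {table}",
        f"FROM '{source}'",
        f"FILEFORMAT = {file_format}",
    ]
    if files:
        quoted = ", ".join(f"'{f}'" for f in files)
        parts.append(f"FILES = ({quoted})")
    if pattern:
        # The pattern string is passed through unchanged.
        parts.append(f"PATTERN = '{pattern}'")
    if format_options:
        opts = ", ".join(f"'{k}' = '{v}'" for k, v in format_options.items())
        parts.append(f"FORMAT_OPTIONS ({opts})")
    return "\n".join(parts)


if __name__ == "__main__":
    # Load action: Drop and create table, with two target columns.
    print(create_table_sql("cust_records", [("cust_ID", "int"), ("name", "string")]))
    # Files provider: File list, with one file and one format option.
    print(
        copy_into_sql(
            "cust_records",
            "s3://example-bucket/data",  # assumed location, not from this page
            "CSV",
            files=["cust_data.csv"],
            format_options={"header": "true"},
        )
    )
```

Keeping files and pattern as separate, mutually exclusive arguments reflects the Files provider setting, where you choose one way of naming the source files for a given load.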

...

Error

Reason

Resolution

Missing property value

You have not specified a value for the required field where this message appears.

Ensure that you specify valid values for all required fields.

Examples

Excluding Fields from the Input Data Stream

We can exclude the unrequired fields from the input data stream by omitting them in the Input schema field set. This example demonstrates how we can use the <Snap Name> to achieve this result:

...

Bulk Loading of Employee data from a CSV file into a DLP instance

<Work in Progress>

Download this Pipeline. 

Downloads

...


Snap Pack History


...