Support for Ultra Pipelines

Works in Ultra Pipelines. However, we recommend that you not use this Snap in an Ultra Pipeline.

Limitations

The special character '~' is not supported in the temp directory name on Windows, because it is reserved for the user's home directory.
Snowflake provides the option to use Cross Account IAM in external staging. You can adopt cross-account access through the Storage Integration option. With this setup, you do not need to pass any credentials around, and you access the storage using only the named stage or integration object. Learn more: Configuring Cross Account IAM Role Support for Snowflake Snaps.
The Snowflake Bulk Load Snap expects the columns coming from upstream Snaps to be in the same order as in the target table; otherwise, data validation fails.
If a Snowflake Bulk Load operation fails due to inadequate memory space on the JCC node when the Data source is Input view and the Staging location is Internal Stage, you can store the data on an external staging location (S3, Azure Blob, or GCS).
When the bulk load operation fails due to invalid input, the error view does not display the erroneous columns correctly when the input does not contain the default columns.
This is a bug in Snowflake and is being tracked under JIRA SNOW-662311 and JIRA SNOW-640676.

...

Type

Format

Number of Views

Examples of Upstream and Downstream Snaps

Description

Input

Document

Min: 0
Max: 2

JSON Generator
Binary to Document

Documents containing the data to be uploaded to the target location.

Info

Second Input View

This Snap has one document input view by default.

You can add a second input view for metadata for the table as a document so that when the target table is absent, this table metadata can be created in the database with a similar schema as the source table. This schema is usually from the second output of a database Select Snap. If the schema is from a different database, the data types might not be properly handled.

Learn more about adding metadata for the table in the second input view from the example Providing Metadata For Table Using The Second Input View.

Output

Document

Min: 0
Max: 1

Mapper
Snowflake Execute

If an output view is available, then the output document displays the number of input records and the status of the bulk upload as follows:

Error

Error handling is a generic way to handle errors without losing data or failing the Snap execution. You can handle the errors that the Snap might encounter when running the Pipeline by choosing one of the following options from the When errors occur list under the Views tab:

Stop Pipeline Execution: Stops the current pipeline execution if the Snap encounters an error.
Discard Error Data and Continue: Ignores the error, discards that record, and continues with the remaining records.
Route Error Data to Error View: Routes the error data to an error view without stopping the Snap execution.

Learn more about Error handling in Pipelines.

...

Field Name

Field Type

Field Dependency

Description

Label*

Default Value: Snowflake - Bulk Load
Example: Load Employee Tables

String

N/A

Specify the name for the instance. You can modify this to be more specific, especially if you have more than one of the same Snap in your Pipeline.

Schema Name

Default Value: N/A
Example: schema_demo

String/Expression/Suggestion

N/A

Specify the database schema name. If you do not specify a schema, the suggestions for Table Name list the tables from all schemas. This field is suggestible and retrieves the available database schemas when you request suggestions.

The values can be passed using the Pipeline parameters but not the upstream parameter.

Table Name*

Default Value: N/A
Example: employees_table

String/Expression/Suggestion

N/A

Specify the name of the table on which to execute the bulk load operation.

The values can be passed using the Pipeline parameters but not the upstream parameter.

Create table if not present

Default Value: Deselected

Checkbox

N/A

Select this checkbox to automatically create the target table if it does not exist.

The data type for the columns in the new table depends on the data type of the input in the upstream Snap. If a second input view exists, the data type for the columns is read from that input view.
Due to implementation details, a newly created table is not visible to subsequent database Snaps during runtime validation. If you want to use the newly updated data immediately, you must use a child Pipeline that is invoked through a Pipeline Execute Snap.
Create table if not present works only when Data source is set to Input view. When you select Create table if not present and Data source is set to Staged files, the Snowflake Bulk Load Snap throws the configuration exception "Failure: Invalid snap configuration", because this Snap does not support table creation when you upload existing files. In that case, create the table yourself if it does not exist, and retry.
If you provide a second input view and select Create table if not present when the target table does not exist, the target table is created only from the metadata of the second input view and not from the input document.

Info
This should not be used in production since there are no indexes or integrity constraints on any column and the default varchar() column is over 30k bytes.

Info
The create table operation fails if the table contains a geospatial data type column. Workaround: Create the table manually ahead of time, or use the Snowflake - Execute Snap to create the table.

Data source

Default Value: Input view
Example: Staged files

Dropdown list

N/A

Specify the source from which the data is loaded. The available options are Input view and Staged files.

When you select Input view, leave the Table Columns field empty. When you select Staged files, provide in Table Columns the names of the columns to which the records are to be added.

Preserve case sensitivity

Default Value: Deselected

Checkbox

N/A

Select this checkbox to preserve the case sensitivity of the column names.

If you do not select Preserve case sensitivity, the input documents are loaded to the target table if the key names in the input documents match the target table column names ignoring the case.
If you include a second input view, selecting Preserve case sensitivity has no effect on the column names of the target table, because the Snap uses the metadata from the second input view.
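
The matching rule described above can be illustrated with a minimal Python sketch. The function name match_columns and the sample document are hypothetical and are not part of the Snap. The sketch also assumes that keys without a matching column are simply dropped, which the text above does not state.

```python
def match_columns(document, table_columns, preserve_case=False):
    """Map input-document keys to target-table column names.

    Hypothetical helper for illustration only. Without preserve_case,
    keys are compared to column names ignoring case; with it, they must
    match exactly. Unmatched keys are dropped (an assumption).
    """
    normalize = (lambda name: name) if preserve_case else str.lower
    lookup = {normalize(col): col for col in table_columns}
    return {
        lookup[normalize(key)]: value
        for key, value in document.items()
        if normalize(key) in lookup
    }


row = {"id": 1, "FirstName": "Ada"}
print(match_columns(row, ["ID", "FIRSTNAME"]))
print(match_columns(row, ["ID", "FIRSTNAME"], preserve_case=True))
```

With case ignored, both keys map to their uppercase columns. With exact matching, neither key in this sample matches.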

Load empty strings

Default Value: Deselected

Checkbox

N/A

Select this checkbox to load empty string values in the input documents as empty strings to the string-type fields. Otherwise, empty string values in the input documents are loaded as null. Null values are loaded as null regardless.
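
As a rough illustration of this rule, the following Python sketch shows how a single string value would be treated under each setting. The helper name prepare_value is hypothetical, and None stands in for SQL NULL.

```python
def prepare_value(value, load_empty_strings=False):
    """Return the value as it would be loaded into a string-type field.

    Hypothetical helper: empty strings become None (NULL) unless
    load_empty_strings is set; None always stays None.
    """
    if value == "" and not load_empty_strings:
        return None
    return value


print(prepare_value(""))                            # checkbox deselected
print(repr(prepare_value("", load_empty_strings=True)))  # checkbox selected
```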

Truncate data

Default Value: Deselected

Checkbox

N/A

Select this checkbox to truncate existing data before performing the data load. With the Bulk Update Snap, a Bulk Insert would be faster than truncating and then updating.

Staging location

Default Value: Internal
Example: External

Dropdown list/Expression

N/A

Select the type of staging location that is to be used for data loading:

External: A location that is not managed by Snowflake. The location should be an AWS S3 bucket, a Microsoft Azure Storage Blob, or Google Cloud Storage. The credentials for this location are mandatory when validating the Account.
Internal: Location that is managed by Snowflake.

The Snap creates temporary files in JCC when the Staging location is internal and the Data source is input view. These temporary files are removed automatically once the Pipeline completes execution.

Info
If you want to delete the temporary files from the S3 Bucket, we recommend you assign the delete object permission policy to the S3 user to delete the files. Learn how to assign delete object permission to an S3 user in AWS S3. If you do not want to delete the temporary files, you can add an error view to the Snap and run the Pipeline.

Target

Default Value: N/A
Example: s3://test_bucket

String/Expression

N/A

Specify an internal or external location to load the data. If you select External for Staging Location, a staging area is created in Azure, GCS, or S3 as configured. Otherwise, a staging area is created in Snowflake's internal location.

This field accepts the following input:

Named Stage: The name for user-defined named stage. This should be used only when a Staging location is set as Internal.
Format: @<Schema>.<StageName>[/path]

Internal Path: The staging location represented by a path.
Format: @~/[path]

S3 Url: The external S3 URL that specifies an S3 storage bucket.
Format: s3://[path]

Microsoft Azure Storage Blob URL: The external URL required to connect to the Microsoft Azure Storage.
Folder Name: Anything else (including no input). This is regarded as a Folder name under the Internal Home Path (@~) if using internal staging or under the S3 bucket and folder specified in the Snowflake account.

When you use an expression in this field, provide its value as a Pipeline parameter. For performance reasons, the value cannot come from the upstream Snap.
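
The accepted Target formats can be told apart by their prefixes. The following Python sketch is only an illustration of that rule: the function classify_target is hypothetical and is not how the Snap is known to parse the field. Because the Microsoft Azure Storage Blob URL format is not given here, the sketch does not try to detect it, and such a value would fall through to the last case.

```python
def classify_target(target):
    """Classify a Target value by the formats listed for this field.

    Hypothetical helper. The internal path check comes first because
    '@~/...' also starts with '@'. Azure Blob URLs are not detected,
    since their format is not specified in this section.
    """
    value = (target or "").strip()
    if value.startswith("@~"):
        return "internal path"
    if value.startswith("@"):
        return "named stage"
    if value.lower().startswith("s3://"):
        return "s3 url"
    # Anything else, including no input, is treated as a folder name.
    return "folder name"


for sample in ["@demo_schema.my_stage/in", "@~/loads", "s3://test_bucket", ""]:
    print(repr(sample), "->", classify_target(sample))
```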

Storage Integration

Default Value: N/A
Example:

String/Expression

Appears when you select Staged files for Data source and External for Staging location.

Specify the pre-defined storage integration that is used to authenticate the external stages.

When you use an expression in this field, provide its value as a Pipeline parameter. For performance reasons, the value cannot come from the upstream Snap.

Staged file list

Use this field set to define the staged files to be loaded to the target table.

Staged file

String/Expression

Appears when you select Staged files for Data source.

Specify the staged file to be loaded to the target table.

File name pattern

Default Value: N/A

Example: .length

String/Expression

Appears when you select Staged files for Data source.

Specify a regular expression pattern string, enclosed in single quotes, that matches the file names and/or paths to load.
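
To get a feel for how such a pattern selects files, the following Python sketch filters a list of staged paths with a regular expression. This is an approximation under stated assumptions: the paths and the pattern are made up, Python's re module is not the same regex dialect as Snowflake's, and the sketch assumes the pattern must match the whole path.

```python
import re


def matching_files(paths, pattern):
    """Return the staged paths that the pattern matches in full.

    Hypothetical helper; full-match semantics are an assumption made
    for this illustration.
    """
    regex = re.compile(pattern)
    return [path for path in paths if regex.fullmatch(path)]


staged = ["sales/2024/jan.csv", "sales/2024/feb.csv.gz", "hr/employees.csv"]
print(matching_files(staged, r".*sales/.*[.]csv"))
```

In the Snap field itself, the same pattern would be entered enclosed in single quotes.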

File format object

Default Value: None

Example: jsonPath()

String/Expression/Suggestion

N/A

Specify an existing file format object to use for loading data into the table. The specified file format object determines the format type such as CSV, JSON, XML, AVRO, or other format options for data files.

File format type

Default Value: None
Example: CSV

Dropdown list

N/A

Select a predefined file format type to use for loading data into the table. The available file formats include CSV, JSON, XML, and AVRO.

File format option

Default value: N/A
Example: BINARY_FORMAT=UTF8

String/Expression

N/A

Specify the file format option. Separate multiple options by using blank spaces and commas.

Info

You can use various file format options including a binary format which passes through in the same way as other file formats. Learn more: File Format Type Options.

Before loading binary data into Snowflake, you must specify the binary encoding format, so that the Snap can decode the string type to binary types before loading it into Snowflake. You do this by specifying the following file format option:

BINARY_FORMAT=XXX (where XXX = HEX|BASE64|UTF8)

However, the file you upload and download must use the same format. For instance, if you load a file in HEX binary format, you should specify the HEX format for download as well.
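
The three encodings differ in how a string is turned into bytes. The following Python sketch illustrates this with the standard library; the function decode_binary is hypothetical and only mirrors the idea of decoding a string according to the chosen BINARY_FORMAT value.

```python
import base64


def decode_binary(text, binary_format):
    """Decode a string into bytes according to a BINARY_FORMAT value.

    Hypothetical helper for illustration; it is not the Snap's code.
    """
    fmt = binary_format.upper()
    if fmt == "HEX":
        return bytes.fromhex(text)
    if fmt == "BASE64":
        return base64.b64decode(text, validate=True)
    if fmt == "UTF8":
        return text.encode("utf-8")
    raise ValueError(f"Unsupported binary format: {binary_format}")


# The same four bytes, written in each of the three encodings.
print(decode_binary("536e6170", "HEX"))
print(decode_binary("U25hcA==", "BASE64"))
print(decode_binary("Snap", "UTF8"))
```

Decoding a HEX string as BASE64 (or the reverse) either fails or yields different bytes, which is why the format used for loading and for downloading should agree.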

When using external staging locations

When loading numeric data from staged files, you must provide file format options that suit your data.
Do not use the following file format options if you selected Input view in the Data source field.
- FIELD_DELIMITER
- RECORD_DELIMITER
- FIELD_OPTIONALLY_ENCLOSED_BY

Table Columns

Use this field set to specify the columns to be used in the COPY INTO command. This only applies when the Data source is Staged files.

Columns
Default value: None

String/Expression/Suggestion

N/A

Specify the table columns to use in the Snowflake COPY INTO query. This configuration is valid when the staged files contain a subset of the columns in the Snowflake table. For example, if the Snowflake table contains columns A, B, C, and D, and the staged files contain columns A and D then the Table Columns field would have two entries with values A and D. The order of the entries should match the order of the data in the staged files.
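
As an illustration of the A and D example, the following Python sketch assembles a COPY INTO statement with an explicit column list. The helper copy_into_statement, the table name MY_TABLE, and the stage @mystage are hypothetical, and the statement the Snap actually issues may contain further clauses.

```python
def copy_into_statement(table, stage, columns=()):
    """Build a simplified COPY INTO statement with an optional column list.

    Hypothetical helper: the column order is kept exactly as given,
    because it has to match the order of the data in the staged files.
    """
    column_list = f" ({', '.join(columns)})" if columns else ""
    return f"COPY INTO {table}{column_list} FROM {stage}"


print(copy_into_statement("MY_TABLE", "@mystage", ["A", "D"]))
```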

If the Data source is Input view, the Snap displays the following error:

Select Query

Default Value: N/A
Example:
select substr(t.$2,4), t.$1, t.$5, t.$4
from @mystage t

String/Expression

Appears when the Data source is Staged files.

Specify the SELECT query to transform data before loading it into the Snowflake database.

The SELECT statement transform option enables querying the staged data files by either reordering the columns or loading a subset of table data from a staged file. For example, select $1:location, $1:dimensions.sq_ft, $1:sale_date, $1:price from @mystage/sales.json.gz t
This query loads the file sales.json.gz from the internal stage mystage (which stores the data files internally), where location, dimensions.sq_ft, sale_date, and price are the objects.

(OR)

select substr(t.$2,4), t.$1, t.$5, t.$4 from @mystage t
This query reorders the column data from the internal stage mystage before loading it into a table. The SUBSTR (SUBSTRING) function removes the first few characters of a string before inserting it; here, substr(t.$2,4) keeps the second column from its fourth character onward.

We recommend you not use a temporary stage while loading your data.

Encryption type

Default Value: None
Example: Server-Side Encryption

Dropdown list

N/A

Specify the type of encryption to be used on the data. The available encryption options are:

None: Files do not get encrypted.
Server Side Encryption: The output files on Amazon S3 are encrypted with server-side encryption.
Server-Side KMS Encryption: The output files on Amazon S3 are encrypted with an Amazon S3-generated KMS key.

The KMS Encryption option is available only for S3 Accounts (not for Azure Accounts) with Snowflake.

If Staging location is set to Internal and Data source is set to Input view, the Server Side Encryption and Server-Side KMS Encryption options are not supported for Snowflake Snaps.

This happens because Snowflake encrypts loading data in its internal staging area and does not allow the user to specify the type of encryption in the PUT API. Learn more: Snowflake PUT Command Documentation.

KMS key

Default Value: N/A
Example: <Encrypted>

String/Expression

N/A

Specify the KMS key that you want to use for S3 encryption. Learn more about the KMS key: AWS KMS Overview and Using Server Side Encryption.

Buffer size (MB)

Default Value: 10MB
Example: 20MB

String/Expression

N/A

Specify the data in MB to be loaded into the S3 bucket at a time. This property is required when bulk loading to Snowflake using AWS S3 as the external staging area.

Minimum value: 5 MB

Maximum value: 5000 MB

S3 allows a maximum of 10000 parts to be uploaded, so this property must be configured accordingly to optimize the bulk load. Refer to Upload Part for more information on uploading to S3.
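
The arithmetic behind this sizing can be sketched in a few lines of Python. The sketch assumes that each buffer is uploaded as one S3 part, which is an assumption made for illustration; the function name smallest_buffer_mb is hypothetical. The limits of 5 MB, 5000 MB, and 10000 parts are the ones stated above.

```python
import math

S3_MAX_PARTS = 10000   # maximum number of parts, as stated above
MIN_BUFFER_MB = 5      # minimum value of Buffer size (MB)
MAX_BUFFER_MB = 5000   # maximum value of Buffer size (MB)


def smallest_buffer_mb(total_upload_mb):
    """Return the smallest buffer size (MB) that keeps the upload
    within the part limit, assuming one part per buffer (assumption).
    """
    needed = math.ceil(total_upload_mb / S3_MAX_PARTS)
    buffer_mb = max(MIN_BUFFER_MB, needed)
    if buffer_mb > MAX_BUFFER_MB:
        raise ValueError("Upload is too large for the allowed buffer sizes")
    return buffer_mb


print(smallest_buffer_mb(20_000))    # small upload: the 5 MB minimum is enough
print(smallest_buffer_mb(250_000))   # larger upload: needs a bigger buffer
```

Under the same assumption, the default of 10 MB covers uploads of up to 100,000 MB.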

Manage Queued Queries

Default Value: Continue to execute queued queries when the Pipeline is stopped or if it fails
Example: Cancel queued queries when the Pipeline is stopped or if it fails

Dropdown list

N/A

Select an option to determine whether the Snap should continue or cancel the execution of the queued Snowflake Execute SQL queries when you stop the Pipeline.

If you select Cancel queued queries when the Pipeline is stopped or if it fails, then the read queries under execution are canceled, whereas the write type of queries under execution are not canceled. Snowflake internally determines which queries are safe to be canceled and cancels those queries.

Additional Options

On Error

Default Value: ABORT_STATEMENT
Example: CONTINUE

Dropdown list

N/A

Select an action to perform when errors are encountered in a file. The available actions are:

ABORT_STATEMENT: Aborts the COPY statement if any error is encountered. The error will be thrown from the Snap or routed to the error view.
CONTINUE: Continues loading the file. The error will be shown as a part of the output document.
SKIP_FILE: Skips the file if any errors are encountered in it.
SKIP_FILE_*error_limit*: Skips file when the number of errors in the file exceeds the number specified in Error Limit.
SKIP_FILE_*error_percent_limit*%: Skips file when the percentage of errors in the file exceeds the percentage specified in Error percentage limit.
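
In Snowflake's COPY INTO syntax, the last two actions carry the limit inside the option value, for example SKIP_FILE_3 or 'SKIP_FILE_1%'. The following Python sketch shows how the selected action and its limit could be combined into that value. The helper on_error_value and the plain-text action labels it accepts are hypothetical; how the Snap builds the value internally is not documented here.

```python
def on_error_value(action, error_limit=None, error_percent_limit=None):
    """Return the ON_ERROR value for a selected action.

    Hypothetical helper. The percentage form is quoted because the
    value contains a '%' character.
    """
    if action in ("ABORT_STATEMENT", "CONTINUE", "SKIP_FILE"):
        return action
    if action == "SKIP_FILE_error_limit":
        return f"SKIP_FILE_{int(error_limit)}"
    if action == "SKIP_FILE_error_percent_limit%":
        return f"'SKIP_FILE_{int(error_percent_limit)}%'"
    raise ValueError(f"Unknown On Error action: {action}")


print(on_error_value("SKIP_FILE_error_limit", error_limit=3))
print(on_error_value("SKIP_FILE_error_percent_limit%", error_percent_limit=1))
```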

Error Limit

Default Value: 0
Example: 3

Integer

Appears when you select SKIP_FILE_*error_limit* for On Error.

Specify the error limit for skipping a file. The file is skipped when the number of errors in it exceeds the specified limit. This applies when SKIP_FILE_number is selected for On Error.

Error Percentage Limit

Default Value: 0
Example: 1

Integer

Appears when you select SKIP_FILE_*error_percent_limit*% for On Error.

Specify the percentage of errors for skipping a file. The file is skipped when the percentage of errors in it exceeds the specified value. This applies when SKIP_FILE_number% is selected for On Error.

Size Limit

Default Value: 0
Example: 5

Integer

N/A

Specify the maximum size (in bytes) of data to be loaded.

At least one file is loaded regardless of the value specified for SIZE_LIMIT unless there is no file to be loaded. A null value indicates no size limit.
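
The "at least one file" behavior follows if loading stops only after the running total has exceeded the limit. The following Python sketch models that reading; it is an interpretation of Snowflake's COPY behavior rather than a statement of how the Snap works, and the function files_loaded is hypothetical. Treating 0 like a null value (no limit) is also an assumption, since the text only says that null means no limit.

```python
def files_loaded(file_sizes, size_limit=None):
    """Return the sizes of the files that would be loaded.

    Hypothetical model: files are loaded in order, and loading stops
    once the running total exceeds size_limit. A limit of None (or 0,
    by assumption) means no limit.
    """
    if not size_limit:
        return list(file_sizes)
    loaded, total = [], 0
    for size in file_sizes:
        loaded.append(size)
        total += size
        if total > size_limit:
            break
    return loaded


ten_mb = 10_000_000
print(len(files_loaded([ten_mb] * 5, size_limit=25_000_000)))  # stops after exceeding 25 MB
print(len(files_loaded([ten_mb] * 5, size_limit=5)))           # one file is still loaded
```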

Purge

Default value: Deselected

Checkbox

N/A

Specify whether to purge the data files from the location automatically after the data is successfully loaded.

Return Failed Only

Default Value: Deselected

Checkbox

N/A

Select this checkbox to return only the files that failed to load.

Force

Default Value: Deselected

Checkbox

N/A

Specify if you want to load all files, regardless of whether they have been loaded previously and have not changed since they were loaded.

Truncate Columns

Default Value: Deselected

Checkbox

N/A

Select this checkbox to truncate column values that are larger than the maximum column length in the table.

Validation Mode

Default Value: None
Example: RETURN_n_ROWS

Dropdown list

N/A

Select the validation mode for visually verifying the data before loading it. The available options are:

NONE
RETURN_n_ROWS
RETURN_ERRORS
RETURN_ALL_ERRORS

Validation Errors Type

Default Value: Full error
Example: Do not show errors

Dropdown list

Appears when you select NONE for Validation Mode.

Select one of the following methods for displaying the validation errors:

Aggregate errors per row: Provides a summary view of the errors. You can expand rows to reveal a detailed view of the errors.
Full error: Provides the complete error message.
Do not show errors

Rows to Return

Default Value: 0
Example: 5

Integer

Appears when you select RETURN_n_ROWS, RETURN_ERRORS, or RETURN_ALL_ERRORS for Validation Mode.

Specify the number of rows to return. These rows are not loaded into the corresponding table. Instead, the data is validated, and the results are returned based on the selected validation option: RETURN_n_ROWS | RETURN_ERRORS | RETURN_ALL_ERRORS

Snap Execution

Default Value: Execute only
Example: Validate & Execute

Dropdown list

N/A

Select one of the three modes in which the Snap executes. Available options are:

Validate & Execute: Performs limited execution of the Snap, and generates a data preview during Pipeline validation. Subsequently, performs full execution of the Snap (unlimited records) during Pipeline runtime.
Execute only: Performs full execution of the Snap during Pipeline execution without generating preview data.
Disabled: Disables the Snap and all Snaps that are downstream from it.

Instead of building multiple Snaps with interdependent DML queries, we recommend you use the Stored Procedure or the Multi Execute Snaps.
In a scenario where the downstream Snap depends on the data processed on an upstream database Bulk Load Snap, use the Script Snap to add delay for the data to be available.

For example, when performing a create, insert, and delete function sequentially on a pipeline, the Script Snap helps to create a delay between the insert and delete function. Otherwise, the delete function may get triggered before inserting the records into the table.

Examples

Providing Metadata For Table Using The Second Input View

This example Pipeline demonstrates how to provide metadata for the table definition through the second input view, to enable the Bulk Load Snap to create a table according to the definition.

...

Output Preview and Data in Snowflake (images not available)

Finally, connect the JSON Formatter Snap to the Snowflake - Bulk Load Snap to transform the binary data to JSON format, and write this output to S3 using the File Writer Snap.

...

Snowflake Bulk Load Snap removes spaces from input documents

You can remove empty spaces from input documents. When you select Input view as Data Source, enter TRIM_SPACE=TRUE in the File Format Option field to remove empty spaces, if any.

Download this pipeline.

In this example, the Pipeline uses the following Snaps:

Data (JSON Generator): Generates a JSON document for the Snowflake - Bulk Load Snap in the Pipeline. In this example, the JSON document contains an empty space.
BL: Executes a Snowflake Bulk Load, writing data into an Amazon S3 bucket or a Microsoft Azure Storage Blob. Also, it enables you to remove all spaces from the input document.
Schema (JSON Generator): Provides the schema to interpret the document passed to the Snowflake - Bulk Load Snap.
Snowflake - Execute: Reads the newly uploaded document and enables you to check whether the spaces were removed as expected.

Data (JSON Generator)

Data streams from your database source, and you do not necessarily need a Snap to provide input documents. In this example, however, we use the JSON Generator Snap to provide the input document.

Input:

Output:

As you can see, the value listed against the key ACCOUNT_NAME has empty spaces in it.

Snowflake - Bulk Load

Input:

Note
Notice that the File Format Option is TRIM_SPACE=TRUE.

Output:

Schema (JSON Generator)

Table schema is taken from your database source, and you do not necessarily need a Schema Snap to provide the table schema. In this example, however, we use the JSON Generator Snap to provide the table schema.

Input:

Output:

Snowflake - Execute

Input:

Output:

As you can see, the data no longer contains any spaces.

Snowflake Bulk Load with Table Columns on Staged files

In this Pipeline, the Snowflake Bulk Load Snap loads the records from a staged file 'employee.csv content' into a table in Snowflake.

The staged file 'employee.csv content' is passed via the Upstream Snap:

The Snowflake Bulk Load Snap is configured with Data source as Staged files and the Table Columns added as ID, FIRSTNAME, CITY, to be loaded into a table "PRASANNA"."EMPLOYEE" on Snowflake.

The successful execution of the pipeline displays the below output preview:

If the staged file 'employee.csv content' has the below details:

1,PRASANNA,Hyderabad
2,Aparna,hyderabad

Table Columns added are:
ID, FIRSTNAME, CITY

then the table, "PRASANNA"."EMPLOYEE" on Snowflake is as below:

The Snowflake table "PRASANNA"."EMPLOYEE"

Code Block
Create table "PRASANNA"."EMPLOYEE" (ID int, FIRSTNAME varchar(30), LASTNAME varchar(30), CITY varchar(30), ADDRESS varchar(30), JOIN_DATE date)

Note that the columns ID, FIRSTNAME, and CITY are populated as provided, and the columns LASTNAME, ADDRESS, and JOIN_DATE are null.
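
The outcome described above can be reproduced with a short Python sketch that maps each staged record onto the listed Table Columns and leaves every other table column empty. The function rows_as_loaded is hypothetical and only models the result; values stay as strings here, whereas Snowflake would cast them to the column types.

```python
import csv
import io

# Column order taken from the Create table statement above.
TABLE_COLUMNS = ["ID", "FIRSTNAME", "LASTNAME", "CITY", "ADDRESS", "JOIN_DATE"]


def rows_as_loaded(staged_text, copy_columns, table_columns=TABLE_COLUMNS):
    """Model the table rows that result from loading staged CSV text.

    Hypothetical helper: fields are assigned to copy_columns in order,
    and table columns that are not listed become None (NULL).
    """
    rows = []
    for record in csv.reader(io.StringIO(staged_text)):
        values = dict(zip(copy_columns, record))
        rows.append({col: values.get(col) for col in table_columns})
    return rows


staged = "1,PRASANNA,Hyderabad\n2,Aparna,hyderabad\n"
for row in rows_as_loaded(staged, ["ID", "FIRSTNAME", "CITY"]):
    print(row)
```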

...

Loading Data from S3

This example demonstrates how you can use the Snowflake Bulk Load Snap to load files from an external staging location such as S3. It further shows the configuration required when loading numeric data.

Download this Pipeline.

The Snowflake Bulk Load Snap has a minimum of 1 Input view. This is useful when the data source is Input view. Even though, within the scope of this Pipeline, the Snap does not require any input from an upstream Snap, the view cannot be disabled.

Data is to be loaded from the staged file threecolumns.csv, present in an S3 folder, into the table "EXTR_PERF_01_SC"."THREECOLUMNS".

Below is a screenshot of the data in threecolumns.csv:


The Snowflake Bulk Load Snap is configured accordingly as follows:

Furthermore, since this data has numeric values, the Snowflake Bulk Load Snap is configured with the following file format options to handle any string/NULL values that may be present in the dataset:

SKIP_HEADER = 1: Specifies that the first row is a header row so that all rows after the first row are loaded.
NULL_IF = "": Convert empty spaces to SQL NULL.
FIELD_OPTIONALLY_ENCLOSED_BY ="": Specifies the character used to enclose strings.

See Format Type Options for a detailed explanation of the above file format options.
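
To show what these options do to a file, the following Python sketch emulates them on a small, made-up CSV sample with the standard csv module. It is not Snowflake's parser and does not use the contents of threecolumns.csv. The helper parse_staged_csv is hypothetical, and the sketch assumes that the enclosing character is a double quote and that empty fields are the values converted to NULL.

```python
import csv
import io


def parse_staged_csv(text, skip_header=1, null_if=("",)):
    """Emulate SKIP_HEADER, NULL_IF, and optional field enclosure.

    Hypothetical helper: skips the first skip_header rows, turns any
    field listed in null_if into None, and relies on csv.reader to
    handle fields optionally enclosed in double quotes.
    """
    rows = list(csv.reader(io.StringIO(text)))[skip_header:]
    return [[None if field in null_if else field for field in row] for row in rows]


sample = 'id,amount,note\n1,10.5,"a,b"\n2,,ok\n'
for row in parse_staged_csv(sample):
    print(row)
```

In this sample, the header row is dropped, the empty amount in the second data row becomes None, and the enclosed value keeps its embedded comma.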

Upon execution, the Snowflake Bulk Load Snap loads three rows:

To confirm that three rows were loaded, we use a Snowflake Execute Snap configured to count the number of rows in the target table:

Below is a preview of the output from the Snowflake Execute Snap.

We can see that the count is 3, thereby confirming a successful bulk load.

You can also modify the SELECT query in the Snowflake Execute Snap to read the data in the table and thus verify the data loaded into the target table.

...

Versions Compared

Old Version 145

New Version 146
