In this article

An account for the Snap

You must define an account for this Snap to communicate with your target CDW. Click the account specific to your target CDW below for more information:

Overview

You can use this Snap to load files/data from Amazon S3 buckets or Azure Cloud Storage containers to a Snowflake, Azure Synapse database, or a Databricks Lakehouse Platform (DLP) instance. You can also use this Snap to load or transfer data from:

Amazon S3 buckets to the Amazon Redshift database.
Amazon Redshift or Google Cloud Storage to BigQuery tables.
A DBFS folder to a DLP table.

Refer to ELT Source-Target-Hosting Support Matrix for complete information.

If you want to use the COPY INTO command in ELT Execute Snap for loading data into the target database, you need to pass (expose) these account credentials inside the SQL statement. You can use the ELT Load Snap to prevent this security issue.

Snap Type

The ELT Load Snap is a Write-type Snap that writes/loads data from a source file/table to a table in the destination/target CDW.

Prerequisites

Valid accounts and access permissions to connect to one source and one target in the following source and target locations (Refer to the ELT Source-Target-Hosting Support Matrix for the supported source-target combinations):
- Source: Amazon S3, Azure Cloud Storage, DBFS location, or Google Cloud Storage
- Target: Snowflake, Redshift, Azure Synapse, DLP, or BigQuery

Limitations

Known Issues

Snap Views

Type

Format

Number of Views

Examples of Upstream and Downstream Snaps

Description

Input

Document

Min: 0
Max: 1

ELT Transform
ELT Copy
ELT Select

A document containing the SQL query that you can use from the upstream ELT Snaps.

Output

Document

Min: 0
Max: 1

ELT Transform
ELT Copy

A document containing the status of the load operation and an SQL query that you can use in any downstream ELT Snaps.

Error

Error handling is a generic way to handle errors without losing data or failing the Snap execution. You can handle the errors that the Snap might encounter when running the by choosing one of the following options from the When errors occur list under the Views tab:

Stop Pipeline Execution: Stops the current pipeline execution if the Snap encounters an error.
Discard Error Data and Continue: Ignores the error, discards that record, and continues with the remaining records.
Route Error Data to Error View: Routes the error data to an error view without stopping the Snap execution.

Learn more about Error handling in pipelines.

Snap Settings

Asterisk ( * ): Indicates a mandatory field.
Suggestion icon (): Indicates a list that is dynamically populated based on the configuration.
Expression icon ( ): Indicates the value is an expression (if enabled) or a static value (if disabled). Learn more about Using Expressions in SnapLogic.
Add icon ( ): Indicates that you can add fields in the field set.
Remove icon ( ): Indicates that you can remove fields from the field set.
Upload icon ( ): Indicates that you can upload files.

SQL Functions and Expressions for ELT

You can use the SQL Expressions and Functions supported for ELT to define your Snap or Account settings with the Expression symbol ( ) enabled, where available. This list is common to all supported target CDWs. You can also use other expressions/functions that your target CDW supports.

Field Name		Field Type	Field Dependency	Description
Label* Default Value: ELT Load Example: S3 Load		String	None.	Specify a name for the Snap. You can modify this to be more specific, especially if you have more than one of the same Snap in your pipeline.
Get preview data Default Value: Not selected Example: Selected		Checkbox	None.	Select this checkbox to include a preview of the query's output. The Snap performs limited execution and generates a data preview during pipeline validation. For ELT pipelines, only the SQL query flows through the Snaps but not the actual source data. Therefore, the preview data for a Snap is the result of executing the SQL query that the Snap has generated in the pipeline. The number of records displayed in the preview (on validation) is the smaller of the following: Number of records available on execution of the SQL query generated by the Snap. The value set in the Preview Data Count setting (default is 50 records). Rendering Complex Data Types in DLP Based on the data types of the fields in the input schema, the Snap renders the complex data types such as map and struct as object data type and array as an array data type. It renders all other incoming data types as-is except for the values in binary fields are displayed as a base64 encoded string and as string data type.
Database Name Default Value: N/A Example: TEST_DB		String	None.	Enter the name of the database in which the target table exists. Leave this blank if you want to use the database name specified in the Default Database Name field in the Account settings.
Schema Name (Not applicable to Databricks Lakehouse Platform)* Default Value: N/A Example: SALES		String	Not applicable to DLP.	Enter the name of the database schema. In case the schema name is not defined, then the suggestions retrieved for the schema name contain all schema names in the specified database. Ensure that you include the exactly same schema name including the double quotes, if used, when you repeat the schema name in the Target Table Name field. Leave this field blank if your target database is Databricks Lakehouse Platform.
Target Table Name* Default Value: N/A Example: SALES_ORDERS		String	None.	Enter the name of the table or view in which you want to perform the load operation. Only views that can be updated (have new rows) are listed as suggestions. So, Join views are not included. This also implies that the Snap account user has the Insert privileges on the views listed as suggestions. Ensure that you include the exact same schema name, if at all, including the double quotes as specified in the Schema Name field. If the target table or view does not exist during run-time, the Snap creates one with the name that you specify in this field and writes the data into it. During pipeline validation, the Snap creates the new table or view but does not write any records into it. This new table or view is not be dropped if a subsequent/downstream Snap failure occurs during validation. You must use double quotes (““) when including special characters, such as hyphens (-) in the name. A table or view name must always start with a letter. Integers and underscores (_) can also be a part of the table or view name. All characters are automatically converted to uppercase by the Snap. Use double-quotes to retain lowercase.
Target Table Hash Distribution Column (Azure Synapse Only) Default Value: N/A Example: Var_table		String/Expression	Applicable to Azure Synapse only.	Specify the hash distribution column name for the target table in Azure Synapse, if you choose the Load Action as Drop and Create table. Azure Synapse needs a table to be always hash-distributed for improved query performance. If the target table is created outside the Snap, you need not specify the target table column name here.
Load Action Default Value: Append rows to existing table Example: Alter table		Dropdown list	None.	Select the load action to perform on the target table. Available options are: Append rows to existing table. Loads the data from the source files into the specified target table. Overwrite existing table. Overwrites the target table with the data from the source files. Drop and Create table. Drops the target table if it exists, creates a new table with the columns provided in the Table Columns field set, and then loads the data from the source files into the target table. Activates the Data Type field in the Table Columns field set. Alter table. Modifies the schema of the target table based on the configuration of the Modifier field in the Table Columns field set and loads the data from the source files into the target table. Activates the Data Type and Modifier fields in the Table Columns field set. The Snap does not modify the existing tables during pipeline validation, but if you choose the Drop and Create table option in the Load Action field and the target table does not exist, it creates a new (empty) target table based on the schema specified in the Snap.
Table Columns	This field set enables you to configure the schema of the target table. You can use this field set to add/drop columns in the target table if you select the Drop and Create table or Alter table options in the Load Action field. This field set consists of the following fields: Column Data Type Modifier
	Column Default Value: N/A Example: ID, FNAME	String	Load Action is Drop and Create table or Alter table.	Enter the name of the column that you want to load in the target table. You can also specify the columns to drop if you select the Alter table option in the Load Action field.
	Data Type Default Value: N/A Example: INT, VARCHAR	String	Load Action is Drop and Create table or Alter table.	Enter the data type of the values in the specified column. Use the VARIANT data type for target table columns that correspond to a map (JSON OBJECT) or array (JSON LIST) in your nested Parquet files.
	Modifier Default Value: Add Example: Drop	String	Load Action is Alter table.	Select whether you want to add/drop the specified column. Available options are: Add Drop
Table Option List	This field set enables you to define the table options for creating the target table before performing the load operation. These options vary based on your target CDW. Refer to the Table options for Load action for the complete list of permitted table options. You must specify each table option in a separate row. Table Option
Table Option List	Table Option Default Value: N/A Example: OPTIONS(custID=03761)	Expression/Suggestion		Choose a table option that you want to use from the suggested options. Specify the required values for the selected option.
Redshift Schema Name (Only when BigQuery with Source Location as Redshift) Default Value: N/A Example: PUBLIC, DEF_SCHEMA		String/Expression	Database Type is BigQuery and Source Location is Amazon Redshift in the Snap Account.	Specify the schema name of the source table in Redshift.
Redshift Table Name (Only when BigQuery with Source Location as Redshift) Default Value: N/A Example: DataLoad_Source01		String/Expression	Database Type is BigQuery and Source Location is Amazon Redshift in the Snap Account.	Specify the name of the Redshift source table (from the schema selected above).
Cloud Storage Path Default Value: N/A Example: s3://bigbucket/app_data		String/Expression		Specify the fully qualified path to the folder in the Cloud storage in this field, if you want to override the default path defined in the Snap's Account settings. For example: AWS S3 path: s3://<bucket name>/<folder> DBFS folder: /<dbfs folder name> Azure: azure://<accountname>.blob.core.windows.net/<containername>/<azurefolder> Google Cloud Storage: gcs://<bucketname>/<folder>/
File List	This field set enables you to specify the staged files to load. You must specify each file in a separate row. Click + to add a new row. Use File List for up to 2 files For the best performance of your load operation, we recommend that you use the File Name Pattern field to load data from multiple files at a source location (S3 or Redshift) instead of specifying the individual file names/paths in the File List field set. This field set consists of the following field: File
File List	File Default Value: N/A Example: TestData.csv	String	None.	Enter the name of the staged file to load.
File Name Pattern Default Value: N/A Example: Snowflake: '.key.[.]csv' Redshift: './key.csv'*		String	None.	Enter the pattern to use to match the file name and/or absolute path. You can specify this as a regular expression pattern string, enclosed into use the key-based search mechanism. If your search is based on AWS IAM roles, you cannot use regular expressions with Redshift tables. Using Wildcard Patterns in BigQuery BigQuery has its own set of rules to resolve wildcard patterns such as . (that you provide in this field). Refer to Batch loading data > Load Wildcards in the BigQuery documentation for the rules on using wildcards in BigQuery. Refer to Loading Using Pattern Matching for details on Snowflake regular expressions, Examples of COPY INTO (Delta Lake on Databricks) for DLP, or Regular Expressions and the Java Programming Language for using regular expressions with Redshift, Azure Synapse, or BigQuery in this field.
File Format Type Default Value: None Example: AVRO		String/Expression/Suggestion	None.	Select the format for the file. The options vary based on the database type in your account settings. The supported file formats are: Snowflake: None, AVRO, CSV, JSON, ORC, PARQUET, XML Redshift: None, AVRO, CSV, JSON, ORC, PARQUET Azure Synapse: CSV, ORC, PARQUET DLP: AVRO, CSV, JSON, ORC, PARQUET BigQuery: AVRO, CSV, JSON, ORC, PARQUET
File Format Option List	This field set enables you to define the file format options for the load operation. You must specify each file format option in a separate row. Format Option Refer to Snowflake Data Loading Commands, Redshift Data Format Parameters, COPY INTO (Transact-SQL) for Azure Synapse Analytics, COPY INTO Format options for DLP, or the BigQuery’s Batch load data guide specific to your source file format type for the list of options that you can use based on the target database. Automatic Inferencing of Source Table Schema The ELT Load Snap allows you to infer the table schema from the source files and then apply the auto-inferred schema to your target tables. This feature is supported on the Redshift target CDW with the source files in an S3 location and on the Snowflake target CDW with the source files on an Amazon S3, Azure Blob Storage (WASB), ADLS Gen2, or Google Cloud Storage location. Learn more at Automatic Schema Inference with the ELT Load Snap.
File Format Option List	Format Option Default Value: N/A Example: SKIP_HEADER = 1	String/Expression/Suggestion	Based on the combination of Database Type and File Format Type.	Choose a file format option that you want to use during the load operation from the suggested values. Specify the required value for the selected option. Loading a CSV file with nulls into a DLP instance If the CSV file you want to load into a DLP instance contains null values, ensure that you define sep = ',' as the Format Option without which the load operation fails.
Encryption Type Default Value: None Example: Server-Side Encryption		String/Expression/Suggestion	None.	Select the encryption type that you want to use for the loaded data and/or files. Available options are: None Server-Side Encryption (Not supported by Azure Synapse) Server-Side KMS Encryption (Not supported by Azure Synapse) Server-side encryption is available only for S3 accounts and Snowflake.
KMS Key Default Value: N/A Example: WC2YY-2BFMV-KHKB3-FDDGQ-8NRQY		String/Expression	Encryption Type is Server-Side KMS Encryption.	Enter the KMS key to use to encrypt the files. Not required for Redshift databases. The load operation automatically recognizes and loads the encrypted files. Refer to Loading encrypted files from Amazon S3 for details.
Copy Option List	This field set enables you to define the COPY options for the load operation. You must specify each COPY option in a separate row. Click + to add a new row. This field set comprises the following field: Copy Option See COPY Options for Snowflake, COPY Parameter Reference for Redshift, COPY INTO (Transact-SQL) for Azure Synapse Analytics, or COPY_OPTIONS under COPY INTO Parameters for DLP.
Copy Option List	Copy Option Default Value: N/A Example: ON_ERROR = ABORT_STATEMENT	String/Expression/Suggestion	None.	Choose a COPY option from the suggested values to use if an error occurs during the load operation. Specify the required value for the selected option.
Validation Mode Default Value: None Example: RETURN_n_ROWS		String/Expression/Suggestion	None.	Select an output preview type for the data validation performed as part of the pipeline validation. The options vary based on the database type (Snowflake/Redshift) in your account settings. Available options for Snowflake are: None RETURN_n_ROWS RETURN_ERRORS RETURN_ALL_ERRORS Available options for Redshift are: None NOLOAD For more details, refer to Validating Staged Files for Snowflake and Validating Input Data for Redshift. No specific validation modes are available for Azure Synapse, DLP, and Google BigQuery.
Rows To Return Default Value: N/A Example: 20		String (Number)	Validation Mode is RETURN_n_ROWS.	Enter the number of records to retrieve from the target table to validate whether the load operation was successful.
Enable source column position (Snowflake and Azure Synapse Only) Default Value: Not selected Example: Selected		Checkbox	Database Type is Snowflake or Azure Synapse.	Select this checkbox to specify the source table column’s position in the Target to Source Column Map instead of its name. This is applicable to only Snowflake and Azure Synapse target CDWs. For other target CDWs (Redshift, DLP, and BigQuery), you can leave this checkbox unchecked and specify source column names. Selecting this checkbox renames the Source File Column Name field in the Target to Source Column Map fieldset below to Source Column Position (Valid Only For Snowflake and Azure Synapse).
Target to Source Column Map	This field set enables you to define the mapping between the columns in the target table and the corresponding columns in the source file. You must define each mapping in a separate row. The available options are: Target Table Column Name Source File Column Name
	Target Table Column Name Default Value: N/A Example: ORDER_ID	String/Expression/Suggestion	None.	Choose a column from the list of target table columns displayed. Ensure that you have defined the Schema Name or the Target Table Name to view the list of columns in this field's suggestions.
	Source File Column Name Default Value: N/A Example: ORD_ID	String/Expression/Suggestion	None.	Choose a column from the list of source file columns displayed. Ensure that you have defined the Schema Name (or source file path) to view the list of columns in this field's suggestions.
	Source Column Position (Valid Only For Snowflake and Azure Synapse) Default Value: N/A Example: (An integer) 1, 3, 4	Integer/Expression	Database Type is Snowflake or Azure Synapse.	Specify the column’s position in the source file.

Table options for Load action

Using file format options with ELT Load

Here are a few common file format options that can be used for the corresponding CDW when loading data from CSV files in a storage location to a table in that CDW.

Target CDW	Exclude header row from the input data stream	Specify row delimiters
Azure Synapse	FIRSTROW = 2	ROWTERMINATOR = '`0X0A`' (CSV file created in MacOS) ROWTERMINATOR = '`\n`' (CSV file created in Windows)
DLP	'header' = 'true'	lineSep=None (`\r`) \| `\r\n` \| `\n`
Google BigQuery	skip_leading_rows = 1	N/A (allow_quoted_newlines = true if there are quoted data sections in a CSV file, that contain newline characters.)
Redshift	IGNOREHEADER 1	N/A
Snowflake	SKIP_HEADER = 1	RECORD_DELIMITER = '<character>' \| NONE

Load data from Parquet files

Data in your Parquet files can exist as any combination of the following structures: Flat data, nested lists, and nested maps.

Flat Parquet fields contain data in a simple two-dimensional spreadsheet-like format (Level 0). Such fields can be read and loaded using the standard mapping and loading methods.
Parquet file fields in nested lists or maps contain structured and hierarchical data where values for each field can be:
- A series of repeated fields (Level 1) or
- A series of values (Level 1)
- Another set of key-value pairs (Level 2), or
- A set of keys with each or some of them containing a nested set of key-value pairs (Level 3), and so on.

Lists vs. Maps in nested Parquet files

The lists are very similar to the JSON arrays in their notation and are characterized by square brackets [ and ] in the Parquet files.
The maps are very similar to the JSON objects in their notation and are characterized by curly braces { and } in the Parquet files.
However, we recommend that you reference the actual Parquet file to create your source-to-target field mappings and not its JSON-converted version.

Different Parquet file structures

The table below depicts the possible schema variations within a Parquet file with examples of source table data. It also indicates how to reference the fields (keys in the source Parquet files) and map them with the corresponding target table columns to load your data.

Parquet Files	Canonical format			Non-canonical format
Parquet Files	Flat data	Nested List	Nested Map	Nested Map
Sample schema (column definitions)	Field(name: "name", datatype: String) Field(name: "age", datatype: Int32) Field(name: "city", datatype: String)	Field(name: "tourist", datatype: Struct[ Field(name: "name", datatype: String), Field(name: "visits", datatype: List[String, String]) ])	Field(name: "tourist", datatype: Struct{ Field(name: "name", datatype: String), Field(name: "visits", datatype: Map[String, String]) })	Field(name: "tourist", datatype: Struct{ Field(name: "fname.lname", datatype: String), Field(name: "visits.made", datatype: Map[String, String]) })
Example Data	+-------+----+--------+ \| name \| age\| city \| +-------+----+--------+ \| Mason \| 54 \| London \| \| Cindy \| 73 \| Austin \| \| Lewis \| 28 \| Sydney \| +-------+----+--------+	{ name: "Mason", visits: [ {"cname":"Sweden", "pname": "Berne"}, {"cname": "Mexico", "pname": "Mexico City"}, {"cname": "Russia", "pname": "Moscow"} ] } Lines 3 through 8 in this example denote the nested list “visits”.	{ name: "Mason", visits: { "place": [ {"pname": Berne", "year": "2017"}, {"pname": "Mexico City", "year": "2019"}, {"pname": "Moscow", "year": "2020"} ] } } Lines 3 through 11 in this example denote the nested map “visits”.	{ fname.lname: "Kyle Mason", visits.made: { "place": [ {"pname": Berne", "year": "2017"}, {"pname": "Mexico City", "year": "2019"}, {"pname": "Moscow", "year": "2020"} ] } }
Referencing columns from source table schema in the Snap	`name` `age` `city`	`name` `visits:cname` `visits:pname`	`name` `visits.place:pname` `visits.place:year`	`"fname.lname"` `"visits.made".place:pname` `"visits.made".place:year`
	`name` `age` `city`	Use colon ‘:' to read and load data from child columns of a nested list in your Parquet file.	Use period '.’ to read and load data from child columns of a nested map structure in your Parquet file.	Use double quotes " for column names that contain special characters, to read and load data from these columns. Use period '.’ to read and load data from child columns of a nested map structure in your Parquet file.

Consider the following ELT Load Snap configuration:

Besides the standard and mandatory settings in this Snap, the following settings are unique and pivotal to reading and loading data from Parquet files:

Load Action
Target Table Columns (for the first load and for any alter table actions thereafter)
- New target table definition and table column alterations.
File List > File (for individual file loads)
File Name Pattern (to load multiple Parquet files that follow a specific naming and path patterns)
File Format Type = PARQUET
Copy Option List (to load data based on the MATCH_BY_COLUMN_NAME = Case-sensitive or Case-Insensitive with no column mapping specified) - Applies to flat files and not files with nested data.
Target to Source Column Map
- Mapping some or all source columns to corresponding columns in the target table.

You must decipher the structure (canonical or non-canonical) and schema of your Parquet files (mentioned above) and fill the details - Column names, data types, and source-to-target column mapping.

Learn more at Load data from a canonical format Parquet file.

Snap Behavior During Pipeline Validation

This Snap's behavior during pipeline validation varies for each Load Action selected.

Load Action	Target Table exists?	Validation Result	Action on Target Table^#
Append rows to existing table	No	Success	Creates an empty target table based on the schema information available in the Snap settings.
Append rows to existing table	Yes	Success	No updates made.
Overwrite existing table	No	Success	Creates an empty target table based on the schema information available in the Snap settings.
Overwrite existing table	Yes	Success	No updates made.
Drop and Create table	No	Success	Creates an empty target table based on the schema information available in the Snap settings.
Drop and Create table	Yes	Success	No updates made.
Alter table	No	Success	Creates an empty target table based on the schema information available in the Snap settings.
Alter table	Yes	Success	No updates made.
^#Bulk load operations are not done during pipeline validation.

Troubleshooting

Examples

Loading Files from S3 Bucket to Snowflake

We need to perform Snowflake ELT transformations upon a file present in an S3 bucket. Therefore, we must first load the file into the Snowflake database from the source location. This pipeline shows how we can use the ELT Load Snap to accomplish this task.

Before we can load the file, we must configure the ELT account to read from the source S3 bucket by configuring the Source Location, Authentication, Access Key, Secret Key, S3 Bucket, and S3 Folder fields. The fields vary based on the source location and authentication. See Configuring ELT Database Accounts for details.

We configure the ELT Load Snap as shown below:

Let us understand this Snap's configuration:

The Database Name, Schema Name, and Target Table Name specify the target database, schema, and table into which we want to write the file's data.
Table Action indicates that if a table with the same name exists, we configure the Snap to drop the table and create a new one in its place.
Since this is a new table, the Table Columns field set specifies the table's schema.
The File List field set specifies the source file's path in the S3 bucket and the file's format.
The Format Option field set specifies that the Snap must ignore the file's header. Otherwise, the header is also loaded into the target table.
The Copy Option field set specifies that the Snap must abort the loading operation if it encounters an error.
The Validation Mode and Rows to Return fields specify that 10 rows from the target table be retrieved upon successful completion.

To validate a successful execution, the ELT Load Snap executes a SELECT * command on the target table and returns 10 rows:

We can either create a separate pipeline to perform ELT operations on the table or add appropriate Snaps downstream of the ELT Load Snap in the same pipeline.

Download this pipeline.

Uploading data from a CSV File in Azure Data Lake Gen2 Storage to Azure Synapse

The following one-Snap (ELT Load) pipeline connects to an Azure Data Lake Gen2 Storage using an ADLS Gen2 Basic Auth account to upload a CSV file from the storage to an Azure Synapse database. This CSV file is created on MacOS with the default line endings.

Account Configuration	Snap Configuration

For the pipeline to work as expected when the CSV file created on MacOS with the default (MacOS) line endings, define the value for the file format option ROWTERMINATOR as '0X0A' instead of the usual '\n'. With '\n' as ROWTERMINATOR, the pipeline runs successfully, but does not insert the data from the CSV file into the newly-created table in Azure Synapse.

Snap Output with ROWTERMINATOR = '\n'	Snap Output with ROWTERMINATOR = '0X0A'

Download this pipeline.

Load data from a canonical format Parquet file

Suppose that your Parquet file contains a few rows of data split into three columns (id, batters, and topping) as shown below:

You can configure the ELT Load Snap in the following ways to load this data into a two-dimensional table on your Snowflake instance.

Create a target table or update an existing table with the same three columns - id, batters, and topping. In this case, the data type of the batters and topping columns must be VARIANT. Or
Create a (flat) target table with the following columns and data types, and load values into the respective columns using the Parquet file data identifiers.

Load data from a non-canonical format Parquet file