ELT Load

ELT Load

 

Overview

You can use this Snap to load files/data from Amazon S3 buckets or Azure Cloud Storage containers to a Snowflake, Azure Synapse database, or a Databricks Lakehouse Platform (DLP) instance. You can also use this Snap to load or transfer data from:

  • Amazon S3 buckets to the Amazon Redshift database.

  • Amazon Redshift or Google Cloud Storage to BigQuery tables. 

  • A DBFS folder to a DLP table.

Refer to ELT Source-Target-Hosting Support Matrix for complete information. 

If you want to use the COPY INTO command in ELT Execute Snap for loading data into the target database, you need to pass (expose) these account credentials inside the SQL statement. You can use the ELT Load Snap to prevent this security issue.

Snap Type

The ELT Load Snap is a Write-type Snap that writes/loads data from a source file/table to a table in the destination/target CDW.

Prerequisites

  • Valid accounts and access permissions to connect to one source and one target in the following source and target locations (Refer to the ELT Source-Target-Hosting Support Matrix for the supported source-target combinations):

    • Source: Amazon S3, Azure Cloud Storage, DBFS location, or Google Cloud Storage

    • Target: Snowflake, Redshift, Azure Synapse, DLP, or BigQuery

Limitations

Known Issues

Snap Views

Type

Format

Number of Views

Examples of Upstream and Downstream Snaps

Description

Type

Format

Number of Views

Examples of Upstream and Downstream Snaps

Description

Input 

Document

  • Min: 0

  • Max: 1

  • ELT Transform

  • ELT Copy

  • ELT Select

A document containing the SQL query that you can use from the upstream ELT Snaps. 

Output

Document

  • Min: 0

  • Max: 1

  • ELT Transform

  • ELT Copy

A document containing the status of the load operation and an SQL query that you can use in any downstream ELT Snaps. 

Error

Error handling is a generic way to handle errors without losing data or failing the Snap execution. You can handle the errors that the Snap might encounter when running the by choosing one of the following options from the When errors occur list under the Views tab:

  • Stop Pipeline Execution: Stops the current pipeline execution if the Snap encounters an error.

  • Discard Error Data and Continue: Ignores the error, discards that record, and continues with the remaining records.

  • Route Error Data to Error View: Routes the error data to an error view without stopping the Snap execution.

Learn more about Error handling in pipelines.

Snap Settings

  • Asterisk ( * ): Indicates a mandatory field.

  • Suggestion icon (): Indicates a list that is dynamically populated based on the configuration.

  • Expression icon ( ): Indicates the value is an expression (if enabled) or a static value (if disabled). Learn more about Using Expressions in SnapLogic.

  • Add icon ( ): Indicates that you can add fields in the field set.

  • Remove icon ( ): Indicates that you can remove fields from the field set.

  • Upload icon ( ): Indicates that you can upload files.

SQL Functions and Expressions for ELT

You can use the SQL Expressions and Functions supported for ELT to define your Snap or Account settings with the Expression symbol ( ) enabled, where available. This list is common to all supported target CDWs. You can also use other expressions/functions that your target CDW supports.

Field Name

Field Type

Field Dependency

Description

Field Name

Field Type

Field Dependency

Description

Label*

 

Default ValueELT Load

ExampleS3 Load

String

None.

Specify a name for the Snap. You can modify this to be more specific, especially if you have more than one of the same Snap in your pipeline.

 

Get preview data

 

Default ValueNot selected

ExampleSelected

Checkbox

None.

Select this checkbox to include a preview of the query's output. The Snap performs limited execution and generates a data preview during pipeline validation.

For ELT pipelines, only the SQL query flows through the Snaps but not the actual source data. Therefore, the preview data for a Snap is the result of executing the SQL query that the Snap has generated in the pipeline.

The number of records displayed in the preview (on validation) is the smaller of the following:

  • Number of records available on execution of the SQL query generated by the Snap.

  • The value set in the Preview Data Count setting (default is 50 records).

Rendering Complex Data Types in DLP

Based on the data types of the fields in the input schema, the Snap renders the complex data types such as map and struct as object data type and array as an array data type. It renders all other incoming data types as-is except for the values in binary fields are displayed as a base64 encoded string and as string data type.

Database Name

 

Default Value: N/A

ExampleTEST_DB

String

None.

Enter the name of the database in which the target table exists. Leave this blank if you want to use the database name specified in the Default Database Name field in the Account settings.

Schema Name (Not applicable to Databricks Lakehouse Platform)*

 

Default Value: N/A

ExampleSALES

String

Not applicable to DLP.

Enter the name of the database schema. In case the schema name is not defined, then the suggestions retrieved for the schema name contain all schema names in the specified database.

  • Ensure that you include the exactly same schema name including the double quotes, if used, when you repeat the schema name in the Target Table Name field.

  • Leave this field blank if your target database is Databricks Lakehouse Platform.

Target Table Name*

 

Default Value: N/A

ExampleSALES_ORDERS

String

None.

Enter the name of the table or view in which you want to perform the load operation. 

Only views that can be updated (have new rows) are listed as suggestions. So, Join views are not included. This also implies that the Snap account user has the Insert privileges on the views listed as suggestions.

  • Ensure that you include the exact same schema name, if at all, including the double quotes as specified in the Schema Name field.

  • If the target table or view does not exist during run-time, the Snap creates one with the name that you specify in this field and writes the data into it. During pipeline validation, the Snap creates the new table or view but does not write any records into it.

    • This new table or view is not be dropped if a subsequent/downstream Snap failure occurs during validation.

  • You must use double quotes (““) when including special characters, such as hyphens (-) in the name.

  • A table or view name must always start with a letter.

  • Integers and underscores (_) can also be a part of the table or view name.

  • All characters are automatically converted to uppercase by the Snap. Use double-quotes to retain lowercase.

Target Table Hash Distribution Column (Azure Synapse Only)

 

Default Value: N/A

ExampleVar_table

String/Expression

Applicable to Azure Synapse only.

Specify the hash distribution column name for the target table in Azure Synapse, if you choose the Load Action as Drop and Create table.

Azure Synapse needs a table to be always hash-distributed for improved query performance. If the target table is created outside the Snap, you need not specify the target table column name here.

Load Action

 

Default ValueAppend rows to existing table

ExampleAlter table

Dropdown list

None.

Select the load action to perform on the target table.

Available options are:

  • Append rows to existing table. Loads the data from the source files into the specified target table.

  • Overwrite existing table. Overwrites the target table with the data from the source files.

  • Drop and Create table. Drops the target table if it exists, creates a new table with the columns provided in the Table Columns field set, and then loads the data from the source files into the target table. Activates the Data Type field in the Table Columns field set.

  • Alter table. Modifies the schema of the target table based on the configuration of the Modifier field in the Table Columns field set and loads the data from the source files into the target table. Activates the Data Type and Modifier fields in the Table Columns field set. 

The Snap does not modify the existing tables during pipeline validation, but if you choose the Drop and Create table option in the Load Action field and the target table does not exist, it creates a new (empty) target table based on the schema specified in the Snap.

Table Columns

This field set enables you to configure the schema of the target table. You can use this field set to add/drop columns in the target table if you select the Drop and Create table or Alter table options in the Load Action field. 

This field set consists of the following fields:

  • Column

  • Data Type

  • Modifier

Column

 

Default Value: N/A

ExampleID, FNAME

String

Load Action is Drop and Create table or Alter table.

Enter the name of the column that you want to load in the target table. You can also specify the columns to drop if you select the Alter table option in the Load Action field. 

Data Type

 

Default Value: N/A

ExampleINT, VARCHAR

String

Load Action is Drop and Create table or Alter table.

Enter the data type of the values in the specified column. 

Use the VARIANT data type for target table columns that correspond to a map (JSON OBJECT) or array (JSON LIST) in your nested Parquet files.

Modifier

 

Default ValueAdd

ExampleDrop

String

Load Action is Alter table.

Select whether you want to add/drop the specified column.

Available options are:

  • Add

  • Drop

Table Option List

This field set enables you to define the table options for creating the target table before performing the load operation. These options vary based on your target CDW. Refer to the Table options for Load action for the complete list of permitted table options. You must specify each table option in a separate row.

  • Table Option

Table Option

 

Default Value: N/A

Example: OPTIONS(custID=03761)

Expression/Suggestion

 

Choose a table option that you want to use from the suggested options. Specify the required values for the selected option.

Redshift Schema Name (Only when BigQuery with Source Location as Redshift)

 

Default Value: N/A

ExamplePUBLIC, DEF_SCHEMA

String/Expression

Database Type is BigQuery and Source Location is Amazon Redshift in the Snap Account.

Specify the schema name of the source table in Redshift.

Redshift Table Name (Only when BigQuery with Source Location as Redshift)

 

Default Value: N/A