ELT Load
In this article
Overview
You can use this Snap to load files/data from Amazon S3 buckets or Azure Cloud Storage containers to a Snowflake, Azure Synapse database, or a Databricks Lakehouse Platform (DLP) instance. You can also use this Snap to load or transfer data from:
Amazon S3 buckets to the Amazon Redshift database.
Amazon Redshift or Google Cloud Storage to BigQuery tables.
A DBFS folder to a DLP table.
Refer to ELT Source-Target-Hosting Support Matrix for complete information.
If you want to use the COPY INTO command in ELT Execute Snap for loading data into the target database, you need to pass (expose) these account credentials inside the SQL statement. You can use the ELT Load Snap to prevent this security issue.
Snap Type
The ELT Load Snap is a Write-type Snap that writes/loads data from a source file/table to a table in the destination/target CDW.
Prerequisites
Valid accounts and access permissions to connect to one source and one target in the following source and target locations (Refer to the ELT Source-Target-Hosting Support Matrix for the supported source-target combinations):
Source: Amazon S3, Azure Cloud Storage, DBFS location, or Google Cloud Storage
Target: Snowflake, Redshift, Azure Synapse, DLP, or BigQuery
Limitations
Known Issues
Snap Views
Type | Format | Number of Views | Examples of Upstream and Downstream Snaps | Description |
---|---|---|---|---|
Input | Document |
|
| A document containing the SQL query that you can use from the upstream ELT Snaps. |
Output | Document |
|
| A document containing the status of the load operation and an SQL query that you can use in any downstream ELT Snaps. |
Error | Error handling is a generic way to handle errors without losing data or failing the Snap execution. You can handle the errors that the Snap might encounter when running the by choosing one of the following options from the When errors occur list under the Views tab:
Learn more about Error handling in pipelines. |
Snap Settings
Asterisk ( * ): Indicates a mandatory field.
Suggestion icon (): Indicates a list that is dynamically populated based on the configuration.
Expression icon ( ): Indicates the value is an expression (if enabled) or a static value (if disabled). Learn more about Using Expressions in SnapLogic.
Add icon ( ): Indicates that you can add fields in the field set.
Remove icon ( ): Indicates that you can remove fields from the field set.
Upload icon ( ): Indicates that you can upload files.
SQL Functions and Expressions for ELT
You can use the SQL Expressions and Functions supported for ELT to define your Snap or Account settings with the Expression symbol ( ) enabled, where available. This list is common to all supported target CDWs. You can also use other expressions/functions that your target CDW supports.
Field Name | Field Type | Field Dependency | Description | |
---|---|---|---|---|
Label*
Default Value: ELT Load Example: S3 Load | String | None. | Specify a name for the Snap. You can modify this to be more specific, especially if you have more than one of the same Snap in your pipeline.
| |
Get preview data
Default Value: Not selected Example: Selected | Checkbox | None. | Select this checkbox to include a preview of the query's output. The Snap performs limited execution and generates a data preview during pipeline validation. The number of records displayed in the preview (on validation) is the smaller of the following:
| |
Database Name
Default Value: N/A Example: TEST_DB | String | None. | Enter the name of the database in which the target table exists. Leave this blank if you want to use the database name specified in the Default Database Name field in the Account settings. | |
Schema Name (Not applicable to Databricks Lakehouse Platform)*
Default Value: N/A Example: SALES | String | Not applicable to DLP. | Enter the name of the database schema. In case the schema name is not defined, then the suggestions retrieved for the schema name contain all schema names in the specified database. | |
Target Table Name*
Default Value: N/A Example: SALES_ORDERS | String | None. | Enter the name of the table or view in which you want to perform the load operation. | |
Target Table Hash Distribution Column (Azure Synapse Only)
Default Value: N/A Example: Var_table | String/Expression | Applicable to Azure Synapse only. | Specify the hash distribution column name for the target table in Azure Synapse, if you choose the Load Action as Drop and Create table. | |
Load Action
Default Value: Append rows to existing table Example: Alter table | Dropdown list | None. | Select the load action to perform on the target table. Available options are:
| |
Table Columns | This field set enables you to configure the schema of the target table. You can use this field set to add/drop columns in the target table if you select the Drop and Create table or Alter table options in the Load Action field. This field set consists of the following fields:
| |||
Column
Default Value: N/A Example: ID, FNAME | String | Load Action is Drop and Create table or Alter table. | Enter the name of the column that you want to load in the target table. You can also specify the columns to drop if you select the Alter table option in the Load Action field. | |
Data Type
Default Value: N/A Example: INT, VARCHAR | String | Load Action is Drop and Create table or Alter table. | Enter the data type of the values in the specified column. | |
Modifier
Default Value: Add Example: Drop | String | Load Action is Alter table. | Select whether you want to add/drop the specified column. Available options are:
| |
Table Option List | This field set enables you to define the table options for creating the target table before performing the load operation. These options vary based on your target CDW. Refer to the Table options for Load action for the complete list of permitted table options. You must specify each table option in a separate row.
| |||
Table Option
Default Value: N/A Example: OPTIONS(custID=03761) | Expression/Suggestion |
| Choose a table option that you want to use from the suggested options. Specify the required values for the selected option. | |
Redshift Schema Name (Only when BigQuery with Source Location as Redshift)
Default Value: N/A Example: PUBLIC, DEF_SCHEMA | String/Expression | Database Type is BigQuery and Source Location is Amazon Redshift in the Snap Account. | Specify the schema name of the source table in Redshift. | |
Redshift Table Name (Only when BigQuery with Source Location as Redshift)
Default Value: N/A Example: DataLoad_Source01 | String/Expression | Database Type is BigQuery and Source Location is Amazon Redshift in the Snap Account. | Specify the name of the Redshift source table (from the schema selected above). | |
Cloud Storage Path
Default Value: N/A Example: s3://bigbucket/app_data | String/Expression |
| Specify the fully qualified path to the folder in the Cloud storage in this field, if you want to override the default path defined in the Snap's Account settings. For example:
| |
File List | This field set enables you to specify the staged files to load. You must specify each file in a separate row. Click + to add a new row. This field set consists of the following field:
| |||
File
Default Value: N/A Example: TestData.csv | String | None. | Enter the name of the staged file to load. | |
File Name Pattern
Default Value: N/A Example:
| String | None. | Enter the pattern to use to match the file name and/or absolute path. You can specify this as a regular expression pattern string, enclosed into use the key-based search mechanism. Refer to Loading Using Pattern Matching for details on Snowflake regular expressions, Examples of COPY INTO (Delta Lake on Databricks) for DLP, or Regular Expressions and the Java Programming Language for using regular expressions with Redshift, Azure Synapse, or BigQuery in this field. | |
File Format Type
Default Value: None Example: AVRO | String/Expression/Suggestion | None. | Select the format for the file. The options vary based on the database type in your account settings. The supported file formats are:
| |
File Format Option List | This field set enables you to define the file format options for the load operation. You must specify each file format option in a separate row.
Refer to Snowflake Data Loading Commands, Redshift Data Format Parameters, COPY INTO (Transact-SQL) for Azure Synapse Analytics, COPY INTO Format options for DLP, or the BigQuery’s Batch load data guide specific to your source file format type for the list of options that you can use based on the target database. | |||
Format Option
Default Value: N/A Example: SKIP_HEADER = 1 | String/Expression/Suggestion | Based on the combination of Database Type and File Format Type. | Choose a file format option that you want to use during the load operation from the suggested values. Specify the required value for the selected option. | |
Encryption Type
Default Value: None Example: Server-Side Encryption | String/Expression/Suggestion | None. | Select the encryption type that you want to use for the loaded data and/or files. Available options are:
| |
KMS Key
Default Value: N/A Example: WC2YY-2BFMV-KHKB3-FDDGQ-8NRQY | String/Expression | Encryption Type is Server-Side KMS Encryption. | Enter the KMS key to use to encrypt the files. | |
Copy Option List | This field set enables you to define the COPY options for the load operation. You must specify each COPY option in a separate row. Click + to add a new row. This field set comprises the following field:
See COPY Options for Snowflake, COPY Parameter Reference for Redshift, COPY INTO (Transact-SQL) for Azure Synapse Analytics, or COPY_OPTIONS under COPY INTO Parameters for DLP. | |||
Copy Option
Default Value: N/A Example: ON_ERROR = ABORT_STATEMENT | String/Expression/Suggestion | None. | Choose a COPY option from the suggested values to use if an error occurs during the load operation. Specify the required value for the selected option. | |
Validation Mode
Default Value: None Example: RETURN_n_ROWS | String/Expression/Suggestion | None. | Select an output preview type for the data validation performed as part of the pipeline validation. The options vary based on the database type (Snowflake/Redshift) in your account settings. Available options for Snowflake are:
Available options for Redshift are:
For more details, refer to Validating Staged Files for Snowflake and Validating Input Data for Redshift. | |
Rows To Return
Default Value: N/A Example: 20 | String (Number) | Validation Mode is RETURN_n_ROWS. | Enter the number of records to retrieve from the target table to validate whether the load operation was successful. | |
Enable source column position (Snowflake and Azure Synapse Only)
Default Value: Not selected Example: Selected | Checkbox | Database Type is Snowflake or Azure Synapse. | Select this checkbox to specify the source table column’s position in the Target to Source Column Map instead of its name. | |
Target to Source Column Map | This field set enables you to define the mapping between the columns in the target table and the corresponding columns in the source file. You must define each mapping in a separate row. The available options are:
| |||
Target Table Column Name Default Value: N/A | String/Expression/Suggestion | None. | Choose a column from the list of target table columns displayed. Ensure that you have defined the Schema Name or the Target Table Name to view the list of columns in this field's suggestions. | |
Source File Column Name Default Value: N/A | String/Expression/Suggestion | None. | Choose a column from the list of source file columns displayed. Ensure that you have defined the Schema Name (or source file path) to view the list of columns in this field's suggestions. | |
Source Column Position (Valid Only For Snowflake and Azure Synapse) Default Value: N/A | Integer/Expression | Database Type is Snowflake or Azure Synapse. | Specify the column’s position in the source file. |
Table options for Load action
Using file format options with ELT Load
Here are a few common file format options that can be used for the corresponding CDW when loading data from CSV files in a storage location to a table in that CDW.
Target CDW | Exclude header row from the input data stream | Specify row delimiters |
---|---|---|
Azure Synapse | FIRSTROW = 2 | ROWTERMINATOR = ' ROWTERMINATOR = ' |
DLP | 'header' = 'true' | lineSep=None ( |
Google BigQuery | skip_leading_rows = 1 | N/A (allow_quoted_newlines = true if there are quoted data sections in a CSV file, that contain newline characters.) |
Redshift | IGNOREHEADER 1 | N/A |
Snowflake | SKIP_HEADER = 1 | RECORD_DELIMITER = '<character>' | NONE |
Load data from AVRO/ORC/JSON/Parquet files
Data in your AVRO/ORC/JSON/Parquet files can exist as any combination of the following structures: Flat data, nested lists, and nested maps.
Flat file fields contain data in a simple two-dimensional spreadsheet-like format (Level 0). Such fields can be read and loaded using the standard mapping and loading methods.
File fields in nested lists or maps contain structured and hierarchical data where values for each field can be:
A series of repeated fields (Level 1) or
A series of values (Level 1)
Another set of key-value pairs (Level 2), or
A set of keys with each or some of them containing a nested set of key-value pairs (Level 3), and so on.
Different Parquet file structures
The table below depicts the possible schema variations within a Parquet file with examples of source table data. It also indicates how to reference the fields (keys in the source files) and map them with the corresponding target table columns to load your data.
Source Files (AVRO/ORC/JSON/Parquet) | Canonical format | Non-canonical format | ||
---|---|---|---|---|
Flat data | Nested List | Nested Map | Nested Map | |
Sample schema (column definitions) | Field(name: "name", datatype: String)
Field(name: "age", datatype: Int32)
Field(name: "city", datatype: String) | Field(name: "tourist", datatype: Struct[
Field(name: "name", datatype: String),
Field(name: "visits", da |