
ELT Load



Overview

Use this Snap to load files/data from AWS S3 buckets or Azure Cloud Storage containers into a Snowflake or Azure Synapse database or a Databricks Lakehouse Platform (DLP) instance. You can also use this Snap to transfer files from AWS S3 buckets to a Redshift database, and to load data from Amazon Redshift or Google Cloud Storage into BigQuery. See the ELT Source-Target-Hosting Support Matrix for complete information.

If you use the COPY INTO command in the ELT Execute Snap to load data into the target database, you must pass (and thereby expose) your account credentials inside the SQL statement. Use the ELT Load Snap instead to avoid this security issue.


Prerequisites

Valid accounts and access permissions to connect to the following:

  • Source: AWS S3, Azure Cloud Storage, or Google Cloud Storage

  • Target: Snowflake, Redshift, Azure Synapse, Databricks Lakehouse Platform, or BigQuery

Limitations

  • The ELT Snap Pack does not support the Legacy SQL dialect of Google BigQuery. We recommend that you use only BigQuery's Standard SQL dialect with this Snap.
  • Unlike other Write-type Snaps, the Snap does not display the end SQL query in its output preview when connected to a BigQuery instance. This is because BigQuery makes use of APIs instead of an SQL query to load data.
  • If the CSV file you want to load into a DLP instance contains null values, you must define the Format Option in the File Format Option List field set as sep = ','; otherwise, the load operation fails.
  • Due to a limitation in Snowflake in inferring the source files (and thereby the source schema) based on file name patterns in a Storage Integration-based session, the Snap fails to load data from any supported source location into your Snowflake target instance based on any File Name Pattern that you specify. As a workaround, we recommend that you use the File List field set to define the source data file locations for the Snap to load the data.
  • Snowflake does not support COPY statements that result in data transformation in any of the validation modes it allows. Therefore, the ELT Load Snap fails during Pipeline validation when the number of columns you define in the Target to Source Column Map field set is not the same as the number of columns in the target table. But the same Pipeline runs successfully because this limitation does not apply to Pipeline execution.

    • As a workaround for this limitation, ensure that you specify error_on_column_count_mismatch=false in the Format Option field under the File Format Option List field set.
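For reference, the COPY statement that the Snap generates for Snowflake would then carry this option in its file format clause, roughly as follows (the table, stage, and folder names here are hypothetical):

```sql
-- Hedged sketch: Snowflake COPY INTO with the column-count mismatch check
-- disabled, so validation does not fail when the mapped columns are fewer
-- than the columns in the target table.
COPY INTO SALES.ORDERS
FROM @my_stage/orders/
FILE_FORMAT = (TYPE = CSV SKIP_HEADER = 1 ERROR_ON_COLUMN_COUNT_MISMATCH = FALSE);
```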

Known Issues

  • Due to an issue with table schema management in the Redshift COPY INTO SQL for PARQUET input files, the Snap fails to perform ALTER TABLE actions before loading data from a PARQUET source file into the target table in Redshift. This issue does not occur when the source file is in CSV or another format.

  • ELT Pipelines built using an ELT Load Snap fail to perform the DROP AND CREATE TABLE and ALTER TABLE operations on Delta tables when working with a Databricks SQL persona on the AWS Cloud with the error Operation not allowed: ALTER TABLE RENAME TO is not allowed for managed Delta tables on S3. However, the same action runs successfully with the Data Science and Engineering persona on the AWS Cloud.
    • This is due to a limitation in the Databricks SQL Admin Console, which does not allow you to add the configuration parameter spark.databricks.delta.alterTable.rename.enabledOnAWS true to its SQL Warehouse Settings.

  • Because BigQuery does not support Automatic Schema Inferencing (ASI) from source files in Amazon S3, the ELT Load Snap fails to produce the required output when you choose Drop and Create table as the Load Action and define the Target to Source Column Map in the Snap’s settings. However, the same configuration works fine for Pipeline validation purposes.

  • The ELT Load Snap cannot automatically infer schema from GZIP-compressed source files located in an ADLS Gen2 storage location for loading the data into Snowflake target tables.

  • The ELT Load Snap fails to complete the load operation on the target Snowflake instance when you choose to automatically infer the schema from the source data available in Microsoft Azure Storage Blob (WASB), specified as a File Name Pattern, and accessed using the Source Location Credentials. The Snap returns the error: Azure null Storage Account incorrectly configured. But the same Snap configuration with the source files specified using File List instead of a File Name Pattern succeeds without any issues.

  • Due to an issue with BigQuery table schema management (the time travel feature), an ALTER TABLE action (Add or Update column) that you attempt after deleting a column (DROP action) in your BigQuery target table causes the table to break and the Snap to fail.

    • As a workaround, you can consider either avoiding ALTER TABLE actions on your BigQuery instance or creating (CREATE) a temporary copy of your table and deleting (DROP) it after you use it.

  • Due to an issue with the Simba Spark JDBC driver for DLP, you cannot insert a NULL value in the nullable columns of Boolean data type in your DLP instance using any of the Write-type Snaps—ELT Load, ELT SCD2, ELT Insert Select, ELT Merge Into, and ELT Execute.

    • The only workaround currently available for this issue is to upgrade your JDBC driver to databricks-jdbc-2.6.25-1.jar, and use the corresponding JDBC driver class (com.databricks.client.jdbc.Driver) and JDBC URL in your Snap account.

      This latest JDBC driver for DLP uses a JDBC URL structure and driver class that are different from those of the Simba Spark JDBC driver.


  • You cannot add a column to your BigQuery target table with a deleted column name using the ELT Load Snap, as BigQuery reserves deleted column names and data until the pre-configured time travel duration (from 2 through 7 days).

  • When loading data from a CSV file to a target DLP table, the header names in the file must exactly match the column names in the target table. Otherwise, the Snap returns the error—Column names should be the same in the target table and CSV file and aborts the load operation.

  • When you are enclosing the column names within backticks (`<Column_Name>`) for creating a new target table in a DLP instance using the ELT Load Snap—typically to ensure that the column names are used verbatim—you may encounter the error: ERROR processing query/statement: no viable alternative at input 'FROM'(line X, pos XX). To prevent this error, edit the header in your source file to exclude any special characters, thereby avoiding the use of backticks in the target table column names.

  • When your Databricks Lakehouse Platform instance uses Databricks Runtime Version 8.4 or lower, ELT operations involving large amounts of data might fail due to the smaller memory capacity of 536870912 bytes (512MB) allocated by default. This issue does not occur if you are using Databricks Runtime Version 9.0.
  • ELT Pipelines created prior to 4.24 GA release using one or more of the ELT Insert Select, ELT Merge Into, ELT Load, and ELT Execute Snaps may fail to show expected preview data due to a common change made across the Snap Pack for the 4.26 GA release. In such a scenario, replace the Snap in your Pipeline with the same Snap from the Asset Palette and configure the Snap's Settings again.
  • When loading data from a JSON file into a target Databricks Lakehouse Platform (DLP) instance using this Snap, if you choose the Drop and Create Table option as the Load Action and specify an additional column (one that is not available in the JSON file) for the new table, one more column, null, is added to the new target table. For CSV files, this Snap behaves the same way, but only when the CSV file uses a comma separator.
  • Suggestions displayed for the Schema Name field in this Snap are from all databases that the Snap account user can access, instead of the specific database selected in the Snap's account or Settings.
  • For Redshift and Azure Synapse databases, the ELT Load Snap displays JSON values for fields with datatypes Date and Time, when validating the Pipeline.

    Snowflake:     "time_data": "09:23:56"             "date_data": "2020-12-20"
    Redshift:      "time_data": {"time": "09:23:56"}   "date_data": "2020-12-20"
    Azure Synapse: "time_data": {"time": "09:23:56"}   "date_data": {"date": "2020-12-20"}

    However, the ELT Load Snap outputs the correct date and time values when executing the Pipeline.

    To prevent these values from showing up in JSON format in the previews and ELT Transform Snap, you can cast the time or date values to a VARCHAR type in your custom query as follows: 
    CAST (<time_column_type_name> AS VARCHAR(<size_based_on_precision_required>)).
    Example: CAST (time_data AS VARCHAR(20)).
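A fuller sketch of such a custom query (the table name SALES_ORDERS is hypothetical; the column names come from the example above):

```sql
-- Hedged sketch: cast DATE/TIME columns to VARCHAR so that Pipeline-validation
-- previews render them as plain strings instead of JSON objects.
SELECT
    CAST(time_data AS VARCHAR(20)) AS time_data,
    CAST(date_data AS VARCHAR(20)) AS date_data
FROM SALES_ORDERS;
```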

  • If you have dropped one or more columns from your target table outside of the Snap and then use it again to add these columns back and load data into the table, the Snap adds the new columns to the target table, but not the corresponding data from the source file.

  • The ELT Load Snap does not infer accurate Date & Timestamp data types for Redshift.

In any of the supported target databases, this Snap does not correctly identify or render column references that begin with an underscore (_) in SQL queries/statements that use the following constructs and contexts (the Snap works as expected in all other scenarios):

  • WHERE clause (ELT Filter Snap)
  • WHEN clause
  • ON condition (ELT Join, ELT Merge Into Snaps)
  • HAVING clause
  • QUALIFY clause
  • Insert expressions (column names and values in ELT Insert Select, ELT Load, and ELT Merge Into Snaps)
  • Update expressions list (column names and values in ELT Merge Into Snap)
  • Secondary AND condition
  • Inside SQL query editor (ELT Select and ELT Execute Snaps)

Workaround

As a workaround while using these SQL query constructs, you can:

  • Precede this Snap with an ELT Transform Snap to re-map the '_' column references to suitable column names (that do not begin with an _ ) and reference the new column names in the next Snap, as needed.
  • In the case of the Databricks Lakehouse Platform, where CSV files do not have a header (column names), a simple query like SELECT * FROM CSV.`/mnt/csv1.csv` returns default column names such as _c0, _c1, _c2, which this Snap cannot interpret. To avoid this scenario, you can:
    • Write the data in the CSV file to a DLP table beforehand, as in: CREATE TABLE csvdatatable (a1 int, b1 int,…) USING CSV `/mnt/csv1.csv` where a1, b1, and so on are the new column names.
    • Then, read the data from this new table (with column names a1, b1, and so on) using a simple SELECT statement.
  • In the case of the Databricks Lakehouse Platform, all ELT Snaps' preview data (during validation) contains values with higher precision than the actual floating-point values (float data type) stored in the Delta table. For example, 24.123404659344 instead of 24.1234. However, the Snap reflects the exact values during Pipeline executions.
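The two-step workaround above for headerless CSV files in DLP can be sketched as follows (the file path and column names come from the example; the exact USING CSV clause may vary by Databricks Runtime version):

```sql
-- Hedged sketch: materialize the headerless CSV into a table with explicit
-- column names, then read from that table instead of the raw file.
CREATE TABLE csvdatatable (a1 INT, b1 INT) USING CSV LOCATION '/mnt/csv1.csv';

-- The columns now have usable names instead of _c0, _c1, ...
SELECT a1, b1 FROM csvdatatable;
```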

Snap Input and Output

Input/Output | Type of View | Number of Views | Examples of Upstream and Downstream Snaps | Description

Input | None | Min: 0, Max: 0 | N/A | This Snap does not require any input from upstream Snaps.

Output | Document | Min: 1, Max: 1 | ELT Transform, ELT Copy | A document containing the status of the load operation and an SQL query that you can use in downstream ELT Snaps, if any.

Snap Settings

SQL Functions and Expressions for ELT

You can use the SQL Expressions and Functions supported for ELT to define your Snap or Account settings with the Expression symbol = enabled, where available. This list is common to all target CDWs supported. You can also use other expressions/functions that your target CDW supports.

Parameter Name | Data Type | Description | Default Value | Example

Label | String

Specify a name for the Snap. You can modify this to be more specific, especially if you have more than one of the same Snap in your pipeline.

Default Value: ELT Load | Example: S3 Load

Get preview data | Check box

Select this checkbox to include a preview of the query's output. The Snap performs limited execution and generates a data preview during Pipeline validation.

In the case of ELT Pipelines, only the SQL query flows through the Snaps but not the actual source data. Hence, the preview data for a Snap is the result of executing the SQL query that the Snap has generated in the Pipeline.

The number of records displayed in the preview (upon validation) is the smaller of the following:

  • Number of records available upon execution of the SQL query generated by the Snap.

  • The value set in the Preview Data Count setting (default is 50 records).

Rendering Complex Data Types in Databricks Lakehouse Platform

Based on the data types of the fields in the input schema, the Snap renders complex data types like map and struct as the object data type and array as the array data type. It renders all other incoming data types as-is, except that values in binary fields are displayed as base64-encoded strings with the string data type.

Default Value: Not selected | Example: Selected

Database Name | String

Enter the name of the database in which the target table exists. Leave this blank if you want to use the database name specified in the Default Database Name field in the account settings.

Default Value: N/A | Example: TESTDB

Schema Name (Not applicable to Databricks Lakehouse Platform) | String

Required. Enter the name of the database schema. If the schema name is not defined, the suggestion for this field retrieves all schema names in the specified database when you click the Suggest button.

  • Ensure that you include exactly the same schema name, including the double quotes if used, when you repeat the schema name in the Target Table Name field.
  • Leave this field blank if your target database is Databricks Lakehouse Platform.

Default Value: N/A | Example: SALES

Target Table Name | String

Required. Enter the name of the table or view in which you want to perform the load operation. 

Only views that can be updated (that is, have new rows inserted) are listed as suggestions, so Join views are not included. This also implies that the Snap account user must have Insert privileges on the views listed as suggestions.


  • Ensure that you include exactly the same schema name, including double quotes if used, as specified in the Schema Name field.

  • If the target table or view does not exist during run-time, the Snap creates one with the name that you specify in this field and writes the data into it. During Pipeline validation, the Snap creates the new table or view but does not write any records into it.

    • The new table or view thus created will not be dropped if a subsequent/downstream Snap failure occurs during validation.

  • You must use double quotes ("") when including special characters, such as hyphens (-), in the name.

  • A table or view name must always start with a letter.

  • Integers and underscores (_) can also be a part of the table or view name.

  • All characters are automatically converted to uppercase by the Snap. Use double-quotes to retain lowercase.

Default Value: N/A | Example: SALES_ORDERS

Target Table Hash Distribution Column (Azure Synapse Only) | String/Expression

Specify the hash distribution column name for the target table in Azure Synapse, if you choose the Load Action as Drop and Create table.

This field applies only to Azure Synapse, which requires tables to be hash-distributed for improved query performance. If the target table is created outside the Snap, you need not specify the target table column name.

Default Value: N/A | Example: var table

Load Action | Drop-down list

Select the load action to perform on the target table.

Available options are:

  • Append rows to existing table. Loads the data from the source files into the specified target table.
  • Overwrite existing table. Overwrites the target table with the data from the source files.
  • Drop and Create table. Drops the target table if it exists, creates a new table with the columns provided in the Table Columns field set, and then loads the data from the source files into the target table. Activates the Data Type field in the Table Columns field set.
  • Alter table. Modifies the schema of the target table based on the configuration of the Modifier field in the Table Columns field set and loads the data from the source files into the target table. Activates the Data Type and Modifier fields in the Table Columns field set. 

The Snap does not modify the existing tables during Pipeline validation, but if you choose the Drop and Create table option in the Load Action field and the target table does not exist, it creates a new (empty) target table based on the schema specified in the Snap.

Default Value: Append rows to existing table | Example: Alter table
Table Option List

This field set enables you to define the table options for creating the target table before performing the load operation. You must specify each table option in a separate row. Click + to add a new row.

  • Table Option
Table Option | Expression/Suggestion

Click the Suggest icon and choose a table option that you want to use from the suggested values. Specify the required values for the selected option.

The table options you can use are:

  • PARTITION BY
  • CLUSTER BY
  • OPTIONS
  • COMMENT
Default Value: N/A | Example: OPTIONS(custID=03761)
Table Columns

This field set enables you to configure the schema of the target table. You can use this field set to add/drop columns in the target table if you select the Drop and Create table or Alter table options in the Load Action field. 

This field set consists of the following fields:

  • Column
  • Data Type
  • Modifier
Column | String

Enter the name of the column that you want to load in the target table. You can also specify the columns to drop if you select the Alter table option in the Load Action field.

Default Value: N/A | Examples: ID, FNAME

Data Type | String

Activates when you select the Drop and Create table or Alter table options in the Load Action field.

Enter the data type of the values in the specified column. 

Default Value: N/A | Examples: INT, VARCHAR

Modifier | String

Activates when you select the Alter table option in the Load Action field.

Select whether you want to add/drop the specified column.

Available options are:

  • Add
  • Drop
Default Value: Add | Example: Drop

Redshift Schema Name (Only when BigQuery with Source Location as Redshift) | String/Expression

Applicable only when you are loading data from a Redshift source to a BigQuery target.

Specify the schema name of the source table in Redshift.

Default Value: N/A | Examples: PUBLIC, DEF_SCHEMA

Redshift Table Name (Only when BigQuery with Source Location as Redshift) | String/Expression

Applicable only when you are loading data from a Redshift source to a BigQuery target.

Specify the name of the Redshift source table (from the schema selected above).

Default Value: N/A | Example: DataLoad_Source01

Cloud Storage Path | String/Expression

Specify the fully-qualified path to the folder in the Cloud storage in this field, if you want to override the default path defined in the Snap's Account settings. For example:

  • AWS S3 path: s3://<bucket name>/<folder>
  • DBFS folder: /<dbfs folder name>
  • Azure: azure://<accountname>.blob.core.windows.net/<containername>/<azurefolder>
  • Google Cloud Storage: gcs://<bucketname>/<folder>/
Default Value: N/A | Example: s3://bigbucket/app_data
File List

This field set enables you to specify the staged files to load. You must specify each file in a separate row. Click + to add a new row.

Use File List for up to 2 files

For the best performance of your load operation, we recommend that you use the File Name Pattern field to load data from multiple files at a source location (S3 or Redshift) instead of specifying individual file names/paths in the File List field set.


This field set consists of the following field:

  • File 
File | String

Enter the name of the staged file to load.

Default Value: N/A | Example: TestData.csv

File Name Pattern | String

Enter the pattern to use to match the file name and/or absolute path. You can specify this as a regular expression pattern string, enclosed in single quotes.

For Redshift tables

  • Redshift does not natively support the use of regular expressions, but Snaps in the ELT platform provide this ability through a key-based search mechanism.
  • If your search is based on AWS IAM roles, you cannot use regular expressions with Redshift tables.

Using Wildcard Patterns in BigQuery

BigQuery has its own set of rules to resolve wildcard patterns like *.* (that you provide in this field). Refer to the Batch loading data > Load Wildcards section of the BigQuery documentation for these rules.

See Loading Using Pattern Matching for details on Snowflake regular expressions, Examples of COPY INTO (Delta Lake on Databricks) for DLP, or Regular Expressions and the Java Programming Language for using regular expressions with Redshift, Azure Synapse, or BigQuery in this field.

Default Value: N/A

Examples:
Snowflake: '.*key.*[.]csv'
Redshift: './key.*csv'
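In the generated Snowflake SQL, such a pattern appears as a PATTERN clause on the COPY INTO statement, roughly as follows (the table and stage names are hypothetical):

```sql
-- Hedged sketch: Snowflake COPY INTO restricted to staged files whose names
-- match the regular expression pattern.
COPY INTO SALES.ORDERS
FROM @my_stage/data/
PATTERN = '.*key.*[.]csv'
FILE_FORMAT = (TYPE = CSV);
```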

File Format Type | String/Expression/Suggestion

Select the format for the file. The options vary based on the database type (Snowflake/Redshift/Azure Synapse) in your account settings.

The supported file formats are: 

  • Snowflake: AVRO, CSV, JSON, ORC, PARQUET, XML
  • Redshift: AVRO, CSV, JSON, ORC, PARQUET
  • Azure Synapse: CSV, ORC, PARQUET
Default Value: None | Example: AVRO
File Format Option List

This field set enables you to define the file format options for the load operation. You must specify each file format option in a separate row. Click + to add a new row.

  • Format Option

See Snowflake Data Loading Commands, Redshift Data Format Parameters, or COPY INTO (Transact-SQL) for Azure Synapse Analytics for the list of options that you can use based on the target database.

Auto-Inferencing Source Table Schema

The ELT Load Snap allows you to infer the table schema from the source files and then apply the auto-inferred schema to your target tables. As of the July 2022 release, this feature is supported on the Redshift target CDW with source files in an S3 location. Learn more at Automatic Schema Inference with ELT Load Snap.

Format Option | String/Expression/Suggestion

Click the Suggest icon and choose a file format option that you want to use during the load operation from the suggested values. Specify the required value for the selected option.

Loading a CSV file with nulls into DLP

If the CSV file you want to load into a DLP instance contains null values, ensure that you define the Format Option in the File Format Option List field set as sep = ','; otherwise, the load operation fails.

Default Value: N/A | Example: SKIP_HEADER = 1

Encryption Type | Drop-down list

Select the encryption type that you want to use for the loaded data and/or files.

Available options are:

  • None
  • Server-Side Encryption (Not supported by Azure Synapse)
  • Server-Side KMS Encryption (Not supported by Azure Synapse)

Server-side encryption is available only for S3 accounts and Snowflake.

Default Value: None | Example: Server-Side Encryption

KMS Key | String

Activates when you select the Server-side KMS Encryption option in the Encryption Type field.

Enter the KMS key to use to encrypt the files. 

Not required for Redshift databases. The load operation automatically recognizes and loads the encrypted files. See Loading encrypted files from Amazon S3 for details. 

Default Value: N/A | Example: <require input>
Copy Option List

This field set enables you to define the COPY options for the load operation. You must specify each COPY option in a separate row. Click + to add a new row.

This field set comprises the following field:

  • Copy Option

See COPY Options for Snowflake, COPY Parameter Reference for Redshift, or COPY INTO (Transact-SQL) for Azure Synapse Analytics.

Copy Option | String/Expression/Suggestion

Click the Suggest icon and choose a COPY option that you want to use if there is an error during the load operation, from the suggested values. Specify the required value for the selected option.

Default Value: N/A | Example: ON_ERROR = ABORT_STATEMENT

Validation Mode | String

Select an output preview type for the data validation performed as part of the Pipeline validation. The options vary based on the database type (Snowflake/Redshift) in your account settings.

Available options for Snowflake are:

  • None
  • RETURN_n_ROWS
  • RETURN_ERRORS
  • RETURN_ALL_ERRORS

Available options for Redshift are:

  • None
  • NOLOAD

No specific validation modes are available for Azure Synapse.

For more details, see Validating Staged Files for Snowflake and Validating Input Data for Redshift.

Default Value: None | Example: RETURN_n_ROWS

Rows To Return | Integer

Activates when you select the RETURN_n_ROWS option in the Validation Mode field.

Enter the number of records to retrieve from the target table to validate whether the load operation was successful.

Default Value: N/A | Example: 20

Enable source column position (Snowflake and Azure Synapse Only) | Checkbox

Select this checkbox to include the column position from the source table in the case of Snowflake or Azure Synapse.

In the case of other target CDWs (Redshift, DLP, and BigQuery), you can leave this checkbox unchecked.

Default Value: Not selected | Example: Selected
Target to Source Column Map

This field set enables you to define the mapping between the columns in the target table and the corresponding columns in the source file. You must define each mapping in a separate row. Click + to add a new row.

The available options are:

  • Target Table Column Name
  • Source File Column Name
Target Table Column Name | String/Expression/Suggestion

Click the Suggest icon and choose a column from the list of target table columns displayed. Ensure that you have defined the Schema Name or the Target Table Name to see the list of columns in this field's suggestions.

Default Value: N/A | Example: ORDER_ID

Source File Column Name | String/Expression/Suggestion

Click the Suggest icon and choose a column from the list of source file columns displayed. Ensure that you have defined the Schema Name or the source file path to see the list of columns in this field's suggestions.

Default Value: N/A | Example: ORD_ID

Snap Behavior During Pipeline Validation

This Snap's behavior during Pipeline validation varies for each Load Action selected.

Load Action | Target Table exists? | Validation Result | Action on Target Table
Append rows to existing table | No | Success | Creates an empty target table based on the schema information available in the Snap settings.
Append rows to existing table | Yes | Success | No updates made.
Overwrite existing table | No | Success | Creates an empty target table based on the schema information available in the Snap settings.
Overwrite existing table | Yes | Success | No updates made.
Drop and Create table | No | Success | Creates an empty target table based on the schema information available in the Snap settings.
Drop and Create table | Yes | Success | No updates made.
Alter table | No | Success | Creates an empty target table based on the schema information available in the Snap settings.
Alter table | Yes | Success | No updates made.
Bulk load operations are not carried out during Pipeline validation.

Snowflake

The data is not loaded into the Snowflake table. It is only validated using the RETURN_n_ROWS validation mode. The number of records in the output is determined by the preview count set in the Pipeline settings. The Snap creates the target table (if it does not already exist) in the database when you select Drop and Create table as the Load Action. For all other load actions, the Snap verifies that the specified target table exists in the database.
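In Snowflake terms, validation issues a COPY statement in a validation mode rather than an actual load, roughly as follows (the table and stage names are hypothetical):

```sql
-- Hedged sketch: Snowflake COPY INTO in validation mode; returns up to 50 of
-- the rows that would be loaded, without writing any data to the target table.
COPY INTO SALES.ORDERS
FROM @my_stage/orders/
FILE_FORMAT = (TYPE = CSV)
VALIDATION_MODE = 'RETURN_50_ROWS';
```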

Redshift

The data is not loaded into the Redshift table. The files in S3 are just validated using the NOLOAD validation mode to verify data integrity of the files. The target table is also affected based on your selection in the Load Action field. The Snap creates the target table (if not existing already) in the database when you select Load Action as Drop and Create table. For all other load actions, the Snap verifies if the specified target table exists in the database.

Azure Synapse

The data is not loaded into the Azure Synapse table. The Snap creates the target table (if not existing already) in the database when you select Load Action as Drop and Create table. For all other load actions, the Snap verifies if the specified target table exists in the database.

Troubleshooting

Error: Time '"01:01:23"' is not recognized.
Reason: The CSV file you are trying to load into the Snowflake database contains single or double quotes for time and/or date values. Hence, the ELT Load Snap is unable to read and insert these values into the target table.
Resolution: After you specify the CSV files in the ELT Load Snap's settings, add an entry in the File Format Option List field set as FIELD_OPTIONALLY_ENCLOSED_BY='\'' or FIELD_OPTIONALLY_ENCLOSED_BY='\"' to match the quotes used for the date and/or time values in the CSV files. The Snap then ignores these quotes when reading the values.
Error: File read error while uploading data from a CSV file.
Reason: This can happen when the source CSV file contains a header row that the target database cannot recognize or interpret.
Resolution: Depending on your target database, use the respective File Format Option in the ELT Load Snap:

  • For Snowflake: SKIP_HEADER = 1
  • For Azure Synapse: FIRSTROW = 2
  • For Databricks Lakehouse Platform: 'header' = 'true'

Error: Invalid DBFS folder name. (Target CDW: Databricks Lakehouse Platform)
Reason: You have not specified an appropriate DBFS path in the DBFS Folder path field in the Snap's (DLP) account settings.
Resolution: Edit the Snap's account settings to ensure that you have provided a valid DBFS Folder path.

Error: Database cannot be blank. (When seeking the suggested list for the Schema Name field)
Reason: Suggestions in the Schema Name and Target Table Name fields do not work when you have not specified a valid value for the Database Name field in this Snap.
Resolution: Specify the target Database Name in this Snap to view and choose from a suggested list in the Schema Name and Target Table Name fields.

Error: Unknown Source / Not found: <path>. (Target CDW: BigQuery)
Reason: BigQuery has a unique way of resolving wildcard patterns like *.* (that you provide in the File or File Name Pattern fields of this Snap).
Resolution: Make sure that the file name pattern you provide conforms to the rules defined in the Batch loading data > Load Wildcards section of the BigQuery documentation.

Error: Loading data into a Redshift table using the RegEx pattern *.* in the File Name Pattern field results in the operation being performed twice.
Reason: Because the Redshift COPY INTO SQL does not natively support the File Pattern option, unlike the other CDWs, SnapLogic supports this externally through Java Regular Expressions (RegEx). The RegEx *.* returns matching results from the root location instead of the folder location specified before *.* in the File Name Pattern field.
Resolution: Use a RegEx such as csv/person/.+\..+ in the File Name Pattern field to load data from all the files available in a specific source folder.

Error: Keyword RANGE is not acceptable as a column name. (CDW: Databricks Lakehouse Platform)
Reason: This can happen with any reserved keyword if it is used as a column/field name in the table to be created.
Resolution: Ensure that you enclose such column names (reserved keywords) in backticks (`). For example: `RANGE` STRING.

Error: Keyword ORDER is not acceptable as a column name in the JSON file. (Target CDW: BigQuery)
Reason: This can happen with any reserved keyword if it is used as a column/field name in the table to be created.
Resolution: Ensure that you enclose such column names (reserved keywords) in backticks (`). For example: `ORDER`.
Error: SQL exception from Snowflake: Syntax error in one or more positions in the SQL query.
Reason: Column names in Snowflake tables are case-sensitive. Snowflake stores all columns in uppercase unless they are surrounded by quotes at the time of creation, in which case the exact casing is preserved. See Identifier Requirements — Snowflake Documentation.
Resolution: Ensure that you follow the same casing for the table column names across the Pipeline.

[Simba][SparkJDBCDriver](500051) ERROR processing query/statement. Error Code: 0

Cannot create table (`<schema name>`.`<table name>`). The associated location (`…<table name>`) is not empty but it's not a Delta table

(Target CDW: Databricks Lakehouse Platform)

The specified location contains one of the following:

  • A non-Delta table (such as CSV, ORC, JSON, PARQUET)

  • A corrupted table

  • A Delta table with a different table schema

As a result, the Snap/Pipeline cannot overwrite this table with the target table as needed.

Ensure that you take appropriate action (mentioned below) on the existing table before running your Pipeline again (to create another Delta table at this location).

Move or drop the existing table from the schema manually using one of the following commands:

Access the DBFS through a terminal and run:

  • dbfs mv dbfs:/<current_table_path> dbfs:/<new_table_path> to move the table or

  • dbfs rm -r dbfs:/<table_path> to drop the table.

OR

Use a Python notebook and run:

  • dbutils.fs.mv(from: String, to: String, recurse: boolean = false): boolean to move the table/file/directory or

  • dbutils.fs.rm(dir: String, recurse: boolean = false): boolean to drop the table/file/directory.

Column names should be the same in the target table and CSV file. 

(Target: DLP)

One or more header names in your source CSV file do not match with the corresponding target table column names. Ensure that each of the column names in your target table exactly matches the corresponding header name used in the CSV file or vice versa before you retry running the Pipeline.
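A quick pre-flight check along these lines can catch such mismatches before you run the Pipeline. The file content and column names below are hypothetical:

```python
import csv
import io

def header_mismatches(csv_text: str, target_columns: list) -> list:
    """Return the CSV header names that have no exact-match column in
    the target table. (Illustrative pre-flight check; DLP requires an
    exact match between CSV headers and target column names.)"""
    header = next(csv.reader(io.StringIO(csv_text)))
    target = set(target_columns)
    return [h for h in header if h not in target]

# Hypothetical source file and target table schema.
sample = "emp_id,emp_name,dept\n1,Ann,HR\n"
print(header_mismatches(sample, ["emp_id", "emp_name", "department"]))
# ['dept'] -> rename the CSV header or the table column before loading
```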
Error code 400: Column <column_name> was recently deleted in the table <table_name>.

(Target CDW: BigQuery)

BigQuery reserves deleted column names for up to the time travel duration (2 through 7 days) as set by your DBA. Use a different column name to complete your Load action.

Examples

To skip the header row while uploading data from a CSV file, use the following File Format Options that suit your target database.

Target | File Format Option to use

  • Azure Synapse: FIRSTROW = 2
  • Databricks Lakehouse Platform: 'header' = 'true'
  • Google BigQuery: skip_leading_rows = 1
  • Redshift: IGNOREHEADER 1
  • Snowflake: SKIP_HEADER = 1
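For reference, the per-target options above can be kept in a simple lookup. This is only a convenience sketch; the option strings are taken verbatim from the table:

```python
# Header-skip file format option per target CDW, as listed above.
SKIP_HEADER_OPTION = {
    "Azure Synapse": "FIRSTROW = 2",
    "Databricks Lakehouse Platform": "'header' = 'true'",
    "Google BigQuery": "skip_leading_rows = 1",
    "Redshift": "IGNOREHEADER 1",
    "Snowflake": "SKIP_HEADER = 1",
}

print(SKIP_HEADER_OPTION["Snowflake"])  # SKIP_HEADER = 1
```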

Loading Files from S3 Bucket to Snowflake

We need to perform Snowflake ELT transformations upon a file present in an S3 bucket. Therefore, we must first load the file into the Snowflake database from the source location. This Pipeline shows how we can use the ELT Load Snap to accomplish this task.

Before we can load the file, we must configure the ELT account to read from the source S3 bucket by configuring the Source Location, Authentication, Access Key, Secret Key, S3 Bucket, and S3 Folder fields. The fields vary based on the source location and authentication. See Configuring ELT Database Accounts for details. 

We configure the ELT Load Snap as shown below:

Let us understand the Snap's configuration:

  • The Database Name, Schema Name, and Target Table Name specify the target database, schema, and table into which we want to write the file's data.
  • Table Action indicates that if a table with the same name exists, we configure the Snap to drop the table and create a new one in its place.
  • Since this is a new table, the Table Columns field set specifies the table's schema.
  • The File List field set specifies the source file's path in the S3 bucket and the file's format.
  • The Format Option field set specifies that the Snap must ignore the file's header. Otherwise, the header is also loaded into the target table.
  • The Copy Option field set specifies that the Snap must abort the loading operation if it encounters an error.
  • The Validation Mode and Rows to Return fields specify that 10 rows from the target table be retrieved upon successful completion.

To validate a successful execution, the ELT Load Snap executes a SELECT * command on the target table and returns 10 rows:

We can either create a separate Pipeline to perform ELT operations on the table or add appropriate Snaps downstream of the ELT Load Snap in the same Pipeline.

Download this Pipeline.

Uploading data from a CSV File in Azure Data Lake Gen2 Storage to Azure Synapse

The following one-Snap (ELT Load) Pipeline connects to an Azure Data Lake Gen2 Storage using an ADLS Gen2 Basic Auth account to upload a CSV file from the storage to an Azure Synapse database. This CSV file is created on macOS with the default line endings.

Account Configuration | Snap Configuration

For the Pipeline to work as expected with a CSV file created on macOS with the default (macOS) line endings, define the value of the file format option ROWTERMINATOR as '0X0A' instead of the usual '\n'. With '\n' as the ROWTERMINATOR, the Pipeline runs successfully but does not insert the data from the CSV file into the newly created table in Azure Synapse.
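The literal '0X0A' is the hexadecimal notation for the line-feed (LF) byte, which is the default line ending on modern macOS. A quick check shows the equivalence:

```python
# The default line ending on modern macOS is LF (line feed), whose
# byte value is 0x0A -- the same byte that '\n' denotes in most
# languages. Synapse accepts the hex form '0X0A' for ROWTERMINATOR.
assert b"\n" == bytes([0x0A])

line = "1,Ann,HR\n"            # hypothetical CSV row from the file
print(hex(ord(line[-1])))      # 0xa -> use ROWTERMINATOR = '0X0A'
```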

Snap Output with ROWTERMINATOR = '\n'

Snap Output with ROWTERMINATOR = '0X0A'

Download this Pipeline.

Downloads

Important Steps to Successfully Reuse Pipelines

  1. Download and import the Pipeline into SnapLogic.
  2. Configure Snap accounts as applicable.
  3. Provide Pipeline parameters as applicable.

Snap Pack History

Release | Snap Pack Version | Date | Type | Updates

February 2024 | 436patches25953 | Latest

    Enhanced the ELT Load Snap’s capabilities to allow loading flat and nested data sets (from canonical and non-canonical formats) from your Parquet files to the target tables in Snowflake. Learn more about the usage of this feature at Load data from Parquet files and in the following example Pipelines:

February 2024 | main25112 | Stable | Updated and certified against the current SnapLogic Platform release.
November 2023 | 435patches24461 | Latest | Fixed an issue with the ELT Merge Into Snap that caused inaccurate aliases and table identifiers in its generated SQL statement.
November 2023 | 435patches23671 | Latest and Stable
    • Enhanced the ELT Merge Into Snap to support defining the update expressions for updating tables in your target Databricks Lakehouse Platform instance using the WHEN NOT MATCHED BY SOURCE clause, besides other supported clauses and expression lists.

November 2023 | main23721 | Stable
    • Enhanced the ELT Merge Into and ELT SCD2 Snaps to display detailed pipeline execution statistics about the data loaded to target tables on a Databricks Lakehouse Platform instance.

      • In addition to the individual counts of rows inserted, updated, and deleted that the ELT Merge Into Snap covers, the ELT SCD2 Snap also reports the count of source rows rejected.

August 2023 | main22460 | Stable
    • Upgraded the JDBC driver support for Snowflake to snowflake-jdbc-3.13.33.jar.

    • Updated the ELT Snowflake Account to append the connection parameter application in its JDBC URL with the value SnapLogic_iPaaS.

May 2023 | N/A | Stable

Fixed a null pointer exception so that 5XX errors do not occur when you download non-existent query details from the Pipeline Execution Statistics of an ELT (write-type) Snap.

May 2023 | main21015 | Stable | Upgraded with the latest SnapLogic Platform release.
February 2023 | 432patches20978 | Latest | Fixed an issue with the ELT SCD2 Snap where the COLLATE column constraint (used in the new target table definition for Snowflake) resulted in an incorrect syntax internally, causing the pipeline to fail. The load operation succeeds with this fix.
February 2023 | main19844 | Stable | Upgraded with the latest SnapLogic Platform release.
November 2022 | 431patches19240 | Latest
    • SnapLogic upgraded the default JDBC Driver versions used to connect the ELT Snaps with the supported CDWs.
    • The ELT Merge Into Snap displays the individual record counts inserted, updated, and deleted for Snowflake targets in the respective Records Inserted, Records Updated, and Records Deleted parameters of the Snap Statistics tab (on Pipeline execution).
    • With Google deprecating the OAuth out-of-band (OOB) flow, the Refresh Token Accounts defined for connecting your ELT Snaps to the BigQuery instances start failing in a phased manner. We recommend that you immediately modify these Snap account configurations to switch to an Access Token Account or a Service Account from the Refresh Token Account.
November 2022 | main18944 | Stable

    ELT Insert-Select, ELT Merge Into, and ELT SCD2 Snaps show the following statistics on execution.

    • Records Added

    • Records Updated

    • Records Deleted

September 2022 | 430patches18196 | Latest

    New Snap

The ELT Create View Snap enables you to create a new view when the view does not exist in the target database and/or schema, or to drop and re-create an existing view in the database and/or schema.

    Enhancements

    • The ELT Insert-Select Snap is more flexible and easier to use, especially if the number of columns in your source data set is very large. You can choose to update values only in a subset of columns in the target table.

    • The ELT Execute Snap can retrieve and execute SQL queries from the upstream Snap's output when referenced in the SQL Statement Editor using the Expression language (with the Expression button enabled).

    • The ELT Load Snap can infer the schema from the source files in Amazon S3, ADLS Gen2, Microsoft Azure Blob Storage, or Google Cloud Storage location and use it to create, overwrite, and append the target table in your Snowflake instance with the source data. The source files can be in the AVRO, CSV, JSON, ORC, or PARQUET format. Learn more at Automatic Schema Inference with ELT Load Snap.

    • Target Table Name in the following Snaps supports retrieving editable views with the table names from the selected target schema:

• The pivot values in the ELT Pivot Snap become dynamic when you select Enable dynamic pivot values. The following field settings were added as part of this dynamic pivot values feature:

      • Filter Predicate List: A field set to filter the predicate list of the pivot values.

        • Pivot Values Filter: Condition required to filter the pivot values.

        • Boolean Operator: Predicate condition type through AND or OR Boolean operators

      • Sort Order: Sorting order of the pivot values.

    • You can specify the type of Microsoft Azure external storage location (source)—an Azure Data Lake Gen2 or a Blob Storage—to access your source data using the Storage Integration type of authentication and load it to your target Snowflake instance.

August 2022 | main17386 | Stable | Upgraded with the latest SnapLogic Platform release.
4.29-Patch | 429patches16665 | Latest
    • Enhanced the ELT Snap Pack to support the latest JDBC drivers across CDWs—Azure Synapse, BigQuery, DLP, Redshift, and Snowflake. See Configuring ELT Database Accounts or the respective Account page for the exact versions.

    • Enhanced the ELT Pivot Snap to make the Value List field dynamic.

    • Enhanced the ELT DLP Account to configure S3 Bucket, Azure Storage, and DataLake Storage Gen2 Mounts.
    • Enhanced the ELT Snowflake Account with support for Key Pair Authentication.

    • Enhanced the ELT SCD2 Snap:

      • To include a new option Overwrite existing table in the Target Table action field.

      • To display the final SQL query in its output preview upon Pipeline validation.

    • Enhanced the end Snap SQL query in the ELT Insert Select Snap’s preview output to display the CREATE TABLE... or the DELETE/DROP TABLE statements to be run before the query that inserts/loads data into a new table in the Snowflake target CDW.

    • Fixed an issue with ELT Insert Select and ELT Merge Into Snaps where they cause the Pipeline to fail when the specified target table does not exist. After this fix, the Snaps create a new target table if it does not exist during Pipeline validation.

      The new table that is created will not be dropped in the event of a subsequent/downstream Snap failing during validation.

    • Fixed the issue with ELT Load Snap where the Snap caused an SQL exception—[Simba][SparkJDBCDriver](500051) ERROR processing query/statement when reading from a CSV file in S3 mount point on DBFS in the case of a DLP target instance.

    • The ELT Insert Select, ELT Merge Into, ELT Load, and the ELT SCD2 Snaps now run successfully even when the specified target table does not exist. These Snaps now create a new target table if it does not exist during Pipeline validation.

      The new table thus created will not be dropped in the event of a subsequent/downstream Snap failure during validation.

    • Updated the field names in ELT Aggregate, ELT Cast Function, ELT String Function, and ELT Transform Snaps to maintain consistency.
4.29-Patch | 4.29patches16287 | Latest

    Fixed an issue with the ELT SCD2 Snap where the Snap was rounding off decimal values to the nearest integer—the value 57.601000000000 in the source table was written to the target table as 58.000000000.

4.29 | main15993 | Stable
    • Introduced the following new ELT Snaps:
      • ELT Cast Function: Snap to convert a data type of a column in the input SQL string into other supported data types.
      • ELT String Function: Snap to support the various string functions supported by the different databases.
      • ELT Router: Snap to enable routing input SQL queries into multiple output views based on the given conditional expressions.
    • Enhanced the following Snaps to display the final SQL query in their output preview upon Pipeline validation.
    • Enhanced the ELT Database Account to support OAuth2-based authentication on the target Snowflake database.

    • Enhanced the ELT Select, ELT Insert Select, ELT SCD2, ELT Merge Into, and ELT Load Snaps to display suggestions on the Schema Name field based on the Default Database Name provided in the Snap Account configuration when the Database Name is not specified in the respective Snap.

      • Improved usability of the suggestions features for these Snaps by making them case-insensitive. For example, typing default in the Schema Name field displays both default and DEFAULT, if they co-exist. You do not need to type DEFAULT to invoke and select the schema name DEFAULT from the suggestions list.

    • Enhanced the ELT SCD2 Snap to address different feature requests and issues raised by multiple customers. These changes provide more flexibility in configuring your SCD2 operations using this Snap.

    • Removed Check for nulls and duplicates in the source field and added two dropdown lists - Null Value Behavior and Invalid Row Handling.

    • Made the following items in the Meaning field of the Target Table Temporal Fields fieldset mandatory while making the Invalid historical rows flag optional.

      • Current row 

      • Historical row

    • Enhanced the ELT Aggregate and ELT Window Functions Snaps to support the following functions across all supported CDWs:

      • KURTOSIS

      • MODE

      • SKEW

    • Enhanced the ELT Aggregate Snap to support the following GROUP BY features across all supported CDWs:

      • Group by Cube

      • Group by Grouping Sets

      • Group by Rollup

      • Automatic GROUP BY for all input columns.

    • Fixed an issue with ELT Merge Into Snap where the Snap erroneously modified the target table column name when the column name contained the target table name.

    • Fixed an issue in ELT SCD2 Snap where the Snap causes incorrect results with Snowflake targets, when:

      • The Historical Row End Date value is provided.

      • Nulls and Invalid rows are recognized, but one or more start dates in the source are null.

    • Fixed the issue in ELT Transform Snap where the Output Schema of the Snap does not populate all the column names from its Input Schema.

4.28-Patch | 428patches15638 | Latest

    Fixed the issue with ELT Merge Into Snap where the Snap erroneously modified the target table column name when it contained the target table name, due to a misinterpretation of the target table name aliases.

4.28-Patch | 428patches15290 | Latest
    • Updated ELT SCD2 Snap to address different feature requests and issues raised by multiple customers. These changes provide more flexibility in configuring your SCD2 operations using this Snap.

      • Removed Check for nulls and duplicates in source field and added two dropdown lists - Null Value Behavior and Invalid Row Handling.

      • Refer to the ELT SCD2 scenarios to learn more.

    • Introduced a new Snap ELT Router to enable routing input SQL queries into multiple output views based on the given conditional expressions.

4.28 | main14627 | Stable
    • Subquery Pushdown Optimization: SnapLogic now optimizes SQL queries before they are passed to the CDW to ensure the queries are performant and cost-efficient in the respective CDW. An SQL subquery means a query inside a query. Pushdown optimization refers to rewriting these incremental (nested) SQL queries produced in your ELT Pipeline to form a more optimal/performant version.

    • Introduced the following new ELT Snaps:

      • ELT Case Expression: Snap to return the action to perform on an event based on a list of events and respective expected actions.

      • ELT Coalesce: Snap to return the first non-NULL value from a list of arguments.

      • ELT Conditional Functions: Snap to perform unary and binary conditional operations on data.

      • ELT Math Functions: Snap to perform mathematical—arithmetic, logarithmic, trigonometric, exponent, root, rounding, and truncation—operations on data.

    • Enhanced all the expression-enabled fields in ELT Snaps to display suggestions from the Input Schema (emanating from the upstream Snaps) in addition to the existing standard SQL expressions and functions list.

    • Enhanced the ELT Aggregate Snap to support:

      • HAVING clause within GROUP BY clause, when the WHERE clause cannot be used.

      • GROUP BY ROLLUP.

      • New aggregate functions for DLP: ANY, SOME, KURTOSIS, and STDDEV.

    • Enhanced the ELT Load Snap to support loading data into BigQuery targets from S3 buckets and Redshift CDW. These load operations use the BigQuery Data Transfer Service (DTS) client libraries and are carried out in asynchronous mode.

    • Enhanced the ELT Load and ELT Insert Select Snaps with a new fieldset Table Options List to support defining the Table Options for creating a new table in your target CDW.

    • Enhanced the ELT Select Snap to support Common Table Expression (CTE)-based SQL queries that contain a WITH clause inside the SQL Query Editor field, when your target CDW is Azure Synapse.

4.27-Patch | 427patches13923 | Latest
    • Fixed the issue with ELT SCD2 Snap where the Snap did not equate null values in the corresponding cause-historization rows of both the source and target tables (with no other changes to data in the remaining fields) as the same and produced duplicate rows in the target table, as a result. After this fix, the Snap does not cause any new duplicate rows in the target table.

    • Fixed the issue with ELT Load Snap where the Snap fails with the error Database encountered an error during Bulk Load process when you specify a CSV file to load data from, with the Load Action as Alter Table. The Snap now performs the specified ALTER TABLE actions—ADD/DROP columns—and loads the data into the target table accordingly (without the need to manually modify the source or target tables beforehand).

4.27-Patch | 427patches13539 | Latest
    • Fixed the issue with ELT SCD2 Snap where the Snap failed when you define more than one TargetTable Natural Key in the Snap configuration to load SCD2 data into the target CDW instance.

    • Fixed an issue with the ELT SCD2 Snap where the Snap failed to update the previous current rows in the target SCD2 table to historical rows when you define an End Date of Historical Row in Target Table Temporal Fields.

    • Fixed an issue with the ELT SCD2 Snap where the Snap failed to insert new rows when you define the values that exist in the most recent historical rows of the target SCD2 table as the cause-historization values.

    • Fixed the issue with ELT Transform Snap where the Snap does not omit the source columns marked for removal from the output view—using an empty Target Path for one or more columns selected in the Expression field of the Snap’s Mapping Table. See Using Empty Target Paths to Omit Rows from the Snap Output to understand how to perform this operation.

4.27-Patch | 427patches13030 | Latest
    • Fixed the following issues with the ELT SCD2 Snap:

      • Where the Snap failed to get the right data type due to column name case mismatches between what is used in the Snap and what is actually used in the Azure Synapse tables (returned by the JDBC driver). You no longer need to type the column names in the exact case that Azure Synapse expects.

      • The Snap failed with the error—start_date does not exist—while writing SCD2 data to a Redshift table column start_date that is specified as the Start Date of Current Row in the Target Table Temporal Field(s) field set.

      • The Snap failed with the error—Reference 'END_DATE' is ambiguous—while merging SCD2 updates into DLP tables.

      • Where the Snap failed due to lack of required access privileges on the target database (for example, create table rights to create temporary tables as needed). The Snap now runs the input SQL statement and the elaborate sub-queries instead of attempting to create a temporary table in such scenarios.

    • Fixed the issue with the ELT SCD2 and ELT Load Snaps that fail to perform an add/drop operation involving multiple columns on a Redshift target table.

4.27 | main12833 | Stable
    • Enhanced the ELT Aggregate Snap to support COUNT_IF aggregate function for Redshift and Azure Synapse target databases.
      Eliminating duplicates with COUNT_IF aggregate function
      Note: Selecting the Eliminate Duplicates checkbox while using COUNT_IF aggregate function does not eliminate duplicate records in case of Snowflake and BigQuery databases, as there is no native support for this feature. However, for Redshift, Azure Synapse and Databricks Lakehouse Platform (DLP), the duplicates are eliminated from the list of records when you select this checkbox for COUNT_IF function.

    • Enhanced the ELT Database Account connecting to a Databricks Lakehouse Platform (DLP) to support two new options - Optimize Write and Auto Compact - for creating/replacing a table using any of the ELT Insert Select, ELT Merge Into, ELT Load, and ELT SCD2 Snaps.

    • Updated the Expressions and Functions Supported for ELT in the Snap and Account configuration sections. This list is common to all target CDWs supported. You can use these expressions to define your Snap or Account settings with the Expression symbol = enabled, where available.

    • Enhanced the ELT Load Snap to ensure that the Snap uses the default S3 Folder name specified in the Snap's account to accurately resolve the defined File Name Pattern.

    • Enhanced the ELT Select and ELT Execute Snaps to allow SQL comments inside the SQL Query Editor and SQL Statement Editor fields respectively.

    • Enhanced the ELT Transform Snap to display the exact data types of fields listed in the Input Schema and Target Schema in case of Azure Synapse.

    • Enhanced the ELT Merge Into Snap to support the MERGE INTO ALL option and automatic source table aliasing.

    • Enhanced the ELT Load Snap by adding the Source File to Target Table Columns Map field set to enable mapping of columns between the source file and the target table. In case of Databricks Lakehouse Platform (DLP) the Snap is enhanced further to support delimiters other than the comma in the source CSV files.

    • Enhanced the ELT Join Snap to support Left Anti and Left Semi join types in BigQuery though it does not natively support these join types.

    • Enhanced the underlying load mechanism for ELT SCD2 Snap from Insert-and-Update mode to Merge-into mode to substantially improve the Snap's performance while working with large and very large volumes of data (upwards of 500M rows or 50GB size).

4.26-Patch | 426patches12534 | Latest
    • Fixed an issue with ELT Transform Snap where it may display incorrect schema only in the previews (during Pipeline validation). This occurs especially when the incoming SQL statement (defined in the SQL Statement Editor of the upstream Snap) contains one or more of the WHERE, GROUP BY, HAVING , ORDER BY, LIMIT, LIMIT followed by OFFSET, and SAMPLE clauses. Here are a few Pipeline scenarios where this issue might surface:

4.26-Patch | 426patches12021 | Latest
    • Fixed an issue where the ELT Load Snap connecting to a Databricks Lakehouse Platform (DLP) instance failed to perform the load operation. Ensure that you provide a valid DBFS Folder path in the Snap's account settings as the Snap requires this folder path.

4.26-Patch | 426patches11646 | Latest
    • Enhanced the ELT Database Account to support token-based authentication (Source Location Session Credentials) to S3 locations for Snowflake and Redshift target databases.
    • Enhanced the ELT Aggregate Snap with the following changes:
      • Revised the field labels from:
        • GROUP BY Fields List field set > Output Field to GROUP BY Field.
        • ORDER-By Fields to ORDER-BY Fields (Aggregate Concatenation Functions Only).
      • Removed the Suggestion option for Field Name field under General Aggregate Functions List field.
      • Made the Alias Name fields in the Aggregate Concatenation Functions List and the Percentile Distribution Functions List field sets mandatory.
    • If your target database is a Databricks Lakehouse Platform (DLP) instance, then the ELT Load Snap supports loading data from source CSV files that contain only comma as the separator between values.
4.26-Patch | 426patches11323 | Latest

    • Enhanced the ELT Database Account to allow parameterization of field values using Pipeline Parameters. You can define and use these parameters in expression-enabled fields to pass values during runtime.
4.26-Patch | 426patches11262 | Latest
    • Fixed the following Known Issues recorded in the 4.26 GA version:
      • For a Snowflake target instance, the ELT Insert Select Snap does not suggest column names to select for the Insert Column field in the Insert Expression List.
      • The Snaps—ELT Merge Into, ELT Select, ELT Join, and ELT Filter—do not prevent the risk of SQL injection when your target database is Databricks Lakehouse Platform (DLP).
      • Intermittent null-pointer exceptions in the ELT Load Snap on Databricks Lakehouse Platform (DLP).

      • The ELT Insert Select Snap attempts to create the target table even when it exists in the Snowflake database.
      • When loading data from a JSON file into a target Databricks Lakehouse Platform (DLP) instance using an ELT Load Snap, if you choose the Drop and Create Table option as the Load Action and specify an additional column (that is not available in the JSON file) for the new table, it results in one more column null added to the new target table.
      • When you use the SQL editor in the ELT Select Snap configuration to define your SQL query, the Pipeline validation fails due to a syntax error in the following scenarios. However, the Pipeline execution works as expected. The only workaround is to drop the LIMIT clause and the optional OFFSET clause from the SQL query during Pipeline validation.
        • The query contains a LIMIT clause on a Snowflake, Redshift or Databricks Lakehouse Platform target instance: The SQL query created during Pipeline validation includes an additional LIMIT clause, for example: SELECT * FROM "STORE_DATA"."ORDERS" LIMIT 10 LIMIT 990

        • The query contains an OFFSET clause (supported in case of Snowflake and Redshift): The SQL query created during Pipeline validation looks like SELECT * FROM "STORE_DATA"."ORDERS" LIMIT 10 OFFSET 4 LIMIT 990
4.26 | main11181 | Stable
    • Enhanced the ELT Snap preview to support the following Snowflake data types: array, object, variant, and timestamp.

      • The Snaps convert the values to hexadecimal (HEX) equivalents—the default setting for the session parameter BINARY_OUTPUT_FORMAT in Snowflake. See Session Parameters for Binary Values for more information.

      • If this setting is different from hexadecimal (such as base64) in the Snowflake table, the Snaps still convert the values to hexadecimal equivalents for rendering them in the Snap preview.

    • Enhanced all ELT Snaps to display the Get preview data checkbox below the Snap's Label field.
    • The ELT Database account is now mandatory for all Snaps in the ELT Snap Pack.

      Breaking change

Starting with the 4.26 release, all Snaps in the ELT Snap Pack (except the ELT Copy Snap) require an account to connect to the respective target database. Your existing Pipelines that do not use an account may fail. We recommend that you associate an ELT Database Account with each of the ELT Snaps (except the ELT Copy Snap) in your Pipelines.

    • Enhanced the ELT Aggregate Snap to support Linear Regression functions on Redshift and Azure Synapse. The Snap also supports these functions on Databricks Lakehouse Platform.
    • Enhanced the ELT Execute Snap to enable running multiple DML, DDL, and DCL SQL statements from the same Snap instance.
    • Enhanced the ELT Join Snap to:
      • Support LEFT ANTI JOIN and LEFT SEMI JOIN types on all supported databases.
      • Display or hide the Resultant Column Names Prefix Type field based on the target database selected in the Snap's account.
    • Enhanced the ELT Load and ELT SCD2 Snaps to provide a list of suggested data types, while adding columns to or creating a table.
4.25-Patch | 425patches10017 | Latest
    • Updated the ELT SCD2 Snap to replace End date of historical row option in the Meaning field of Target Table SCD2 Fields field set with End Date of Current Row.

      Breaking change

      This may cause the existing Pipelines to fail as the End date of historical row option no longer exists.

      You need to make the following update in the ELT SCD2 Snap's settings across your Pipelines after upgrading your Snap Pack to this patch:

      • Select End Date of Current Row from the Meaning drop-down list in the second entry (highlighted in the image).
    • Fixed the issue with the ELT Insert-Select Snap containing an open output preview that fails to retrieve output preview data in case of Redshift and Azure Synapse databases, though the Pipeline runs work as expected.
    • Fixed an issue where the ELT Execute Snap does not error out (Snap turns Green) even when running an SQL query to drop a non-existent table from a Snowflake or Azure Synapse database.
    • [Update on ]: Enhanced the ELT Snap previews to support the following data types: array, object, variant, and timestamp.
      • The Snaps convert the values to hexadecimal (HEX) equivalents—the default setting for the session parameter BINARY_OUTPUT_FORMAT in Snowflake. See Session Parameters for Binary Values for more information.
      • If this setting is different from hexadecimal (such as base64) in the Snowflake table, the Snaps still convert the values to hexadecimal equivalents for rendering them in the Snap previews.
4.25-Patch | 425patches9725 | Latest
    • Enhanced the ELT Snap preview to display the exact binary and varbinary values from the Snowflake database during Pipeline validation, by converting the values to hexadecimal equivalents (the default setting in Snowflake). If the setting is different from hexadecimal in the Snowflake table, the Snaps still convert the values to hexadecimal for rendering the Snap preview.
    • Enhanced the ELT Transform Snap to display the appropriate data type (binary or varbinary) for the column names populated in the output schema.
    • Enhanced the ELT Window Functions Snap to address potential issues due to an incorrect definition for the MINUS function in case of Redshift and Azure Synapse databases.
    4.25 main9554 (Stable)
    • Starting with the 4.25 release, SnapLogic has certified the ELT Snap Pack to work with Snowflake hosted on Google Cloud Platform (GCP) as the target database, in addition to Snowflake hosted on AWS and Microsoft Azure.
    • Introduced the ELT Execute Snap to enable you to run DML, DDL, and DCL SQL queries in Snowflake, Redshift, and Azure Synapse databases.
    • Introduced the ELT SCD2 Snap to support Type 2 Slowly Changing Dimensions (SCD2) updates to the target databases—Snowflake, Redshift, and Azure Synapse.
    • Enhanced the ELT Database Account to introduce:
      • Support for Google Cloud Storage as a storage location (source) in addition to AWS S3 and Azure Data Lake Storage (ADLS) when your target database is Snowflake.
      • Automatic download of the JDBC driver required for the selected Database Type using the new Download JDBC Driver Automatically check box.
    • Enhanced the ELT Load Snap to prevent changes to existing tables during Pipeline validation. If you set the Load Action as Drop and Create table, and the target table does not exist, the Snap creates a new (empty) target table based on the schema specified in its settings.
    • Enhanced the ELT Window Functions Snap to support Covariance, Correlation, and Linear Regression Functions on Snowflake, Redshift, and Azure Synapse databases. The Snap uses function-specific query re-writes to support these functions on Redshift and Azure Synapse databases.
    • Enhanced the ELT Merge Into and ELT Insert Select Snaps to support up to one output view, and added the Get Preview Data check box to these Snaps. You can now connect downstream ELT Snaps to these Snaps.
    4.24 Patch 424patches8793 (Latest)
    • Fixes the issue of production job failures involving the ELT Insert Select Snap after upgrading to 4.24 GA, by updating the ELT Transform Snap to continue allowing duplicate fields in the Expression list so that the Pipeline completes successfully.

    No changes are needed to your existing Pipelines.

    • Fixes the column name collision issue in the Snap's output when the two tables being joined have columns with identical names. You can specify the extent of prefixing (prefix all columns, only duplicate columns, or no prefix) using the Resultant Column Names Prefix Type drop-down list. Based on the prefix you choose, a table alias name is prefixed to the identical columns in the output.

    Behavior Change

    The behavior of ELT Load Snap for Load Action during Pipeline validation across the supported databases is as follows:

    • Append rows to existing table: Does not append the data from the source files into the target table.

    • Overwrite existing table: Does not overwrite the data.

    • Drop and Create table: Does not drop the target table even if it exists, but the Snap creates a new target table if a table does not exist.

    • Alter table: Does not modify the schema of the target table.

    4.24 main8556 (Stable)
    • Adds support for the Azure Synapse database. You can now use the ELT Snap Pack to transform tables in the Snowflake, Redshift, and Azure Synapse databases.

    Updates the Snap Pack with the following features:

    • ELT Database Account: Enhances the ELT Database Account to support the Azure Synapse database.
    • ELT Aggregate: Enhances the Snap to:
      • Support Azure Synapse's T-SQL aggregate functions in addition to the aggregate functions in the Snowflake and Redshift databases:
      • General Aggregate Function COUNT_IF in Snowflake.
      • General Aggregate Functions in Snowflake.
      • Linear Regression Aggregate Functions in Snowflake.
      • Aggregate Concatenation Functions in Snowflake, Redshift, and Azure Synapse.
      • Percentile Distribution Functions in Snowflake and Redshift.
      • Suggest appropriate column names to select from in the Snap fields. This applies to Snowflake, Redshift, and Azure Synapse databases.
    • ELT Insert Select: Enhances the Snap to:
      • Suggest appropriate column names to select from in the Snap fields.
      • Create hash-distributed tables using the Target Table Hash Distribution Column (Azure Synapse Only) field when the Load Action is Drop and Create table, or when using a condition such as WHEN NOT MATCHED BY TARGET.
    • ELT Join
      • Enhances the Snap to support Natural JOINS (NATURAL INNER JOIN, NATURAL LEFT OUTER JOIN, NATURAL RIGHT OUTER JOIN, and NATURAL FULL OUTER JOIN) in addition to the INNER, LEFT OUTER, RIGHT OUTER, FULL OUTER, and CROSS Joins in Azure Synapse Database. This enhancement also makes account configuration mandatory when using this Snap.
      • Fixes the column name collision issue in the Snap's result set when the two tables being joined have columns with identical names. You can specify the prefix type using the Resultant Column Names Prefix Type drop-down list. Based on the prefix type you choose, a table alias name is prefixed to identical columns in the output.
    • ELT Load: Enhances the Snap to:
      • Support the File Name Pattern option using the Key Based Mechanism for the Redshift database.
      • Suggest appropriate column names to select from in the Snap fields. This applies to Snowflake, Redshift, and Azure Synapse databases.
      • Create Hash-distributed tables using the Target Table Hash Distribution Column (Azure Synapse Only) field when the Load Action is selected as Drop and Create table.
    • ELT Merge Into: Enhances the Snap to:
      • Suggest appropriate column names to select from in the Snap fields. This applies to Snowflake, Redshift, and Azure Synapse databases.
      • Include the Target Table Hash Distribution Column (Azure Synapse Only) field to enable the Snap to always create hash-distributed tables.
      • Include the Update Expression List - When Not Matched By Source field set to allow defining one or more Update Expressions for the WHEN clause - WHEN NOT MATCHED BY SOURCE. This applies to Azure Synapse database.
      • Include the Target Table Alias field to specify the alias name required for the target table. The Snap is also equipped with the ability to auto-replace the actual table names (with the alias name), if any, used in the ON clause condition, secondary AND conditions, Update Expression list, or Insert Expression list. This applies to Snowflake, Redshift, and Azure Synapse databases.
    • ELT Transform: Enhances the Snap to:
      • Display input schema and output schema based on the upstream and downstream Snaps connected to this Snap.
      • Delete fields mentioned in the Expression field from the Snap's output when the mappings have an empty Target Path.
    • ELT Window Functions: Enhances the Snap to support the following Window Functions in addition to the existing ones:
      • Value Based Analytic Functions
      • LEAD and LAG Analytic Functions
    • Fixes the issue of displaying generic error messages for Triggered Task failures with ELT Pipelines by displaying detailed error messages for ease in debugging.
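    The WHEN NOT MATCHED BY SOURCE clause mentioned above is part of the T-SQL MERGE syntax on Azure Synapse. As a rough, hedged sketch of the kind of statement the ELT Merge Into Snap can build (the table names, column names, and alias below are hypothetical, not taken from the product):

```sql
-- Hypothetical T-SQL MERGE statement illustrating the WHEN NOT MATCHED BY SOURCE clause.
MERGE INTO dim_customer AS tgt              -- 'tgt' stands in for the Target Table Alias
USING stg_customer AS src
    ON tgt.customer_id = src.customer_id
WHEN MATCHED THEN
    UPDATE SET tgt.email = src.email
WHEN NOT MATCHED BY TARGET THEN
    INSERT (customer_id, email) VALUES (src.customer_id, src.email)
WHEN NOT MATCHED BY SOURCE THEN
    DELETE;
```

    In this sketch, the alias (tgt) plays the role described for the Target Table Alias field: the Snap auto-replaces actual table names with the alias in the ON clause condition and the expression lists.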
    4.23 main7430 (Stable)

    Introduces the following Snaps:
    • ELT Load: Loads data from AWS S3 buckets and Azure clusters into the Snowflake and Redshift tables.
    • ELT Sample: Generates a data subset from the source table. 
    • ELT Pivot: Converts row data into column data.
    • ELT Unpivot: Converts column data into row data.
    • ELT Window Functions: Provides support for SQL Window Functions in ELT Pipelines.

    4.22 main6403 (Stable)

    Introduces the ELT Snap Pack that provides you with the Extract, Load, and Transform (ELT) capabilities. Use the following Snaps to build SQL queries that are executed in the Snowflake database:

    • ELT Aggregate: Builds an SQL query to perform aggregate functions such as SUM, COUNT, MIN, and MAX. Also offers the GROUP BY functionality.
    • ELT Copy: Creates copies of the input SQL query. 
    • ELT Filter: Adds a WHERE clause in the input SQL query. Use this capability to create filters/conditions for your data set. 
    • ELT Insert Select: Performs the INSERT INTO SELECT operation on the specified table. 
    • ELT Intersect: Adds an INTERSECT SQL operator in the input queries.
    • ELT Join: Builds SQL query with a JOIN clause.
    • ELT Limit: Adds a LIMIT clause in the incoming SQL query.
    • ELT Merge Into: Performs the MERGE INTO operation on the specified table.
    • ELT Minus: Adds a MINUS SQL operator in the input queries.
    • ELT Select: Builds an SQL SELECT query and provides a built-in SQL query editor that enables you to construct complex queries.
    • ELT Sort: Adds the ORDER BY keyword in the input query. 
    • ELT Transform: Builds transformation-based SQL queries for the specified table.
    • ELT Union: Adds a UNION ALL or UNION DISTINCT operator in the input queries.
    • ELT Unique: Builds a SELECT DISTINCT SQL query. 
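    Each of these Snaps contributes a clause to a single SQL statement that runs in the target database rather than in SnapLogic. For example, a Pipeline chaining ELT Select, ELT Filter, ELT Sort, and ELT Limit could produce a query along these lines (the table and column names are hypothetical, for illustration only):

```sql
-- Hypothetical query assembled by an ELT Select -> Filter -> Sort -> Limit chain.
SELECT order_id, customer_id, order_total
FROM orders                    -- from ELT Select
WHERE order_total > 100        -- added by ELT Filter
ORDER BY order_total DESC      -- added by ELT Sort
LIMIT 10;                      -- added by ELT Limit
```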

