ELT Snap Pack

Overview

SnapLogic’s ELT Snap Pack is a full pushdown optimization (FPDO) solution that extracts data from different cloud storage source locations, loads it into a target database, and transforms it there through ELT Pipelines. An ELT Pipeline generates and executes SQL on the target database, so the ELT Snaps help you make data available to the enterprise quickly, eliminating delays in transforming data for your specific purpose. With the ELT Snaps, you can:

  • Read data from a source cloud storage location (such as Amazon S3) or Amazon Redshift (to Google BigQuery only)

  • Load it into your Data Warehouse in its raw format

  • Transform the data using the power of the cloud

The ELT Snap Pack is fundamentally different from all other Standard-mode Snap Packs in SnapLogic: the ELT Snaps incrementally build an optimized SQL query between them and do not themselves process any data. Snaps such as ELT Load, ELT SCD2, and ELT Insert-Select execute the final SQL query on your target CDW to load the source data and then transform it.
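Conceptually, each downstream ELT Snap wraps the SQL it receives from the upstream Snap in a subquery rather than processing any rows itself. The following Python sketch illustrates this incremental query building; the function names and table names are hypothetical and only model the idea, not SnapLogic's implementation:

```python
# Illustrative sketch of full pushdown optimization: each "Snap" wraps
# the upstream SQL in a subquery; only the terminal Snap would execute
# the accumulated statement on the target CDW.

def elt_select(table):
    """Seed query, as an ELT Select Snap might produce."""
    return f"SELECT * FROM {table}"

def elt_filter(upstream_sql, condition):
    """Wrap the upstream SQL and apply a WHERE clause (ELT Filter)."""
    return f"SELECT * FROM ({upstream_sql}) AS sub WHERE {condition}"

def elt_insert_select(upstream_sql, target_table):
    """Terminal Snap: produce the final INSERT ... SELECT statement."""
    return f"INSERT INTO {target_table} {upstream_sql}"

sql = elt_select("orders")
sql = elt_filter(sql, "amount > 100")
final = elt_insert_select(sql, "big_orders")
print(final)
# INSERT INTO big_orders SELECT * FROM (SELECT * FROM orders) AS sub WHERE amount > 100
```

No data flows between the functions; only the SQL text grows, which is why ELT Pipelines delegate all heavy processing to the CDW.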

Automating Data Integration with ELT

Features

  • Ease of Use

    • SQL passed between Snaps; execution on CDW (cloud data warehouse)

    • Visually build SQL Pipelines

    • Selectively Preview Data

    • Suggest capability

    • Extensive transformation capabilities

    • Load data using merge into 

  • Enhanced Connectivity

    • Independent Snap Pack

    • Accounts specific to each target database

  • Security

    • Prevent SQL Injection

  • Write transformed data to target databases hosted in any of the supported cloud locations (see the support matrix below)

Standard Snap Pack Pipelines can execute ELT Pipelines

You can also execute an ELT Pipeline from a standard-mode Pipeline by using the Pipeline Execute Snap.

ELT Source-Target-Hosting Support Matrix

(Source) data storage locations:

  • AWS Simple Storage Service (S3)

  • Amazon Redshift (as a source for Google BigQuery only)

  • Microsoft Azure External Location: Blob Storage and ADLS Gen2 *

  • DBFS (Databricks File System)

  • Google Cloud Storage

Cloud data warehouses (targets):

  • Snowflake

  • Amazon Redshift

  • Azure Synapse *

  • Databricks Lakehouse Platform (DLP)

  • Google BigQuery

(Target) cloud hosts:

  • Amazon AWS

  • Microsoft Azure

  • Google Cloud Platform (GCP)

The ELT Snap Pack supports only the source cloud storage locations and target CDWs listed above. If your source data currently resides in any other storage system, you must first move it to one of the above storage locations that fits your infrastructure policies and preferences.

* The ELT Snap Pack supports Azure Blob Storage and ADLS Gen2 as the external (source) locations for loading and transforming data into Azure Synapse.

See Configuring ELT Database Accounts for more details.

Supported JDBC JAR Versions

You can configure your ELT Database Accounts (see Configuring ELT Database Accounts) to automatically use the appropriate JDBC JAR file for connecting to your target CDW and performing the load and transform operations.

The certified JDBC JAR file for each supported CDW:

  • Azure Synapse: mssql-jdbc-11.2.1.jre11.jar

  • BigQuery: SimbaJDBCDriverforGoogleBigQuery42_1.6.3.1004.zip

  • Databricks Lakehouse Platform (DLP): databricks-jdbc-2.6.29.jar

  • Redshift: redshift-jdbc42-2.1.0.9.jar

  • Snowflake: snowflake-jdbc-3.13.33.jar

Using Alternate JDBC JAR File Versions

We recommend that you let the ELT Snaps use the listed JAR file versions; however, you can use a different JAR file version of your choice.

Limitations

  • With the basic authentication type for Databricks Lakehouse Platform (DLP) reaching its end of life on July 10, 2024, SnapLogic ELT Pipelines that use this authentication type to connect to DLP instances will fail. We recommend that you reconfigure the corresponding Snap accounts to use the Personal Access Token (PAT) authentication type.

Known Issues

  • In the ELT Merge Into Snap, the Records Updated parameter in the Snap Statistics tab (on Pipeline execution) displays the combined count of records that were inserted, updated, and deleted for the target CDWs Azure Synapse, Google BigQuery, and Amazon Redshift. The Snap displays individual counts for target tables in Snowflake and DLP.

  • When your Databricks Lakehouse Platform instance uses Databricks Runtime Version 8.4 or lower, ELT operations involving large amounts of data might fail due to the smaller memory capacity of 536870912 bytes (512MB) allocated by default. This issue does not occur if you are using Databricks Runtime Version 9.0.

  • In the case of Databricks Lakehouse Platform (DLP), when you specify a CSV file to load data from and select Alter Table as the Load Action in the ELT Load Snap, the Snap fails with the error Database encountered an error during Bulk Load process.

  • When you configure an ELT Merge Into Snap to perform an Update or Delete operation or an ELT Execute Snap with a MERGE INTO statement that performs Update or Delete operation on a Databricks Lakehouse Platform cluster, it may return an error if multiple source rows attempt to update or delete the same target row. To prevent such errors, you need to preprocess the source table to have only unique rows.
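One way to guarantee unique source rows, as the workaround above requires, is to deduplicate the source with a ROW_NUMBER() window function before merging. The sketch below generates such a statement for a Delta table on DLP; the table, key, and ordering column names are illustrative, and the UPDATE SET * / INSERT * shorthand assumes Databricks Delta MERGE syntax:

```python
def dedup_merge_sql(target, source, key, order_col):
    """Build a MERGE INTO statement whose source subquery keeps only one
    row per key (the latest by order_col), so no two source rows can
    match the same target row."""
    return (
        f"MERGE INTO {target} t USING ("
        f"SELECT * FROM ("
        f"SELECT s.*, ROW_NUMBER() OVER "
        f"(PARTITION BY {key} ORDER BY {order_col} DESC) AS rn "
        f"FROM {source} s) WHERE rn = 1"
        f") src ON t.{key} = src.{key} "
        f"WHEN MATCHED THEN UPDATE SET * "
        f"WHEN NOT MATCHED THEN INSERT *"
    )

# Hypothetical staging/dimension tables, for illustration only.
sql = dedup_merge_sql("dim_customer", "staging_customer",
                      "customer_id", "updated_at")
print(sql)
```

You could run the generated statement through an ELT Execute Snap, or apply the same PARTITION BY / rn = 1 pattern in an upstream transformation so the ELT Merge Into Snap sees only unique rows.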

  • While using the SQL Expressions and Functions Supported for ELT to define the field values in ELT Snaps and accounts:

    • DO NOT use regular JavaScript expressions (supported in the Standard-mode Snaps and not in the above list).

    • DO NOT prefix variable names with $ symbol.

  • The ELT Load, ELT Merge Into, ELT Insert-Select, and ELT Select Snaps display Schema Name field suggestions from all databases that the Snap account user can access, instead of only the database selected in the Snap account or the Snap Settings.

In any of the supported target databases, the ELT Snaps do not correctly identify or render column references beginning with an _ (underscore) inside SQL queries/statements that use the following constructs and contexts (the Snaps work as expected in all other scenarios):

  • WHERE clause (ELT Filter Snap)

  • WHEN clause

  • ON condition (ELT Join, ELT Merge Into Snaps)

  • HAVING clause

  • QUALIFY clause

  • Insert expressions (column names and values in ELT Insert Select, ELT Load, and ELT Merge Into Snaps)

  • Update expressions list (column names and values in ELT Merge Into Snap)

  • Secondary AND condition

  • Inside SQL query editor (ELT Select and ELT Execute Snaps)

Workaround

As a workaround while using these SQL query constructs, you can:

  • Precede the affected Snap with an ELT Transform Snap to re-map the '_' column references to suitable column names (that do not begin with an _) and reference the new column names in the next Snap, as needed.

  • In the case of Databricks Lakehouse Platform, where CSV files do not have a header (column names), a simple query like SELECT * FROM CSV.`/mnt/csv1.csv` returns default names such as _c0, _c1, _c2 for the columns, which the Snap cannot interpret. To avoid this scenario, you can:

    • Write the data in the CSV file to a DLP table beforehand, as in: CREATE TABLE csvdatatable (a1 int, b1 int,…) USING CSV `/mnt/csv1.csv` where a1, b1, and so on are the new column names.

    • Then, read the data from this new table (with column names a1, b1, and so on) using a simple SELECT statement.

  • In the case of Databricks Lakehouse Platform, the preview data of all ELT Snaps (during validation) contains values with a precision higher than that of the actual floating-point values (float data type) stored in the Delta table, for example, 24.123404659344 instead of 24.1234. However, the Snaps reflect the exact values during Pipeline executions.
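The two-step workaround above for headerless CSV files on DLP can be sketched end to end. The snippet below builds the two statements; the mount path, table, and column names come from the document's own example, and the LOCATION clause is one common Databricks form of the CREATE TABLE ... USING CSV syntax (the exact clause may vary by runtime):

```python
# Hypothetical mount path and column names, for illustration only.
csv_path = "/mnt/csv1.csv"
columns = [("a1", "int"), ("b1", "int")]

col_defs = ", ".join(f"{name} {typ}" for name, typ in columns)

# Step 1: materialize the headerless CSV into a DLP table whose columns
# have explicit names, so none of them start with '_'.
create_stmt = (
    f"CREATE TABLE csvdatatable ({col_defs}) "
    f"USING CSV LOCATION '{csv_path}'"
)

# Step 2: read from the new table using the named columns.
select_stmt = "SELECT a1, b1 FROM csvdatatable"

print(create_stmt)
print(select_stmt)
```

Because the second query references a1 and b1 rather than the default _c0, _c1 names, the underscore-handling issue described above no longer applies.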

Snap Pack History

Related Links