ELT Snap Pack
Overview
SnapLogic’s ELT Snap Pack is a full pushdown optimization (FPDO) solution that lets you extract data from different cloud storage source locations and load and transform it in the target database through ELT Pipelines. An ELT Pipeline generates SQL and executes it on the target database. ELT Snaps help you make data available to the enterprise quickly, eliminating delays in transforming data for your specific purpose. With ELT Pipelines, you can:
- Read data from a source cloud storage location (such as Amazon S3) or Amazon Redshift (to Google BigQuery only)
- Load it into your Data Warehouse in its raw format
- Transform the data using the power of the cloud
The ELT Snap Pack is fundamentally different from all other Standard-mode Snap Packs in SnapLogic: rather than processing data, the ELT Snaps incrementally build an optimized SQL query as it passes between them. Snaps such as ELT Load, ELT SCD2, and ELT Insert-Select then execute the final SQL query on your target CDW to load the source data and transform it.
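For illustration, a Pipeline with an ELT Select Snap followed by an ELT Filter Snap and an ELT Insert-Select Snap might compose SQL along these lines (the table names sales and sales_emea and the filter condition are hypothetical, and the exact SQL generated varies by target CDW):

```sql
-- ELT Select Snap: defines the base query (no data is processed yet).
SELECT * FROM sales;

-- ELT Filter Snap: wraps the upstream query with a WHERE clause.
SELECT * FROM (SELECT * FROM sales) AS src WHERE region = 'EMEA';

-- ELT Insert-Select Snap: executes the final composed statement on the target CDW.
INSERT INTO sales_emea
SELECT * FROM (SELECT * FROM sales) AS src WHERE region = 'EMEA';
```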
Automating Data Integration with ELT
Articles in this section
- Best Practices for Using the ELT Snap Pack
- SQL Expressions and Functions Supported for ELT
- Subquery Pushdown Optimization (SPDO)
- Configuring ELT Database Accounts
- ELT Aggregate
- ELT Case Expression
- ELT Cast Function
- ELT Coalesce
- ELT Conditional Function
- ELT Copy
- ELT Create View
- ELT Execute
- ELT Filter
- ELT Insert-Select
- ELT Intersect
- ELT Join
- ELT Limit
- ELT Load
- ELT Math Function
- ELT Merge Into
- ELT Minus
- ELT Pivot
- ELT Router
- ELT SCD2
- ELT Sample
- ELT Select
- ELT Sort
- ELT String Function
- ELT Transform
- ELT Union
- ELT Unique
- ELT Unpivot
- ELT Window Functions
- SnapLogic ELT Use Cases
Features
- Ease of Use
  - SQL passed between Snaps; execution on the CDW (cloud data warehouse)
  - Visually build SQL Pipelines
  - Selectively preview data
  - Suggest capability
  - Extensive transformation capabilities
  - Load data using MERGE INTO
- Enhanced Connectivity
  - Independent Snap Pack
  - Accounts specific to each target database
- Security
  - Prevent SQL injection
- Write transformed data to target databases hosted in the following cloud locations:
  - AWS (for Snowflake, Amazon Redshift, and Databricks Lakehouse Platform (DLP))
  - Microsoft Azure (for Snowflake, Azure Synapse, and DLP)
  - Google Cloud Platform (for Snowflake and BigQuery)
- Standard Snap Pack Pipelines can execute ELT Pipelines: you can execute an ELT Pipeline from a standard-mode Pipeline by using the Pipeline Execute Snap.
ELT Source-Target-Hosting Support Matrix
Each row lists a target CDW, the supported (source) data storage locations (Azure Blob Storage and ADLS Gen2 are the two Microsoft Azure External Location options), and the cloud hosts on which the target can reside.

| Cloud Data Warehouse (Target) | AWS Simple Storage Service (S3) | Amazon Redshift | Azure Blob Storage | ADLS Gen2 | DBFS (Databricks File System) | Google Cloud Storage | Hosted on AWS | Hosted on Microsoft Azure | Hosted on GCP |
|---|---|---|---|---|---|---|---|---|---|
| Snowflake | ✔ | ✖ | ✔ | ✖ | ✖ | ✔ | ✔ | ✔ | ✔ |
| Amazon Redshift | ✔ | ✖ | ✖ | ✖ | ✖ | ✖ | ✔ | ✖ | ✖ |
| Azure Synapse | ✖ | ✖ | ✔ * | ✔ * | ✖ | ✖ | ✖ | ✔ | ✖ |
| Databricks Lakehouse Platform (DLP) | ✔ | ✖ | ✔ | ✔ | ✔ | ✖ | ✔ | ✔ | ✖ |
| Google BigQuery | ✔ | ✔ | ✖ | ✖ | ✖ | ✔ | ✖ | ✖ | ✔ |
The ELT Snap Pack supports only the source cloud storage locations and target CDWs listed above. If your source data resides in any other storage system, you must first move it to one of these data storage locations, based on your infrastructure policies and preferences.
* The ELT Snap Pack supports Azure Blob Storage and ADLS Gen2 as the external (source) locations for loading and transforming data into Azure Synapse.
See Configuring ELT Database Accounts for more details.
Supported JDBC JAR Versions
You can configure your ELT Database Accounts (see Configuring ELT Database Accounts) to automatically use an appropriate JDBC JAR file for connecting to your target CDW and performing the load and transform operations.
| Supported CDW | Certified JDBC JAR File |
|---|---|
| Azure Synapse | mssql-jdbc-11.2.1.jre11.jar |
| BigQuery | SimbaJDBCDriverforGoogleBigQuery42_1.6.3.1004.zip |
| Databricks Lakehouse Platform (DLP) | databricks-jdbc-2.6.29.jar |
| Redshift | redshift-jdbc42-2.1.0.9.jar |
| Snowflake | snowflake-jdbc-3.13.33.jar |
Using Alternate JDBC JAR File Versions
We recommend that you let the ELT Snaps use the listed JAR file versions; however, you can use a different JAR file version of your choice.
Limitations
Because the basic authentication type for Databricks Lakehouse Platform (DLP) reaches its end of life on July 10, 2024, SnapLogic ELT Pipelines that use this authentication type to connect to DLP instances will fail after that date. We recommend that you reconfigure the corresponding Snap accounts to use the Personal Access Token (PAT) authentication type.
Known Issues
In the ELT Merge Into Snap, the Records Updated parameter in the Snap Statistics tab (on Pipeline execution) displays the combined count of records that were inserted, updated, and deleted for the target CDWs Azure Synapse, Google BigQuery, and Amazon Redshift. The Snap displays individual counts for target tables in Snowflake and DLP.
When your Databricks Lakehouse Platform instance uses Databricks Runtime Version 8.4 or lower, ELT operations involving large amounts of data might fail due to the smaller memory capacity of 536870912 bytes (512MB) allocated by default. This issue does not occur if you are using Databricks Runtime Version 9.0.
In the case of Databricks Lakehouse Platform (DLP), when you specify a CSV file to load data from and select Alter Table as the Load Action in the ELT Load Snap, the Snap fails with the error: Database encountered an error during Bulk Load process.
When you configure an ELT Merge Into Snap to perform an Update or Delete operation, or an ELT Execute Snap with a MERGE INTO statement that performs an Update or Delete operation, on a Databricks Lakehouse Platform cluster, it may return an error if multiple source rows attempt to update or delete the same target row. To prevent such errors, preprocess the source table so that it contains only unique rows.
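As a minimal sketch of such preprocessing in Databricks SQL (the table and column names target_t, source_t, id, val, and updated_at are hypothetical), you can deduplicate the source rows inside the MERGE itself so that at most one source row matches each target row:

```sql
MERGE INTO target_t AS t
USING (
  -- Keep only the latest source row per key, so no two source rows
  -- attempt to update or delete the same target row.
  SELECT id, val
  FROM (
    SELECT s.*,
           ROW_NUMBER() OVER (PARTITION BY id ORDER BY updated_at DESC) AS rn
    FROM source_t AS s
  ) AS ranked
  WHERE rn = 1
) AS src
ON t.id = src.id
WHEN MATCHED THEN UPDATE SET t.val = src.val
WHEN NOT MATCHED THEN INSERT (id, val) VALUES (src.id, src.val);
```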
While using the SQL Expressions and Functions Supported for ELT to define the field values in ELT Snaps and accounts (see the example after this list):
- DO NOT use regular JavaScript expressions (these are supported in Standard-mode Snaps but are not in the above list).
- DO NOT prefix variable names with the $ symbol.
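For example, to derive an uppercase value in an ELT expression field, a supported SQL-style expression looks like the first line below, while the Standard-mode JavaScript style is not supported (the column name first_name is hypothetical):

```sql
-- Supported: SQL expression syntax with bare column names
UPPER(first_name)

-- NOT supported: Standard-mode JavaScript syntax with $-prefixed variables
-- $first_name.toUpperCase()
```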
The ELT Load, ELT Merge Into, ELT Insert-Select, and ELT Select Snaps display Schema Name field suggestions from all databases that the Snap account user can access, instead of only the database selected in the Snap account or the Snap settings.
In any of the supported target databases, the ELT Snaps do not appropriately identify or render column references beginning with an _ (underscore) inside SQL queries/statements that use the following constructs and contexts (the Snaps work as expected in all other scenarios):
- WHERE clause (ELT Filter Snap)
- WHEN clause
- ON condition (ELT Join and ELT Merge Into Snaps)
- HAVING clause
- QUALIFY clause
- Insert expressions (column names and values in ELT Insert-Select, ELT Load, and ELT Merge Into Snaps)
- Update expressions list (column names and values in ELT Merge Into Snap)
- Secondary AND condition
- Inside the SQL query editor (ELT Select and ELT Execute Snaps)

Workaround: While using these SQL query constructs, you can:
- Precede the Snap with an ELT Transform Snap to re-map the underscore column references to suitable column names (that do not begin with an _) and reference the new column names in the next Snap, as needed.
- In the case of Databricks Lakehouse Platform, where CSV files do not have a header (column names), a simple query like SELECT * FROM CSV.`/mnt/csv1.csv` returns default names such as _c0, _c1, _c2 for the columns, which the Snaps cannot interpret. To avoid this scenario, you can:
  - Write the data in the CSV file to a DLP table beforehand, as in: CREATE TABLE csvdatatable (a1 int, b1 int, …) USING CSV `/mnt/csv1.csv`, where a1, b1, and so on are the new column names.
  - Then, read the data from this new table (with column names a1, b1, and so on) using a simple SELECT statement.

In the case of Databricks Lakehouse Platform, all ELT Snaps' preview data (during validation) contains values with higher precision than the actual floating-point values (float data type) stored in Delta. For example, 24.123404659344 instead of 24.1234. However, the Snaps reflect the exact values during Pipeline executions.
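A minimal runnable sketch of this DLP workaround, assuming a two-column headerless CSV at /mnt/csv1.csv (the path, table name, and column names are illustrative; this uses the LOCATION clause form of the same CREATE TABLE statement):

```sql
-- Define named columns over the headerless CSV file (Databricks SQL),
-- so the columns get real names instead of the defaults _c0, _c1.
CREATE TABLE csvdatatable (a1 INT, b1 INT)
USING CSV
LOCATION '/mnt/csv1.csv';

-- Read the data back using the new column names.
SELECT a1, b1 FROM csvdatatable;
```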