ELT Sample

In this article

An account for the Snap

You must define an account for this Snap to communicate with your target CDW. Click the account specific to your target CDW below for more information:

Overview

Use this Snap to add the SAMPLE clause to the incoming query. The SAMPLE clause creates a subset of the source table based on the specified number of rows. Alternatively, you can also create a subset of rows as a percentage of the number of rows in the source table. You can use this Snap to reduce the cost and resource overhead on the target database and file storage services such as S3 by testing the SQL query on a subset of the data instead of the whole table.

Prerequisites

None.

Limitation

ELT Snap Pack  does not support Legacy SQL dialect of Google BigQuery. We recommend that you use only the BigQuery's Standard SQL dialect in this Snap.

Known Issues

In any of the supported target databases, this Snap does not appropriately identify nor render column references beginning with an _ (underscore) inside SQL queries/statements that use the following constructs and contexts (the Snap works as expected in all other scenarios):

  • WHERE clause (ELT Filter Snap)
  • WHEN clause
  • ON condition (ELT Join, ELT Merge Into Snaps)
  • HAVING clause
  • QUALIFY clause
  • Insert expressions (column names and values in ELT Insert Select, ELT Load, and ELT Merge Into Snaps)
  • Update expressions list (column names and values in ELT Merge Into Snap)
  • Secondary AND condition
  • Inside SQL query editor (ELT Select and ELT Execute Snaps)

Workaround

As a workaround while using these SQL query constructs, you can:

  • Precede this Snap with an ELT Transform Snap to re-map the '_' column references to suitable column names (that do not begin with an _ ) and reference the new column names in the next Snap, as needed.
  • In case of Databricks Lakehouse Platform where CSV files do not have a header (column names), a simple query like SELECT * FROM CSV.`/mnt/csv1.csv` returns default names such as _c0, _c1, _c2 for the columns which this Snap cannot interpret. To avoid this scenario, you can:
    • Write the data in the CSV file to a DLP table beforehand, as in: CREATE TABLE csvdatatable (a1 int, b1 int,…) USING CSV `/mnt/csv1.csv` where a1, b1, and so on are the new column names.
    • Then, read the data from this new table (with column names a1, b1, and so on) using a simple SELECT statement.
  • In case of Databricks Lakehouse Platform, all ELT Snaps' preview data (during validation) contains a value with precision higher than that of the actual floating point value (float data type) stored in the Delta. For example, 24.123404659344 instead of 24.1234. However, the Snap reflects the exact values during Pipeline executions.

Snap Input and Output

Input/OutputType of ViewNumber of ViewsExamples of Upstream and Downstream SnapsDescription
Input 

Document

  • Min: 1
  • Max: 1
  • ELT Select
  • ELT Transform
  • ELT Filter
The SQL query in which you want to add the SAMPLE clause. Typically, it would be a SELECT query reading the source table. 
Output

Document

  • Min: 1
  • Max: 1
  • ELT Insert-Select
  • ELT Transform

The incoming SQL query with the SAMPLE clause and a preview of the query's result if the Get preview data check box is selected.

Snap Settings

SQL Functions and Expressions for ELT

You can use the SQL Expressions and Functions supported for ELT to define your Snap or Account settings with the Expression symbol = enabled, where available. This list is common to all target CDWs supported. You can also use other expressions/functions that your target CDW supports.

Field NameTypeField DependencyDescription
LabelStringNone

Specify a name for the Snap. You can modify this to be more specific, especially if you have more than one of the same Snap in your pipeline.

Default ValueELT Sample

ExampleSample Order Data

Get preview dataCheckboxNone

Select this checkbox to include a preview of the query's output. The Snap performs limited execution and generates a data preview during Pipeline validation.

In the case of ELT Pipelines, only the SQL query flows through the Snaps but not the actual source data. Hence, the preview data for a Snap is the result of executing the SQL query that the Snap has generated in the Pipeline.

The number of records displayed in the preview (upon validation) is the smaller of the following:

  • Number of records available upon execution of the SQL query generated by the Snap.

  • The value set in the Preview Data Count setting (default is 50 records).

Rendering Complex Data Types in Databricks Lakehouse Platform

Based on the data types of the fields in the input schema, the Snap renders the complex data types like map and struct as object data type and array as an array data type. It renders all other incoming data types as-is except for the values in binary fields are displayed as a base64 encoded string and as string data type.

Default Value: Not selected
Example: Selected
Sampling TypeDrop-down list

None

Choose the sampling type to query your data.

Available options are:

  • Number of Records
  • Percentage of Records

Default ValueNumber of Records

ExamplePercentage of Records
Number of RecordsString/ExpressionSampling Type is Number of Records

Enter the number of records to output.

The ELT Sample Snap outputs the entire table if the total number of records in the table is less than the value specified here.

Default Value: NA

Example10
Percentage of RecordsString/ExpressionSampling Type is Percentage of Records

Enter the percentage number of records to sample your data based on a percentage value of the total records in the table.

  • The number of records in the output is rounded off to the lower value if the percentage leads to a decimal. For example, if you enter 75 and if the table has 10 records, the output yields 7 records since 75% of 10 is 7.5.
  • The number of records in the output is randomized and is up to a maximum of the percentage calculation specified above. Therefore, in the above instance, the output may contain 5, 6, or 7 records.

Default Value: NA

Example25

Troubleshooting

None.

Examples

Creating a Subset Table

We want to create a subset of a table for testing various SQL queries. We need a query with the SAMPLE clause. This Pipeline shows how we can use the ELT Sample Snap to accomplish this task.

First, we build a SELECT query to read the target table, ORDERS3. To do so, we use the ELT Select Snap and configure it as shown below. We also configure the Snap to show a preview of the SELECT query's execution:

Upon execution, the ELT Select Snap builds a SELECT query as shown along with a preview of the query's output below:

This table contains 34 records. We only want a sample of 10 records in the subset table. Therefore, we use the ELT Sample Snap and configure it as shown below:

This Snap builds the following query based on this configuration:

From the output preview, we can see that the subset contains 10 records.

Alternatively, if you want to fetch a percentage of records of the original table, you can configure the Snap as shown below:

The number of records in the output is randomized to a maximum of the percentage specified here. In this example, since the original table has 34 records, specifying 50 in the Percentage of Records field yields a maximum of 17 records.  

Based on the Snap's configuration, the output contains a random sample of 14 records.

We can also add an ELT Insert-Select Snap downstream and write the result of this query into another table.

Download this Pipeline.

Downloads

Important steps to successfully reuse pipelines

  1. Download and import the Pipeline into SnapLogic.
  2. Configure Snap accounts as applicable.
  3. Provide Pipeline parameters as applicable.

  File Modified

File Example_ELT_Sample.slp

Feb 12, 2021 by Kalpana Malladi


Snap Pack History

 Click here to expand...

Release

Snap Pack Version 

Date

Type

Updates

August 2024

438patches28010

 Latest

The ELT Insert-Select Snap no longer fails to execute SQL statements that contain multiple multiline comment character pairs (/* and */) and/or multiple quoted substrings. Quoted substrings refer to schema, database, table, or column identifiers, which are delimited to allow special characters.

  • We recommend that you upgrade your main27765 (August 2024 GA release) ELT Snap Pack to this latest version.

August 2024main27765 Stable

Upgraded the jOOQ library for the ELT Snap Pack from v3.9.1 to v3.17.x.

May 2024437patches27372 Latest

Enhanced the pipeline execution statistics of ELT Insert-Select Snap to be displayed in its output view and to allow downloading detailed stats as a JSON file that includes additional statistics (extraStats) on DML statement executions on target Databricks Lakehouse Platform (DLP) tables.

May 2024437patches27246 Latest

Enhanced the ELT Execute Snap to display SQL execution statistics in the pipeline execution statistics and the output view of the Snap for all SQL statements executed. The Snap also allows you to download detailed stats as a JSON file that includes additional statistics (extraStats) on DML statement executions on target Databricks Lakehouse Platform (DLP) tables.

May 2024437patches26846 LatestFixed the issue with the ADLS Gen 2 account connection where some conflicts between internally used Azure libraries prevented the ELT Load Snap from reading files.

Fixed an issue with the ELT Merge Into Snap where the Snap’s SELECT SQL statement could not fetch the target tables information from the PG_TABLE_DEF catalog table of the Amazon Redshift instance.

  • The SELECT SQL statement now uses lowercase for both non-delimited and delimited table names and excludes double quotes from delimited table names to fetch the values from the catalog table.

PG_TABLE_DEF is a Redshift system catalog table that contains information about the tables including table names, column names, data types among their other metadata.

Enhanced the ELT Load Snap to support loading data from nested AVRO, JSONLines, ORC, or Parquet files in Azure storage to the target tables in a Databricks Lakehouse Platform (DLP) instance. Only two Load actions are supported: Drop and create table and Append table.
May 2024main26341 Stable

Fixed an issue where the ELT Merge Into Snap failed to load data into the Google BigQuery target table because of the error: Number of source column expressions in the input SQL is greater than the target table columns (even after altering the target table schema to match the source table columns). The resolution was to modify an SQL statement that was used internally to achieve the merge action.

February 2024436patches25953  Latest

Enhanced the ELT Load Snap’s capabilities to allow loading flat and nested data sets (from canonical and non-canonical formats) from your Parquet files to the target tables in Snowflake. Learn more about the usage of this feature at Load data from Parquet files and in the following example Pipelines:

February 2024main25112 StableUpdated and certified against the current SnapLogic Platform release.
November 2023435patches24461 LatestFixed an issue with the ELT Merge Into Snap that caused inaccurate aliases and table identifiers in its generated SQL statement.
November 2023435patches23671 Latest and Stable
  • Enhanced the ELT Merge Into Snap to support defining the update expressions for updating tables in your target Databricks Lakehouse Platform instance using the WHEN NOT MATCHED BY SOURCE clause, besides other supported clauses and expression lists.

November 2023main23721 Stable
  • Enhanced the ELT Merge Into and ELT SCD2 Snaps to display detailed pipeline execution statistics about the data loaded to target tables on a Databricks Lakehouse Platform instance.

    • In addition to the individual counts of rows inserted, updated, and deleted that the ELT Merge Into Snap covers, the ELT SCD2 Snap also reports the count of source rows rejected.

August 2023main22460 Stable
  • Upgraded the JDBC driver support for Snowflake to snowflake-jdbc-3.13.33.jar.

  • Updated the ELT Snowflake Account to append the connection parameter application in its JDBC URL with the value SnapLogic_iPaaS.

May 2023N/A Stable

Fixed a null pointer exception so no 5XX errors can occur if you download non-existent query details from the Pipeline Execution Statistics of an ELT (write-type) Snap.

May 2023main21015 StableUpgraded with the latest SnapLogic Platform release.
February 2023432patches20978 LatestFixed an issue with the ELT SCD2 Snap where the COLLATE column constraint (used in the new target table definition for Snowflake) resulted in an incorrect syntax internally, causing the pipeline to fail. The load operation succeeds with this fix.
February 2023main19844 StableUpgraded with the latest SnapLogic Platform release.
November 2022431patches19240

 

Latest
  • SnapLogic upgraded the default JDBC Driver versions used to connect the ELT Snaps with the supported CDWs.
  • The ELT Merge Into Snap displays the individual record counts inserted, updated, and deleted for Snowflake targets in the respective Records Inserted, Records Updated, and Records Deleted parameters of the Snap Statistics tab (on Pipeline execution).
  • With Google deprecating the OAuth out-of-band (OOB) flow, the Refresh Token Accounts defined for connecting your ELT Snaps to the BigQuery instances start failing in a phased manner. We recommend that you immediately modify these Snap account configurations to switch to an Access Token Account or a Service Account from the Refresh Token Account.
November 2022main18944 Stable

ELT Insert-Select, ELT Merge Into, and ELT SCD2 Snaps show the following statistics on execution.

  • Records Added

  • Records Updated

  • Records Deleted

September 2022 430patches18196 Latest

New Snap

The ELT Create View Snap enables you to create a new view when the view does not exist in the target database and/or schema or if the view already exists in the database and/or schema, and you choose to drop the existing view and re-create it.

Enhancements

  • The ELT Insert-Select Snap is more flexible and easier to use, especially if the number of columns in your source data set is very large. You can choose to update values only in a subset of columns in the target table.

  • The ELT Execute Snap can retrieve and execute SQL queries from the upstream Snap's output when referenced in the SQL Statement Editor using the Expression language (with the Expression button enabled).

  • The ELT Load Snap can infer the schema from the source files in Amazon S3, ADLS Gen2, Microsoft Azure Blob Storage, or Google Cloud Storage location and use it to create, overwrite, and append the target table in your Snowflake instance with the source data. The source files can be in the AVRO, CSV, JSON, ORC, or PARQUET format. Learn more at Automatic Schema Inference with ELT Load Snap.

  • Target Table Name in the following Snaps supports retrieving editable views with the table names from the selected target schema:

  • The pivot values in the ELT PivotSnap turns dynamic when you select Enable dynamic pivot values. The following field settings are added as part of this dynamic pivot values feature:

    • Filter Predicate List: A field set to filter the predicate list of the pivot values.

      • Pivot Values Filter: Condition required to filter the pivot values.

      • Boolean Operator: Predicate condition type through AND or OR Boolean operators

    • Sort Order: Sorting order of the pivot values.

  • You can specify the type of Microsoft Azure external storage location (source)—an Azure Data Lake Gen2 or a Blob Storage—to access your source data using the Storage Integration type of authentication and load it to your target Snowflake instance.

August 2022main17386 StableUpgraded with the latest SnapLogic Platform release.
4.29-Patch429patches16665 Latest
  • Enhanced the ELT Snap Pack to support the latest JDBC drivers across CDWs—Azure Synapse, BigQuery, DLP, Redshift, and Snowflake. See Configuring ELT Database Accounts or the respective Account page for the exact versions.

  • Enhanced the ELT Pivot Snap to make the Value List field dynamic.

  • Enhanced the ELT DLP Account to configure S3 Bucket, Azure Storage, and DataLake Storage Gen2 Mounts.
  • Enhanced the ELT Snowflake Account with support for Key Pair Authentication.

  • Enhanced the ELT SCD2 Snap:

    • To include a new option Overwrite existing table in the Target Table action field.

    • To display the final SQL query in its output preview upon Pipeline validation.

  • Enhanced the end Snap SQL query in the ELT Insert Select Snap’s preview output to display the CREATE TABLE... or the DELETE/DROP TABLE statements to be run before the query that inserts/loads data into a new table in the Snowflake target CDW.

  • Fixed an issue with ELT Insert Select and ELT Merge Into Snaps where they cause the Pipeline to fail when the specified target table does not exist. After this fix, the Snaps create a new target table if it does not exist during Pipeline validation.

    The new table that is created will not be dropped in the event of a subsequent/downstream Snap failing during validation.

  • Fixed the issue with ELT Load Snap where the Snap caused an SQL exception—[Simba][SparkJDBCDriver](500051) ERROR processing query/statement when reading from a CSV file in S3 mount point on DBFS in the case of a DLP target instance.

  • The ELT Insert Select, ELT Merge Into, ELT Load, and the ELT SCD2 Snaps now run successfully even when the specified target table does not exist. These Snaps now create a new target table if it does not exist during Pipeline validation.

    The new table thus created will not be dropped in the event of a subsequent/downstream Snap failure during validation.

  • Updated the field names in ELT Aggregate, ELT Cast Function, ELT String Function, and ELT Transform Snaps to maintain consistency.
4.29-Patch

4.29patches16287

 Latest

Fixed an issue with the ELT SCD2 Snap where the Snap was rounding off decimal values to the nearest integer—the value 57.601000000000 in the source table was written to the target table as 58.000000000.

4.29main15993 Stable
  • Introduced the following new ELT Snaps:
    • ELT Cast Function: Snap to convert a data type of a column in the input SQL string into other supported data types.
    • ELT String Function: Snap to support the various string functions supported by the different databases.
    • ELT Router: Snap to enable routing input SQL queries into multiple output views based on the given conditional expressions.
  • Enhanced the following Snaps to display the final SQL query in their output preview upon Pipeline validation.
  • Enhanced the ELT Database Account to support OAuth2-based authentication on the target Snowflake database.

  • Enhanced the ELT Select, ELT Insert Select, ELT SCD2, ELT Merge Into, and ELT Load Snaps to display suggestions on the Schema Name field based on the Default Database Name provided in the Snap Account configuration when the Database Name is not specified in the respective Snap.

    • Improved usability of the suggestions features for these Snaps by making them case-insensitive. For example, typing default in the Schema Name field displays both default and DEFAULT, if they co-exist. You do not need to type DEFAULT to invoke and select the schema name DEFAULT from the suggestions list.

  • Enhanced the ELT SCD2 Snap to address different feature requests and issues raised by multiple customers. These changes provide more flexibility in configuring your SCD2 operations using this Snap.

  • Removed Check for nulls and duplicates in the source field and added two dropdown lists - Null Value Behavior and Invalid Row Handling.

  • Made the following items in the Meaning field of the Target Table Temporal Fields fieldset mandatory while making the Invalid historical rows flag optional.

    • Current row 

    • Historical row

  • Enhanced the ELT Aggregate and ELT Window Functions Snaps to support the following functions across all supported CDWs:

    • KURTOSIS

    • MODE

    • SKEW

  • Enhanced the ELT Aggregate Snap to support the following GROUP BY features across all supported CDWs:

    • Group by Cube

    • Group by Grouping Sets

    • Group by Rollup

    • Automatic GROUP BY for all input columns.

  • Fixed an issue with ELT Merge Into Snap where the Snap erroneously modified the target table column name when the column name contained the target table name.

  • Fixed an issue in ELT SCD2 Snap where the Snap causes incorrect results with Snowflake targets, when:

    • The Historical Row End Date value is provided.

    • Nulls and Invalid rows are recognized, but one or more start dates in the source are null.

  • Fixed the issue in ELT Transform Snap where the Output Schema of the Snap does not populate all the column names from its Input Schema.

4.28-Patch428patches15638 Latest

Fixed the issue with ELT Merge Into Snap where the Snap erroneously modified the target table column name when it contained the target table name, due to a misinterpretation of the target table name aliases.

4.28-Patch428patches15290 Latest
  • Updated ELT SCD2 Snap to address different feature requests and issues raised by multiple customers. These changes provide more flexibility in configuring your SCD2 operations using this Snap.

    • Removed Check for nulls and duplicates in source field and added two dropdown lists - Null Value Behavior and Invalid Row Handling.

    • Refer to the ELT SCD2 scenarios to learn more.

  • Introduced a new Snap ELT Router to enable routing input SQL queries into multiple output views based on the given conditional expressions.

4.28main14627 Stable
  • Subquery Pushdown Optimization: SnapLogic now optimizes SQL queries before they are passed to the CDW to ensure the queries are performant and cost-efficient in the respective CDW. An SQL subquery means a query inside a query. Pushdown optimization refers to rewriting these incremental (nested) SQL queries produced in your ELT Pipeline to form a more optimal/performant version.

  • Introduced the following new ELT Snaps:

    • ELT Case Expression: Snap to return the action to perform on an event based on a list of events and respective expected actions.

    • ELT Coalesce: Snap to return the first non-NULL value from a list of arguments.

    • ELT Conditional Functions: Snap to perform unary and binary conditional operations on data.

    • ELT Math Functions: Snap to perform mathematical—arithmetic, logarithmic, trigonometric, exponent, root, rounding, and truncation—operations on data.

  • Enhanced all the expression-enabled fields in ELT Snaps to display suggestions from the Input Schema (emanating from the upstream Snaps) in addition to the existing standard SQL expressions and functions list.

  • Enhanced the ELT Aggregate Snap to support:

    • HAVING clause within GROUP BY clause, when the WHERE clause cannot be used.

    • GROUP BY ROLLUP.

    • New aggregate functions for DLP: ANY, SOME, KURTOSIS, and STDDEV.

  • Enhanced the ELT Load Snap to support loading data into BigQuery targets from S3 buckets and Redshift CDW. These load operations use the BigQuery Data Transfer Service (DTS) client libraries and are carried out in asynchronous mode.

  • Enhanced the ELT Load and ELT Insert Select Snaps with a new fieldset Table Options List to support defining the Table Options for creating a new table in your target CDW.

  • Enhanced the ELT Select Snap to support Common Table Expression (CTE)-based SQL queries that contain a WITH clause inside the SQL Query Editor field, when your target CDW is Azure Synapse.

4.27-Patch

427patches13923

 Latest
  • Fixed the issue with ELT SCD2 Snap where the Snap did not equate null values in the corresponding cause-historization rows of both the source and target tables (with no other changes to data in the remaining fields) as the same and produced duplicate rows in the target table, as a result. After this fix, the Snap does not cause any new duplicate rows in the target table.

  • Fixed the issue with ELT Load Snap where the Snap fails with the error Database encountered an error during Bulk Load process when you specify a CSV file to load data from, with the Load Action as Alter Table. The Snap now performs the specified ALTER TABLE actions—ADD/DROP columns—and loads the data into the target table accordingly (without the need to manually modify the source or target tables beforehand).

4.27-Patch427patches13539 Latest
  • Fixed the issue with ELT SCD2 Snap where the Snap failed when you define more than one TargetTable Natural Key in the Snap configuration to load SCD2 data into the target CDW instance.

  • Fixed an issue with the ELT SCD2 Snap where the Snap failed to update the previous current rows in the target SCD2 table to historical rows when you define an End Date of Historical Row in Target Table Temporal Fields.

  • Fixed an issue with the ELT SCD2 Snap where the Snap failed to insert new rows when you define the values that exist in the most recent historical rows of the target SCD2 table as the cause-historization values.

  • Fixed the issue with ELT Transform Snap where the Snap does not omit the source columns marked for removal from the output view—using an empty Target Path for one or more columns selected in the Expression field of the Snap’s Mapping Table. See Using Empty Target Paths to Omit Rows from the Snap Output to understand how to perform this operation.

4.27-Patch

427patches13030

 Latest
  • Fixed the following issues with the ELT SCD2 Snap:

    • Where the Snap failed to get the right data type due to column name case mismatches between what is used in the Snap and what is actually used in the Azure Synapse tables (returned by the JDBC driver). You no longer need to type the column names in the exact case that Azure Synapse expects.

    • The Snap failed with the error—start_date does not exist—while writing SCD2 data to a Redshift table column start_date that is specified as the Start Date of Current Row in the Target Table Temporal Field(s) field set.

    • The Snap failed with the error—Reference 'END_DATE' is ambiguous—while merging SCD2 updates into DLP tables.

    • Where the Snap failed due to lack of required access privileges on the target database (for example, create table rights to create temporary tables as needed). The Snap now runs the input SQL statement and the elaborate sub-queries instead of attempting to create a temporary table in such scenarios.

  • Fixed the issue with the ELT SCD2 and ELT Load Snaps that fail to perform an add/drop operation involving multiple columns on a Redshift target table.

4.27main12833 Stable
  • Enhanced the ELT Aggregate Snap to support COUNT_IF aggregate function for Redshift and Azure Synapse target databases.
    Eliminating duplicates with COUNT_IF aggregate function
    Note: Selecting the Eliminate Duplicates checkbox while using COUNT_IF aggregate function does not eliminate duplicate records in case of Snowflake and BigQuery databases, as there is no native support for this feature. However, for Redshift, Azure Synapse and Databricks Lakehouse Platform (DLP), the duplicates are