In this article

1 Overview
- 1.1 Snap Type
- 1.2 Prerequisites
- 1.3 Limitation
- 1.4 Known Issues
2 Snap Views
3 Snap Settings
4 Pipeline Execution Statistics
5 Troubleshooting
6 Examples
- 6.1 Loading Historical Data From an S3 Bucket to a Snowflake Table
- 6.2 Downloads
7 Snap Pack History

Overview

You can use this Snap to implement the Slowly Changing Dimension Type 2 (SCD2) function in your Snowflake, Redshift, Azure Synapse, Databricks Lakehouse Platform, or BigQuery table. Configure the database type and the access to corresponding tables in the Configuring ELT Database Accounts for the Snap.

Switching the Snap behavior

The ELT SCD2 Snap performs a MERGE INTO-based SCD2 operation as opposed to its initial INSERT-and-then-UPDATE approach. If you prefer to use the INSERT-and-then-UPDATE approach for your SCD2 operation, define the Pipeline parameter SCD2_INSERTUPDATE_FALLBACK_MODE with a value greater than zero (0), before running your Pipeline.

Snap Type

The ELT SCD2 Snap is a write-type Snap that writes SCD2 data from a source location to a supported CDW destination.

Prerequisites

Valid accounts and access permissions to connect to the following:

Source: AWS S3, Azure Cloud Storage, or Google Cloud Storage.
Target: Snowflake, Redshift, Azure Synapse, Databricks Lakehouse Platform, or BigQuery.

Limitation

Known Issues

Snap Views

Type	Format	Number of Views	Examples of Upstream and Downstream Snaps	Description

Type

Format

Number of Views

Examples of Upstream and Downstream Snaps

Description

Input

Document

Min: 0
Max: 1

ELT Join
ELT Copy
ELT Transform

A document containing an SQL query that yields the historization data to load.

Output

Document

Min: 0
Max: 1

ELT Select

A document containing the status of the SCD2 Load operation along with the SQL query that you can use in downstream ELT Snaps if any.

Error

Error handling is a generic way to handle errors without losing data or failing the Snap execution. You can handle the errors that the Snap might encounter when running the Pipeline by choosing one of the following options from the When errors occur list under the Views tab:

Stop Pipeline Execution: Stops the current Pipeline execution if the Snap encounters an error.
Discard Error Data and Continue: Ignores the error, discards that record, and continues with the remaining records.
Route Error Data to Error View: Routes the error data to an error view without stopping the Snap execution.

Learn more about Error handling in Pipelines.

Snap Settings

Asterisk ( * ): Indicates a mandatory field.
Suggestion icon (): Indicates a list that is dynamically populated based on the configuration.
Expression icon ( ): Indicates the value is an expression (if enabled) or a static value (if disabled). Learn more about Using Expressions in SnapLogic.
Add icon ( ): Indicates that you can add fields in the fieldset.
Remove icon ( ): Indicates that you can remove fields from the fieldset.
Upload icon ( ): Indicates that you can upload files.
You can use the SQL Expressions and Functions supported for ELT to define your Snap or Account settings with the Expression icon enabled, where available. Though this list covers only expressions that are common to all supported target CDWs, you can use (type-in manually) other expressions/functions permitted by your target CDW.
See ELT SCD2 Scenarios | Terminology for definitions of the ELT SCD2 terms used in this Snap's configuration.

Field Name		Field Type	Field Dependency	Description

Field Name		Field Type	Field Dependency	Description
Label* Default Value: N/A Example: Orders_SCD2_Data		String	None.	Specify a name for the Snap. You can modify this to be more specific, especially if you have more than one of the same Snap in your Pipeline.
Get preview data Default Value: Not selected Example: Selected		Checkbox	None.	Select this checkbox to include a preview of the query's output. The Snap performs limited execution and generates a data preview during Pipeline validation. In case of ELT Pipelines, only the SQL query flows through the Snaps but not the actual source data. Hence, the preview data for a Snap is the result of executing the SQL query that the Snap has generated in the Pipeline. The number of records displayed in the preview (upon validation) is the smaller of the following: Number of records available upon execution of the SQL query generated by the Snap. The value set in the Preview Data Count setting (default is 50 records).
Database Name Default Value: N/A Example: TestDB		String/Expression	None.	Specify the name of the database in which the target table exists. Leave this field blank if you want to use the database name specified in the Default Database Name field of your Snap Account settings.
Schema Name (Not applicable to Databricks Lakehouse Platform)* Default Value: N/A Example: Sales		String/Expression/Suggestion	None.	Specify the name of the database schema. Alternatively, choose from the suggested list of schemas associated with the database.
Target Table Name* Default Value: N/A Example: SALES_ORDERS		String/Expression/Suggestion	None.	Specify the name of the SCD2 table or view in which you want to load the data. Alternatively, choose from the suggested list of table names associated with the schema.
Input Source Alias Default Value: N/A Example: src		String/Expression	None.	Specify an alias name for your source data table.
Target Table Action Default Value: Merge data into target table Example: Alter table		Drop-down list	None.	Select the SCD2 action to perform on the target table. Available options are: Merge data into target table. Loads the data from the source files into the specified target table. Drop and Create table. Drops the target table if it exists, creates a new table with the columns provided in the Table Columns field set, and then loads the data from the source files into the target table. Activates the Table Columns field set. Alter table. Modifies the schema of the target table based on the configuration of the Modifier field in the Table Columns field set and loads the data from the source files into the target table. Activates the Table Columns field set with the Column, Data Type, and Modifier fields. Overwrite existing table. Overwrites the target table with the data from the source table/files.
Table Columns		Field set	Target Table Action is Drop and Create table or Alter table.	This field set enables you to configure the schema of the target table. You can use this field set to create the target table or to add or drop columns from it. This field set consists of the following fields: Column Data Type Modifier
	Column Default Value: N/A Example: ID, FNAME	String/Suggestion	Same as Table Columns.	Specify the name of the column that you want to load in the target table.
	Data Type Default Value: N/A Example: INT, VARCHAR	String	Same as Table Columns.	Specify the data type for the values in the specified column.
	Modifier Default Value: Add Example: Drop	Drop-down list	Target Table Action is Alter table.	Select whether you want to add/drop the specified column. Available options are: Add. To add a new column to the target table with the specified column name and data type. Drop. To delete the column from the target table. This option deletes all data existing in the specified column.
Null Value Behavior Default Value: Honor nulls as distinct values Example: Ignore nulls		Drop-down list	None.	Select an option to specify how you want the Snap to treat rows with null values in Natural Key or Cause-Historization fields of the source table. The available options are: Honor nulls as distinct values: To treat each null as a value in the field. And as a result, equate all null values (as the same value) for all practical purposes. Ignore nulls: To treat any rows with nulls in Natural Key and Cause-Historization fields of the source table as absent and proceed with the SCD2 operation based on the values available in the remaining rows. Error on nulls: To abort the Snap execution when a null value is found in the source data. The Snap writes the error (message) to the Pipeline runtime statistics and displays a corresponding error message in the Snap. Refer to the ELT SCD2 scenarios to learn more about the impact of the selection you make in this field.
Invalid Row Handling Default Value: Ignore invalid rows Example: Insert duplicate invalid rows		Drop-down list	None.	Select an option to specify how you want the Snap to treat invalid rows, if any, in the source data. A row is considered invalid when it does not result in a new, valid, and coherent SCD2 entry/row in the target table. The available options are: Ignore invalid rows: To treat invalid rows as absent and proceed with the SCD2 operation based on the valid values available in the remaining fields. The Snap also ignores inserting any potentially invalid rows into the target table. Insert duplicate invalid rows: To treat all rows in a uniform manner—allowing an entry for each invalid row, in the target table— and optionally flagging such invalid entries in the target table as invalid (Invalid = true or 1 or a similar boolean value) based on the Invalid historical row flag specified in the Target Table Temporal Field(s) fieldset. Error on invalid rows: To abort the Snap execution when an invalid row is found in the source data. The Snap writes the error (message) to the Pipeline runtime statistics and displays a corresponding error message in the Snap. Refer to the ELT SCD2 scenarios to learn more about the impact of the selection you make in this field.
Target Table Natural Key(s)*	This field set enables you to configure the names of fields that identify a unique row in the target table. The identity key cannot be used as the Natural key, since a current row and its historical rows cannot have the same identity value. Specify the value in the following field for each new row added in this field set. Target Table Natural Key(s)
Target Table Natural Key(s)*	Target Table Natural Key(s) Default Value: N/A Example: CUSTOMER_ID	String/Expression/Suggestion		Specify the name of the Natural Key column from the target table.
Target Table Cause-Historization Field(s)*	This field set enables you to configure the names of fields where any change in value causes the historization of an existing row and the insertion of a new 'current' row. Specify the value in the following field for each new row added in this field set. Target Table Cause-Historization Field(s)
Target Table Cause-Historization Field(s)*	Target Table Cause-Historization Field(s)* Default Value: N/A Example: LAST_ORDER_AMT	String/Expression/Suggestion	None.	Specify the name of the historization-causing column from the target table.
Target Table Temporal Field(s)*	This field set enables you to configure the historical and updated information for the Cause-historization field. Click to add SCD fields. Specify the values in the following fields for each SCD field row in the field set. Meaning Field Value
	Meaning* Default Value: Five rows with one list item in each row (Without repetition and End date of historical row). Example: Start date of current row	Drop-down list	None.	Select one from the following six permitted values for this field in each row. At the minimum, ensure that you configure appropriate values for the first three meanings, only one of the end dates, and the flag in this list. Current row (Required) Historical row (Required) Start date of current row (Required) End date of current row (Selected, by default) End date of historical row Remove/replace End date of current row with this item, if you want to specify this date. Invalid historical rows flag (Optional)
	Field* Default Value: N/A Examples: Start_Date End_Date Current_Row Historical_Row	String/Expression/Suggestion	None.	Associate with each Meaning selected the field name from the target table that would contain the historical information. For Current row: The name of the column in the target table that holds the flag for the historized field. For Historical row: The name of the column in the target table that holds the flag for the historized field. It has to be the same as the value configured for the Current row field. For Start date of current row: The name of the column in the target table for denoting the start date for the current row. For End date of current row: The name of the column in the target table for denoting the end date for the current row. In this case, the start date of the current row specified above is automatically filled in as the end/closing date for the matching historical record. For End date of historical row: The name of the column in the target table for denoting the end date for the historical rows. For Invalid historical row flag: The name of a nullable column that holds the flag for an invalid historical row. If you specify a non-null value for this field: While inserting an invalid row in the target table, the Snap inserts the specified value in the Invalid Row flag column to indicate that the row is invalid. If no invalid rows are inserted, the Invalid Row flag column value remains empty for all valid rows. See the Troubleshooting section in this document and ELT SCD2 Scenarios to ensure that you have configured these fields in your Snap's settings properly.
	Value or Expression* Default Value: N/A Examples: For date fields: `src.start_date`, '2021-12-31', `CURRENT_DATE`, null For row fields: 'true', 'false'	String/Expression	None.	Configure the value for each of the fields selected in the Field field. Here are a few more examples that you can use to define the expressions in this field based on your target CDW: Snowflake: `CURRENT_DATE()`, `CURRENT_DATE`, or `DATE.NOW()`. Refer to Snowflake Documentation for more information. Amazon Redshift: `CURRENT_DATE`, `GETDATE()`, or `DATE.NOW()`. Refer to Redshift Documentation for more information. Azure Synapse: `currentDate`. Refer to Data flow: Date and time functions for more information. DLP: `current_date`. Refer to Alphabetical list of built-in functions in Databricks SQL for more information. Google BigQuery: `CURRENT_DATE()` or `CURRENT_DATE`. Refer to Date functions for more information.
Target Table Sort Field(s)*	This field set enables you to sort the records in the target table, by one or more historization fields (for example, by natural key). This mapping is for columns that are not part of the Target Table Temporal Fields. Specify the values in the following fields for each sort order row. Sort Field Sort Order Null Value Sort Preference
	Sort Field Default Value: N/A Example: start_date GENDER	String/Expression/Suggestion		Specify the column by which to sort the target table data set.
	Sort Order Default Value: ASC Example: DESC	Drop-down list		Choose either of the sort orders—ASC (ascending) or DESC (descending)—for the target table data set.
	Null Value Sort Preference Default Value: NULLS LAST Example: NULLS FIRST	Drop-down list		Choose the position for the null values, if any, in the selected column while sorting—NULLS FIRST (at the beginning) or NULLS LAST (at the end).
Target-Source Column Mapping	This field set enables you to define a one-to-one mapping between the columns in the source and target tables for inserting new rows into the target table. This field set contains the following fields: Target Table Column Name Source Column Mapping Expression
	Target Table Column Name Default Value: N/A Example: ORD_AMT	String/Expression/Suggestion	None.	Choose a column from the target table's schema to map with a column from the source data set (schema).
	Source Column Mapping Expression Default Value: N/A Example: src.ORD_AMT	String/Expression	None.	Choose a column from the source data set (schema) that corresponds to the respective column in the target table's schema. You can alternatively specify an expression for the source column name.

Pipeline Execution Statistics

As a Pipeline executes, the Snap shows the following statistics updating periodically. You can monitor the progress of the Pipeline as each Snap performs execution.

Records Added
Records Updated
Records Deleted

You can view more information by clicking the Download Query Details link. This downloads a JSON file that contains the count of rejected records besides the above counts.

Troubleshooting

Examples

Loading Historical Data From an S3 Bucket to a Snowflake Table

The following Pipeline is designed to view and capture Type 2 SCD data from a folder in S3 bucket and load it into the target table in Snowflake database.

We configure an ELT Select Snap to capture SCD2 data from the source table SCD2_NEW_DATA in S3. The image below depicts the data available in the source table.

ELT Select Snap

ELT Select Snap

Snap Output

We use this Snap's output as the input to an ELT SCD2 Snap and configure this Snap to update the table SCD2_TRG_TEST5 residing in the Snowflake database.

The image below depicts the data available in the target table, before the data load.

Existing Data in Target Table

Existing Data in Target Table

ELT SCD2 Snap

Target Table after SCD2 operation

Notice the difference in the number of rows and the values in the END_DATE and the CURRENT_ROW columns. There is only one row with CURRENT_ROW = true in the updated table for each unique CUSTOMER_ID.

Download this Pipeline.

Downloads

Snap Pack History

Refer to the Snap Pack History table in ELT Snap Pack.