ELT SCD2

ELT SCD2

Overview

You can use this Snap to implement the Slowly Changing Dimension Type 2 (SCD2) function in your Snowflake, Redshift, Azure Synapse, Databricks Lakehouse Platform, or BigQuery table. Configure the database type and the access to corresponding tables in the Configuring ELT Database Accounts for the Snap.

Switching the Snap behavior

The ELT SCD2 Snap performs a MERGE INTO-based SCD2 operation as opposed to its initial INSERT-and-then-UPDATE approach. If you prefer to use the INSERT-and-then-UPDATE approach for your SCD2 operation, define the Pipeline parameter SCD2_INSERTUPDATE_FALLBACK_MODE with a value greater than zero (0), before running your Pipeline.

Snap Type

The ELT SCD2 Snap is a write-type Snap that writes SCD2 data from a source location to a supported CDW destination.

Prerequisites

Valid accounts and access permissions to connect to the following:

  • Source: AWS S3, Azure Cloud Storage, or Google Cloud Storage.

  • Target: Snowflake, Redshift, Azure Synapse, Databricks Lakehouse Platform, or BigQuery.

Limitation

Known Issues

Snap Views

Type

Format

Number of Views

Examples of Upstream and Downstream Snaps

Description

Type

Format

Number of Views

Examples of Upstream and Downstream Snaps

Description

Input 

Document

  • Min: 0

  • Max: 1

  • ELT Join

  • ELT Copy

  • ELT Transform

A document containing an SQL query that yields the historization data to load.

Output

Document

  • Min: 0

  • Max: 1

  • ELT Select

A document containing the status of the SCD2 Load operation along with the SQL query that you can use in downstream ELT Snaps if any. 

Error

Error handling is a generic way to handle errors without losing data or failing the Snap execution. You can handle the errors that the Snap might encounter when running the Pipeline by choosing one of the following options from the When errors occur list under the Views tab:

  • Stop Pipeline Execution: Stops the current Pipeline execution if the Snap encounters an error.

  • Discard Error Data and Continue: Ignores the error, discards that record, and continues with the remaining records.

  • Route Error Data to Error View: Routes the error data to an error view without stopping the Snap execution.

Learn more about Error handling in Pipelines.

Snap Settings

  • Asterisk ( * ): Indicates a mandatory field.

  • Suggestion icon (): Indicates a list that is dynamically populated based on the configuration.

  • Expression icon ( ): Indicates the value is an expression (if enabled) or a static value (if disabled). Learn more about Using Expressions in SnapLogic.

  • Add icon ( ): Indicates that you can add fields in the fieldset.

  • Remove icon ( ): Indicates that you can remove fields from the fieldset.

  • Upload icon ( ): Indicates that you can upload files.

  • You can use the SQL Expressions and Functions supported for ELT to define your Snap or Account settings with the Expression icon  enabled, where available. Though this list covers only expressions that are common to all supported target CDWs, you can use (type-in manually) other expressions/functions permitted by your target CDW.

  • See ELT SCD2 Scenarios | Terminology for definitions of the ELT SCD2 terms used in this Snap's configuration.

Field Name

Field Type

Field Dependency

Description

Field Name

Field Type

Field Dependency

Description

Label*

 

Default Value: N/A
ExampleOrders_SCD2_Data

String

None.

Specify a name for the Snap. You can modify this to be more specific, especially if you have more than one of the same Snap in your Pipeline.

Get preview data

 

Default ValueNot selected
Example: Selected

Checkbox

None.

Select this checkbox to include a preview of the query's output. The Snap performs limited execution and generates a data preview during Pipeline validation.

In case of ELT Pipelines, only the SQL query flows through the Snaps but not the actual source data. Hence, the preview data for a Snap is the result of executing the SQL query that the Snap has generated in the Pipeline.

The number of records displayed in the preview (upon validation) is the smaller of the following:

  • Number of records available upon execution of the SQL query generated by the Snap.

  • The value set in the Preview Data Count setting (default is 50 records).

Rendering Complex Data Types in Databricks Lakehouse Platform

Based on the data types of the fields in the input schema, the Snap renders the complex data types like map and struct as object data type and array as an array data type. It renders all other incoming data types as-is except for the values in binary fields are displayed as a base64 encoded string and as string data type.

Database Name

 

Default Value: N/A
ExampleTestDB

String/Expression

None.

Specify the name of the database in which the target table exists. Leave this field blank if you want to use the database name specified in the Default Database Name field of your Snap Account settings.

Schema Name (Not applicable to Databricks Lakehouse Platform)*

 

Default Value: N/A
ExampleSales

String/Expression/Suggestion

None.

Specify the name of the database schema. Alternatively, choose from the suggested list of schemas associated with the database.

  • Ensure that you include the exactly same schema name including the double quotes, if used, when you repeat the schema name in the Target Table Name field.

  • Leave this field blank if your target database is Databricks Lakehouse Platform.

Target Table Name*

 

Default Value: N/A
ExampleSALES_ORDERS

String/Expression/Suggestion

None.

Specify the name of the SCD2 table or view in which you want to load the data. Alternatively, choose from the suggested list of table names associated with the schema.

Only views that can be updated (have new rows) are listed as suggestions. So, Join views are not included. This also implies that the Snap account user has the Insert privileges on the views listed as suggestions.

  • Ensure that you include the same schema name, if at all, including the double quotes as specified in the Schema Name field.

  • If the target table does not exist during run-time, the Snap creates one with the name that you specify in this field and writes the data into it. During Pipeline validation, the Snap creates the new table but does not write any records into it.

    • The new table thus created will not be dropped in the event of a subsequent/downstream Snap failure during validation.

  • The target table or view should have the following columns for field historization to work:

    • Required. One column to denote the starting date of the current row. For example, "START_DATE".

    • Required. One column to denote when the row was historized. For example, "END_DATE". For an active row, it can be null. For a historical row it has the value that indicates it was effective until that date.

    • Optional. One column to demarcate whether a row is a current row or not. For example, "CURRENT_ROW". For the current row, the value would be true or 1.

    • Required when current row is specified. One column to demarcate whether a row is a historical row or not. For example, "HISTORICAL_ROW". For the historical row, the value would be false or 0.

  • Use the ALTER TABLE command to add these columns in your target table if they are missing. 

Input Source Alias

 

Default Value: N/A
Examplesrc

String/Expression

None.

Specify an alias name for your source data table.

Target Table Action

 

Default ValueMerge data into target table
ExampleAlter table

Drop-down list

None.

Select the SCD2 action to perform on the target table.

Available options are:

  • Merge data into target table. Loads the data from the source files into the specified target table.

  • Drop and Create table. Drops the target table if it exists, creates a new table with the columns provided in the Table Columns field set, and then loads the data from the source files into the target table. Activates the Table Columns field set.

  • Alter table. Modifies the schema of the target table based on the configuration of the Modifier field in the Table Columns field set and loads the data from the source files into the target table. Activates the Table Columns field set with the ColumnData Type, and Modifier fields. 

  • Overwrite existing table. Overwrites the target table with the data from the source table/files.

Selecting the Target Table Action

  • Use the Drop and Create table option in this field only to create the target table for the first time. After the first execution of your Pipeline, ensure that you change the Target Table Action to Merge data into target table.

  • We recommend your utmost discretion in the intermittent use of the Alter table option (schema changes to target table).

During Pipeline Validation

The Snap does not modify the existing tables during Pipeline validation, but if the specified target table does not exist, it creates a new target table.

Table Columns

Field set

Target Table Action is Drop and Create table or Alter table.

This field set enables you to configure the schema of the target table. You can use this field set to create the target table or to add or drop columns from it.

This field set consists of the following fields:

  • Column

  • Data Type

  • Modifier

 

Column

 

Default Value: N/A
ExampleID, FNAME

String/Suggestion

Same as Table Columns.

Specify the name of the column that you want to load in the target table.

You can also specify the columns to drop if you select the Alter table option in the Target Table Action field. 

Data Type

 

Default Value: N/A
ExampleINT, VARCHAR

String

Same as Table Columns.

Specify the data type for the values in the specified column. 

Modifier

 

Default ValueAdd
ExampleDrop

Drop-down list

Target Table Action is Alter table.

Select whether you want to add/drop the specified column.

Available options are:

  • Add. To add a new column to the target table with the specified column name and data type.

  • Drop. To delete the column from the target table. This option deletes all data existing in the specified column.

Null Value Behavior

 

Default ValueHonor nulls as distinct values
ExampleIgnore nulls

Drop-down list

None.

Select an option to specify how you want the Snap to treat rows with null values in Natural Key or Cause-Historization fields of the source table. The available options are:

  • Honor nulls as distinct values: To treat each null as a value in the field. And as a result, equate all null values (as the same value) for all practical purposes.

  • Ignore nulls: To treat any rows with nulls in Natural Key and Cause-Historization fields of the source table as absent and proceed with the SCD2 operation based on the values available in the remaining rows.

  • Error on nulls: To abort the Snap execution when a null value is found in the source data. The Snap writes the error (message) to the Pipeline runtime statistics and displays a corresponding error message in the Snap.

Refer to the ELT SCD2 scenarios to learn more about the impact of the selection you make in this field.

Invalid Row Handling

 

Default Value: Ignore invalid rows

Example: Insert duplicate invalid rows

Drop-down list

None.

Select an option to specify how you want the Snap to treat invalid rows, if any, in the source data. A row is considered invalid when it does not result in a new, valid, and coherent SCD2 entry/row in the target table.

The available options are:

  • Ignore invalid rows: To treat invalid rows as absent and proceed with the SCD2 operation based on the valid values available in the remaining fields. The Snap also ignores inserting any potentially invalid rows into the target table.

  • Insert duplicate invalid rows: To treat all rows in a uniform manner—allowing an entry for each invalid row, in the target table— and optionally flagging such invalid entries in the target table as invalid (Invalid = true or 1 or a similar boolean value) based on the Invalid historical row flag specified in the Target Table Temporal Field(s) fieldset.

  • Error on invalid rows: To abort the Snap execution when an invalid row is found in the source data. The Snap writes the error (message) to the Pipeline runtime statistics and displays a corresponding error message in the Snap.

Refer to the ELT SCD2 scenarios to learn more about the impact of the selection you make in this field.

Target Table Natural Key(s)*

This field set enables you to configure the names of fields that identify a unique row in the target table. The identity key cannot be used as the Natural key, since a current row and its historical rows cannot have the same identity value.

Specify the value in the following field for each new row added in this field set.

  • Target Table Natural Key(s)

Target Table Natural Key(s)

 

Default Value: N/A

Example: CUSTOMER_ID

Example (Dynamic): JSON.parse(_KEY_COLUMNS) where _KEY_COLUMNS = ["CUSTOMER_ID", "ADDRESS"]

String/Expression/Suggestion

 

Specify the name of the Natural Key column from the target table. This field also supports the JSON.parse() function on a pipeline parameter that contains an array of column names, allowing the natural keys to be defined dynamically at runtime.

Target Table Cause-Historization Field(s)*

This field set enables you to configure the names of fields where any change in value causes the historization of an existing row and the insertion of a new 'current' row.

Specify the value in the following field for each new row added in this field set.

  • Target Table Cause-Historization Field(s)

Target Table Cause-Historization Field(s)*

 

Default Value: N/A

Example: LAST_ORDER_AMT

Example (Dynamic): JSON.parse(_DATA_COLUMNS) where _DATA_COLUMNS = ["CUSTOMER_ID", "ADDRESS"]

String/Expression/Suggestion

None.

Specify the name of the historization-causing column from the target table. The field also supports the JSON.parse() function on a pipeline parameter that contains an array of key column names, allowing these columns to be provided dynamically at runtime.

Target Table Temporal Field(s)*

This field set enables you to configure the historical and updated information for the Cause-historization field. Click  to add SCD fields. Specify the values in the following fields for each SCD field row in the field set.

  • Meaning

  • Field

  • Value

Meaning*