In this article

1 Overview
- 1.1 Snap Type
- 1.2 Prerequisites
- 1.3 Support for Ultra Pipelines
- 1.4 Known Issues
2 Snap Views
3 Snap Settings
4 Examples
- 4.1 Historizing Incoming Records
- 4.2 Downloads
5 Snap Pack History

Overview

This Snap provides the functionality of SCD (Slowly Changing Dimension) Type 2 on a target Snowflake table. You can use this Snap to execute one SQL lookup request per set of input documents to avoid making a request for every input record. Its output is typically a stream of documents for the Snowflake - Bulk Upsert Snap, which updates or inserts rows into the target table. Therefore, this Snap must be connected to the Snowflake - Bulk Upsert Snap to accomplish the complete SCD2 functionality.
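The SCD Type 2 behavior described above can be sketched in plain Python. This is only an illustration of the general technique under stated assumptions, not the Snap's actual implementation: the function name `scd2_documents`, its parameters, and the pass-through behavior for unchanged rows are hypothetical, and the column names CURRENT_ROW, START_DATE, and END_DATE are the example names used in this article.

```python
from datetime import datetime, timezone


def scd2_documents(current_row, incoming, cause_fields, now=None, ignore_unchanged=False):
    """Return the documents a downstream upsert step would receive for one record.

    current_row: dict for the existing current row in the target table, or None.
    incoming: dict for the incoming record with the same natural key.
    cause_fields: names of the fields whose change triggers historization.
    """
    now = now or datetime.now(timezone.utc)
    new_current = {**incoming, "CURRENT_ROW": 1, "START_DATE": now, "END_DATE": None}

    # No existing row for this natural key: the incoming record becomes current.
    if current_row is None:
        return [new_current]

    changed = any(current_row.get(f) != incoming.get(f) for f in cause_fields)
    if not changed:
        # Assumption: an unchanged row is either dropped or passed through as is.
        return [] if ignore_unchanged else [current_row]

    # Close the old row: its end date equals the start date of the new current row.
    historized = {**current_row, "CURRENT_ROW": 0, "END_DATE": now}
    return [historized, new_current]


existing = {"id": 1, "rate": 10, "CURRENT_ROW": 1,
            "START_DATE": datetime(2023, 1, 1), "END_DATE": None}
for doc in scd2_documents(existing, {"id": 1, "rate": 12}, ["rate"]):
    print(doc)
```

In this sketch, a change in a cause-historization field yields two documents: the closed historical row and the new current row, which is the kind of pair an upsert step would then write to the target table.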

Snap Type

The Snowflake SCD2 is a Read-type Snap that enables you to execute multiple queries as a single atomic unit.

Prerequisites

Read and write access to the Snowflake instance.
The target table should have the following three columns for field historization to work:
- Column to demarcate whether a row is a current row or not. For example, "CURRENT_ROW". For the current row, the value would be true or 1. For the historical row, the value would be false or 0.
- Column to denote the starting date of the current row. For example, "START_DATE".
- Column to denote when the row was historized. For example, "END_DATE". For the active row, it is null. For a historical row, it has the value that indicates it was effective till that date.
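If the target table lacks these columns, they can be added with ALTER TABLE statements. The Python sketch below only builds the statement text; the helper name `add_scd2_columns_sql` is hypothetical, and the column types (BOOLEAN and TIMESTAMP_NTZ) are assumptions that you should adjust to your data. Identifiers are inserted as given, without quoting.

```python
def add_scd2_columns_sql(table, flag_col="CURRENT_ROW",
                         start_col="START_DATE", end_col="END_DATE"):
    """Build ALTER TABLE statements for the three historization columns.

    The column types below are assumptions, not requirements of the Snap.
    """
    columns = [
        (flag_col, "BOOLEAN"),         # current-row flag
        (start_col, "TIMESTAMP_NTZ"),  # start date of the current row
        (end_col, "TIMESTAMP_NTZ"),    # end date; null for the active row
    ]
    return [
        f"ALTER TABLE {table} ADD COLUMN {name} {sql_type};"
        for name, sql_type in columns
    ]


for statement in add_scd2_columns_sql("TestSchema.TestTable"):
    print(statement)
```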

Security Prerequisites

You should have the following permissions in your Snowflake account to execute this Snap:
- Usage (DB and Schema): Privilege to use database, role and schema.
- Create table: Privilege to create a table on the database, role and schema.
For more information on Snowflake privileges, refer to Access Control Privileges.

Internal SQL Commands

This Snap uses the SELECT command internally. It enables querying the database to retrieve a set of rows.

Support for Ultra Pipelines

Works in Ultra Pipelines. However, we recommend that you not use this Snap in an Ultra Pipeline.

Known Issues

Because of performance issues, all Snowflake Snaps now ignore the Cancel queued queries when pipeline is stopped or if it fails option for Manage Queued Queries, even when selected. Snaps behave as though the default Continue to execute queued queries when the Pipeline is stopped or if it fails option were selected.

We plan to address this issue in a patch for the next monthly release in December.

Snap Views

Type	Format	Number of Views	Examples of Upstream and Downstream Snaps	Description
Input	Document	Min: 1 Max: 1	Mapper, JSON Parser Snap	A document in the input view should contain a data map of key-value entries. The input data must contain data in the Natural Key (primary key) and Cause-historization fields.
Output	Document	Min: 1 Max: 1	Snowflake Bulk Upsert	A document in the output view contains a data map of key-value entries for all fields of a row in the target Snowflake table.

Error

Error handling is a generic way to handle errors without losing data or failing the Snap execution. You can handle the errors that the Snap might encounter while running the Pipeline by choosing one of the following options from the When errors occur list under the Views tab. The available options are:

Stop Pipeline Execution: Stops the current pipeline execution when the Snap encounters an error.
Discard Error Data and Continue: Ignores the error, discards that record, and continues with the rest of the records.
Route Error Data to Error View: Routes the error data to an error view without stopping the Snap execution.

Learn more about Error handling in Pipelines.

Snap Settings

Asterisk (*): Indicates a mandatory field.
Suggestion icon: Indicates a list that is dynamically populated based on the configuration.
Expression icon: Indicates whether the value is an expression (if enabled) or a static value (if disabled). Learn more about Using Expressions in SnapLogic.
Add icon: Indicates that you can add fields in the field set.
Remove icon: Indicates that you can remove fields from the field set.

Auto Historization Query		Use this field-set to specify the fields that are used to historize table data. Historization follows the specified sort order, so each field must be sortable. You can add multiple fields here; historization occurs when even one of the fields is changed.
Field Name			Field Type	Description
Label* Default Value: Snowflake - SCD2 Example: Snowflake - SCD2			String	Specify the name for the Snap. You can modify this to be more specific, especially if you have more than one of the same Snap in your pipeline.
Schema name Default Value: N/A Example: TestSchema			String/Expression	Specify the database schema name. If it is not defined, the suggestions for the Table Name retrieve the table names of all schemas. This property is suggestible and retrieves the available database schemas when you request suggestions.
Table Name Default Value: N/A Example: TestTable			String/Expression	Specify the name of the table in the instance. The table name is suggestible and requires an account setting. The target table should have the following three columns for field historization to work: (1) A column to demarcate whether a row is a current row or not. For example, "CURRENT_ROW". For the current row, the value would be true or 1. For the historical row, the value would be false or 0. (2) A column to denote the starting date of the current row. For example, "START_DATE". (3) A column to denote when the row was historized. For example, "END_DATE". For the active row, it is null. For a historical row, it has the value that indicates it was effective till that date. Use the ALTER TABLE command to add these columns to your target table if they are not present.
Natural key* Default Value: N/A Example: id (Each record has to have a unique value)			String/Expression	Specify the names of fields that identify a unique row in the target table. The identity key cannot be used as the Natural key, because a current row and its historical rows share the same natural key value, whereas an identity key value is unique to each row.
Cause-historization fields Default Value: N/A Example: gold bullion rate			String/Expression	Specify the names of fields where any change in value causes the historization of an existing row and the insertion of a new current row.
SCD fields	The historical and updated information for the Cause-historization field. Click + to add SCD fields. By default, there are four rows in this field-set: Current row, Historical row, Start date of current row, and End date of historical row.
	Meaning* Default Value: Current row, Historical row, Start date of current row, End date of historical row		Dropdown list	Specifies the table columns that are to be updated for implementing the SCD2 type transformation.
	Field* Default Value: N/A Example: CURRENT_ROW		String/Expression	Specify the fields in the table that will contain the historical information. Configure the following value for each row: Current row: The name of the column in the target table that holds the flag for the historized field. For example, "CURRENT_ROW". Historical row: The name of the column in the target table that holds the flag for the historized field. It has to be the same as the value configured for the Current row field. For example, "CURRENT_ROW". Start date of current row: The name of the column in the target table for denoting the start date for the current row. For example, "START_DATE". End date of historical row: The name of the column in the target table for denoting the end date for the historical row. For example, "END_DATE". By default, the start and end date for both Current row and Historical row are null. After the Snap is executed, the start date for the updated row data automatically becomes the end date for the earlier version of the data (Historical row).
	Value* Default Value: Current row and Historical row: N/A; Start date of current row and End date of historical row: `Date.now()`		String/Expression	Specify the value to be assigned to the current or historical row. For date-related rows, the default is `Date.now()`. Configure the Value field as follows: Current row: 1; Historical row: 0.
Ignore unchanged rows Default Value: Deselected			Checkbox	Specifies whether the Snap must ignore writing unchanged rows from the source table to the target table. If you enable this option, the Snap generates a corresponding document in the target only if the Cause-historization column in the source row is changed. Otherwise, the Snap does not generate any corresponding document in the target.
Number of Retries Default Value: 0 Example: 3			Integer/Expression	The number of times that the Snap must try to write the fields in case of an error during processing. An error is displayed if the maximum number of tries has been reached.
Retry Interval (Seconds) Default Value: 1 Example: 3			Integer/Expression	The time interval, in seconds, between subsequent retry attempts.
		Field* Default Value: N/A Example: Invoice_Number	String/Expression	Specify the name of the field. This is a suggestible field and suggests all the fields in the target table. If this field has null values in the incoming records, then the value in the Snowflake table is treated as the current value and the incoming record is historized.
		Sort Order* Default Value: Ascending Order Example: Descending Order	Dropdown list	The order in which the selected field is to be historized. Available options are: Ascending Order: The higher value is classified as the current event. For example, date of transaction, age, or height. Descending Order: The lower value is classified as the current event. For example, rank.
Input Date Format Default Value: Continue to execute the snap with given input Date format Example: Auto Convert the format to Snowflake default format			Dropdown list	Select one of the following two options: Continue to execute the snap with given input Date format: The Snap continues with the current date format. This option is selected by default. Auto Convert the format to Snowflake default format: The Snap converts the provided date format to the default Snowflake date format. To learn about the date formats supported by Snowflake, see Snowflake date formats.
Manage Queued Queries Default Value: Continue to execute queued queries when the Pipeline is stopped or if it fails Example: Cancel queued queries when the Pipeline is stopped or if it fails			Dropdown list	Select this property to decide whether the Snap should continue or cancel the execution of the queued Snowflake Execute SQL queries when you stop the pipeline. If you select Cancel queued queries when pipeline is stopped or if it fails, then the read queries under execution are canceled, whereas the write queries under execution are not canceled. Snowflake internally determines which queries are safe to be canceled and cancels those queries.
Snap Execution Default Value: Validate & Execute Example: Execute only			Dropdown list	Select one of the following three modes in which the Snap executes: Validate & Execute: Performs limited execution of the Snap, and generates a data preview during Pipeline validation. Subsequently, performs full execution of the Snap (unlimited records) during Pipeline runtime. Execute only: Performs full execution of the Snap during Pipeline execution without generating preview data. Disabled: Disables the Snap and all Snaps that are downstream from it.
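The Number of Retries and Retry Interval settings describe a fixed-interval retry pattern. The following Python sketch illustrates that general pattern only; it is not the Snap's code. The helper `run_with_retries` is hypothetical, and it assumes that a retry count of 0 means a single attempt with no retries.

```python
import time


def run_with_retries(operation, retries=0, interval_seconds=1, sleep=time.sleep):
    """Call operation(), retrying up to `retries` times with a fixed wait between tries."""
    attempts = retries + 1  # assumption: 0 retries still allows one attempt
    for attempt in range(1, attempts + 1):
        try:
            return operation()
        except Exception:
            # Out of attempts: surface the last error to the caller.
            if attempt == attempts:
                raise
            sleep(interval_seconds)


# Usage: a stand-in operation that fails twice before succeeding.
state = {"calls": 0}


def flaky_write():
    state["calls"] += 1
    if state["calls"] < 3:
        raise RuntimeError("transient error")
    return "written"


print(run_with_retries(flaky_write, retries=3, interval_seconds=0))
```

The `sleep` parameter exists only so that the wait can be replaced in tests; in normal use the default `time.sleep` applies.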

Examples

Historizing Incoming Records

This example demonstrates how you can use the Snowflake SCD2 Snap to auto-historize records. In this example, since the existing record in the Snowflake table is the latest, the incoming records are historized.

This Pipeline performs the following operations:

The Target Table

Before we start, let us look at the target table and understand some of its columns that are necessary for the Pipeline:

We focus on the highlighted columns above to demonstrate auto-historization and describe their function in the table.

POINT ID is the natural key in this table.
Changes in SCHEDULEDVOLUME for a natural key are historized.
HISTORYSTARTDATE and HISTORYENDDATE are used to identify the current record.
FLAG denotes if a record is a current record with the value TRUE.
STARTDATE and ENDDATE are automatically calculated by the Snap. They represent the date range during which a record was the current record. The ENDDATE is blank for the current record.

Input Data: Reading, Parsing and Mapping

This Pipeline is configured to send records into the target table. The File Reader Snap is configured to read a CSV file that contains the records. The downstream CSV Parser Snap parses the CSV file read by the File Reader Snap. Below is a preview of this file:

Based on the values of HistoryStDate and HistoryEndDate, it is clear that the existing record in the target table is the latest (or current) record.

Since the output from the CSV Parser Snap is a string, it has to be parsed into the appropriate data type. Parsing and data mapping is done using the Mapper Snap, as shown below:

This mapped data is then sent to the Snowflake SCD2 Snap.
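Outside SnapLogic, the same parse-and-map step could look roughly like the Python sketch below. The sample CSV row is made up for illustration, and the target types (integer POINT ID, floating-point SCHEDULEDVOLUME) and the `%Y-%m-%d` date format are assumptions; the function name `map_record` is hypothetical.

```python
import csv
import io
from datetime import datetime

# Made-up sample input; the real file's values and date format may differ.
SAMPLE_CSV = """POINT ID,SCHEDULEDVOLUME,HistoryStDate,HistoryEndDate
101,2500.5,2019-01-01,2019-06-30
"""


def map_record(raw, date_format="%Y-%m-%d"):
    """Convert the string fields of one parsed CSV row into typed values."""
    return {
        "POINT ID": int(raw["POINT ID"]),
        "SCHEDULEDVOLUME": float(raw["SCHEDULEDVOLUME"]),
        "HISTORYSTARTDATE": datetime.strptime(raw["HistoryStDate"], date_format),
        "HISTORYENDDATE": datetime.strptime(raw["HistoryEndDate"], date_format),
    }


# csv.DictReader yields every value as a string, much like the CSV Parser output.
records = [map_record(row) for row in csv.DictReader(io.StringIO(SAMPLE_CSV))]
print(records[0])
```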

Data Processing

The Snowflake SCD2 Snap performs SCD2 operations on the target table. We configure it as shown below:

Let us take a look at the highlighted Snap fields and how they affect the Snap functionality in this example:

Natural key: The Snap looks for records with matching POINT ID values in the incoming documents to group the records.
Cause-historization fields: For each unique POINT ID, changes in SCHEDULEDVOLUME initiate historization. If a change has not occurred, the incoming records are historized.
SCD fields:
- The state of the current or historical record is marked in the FLAG field, T for current record and F for the historical record.
- The columns STARTDATE and ENDDATE in the target table are maintained to denote the start and end dates of the current state of the table's data. The ENDDATE is always blank for a current record.
Auto Historization Query: The Snap sorts the values in the HISTORYSTARTDATE and HISTORYENDDATE columns for the same POINT ID in the Snowflake table and the incoming documents in ascending order. The record with the highest value in those fields is considered the current record.
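The sort-based selection of the current record can be sketched as follows. This is a simplified illustration under stated assumptions, not the Snap's implementation: the function name `pick_current` is hypothetical, ties are resolved in favor of the existing table row, and the existing row is assumed to have non-null values in the sort fields. Incoming records with a null sort field are historized directly, as described for the Field setting.

```python
from datetime import date


def pick_current(existing, incoming, sort_fields, ascending=True):
    """Return (current_record, historical_records) for one natural key.

    Ascending order: the highest value is current. Descending order: the lowest.
    """
    def sort_key(record):
        return tuple(record[f] for f in sort_fields)

    comparable, with_nulls = [], []
    for record in incoming:
        has_null = any(record.get(f) is None for f in sort_fields)
        (with_nulls if has_null else comparable).append(record)

    # Put the winning record first; the stable sort keeps the existing row ahead on ties.
    ranked = sorted([existing] + comparable, key=sort_key, reverse=ascending)
    return ranked[0], ranked[1:] + with_nulls


existing = {"POINT ID": 1, "HISTORYSTARTDATE": date(2021, 1, 1)}
incoming = [
    {"POINT ID": 1, "HISTORYSTARTDATE": date(2019, 1, 1)},
    {"POINT ID": 1, "HISTORYSTARTDATE": None},
]
current, historical = pick_current(existing, incoming, ["HISTORYSTARTDATE"])
print(current is existing, len(historical))
```

With the made-up dates above, the existing table row has the latest HISTORYSTARTDATE, so it stays current and both incoming records are historized, which mirrors the outcome of this example.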

All incoming records pertaining to a POINT ID are historized. The value F is assigned to these records in the FLAG column, and the corresponding STARTDATE and ENDDATE are evaluated by the expression `Date.now()`.

This can be seen in the SCD2 Snap's output preview:

The Snap identifies the current and historical records and this data is now ready to be updated and inserted into the target table.

Upsert Data into the Target Table

We use the Snowflake Bulk Upsert Snap to update the target table with this historized data. We configure the Snowflake Bulk Upsert Snap as shown below:
