Versions Compared

Key

  • This line was added.
  • This line was removed.
  • Formatting was changed.

In this article

...

You can use this Snap to implement the Slowly Changing Dimension Type 2 (SCD2) function in your Snowflake, Redshift, Azure Synapse, Databricks Lakehouse Platform, or BigQuery table. Configure the database type and the access to corresponding tables in the Configuring ELT Database Accounts for the Snap.

Warning
titleInternal Change in Snap behavior

Starting from November 2021, 4.27 release, the ELT SCD2 Snap performs a MERGE INTO-based SCD2 operation instead of the previous Insert-and-then-Update approach. It is important that you take notice of this change in behavior. However, if you prefer to use the Insert-and-then-Update approach for your SCD2 operation, define the Pipeline parameter SCD2_INSERTUPDATE_FALLBACK_MODE with a value greater than zero (0), before running your Pipeline.

...

Info
titleSQL Functions and Expressions for ELT

You can use the SQL Expressions and Functions supported for ELT to define your Snap or Account settings with the Expression symbol = enabled, where available. This list is common to all target CDWs supported. You can also use other expressions/functions that your target CDW supports.

...

Parameter NameData TypeDescriptionDefault ValueExample 
LabelString
Insert excerpt
File Writer
File Writer
nopaneltrue
N/AOrders_SCD2_Data
Get preview dataCheckbox
Multiexcerpt include macro
namegetpreviewdata
pageELT Intersect
Not selectedSelected
Database NameString/ExpressionSpecify the name of the database in which the target table exists. Leave this field blank if you want to use the database name specified in the Default Database Name field in the account settings.N/ATESTDB
Schema Name (Not applicable to Databricks Lakehouse Platform)String/Expression/SuggestionRequired. Specify the name of the database schema. Alternatively, click Image Modified to fetch a list of schemas associated with the database, and select the schema.
Multiexcerpt include macro
nameME_Schema_Name
pageELT Insert Select
N/ASALES
Target Table NameString/Expression/Suggestion

Required. Specify the name of the SCD2 table in which you want to load the data. Alternatively, click Image Modified to fetch a list of table names associated with the schema, and select the table name.

Multiexcerpt include macro
nameME_Schema_And_Table_Names
pageELT Insert Select

Note

The target table should have the following columns for field historization to work:

  • Required. Column to denote the starting date of the current row. For example, "START_DATE".
  • Required. Column to denote when the row was historized. For example, "END_DATE". For active row, it can be null. For a historical row it has the value that indicates it was effective till that date.
  • Optional. Column to demarcate whether a row is a current row or not. For example, "CURRENT_ROW". For current row, the value would be true or 1.
  • Required when current row is specified. Column to demarcate whether a row is a historical row or not. For example, "HISTORICAL_ROW". For historical row, the value would be false or 0.

Use the ALTER TABLE command to add these columns in your target table if they are not present. 


N/ASALES_ORDERS
Input Source AliasString/ExpressionSpecify an alias name for your source data table.N/Asrc
Target Table ActionDrop-down list

Select the SCD2 action to perform on the target table.

Available options are:

  • Merge data into target table. Loads the data from the source files into the specified target table. You must ensure that the target table exists, else the Snap displays an error.
  • Drop and Create table. Drops the target table if it exists, creates a new table with the columns provided in the Table Columns field set, and then loads the data from the source files into the target table. Activates the Table Columns field set.
  • Alter table. Modifies the schema of the target table based on the configuration of the Modifier field in the Table Columns field set and loads the data from the source files into the target table. Activates the Table Columns field set with the ColumnData Type, and Modifier fields. You must ensure that the target table exists, else the Snap displays an error.


Info
titleDuring Pipeline Validation

The Snap does not modify the existing tables during Pipeline validation, but if you choose Target Table Action as Drop and Create table, it creates a new target table.


Note
titleSelecting the Target Table Action
  • Use the Drop and Create table option in this field only to create the target table for the first time. After the first execution of your Pipeline, ensure that you change the Target Table Action to Merge data into target table.
  • We recommend your utmost discretion in the intermittent use of the Alter table option (schema changes to target table).


Merge data into target tableAlter table
Table Columns

This field set enables you to configure the schema of the target table. You can use this field set to add or drop columns in the target table if you select the Drop and Create table or Alter table in the Target Table Action field. Click Image Modified to add a new row in the field set.

This field set consists of the following fields:

  • Column
  • Data Type
  • Modifier
ColumnString/SuggestionSpecify the name of the column that you want to load in the target table. You can also specify the columns to drop if you select the Alter table option in the Target Table Action field. N/A

ID

FNAME

Data TypeString

Activates when you select the Drop and Create table or Alter table options in the Target Table Action field.

Specify the data type for the values in the specified column. 

N/A

INT

VARCHAR

ModifierDrop-down list

Activates when you select the Alter table option in the Target Table Action field.

Select whether you want to add/drop the specified column.

Available options are:

  • Add
  • Drop
AddDrop
Null Value BehaviorDrop-down list

Select an option to specify how you want the Snap to treat rows with null values in Natural Key or Cause-Historization fields of the source table. The available options are:

  • Honor nulls as distinct values: To treat each null as a value in the field. And as a result, equate all null values (as the same value) for all practical purposes.

  • Ignore nulls: To treat any rows with nulls in Natural Key and Cause-Historization fields of the source table as absent and proceed with the SCD2 operation based on the values available in the remaining rows.

  • Error on nulls: To abort the Snap execution when a null value is found in the source data. The Snap writes the error (message) to the Pipeline runtime statistics and displays a corresponding error message in the Snap.

Refer to the ELT SCD2 scenarios to learn more about the impact of the selection you make in this field.

Honor nulls as distinct valuesIgnore nulls
Invalid Row HandlingDrop-down list

Select an option to specify how you want the Snap to treat invalid rows, if any, in the source data. A row is considered invalid when it does not result in a new, valid, and coherent SCD2 entry/row in the target table.

The available options are:

  • Ignore invalid rows: To treat invalid rows as absent and proceed with the SCD2 operation based on the valid values available in the remaining fields. The Snap also ignores inserting any potentially invalid rows into the target table.

  • Insert duplicate invalid rows: To treat all rows in a uniform manner—allowing an entry for each invalid row, in the target table— and optionally flagging such invalid entries in the target table as invalid (Invalid = true or 1 or a similar boolean value) based on the Invalid historical row flag specified in the Target Table Temporal Field(s) fieldset.

  • Error on invalid rows: To abort the Snap execution when an invalid row is found in the source data. The Snap writes the error (message) to the Pipeline runtime statistics and displays a corresponding error message in the Snap.

Refer to the ELT SCD2 scenarios to learn more about the impact of the selection you make in this field.

Ignore invalid rowsInsert duplicate invalid rows
Target Table Natural Key(s)

Required. This field set enables you to configure the names of fields that identify a unique row in the target table. The identity key cannot be used as the Natural key, since a current row and its historical rows cannot have the same identity value. Click Image Modified to add a new row in the field set.

Specify the value in the following field for each new row added in this field set.

  • Target Table Natural Key(s)
Target Table Natural Key(s)String/Expression/SuggestionSpecify the name of the Natural Key column from the target table.N/ACUSTOMER_ID
Target Table Cause-Historization Field(s)

Required. This field set enables you to configure the names of fields where any change in value causes the historization of an existing row and the insertion of a new 'current' row. Click Image Modified to add a new row in the field set. 

Specify the value in the following field for each new row added in this field set.

  • Target Table Cause-Historization Field(s)
Target Table Cause-Historization Field(s)String/Expression/SuggestionRequired. Specify the name of the historization-causing column from the target table.N/ALAST_ORDER_AMT
Target Table Temporal Field(s)

Required. This field set enables you to configure the historical and updated information for the Cause-historization field. Click Image Modified to add SCD fields. Specify the values in the following fields for each SCD field row in the field set.

  • Meaning
  • Field
  • Value
MeaningDrop-down list
RequiredSelect one from the following six permitted values for this field in each row. At the minimum, ensure that you configure appropriate values for the first three meanings, only one of the end dates, and the flag in this list.
  • Current row (Required)

  • Historical row (Required)

  • Start date of current row (Required)

  • End date of current row (Selected, by default)

  • End date of historical row

    • Remove/replace End date of current row with this item, if you want to specify this date.

  • Invalid historical rows flag (Optional)

Info
titleBreaking change

In 4.26 GA, we have updated the last option in this fieldInvalid historical row flag—and it may cause your Pipelines to fail. See Snap Pack History for more information.


Five rows with one list item in each row 
(without repetition and End date of historical row).

.

Start date of current row

FieldString/Expression/Suggestion

Required. Configure the field name from the target table that would contain the historical information. Click Image Modified to select an item from the suggested list of items.

  • For Current row: The name of the column in the target table that holds the flag for the historized field.

  • For Historical row: The name of the column in the target table that holds the flag for the historized field. It has to be the same as the value configured for the Current row field.

  • For Start date of current row: The name of the column in the target table for denoting the start date for the current row.

  • For End date of current row: The name of the column in the target table for denoting the end date for the current row. In this case, the start date of the current row specified above is automatically filled in as the end/closing date for the matching historical record.

  • For End date of historical row: The name of the column in the target table for denoting the end date for the historical rows.

    Info

    If you are specifying the End date of historical row, we recommend that you specify an appropriate expression in the corresponding Value or Expression field. See Expressions to use for End date of Historical Row to choose the right expression from a set of expression templates.


  • For Invalid historical row flag: The name of a nullable column that holds the flag for an invalid historical row. If you specify a non-null value for this field:
    • While inserting an invalid row in the target table, the Snap inserts the specified value in the Invalid Row flag column to indicate that the row is invalid.

    • If no invalid rows are inserted, the Invalid Row flag column value remains empty for all valid rows.

See the Troubleshooting section in this document and ELT SCD2 Scenarios to ensure that you have configured these fields in your Snap's settings properly.

Info

The date fields must be configured with a field and a corresponding value. Pass null as the value to indicate the field's value is null.


N/A

Start_Date

End_Date

Current_Row

Historical_Row

Value or ExpressionString/Expression

Required. Configure the value for each of the fields selected in the Field field.

Info

The date fields must be configured with a field and a corresponding value. Pass null as the value to indicate the field's value is null.


Info

For the End date of historical row, we recommend that you specify an appropriate expression in this field. See Expressions to use for End date of Historical Row to choose the right expression from a set of expression templates.


N/A

For date fields:
src.start_date,
'2021-12-31'
CURRENT_DATE
null

For row fields:
'true' or 'false'

Target Table Sort Field(s)

Required. This field set enables you to sort the records in the target table, by one or more historization fields (for example, by natural key). This mapping is for columns that are not part of the Target Table Temporal FieldsClick Image Modified to add a sort order. Specify the values in the following fields for each sort order.

  • Field
  • Sort Order
  • Null Value Sort Preference

Sort Field

String/ExpressionSpecify the column by which to sort the target table data set.N/A

start_date

GENDER

Sort OrderDrop-down listChoose either of the sort orders—ASC (ascending) or DESC (descending)—for the target table data set.ASCDESC
Null Value Sort PreferenceDrop-down list

Choose the position for the null values, if any, in the selected column while sortingNULLS FIRST (at the beginning) or NULLS LAST (at the end).

Info

This field is not applicable if you are working with an Azure Synapse target database.


NULLS LASTNULLS FIRST
Target-Source Column Mapping

This field set enables you to define a one-to-one mapping between the columns in the source and target tables for inserting new rows into the target table. Click Image Modified to add a new mapping row.

This field set contains the following fields:

  • Target Table Column Name
  • Source Column Mapping Expression

Target Table Column Name

String/Expression/Suggestion

Choose a column from the target table's schema to map with a column from the source data set (schema).

Note

Ensure that the target table column name you specify for mapping with a source column is not already configured as a target table SCD field.


N/AORD_AMT

Source Column Mapping Expression

String/Expression

Choose a column from the source data set (schema) that corresponds to the respective column in the target table's schema. You can alternatively specify an expression for the source column name.

Note

If you have specified the target table column as a target table natural key or a target table cause-historization field already in this Snap, ensure that you specify the corresponding source column mapping expression (not null nor left blank). All other columns not mentioned in the Snap can be mapped to null.


N/Asrc.ORD_AMT

Expressions to use for End Date of Historical Row

Using a fixed (date or timestamp) value for the End date of Historical Row may lead the Snap to unexpected behaviors like incorrect data loads or Snap errors. Hence, we recommend that you enable the Expression button to :Expression: and specify one of the following expressions:

...

Sort Order

...

DATEADD (DAY, -1, CASE WHEN LEAD (src.START_DATE) OVER (partition by <natural_key> ORDER BY START_DATE ASC NULLS FIRST) IS NULL then '9999-12-31' ELSE LEAD (src.START_DATE) OVER (partition by <natural_key> ORDER BY START_DATE ASC NULLS FIRST) END)

...

Descending

...

DATEADD (DAY, -1, CASE WHEN LEAD (src.START_DATE) OVER (partition by <natural_key> ORDER BY START_DATE DESC NULLS FIRST) IS NULL then '1900-01-01' ELSE LEAD (src.START_DATE) OVER (partition by <natural_key> ORDER BY START_DATE DESC NULLS FIRST) END)

...

DATEADD (DAY, -1, CASE WHEN LEAD (src.START_DATE, src.SORT_COLUMN) OVER (partition by <natural_key> ORDER BY START_DATE ASC NULLS FIRST, SORT_COLUMN <sortOrder> <nullOrder>) IS NULL then '9999-12-31' ELSE LEAD (src.START_DAT, src.SORT_COLUMN) OVER (partition by <natural_key> ORDER BY START_DATE ASC NULLS FIRST, SORT_COLUMN <sortOrder> <nullOrder>) END)

...

TIMEADD (HOUR, n, CASE WHEN LEAD (src.START_DATE) OVER (partition by <natural_key> ORDER BY START_DATE ASC NULLS FIRST) IS NULL then '9999-12-31 23:59:59.999' ELSE LEAD (src.START_DATE) OVER (partition by <natural_key> ORDER BY START_DATE ASC NULLS FIRST) END)

...

Descending

...

TIMEADD (HOUR, n, CASE WHEN LEAD (src.START_DATE) OVER (partition by <natural_key> ORDER BY START_DATE DESC NULLS FIRST) IS NULL then '1900-01-01 00:00:00.000' ELSE LEAD (src.START_DATE) OVER (partition by <natural_key> ORDER BY START_DATE DESC NULLS FIRST) END)

...

TIMEADD (HOUR, n, CASE WHEN LEAD (src.START_DATE, src.SORT_COLUMN) OVER (partition by <natural_key> ORDER BY START_DATE ASC NULLS FIRST, SORT_COLUMN <sortOrder> <nullOrder>) IS NULL then '9999-12-31' ELSE LEAD (src.START_DATE, src.SORT_COLUMN) OVER (partition by <natural_key> ORDER BY START_DATE ASC NULLS FIRST, SORT_COLUMN <sortOrder> <nullOrder>) END)

...

titleBefore using the above expressions, make sure that you consider the following:

# You must replace DATEADD and TIMEADD functions with the appropriate function based on your target database.

^ When start_date is not specified as the sort order column in the Snap configuration, the Snap automatically first sorts the rows in the ascending order of their start dates, followed by the other sort order columns specified. Hence, while you define the expression, you must include the start_date as the sort order column before mentioning the other sort order columns.

...


Troubleshooting

ErrorReasonResolution
Ambiguous SCD column mappingYou cannot map a field that is configured in Target Table SCD Fields again in the Target-Source Column Mapping field set.Ensure that the target table column name you specify for mapping with a source column is not already configured as a target table temporal field.
Invalid target column - source expression mappingYou cannot map a target table column to null or a blank source column value/expression.If you have specified the target table column as a target table natural key or a target table cause-historization field already in this Snap, ensure that you specify the corresponding source column mapping expression (not null nor left blank). All other columns not mentioned in the Snap can be mapped to null.

Database cannot be blank.

(When seeking the suggested list for Schema Name field)

Suggestions in the Schema Name and Target Table Name fields do not work when you have not specified a valid value for the Database Name field in this Snap.Specify the target Database Name in this Snap to view and choose from a suggested list in the Schema Name and Target Table Name fields respectively.

Invalid row flag not configured for target table

Invalid historical row flag is not defined in Target Table Temporal Field(s) field set.

Add an entry in Target Table Temporal Field(s) field set for the Meaning Invalid historical row flag with the corresponding field name in the target table and the boolean value (true or false) or an expression that resolves to a boolean true or false. Make sure that the target table contains the specified field.

Target table sort column %s not mapped to a source column or an expressionTarget table sort fields should be mapped to a source column or an expression in one of the two places: Target Table Temporal Field(s) field set or Target-Source Column Mapping field set.Map this column either in Target Table Temporal Field(s) field set or Target-Source Column Mapping field set before you can use it for sorting.
Invalid end date configuration for SCD2You have provided both End date of current row and End date of historical row while one of them is adequate.Remove one of End date of current row and End date of historical row from the Target Table Temporal Field(s) field set and try again.

Ambiguous source data

(Only when you configure the Snap to check for nulls and duplicates.)

Source data contains nulls in one or more of columns you specified in the Target Table Natural Key(s) field set, Target Table Cause-Historization Field(s) field set, or the Start date of current row in Target Table Temporal Field(s) field set.Peruse your source data to ensure that it is valid and contains valid values required to perform the SCD2 operation and try again. 

Source data has ambiguous records

(Only when you configure the Snap to check for nulls and duplicates.)

Source data contains duplicate keys in one or more of columns you specified in the Target Table Natural Key(s) field set, Target Table Cause-Historization Field(s) field set, or the Start date of current row in Target Table Temporal Field(s) field set.Peruse your source data to ensure that it is valid and contains only unique entries required for the SCD2 operation in these columns and try again.

Incorrect start date in source data

(Only when you configure the Snap to check for nulls and duplicates.)

The column you specified for the Start date of current row in Target Table Temporal Field(s) field set has one or more nulls.Ensure that your source data or the start date SQL expression resolves to a valid start date and try again.

Incorrect start date(s) in source data

(Only when you configure the Snap to check for nulls and duplicates.)

Multiple rows for one or more Target Table Natural Keys that you defined contain the same start date.Ensure that your source data contains only unique start dates across the rows for the same natural key. Perform this check for all natural keys you specified, before trying again.
Scenario: Sort order by a date column in the target table is incorrect after the SCD2 operation is complete.Sorting the date values in a column with date data type may yield different results when compared to sorting the same date values in a column with string data type.Ensure that the values loaded in the target table column are of the same data type as defined for the respective column/field.
SQL exception from Snowflake: Syntax error in one or more positions in the SQL queryColumn names in Snowflake tables are case-sensitive. It stores all columns in uppercase unless they are surrounded by quotes during the time of creation in which case, the exact case is preserved. See, Identifier Requirements — Snowflake Documentation.Ensure that you follow the same casing for the column table names across the Pipeline.

[Simba][SparkJDBCDriver](500051) ERROR processing query/statement. Error Code: 0

Cannot create table ('<schema name>`.`<table name>`'). The associated location (`…<table name>`) is not empty but it's not a Delta table

(Target CDW: Databricks Lakehouse Platform)

The specified location contains one of the following:

  • A non-Delta table (such as CSV, ORC, JSON, PARQUET)

  • A corrupted table

  • A Delta table with a different table schema

So, the Snap/Pipeline cannot overwrite this table with the target table as needed.

Ensure that you take appropriate action (mentioned below) on the existing table before running your Pipeline again (to create another Delta table at this location).

Move or drop the existing table from the schema manually using one of the following commands:

Access the DBFS through a terminal and run:

  • dbfs mv dbfs:/<current_table_path> dbfs:/<new_table_path> to move the table or

  • dbfs rm -r dbfs:/<table_path> to drop the table.

OR

Use a Python notebook and run:

  • dbutils.fs.mv(from: String, to: String, recurse: boolean = false): boolean to move the table/file/directory or

  • dbutils.fs.rm(dir: String, recurse: boolean = false): boolean to drop the table/file/directory.

...