Versions Compared

Key

  • This line was added.
  • This line was removed.
  • Formatting was changed.

In this article

Table of Contents
minLevel2
maxLevel3
absoluteUrltrue

This article emphasizes the different scenarios that are possible while performing an SCD2 operation and the behavior of the ELT SCD2 Snap, in each of these scenarios.

Terminology

To follow these scenarios and configure the ELT SCD2 Snap properly in your SnapLogic environment, you will need to understand the following terms:

...

Term

...

Definition

...

Target Table Natural Key

...

A unique identifier based on which the SCD2 operation inserts a new record with different values for the same key in the target table.

...

Target Table Cause-Historization Field(s)

...

One or more fields in the target table which can cause new versions of data in the row.

For example, changes in Address or Employee Designation of an individual over a period of time. Each change in the address can cause a new version of the row (SCD Type2 entry) and so does each change in an employee’s designation.

...

Target Table Temporal Fields

...

A set of fields in the target table that store information specific to the validity of the cause-historization values in each of the row versions created. They can be a combination of the following fields:

  • Current row (Required): Boolean value to indicate if an entry is the current row.

  • Historical row (Required): Boolean value to indicate if an entry is a historical row.

  • Start date of current row (Required): Starting date-time for an entry to be the current row.

  • End date of current row (Optional) or End date of historical row (not chosen, by default): Date-time till when a current row entry remains current (becomes a historical row after that date-time passes by).

  • Invalid historical rows flag (Optional): An indicator for invalid rows in the target table. They are marked invalid due to lack of adequate historization information (logically incorrect values or null values) in the rows.

Info

For every natural key in the target table, there must be only one current row (valid now). All other entries (rows) for that natural key are either historical rows or invalid rows.

A change of value for a cause-historization field causes the existing current row to expire and the new entry (with the updated value) becomes the current row.

Scenario 1: Ideal SCD2 operation

Target table In this article

Table of Contents
minLevel2
maxLevel3
absoluteUrltrue

This article emphasizes the different scenarios that are possible while performing an SCD2 operation and the behavior of the ELT SCD2 Snap, in each of these scenarios.

Terminology

To follow these scenarios and configure the ELT SCD2 Snap properly in your SnapLogic environment, you will need to understand the following terms:

Term

Definition

Target Table Natural Key

A unique identifier based on which the SCD2 operation inserts a new record with different values for the same key in the target table.

Target Table Cause-Historization Field(s)

One or more fields in the target table which can cause new versions of data in the row.

For example, changes in Address or Employee Designation of an individual over a period of time. Each change in the address can cause a new version of the row (SCD Type2 entry) and so does each change in an employee’s designation.

Target Table Temporal Fields

A set of fields in the target table that store information specific to the validity of the cause-historization values in each of the row versions created. They can be a combination of the following fields:

  • Current row (Required): Boolean value to indicate if an entry is the current row.

  • Historical row (Required): Boolean value to indicate if an entry is a historical row.

  • Start date of current row (Required): Starting date-time for an entry to be the current row.

  • End date of current row (Optional) or End date of historical row (not chosen, by default): Date-time till when a current row entry remains current (becomes a historical row after that date-time passes by).

  • Invalid historical rows flag (Optional): An indicator for invalid rows in the target table. They are marked invalid due to lack of adequate historization information (logically incorrect values or null values) in the rows.

Info

For every natural key in the target table, there must be only one current row (valid now). All other entries (rows) for that natural key are either historical rows or invalid rows.

A change of value for a cause-historization field causes the existing current row to expire and the new entry (with the updated value) becomes the current row.

Scenario 1: Ideal SCD2 operation

Target Table (Before)

Customer ID
(Natural Key)

Address
(Cause Historization Field)

Start Date
(Temporal Field 1)

End Date
(Temporal Field 2)

Current Row
(Temporal Field 3)

1

Chicago

1/1/18

12/1/18

false

1

New York

1/1/19

MAX_DATE

true

Source/Update Table (Incoming SCD2 Data)

Customer ID (Natural Key)

Address (Cause Historization Values)

Start Date (Temporal Field 1)

1

Chicago

1/1/20

1

New York

1/1/21

Output 1: Target Table (After SCD2 operation)

Customer ID (Natural Key)

Address (New Cause Historization Entries)

Start Date
(Temporal Field 1)

End Date
(Temporal Field 2)

Current Row
(Temporal Field 3)

Remarks / Key Take-aways

1

Chicago

1/1/18

12/1/18

false

No change.

1

New York

1/1/19

1/1/20

false

Not current row anymore.

1

Chicago

1/1/20

1/1/21

false

New entry but not current row. Start date for the newer row is the End date for this entry.

1

New York

1/1/21

MAX_DATE

true

Current row with default End Date.

Info

The SCD2 operation uses a row-by-row approach. The final current row is arrived at, only after processing all rows for that natural key.

Scenario 2: Cause Historization Row is Same as Current Row

Target Table (Before)

Customer ID
(Natural Key)

Address
(Cause Historization Field)

Start Date
(Temporal Field 1)

End Date
(Temporal Field 2)

Current Row
(Temporal Field 3)

1

Chicago

1/1/18

12/1/18

false

1

New York

1/1/19

MAX_DATE

true

Source/Update Table (Incoming SCD2 Data)

Customer ID (Natural Key)

Address (Cause Historization Values)

Start Date (Temporal Field 1)

1

New York

1/1/20 (or 1/1/17)

1

Boston

1/1/21

Output 2: Target Table (After SCD2 operation)

Customer ID (Natural Key)

Address (New Cause Historization Entries)

Start Date
(Temporal Field 1)

End Date
(Temporal Field 2)

Current Row
(Temporal Field 3)

Remarks / Key Take-aways

1

Chicago

1/1/18

12/1/18

false

No change.

1

New York

1/1/19

1/1/21

false

Not current row anymore. Any end date for New York before 1/1/21 (start date for Boston) results in New York becoming a historical row.

1

Boston

1/1/21

MAX_DATE

true

Current row with default End Date.

Scenario 3: Target Table is Empty and Source Table Contains Duplicate Values for Cause-Historization field

Target Table (Before)

Customer ID
(Natural Key)

Address
(Cause Historization Field)

Start Date
(Temporal Field 1)

End Date
(Temporal Field 2)

Current Row
(Temporal Field 3)

(Empty Table)

Source/Update Table (Incoming SCD2 Data)

Customer ID (Natural Key)

Address (Cause Historization Values)

Start Date (Temporal Field 1)

1

Chicago

1/1/15

1

Boston

1/1/16

1

Chicago

1/1/18

1

Chicago

1/1/19 (or 1/1/14)

1

Boston

1/1/20

Output 3: Target Table (After SCD2 operation)

Customer ID (Natural Key)

Address (New Cause Historization Entries)

Start Date
(Temporal Field 1)

End Date
(Temporal Field 2)

Current Row
(Temporal Field 3)

Remarks / Key Take-aways

1

Chicago

1/1/15

1/1/16

false

End date arrived based on the start date of the next entry (in ascending order).

1

Boston

1/1/16

1/1/18

false

End date arrived based on the start date of the next entry (in ascending order).

1

Chicago

1/1/18

1/1/20

false

Any start date for Chicago outside the range 1/1/18 - 1/1/20 will be ignored.

1

Boston

1/1/20

MAX_DATE

true

Current row with default End Date.

Info

The SCD2 operation sorts the source data records in the ascending order of their Start Dates when there is no row numbering (sequence) specified in the source data.

Scenario 4: Start Date in Source Row Earlier than Start Date of Current Row

Target Table (Before)

Customer ID
(Natural Key)

Address
(Cause Historization Field)

Start Date
(Temporal Field 1)

End Date
(Temporal Field 2)

Current Row
(Temporal Field 3)

1

Chicago

1/1/18

12/1/18

false

1

New York

1/1/19

MAX_DATE

true

Source/Update table Table (Incoming SCD2 Data)

Customer ID (Natural Key)

Address (Cause Historization Values)

Start Date (Temporal Field 1)

1

Chicago

Orlando

1/1/

20

17

1

New York

Boston

1/1/21

Output 4.1: Target

...

Table (After SCD2 operation) when Invalid rows are Ignored

Customer ID (Natural Key)

Address (New Cause Historization Entries)

Start Date
(Temporal Field 1)

End Date
(Temporal Field 2)

Current Row
(Temporal Field 3)

Remarks / Key Take-aways

1

Chicago

1/1/18

12/1/18

false

No change.

1

New York

1/1/19

1/1/20

false

Not current row anymore.

1

Chicago

1/1/

20

19

1/1/21

false

New entry but not

Not current row anymore.

Start date for the newer row is the End date for this entry

Any end date for New York before 1/1/21 (start date for Boston) results in New York becoming a historical row.

1

New York

Boston

1/1/21

MAX_DATE

true

Current row with default End Date.

Info

The SCD2 operation uses a row-by-row approach. The final current row is arrived at, only after processing all rows for that natural key.

Scenario 2: Cause Historization Row is Same as Current Row

Target table (Before)

Customer ID

Current row with default End Date.

Output 4.2: Target Table (After SCD2 operation) when Invalid rows are Logged

Customer ID (Natural Key)

Address


(New Cause Historization

Field

Entries)

Start Date
(Temporal Field 1)

End Date
(Temporal Field 2)

Current Row
(Temporal Field 3)

Invalid
(Temporal Field 4)

Remarks / Key Take-aways

1

Chicago

1/1/18

12/1/18

false

No change.

1

New York

1/1/19

MAX_DATE

true

Source/Update table (Incoming SCD2 Data)

Customer ID (Natural Key)

Address (Cause Historization Values)

Start Date (Temporal Field 1)

1

New York

1/1/21

false

Not current row anymore. Any end date for New York before 1/1/21 (start date for Boston) results in New York becoming a historical row.

1

Orlando

1/1/

20 (or 1/1/17)

17

MIN_DATE

false

true

Invalid row logged with End Date defaulted to MIN_DATE.

1

Boston

1/1/21

...

MAX_DATE

true

Current row with default End Date.

Scenario 5: One or more Start Dates in Source Rows contain NULLs

Target Table (Before)

Customer ID
(Natural Key)

Address
(

New

Cause Historization

Entries

Field)

Start Date
(Temporal Field 1)

End Date
(Temporal Field 2)

Current Row
(Temporal Field 3)

Remarks / Key Take-aways

1

Chicago

1/1/18

12/1/18

false

No change.

1

New York

1/1/19

1/1/21

false

Not current row anymore. Any end date for New York before 1/1/21 (start date for Boston) results in New York becoming a historical row.

false

1

Boston

New York

1/1/

21

19

MAX_DATE

true

Current row with default End Date.

Scenario 3: Target table is Empty and Source Table Contains Duplicate Values for Cause-Historization field

Target table (Before)

...

Source/Update Table (Incoming SCD2 Data)

Customer ID (Natural Key)

Address (Cause Historization Values)

Start Date (Temporal Field 1)

1

Orlando

NULL

1

Boston

1/1/21

Output 5.1: Target Table (After SCD2 operation) when Invalid rows in the Source are Ignored

Customer ID (Natural Key)

Address


(New Cause Historization

Field

Entries)

Start Date
(Temporal Field 1)

End Date
(Temporal Field 2)

Current

Row
(Temporal Field 3)

(Empty Table)

Source/Update table (Incoming SCD2 Data)

Customer ID (Natural Key)

Address (Cause Historization Values)

Start Date

Row
(Temporal Field

1

3)

Remarks / Key Take-aways

1

Chicago

1/1/

15

18

12/1/18

Boston

1/1/16

false

No change.

1

Chicago

New York

1/1/

18

19

1/1/21

Chicago

false

Not current row anymore. Any end date for New York before 1/1/

19 (or 1/1/14)

21 (start date for Boston) results in New York becoming a historical row.

1

Boston

1/1/

20

...

21

MAX_DATE

true

Current row with default End Date.

Output 5.2: Target Table (After SCD2 operation) when Invalid rows are Logged and Nulls Honored

Customer ID (Natural Key)

Address (New Cause Historization Entries)

Start Date
(Temporal Field 1)

End Date
(Temporal Field 2)

Current Row
(Temporal Field 3)

Invalid
(Temporal Field 4)

Remarks / Key Take-aways

1

Chicago

1/1/

15

18

1

12/1/

16

18

false

End date arrived based on the start date of the next entry (in ascending order)

No change.

1

Boston

New York

1/1/

16

19

1/1/

18

21

false

End date arrived based on the start date of the next entry (in ascending order)

Not current row anymore. Any end date for New York before 1/1/21 (start date for Boston) results in New York becoming a historical row.

1

Chicago

1/1/18

1/1/20

false

Any start date for Chicago outside the range 1/1/18 - 1/1/20 will be ignored

Orlando

NULL

MIN_DATE

false

true

NULL remains as is. Invalid row logged with End Date defaulted to MIN_DATE.

1

Boston

1/1/

20

21

MAX_DATE

true

Current row with default End Date.

Info

The SCD2 operation sorts the source data records in the ascending order of their Start Dates when there is no row numbering (sequence) specified in the source data.

Scenario 4: Start Date in Source Row Earlier than Start Date of Current Row

...

true

Current row with default End Date.

Output 5.3: Target Table (After SCD2 operation) when the Pipeline is set to Return the Error upon Detecting Invalid rows

No updates to the target table. SCD2 operation is aborted.

Scenario 6: NULLs in Natural Key or Cause Historization Fields at Source

Target Table (Before)

Customer ID
(Natural Key)

Address
(Cause Historization Field)

Start Date
(Temporal Field 1)

End Date
(Temporal Field 2)

Current Row
(Temporal Field 3)

1

Chicago

1/1/18

12/1/18

false

1

New York

1/1/19

MAX_DATE

true

Source/Update table Table (Incoming SCD2 Data)

Customer ID (Natural Key)

Address (Cause Historization Values)

Start Date (Temporal Field 1)

1

Orlando

NULL

1/1/

17

20

1

Boston

1/1/21

...

NULL

Miami

1/1/22

Output 6.1: Target

...

Table (After SCD2 operation) when

...

NULLs in Source and Target Tables are Honored

Customer ID (Natural Key)

Address (New Cause Historization Entries)

Start Date
(Temporal Field 1)

End Date
(Temporal Field 2)

Current Row
(Temporal Field 3)

Remarks / Key Take-aways

1

Chicago

Chicago

1/1/18

12/1/18

false

No change.

1

New York

1/1/

18

19

12

1/1/

18

20

false

No change

Not current row anymore. Any end date for New York before 1/1/20 (start date for the next row) results in New York becoming a historical row.

1

New York

NULL

1/1/

19

20

1/1/21

false

Not

A NULL entry made in the ascending order of the start dates. End Date added This row cannot be the current row

anymore. Any end date for New York before

.

1

Boston

1/1/21

(start date for Boston) results in New York becoming a historical row.

1

Boston

MAX_DATE

true

Current row with default End Date.

NULL

Miami

1/1/

21

22

MAX_DATE

true

Current row with default End Date

This entry, though included in the target table, does not correspond to the same Natural Key as the other entries, in this example.

Output

...

6.2: Target

...

Table (After SCD2 operation) when

...

NULLs in Source and Target Tables are Ignored

Customer ID (Natural Key)

Address (New Cause Historization Entries)

Start Date
(Temporal Field 1)

End Date
(Temporal Field 2)

Current Row
(Temporal Field 3)

Invalid
(Temporal Field 4)

Remarks / Key Take-aways

1

Chicago

1/1/18

12/1/18

false

No change.

1

New York

1/1/19

1/1/21

false

Not current row anymore. Any end date for New York before 1/1/21 (start date for Boston) results in New York becoming a historical row.

1

Orlando

1/1/17

MIN_DATE

false

true

Invalid row logged with End Date defaulted to MIN_DATE

Any entry (Start Date 1/1/20, in this case) containing a NULL is ignored.

1

Boston

1/1/21

MAX_DATE

true

Current row with default End Date.