In this article
...
Term | Definition | ||
---|---|---|---|
Target Table Natural Key | A unique identifier based on which the SCD2 operation inserts a new record with different values for the same key in the target table. | ||
Target Table Cause-Historization Field(s) | One or more fields in the target table which can cause new versions of data in the row. For example, changes in Address or Employee Designation of an individual over a period of time. Each change in the address can cause a new version of the row (SCD Type2 entry) and so does each change in an employee’s designation. | ||
Target Table Temporal Fields | A set of fields in the target table that store information specific to the validity of the cause-historization values in each of the row versions created. They can be a combination of the following fields:
|
Scenario 1: Ideal SCD2 operation
Target table (Before)
Customer ID | Address | Start Date | End Date | Current Row |
1 | Chicago | 1/1/18 | 12/1/18 | false |
1 | New York | 1/1/19 | MAX_DATE | true |
...
Customer ID (Natural Key) | Address (Cause Historization Values) | Start Date (Temporal Field 1) |
1 | Chicago | 1/1/20 |
1 | New York | 1/1/21 |
Output 1: Target table (After SCD2 operation)
Customer ID (Natural Key) | Address (New Cause Historization Entries) | Start Date | End Date | Current Row | Remarks / Key Take-aways |
1 | Chicago | 1/1/18 | 12/1/18 | false | No change. |
1 | New York | 1/1/19 | 1/1/20 | false | Not current row anymore. |
1 | Chicago | 1/1/20 | 1/1/21 | false | New entry but not current row. Start date for the newer row is the End date for this entry. |
1 | New York | 1/1/21 | MAX_DATE | true | Current row with default End Date. |
Info |
---|
The SCD2 Operation operation uses a row-by-row approach. The final current row is arrived at, only after processing all rows for that natural key. |
Scenario 2: Cause Historization Row is Same as Current Row
Target table (Before)
Customer ID | Address | Start Date | End Date | Current Row |
1 | Chicago | 1/1/18 | 12/1/18 | false |
1 | New York | 1/1/19 | MAX_DATE | true |
Source/Update table (Incoming SCD2 Data)
Customer ID (Natural Key) | Address (Cause Historization Values) | Start Date (Temporal Field 1) |
1 | New York | 1/1/20 (or 1/1/17) |
...
1 | Boston | 1/1/21 |
Output 2: Target table (After SCD2 operation)
Customer ID (Natural Key) | Address (New Cause Historization Entries) | Start Date | End Date | Current Row | Remarks / Key Take-aways |
1 | Chicago | 1/1/18 | 12/1/18 | false | No change. |
1 | New York | 1/1/19 | 1/1/ |
21 | false | Not current row anymore. Any end date for New York before 1/1/21 (start date for Boston) results in New York becoming a historical row. | |||
1 | Boston | 1/1/21 | MAX_DATE | true | Current row with default End Date. |
Scenario 3: Target table is Empty and Source Table Contains Duplicate Values for Cause-Historization field
Target table (Before)
Customer ID | Address | Start Date | End Date | Current Row |
(Empty Table) |
Source/Update table (Incoming SCD2 Data)
Customer ID (Natural Key) | Address (Cause Historization Values) | Start Date (Temporal Field 1) |
1 | Chicago | 1/1/15 |
1 | Boston | 1/1/16 |
1 | Chicago | 1/1/18 |
1 | Chicago | 1/1/19 (or 1/1/14) |
1 | Boston | 1/1/20 |
Output 3: Target table (After SCD2 operation)
Customer ID (Natural Key) | Address (New Cause Historization Entries) | Start Date | End Date | Current Row | Remarks / Key Take-aways |
1 | Chicago | 1/1/15 | 1/1/ |
16 | false |
New entry but not current row. Start date for the newer row is the End date for this entry.
1
End date arrived based on the start date of the next entry (in ascending order). | |||||
1 | Boston | 1/1/16 | 1/1/18 | false | End date arrived based on the start date of the next entry (in ascending order). |
1 | Chicago | 1/1/18 | 1/1/20 | false | Any start date for Chicago outside the range 1/1/18 - 1/1/20 will be ignored. |
1 | Boston | 1/1/ |
20 | MAX_DATE | true | Current row with default End Date. |
Info |
---|
The SCD2 operation sorts the source data records in the ascending order of their Start Dates when there is no row numbering (sequence) specified in the source data. |