Diff

Overview

You can use this Snap to compare two sorted streams of documents (Original and New) and return four output streams: Deletions, Insertions, Modified, and Unmodified.

Ensure that the first input is always New and the second input is Original so that the Snap reports the output (insertions, deletions, and modifications) relative to the original document.

This Snap does not handle arrays. To compare array data, first flatten the array into individual documents using the JSON Splitter Snap, and then sort the documents before sending them to the Diff Snap.
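The flatten-then-sort preparation can be pictured with a short Python sketch. It is only an illustration of the idea, not SnapLogic code: the split_array helper, the sample document, and the field names are hypothetical, and the sketch does not model the actual options of the JSON Splitter or Sort Snaps.

```python
from operator import itemgetter


def split_array(document, array_key):
    """Yield one document per element of document[array_key].

    Hypothetical helper: it stands in for the flattening step only.
    """
    for element in document[array_key]:
        yield dict(element)


# Hypothetical input: a single document that holds an array of people.
source = {
    "people": [
        {"last": "Young", "first": "Ana"},
        {"last": "Adams", "first": "Bo"},
    ],
}

# Flatten the array into documents, then sort them on one field so the
# stream is ready to be compared against another sorted stream.
flat_sorted = sorted(split_array(source, "people"), key=itemgetter("last"))
print([doc["last"] for doc in flat_sorted])
```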

Snap Type

Diff Snap is a TRANSFORM-type Snap that compares two sorted streams of documents.

Prerequisites

None.

Support for Ultra Pipelines

Limitations

When using the new UI form to create new or edit existing Pipelines with the Diff Snap, Output view name values go missing in Snap Settings after switching to the Views or Info tab. The workaround is to switch back to the old UI form (disable New UI Form in User Settings) and then create the new Pipelines or edit existing Pipelines.

Known Issues

None.

Snap Views

Type

Format

Number of Views

Examples of Upstream and Downstream Snaps

Description

Input

Format: Document

Number of Views:

  • Min: 2

  • Max: 2

Examples of Upstream and Downstream Snaps:

  • Sort

  • CSV Generator

  • Union

  • Exit

  • Router

Description: This Snap has exactly two document input views, New and Original.

Ensure that the first input is always New and the second input is Original so that the Snap reports the output (insertions, deletions, and modifications) relative to the original document.

Output

Format: Document

Number of Views:

  • Min: 4

  • Max: 4

Examples of Upstream and Downstream Snaps:

  • Mapper

  • Filter

  • Union

Description: This Snap has four document output views:

  • Deletions - Contains documents that exist in the Original view but not the New view.

  • Insertions - Contains documents that exist in the New view but not the Original view.

  • Modified - Contains documents that exist in both views but differ in at least one property.

  • Unmodified - Contains documents that are the same in both input views.
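The four output views listed above can be illustrated with a minimal Python sketch of a merge-style comparison over two sorted lists. This is not the Snap's implementation. It assumes both inputs are sorted ascending on a single key, that key values are unique within each input, and that the Modified view carries the new version of the document; the function name diff_sorted and the sample data are hypothetical.

```python
def diff_sorted(original, new, key):
    """Compare two lists of dicts that are both sorted ascending by `key`.

    Returns a dict of four lists named after the Snap's output views.
    """
    out = {"deletions": [], "insertions": [], "modified": [], "unmodified": []}
    i = j = 0
    while i < len(original) and j < len(new):
        old_doc, new_doc = original[i], new[j]
        if old_doc[key] < new_doc[key]:
            # Key exists only in Original.
            out["deletions"].append(old_doc)
            i += 1
        elif old_doc[key] > new_doc[key]:
            # Key exists only in New.
            out["insertions"].append(new_doc)
            j += 1
        else:
            # Same key in both inputs: compare the full documents.
            target = "unmodified" if old_doc == new_doc else "modified"
            out[target].append(new_doc)  # assumption: emit the new version
            i += 1
            j += 1
    # Whatever remains on either side has no counterpart.
    out["deletions"].extend(original[i:])
    out["insertions"].extend(new[j:])
    return out


# Hypothetical sample data, already sorted by "last".
original = [
    {"last": "Adams", "city": "Austin"},
    {"last": "Baker", "city": "Boston"},
    {"last": "Clark", "city": "Chicago"},
]
new = [
    {"last": "Adams", "city": "Austin"},
    {"last": "Clark", "city": "Cleveland"},
    {"last": "Diaz", "city": "Denver"},
]

result = diff_sorted(original, new, "last")
for view, docs in result.items():
    print(view, docs)
```

Because the comparison advances through both lists in step, it only works when both inputs share the same sort key and direction, which is why the Snap requires sorted input.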

Error

Error handling is a generic way to handle errors without losing data or failing the Snap execution. You can handle the errors that the Snap might encounter while running the Pipeline by choosing one of the following options from the When errors occur list under the Views tab:

  • Stop Pipeline Execution: Stops the current pipeline execution if the Snap encounters an error.

  • Discard Error Data and Continue: Ignores the error, discards that record, and continues with the remaining records.

  • Route Error Data to Error View: Routes the error data to an error view without stopping the Snap execution.

Learn more about Error handling in Pipelines.
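As a rough illustration of how the three error-handling options differ, the following Python sketch applies a transformation to a list of records under each policy. The function run_with_error_policy, the policy names "stop", "discard", and "route", and the sample transformation are all hypothetical; they are not SnapLogic APIs.

```python
def run_with_error_policy(records, transform, policy):
    """Apply `transform` to each record under one of three error policies.

    Returns a tuple (outputs, errors).
    """
    if policy not in ("stop", "discard", "route"):
        raise ValueError(f"unknown policy: {policy}")
    outputs, errors = [], []
    for record in records:
        try:
            outputs.append(transform(record))
        except Exception as exc:
            if policy == "stop":
                # Stop Pipeline Execution: let the error end the run.
                raise
            if policy == "route":
                # Route Error Data to Error View: keep the failing record.
                errors.append({"record": record, "error": str(exc)})
            # "discard": drop the failing record and continue.
    return outputs, errors


def divide_ten_by(value):
    """Hypothetical transformation that fails for zero."""
    return 10 // value


records = [1, 0, 5]
discarded = run_with_error_policy(records, divide_ten_by, "discard")
routed = run_with_error_policy(records, divide_ten_by, "route")
print(discarded)
print(routed)
```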

Snap Settings

  • Asterisk (*): Indicates a mandatory field.

  • Suggestion icon: Indicates a list that is dynamically populated based on the configuration.

  • Expression icon: Indicates the value is an expression (if enabled) or a static value (if disabled). Learn more about Using Expressions in SnapLogic.

  • Add icon: Indicates that you can add fields in the fieldset.

  • Remove icon: Indicates that you can remove fields from the fieldset.

Field Name

Field Type

Description

Label*

Default Value: Diff
Example: Diff_new

String

Specify the name for the Snap. You can modify this to be more specific, especially if you have more than one of the same Snap in your Pipeline.

 

Sort paths*

Default value: None
Example: $person.firstname

String

Specify the list of paths to sort on. For example, to sort a list of person objects by the firstname field, use $person.firstname.

Sort order*

Default value: Ascending
Example: Descending

 

Dropdown list

Specify the sort order of the incoming data. The available options are Ascending and Descending.

Output view mapping*

String

Maps each output view name to its output view type. Use this setting if you removed and then re-added output views.

Snap Execution

Default Value: Validate & Execute
Example: Disabled

Dropdown list

Select one of the following three modes in which the Snap executes:

  • Validate & Execute: Performs limited execution of the Snap, and generates a data preview during Pipeline validation. Subsequently, performs full execution of the Snap (unlimited records) during Pipeline runtime.

  • Execute only: Performs full execution of the Snap during Pipeline execution without generating preview data.

  • Disabled: Disables the Snap and all Snaps that are downstream from it.
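The Sort paths and Sort order settings described above can be pictured with a short Python sketch. It assumes a simplified path syntax (a leading $ followed by dot-separated field names) and does not cover the full expression syntax that SnapLogic supports. The helpers resolve_path and sort_documents and the sample documents are hypothetical.

```python
def resolve_path(document, path):
    """Look up a simplified dotted path such as '$person.firstname'."""
    value = document
    for part in path.lstrip("$").split("."):
        value = value[part]
    return value


def sort_documents(documents, path, descending=False):
    """Sort documents by the value found at `path`."""
    return sorted(
        documents,
        key=lambda doc: resolve_path(doc, path),
        reverse=descending,
    )


# Hypothetical sample documents.
people = [
    {"person": {"firstname": "Mia"}},
    {"person": {"firstname": "Al"}},
    {"person": {"firstname": "Zoe"}},
]

ascending = sort_documents(people, "$person.firstname")
descending = sort_documents(people, "$person.firstname", descending=True)
print([resolve_path(doc, "$person.firstname") for doc in ascending])
print([resolve_path(doc, "$person.firstname") for doc in descending])
```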

 

Examples

A Diff Made between Two Files Shows What Data Goes to Which Output View

This example compares two files and shows which data is routed to which output view.

 

The original file contains only a few records. In this example, the records are created in a CSV Generator Snap, but they could also be read with a File Reader Snap followed by a CSV Parser Snap.

By comparison, the new file has more than 1,000 records, which are also created in a CSV Generator Snap.

Because the Diff Snap can compare only sorted streams, both files must be sorted by the same column (such as $Last) and in the same sort order. Add a Sort Snap after each file input (the CSV Generator Snaps in this example).
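The requirement that both inputs share one sort column and one sort order can be checked with a small Python sketch. The is_sorted helper and the sample rows are hypothetical; the field name Last mirrors the $Last column mentioned above.

```python
def is_sorted(documents, key, descending=False):
    """Return True if `documents` is ordered by `key` in the given direction."""
    values = [doc[key] for doc in documents]
    pairs = list(zip(values, values[1:]))
    if descending:
        return all(a >= b for a, b in pairs)
    return all(a <= b for a, b in pairs)


# Hypothetical rows as they might arrive before the Sort step.
original_rows = [{"Last": "Baker"}, {"Last": "Adams"}]
new_rows = [{"Last": "Clark"}, {"Last": "Adams"}, {"Last": "Baker"}]

# Sort both inputs by the same field and in the same direction.
sorted_original = sorted(original_rows, key=lambda doc: doc["Last"])
sorted_new = sorted(new_rows, key=lambda doc: doc["Last"])

print(is_sorted(original_rows, "Last"))    # unsorted input
print(is_sorted(sorted_original, "Last"))  # after sorting
print(is_sorted(sorted_new, "Last"))       # after sorting
```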

Next, configure the Sort paths and Sort order settings of the Diff Snap.

This routes deleted lines to the Deletions view, new lines to the Insertions view, modified lines to the Modified view, and unchanged lines to the Unmodified view.

Reading an Original CSV File and a Modified Version

This example pipeline reads two files: an original CSV file and a modified version of it. The modified version includes changed rows, deleted rows, and added rows. The pipeline creates four output files based on the diff conditions.

 

Downloads

  • Diff Snap Example_2.slp (File), modified Apr 06, 2017 by Diane Miller

  • Diff_sample_files.zip (ZIP Archive), modified Apr 06, 2017 by Diane Miller

Snap Pack History
