Unique
Snap type: | Transform
---|---
Description: | This Snap eliminates duplicate documents in a document stream, such as duplicate rows in a CSV file. To understand how the Unique Snap works, keep in mind that SnapLogic streams data, so each Snap processes one document at a time. The Unique Snap does not sort the documents before comparing them, so duplicates that are not adjacent in the stream are not identified; in that case the output is exactly the same as the input (see the sketch after this table).
Prerequisites: | [None]
Support and limitations: | Does not work in Ultra Pipelines.
Account: | Accounts are not used with this Snap.
Views: |
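The following is a minimal Python sketch of the behavior described above, assuming the Snap simply compares each incoming document with the one emitted immediately before it; the function name and sample documents are illustrative, not part of the product.

```python
from typing import Any, Dict, Iterable, Iterator

def unique_stream(docs: Iterable[Dict[str, Any]]) -> Iterator[Dict[str, Any]]:
    """Yield documents, skipping any document equal to the one just before it."""
    previous: Any = object()  # sentinel that never equals a real document
    for doc in docs:
        if doc != previous:   # only adjacent duplicates are detected
            yield doc
        previous = doc

# Non-adjacent duplicates survive, exactly as the description warns:
stream = [{"id": 1}, {"id": 2}, {"id": 1}, {"id": 1}]
print(list(unique_stream(stream)))  # [{'id': 1}, {'id': 2}, {'id': 1}]
```

If the documents are sorted before they reach the Snap, duplicates become adjacent and can all be removed.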
Settings
Label | Required. The name for the Snap. You can modify this to be more specific, especially if you have more than one of the same Snap in your pipeline.
Minimum memory (MB) | If the available memory is less than this property value while processing input documents, the Snap stops fetching the next input document until more memory is available. This feature is disabled if this property value is 0. Default value: 500
Minimum free disk space (MB) | If the free disk space is less than this property value, the Snap stops processing input documents until more free disk space is available. This feature is disabled if this property value is 0. Default value: 500
Out-of-resource timeout (minutes) | If the Snap pauses longer than this property value while waiting for more memory or disk space to become available, it throws an exception to prevent the system from running out of memory or disk space. Default value: 30 (see the sketch after this table)
Snap Execution | Select one of the three modes in which the Snap executes. Available options are: Validate & Execute, Execute only, and Disabled.
Default Value: Execute only
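The three resource settings above interact as a simple pause-and-timeout loop before each input document is fetched. The sketch below illustrates that behavior under stated assumptions: the polling approach, the use of the third-party psutil package, and the temporary-directory path are illustrative and not SnapLogic internals.

```python
import shutil
import time

import psutil  # third-party package, used here only to read available memory

MIN_MEMORY_MB = 500      # Minimum memory (MB); 0 disables the check
MIN_FREE_DISK_MB = 500   # Minimum free disk space (MB); 0 disables the check
TIMEOUT_MINUTES = 30     # Out-of-resource timeout (minutes)

def wait_for_resources(temp_dir: str = "/tmp") -> None:
    """Block until enough memory and disk are free, or raise after the timeout."""
    deadline = time.monotonic() + TIMEOUT_MINUTES * 60
    while True:
        mem_ok = (MIN_MEMORY_MB == 0
                  or psutil.virtual_memory().available >= MIN_MEMORY_MB * 1024 * 1024)
        disk_ok = (MIN_FREE_DISK_MB == 0
                   or shutil.disk_usage(temp_dir).free >= MIN_FREE_DISK_MB * 1024 * 1024)
        if mem_ok and disk_ok:
            return  # safe to fetch the next input document
        if time.monotonic() >= deadline:
            raise RuntimeError("Out-of-resource timeout while waiting for memory/disk")
        time.sleep(1)  # pause processing until resources free up
```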
Temporary Files
During execution, data processing on Snaplex nodes occurs principally in memory, as streaming, unencrypted data. When a larger dataset exceeds the available memory, the Snap writes Pipeline data, unencrypted, to local storage to optimize performance. These temporary files are deleted when the Snap/Pipeline execution completes. You can configure the location of the temporary data in the Global properties table of the Snaplex's node properties, which can also help avoid Pipeline errors caused by insufficient disk space. For more information, see Temporary Folder in Configuration Options.

Example
A simple pipeline for this Snap would include:
File Reader + CSV Parser + Unique + CSV Formatter + File Writer
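As a rough stand-alone analogue of that pipeline, the sketch below reads a CSV file, drops adjacent duplicate rows (matching the adjacency behavior described earlier), and writes the result back out; the file names and helper structure are assumptions for illustration only.

```python
import csv

def run_pipeline(src: str = "input.csv", dst: str = "output.csv") -> None:
    with open(src, newline="") as fin, open(dst, "w", newline="") as fout:
        reader = csv.DictReader(fin)                        # File Reader + CSV Parser
        writer = csv.DictWriter(fout, fieldnames=reader.fieldnames)
        writer.writeheader()                                # CSV Formatter + File Writer
        previous = None
        for row in reader:
            if row != previous:                             # Unique: drop adjacent duplicates
                writer.writerow(row)
            previous = row

if __name__ == "__main__":
    run_pipeline()
```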