Use this Snap to remove duplicate records from input documents. When you use multiple matching criteria to deduplicate your data, each record is evaluated against each criterion separately, and the individual results are then aggregated to give the final result.
None.
None.
The Deduplicate Snap fails to deduplicate data when a field in the input document contains an empty string, only whitespace, or a null value.
Does not support Ultra Pipelines.
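Because of the limitation on empty, whitespace-only, and null values noted above, it can help to identify affected records before they reach the Snap. The following Python sketch only illustrates that check outside SnapLogic. The function names `is_blank` and `split_by_blank_fields` and the sample records are hypothetical and are not part of the Snap.

```python
from typing import Any, Dict, Iterable, List, Tuple


def is_blank(value: Any) -> bool:
    """Return True for None, an empty string, or a whitespace-only string."""
    return value is None or (isinstance(value, str) and value.strip() == "")


def split_by_blank_fields(
    docs: Iterable[Dict[str, Any]], fields: List[str]
) -> Tuple[List[Dict[str, Any]], List[Dict[str, Any]]]:
    """Split documents into (clean, flagged).

    A document is flagged when any of the matching fields is missing,
    null, empty, or contains only whitespace.
    """
    clean, flagged = [], []
    for doc in docs:
        if any(is_blank(doc.get(field)) for field in fields):
            flagged.append(doc)
        else:
            clean.append(doc)
    return clean, flagged


# Hypothetical sample records, for illustration only.
docs = [
    {"name": "Little Stars", "zip": "60601"},
    {"name": "   ", "zip": "60602"},   # whitespace-only name
    {"name": None, "zip": "60603"},    # null name
    {"name": "Bright Kids", "zip": ""},  # empty zip
]
clean, flagged = split_by_blank_fields(docs, ["name", "zip"])
```

Only the documents in `clean` would then be passed on for deduplication. How the flagged documents are handled (dropped, repaired, or routed elsewhere) depends on the pipeline.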
Type | Format | Number of Views | Examples of Upstream and Downstream Snaps | Description |
---|---|---|---|---|
Input | Document | | | A document with data containing duplicate records. |
Output | Document | | | |
Parameter Name | Data Type | Description | Default Value | Example |
---|---|---|---|---|
Label | String | | N/A | Deduplicate Office Names |
Threshold | Decimal | Required. The minimum confidence required for documents to be considered duplicates under the matching criteria. Minimum Value: 0. Maximum Value: 1. | 0.8 | 0.95 |
Confidence | Checkbox | Select this check box to include each match's confidence level in the output. | Deselected | N/A |
Group ID | Checkbox | Select this check box to include the group ID for each record in the output. | Deselected | N/A |
Matching Criteria | Fieldset | Enables you to specify the settings used to match input documents against each other. | N/A | N/A |
Field | JSONPath | The field in the input dataset that you want to use for matching and identifying duplicates. | N/A | $name |
Cleaner | String | | None | Text |
Comparator | String | | Levenshtein | Numeric |
Low | Decimal | A decimal value representing the probability that the input documents match when the specified fields are completely unlike. | N/A | 0.1 |
High | Decimal | A decimal value representing the probability that the input documents match when the specified fields are a complete match. | N/A | 0.8 |
Snap Execution | String | | Validate & Execute | N/A |
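To make the interaction of Comparator, Low, High, and Threshold more concrete, the following Python sketch scores a pair of records on two fields and aggregates the per-field results into one confidence value. It is a sketch under stated assumptions, not the Snap's actual implementation: the linear mapping from similarity to probability and the Bayesian-style combination are assumptions chosen for illustration, and all function names and sample records are hypothetical.

```python
def levenshtein(a: str, b: str) -> int:
    """Classic edit distance: insertions, deletions, and substitutions."""
    if len(a) < len(b):
        a, b = b, a
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            current.append(min(previous[j] + 1,          # deletion
                               current[j - 1] + 1,       # insertion
                               previous[j - 1] + cost))  # substitution
        previous = current
    return previous[-1]


def similarity(a: str, b: str) -> float:
    """Normalize edit distance to 0..1, where 1 means identical."""
    longest = max(len(a), len(b))
    return 1.0 if longest == 0 else 1.0 - levenshtein(a, b) / longest


def field_probability(sim: float, low: float, high: float) -> float:
    """ASSUMPTION: interpolate linearly between Low (sim=0) and High (sim=1)."""
    return low + sim * (high - low)


def combine(p1: float, p2: float) -> float:
    """ASSUMPTION: Bayesian-style combination of two match probabilities."""
    numerator = p1 * p2
    denominator = numerator + (1.0 - p1) * (1.0 - p2)
    return 0.5 if denominator == 0 else numerator / denominator


def match_confidence(doc_a: dict, doc_b: dict, criteria: list) -> float:
    """Evaluate each criterion separately, then aggregate the results."""
    confidence = 0.5  # neutral starting point: combine(0.5, p) == p
    for field, low, high in criteria:
        sim = similarity(str(doc_a[field]), str(doc_b[field]))
        confidence = combine(confidence, field_probability(sim, low, high))
    return confidence


def is_duplicate(doc_a: dict, doc_b: dict, criteria: list, threshold: float) -> bool:
    """Treat a pair as duplicates when confidence reaches the threshold."""
    return match_confidence(doc_a, doc_b, criteria) >= threshold


# Hypothetical records and settings (field, Low, High), for illustration only.
criteria = [("name", 0.1, 0.8), ("zip", 0.1, 0.8)]
a = {"name": "Little Stars Center", "zip": "60601"}
b = {"name": "Little Stars Centre", "zip": "60601"}
c = {"name": "Bright Kids Academy", "zip": "60622"}

print(is_duplicate(a, b, criteria, threshold=0.8))
print(is_duplicate(a, c, criteria, threshold=0.8))
```

In this sketch, a near-identical name combined with an identical ZIP code pushes the aggregated confidence above what either field contributes alone, while two weak field matches pull it down. Raising the threshold makes matching stricter. Lowering High for a field reduces how much a perfect match on that field can contribute.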
In this example, you deduplicate the data in a CSV file containing a list of childhood centers in Chicago.