Use this Snap to remove duplicate records from input documents. When you specify multiple matching criteria, the Snap evaluates the input documents against each criterion separately and then aggregates the results to determine whether the documents are duplicates.
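This page does not document how the per-criterion results are aggregated. Record-linkage tools often combine independent per-criterion probabilities with Bayes' rule, starting from a neutral prior of 0.5, and then compare the combined confidence with a threshold. The following is a minimal sketch of that technique only. It assumes Bayesian combination, and the function names are hypothetical. It is not the Snap's confirmed internal formula.

```python
from functools import reduce
from typing import Sequence


def bayes_combine(p1: float, p2: float) -> float:
    """Combine two independent match probabilities with Bayes' rule (assumed technique)."""
    numerator = p1 * p2
    denominator = numerator + (1.0 - p1) * (1.0 - p2)
    if denominator == 0.0:
        # One criterion says "certainly a match" and another says
        # "certainly not a match": treat the evidence as neutral.
        return 0.5
    return numerator / denominator


def combine_probabilities(probabilities: Sequence[float]) -> float:
    """Fold per-criterion probabilities together, starting from a neutral 0.5 prior."""
    return reduce(bayes_combine, probabilities, 0.5)


def is_duplicate(probabilities: Sequence[float], threshold: float = 0.8) -> bool:
    """Return True when the combined confidence reaches the threshold."""
    return combine_probabilities(probabilities) >= threshold


# Two criteria that both lean toward a match reinforce each other (about 0.973).
print(is_duplicate([0.9, 0.8]))  # True
# A strong criterion and a weak one cancel out (about 0.5).
print(is_duplicate([0.9, 0.1]))  # False
```

Under this assumption, each additional criterion that agrees pushes the combined confidence toward 1. A criterion that strongly disagrees can pull a pair below the threshold even if another criterion matches well.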
Prerequisites
None.
Limitations
None.
Snap-Level Error Messages
None.
Snap Input and Output
Input/Output | Type of View | Number of Views | Compatible Upstream and Downstream Snaps | Description |
---|---|---|---|---|
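Input | Document | | | Documents containing duplicate records. |
Output | Document | | | The deduplicated documents. Each record includes its confidence level and group ID if the Confidence and Group ID check boxes are selected. |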
Parameter Name | Data Type | Description | Default Value | Example |
---|---|---|---|---|
Label | String | Required. Specify a unique name for the Snap. | N/A | Deduplicate Office Names |
Threshold | Decimal | Required. The minimum confidence at which documents are considered duplicates based on the matching criteria. Minimum value: 0. Maximum value: 1. | 0.8 | 0.95 |
Confidence | Check box | Select this check box to include the confidence level of each match in the output. | Deselected | N/A |
Group ID | Check box | Select this check box to include the group ID of each record in the output. | Deselected | N/A |
Matching Criteria | Fieldset | Specify the criteria used to match input documents and identify duplicates. Each row defines one criterion. | N/A | N/A |
Field | JSONPath | The field in the input documents that you want to use to match and identify duplicates. | N/A | $name |
Cleaner | String | The cleaner to apply to the field values before they are compared. | None | Text |
Comparator | String | The comparator used to measure how similar the field values are. | Levenshtein | Numeric |
Low | Decimal | The probability that the input documents are duplicates when the values of the specified field are completely different. | N/A | 0.1 |
High | Decimal | The probability that the input documents are duplicates when the values of the specified field match completely. | N/A | 0.8 |
Snap Execution | String | Select the mode in which the Snap executes: Validate & Execute, Execute only, or Disabled. | Validate & Execute | N/A |
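The Cleaner, Comparator, Low, and High settings together determine how much a single criterion contributes to the match decision. This page does not specify the exact scoring. The sketch below assumes three things: the "Text" cleaner lowercases values and normalizes whitespace; the Levenshtein comparator returns a similarity in the range 0 to 1, computed as edit distance divided by the length of the longer string; and the probability is interpolated linearly between Low (no similarity) and High (identical values). All function names are hypothetical.

```python
def text_clean(value: str) -> str:
    """Hypothetical "Text" cleaner: lowercase and collapse runs of whitespace."""
    return " ".join(value.lower().split())


def levenshtein(a: str, b: str) -> int:
    """Classic dynamic-programming edit distance, keeping one row at a time."""
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            current.append(min(
                previous[j] + 1,               # deletion
                current[j - 1] + 1,            # insertion
                previous[j - 1] + (ca != cb),  # substitution (0 if chars match)
            ))
        previous = current
    return previous[-1]


def levenshtein_similarity(a: str, b: str) -> float:
    """Map edit distance to a similarity in [0, 1]; two empty strings are identical."""
    longest = max(len(a), len(b))
    if longest == 0:
        return 1.0
    return 1.0 - levenshtein(a, b) / longest


def criterion_probability(v1: str, v2: str, low: float, high: float) -> float:
    """Assumed linear interpolation between Low and High by similarity."""
    similarity = levenshtein_similarity(text_clean(v1), text_clean(v2))
    return low + (high - low) * similarity


# Identical after cleaning, so the result equals High (about 0.8).
print(criterion_probability("ABC  Center", "abc center", 0.1, 0.8))
# Nothing in common, so the result equals Low (about 0.1).
print(criterion_probability("abc", "xyz", 0.1, 0.8))
```

Under these assumptions, the Low and High values bound how far a single criterion can move the overall confidence. With a High of 0.8, even a perfect match on one field cannot by itself make two records certain duplicates.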
In this example, you deduplicate the data in a CSV file containing a list of childhood centers in Chicago.
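The following is a rough sketch of the kind of result this example produces. It assigns a group ID to each record and keeps one record per group. It makes several assumptions. Python's `difflib.SequenceMatcher.ratio()` stands in for the Snap's comparator. Matches are treated as transitive and grouped with a union-find structure. The first record in each group is the one kept. The `group_id` key and the sample rows are hypothetical and are not taken from the Chicago dataset or the Snap's actual output.

```python
from difflib import SequenceMatcher


def find(parent: list[int], i: int) -> int:
    """Return the root of i's group, compressing the path as we go."""
    while parent[i] != i:
        parent[i] = parent[parent[i]]
        i = parent[i]
    return i


def assign_group_ids(records: list[dict], field: str, threshold: float) -> list[dict]:
    """Copy each record and add a hypothetical 'group_id' key.

    Records whose pairwise similarity reaches the threshold are merged into
    the same group, so matches are transitive (A~B and B~C groups A, B, C).
    """
    parent = list(range(len(records)))
    for i in range(len(records)):
        for j in range(i + 1, len(records)):
            a = records[i][field].lower()
            b = records[j][field].lower()
            # SequenceMatcher.ratio() is a stand-in for the Snap's comparator.
            if SequenceMatcher(None, a, b).ratio() >= threshold:
                parent[find(parent, i)] = find(parent, j)
    ids: dict[int, int] = {}
    grouped = []
    for i, record in enumerate(records):
        # Number groups 0, 1, 2, ... in order of first appearance.
        group_id = ids.setdefault(find(parent, i), len(ids))
        grouped.append({**record, "group_id": group_id})
    return grouped


def deduplicate(records: list[dict], field: str, threshold: float = 0.8) -> list[dict]:
    """Keep only the first record of each group."""
    seen: set[int] = set()
    unique = []
    for record in assign_group_ids(records, field, threshold):
        if record["group_id"] not in seen:
            seen.add(record["group_id"])
            unique.append(record)
    return unique


# Hypothetical rows, not taken from the Chicago dataset.
rows = [
    {"name": "Sunshine Learning Center"},
    {"name": "Sunshine Learning Centre"},
    {"name": "Lakeview Kids Academy"},
]
print([r["group_id"] for r in assign_group_ids(rows, "name", 0.8)])  # [0, 0, 1]
print(len(deduplicate(rows, "name")))  # 2
```

Because union-find makes matches transitive, a chain of near-matches can pull loosely related records into one group. A higher threshold reduces that risk.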