...
Deduplication eliminates duplicate records and reduces storage capacity requirements, thereby increasing process efficiency.
The deduplication tool costs $28k-$33k per year. You can avoid this cost by using the Marketo Snap Pack.
This use case demonstrates how to deduplicate records using the Marketo Snaps.
...
Using the Marketo Snap Pack, you can automate the deduplication of records by merging records that share the same email ID, based on their lead source (Person Source) data. The example pipelines below demonstrate how to do this. You can deduplicate records using a Smart List, a Static List, or createdAt and updatedAt filters.
Method | Pipeline |
---|---|
Smart List | Download this pipeline. |
Static List | Download this pipeline. |
Fetching records using the createdAt and updatedAt fields (three pipelines) | Download this pipeline. Download this pipeline. Download this pipeline. |
Understanding the Solution
...
Deduplicating records with a Smart List is a straightforward approach: create the Smart List in your Marketo account and import it into the pipeline. Next, configure the Smart List filters to fetch all the required records, and provide the Smart List name or ID in the pipeline.
...
Step 1: Create a Smart List in your Marketo account. Navigate to the Database section, right-click any folder, and click New Smart List.
...
Step 5: Configure the name of the Smart List in the pipeline parameters.
...
Step 6: Configure the Marketo Bulk Extract Snap with the Leads entity to fetch the records.
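For reference, Marketo's Bulk Lead Extract REST API creates an export job from a JSON body with `fields`, `format`, and `filter` keys, and the filter can select leads from a Smart List or a Static List by name. The minimal Python sketch below only builds such a body. It assumes, without confirmation from this document, that the Snap submits an equivalent job; the function name, the list name, and the field list (including `leadSource` as the API name of the Person Source field) are hypothetical.

```python
import json

# Hypothetical field list. "leadSource" is assumed to be the REST API name of
# the Person Source field; check the field names in your Marketo instance.
DEFAULT_FIELDS = ("id", "email", "leadSource", "createdAt", "updatedAt")

# Bulk Lead Extract filter keys for selecting leads by list name.
LIST_FILTER_KEYS = {"smart": "smartListName", "static": "staticListName"}


def build_list_export(list_name, list_type="smart", fields=DEFAULT_FIELDS):
    """Build a Bulk Lead Extract job body that selects leads from a named list."""
    if list_type not in LIST_FILTER_KEYS:
        raise ValueError("list_type must be 'smart' or 'static'")
    return {
        "fields": list(fields),
        "format": "CSV",
        "filter": {LIST_FILTER_KEYS[list_type]: list_name},
    }


if __name__ == "__main__":
    # "Duplicate Leads" is a hypothetical Smart List name.
    print(json.dumps(build_list_export("Duplicate Leads"), indent=2))
```

Passing `list_type="static"` produces the equivalent body for the Static List method.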
...
Step 7: Execute the pipeline. On execution:
The pipeline identifies the records that have Person Source field data as parent records and the duplicate records without Person Source data as child records. The child records are then merged into the parent records.
If none of the records contains Person Source data, the records can be merged in either direction.
If Person Source data is present in all the records, the record with the latest data is merged into the older record.
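The merge rules above can be sketched in Python for a single group of duplicates that share an email ID. This is an illustration, not the pipeline's actual logic. It assumes each record is a dict with hypothetical `leadSource` (Person Source) and `createdAt` keys. Where the rules leave the choice open (no record has Person Source data, or several records do), it simply picks the oldest candidate as the parent.

```python
from datetime import datetime


def _created_at(record):
    # ISO 8601 timestamps such as "2023-01-05T10:00:00Z"; the "Z" is replaced
    # so datetime.fromisoformat also accepts it before Python 3.11.
    return datetime.fromisoformat(record["createdAt"].replace("Z", "+00:00"))


def pick_merge_roles(duplicates):
    """Split one group of duplicate leads into (parent, children)."""
    # Records that carry Person Source data are preferred as the parent.
    with_source = [r for r in duplicates if r.get("leadSource")]
    # If no record has Person Source data, any record may be the parent.
    candidates = with_source or duplicates
    # The oldest candidate is kept, so newer records merge into the older one.
    parent = min(candidates, key=_created_at)
    children = [r for r in duplicates if r is not parent]
    return parent, children


if __name__ == "__main__":
    group = [
        {"id": 3, "leadSource": "", "createdAt": "2023-03-01T00:00:00Z"},
        {"id": 1, "leadSource": "Webinar", "createdAt": "2023-02-01T00:00:00Z"},
        {"id": 2, "leadSource": "Trade Show", "createdAt": "2023-01-01T00:00:00Z"},
    ]
    parent, children = pick_merge_roles(group)
    print(parent["id"], [c["id"] for c in children])
```

For the sample group, record 2 (the oldest record with Person Source data) becomes the parent, and records 3 and 1 are the children merged into it.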
...
Step 4: Configure the pipeline parameters with the Static List name.
...
Step 5: Configure the Marketo Bulk Extract Snap with the Leads entity to fetch the records.
...
Step 6: Execute the pipeline. On execution:
The pipeline identifies the records that have Person Source data as parent records and the duplicate records without Person Source data as child records. The child records are then merged into the parent records.
If none of the records has Person Source data, the records can be merged in either direction.
If Person Source data is present in all the records, the record with the latest data is merged into the older record.
...
The main (first) pipeline takes the start and end dates of the period to deduplicate. It then executes a child pipeline (the second pipeline) to divide the period into batches of 31 days, because the Marketo Bulk Extract Snap can fetch records for a range of at most 31 days.
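To make the 31-day batching concrete, here is a minimal Python sketch that splits a date range into consecutive windows of at most 31 days. The function name and the convention that each window starts where the previous one ends are assumptions for illustration; the actual pipelines perform this step with Snaps and expressions.

```python
from datetime import datetime, timedelta, timezone

MAX_DAYS = 31  # Bulk Extract date-range limit described above


def batch_windows(start, end, max_days=MAX_DAYS):
    """Split the range from start to end into windows of at most max_days days."""
    windows = []
    step = timedelta(days=max_days)
    cursor = start
    while cursor < end:
        window_end = min(cursor + step, end)
        windows.append((cursor, window_end))
        cursor = window_end  # the next window starts where this one ends
    return windows


if __name__ == "__main__":
    utc = timezone.utc
    for window_start, window_end in batch_windows(
        datetime(2023, 1, 1, tzinfo=utc), datetime(2023, 3, 15, tzinfo=utc)
    ):
        print(window_start.date(), "->", window_end.date())
```

A 73-day range from 2023-01-01 to 2023-03-15 yields three windows, ending on 2023-02-01, 2023-03-04, and 2023-03-15.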
...
Step 1: Configure the start and end dates of the period to deduplicate. The difference between the two dates is calculated in milliseconds.
...
Step 2: Configure the Mapper Snap to convert the difference from milliseconds to days.
...
Step 3: Calculate the number of runs needed to fetch the records in batches of 31 days.
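Steps 1 to 3 amount to a small calculation. The sketch below reproduces it in Python, assuming the dates are given as epoch milliseconds. The function name is hypothetical, and rounding the run count up with `math.ceil` is an assumption so that a partial final batch still gets its own run.

```python
import math

MS_PER_DAY = 24 * 60 * 60 * 1000  # 86,400,000 milliseconds per day


def number_of_runs(start_ms, end_ms, max_days=31):
    """Return (days, runs) for a date range given in epoch milliseconds."""
    diff_ms = end_ms - start_ms        # Step 1: difference in milliseconds
    days = diff_ms / MS_PER_DAY        # Step 2: milliseconds to days
    runs = math.ceil(days / max_days)  # Step 3: 31-day runs, rounded up
    return days, runs


if __name__ == "__main__":
    # 2023-01-01T00:00:00Z to 2023-03-15T00:00:00Z
    print(number_of_runs(1672531200000, 1678838400000))
```

For 2023-01-01 to 2023-03-15 (1672531200000 to 1678838400000 ms), this gives 73 days and three runs.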
...
Provide the start date and the end date in the pipeline parameters.
...
Second Pipeline
The second pipeline executes another child pipeline (the third pipeline) that fetches the records for the specified date range in batches of 31 days. The second pipeline then identifies the duplicate records by email ID and merges them based on the Person Source field data to form a unique record for each email ID.
If none of the duplicate records contains Person Source data, the records are merged in either direction.
If the first record has Person Source data and the duplicate records do not, the duplicates are merged into the first record.
If all the duplicates have Person Source data, the latest record is merged into the older record.
...
...
Third Pipeline
...
The third pipeline bulk-extracts the Leads entity data for the specified period.
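For reference, Marketo's Bulk Lead Extract REST API accepts a `createdAt` or `updatedAt` filter with `startAt` and `endAt` timestamps. The Python sketch below builds such a job body for one batch window and rejects windows longer than the 31-day limit described above. Whether the Snap sends exactly this request is an assumption, and the function name and field list (including `leadSource` for Person Source) are hypothetical.

```python
from datetime import datetime, timedelta, timezone

# Hypothetical field list; "leadSource" is assumed to be the Person Source field.
DEFAULT_FIELDS = ("id", "email", "leadSource", "createdAt", "updatedAt")


def _iso_utc(dt):
    """Format a timezone-aware datetime as an ISO 8601 UTC string."""
    return dt.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")


def build_date_export(start, end, date_field="createdAt", fields=DEFAULT_FIELDS):
    """Build a Bulk Lead Extract job body for one date window."""
    if date_field not in ("createdAt", "updatedAt"):
        raise ValueError("date_field must be 'createdAt' or 'updatedAt'")
    if end - start > timedelta(days=31):
        raise ValueError("Bulk Extract date ranges are limited to 31 days")
    return {
        "fields": list(fields),
        "format": "CSV",
        "filter": {date_field: {"startAt": _iso_utc(start), "endAt": _iso_utc(end)}},
    }


if __name__ == "__main__":
    utc = timezone.utc
    print(build_date_export(datetime(2023, 1, 1, tzinfo=utc),
                            datetime(2023, 2, 1, tzinfo=utc)))
```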
...
The second pipeline matches records by email ID and identifies all the duplicates.
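Matching by email ID can be sketched as grouping records on a normalized email address. Trimming whitespace and ignoring case are assumptions made here for illustration, since the pipeline's exact matching rule is not described; the function name is hypothetical.

```python
from collections import defaultdict


def find_duplicates(records):
    """Group records by email ID and return only the groups with duplicates."""
    groups = defaultdict(list)
    for record in records:
        # Assumed normalization: trim whitespace and ignore case.
        email = (record.get("email") or "").strip().lower()
        if email:  # records without an email ID cannot be matched
            groups[email].append(record)
    return {email: group for email, group in groups.items() if len(group) > 1}


if __name__ == "__main__":
    leads = [
        {"id": 1, "email": "Ana@example.com"},
        {"id": 2, "email": "ana@example.com "},
        {"id": 3, "email": "bo@example.com"},
        {"id": 4, "email": None},
    ]
    for email, group in find_duplicates(leads).items():
        print(email, [r["id"] for r in group])
```

Here records 1 and 2 form one duplicate group under `ana@example.com`; record 3 has no duplicate and record 4 has no email ID, so neither appears in the result.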
...
The pipeline identifies the records that have Person Source data as parent records and the duplicate records without Person Source data as child records. The child records are then merged into the parent records.
If none of the records has Person Source data, the records can be merged in either direction.
If Person Source data is present in all the records, the record with the latest data is merged into the older record.
...