AutoPrep Example

Use the AutoPrep Snap to prepare data for analysis, reporting, and machine learning without writing expressions, SQL scripts, or Python code. AutoPrep supports common data transformation operations. In contrast with the Mapper Snap, you do not need to understand or write expressions to use these transformations. Try this example to learn how to flatten a data hierarchy, remove fields, specify custom handling of nulls, and mask data.

The main steps in this example include:

Create a Pipeline

First, create a new Pipeline:

  1. Click the Designer tab.

  2. On the top left of the canvas, click to create a new Pipeline.

  3. For Label, enter AutoPrep Example.

  4. Click Save.

Upload Data

Add the upstream data:

  1. Download Example.json and save it locally. The following shows an element from the file, which tracks the addresses of customers that received a service call, how they rated the service, and when they were last seen. The location object contains nested fields:

    { "id" : "FmGF1B-Rpsjq1f5b56qMwg", "is_closed" : false, "rating" : 4.5, "location" : { "last_seen" : "03/05/2022", "address1" : "373 Columbus Ave", "address2" : "", "address3" : "", "city" : "San Francisco", "zip_code" : "94133", "country" : "US", "state" : "CA" }, "distance" : 1465.2460213942109, "review_count" : 488 }
  2. In Designer, from the left side of the palette, select Snaps.

  3. Search for the File Reader Snap and drag it onto the canvas.

  4. Click the Snap to open it.

  5. To the right of the File field, clickto choose a file.

  6. Click Upload File and select the Example.json file.

  7. Click Save and close the File Reader Snap.

  8. Drag a JSON Parser Snap and connect it to the File Reader Snap.

  9. In the Designer toolbar, click to validate.
    The JSON Parser Snap is ready to pass the data as a JSON document:

    AutoPrep example

Use AutoPrep 

Add the AutoPrep Snap:

  1. From the left side of the palette, select Snaps.

  2. Search for the AutoPrep Snap.

  3. Drag the AutoPrep Snap to connect with the right side of the JSON Parser Snap.

  4. Click the AutoPrep Snap.
    A welcome dialog opens.

  5. Click the button to continue.
    AutoPrep opens:

    AutoPrep interface

    The Flatten structure tree shows that the location object is a root node with leaf nodes that you can flatten.

Flatten the Data Structure

Flatten the location object:

  1. In the Flatten Structure tree, select location.

  2. Click Update.

  3. Scroll the Preview Data pane or maximize it to view the fields to the right of the location field:

Note that:

  • The location object is now empty and the location column contains empty braces.

  • The leaf nodes are now at the root level and have location_ prepended to their names.

  • The location_address2 and location_address3 columns have empty and null values. To make the data easier to read, you can replace the null values with a string that is easier for a business user to understand.

Manage Null Values

It might be good to indicate which empty address lines are there intentionally. To do this, change the null handling for the two address fields to insert the string n/a instead of null:

  1. In the left pane, click Handle nulls to open the Manage null values table:

  2. For the location_address2 field, roll your cursor over the Empty string pill and click Custom input.

  3. In the Input custom value field, enter n/a.

  4. Do the same for the location_address3 field.

  5. Click Update to apply your entries.

  6. Click Review summary to verify the changes. Descriptions of the changes are indented below the field:

When you are preparing real data:

  • Click the trash icon to undo a change.

  • Click the pen icon to go back to the Manage fields tab.

Remove a Field

To clean up the data, remove the now unnecessary location field:

  1. At the bottom of the Manage fields tab, click the Select fields arrow to expand the column list.

  2. Click the  in the pill for location. The pill color changes from shaded to white.

  3. Click Update.

In the Preview Data pane, scroll to the right. Verify that the location field was deleted and all null values in the two address fields were replaced with n/a. Note that AutoPrep only replaces literal nulls and missing key/value pairs, not empty strings:

Mask Data

The example data contains customer IDs that you might not want to expose to all who will use the prepared data. Follow these steps to mask the data:

  1. In the field header, hover over the right side and click the three dots to open the options menu:

  2. Select Mask.

  3. Select the MD5 algorithm and note how AutoPrep changes the values in the id column.

  4. Click Done to save the transformations.

Change the Date Format

The location_last_seen column contains a Date String in MM/dd/yyyy. To be more precise about the time, choose the MM/dd/yyyy option that includes hours and minutes:

  1. Click the AutoPrep Snap to reopen it.

  2. In the location_last_seen field header, hover over the right side, click the three dots to open the options menu, and select Format:

     

  3. Select MM/dd/yyyy HH:mm:

  4. Click Apply.
    AutoPrep changes the format:

View the Results

If Pipeline validation is not disabled for the Org and Auto Validate is enabled in user settings, you can preview how AutoPrep will prepare the data set at runtime. To finish this exercise:

  1. On the right side of the AutoPrep snap, click to display the data preview.

  2. For Preview Type, select JSON.

  3. Click Expand All.

  4. Scroll to the record with the address of 373 Columbus Avenue.

  5. Note how AutoPrep transformations have changed the data from the original Example.json file.
    The screenshot on the left shows the record in the prepared preview compared with the original data on the right:

For this record, in the prepared example:

  • The id field contains a masked value.

  • The location field was flattened and then removed.

  • The location_address2 and location_address3 fields have empty string values, so AutoPrep did not apply the custom null handling.

  • The location_last_seen field includes hours and minutes.

Congratulations! You’ve transformed data without writing expressions!