AutoPrep

Use the AutoPrep Snap to prepare data for analysis, reporting, and machine learning without writing expressions, SQL scripts, or Python code. When you open AutoPrep, it uses introspection on a sample of the input data and calculates the probable data type and valid null handling for each field. The Preview data table shows a sample of the data it will output.
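
As a rough illustration of what introspection on a sample can mean, the following Python sketch infers a probable type and whether nulls occur for one field. It is not AutoPrep's implementation: the function names (`classify`, `infer_probable_type`), the "most common type wins" rule, and the sample records are all assumptions made for this example.

```python
from collections import Counter

def classify(value):
    """Return a coarse type label for one sampled value (hypothetical rules)."""
    if value is None:
        return "null"
    if isinstance(value, bool):   # check bool first: bool is a subclass of int
        return "Boolean"
    if isinstance(value, int):
        return "Integer"
    if isinstance(value, float):
        return "Number"
    return "String"

def infer_probable_type(sample, field):
    """Pick the most common non-null type for a field and report whether nulls occur."""
    labels = [classify(rec.get(field)) for rec in sample]
    non_null = [label for label in labels if label != "null"]
    has_nulls = len(non_null) < len(labels)
    # Fall back to String when every sampled value is null (an assumption).
    probable = Counter(non_null).most_common(1)[0][0] if non_null else "String"
    return probable, has_nulls

# Hypothetical sample of upstream JSON documents.
sample = [
    {"distance": 12, "city": "Oslo"},
    {"distance": None, "city": "Rome"},
    {"distance": 7, "city": "Lima"},
]
distance_type = infer_probable_type(sample, "distance")
city_type = infer_probable_type(sample, "city")
```

Here `distance` is reported as a probable Integer that contains nulls, while `city` is a String with none.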

To prepare the data, choose from the following transformations:

  • Flatten leaf nodes of hierarchical data structures

  • Remove fields

  • Rename fields

  • Change the data type of String, Date, Integer, Number, and Boolean fields

  • Handle null values

  • Mask data to protect sensitive information

  • Choose the format for dates, currency, phone numbers, and country codes

  • Split fields based on a delimiter to create new fields
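
For readers who think in code, the sketch below shows roughly what several of these transformations do to a single JSON record. AutoPrep itself requires no code; the field names, the default value for nulls, and the masking rule here are hypothetical and chosen only for illustration.

```python
def prepare(record):
    """Apply a few AutoPrep-style transformations to one record (illustrative only)."""
    out = dict(record)                            # leave the input record untouched
    out.pop("internal_id", None)                  # remove a field
    out["phone_number"] = out.pop("phone")        # rename a field
    out["age"] = int(out["age"])                  # change the type from String to Integer
    if out.get("country") is None:                # handle a null value with a default
        out["country"] = "unknown"
    out["card"] = "*" * 12 + out["card"][-4:]     # mask all but the last four characters
    first, last = out.pop("name").split(" ", 1)   # split on a delimiter into new fields
    out["first_name"], out["last_name"] = first, last
    return out

# Hypothetical input record.
record = {
    "internal_id": 17,
    "name": "Ada Lovelace",
    "phone": "555-0100",
    "age": "36",
    "country": None,
    "card": "4111111111111111",
}
prepared = prepare(record)
```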

The following screenshot shows the AutoPrep point-and-click interface:

[Screenshot: AutoPrep interface]

As you apply transformations, the Preview data pane refreshes. The Review summary tab keeps a history of each transformation and lets you remove individual transformations. Your changes are not saved until you click Done and exit AutoPrep. If the structure of the input data changes after you save your transformations, AutoPrep warns you about those changes the next time you open it.

Refer to Transforming Data with AutoPrep to learn more about using AutoPrep.

Snap Type

The AutoPrep Snap is a Transform-type Snap.

The AutoPrep Snap acts as a data preparation application and does not have the same configuration dialogs as other Snaps in the Transform Snap Pack.

Prerequisites

  • The preceding Snap is a valid connector

  • The preceding Snap outputs data in JSON format

Support for Ultra Pipelines

Limitations

  • AutoPrep does not have views or an error handling tab.

  • AutoPrep can flatten leaf nodes, but cannot flatten objects.
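
One way to picture the second limitation is the Python sketch below, which promotes a single nested leaf value to a top-level field and refuses a path that ends at an object. This is an interpretation, not AutoPrep's code: the `flatten_leaf` helper, the underscore-joined output name, and the `ValueError` are assumptions for this example.

```python
def flatten_leaf(record, path, sep="_"):
    """Promote one nested leaf value to a top-level field.

    `path` is a list of keys, for example ["address", "geo", "lat"].
    The joined field name and the error raised are assumptions of this sketch.
    """
    value = record
    for key in path:
        value = value[key]
    if isinstance(value, dict):
        # The path stops at an object rather than a leaf node.
        raise ValueError("path points to an object, not a leaf node")
    flat = dict(record)               # shallow copy; the nested original is kept as-is
    flat[sep.join(path)] = value
    return flat

# Hypothetical hierarchical record.
record = {"id": 1, "address": {"city": "Oslo", "geo": {"lat": 59.9}}}
flat = flatten_leaf(record, ["address", "geo", "lat"])
```

Calling `flatten_leaf(record, ["address", "geo"])` raises the error, because `geo` is an object rather than a leaf.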

Troubleshooting

This section describes AutoPrep warnings and error messages.

NaN

If you change the data type of a field and some values cannot be converted to that type, the Preview data pane displays NaN for those values:

[Screenshot: NaN values in the Preview data pane]
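
The behavior is similar to the following Python sketch, in which values that cannot be parsed as numbers become NaN instead of stopping the conversion. The `to_number` helper and the sample values are hypothetical; how AutoPrep converts values internally is not documented here.

```python
import math

def to_number(value):
    """Convert a value to a float, returning NaN when it cannot be parsed."""
    try:
        return float(value)
    except (TypeError, ValueError):
        # TypeError covers None; ValueError covers text such as "n/a".
        return math.nan

# Hypothetical String values being changed to the Number type.
values = ["12.5", "7", "n/a", None]
converted = [to_number(v) for v in values]
```

The first two values convert cleanly; the last two become NaN, which you can detect with `math.isnan`.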

Warning Icon in the Review Summary

When you click Generate, the AutoPrep Snap validates the data set and retains the transformations you defined. If the structure of the upstream data changes and you reopen AutoPrep, the Review summary warns about those changes.

For example, if a field was removed from the upstream data, AutoPrep displays a warning. In the following example, the distance field was removed from the source input. The Review summary shows the original column with a warning icon, and the Preview data pane shows the missing field with no data type:

If the field was removed from the source deliberately, you can remove it from the Review summary.
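
Conceptually, this check compares the fields recorded with your saved transformations against the fields present in the current upstream sample. The following Python sketch shows one way to do that; the `missing_fields` helper and every field name other than distance are assumptions for illustration, not AutoPrep internals.

```python
def missing_fields(saved_fields, upstream_sample):
    """Return saved field names that no longer appear in any upstream record."""
    present = set()
    for rec in upstream_sample:
        present.update(rec.keys())
    return [name for name in saved_fields if name not in present]

# Fields that had transformations saved earlier (hypothetical, apart from distance).
saved_fields = ["trip_id", "distance", "duration"]

# Current upstream sample, in which distance has been removed from the source.
upstream_sample = [
    {"trip_id": 1, "duration": 14},
    {"trip_id": 2, "duration": 9},
]
fields_to_warn_about = missing_fields(saved_fields, upstream_sample)
```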

Error Messages

The following table describes AutoPrep error messages:

| Error | Reason | Resolution |
| --- | --- | --- |
| Looks like we couldn’t find a connector here. Please add a connector before you use the AutoPrep experience. | An upstream Snap must provide the sample data for AutoPrep. | Add a valid Snap that outputs JSON before the AutoPrep Snap. |
| We couldn’t find any preview data. Please try running the validation again before we can get to AutoPrepping. | The upstream Snap is not valid. | Make sure the data source is connected and that the upstream Snap is outputting JSON. |


Snap Pack History