
Overview

Use the AutoPrep Snap to prepare data for analysis, reporting, and machine learning without writing expressions, SQL scripts, or Python code. When you open AutoPrep, it uses introspection on a sample of the input data and calculates the probable data type and valid null handling for each field. The Preview data table shows a sample of the data it will output.

To prepare the data, choose from the following transformations:

  • Flatten leaf nodes of hierarchical data structures

  • Remove fields

  • Rename fields

  • Change the data type of String, Date, Integer, Number, and Boolean fields

  • Handle null values

  • Mask data to protect sensitive information

  • Choose the format for dates, currency, phone numbers, and country codes

  • Split fields based on a delimiter to create new fields

AutoPrep Example provides a sample JSON file and walks you through the use of AutoPrep.

The following screenshot shows the AutoPrep point-and-click interface:

...

As you apply transformations, the Preview data pane refreshes. The Review summary tab saves a history of each transformation and provides a way to remove individual transformations. Your changes are not saved until you click Done and exit AutoPrep. After your transformations are saved, if structural changes occur to input data, AutoPrep will warn you about those changes the next time you open it.

This page contains the following information. Refer to Transforming Data with AutoPrep to learn more about using AutoPrep.


Snap Type

The AutoPrep Snap is a Transform-type Snap.

...

Limitations

  • You cannot modify the AutoPrep Snap Label.

  • This Snap does not have views or an error handling tab.

  • AutoPrep can flatten leaf nodes, but cannot flatten objects.

Using AutoPrep

To use the AutoPrep Snap:

  1. In Designer, create a Pipeline to handle the data you want to transform.
    The Snap that precedes AutoPrep must have a document output, not binary.

  2. Add the AutoPrep Snap.

  3. Click the AutoPrep Snap to open it.
    The left pane has three tabs with controls for preparing data, and the right pane displays the data preview in table format.

The AutoPrep interface provides the following elements and controls:

  • Manage fields tab:

    • Flatten Structure tree: Search for fields, and select leaf nodes to flatten them to the root level.

    • Select fields control: Remove fields and change data types.

  • Handle Nulls tab: Displays default rules for handling null or empty values. View and modify null handling rules.

  • Review Summary tab: Lists the applied changes. Undo or modify changes.

  • Preview Data pane: Displays a preview of the data set in a table. As you apply changes, the table updates.

The buttons provide the following functionality:

  • Update saves your changes, updates the Preview Data pane, and validates the Pipeline.

  • Generate saves your changes and exits AutoPrep. AutoPrep automatically generates the expressions necessary to accomplish the transformations at runtime.

  • Cancel exits AutoPrep without saving the current changes.

Flattening a Nested Structure

Flatten the hierarchy using controls on the Manage fields tab. Fields at the root level cannot be flattened.

To flatten a field:

  1. In the Flatten fields tree, select the checkbox next to the field, or click the select-all icon to select all fields. Non-nested field names are grayed out and cannot be selected.

  2. Click Update.
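The effect of flattening can be pictured with a short Python sketch. This is only an illustration of moving selected leaf nodes to the root level, not AutoPrep's implementation. The function name `flatten_leaves`, the dotted-path notation, and the choice to reuse the leaf's own key as the root-level name are all assumptions made for this example.

```python
import copy
from typing import Any


def flatten_leaves(doc: dict[str, Any], leaf_paths: list[str]) -> dict[str, Any]:
    """Return a copy of doc with each selected nested leaf moved to the root level.

    Hypothetical helper: paths use dotted notation such as "address.city".
    """
    result = copy.deepcopy(doc)
    for path in leaf_paths:
        keys = path.split(".")
        if len(keys) < 2:
            continue  # fields already at the root level cannot be flattened
        parent: Any = result
        for key in keys[:-1]:
            parent = parent.get(key) if isinstance(parent, dict) else None
            if parent is None:
                break
        if not isinstance(parent, dict) or keys[-1] not in parent:
            continue  # path does not exist in this document
        if isinstance(parent[keys[-1]], (dict, list)):
            continue  # only leaf nodes are moved; objects (and, by assumption, lists) stay put
        # Assumption: the leaf keeps its own key name at the root level.
        result[keys[-1]] = parent.pop(keys[-1])
    return result


doc = {"id": 1, "address": {"city": "Oslo", "geo": {"lat": 59.9}}}
flat = flatten_leaves(doc, ["address.city", "address.geo.lat"])
print(flat)
```

In this sketch the emptied parent objects are left in place and a root-level field with the same name would be overwritten; how AutoPrep treats either case is not described on this page.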

Removing a Field

Remove a field using controls on the Manage fields tab:

...

At the bottom of the Manage fields tab, click the Select fields arrow to open the list of fields:

...


...

...

Click Update. AutoPrep removes the table column for that field from the Preview data pane.

Changing the Data Type

You can change the data type of a field using controls on the Manage fields tab or in the Preview Data pane. You can only change the data type of a nested field from the Manage fields tab.

Info

The types available depend on the original data type. Object and List data types cannot be converted to other data types, nor can simple data types be converted to Object or List.

To change a data type from the Manage fields tab:

  1. Click the Select fields arrow to open the list of fields.

  2. Click the data type decorator.

  3. From the menu, select a new data type.

  4. Change the data types for other fields as required and click Update.
    The updated data types display next to the field names in the Preview data pane.

Handling Nulls

AutoPrep suggests how to handle null values and missing key/value pairs. For example, it suggests Empty string for String fields. If you do not change the null handling, at runtime, when a row includes a null, AutoPrep replaces it with an empty string.

The Manage null values display lists the suggested null handling to the right of each field as shown below:

...

AutoPrep uses the suggested values unless you explicitly change them. Roll your cursor over the rule to see the others available for that field. You can choose from the following null handling rules:

  • For a Boolean field: False, True, Custom input, or Ignore

  • For numeric types: Average, Zero, Ignore, or Custom input

  • For a String or a Date field: Empty string, Ignore, Custom input, or Popular

Info

Custom input enables you to specify a value to use for nulls. If you add a value that is not valid for the field type, the null data will output as NaN. Popular causes AutoPrep to calculate and insert the field’s most frequently used value.
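As a rough illustration of what these rules mean for a column of values, the following Python sketch applies one rule to a list in which `None` stands for a null. It is not AutoPrep's implementation. The function name `fill_nulls` and the rule identifiers are hypothetical, and treating Ignore as "leave the null unchanged" is an assumption.

```python
from collections import Counter
from statistics import mean
from typing import Any


def fill_nulls(values: list[Any], rule: str, custom: Any = None) -> list[Any]:
    """Replace None entries in values according to a null handling rule (hypothetical names)."""
    present = [v for v in values if v is not None]
    if rule == "ignore":
        return list(values)  # assumption: Ignore leaves nulls untouched
    if rule == "zero":
        fill: Any = 0
    elif rule == "average":
        fill = mean(present) if present else None  # average of the non-null values
    elif rule == "popular":
        # most frequently used non-null value
        fill = Counter(present).most_common(1)[0][0] if present else None
    elif rule == "empty_string":
        fill = ""
    elif rule in ("true", "false"):
        fill = rule == "true"
    elif rule == "custom":
        fill = custom  # user-supplied replacement value
    else:
        raise ValueError(f"unknown rule: {rule}")
    return [fill if v is None else v for v in values]


print(fill_nulls([10, None, 20], "average"))
print(fill_nulls(["a", None, "b", "a"], "popular"))
print(fill_nulls([True, None], "false"))
```

The sketch computes Average and Popular from the values it is given. At runtime AutoPrep applies the saved rule to each incoming row, and this page does not specify how it derives those statistics.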

When parsing the sample data, AutoPrep detects existing null values and adds a tooltip:

...

View the null handling rules or modify them as follows:

  1. Select the Handle nulls tab to view the Manage null values table.

  2. To change a rule, roll your cursor over the pill and select from the available options.

  3. Click Update.

Reviewing and Undoing Changes

The Review summary tab lists all of your formatting and transformation changes. To review, undo, or edit a change:

  1. Click Review summary. Each modified field displays with the description of the change indented below it.

  2. Click the delete icon to delete a change or the edit icon to edit it.

Exiting AutoPrep and Previewing Changes

When you are ready to exit AutoPrep, click Generate at the bottom of the screen. If Pipeline validation is not disabled for the Org and Auto Validate is enabled in user settings, you can preview how AutoPrep will prepare the data set at runtime. On the right side of the AutoPrep Snap, click the preview icon to display the data preview:

...

Troubleshooting

This section describes AutoPrep warnings and error messages.

NaN

If you change the data type of a field and some values cannot be transformed to that type, the Preview Data pane displays NaN for those values:

...
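The same idea can be shown outside AutoPrep with a small Python analogy: values that convert cleanly keep their converted value, and values that cannot be converted become NaN. The helper name `to_number` is hypothetical, and the sketch makes no claim about how AutoPrep performs conversions internally.

```python
import math
from typing import Any


def to_number(value: Any) -> float:
    """Convert value to a float, returning NaN when it cannot be converted."""
    try:
        return float(value)
    except (TypeError, ValueError):
        # Not valid for the target type, so mark it as NaN instead of failing.
        return math.nan


converted = [to_number(v) for v in ["12.5", "7", "n/a", None]]
print(converted)
```

If NaN values appear after a type change, checking the source values for entries like the `"n/a"` string above is a reasonable first step.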

Warning Icon in the Review Summary

When you click Generate, the AutoPrep Snap validates the data set and retains the transformations that you defined. If the structure of the upstream data changes and you reopen AutoPrep, the Review summary warns about those changes.

For example, if a field was removed from the upstream data after AutoPrep Snap validation and you reopen the AutoPrep Snap, AutoPrep displays a warning. In the following example, the distance field was removed from the source input. The Review summary shows the original column with a warning icon, and the Preview Data pane shows the column as a missing field with no data type:

...

You can remove the field from the Review summary if it was a deliberate deletion.
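To check an upstream sample for this kind of structural change before reopening AutoPrep, a comparison along the following lines may help. This is a minimal sketch in Python, assuming you have the list of field names your transformations refer to and a sample of the current input as dictionaries. The function name `find_missing_fields` and the sample data are hypothetical.

```python
from typing import Any


def find_missing_fields(saved_fields: list[str], sample: list[dict[str, Any]]) -> list[str]:
    """Return the saved field names that no longer appear in any row of the sample."""
    current: set[str] = set()
    for row in sample:
        current.update(row.keys())
    return [name for name in saved_fields if name not in current]


# Fields that earlier transformations referred to (illustrative values).
saved = ["city", "distance"]
# Current upstream sample, where the distance field is no longer present.
sample = [{"city": "Oslo"}, {"city": "Bergen"}]
print(find_missing_fields(saved, sample))
```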

Error Messages

AutoPrep can display the following error messages:

  • Error: Looks like we couldn’t find a connector here. Please add a connector before you use the AutoPrep experience.
    Reason: An upstream Snap must provide the sample data for AutoPrep.
    Resolution: Add a valid Snap that outputs JSON before the AutoPrep Snap.

  • Error: We couldn’t find any preview data. Please try running the validation again before we can get to AutoPrepping.
    Reason: The upstream Snap is not valid.
    Resolution: Make sure the data source is connected and that the upstream Snap is outputting JSON.

...

Snap Pack History

Related Content

...