Transforming Data with AutoPrep
Use the AutoPrep Snap to prepare data for analysis, reporting, and machine learning without writing expressions, SQL scripts, or Python code. To use AutoPrep:
In Designer, create a Pipeline to handle the data you want to transform.
The Snap that precedes AutoPrep must have a document output, not binary.Add the AutoPrep Snap.
Click the AutoPrep Snap to open it.
The left pane has three tabs with controls for preparing data and the right pane displays the data preview in table format:
The AutoPrep interface provides the following elements and controls:
AutoPrep: Click the name to rename the AutoPrep Snap.
Manage fields tab:
Flatten Structure tree: Search for fields, and select leaf nodes to flatten them to the root level.
Select fields control: Remove fields and change data types.
Handle nulls tab: Displays default rules for handling null or empty values. View and modify null handling rules.
Review summary tab: Lists the applied changes. Undo or modify changes.
Preview data pane: Displays a preview of the data set in a table. Use the toggle to expand the pane or to return to the original size.
As you apply changes, the pane updates, but the transformations are not saved until you click Done.
The buttons provide the following functionality:
Update saves your changes, updates the Preview Data pane, and validates the Pipeline.
Done saves your changes and exits AutoPrep. AutoPrep automatically generates the expressions necessary to accomplish the transformations at runtime.
Cancel exits AutoPrep without saving the current changes.
The Preview data pane has a control to expand to full size or return to the default. In the table, column headings include:
The data type that AutoPrep calculated from the majority of values for that field.
A menu with options. To open it, hover over the right side of the column header. The options are based on the field type and can include:
Changing the data type
Splitting the field into multiple fields, based on a delimiter
Choosing the format for dates, currency, phone numbers, and country codes
Enabling data masking
Renaming the field (column names are case-sensitive)
Watch the following video to see how to transform data with the AutoPrep Snap:
The following sections describe the steps for:
- 1 Flattening a Nested Structure
- 2 Removing a Field
- 3 Changing a Field Name
- 4 Changing the Data Type
- 5 Handling Nulls
- 6 Using Data Masking
- 7 Changing Formats
- 7.1 Country codes
- 7.2 Currency
- 7.3 Dates
- 7.4 Phone Numbers
- 8 Splitting a Field into Multiple Columns
- 9 Reviewing and Undoing Changes
- 10 Changing the AutoPrep Snap Name
- 11 Exiting AutoPrep and Previewing Changes
Flattening a Nested Structure
Flatten the hierarchy using controls on the Manage fields tab. Fields at the root level cannot be flattened.
To flatten a field:
In the Flatten fields tree, select the checkbox next to the field or click to select all fields. Non-nested field names are grayed out and you cannot select them.
Click Update.
Removing a Field
Remove a field using controls on the Manage fields tab:
At the bottom of the Manage fields tab, click the Select fields arrow to open the list of fields:
Click to remove a field. The pill changes from shaded to white:
Click Update.
AutoPrep removes the table column for that field from the Preview data pane.
Changing a Field Name
Change the name of a field in the Preview data pane. Valid field names can include alphabetic and numeric characters, underscores, spaces, and the @ character. Field names are case-sensitive. For example, you can have a field named Age
and one named age
. However, AutoPrep will inform you if you rename a field using a name that exists, but has a different casing.
In the field header, hover over the right side of the column header and click the three dots to open the options menu:
Select Rename.
Enter the new name.
Click Apply.
The renamed column displays as the last column to the right in the Preview data pane.
Changing the Data Type
Change the data type of a field in the Preview data pane or from the Manage fields tab. You can only change the data type of a nested field from the Manage fields tab. When you change a field’s data type, AutpPrep parses the records in the preview data and transforms values to the selected type.
Change the Data Type from the Preview data pane
To change a data type from the Preview data pane:
In the field header, hover over the right side of the column header and click the three dots to open the options menu:
Click Data type.
Select from the available types.
AutoPrep transforms the values in the preview to the new type. Values that cannot be transformed to the selected type display as NaN.
Change the Data Type from the Manage fields tab
To change a data type from the Manage fields tab:
Click the Select fields arrow to open the list of fields:
Click the data type decorator:
From the menu, select a new data type:
Change the data types for other fields as required and click Update.
The updated data types display next to the field names in the Preview data pane. Values that cannot be transformed to the selected type display as NaaN.
Handling Nulls
When you open AutoPrep, it parses the input data and suggests how to handle null values and missing key/value pairs. For example, it suggests Empty string for String
fields. If you do not change the null handling from Empty string, at runtime, when a row includes a null, AutoPrep replaces it with an empty string.
The Manage null values display lists the suggested null handling to the right of each field as shown below:
Fields where AutoPrep found null values have an informational tooltip:
AutoPrep uses the suggested null handling unless you explicitly change it. Roll your cursor over the value to see the others available for that field. You can choose from the following:
For a Boolean field: False, True, Custom input, or Ignore
For numeric types: Average, Zero, Ignore, or Custom input
For a String or a Date field: Empty string, Ignore, Custom input, or Popular
Custom input enables you to specify a value to use for nulls. If you add a value that is not valid for the field type, the null data will output as NaN
. Popular causes AutoPrep to calculate and insert the field’s most frequently-used value.
Modifying Null Handling
Modify null handling as follows:
Select the Handle nulls tab to view the Manage null values table:
To change the rule, hover over the pill and choose from the available options:
Click Update.
Using Data Masking
Data masking provides a way to protect sensitive data. AutoPrep supports the following algorithms to hash values so they are no longer human-readable:
When you mask a field, AutoPrep uses the selected algorithm to transform each value into a masked string.
Apply data masking to a field from the Preview data pane:
In the field header, hover over the right side of the column and click the three dots to open the options menu:
Select Mask.
Select the algorithm.
AutoPrep applies the algorithm to the values in the field and if the field was not a string, changes its type to a string.
Changing Formats
You can change the format of fields that contain country codes, currency, dates, and phone numbers.
Country codes
You can format country codes with a two- or three-letter abbreviation or use the full country name. Format country codes from the Preview data pane:
In the header of a string field that contains country codes, hover over the right side and click the three dots to open the options menu.
Click Format.
Select a country code format from the list. The right side displays an example in each format:
Currency
Format currency from the Preview data pane:
In the header of a field that contains currency values, hover over the right side and click the three dots to open the options menu.
Click Format.
Select whether to use a dot or a comma:
Dates
A common issue in data that has accumulated over time is inconsistent formatting of dates. For fields that AutoPrep detects as Dates or Date Strings, you can choose from various date formats. When you apply a date format, AutoPrep formats existing values as strings and changes the field type to string.
Apply a date format from the Preview data pane:
In the header of a date field, hover over the right side and click the three dots to open the options menu:
Click Format.
Select a format from the list:
AutoPrep parses the values and changes the field type to string.
Phone Numbers
You can format fields containing phone numbers to use national, international, E164, or RFC3966 formats. Format phone numbers from the Preview data pane:
In the header of a field that contains phone numbers, hover over the right side and click the three dots to open the options menu.
Click Format.
Select Phone number.
Select a format. The right side of the pane shows examples of each option:
Splitting a Field into Multiple Columns
You can split a string field into multiple columns based on the following delimiters:
Space
Comma
Tab
Colon
Custom
When splitting a field, you specify the number of columns for the split values. AutoPrep creates the specified number of new columns, and adds _N to the field name. For example, if you specify 3 columns for a field named groceries
, AutoPrep creates groceries_0
, groceries_1
, groceries_2
. Any values left over after AutoPrep splits a row are discarded.
To split a field from the Preview data pane:
In the field header, hover over the right side of the column and click the three dots to open the options menu:
Select Split.
To choose a delimiter other than the default, click the arrow to open the list and select one:
Enter the number of columns for the split values:
Click Apply.
Reviewing and Undoing Changes
The Review summary tab lists all of your formatting and transformation changes and AutoPrep null handling suggestions. To review, undo, or edit a change:
Click Review summary. Each modified field displays with the description of modifications indented below it:
Click the trash can to delete a change or click the pen to edit it.
Changing the AutoPrep Snap Name
You can use multiple AutoPrep Snaps in the same Pipeline. To distinguish them, change the default name:
Open AutoPrep.
Click the name and enter a new one:
Exit the field.
Exiting AutoPrep and Previewing Changes
When you are ready to exit AutoPrep, click Done. If Pipeline validation is not disabled for the Org and Auto Validate is enabled in user settings, you can preview how AutoPrep will prepare the data set at runtime. On the right side of the AutoPrep Snap, click to display the data preview:
Related Content
Have feedback? Email documentation@snaplogic.com | Ask a question in the SnapLogic Community
© 2017-2024 SnapLogic, Inc.