Input	This Snap has at least two document input views.
Output	This Snap has exactly one document output view.
Error	This Snap has at most one document error view.

Troubleshooting

None.

Limitations and Known Issues

None.

Modes

Ultra Pipelines: Does not work in Ultra Pipelines.
Spark mode: Does not work in Spark mode.

Snap Settings

Label

Required. The name for the Snap. You can modify this to be more specific, especially if you have more than one of the same Snap in your Pipeline.

Policy

Specify the base and reference datasets and fields.

Base view

The input view to which the base dataset is connected.

Default value: input0 (name of the input view)

Base path

The field in the base dataset to use as the base field. It must be the common identifier used to join the base dataset with the reference dataset.

Example: $customer_id

Default value: [None]

Ref view

The input view to which the reference dataset is connected.

Default value: input0 (name of the input view)

Ref path

The field in the reference dataset to join on. Its values must match the values in the base field.

Example: $customer_id

Default value: [None]

Snap Execution

Specifies the execution type:

Validate & Execute: Performs limited execution of the Snap (up to 50 records) during Pipeline validation; performs full execution of the Snap (unlimited records) during Pipeline execution.
Execute only: Performs full execution of the Snap during Pipeline execution; does not execute the Snap during Pipeline validation.
Disabled: Disables the Snap and, by extension, its downstream Snaps.

Default value: Validate & Execute
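The three execution types can be summarized as a small decision function. This is only an illustrative sketch of the behavior described above, not SnapLogic code: the names SnapExecution, VALIDATION_RECORD_LIMIT, and record_limit are hypothetical.

```python
from enum import Enum
from typing import Optional


class SnapExecution(Enum):
    """The three execution types listed in the Snap Execution setting."""
    VALIDATE_AND_EXECUTE = "Validate & Execute"
    EXECUTE_ONLY = "Execute only"
    DISABLED = "Disabled"


# "Up to 50 records" during Pipeline validation, per the description above.
VALIDATION_RECORD_LIMIT = 50


def record_limit(mode: SnapExecution, validating: bool) -> Optional[int]:
    """Return how many records the Snap would process (hypothetical helper).

    0 means the Snap does not run; None means no limit.
    """
    if mode is SnapExecution.DISABLED:
        return 0
    if validating:
        # Only "Validate & Execute" runs during validation, and only partially.
        if mode is SnapExecution.VALIDATE_AND_EXECUTE:
            return VALIDATION_RECORD_LIMIT
        return 0
    # Full execution during Pipeline execution.
    return None
```

For example, record_limit(SnapExecution.EXECUTE_ONLY, validating=True) returns 0, which matches "does not execute the Snap during Pipeline validation".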

...

Understanding the Pipeline

The base dataset in this example is a collection of customer records. It has the following fields:

$customer_id
$firstname
$lastname
$create_time

The reference dataset is a collection of transactions made by the customers listed in the base dataset. It has the following fields:

$transaction_id
$customer_id
$source
$num_item
$total

Both datasets are provided by the CSV Generator Snaps titled Customers and Transactions. Each dataset is passed through a Type Converter Snap so that all data types are mapped correctly. The Feature Synthesis Snap needs correct data types to generate features accurately.
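To see why the type conversion step matters, consider that values read from CSV text arrive as strings. The following is a minimal Python sketch of the idea, not the Type Converter Snap itself: the sample rows, the FIELD_TYPES mapping, and the convert_types helper are assumptions made for illustration.

```python
import csv
import io

# Hypothetical sample rows; the values are made up for illustration.
TRANSACTIONS_CSV = """transaction_id,customer_id,source,num_item,total
t1,c1,web,2,19.98
t2,c1,store,1,5.00
"""

# Assumed target types for the numeric fields; other fields stay strings.
FIELD_TYPES = {"num_item": int, "total": float}


def convert_types(row):
    """Return a copy of row with the numeric fields parsed from strings."""
    return {key: FIELD_TYPES.get(key, str)(value) for key, value in row.items()}


# csv.DictReader yields every value as a string.
raw_rows = list(csv.DictReader(io.StringIO(TRANSACTIONS_CSV)))
typed_rows = [convert_types(row) for row in raw_rows]
```

Before conversion, raw_rows[0]["num_item"] is the string "2"; after conversion, typed_rows[0]["num_item"] is the integer 2, so numeric aggregates such as sums and means become meaningful.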

A preview of the customer and transaction datasets output by the CSV Generator Snaps is shown below:

The field $customer_id is common to both datasets. The Feature Synthesis Snap uses this field to join the datasets and is configured as shown below:

The customer dataset is connected to the first input view (titled customer), so this view becomes the Base view. Similarly, the transaction dataset is connected to the second input view (titled transaction), so that view becomes the Ref view.

Upon successful execution, the Feature Synthesis Snap generates features and adds them to the base dataset as shown below:

The same output is shown in a JSON format to let you see the full list of features:
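Conceptually, the Snap groups the reference records by the join field and attaches derived values to each base record. The sketch below illustrates that idea in plain Python under stated assumptions: the sample data is invented, the synthesize function is hypothetical, and the aggregates and feature names (count, sum, mean) are chosen for illustration only. The features the Snap actually generates may differ.

```python
from collections import defaultdict

# Hypothetical sample data using field names from the example datasets.
customers = [
    {"customer_id": "c1", "firstname": "Ada", "lastname": "Lee"},
    {"customer_id": "c2", "firstname": "Ben", "lastname": "Ray"},
]
transactions = [
    {"transaction_id": "t1", "customer_id": "c1", "num_item": 2, "total": 20.0},
    {"transaction_id": "t2", "customer_id": "c1", "num_item": 1, "total": 10.0},
    {"transaction_id": "t3", "customer_id": "c2", "num_item": 4, "total": 8.0},
]


def synthesize(base, ref, base_path, ref_path, numeric_fields):
    """Attach simple per-group aggregates of ref to each base record.

    The aggregate set and output names are assumptions, not the Snap's own.
    """
    # Group reference records by the value of the join field.
    groups = defaultdict(list)
    for row in ref:
        groups[row[ref_path]].append(row)

    enriched_records = []
    for record in base:
        matched = groups.get(record[base_path], [])
        enriched = dict(record)  # keep the original base fields
        enriched["transaction_count"] = len(matched)
        for field in numeric_fields:
            values = [row[field] for row in matched]
            enriched[f"sum_{field}"] = sum(values)
            enriched[f"mean_{field}"] = sum(values) / len(values) if values else None
        enriched_records.append(enriched)
    return enriched_records


features = synthesize(
    customers, transactions,
    base_path="customer_id", ref_path="customer_id",
    numeric_fields=["num_item", "total"],
)
```

Each record in features keeps the base fields and gains the derived ones, which mirrors how the Snap adds generated features to the base dataset.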

Download this Pipeline.

Downloads

Additional Resources

...

Glossary

...
