Feature Synthesis
Overview
The Feature Synthesis Snap generates new features for a base (primary) dataset by joining it with one or more other datasets (reference datasets) through common identifiers. Features are measurements computed from the data points in a dataset. With this Snap, you can view statistics from all related datasets directly in the base dataset. Numeric and categorical fields receive different sets of features. Some of the generated features are:
- Mean (Numeric data)
- Min (Numeric data)
- Max (Numeric data)
- Mode (Categorical data)
- Unique (Categorical data)
- Count (Generated at the base dataset level)
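To make the feature types listed above more concrete, the following minimal Python sketch computes them for the reference rows that belong to a single base record. It is an illustration under stated assumptions, not the Snap's implementation: the type rule (ints and floats are numeric, everything else is categorical), the reading of "Unique" as the number of distinct values, and feature names such as mean_amount are all hypothetical.

```python
from statistics import mean, mode

def is_numeric(value):
    # Assumption: ints and floats (but not bools) are treated as numeric
    return isinstance(value, (int, float)) and not isinstance(value, bool)

def features_for_group(rows):
    """Compute illustrative features for the reference rows linked to one base record."""
    features = {"count": len(rows)}  # one count per base record
    if not rows:
        return features
    for field in rows[0]:
        values = [row[field] for row in rows if row.get(field) is not None]
        if not values:
            continue
        if all(is_numeric(v) for v in values):
            features[f"mean_{field}"] = mean(values)
            features[f"min_{field}"] = min(values)
            features[f"max_{field}"] = max(values)
        else:
            features[f"mode_{field}"] = mode(values)
            # Assumption: "Unique" is read as the number of distinct values
            features[f"unique_{field}"] = len(set(values))
    return features

# Hypothetical reference rows for one customer
rows = [
    {"amount": 20.0, "channel": "web"},
    {"amount": 35.0, "channel": "store"},
    {"amount": 50.0, "channel": "web"},
]
print(features_for_group(rows))
# {'count': 3, 'mean_amount': 35.0, 'min_amount': 20.0, 'max_amount': 50.0, 'mode_channel': 'web', 'unique_channel': 2}
```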
Input and Output
Expected input
- First input: The base dataset.
- Subsequent input(s): The reference dataset(s).
Expected output: The base dataset containing all the features generated based on the reference datasets.
Expected upstream Snaps: Snaps that offer a document output. For example, MySQL - Select, or PostgreSQL - Select.
Expected downstream Snaps: A Snap that accepts documents. For example, Mapper, JSON Formatter, or AutoML.
Prerequisites
The base dataset must have a one-to-one or one-to-many relationship with the reference dataset, and the reference datasets must have one-to-one or one-to-many relationships with each other.
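One way to check this prerequisite before running the Snap is to confirm that the join key is unique in the base dataset (the "one" side) and then see whether it repeats in the reference dataset. The Python sketch below is a hypothetical pre-check, not part of the Snap; the field name customer_id and the sample rows are made up for illustration.

```python
from collections import Counter

def relationship_type(base_rows, ref_rows, base_key, ref_key):
    """Classify the base-to-reference relationship, or raise if the base key repeats."""
    base_counts = Counter(row[base_key] for row in base_rows)
    duplicates = [key for key, n in base_counts.items() if n > 1]
    if duplicates:
        # A repeated base key would make the relationship many-to-one or many-to-many
        raise ValueError(f"base key values are not unique: {duplicates}")
    ref_counts = Counter(row[ref_key] for row in ref_rows)
    # Any key that appears more than once in the reference data means one-to-many
    return "one-to-many" if any(n > 1 for n in ref_counts.values()) else "one-to-one"

# Hypothetical sample data
customers = [{"customer_id": 1}, {"customer_id": 2}]
transactions = [{"customer_id": 1}, {"customer_id": 1}, {"customer_id": 2}]
print(relationship_type(customers, transactions, "customer_id", "customer_id"))  # one-to-many
```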
Configuring Accounts
Accounts are not used with this Snap.
Configuring Views
View | Description |
---|---|
Input | This Snap has at least two document input views. |
Output | This Snap has exactly one document output view. |
Error | This Snap has at most one document error view. |
Troubleshooting
None.
Limitations and Known Issues
None.
Modes
- Ultra Pipelines: Does not work in Ultra Pipelines.
Snap Settings
Parameter | Description |
---|---|
Label | Required. The name for the Snap. You can modify this to be specific, especially if you have more than one of the same Snap in your pipeline. |
Policy | Specifies the base and reference datasets and the fields used to join them. |
Base view | The input view to which the base dataset is connected. Default value: input0 (name of the input view) |
Base path | The field in the base dataset to use as the join key. It must be the common identifier shared with the reference dataset. Example: $customer_id Default value: [None] |
Ref view | The input view to which the reference dataset is connected. Default value: input0 (name of the input view) |
Ref path | The field in the reference dataset to join on. Its values must match those of the base path field. Example: $customer_id Default value: [None] |
Snap Execution | Specifies the execution type. Default value: Validate & Execute |
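The Policy settings pair a base view and path with a reference view and path. The Python sketch below shows one plausible way such a pairing links each base document to its matching reference documents. The dictionary layout, the helper names, the use of input1 for the reference view, and the restriction to simple top-level paths such as $customer_id are assumptions made for illustration, not the Snap's internal representation.

```python
from collections import defaultdict

# Hypothetical representation of one Policy row; keys mirror the settings above
policy = {
    "base_view": "input0",
    "base_path": "$customer_id",
    "ref_view": "input1",
    "ref_path": "$customer_id",
}

def read_path(document, path):
    # Assumption: only top-level paths of the form "$field" are supported here
    field = path[1:] if path.startswith("$") else path
    return document[field]

def link_reference_rows(views, policy):
    """Pair each base document with the reference documents whose ref_path value matches its base_path value."""
    grouped = defaultdict(list)
    for doc in views[policy["ref_view"]]:
        grouped[read_path(doc, policy["ref_path"])].append(doc)
    return [
        (base_doc, grouped.get(read_path(base_doc, policy["base_path"]), []))
        for base_doc in views[policy["base_view"]]
    ]

# Hypothetical documents arriving on two input views
views = {
    "input0": [{"customer_id": 1}, {"customer_id": 2}],
    "input1": [{"customer_id": 1, "amount": 10}, {"customer_id": 1, "amount": 5}],
}
for base_doc, ref_docs in link_reference_rows(views, policy):
    print(base_doc, ref_docs)
```

Grouping the reference documents by key first means each base document is matched with a single dictionary lookup instead of a scan of the whole reference dataset.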
Examples
Getting Customer Insights Based on Transaction Data
This example shows how the Feature Synthesis Snap generates features from a transaction dataset and adds them to a customer dataset.
Download this Pipeline.
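Outside SnapLogic, the idea behind this example can be approximated in a few lines of Python. The sketch below uses made-up customer and transaction records and computes only a count and numeric amount features; it does not reproduce the downloadable pipeline. How the Snap treats a customer with no transactions is not stated here, so the sketch simply reports a count of 0 and omits the other features, which is an assumption.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical sample data: customers are the base dataset, transactions the reference dataset
customers = [
    {"customer_id": 1, "name": "Ana"},
    {"customer_id": 2, "name": "Ben"},
    {"customer_id": 3, "name": "Caro"},
]
transactions = [
    {"customer_id": 1, "amount": 20.0},
    {"customer_id": 1, "amount": 50.0},
    {"customer_id": 2, "amount": 15.0},
]

def enrich_customers(customers, transactions):
    """Add illustrative transaction features to each customer record."""
    amounts = defaultdict(list)
    for tx in transactions:
        amounts[tx["customer_id"]].append(tx["amount"])
    enriched = []
    for customer in customers:
        values = amounts.get(customer["customer_id"], [])
        features = {"count": len(values)}
        if values:
            # Assumption: mean/min/max are omitted when a customer has no transactions
            features.update(mean_amount=mean(values), min_amount=min(values), max_amount=max(values))
        enriched.append({**customer, **features})
    return enriched

for row in enrich_customers(customers, transactions):
    print(row)
```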
Downloads
Important steps to successfully reuse Pipelines
- Download and import the pipeline into the SnapLogic application.
- Configure Snap accounts as applicable.
- Provide pipeline parameters as applicable.
Snap Pack History
© 2017-2024 SnapLogic, Inc.