Overview

The AutoML Snap automates the process of exploring and tuning machine learning models for a given dataset within the resource limits that you specify. A machine learning model is a mathematical representation of a real-world process that can be used to predict an outcome or solve a specific problem, for example, to predict whether customers will churn, predict whether a loan will be fully paid, or forecast sales. To generate a machine learning model, you provide training data for a machine learning algorithm to learn from.

Currently, the AutoML Snap supports binary classification, multiclass classification, and regression. For each type of problem, the Snap provides a different set of metrics and reports.

Input and Output

Expected input

First input view: A document stream containing a classification or regression dataset.
Second input view: A document that contains a model built by an AutoML Snap from a previous execution.

Expected output

First output view: A serialized machine learning model and metadata, which are not human-readable. Additionally, the output includes a human-readable representation of the model if you select the Readable checkbox.
Second output view: A document that contains the leaderboard. All the models built by this Snap are listed in ranked order, along with metrics that indicate each model's performance.
Third output view: A document that contains an interactive report of up to the top 10 models.

Expected upstream Snaps

First input view: A Snap that generates a classification or regression dataset. For example, CSV Generator, Mapper, or a combination of File Reader and JSON Parser.
Second input view: A Snap that provides a document containing a model built by an AutoML Snap. For example, a combination of File Reader and JSON Parser.

Expected downstream Snaps

First output view: A Snap that formats and saves the model. For example, a combination of JSON Formatter and File Writer.
Second output view: A Snap that accepts documents. For example, Aggregate, Mapper, or CSV Formatter.
Third output view: A Snap that formats and saves the report. For example, a combination of Document to Binary and File Writer.

Prerequisites

None.

Configuring Accounts

Accounts are not used with this Snap.

Configuring Views

Input	This Snap has at most two document input views.
Output	This Snap has at most three document output views.
Error	This Snap has at most one document error view.

Troubleshooting

None.

Limitations and Known Issues

The AutoML Snap generates models along with their statistics. If the RMSLE value of a model is NaN, the Snap omits that value from the charts in the generated reports.
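
For context, RMSLE (root mean squared logarithmic error) is conventionally computed as the square root of the mean of (log(1 + predicted) - log(1 + actual))^2, so it is undefined when any value is -1 or lower. The following Python sketch shows one way a NaN can arise and how such values could be filtered out before charting. It is an illustration only, not the Snap's implementation; the function names, model names, and sample values are hypothetical.

```python
import math

def rmsle(actual, predicted):
    """Root mean squared logarithmic error; returns NaN when it is undefined."""
    if not actual or len(actual) != len(predicted):
        return float("nan")
    try:
        squared = [(math.log1p(p) - math.log1p(a)) ** 2
                   for a, p in zip(actual, predicted)]
    except ValueError:
        # log(1 + x) is undefined for x <= -1, for example a negative prediction.
        return float("nan")
    return math.sqrt(sum(squared) / len(squared))

def chartable(scores):
    """Hypothetical helper: keep only the metric values that can be plotted."""
    return {name: value for name, value in scores.items()
            if not math.isnan(value)}

# Made-up values for illustration.
actual = [3.0, 5.0, 2.5]
scores = {
    "model_a": rmsle(actual, [2.5, 5.0, 4.0]),
    "model_b": rmsle(actual, [3.0, -2.0, 2.0]),  # negative prediction gives NaN
}
print(chartable(scores))  # only model_a remains
```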

Modes

Ultra Pipelines: Does not work in Ultra Pipelines.

Snap Settings

Label	Required. The name for the Snap. Modify this to be more specific, especially if the Pipeline contains more than one instance of the same Snap.
Label field	Required. The label, class, or target field in the dataset. This is the field that the model is trained to predict. Default value: None. Example: $class
Time limit	Required. The maximum time, in seconds, for which the AutoML Snap can run. If you set Time limit to 0, the Snap takes as much time as it needs to build N models, where N is the value of the Number of models property. The Snap throws an error if it cannot build at least one model within the specified time limit. When you validate the Pipeline, the Snap caps the time limit at 60 seconds and the number of models at 5; if you specify a lower value for either property, the Snap uses your value. When you execute the Pipeline, the Snap uses the value set here. Default value: 3600
Number of models	The maximum number of models that the Snap builds. If you set the value to 0, the Snap builds as many models as possible within the time specified in the Time limit property. When you validate the Pipeline, the Snap caps the time limit at 60 seconds and the number of models at 5; if you specify a lower value for either property, the Snap uses your value. When you execute the Pipeline, the Snap uses the value set here. With the Weka engine, the Snap builds multiple models in parallel, so the number of output models may not exactly match the specified value. Default value: 10
Fold	Required. The number of folds in k-fold cross-validation. Minimum value: 2. Maximum value: 10. Default value: 5
Engine	Required. The engine to be used. Select one of the following options: Weka or H2O. Default value: H2O
Algorithms	The groups of algorithms used to derive the best model. The Snap supports the following groups: Standard, Tree, XGBoost, and NN. Default value: Standard, Tree, XGBoost, NN. For the Weka engine, the algorithms supported in each group are: Standard: Logistic, Linear Regression, Naive Bayes, Naive Bayes Multinomial, Simple Linear Regression, SMO, SGD; Tree: J48, Decision Stump, Random Forest, Random Tree, REP Tree, M5P, LMT; XGBoost: Not available; NN: Multilayer Perceptron. For the H2O engine, the algorithms supported in each group are: Standard: Generalized Linear Modeling, Gradient Boosting Machine; Tree: Distributed Random Forest; XGBoost: XGBoost; NN: Deep Learning.
Readable	Select this checkbox to output the model in a human-readable format. When selected, the Snap adds a $readable field to the output that contains the model in a readable format. Default value: Not selected
Use random seed	If selected, the value of Random seed is applied to the randomizer to produce reproducible results. Default value: Selected
Random seed	Required. A number used as the static seed for the randomizer. Default value: 12345
Report title	Title for the report.
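
To illustrate the Fold, Use random seed, and Random seed settings: in k-fold cross-validation, the dataset is split into k parts, and each part serves once as the validation set while the remaining parts are used for training. A fixed seed makes the shuffle, and therefore the split, reproducible. The following Python sketch shows the generic technique under those assumptions; it is not the Snap's internal code, and `k_fold_indices` is a hypothetical helper.

```python
import random

def k_fold_indices(n_rows, folds=5, seed=12345):
    """Return a list of (training, validation) row-index lists, one per fold."""
    if not 2 <= folds <= 10:
        raise ValueError("folds must be between 2 and 10")
    indices = list(range(n_rows))
    # A fixed seed gives the same shuffle on every run; None gives a new one.
    rng = random.Random(seed) if seed is not None else random.Random()
    rng.shuffle(indices)
    # Deal the shuffled rows into `folds` parts of near-equal size.
    parts = [indices[i::folds] for i in range(folds)]
    splits = []
    for i, validation in enumerate(parts):
        training = [row for j, part in enumerate(parts) if j != i for row in part]
        splits.append((training, validation))
    return splits

splits = k_fold_indices(n_rows=150, folds=5, seed=12345)
for training, validation in splits:
    print(len(training), len(validation))  # 120 30 for each of the 5 folds
```

Every row lands in exactly one validation set, so each model is scored on data it was not trained on in that fold.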

Temporary Files

During execution, data processing on Snaplex nodes occurs principally in memory as streaming and is unencrypted. When a dataset is too large for the available compute memory, the Snap writes Pipeline data to local storage, unencrypted, to optimize performance. These temporary files are deleted when the Snap or Pipeline execution completes. You can configure the location of the temporary data in the Global properties table of the Snaplex node properties, which can also help avoid Pipeline errors caused by insufficient disk space. For more information, see Temporary Folder in Configuration Options.

Example

This Pipeline demonstrates how the AutoML Snap helps you train models with the H2O engine.

Download this pipeline.

Understanding the pipeline

In this example, the input dataset contains a list of flowers and the length and width of their sepals and petals. The input document from the File Reader Snap is passed through the CSV Parser and Type Converter Snaps. The Type Converter Snap is configured to automatically detect and convert the data types. The output preview of the Type Converter Snap is as follows:

This dataset is passed to the AutoML Snap, which is configured as follows:

In the AutoML Snap, we specify the Label field as $class, which is the flower name. We set the Time limit to 3600 seconds and the Number of models to 10. The Snap tries at most 10 models within the 3600-second time limit using the H2O engine. We select Standard, Tree, XGBoost, and NN as the set of algorithms to be used to derive the best model.
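
The interaction between Time limit and Number of models described in Snap Settings can be sketched as a simple loop. This is an illustrative reading of the documented behavior, not the Snap's implementation. The names `effective_limits`, `build_models`, `train_one`, and `clock` are hypothetical, and two points are assumptions because the settings description does not cover them: that a value of 0 is also capped during validation, and that setting both values to 0 is rejected.

```python
import time

VALIDATION_TIME_CAP = 60   # seconds, per the Time limit description
VALIDATION_MODEL_CAP = 5   # models, per the Number of models description

def effective_limits(time_limit, number_of_models, validating):
    """Return the (time limit, model count) that would apply to a run."""
    if validating:
        # Assumption: 0 ("no limit") is also capped during validation.
        time_limit = min(time_limit or VALIDATION_TIME_CAP, VALIDATION_TIME_CAP)
        number_of_models = min(number_of_models or VALIDATION_MODEL_CAP,
                               VALIDATION_MODEL_CAP)
    return time_limit, number_of_models

def build_models(train_one, time_limit=3600, number_of_models=10,
                 validating=False, clock=time.monotonic):
    """Call train_one repeatedly until either limit is reached."""
    time_limit, number_of_models = effective_limits(
        time_limit, number_of_models, validating)
    if time_limit == 0 and number_of_models == 0:
        # Assumption: this case is not described in the settings.
        raise ValueError("set a time limit, a number of models, or both")
    start = clock()
    models = []
    while True:
        if number_of_models and len(models) >= number_of_models:
            break  # built the requested number of models
        if time_limit and clock() - start >= time_limit:
            break  # ran out of time
        models.append(train_one(len(models)))
    if not models:
        raise RuntimeError("could not build any model within the time limit")
    return models

# A stand-in trainer that returns instantly, so the model count is the limit.
print(len(build_models(lambda i: f"model_{i}")))                   # 10
print(len(build_models(lambda i: f"model_{i}", validating=True)))  # 5
```

With the settings in this example (3600 seconds, 10 models), whichever limit is reached first ends the search.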

The Snap is configured for three output views. The first output view displays the model, the second output view displays the leaderboard, and the third output view displays the report. The leaderboard and report contain statistics of the models built during the process.

The first output preview of the AutoML Snap displays the summary of the model, the metadata, and the model in a serialized format. You can pass this output to a Predictor (Classification) Snap to derive predictions, or you can use the model in an Ultra Task to provide a REST API to an external application.

The second output preview of the AutoML Snap displays the Leaderboard. The first row with rank 1 is the best model.

The model output of the AutoML Snap is converted to JSON using the JSON Formatter Snap and then passed to a File Writer Snap. You can pass this model to the second input view of the same AutoML Snap in subsequent executions of the Pipeline. This guarantees that you get a model that is at least as good as the best one from the previous execution.
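
One way to read this guarantee: if the best model from the previous execution joins the pool of candidates before ranking, the top-ranked model can never score worse than it. The Python sketch below illustrates that idea only. It is not a description of the Snap's internals; the record layout, the `best_model` helper, and the accuracy values are made up, and it assumes a metric where higher is better.

```python
def best_model(new_models, previous_best=None, metric="accuracy"):
    """Pick the top model; higher metric values are assumed to be better."""
    candidates = list(new_models)
    if previous_best is not None:
        # The earlier winner competes with the newly built models.
        candidates.append(previous_best)
    if not candidates:
        raise ValueError("no candidate models")
    return max(candidates, key=lambda model: model[metric])

# Made-up records for illustration only.
previous = {"name": "previous_best", "accuracy": 0.95}
new_run = [
    {"name": "new_1", "accuracy": 0.93},
    {"name": "new_2", "accuracy": 0.94},
]
print(best_model(new_run, previous)["name"])  # previous_best
```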

The leaderboard output of the AutoML Snap is converted to CSV using the CSV Formatter Snap and then passed to a File Writer Snap.

The third output preview of the AutoML Snap displays the report. The report is passed to a Document to Binary Snap and then to a File Writer Snap where you can save and download the report in HTML format. The preview of the report in HTML from the File Writer Snap is as follows:

Click New tab to view the report in a new tab where you have more space to see the details.

Snap Pack History

Release	Snap Pack Version	Date	Type	Updates
November 2024	main29029	13 Nov 2024	Stable	Updated and certified against the current SnapLogic Platform release.
August 2024	main27765	21 Aug 2024	Stable	Upgraded the `org.json.json` library from v20090211 to v20240303, which is fully backward compatible.
May 2024	main26341	08 May 2024	Stable	Updated and certified against the current SnapLogic Platform release.
February 2024	main25112	14 Feb 2024	Stable	Updated and certified against the current SnapLogic Platform release.
November 2023	main23721	08 Nov 2023	Stable	Updated and certified against the current SnapLogic Platform release.
August 2023	main22460	16 Aug 2023	Stable	Updated and certified against the current SnapLogic Platform release.
May 2023	433patches21854	14 Jul 2023	Latest	Fixed an issue with the Cross Validator (Classification) Snap where the native Windows DLL caused the Snaplex to stall.
May 2023	433patches21644	28 Jun 2023	Latest	Improved an error message in the Remote Python Script Snap to explain the reason and resolution for the case where a Python script has errors.
May 2023	main21015	10 May 2023	Stable	Upgraded with the latest SnapLogic Platform release.
February 2023	main19844	09 Feb 2023	Stable	Upgraded with the latest SnapLogic Platform release.
November 2022	main18944	10 Nov 2022	Stable	Upgraded with the latest SnapLogic Platform release.
August 2022	main17386	11 Aug 2022	Stable	Upgraded with the latest SnapLogic Platform release.
4.29	429patches16809	20 Jul 2022	Latest	Removed the log4j dependency from the ML Core Snaps due to security vulnerabilities.
4.29	main15993	14 May 2022	Stable	Upgraded with the latest SnapLogic Platform release.
4.28	main14627	20 Jul 2022	Stable	Upgraded with the latest SnapLogic Platform release.
4.27	427patches13948	07 Jan 2022	Latest	Fixed an issue where a deadlock occurred when data was loaded from both input views in the following Snaps: Predictor (Classification), Predictor (Regression), and Clustering.
4.27	main12833	13 Nov 2021	Stable	Upgraded with the latest SnapLogic Platform release.
4.26	main11181	14 Aug 2021	Stable	Upgraded with the latest SnapLogic Platform release.
4.25	main9554	08 May 2021	Stable	Upgraded with the latest SnapLogic Platform release.
4.24	main8556	13 Feb 2021	Stable	Upgraded with the latest SnapLogic Platform release.
4.23	main7430	14 Nov 2020	Stable	Upgraded with the latest SnapLogic Platform release.
4.22	main6403	12 Sep 2020	Stable	Upgraded with the latest SnapLogic Platform release.
4.21	snapsmrc542	09 May 2020	Stable	Upgraded with the latest SnapLogic Platform release.
4.20 Patch	mlcore8770	18 Mar 2020	Stable	Adds the log4j dependency to the ML Core Snaps to resolve the "`Could not initialize class org.apache.log4j.LogManager`" error.
4.20	snapsmrc535	08 Feb 2020	Stable	Upgraded with the latest SnapLogic Platform release.
4.19	snapsmrc528	14 Nov 2019	Stable	Upgraded with the latest SnapLogic Platform release.
4.18	snapsmrc523	10 Aug 2019	Stable	Upgraded with the latest SnapLogic Platform release.
4.17 Patch	ALL7402	11 Jun 2019	Latest	Pushed automatic rebuild of the latest version of each Snap Pack to SnapLogic UAT and Elastic servers.
4.17	snapsmrc515	11 Jun 2019	Latest	New Snap: Introducing the Clustering Snap, which performs exploratory data analysis to find hidden patterns or groupings in data. Enhanced the AutoML Snap. You can now select algorithms to derive the top models, input the best model generated by another AutoML Snap from a previous execution, and view an interactive HTML report that contains statistics of up to 10 models. Added the Snap Execution field to all Standard-mode Snaps. In some Snaps, this field replaces the existing Execute during preview check box.
4.16	snapsmrc508	16 Feb 2019	Stable	New Snap: Introducing the AutoML Snap, which lets you automate the process of selecting machine learning algorithms and tuning hyperparameters. This Snap gives the best predictive model within the specified time limit.
4.15	snapsmrc500	15 Dec 2018	Stable	New Snap Pack. Perform data modeling operations such as model training, cross-validation, and model-based predictions. Additionally, you can execute Python scripts remotely. Snaps in this Snap Pack are: Cross Validator (Classification), Cross Validator (Regression), Predictor (Classification), Predictor (Regression), Remote Python Script, Trainer (Classification), and Trainer (Regression). Releases the Remote Python Executor account and the Remote Python Executor Dynamic account for the Remote Python Script Snap.