Problem Scenario
Taxonomy is the science of classifying organisms, including plants, animals, and microorganisms. In September 1936, R. A. Fisher published a paper titled "The Use of Multiple Measurements in Taxonomic Problems". The paper includes four measurements (sepal length, sepal width, petal length, and petal width) of 150 flowers, with 50 samples of each of three Iris species: Iris setosa, Iris versicolor, and Iris virginica. Fisher demonstrated that linear functions of these measurements are good enough to distinguish the Iris species.
Description
More than 80 years later, the Iris dataset is one of the best-known datasets for people studying Machine Learning and data science. It is a multi-class classification problem with four numeric features. The screenshot below shows a preview of this dataset; there are three types of Iris flowers: setosa, versicolor, and virginica. The sepal and petal measurements are in centimetres. You can find more details about this dataset here. If you are familiar with Python, you can also get this dataset from the scikit-learn library as described here. We will use a neural network to tackle this classification problem, and we will build the model and host it as an API inside the SnapLogic platform.
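If you want to take a quick look at the data in Python before working in SnapLogic, a minimal sketch using scikit-learn and pandas might look like the following (the column and class names come from scikit-learn's bundled copy of the dataset):

```python
# Minimal sketch: preview the Iris dataset locally with scikit-learn and pandas.
import pandas as pd
from sklearn.datasets import load_iris

iris = load_iris()

# Four numeric features (in centimetres) plus the species name for each sample.
df = pd.DataFrame(iris.data, columns=iris.feature_names)
df["species"] = [iris.target_names[t] for t in iris.target]

print(df.shape)                       # (150, 5)
print(df["species"].value_counts())   # 50 samples of each species
print(df.head())
```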
Objectives
- Model Building: Use the Remote Python Script Snap from the ML Core Snap Pack to deploy a Python script that trains a neural network model on the Iris flower dataset.
- Model Testing: Test the model with a few samples.
- Model Hosting: Use the Remote Python Script Snap from the ML Core Snap Pack to deploy a Python script that hosts the model, and schedule an Ultra Task to provide the API.
- Testing the API: Build the API as a Task, then execute the Task to test the API.
Pipelines
We are going to build three pipelines: Model Building, Model Testing, and Model Hosting.
Model Building
In this pipeline, the File Reader Snap reads the training set, which contains 100 samples. The Remote Python Script Snap then trains the model using a neural network algorithm. The model consists of two parts: target_encoder, which describes the mapping from each encoded class to the actual class name, and the serialized model itself. The model is converted into JSON format and saved on the SnapLogic File System (SLFS).
Below is the output of the Remote Python Script Snap.
Python Script
Below is a piece of code from the Remote Python Script Snap used in this pipeline. There are three main functions: snaplogic_init, snaplogic_process, and snaplogic_final. The first function (snaplogic_init) is executed before any input data is consumed. The second function (snaplogic_process) is called once for each incoming document. The last function (snaplogic_final) runs after all incoming documents have been consumed by snaplogic_process.
In snaplogic_init, we create a new session. In snaplogic_process, we simply format the incoming documents, extract the features and target, and store them in lists. Once we have all the data, we build the neural network model in snaplogic_final. We start by encoding the Iris flower names as integers and then one-hot encoding them; at this point, the features and targets are numpy arrays. Our neural network model has one hidden layer with 16 neurons, and we train it with the Adam optimizer (epochs=50 and batch_size=10). After training the model, we use SnapLogicUtil.model_to_text to serialize the neural network model and SnapLogicUtil.encode to serialize the target encoder. The model and target encoder in text format are sent to the next Snap along with the training history.
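The exact script lives inside the Remote Python Script Snap; the following is only a rough sketch of the logic described above, assuming Keras with a TensorFlow backend. The snaplogic_* entry points follow the convention described above, the input field names are hypothetical, and SnapLogicUtil is assumed to be provided by the Snap's Python environment:

```python
# Rough sketch of the training logic described above (not the exact Snap script).
# SnapLogicUtil is assumed to be provided by the Remote Python Script environment;
# the input field names (sepal_length, ..., class) are hypothetical.
import numpy as np
from keras.models import Sequential
from keras.layers import Dense
from keras.utils import to_categorical
from sklearn.preprocessing import LabelEncoder

features, labels = [], []

def snaplogic_init():
    # Runs once before any input document is consumed (e.g. create a session).
    return None

def snaplogic_process(row):
    # Called for each incoming document: collect the features and the label.
    features.append([row["sepal_length"], row["sepal_width"],
                     row["petal_length"], row["petal_width"]])
    labels.append(row["class"])
    return None

def snaplogic_final():
    # Runs after all documents are consumed: encode labels, train, serialize.
    X = np.array(features, dtype=float)

    # Encode species names as integers, then one-hot encode them.
    target_encoder = LabelEncoder()
    y = to_categorical(target_encoder.fit_transform(labels))

    # One hidden layer with 16 neurons, trained with the Adam optimizer.
    model = Sequential()
    model.add(Dense(16, activation="relu", input_dim=X.shape[1]))
    model.add(Dense(y.shape[1], activation="softmax"))
    model.compile(optimizer="adam", loss="categorical_crossentropy",
                  metrics=["accuracy"])
    history = model.fit(X, y, epochs=50, batch_size=10, verbose=0)

    # Serialize the model and target encoder so they can be written to SLFS.
    return [{
        "model": SnapLogicUtil.model_to_text(model),              # assumed helper
        "target_encoder": SnapLogicUtil.encode(target_encoder),   # assumed helper
        "history": history.history,
    }]
```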
Model Testing
In the bottom flow, the File Reader Snap reads the neural network model from SLFS. In the top flow, the CSV Generator Snap contains three samples. The correct labels are setosa, versicolor, and virginica, respectively.
The left picture below shows the content of the CSV Generator Snap. The right picture below shows the predictions from the Remote Python Script Snap.
Python Script
The input of the Remote Python Script Snap can be either the neural network model or a sample. If it is the model, we use SnapLogicUtil.text_to_model to deserialize the model and load it into memory, and SnapLogicUtil.decode to deserialize the target encoder. If the incoming document is a sample, we add it to a queue. Once the model is loaded, we apply the model to the samples in the queue and output the predictions. SnapLogicUtil.predict accepts the model and a sample and returns the prediction. In order to preserve the lineage property in an Ultra Task, SnapLogicUtil.drop_doc is returned for the document describing the model.
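A rough sketch of that logic, under the same assumptions as the training sketch (SnapLogicUtil is provided by the Snap environment, and the check used to recognise the model document is hypothetical):

```python
# Rough sketch of the inference logic described above (not the exact Snap script).
# SnapLogicUtil is assumed to be provided by the Remote Python Script environment.
model = None
target_encoder = None
queue = []   # samples that arrive before the model document has been loaded

def snaplogic_init():
    return None

def snaplogic_process(row):
    global model, target_encoder
    if "model" in row:   # hypothetical check for the document describing the model
        model = SnapLogicUtil.text_to_model(row["model"])
        target_encoder = SnapLogicUtil.decode(row["target_encoder"])
        # Returning drop_doc preserves the lineage property in an Ultra Task.
        return SnapLogicUtil.drop_doc
    # Otherwise the document is a sample: queue it until the model is loaded.
    queue.append(row)
    if model is None:
        return None
    # Apply the model to every queued sample and emit one prediction per sample.
    # (target_encoder is available to map encoded classes back to species names;
    # the exact mechanics depend on the SnapLogicUtil implementation.)
    predictions = [SnapLogicUtil.predict(model, sample) for sample in queue]
    del queue[:]
    return [{"prediction": p} for p in predictions]

def snaplogic_final():
    return None
```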
Model Hosting
The core components of this pipeline are the File Reader, JSON Parser, and Remote Python Script Snaps, which are the same as in the Model Testing pipeline. Instead of taking data from the CSV Generator Snap, the Remote Python Script Snap takes the data from an API request. The Check Token Snap (Router) authenticates the request by checking the token, which can be changed in the pipeline parameters. The Extract Params Snap (Mapper) extracts the data from the request. The Body Wrapper Snap (Mapper) maps the prediction to $content, which becomes the response body. Finally, the CORS Wrapper Snap (Mapper) adds headers to allow Cross-Origin Resource Sharing (CORS).
This pipeline is created as an Ultra Task to provide a REST API to external applications.
Testing the API
In order to test the API, we must first build it as a Task and execute this Task.
Building the API
To build an API from this pipeline, go to the calendar icon in the toolbar. You can use either a Triggered Task or an Ultra Task.
A Triggered Task is suitable for batch processing, since it starts a new pipeline instance for each request. An Ultra Task is preferable for providing a REST API to external applications that require low latency. In this case, we use an Ultra Task. You do not need to specify a bearer token here, since we use the Router Snap to perform authentication inside the pipeline. You can go to Manager by clicking Show tasks in this project in Manager in the Create Task window to see the task details, as shown in the screenshot below.
Testing
After creating the Ultra Task, you can test it. The screenshot shows a sample request and response. Based on the sepal and petal sizes shown below, the pipeline returns setosa as the first prediction.
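If you prefer to call the API from code rather than from a REST client, a minimal sketch using the requests library might look like the following. The URL, token, and field names are placeholders: the real URL comes from the Create Task window (or Manager), and the request shape must match what the Extract Params Snap expects.

```python
# Minimal sketch of calling the hosted API; the URL, token, and field names are
# placeholders and must be replaced with the values from your own project.
import requests

URL = "https://<pod>.snaplogic.io/api/1/rest/feed/<org>/<project>/<task-name>"  # placeholder
TOKEN = "<your-token>"   # must match the token checked by the Check Token (Router) Snap

# A classic Iris setosa sample (sepal/petal sizes in centimetres).
sample = {
    "sepal_length": 5.1,
    "sepal_width": 3.5,
    "petal_length": 1.4,
    "petal_width": 0.2,
}

resp = requests.post(URL, json={"token": TOKEN, "params": sample})
print(resp.status_code)
print(resp.json())   # expected to contain the predicted species, e.g. "setosa"
```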