Iris Flower Classification using Neural Networks
Problem Scenario
Taxonomy is the science of classifying organisms, including plants, animals, and microorganisms. In September 1936, R. A. Fisher published a paper named "The Use of Multiple Measurements in Taxonomic Problems". The paper includes four measurements (sepal length, sepal width, petal length, and petal width) for each of 150 flowers, with 50 samples for each of three types of Iris flower: Iris setosa, Iris versicolor, and Iris virginica. The author demonstrated that linear functions of these measurements can distinguish the types of Iris flowers well.
Description
Almost 100 years later, the Iris dataset is one of the best-known datasets among people who study machine learning and data science. It poses a multi-class classification problem with four numeric features. The screenshot below shows a preview of this dataset, which covers three types of Iris flowers: setosa, versicolor, and virginica. The numbers indicating the sizes of the sepals and petals are in centimeters. You can find more details about this dataset here. If you are familiar with Python, you can also get this dataset from the scikit-learn library as described here. We will build a simple neural network to tackle this classification problem. Then, we will host the model as an API inside the SnapLogic platform.
Objectives
- Model Building: Use the Remote Python Script Snap from the ML Core Snap Pack to deploy a Python script that trains a neural network model on the Iris flower dataset.
- Model Testing: Test the model with a few samples.
- Model Hosting: Use the Remote Python Script Snap from the ML Core Snap Pack to deploy a Python script that hosts the model, and schedule an Ultra Task to provide the API.
- API Testing: Use the REST Post Snap to send a sample request to the Ultra Task to make sure the API is working as expected.
We are going to build four pipelines (Model Building, Model Testing, Model Hosting, and API Testing) and an Ultra Task to accomplish the above objectives. Each of these pipelines is described in the Pipelines section below.
Pipelines
Model Building
In this pipeline, the File Reader Snap reads the training set containing 100 samples. Then, the Remote Python Script Snap trains a neural network model. The output consists of two parts: target_encoder, which describes the mapping from each encoded class to the actual Iris flower name, and the serialized model itself. The output is converted into JSON format and saved on the SnapLogic File System (SLFS) using the JSON Formatter Snap and the File Writer Snap.
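As a rough illustration of the two-part result described above, the sketch below builds a class mapping and a serialized model and packs both into one JSON document. It is not the script used in this pipeline: the helper names (build_target_encoder, serialize_model, deserialize_model), the key names, the placeholder model object, and the pickle-plus-Base64 encoding are all assumptions, chosen so the example runs with only the Python standard library.

```python
import base64
import json
import pickle


def build_target_encoder(labels):
    """Map each encoded class index to its flower name (hypothetical format)."""
    names = sorted(set(labels))
    # JSON object keys must be strings, so the index is stored as text.
    return {str(index): name for index, name in enumerate(names)}


def serialize_model(model_obj):
    """Turn a Python object into JSON-safe text (assumed encoding)."""
    return base64.b64encode(pickle.dumps(model_obj)).decode("ascii")


def deserialize_model(text):
    """Reverse serialize_model."""
    return pickle.loads(base64.b64decode(text.encode("ascii")))


# Placeholder standing in for a trained network; it holds no real weights.
placeholder_model = {"description": "placeholder for a trained network"}

labels = ["setosa", "versicolor", "virginica", "setosa"]
document = {
    "target_encoder": build_target_encoder(labels),
    "model": serialize_model(placeholder_model),
}

# The document survives a JSON round trip, as it would when written to a file.
restored = json.loads(json.dumps(document))
restored_model = deserialize_model(restored["model"])
print(restored["target_encoder"])
```

Encoding the binary model as text is what lets it travel inside a JSON document. Note that pickle data should only be loaded from a trusted source.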
The Remote Python Script Snap executes a Python script on the Remote Python Executor (RPE). If no account is provided, the Snap assumes that the RPE is at localhost:5301 and uses no token.
Below is the output of the Remote Python Script Snap.
Python Script
Below is the script from the Remote Python Script Snap used in this pipeline. There are three main functions: snaplogic_init, snaplogic_process, and snaplogic_final. The first function (snaplogic_init) is executed before any input data is consumed. The second function (snaplogic_process) is called on each incoming document. The last function (snaplogic_final) is called after all incoming documents have been consumed by snaplogic_process.
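The call order described above can be sketched as follows. Only the three function names come from the script. Everything else is an assumption made for illustration: the field names in the sample documents, the summary returned by snaplogic_final, the convention that a None return value emits no output, and the run_locally driver, which merely imitates the described call order so that the example runs outside SnapLogic.

```python
rows = []  # documents collected across calls to snaplogic_process


def snaplogic_init():
    # Called once before any input document is consumed.
    return None


def snaplogic_process(row):
    # Called once per incoming document; here we only buffer the data.
    rows.append(row)
    return None


def snaplogic_final():
    # Called once after all documents have been processed.
    counts = {}
    for row in rows:
        counts[row["class"]] = counts.get(row["class"], 0) + 1
    return {"num_rows": len(rows), "class_counts": counts}


def run_locally(documents):
    """Hypothetical driver that calls the three functions in order."""
    outputs = []
    results = [snaplogic_init()]
    results += [snaplogic_process(doc) for doc in documents]
    results.append(snaplogic_final())
    # Assumed convention: None means "emit nothing".
    outputs.extend(result for result in results if result is not None)
    return outputs


# Three rows from the Iris dataset; the field names are assumed.
sample_documents = [
    {"sepal_length": 5.1, "sepal_width": 3.5, "petal_length": 1.4,
     "petal_width": 0.2, "class": "setosa"},
    {"sepal_length": 4.9, "sepal_width": 3.0, "petal_length": 1.4,
     "petal_width": 0.2, "class": "setosa"},
    {"sepal_length": 7.0, "sepal_width": 3.2, "petal_length": 4.7,
     "petal_width": 1.4, "class": "versicolor"},
]

outputs = run_locally(sample_documents)
print(outputs)
```

Buffering rows in snaplogic_process and doing the work in snaplogic_final suits training, where the whole dataset is needed before a model can be fitted.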
We use SLTool.ensure to automatically install the required libraries. The SLTool class contains useful methods such as ensure, execute, encode, and decode. In this case, we need scikit-learn, keras, and tensorflow. The prebuilt tensorflow 1.5.0 binaries do not use newer CPU optimizations such as AVX instructions, so this version is recommended for old CPUs.
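This page does not show how SLTool.ensure works internally, so the sketch below is only a conceptual stand-in for the idea of installing a pinned library when it is missing. The function names needs_install and pip_install_command are hypothetical, and the snippet only builds the pip command rather than running it.

```python
import sys
from importlib import metadata


def needs_install(package, version):
    """Return True if the package is missing or installed at another version."""
    try:
        return metadata.version(package) != version
    except metadata.PackageNotFoundError:
        return True


def pip_install_command(package, version):
    """Build (but do not run) a pip command that pins the requested version."""
    return [sys.executable, "-m", "pip", "install", f"{package}=={version}"]


# tensorflow 1.5.0 is the version named in the text above.
package, version = "tensorflow", "1.5.0"
if needs_install(package, version):
    command = pip_install_command(package, version)
    print("Would run:", " ".join(command))
else:
    print(package, version, "is already installed")
```

Pinning an exact version keeps the training and hosting environments consistent, which matters when a serialized model is later loaded by a different pipeline.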