Problem Scenario

Machine Learning is showing promising results in various technology domains. Healthcare is one of them. Machine Learning can help doctors diagnose patients accurately.

In this use case, we are trying to use machine learning algorithms to predict the progression of diabetes in patients.

Description

The baseline measurements (Age, Sex, BMI, BP, and 6 serum measurements S1, S2, ..., S6) of 442 patients, together with a measure of diabetes progression collected one year after baseline, are available in this paper. Our goal here is to teach the machine to predict the diabetes progression based on these 10 measurements.

The following screenshot shows a preview of this dataset. There are 10 measurements, and the diabetes progression is represented as $Y in the rightmost field.

The live demo is available at our Machine Learning Showcase.

[Screenshot: preview of the diabetes dataset]

Objectives

  1. Cross-Validation: Use the Cross Validator (Regression) Snap from ML Core Snap Pack to perform 10-fold cross-validation with the linear regression algorithm. K-fold cross-validation is a method of evaluating machine learning algorithms by randomly separating a dataset into K chunks. Then, K-1 chunks are used to train the model, which is evaluated on the remaining chunk. This process repeats K times, and the average error and other statistics are computed.
  2. Model Building: Use the Trainer (Regression) Snap from ML Core Snap Pack to build the linear regression model from the training set of 392 samples, then serialize and store it.
  3. Model Evaluation: Use the Predictor (Regression) Snap from ML Core Snap Pack to apply the model to the test set containing the remaining 50 samples and compute the error.
  4. Model Hosting: Use the Predictor (Regression) Snap from ML Core Snap Pack to host the model and build the API using an Ultra Task.
  5. API Testing: Build the API as an Ultra Task, then use the REST Post Snap to send a sample request to the Ultra Task to make sure the API works as expected.

To accomplish the above objectives, we are going to build 5 pipelines (Cross Validation, Model Building, Model Evaluation, Model Hosting, and API Testing) and an Ultra Task. Each of these pipelines is described in the Pipelines section below.

Pipelines

Cross-Validation

In this pipeline, we use the Cross Validator (Regression) Snap to perform 10-fold cross validation using the linear regression algorithm. The result shows that the overall mean absolute error is 44.595256.

In this pipeline:

  1. The File Reader Snap reads the dataset, which is in CSV format.
  2. The CSV Parser Snap converts binary data into documents.
  3. The Type Converter Snap automatically derives the types of data, since the documents from the CSV Parser Snap are text represented by the String data type.
    In this case, the data is converted into either BigInteger or BigDecimal, representing numeric values.
  4. The Cross Validator (Regression) Snap performs 10-fold cross validation using the linear regression algorithm.
  5. Finally, we use the JSON Formatter Snap and File Writer Snap to save the result.

In this case, we save the result on the SnapLogic File System (SLFS), where it can be previewed by clicking the document icon next to the File name in the File Writer Snap, or downloaded from the Manager page.
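The File Reader, CSV Parser, and Type Converter steps can be approximated in plain Python to show what "deriving types" means. This is a minimal sketch, not SnapLogic's implementation: it assumes comma-separated text with a header row, and mimics the Type Converter by turning whole numbers into int (analogous to BigInteger) and other numeric strings into Decimal (analogous to BigDecimal). The function names `derive_type` and `parse_documents` are hypothetical.

```python
import csv
import io
from decimal import Decimal, InvalidOperation


def derive_type(value):
    """Convert a CSV string to int or Decimal when possible, else keep the string."""
    text = value.strip()
    try:
        return int(text)            # whole numbers, like BigInteger
    except ValueError:
        pass
    try:
        return Decimal(text)        # fractional numbers, like BigDecimal
    except InvalidOperation:
        return value                # non-numeric text stays a String


def parse_documents(csv_text):
    """Parse CSV text (header row + data rows) into typed documents (dicts)."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return [{field: derive_type(raw) for field, raw in row.items()} for row in reader]
```

For example, `parse_documents("AGE,SEX,BMI,Y\n59,2,32.1,151\n")` (illustrative values) yields one document in which AGE, SEX, and Y are integers and BMI is a Decimal.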


The screenshot below shows that the overall mean absolute error is 44.595256.

Info

You may try other regression algorithms in the Cross Validator (Regression) Snap and see which algorithm performs best on this dataset.



[Screenshot: cross-validation result showing the overall mean absolute error]
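The K-fold procedure described in the objectives can be sketched in pure Python. This is an illustration under stated assumptions, not the Cross Validator (Regression) Snap's internals: rows are shuffled with a fixed seed and dealt into K chunks, a linear regression is fitted by ordinary least squares (normal equations solved with Gauss-Jordan elimination) on K-1 chunks, and the mean absolute error on the held-out chunk is averaged over the K folds. Because the fold assignment differs, running it on the diabetes data should not be expected to reproduce 44.595256 exactly. All function names are hypothetical.

```python
import random


def fit_linear_regression(X, y):
    """Ordinary least squares: solve (A^T A) w = A^T y, A = X with a leading column of ones.

    Returns [intercept, coef_1, ..., coef_p]. Assumes A^T A is invertible.
    """
    rows = [[1.0] + [float(v) for v in x] for x in X]
    p = len(rows[0])
    # Augmented normal-equation matrix [A^T A | A^T y].
    M = [[sum(r[i] * r[j] for r in rows) for j in range(p)]
         + [sum(r[i] * float(t) for r, t in zip(rows, y))]
         for i in range(p)]
    # Gauss-Jordan elimination with partial pivoting.
    for col in range(p):
        pivot = max(range(col, p), key=lambda row: abs(M[row][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(p):
            if r != col:
                factor = M[r][col] / M[col][col]
                M[r] = [a - factor * b for a, b in zip(M[r], M[col])]
    return [M[i][p] / M[i][i] for i in range(p)]


def predict(weights, x):
    """Linear prediction: intercept + sum of coefficient * feature."""
    return weights[0] + sum(w * float(v) for w, v in zip(weights[1:], x))


def k_fold_mae(X, y, k=10, seed=0):
    """Average the per-fold mean absolute error over k folds (assumes len(X) >= k)."""
    indices = list(range(len(X)))
    random.Random(seed).shuffle(indices)           # random assignment of rows to folds
    folds = [indices[i::k] for i in range(k)]      # k disjoint, roughly equal chunks
    fold_maes = []
    for held_out in folds:
        held = set(held_out)
        train = [i for i in indices if i not in held]   # the other k-1 chunks
        w = fit_linear_regression([X[i] for i in train], [y[i] for i in train])
        errors = [abs(predict(w, X[i]) - y[i]) for i in held_out]
        fold_maes.append(sum(errors) / len(errors))
    return sum(fold_maes) / k
```

With the 10 measurements as `X` and $Y as `y`, `k_fold_mae(X, y, k=10)` mirrors the 10-fold setup used in this pipeline.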

Model Building

In this pipeline, we use the Trainer (Regression) Snap to build the model from the training set using the linear regression algorithm.

  1. The File Reader Snap reads the training set containing 392 samples.
  2. The CSV Parser Snap converts binary data into documents.
  3. Since the documents from the CSV Parser Snap are text represented by the String data type, we use the Type Converter Snap to automatically derive the types of data.
  4. The Trainer (Regression) Snap trains the model using the linear regression algorithm.
    The model consists of two parts: metadata describing the schema (field names and types) of the dataset, and the actual model. Both metadata and model are serialized. If the Readable option in the Trainer (Regression) Snap is checked, the readable model will also be generated.
  5. The model is written as a JSON file on the SLFS using the JSON Formatter Snap and the File Writer Snap.

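To make the "metadata plus model" idea concrete, here is a minimal Python sketch that bundles a schema (field names and type names) with linear regression weights and writes them as JSON. The JSON layout (`metadata`, `schema`, `model`, `intercept`, `coefficients`) and the function names are hypothetical; they are not the Trainer (Regression) Snap's actual serialization format.

```python
import json


def describe_schema(document):
    """Metadata: map each field name to the name of its value's type."""
    return {name: type(value).__name__ for name, value in document.items()}


def serialize_model(weights, feature_names, sample_document):
    """Bundle schema metadata and linear-regression weights into a JSON string."""
    payload = {
        "metadata": {"schema": describe_schema(sample_document)},
        "model": {
            "algorithm": "linear_regression",
            "intercept": weights[0],
            "coefficients": dict(zip(feature_names, weights[1:])),
        },
    }
    return json.dumps(payload, indent=2)   # indented, human-readable JSON


def load_model(text):
    """Restore the intercept and per-feature coefficients from the JSON string."""
    model = json.loads(text)["model"]
    return model["intercept"], model["coefficients"]
```

Keeping the schema next to the weights lets a later consumer check that incoming documents have the fields and types the model was trained on before predicting.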

Model Evaluation

In this pipeline, the model generated above is evaluated against the test set.

  1. The Predictor (Regression) Snap has two input views: the first input view is for the test set, and the second input view accepts the model generated in the previous pipeline. In this case, the Predictor (Regression) Snap predicts the progression of diabetes.
  2. The predictions from the Predictor (Regression) Snap are merged with the real diabetes progression (answer) from the Mapper Snap, which extracts the $Y field from the test set.
  3. The result of merging is displayed in the screenshot below.
    [Screenshot: merged predictions and real diabetes progression]
  4. After that, we use the Aggregate Snap to compute the mean absolute error and mean squared error, which are 32.804 and 1793.410 respectively.
  5. The result is then saved using the CSV Formatter Snap and File Writer Snap.
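The merge and Aggregate steps compute two standard metrics: mean absolute error, the average of |Y - prediction|, and mean squared error, the average of (Y - prediction)². A minimal Python sketch, assuming predictions and real values arrive in the same order (as when the two streams are merged positionally); the field names `pred` and `Y` and the function names are hypothetical.

```python
def merge(predictions, answers):
    """Pair each prediction with the real value; assumes both lists share the same order."""
    return [{"pred": p, "Y": y} for p, y in zip(predictions, answers)]


def error_metrics(merged):
    """Return (mean absolute error, mean squared error) over merged documents."""
    n = len(merged)
    mae = sum(abs(d["pred"] - d["Y"]) for d in merged) / n
    mse = sum((d["pred"] - d["Y"]) ** 2 for d in merged) / n
    return mae, mse
```

For example, predictions 110, 90, and 150 against real values 100, 100, and 140 are each off by 10, giving an MAE of 10 and an MSE of 100.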


Model Hosting

This pipeline is scheduled as an Ultra Task to provide a REST API that is accessible by external applications. The core components of this pipeline are the File Reader, JSON Parser, and Predictor (Regression) Snaps, which are the same as in the Model Evaluation pipeline. In this pipeline:

  1. Instead of taking the data from the test set, the Predictor (Regression) Snap takes the data from the API request.
  2. The Filter Snap is used to authenticate the request by checking the token, which can be changed in the pipeline parameters.
  3. The Extract Params Snap (Mapper) extracts the required fields from the request.
  4. The Prepare Response Snap (Mapper) maps the prediction to $content.pred, which will be the response body. This Snap also adds headers to allow Cross-Origin Resource Sharing (CORS).
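The request flow of the hosting pipeline (Filter, Extract Params, Predictor, Prepare Response) can be sketched as a single Python function. This is an illustration under assumptions, not how the Ultra Task runs: the request body is assumed to carry `token` and `params` (matching the fields used in the API Testing pipeline), the field names in `FEATURES`, the token value, and the exact CORS header values are hypothetical, and a rejected request simply produces no response document (`None`).

```python
EXPECTED_TOKEN = "change-me"   # hypothetical; in the pipeline this is a pipeline parameter
FEATURES = ["AGE", "SEX", "BMI", "BP", "S1", "S2", "S3", "S4", "S5", "S6"]  # assumed names


def handle_request(body, weights, token=EXPECTED_TOKEN):
    """Authenticate, extract features, predict, and shape the response document."""
    # Filter: drop requests whose token does not match.
    if body.get("token") != token:
        return None
    # Extract Params: keep only the required fields, in a fixed order.
    params = body["params"]
    x = [float(params[name]) for name in FEATURES]
    # Predictor: apply a linear model, weights = [intercept, coef_1, ..., coef_10].
    pred = weights[0] + sum(w * v for w, v in zip(weights[1:], x))
    # Prepare Response: prediction under content.pred, plus CORS headers.
    return {
        "content": {"pred": pred},
        "headers": {
            "Access-Control-Allow-Origin": "*",
            "Access-Control-Allow-Headers": "Content-Type",
        },
    }
```

Checking the token inside the pipeline is what makes a separate bearer token unnecessary when the Ultra Task is created.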


Testing the API

In order to test the API, we must first build it as a Task and execute this Task.


Building API

To deploy this pipeline as a REST API, perform the following:

  1. Click the calendar icon in the toolbar. You can use either a Triggered Task or an Ultra Task.

    Info

    A Triggered Task is good for batch processing since it starts a new pipeline instance for each request. An Ultra Task is better suited to providing a REST API to external applications that require low latency.

    In this case, the Ultra Task is preferable. A bearer token is not needed here since the Filter Snap performs authentication inside the pipeline.

  2. To get the URL, click Show tasks in this project in Manager in the Create Task window.
  3. Click the small triangle next to the task and then Details.
    The task details show up with the URL.

Testing

After creating the Ultra Task, you can test it by sending a sample request. For one sample patient, based on the 10 measurements, the pipeline returns 103.88 as the predicted diabetes progression, while the expected diabetes progression of this patient is 118.

API Testing

In this pipeline:

  1. A sample request is generated by the JSON Generator Snap.
  2. The request is sent to the Ultra Task by the REST Post Snap.
  3. The Mapper Snap is used to extract the response, which is in $response.entity.

The following is the content of the JSON Generator Snap. It contains $token and $params, which will be included in the request body sent by the REST Post Snap.
[Screenshot: content of the JSON Generator Snap]

The REST Post Snap gets the URL from the pipeline parameters. Your URL can be found on the Manager page. In some cases, you may need to check Trust all certificates in the REST Post Snap.


The output of the REST Post Snap is shown below. The last Mapper Snap is used to extract $response.entity from the response. In this case, the predicted diabetes progression is 199.95.

[Screenshot: output of the REST Post Snap]
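Outside SnapLogic, the same test can be reproduced with Python's standard library: build a POST request whose JSON body carries the token and params (like the JSON Generator output), send it, and pull the prediction out of the response body (like the final Mapper reading $response.entity). This is a sketch under assumptions: the URL is whatever your Manager page shows, the response body is assumed to be `{"pred": ...}` as produced by the Prepare Response Snap, and disabling certificate checks is only a rough analogue of Trust all certificates, suitable for testing only. Function names are hypothetical.

```python
import json
import ssl
import urllib.request


def build_request(url, token, params):
    """POST request with a JSON body carrying token and params."""
    body = json.dumps({"token": token, "params": params}).encode("utf-8")
    return urllib.request.Request(url, data=body, method="POST",
                                  headers={"Content-Type": "application/json"})


def send(request, trust_all_certificates=False, timeout=30):
    """Send the request and return the parsed response body (the 'entity')."""
    context = None
    if trust_all_certificates:
        # Rough analogue of "Trust all certificates": skips TLS verification (testing only).
        context = ssl.create_default_context()
        context.check_hostname = False
        context.verify_mode = ssl.CERT_NONE
    with urllib.request.urlopen(request, timeout=timeout, context=context) as response:
        return json.loads(response.read().decode("utf-8"))


def extract_prediction(entity):
    """Mimic the last Mapper: read the prediction from the response entity."""
    return entity["pred"]
```

A typical call is `extract_prediction(send(build_request(task_url, token, params)))`, where `task_url`, `token`, and `params` are placeholders for your Ultra Task URL, the token configured in the pipeline parameters, and the 10 measurements.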