The following table describes the set of Pipelines used in the Telco Customer Churn Prediction use case. You can reuse these Pipelines by connecting them to your own data sources.
|Profiling. This Pipeline reads the dataset from the SnapLogic File System (SLFS), performs type conversion, and computes data statistics, which are saved to SLFS in JSON format.|
|Data Preparation. This Pipeline also reads the dataset from SLFS and performs type conversion. The Mapper Snap then removes the id field from the dataset, and the Clean Missing Value Snap replaces all missing values in the dataset with the average value. The average values are part of the data statistics computed by the Profiling Pipeline; the File Reader Snap reads these statistics.|
|Cross Validation. We have two Pipelines in this step.|
The top Pipeline (child Pipeline) performs k-fold cross validation using a specific ML algorithm.
The bottom Pipeline (parent Pipeline) uses the Pipeline Execute Snap to automate k-fold cross validation across multiple algorithms: the Pipeline Execute Snap spawns and executes the child Pipeline multiple times, each time with a different algorithm. Instances of the child Pipeline can be executed sequentially or in parallel; parallel execution speeds up the process by taking advantage of multi-core processors. The Aggregate Snap applies the max function to find the algorithm with the best result.
|Model Building. Once you know which algorithm performs best on your dataset, this Pipeline builds the model using the Trainer (Classification) Snap. You can store this model in JSON, binary, or other formats.|
|Model Hosting. This Pipeline is scheduled as an Ultra Task to provide a REST API to external applications. Requests arrive through the open input view. The key Snap in this Pipeline is Predictor (Classification), which hosts the ML model from the JSON Parser Snap and consumes requests from the Extract Params (Mapper) Snap. It applies the ML model to the data in the request and generates a prediction.|
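
To make the data flow of the Profiling, Data Preparation, and Cross Validation steps easier to follow, the logic can be approximated outside SnapLogic in plain Python. This is only a minimal sketch and not how the Snaps are implemented. The field names (`customerID`, `tenure`, `Churn`), the six-row toy dataset, and the two toy "algorithms" are assumptions made for illustration; the statistics file in SLFS is imitated by a JSON string.

```python
import json
import statistics
from collections import Counter

# Hypothetical toy data; the field names are assumptions, not the real schema.
raw_rows = [
    {"customerID": "A", "tenure": "1", "Churn": "Yes"},
    {"customerID": "B", "tenure": "2", "Churn": "Yes"},
    {"customerID": "C", "tenure": "", "Churn": "No"},   # missing value
    {"customerID": "D", "tenure": "40", "Churn": "No"},
    {"customerID": "E", "tenure": "50", "Churn": "No"},
    {"customerID": "F", "tenure": "3", "Churn": "Yes"},
]

def profile(rows, numeric_fields):
    """Profiling: convert types and compute the mean of each numeric field."""
    stats = {}
    for field in numeric_fields:
        values = [float(r[field]) for r in rows if r.get(field) not in (None, "")]
        stats[field] = {"mean": statistics.fmean(values)}
    return stats

def prepare(rows, stats, id_field="customerID"):
    """Data Preparation: drop the id field, fill missing values with the mean."""
    prepared = []
    for r in rows:
        clean = {k: v for k, v in r.items() if k != id_field}
        for field, s in stats.items():
            v = clean.get(field)
            clean[field] = s["mean"] if v in (None, "") else float(v)
        prepared.append(clean)
    return prepared

def train_majority(train, label):
    """Toy algorithm 1: always predict the most common training label."""
    majority = Counter(r[label] for r in train).most_common(1)[0][0]
    return lambda row: majority

def train_tenure_threshold(train, label):
    """Toy algorithm 2: predict churn when tenure is below the training mean."""
    cut = statistics.fmean(r["tenure"] for r in train)
    return lambda row: "Yes" if row["tenure"] < cut else "No"

def k_fold_accuracy(rows, label, train_fn, k=3):
    """Child Pipeline: k-fold cross validation for one algorithm.

    Assumes len(rows) >= k so that no fold is empty.
    """
    folds = [rows[i::k] for i in range(k)]
    scores = []
    for i, test in enumerate(folds):
        train = [r for j, fold in enumerate(folds) if j != i for r in fold]
        predict = train_fn(train, label)
        correct = sum(predict(r) == r[label] for r in test)
        scores.append(correct / len(test))
    return statistics.fmean(scores)

# Profiling writes statistics as JSON; Data Preparation reads them back.
stats_json = json.dumps(profile(raw_rows, ["tenure"]))
prepared = prepare(raw_rows, json.loads(stats_json))

# Parent Pipeline: run the child once per algorithm, then take the max.
algorithms = {
    "majority": train_majority,
    "tenure_threshold": train_tenure_threshold,
}
results = {name: k_fold_accuracy(prepared, "Churn", fn) for name, fn in algorithms.items()}
best = max(results, key=results.get)
print(results, best)
```

The final `max` call plays the role that the Aggregate Snap has in the parent Pipeline. The dictionary comprehension runs the evaluations sequentially; since each evaluation is independent of the others, they could also be dispatched in parallel, which is the option the Pipeline Execute Snap offers.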