The input is generated by the CSV Generator Snap and is composed of the following fields:
- Relative Compactness
- Surface Area
- Wall Area
- Roof Area
- Overall Height
- Glazing Area
- Glazing Area Distribution
- Heating Load
Use the Cross Validator (Regression) Snap to evaluate how each ML algorithm performs on this dataset.
This input document is passed through the Type Converter Snap, which is configured to automatically detect and convert the data types. In any ML pipeline, you must first analyze the input document using the Profile Snap and the Type Inspector Snap to ensure that there are no null values and that the data types are accurate. This step is skipped in this example for simplicity's sake.
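As a rough illustration of what automatic type detection and a null check involve, the following Python sketch converts string values (as they would arrive from a CSV source) to integers or floats where possible and counts missing values per field. The function names `convert_value`, `convert_record`, and `count_nulls` and the sample rows are hypothetical; this is not how the Type Converter, Profile, or Type Inspector Snaps are implemented.

```python
def convert_value(text):
    """Return text as int or float when it parses as one; empty strings become None."""
    if text is None or text.strip() == "":
        return None
    for cast in (int, float):
        try:
            return cast(text)
        except ValueError:
            continue
    return text  # leave non-numeric values as strings


def convert_record(record):
    """Apply convert_value to every field of one row."""
    return {field: convert_value(value) for field, value in record.items()}


def count_nulls(records):
    """Count, per field, how many rows hold None."""
    counts = {}
    for record in records:
        for field, value in record.items():
            counts[field] = counts.get(field, 0) + (value is None)
    return counts


# Made-up sample rows for illustration, not taken from the actual dataset.
raw_rows = [
    {"Relative Compactness": "0.98", "Overall Height": "7", "Heating Load": "15.55"},
    {"Relative Compactness": "0.90", "Overall Height": "", "Heating Load": "20.84"},
]
typed_rows = [convert_record(row) for row in raw_rows]
null_counts = count_nulls(typed_rows)
```

A non-zero entry in `null_counts` is the kind of finding that would call for cleaning the data before it reaches a training step.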
Below is a preview of the output from the Type Converter Snap:
After preparing the data, the first step is K-fold cross validation. The Cross Validator (Regression) Snap takes the full dataset and randomly splits it into K folds. Each fold serves once as the test set while the remaining folds form the training set, and the results are used to evaluate the selected ML algorithm.
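The idea behind K-fold cross validation can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the Snap's implementation: the helper names (`k_fold_indices`, `fit_line`, `cross_validate`) are hypothetical, the model is a single-feature least-squares line standing in for whichever algorithm is selected, the error metric is mean absolute error, and the data is synthetic.

```python
import random


def k_fold_indices(n_rows, k, seed=0):
    """Shuffle row indices and deal them into k folds of near-equal size."""
    indices = list(range(n_rows))
    random.Random(seed).shuffle(indices)
    return [indices[i::k] for i in range(k)]


def fit_line(xs, ys):
    """Ordinary least squares fit of y = slope * x + intercept for one feature."""
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    var_x = sum((x - mean_x) ** 2 for x in xs)
    cov_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = cov_xy / var_x if var_x else 0.0
    return slope, mean_y - slope * mean_x


def cross_validate(xs, ys, k=5, seed=0):
    """Return the mean absolute error measured on each held-out fold."""
    folds = k_fold_indices(len(xs), k, seed)
    errors = []
    for test_idx in folds:
        held_out = set(test_idx)
        train_idx = [i for i in range(len(xs)) if i not in held_out]
        slope, intercept = fit_line(
            [xs[i] for i in train_idx], [ys[i] for i in train_idx]
        )
        abs_errors = [abs(ys[i] - (slope * xs[i] + intercept)) for i in test_idx]
        errors.append(sum(abs_errors) / len(abs_errors))
    return errors


# Synthetic data for illustration only: y = 2x + 1 exactly.
xs = [float(i) for i in range(20)]
ys = [2.0 * x + 1.0 for x in xs]
fold_errors = cross_validate(xs, ys, k=5)
mean_error = sum(fold_errors) / len(fold_errors)
```

Averaging the per-fold errors gives one score per algorithm, which makes it possible to compare algorithms on the same dataset without setting aside a single fixed test set.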
Below is the configuration of the Cross Validator (Regression) Snap:
The output from this Snap is as shown below:
Optionally, you can write the output from the Cross Validator (Regression) Snap into a file using the downstream File Writer Snap.
Download this pipeline.