The input document is generated by the CSV Generator Snap and consists of five fields, one for classification and four numeric:
- Balance Class: The classification field that denotes the state of the weighing scale: B for Balanced, L for Left-inclined, or R for Right-inclined.
- Left Weight
- Left Distance
- Right Weight
- Right Distance
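To make the record layout concrete, the following Python sketch models one row of the input document. The field names mirror the list above. The labeling rule (compare left weight times left distance against right weight times right distance) is an assumption based on the usual balance-scale formulation and is not stated in this example; the sample values are made up for illustration.

```python
from dataclasses import dataclass


@dataclass
class ScaleRecord:
    """One row of the input document (field names follow the list above)."""
    left_weight: int
    left_distance: int
    right_weight: int
    right_distance: int


def balance_class(rec: ScaleRecord) -> str:
    """Return B, L, or R under the ASSUMED rule: compare weight * distance per side."""
    left = rec.left_weight * rec.left_distance
    right = rec.right_weight * rec.right_distance
    if left == right:
        return "B"
    return "L" if left > right else "R"


# Illustrative values only, not taken from the actual dataset.
print(balance_class(ScaleRecord(2, 3, 3, 2)))  # B
print(balance_class(ScaleRecord(5, 4, 1, 1)))  # L
print(balance_class(ScaleRecord(1, 1, 4, 5)))  # R
```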
Use the Cross Validator (Classification) Snap to evaluate how each ML algorithm performs on this dataset.
This input document is passed through the Type Converter Snap, which is configured to automatically detect and convert the data types. In any ML pipeline, you must first analyze the input document using the Profile Snap and the Type Inspector Snap to ensure that there are no null values and that the data types are accurate. This step is skipped in this example for simplicity.
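Outside the pipeline, the same two ideas (automatic type detection and a null check) can be approximated in a few lines of Python. This is a rough sketch of the concept, not the behavior of the Type Converter, Profile, or Type Inspector Snaps; the helper names `convert_value`, `convert_row`, and `null_fields` are hypothetical, and the sample rows are invented.

```python
from typing import Any, Dict, List


def convert_value(raw: Any) -> Any:
    """Convert a string to int or float when it parses cleanly; otherwise keep it."""
    if not isinstance(raw, str):
        return raw
    text = raw.strip()
    if text == "":
        return None  # assumption: an empty CSV cell means a missing value
    for caster in (int, float):
        try:
            return caster(text)
        except ValueError:
            continue
    return text


def convert_row(row: Dict[str, Any]) -> Dict[str, Any]:
    """Apply convert_value to every field of one record."""
    return {key: convert_value(value) for key, value in row.items()}


def null_fields(rows: List[Dict[str, Any]]) -> Dict[str, int]:
    """Count missing values per field so that problems surface before training."""
    counts: Dict[str, int] = {}
    for row in rows:
        for key, value in row.items():
            if value is None:
                counts[key] = counts.get(key, 0) + 1
    return counts


# Invented sample rows; CSV input arrives as strings.
raw_rows = [
    {"Balance Class": "B", "Left Weight": "2", "Left Distance": "3",
     "Right Weight": "3", "Right Distance": "2"},
    {"Balance Class": "L", "Left Weight": "5", "Left Distance": "4",
     "Right Weight": "1", "Right Distance": ""},
]
rows = [convert_row(r) for r in raw_rows]
print(rows[0]["Left Weight"] + 1)  # 3, the value is now an int
print(null_fields(rows))           # {'Right Distance': 1}
```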
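After preparing the data, the first step is K-fold cross validation. The Cross Validator (Classification) Snap takes the full dataset and randomly splits it into training and test sets, which are used to evaluate the selected ML algorithm. In K-fold cross validation, the dataset is divided into K folds, and each fold serves once as the test set while the remaining folds form the training set.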
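For readers who want to see the mechanics, here is a minimal sketch of K-fold cross validation in plain Python. It illustrates the general technique only and makes no claim about how the Snap is implemented. The nearest-neighbour classifier is a stand-in chosen for brevity, the function names are hypothetical, and the toy rows are invented.

```python
import random
from typing import Callable, List, Sequence, Tuple

Row = Tuple[Sequence[float], str]  # (numeric features, class label)


def k_fold_indices(n: int, k: int, seed: int = 0) -> List[List[int]]:
    """Shuffle row indices and deal them into k folds of near-equal size."""
    indices = list(range(n))
    random.Random(seed).shuffle(indices)
    return [indices[i::k] for i in range(k)]


def nearest_neighbour(train: List[Row], features: Sequence[float]) -> str:
    """Placeholder classifier: label of the closest training row."""
    def squared_distance(row: Row) -> float:
        return sum((a - b) ** 2 for a, b in zip(row[0], features))
    return min(train, key=squared_distance)[1]


def cross_validate(
    rows: List[Row],
    k: int,
    predict: Callable[[List[Row], Sequence[float]], str] = nearest_neighbour,
    seed: int = 0,
) -> float:
    """Return overall accuracy; each row is predicted once by a model that never saw it."""
    if not 2 <= k <= len(rows):
        raise ValueError("k must be between 2 and the number of rows")
    correct = 0
    for held_out in k_fold_indices(len(rows), k, seed):
        held = set(held_out)
        train = [rows[i] for i in range(len(rows)) if i not in held]
        correct += sum(predict(train, rows[i][0]) == rows[i][1] for i in held_out)
    return correct / len(rows)


# Invented toy data: two well-separated clusters.
toy_rows: List[Row] = [
    ((0.0, 0.0), "A"), ((0.1, 0.0), "A"), ((0.0, 0.1), "A"),
    ((5.0, 5.0), "B"), ((5.1, 5.0), "B"), ((5.0, 5.1), "B"),
]
print(cross_validate(toy_rows, k=3))  # 1.0 on this toy data
```

Because every row lands in exactly one held-out fold, the reported accuracy uses the whole dataset for testing without ever scoring a row against a model trained on it.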
Below is the configuration of the Cross Validator (Classification) Snap:
The output from this Snap is shown below. The selected algorithm achieves 92% accuracy on the provided dataset.
Optionally, you can write the output from the Cross Validator (Classification) Snap into a file using the downstream File Writer Snap.
You can now train the model using this algorithm in the Trainer (Classification) Snap. See Weight Balance Classification – Model Training for details.