In this example, the CSV Generator Snap named Train Dataset contains Iris flower samples with the length and width of their sepals and petals. The Type Converter Snap is configured to automatically detect and convert the data types, so all four fields (sepal_length, sepal_width, petal_length, and petal_width) are converted into numeric fields. The output preview of the Type Converter Snap is as follows:
This dataset is passed to the Clustering Snap, which is configured as follows:
- Algorithm: K-Means
- Max cluster: 3
- Pass through: Selected
Because K-Means is selected and Max cluster is set to 3, the Snap creates exactly 3 clusters. Pass through is selected so that all the input fields are included in the output.
The Clustering Snap is configured for two output views. The first output view displays the data and the cluster index, and the second output view displays the model.
The preview from the first output view of the Clustering Snap is as follows:
The dataset is grouped into 3 clusters, as shown in the $pred column. The Clustering Snap computes the difference between each pair of documents and groups the most similar ones into the same cluster. Each algorithm clusters documents in its own way; this output is the result of the K-Means algorithm.
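The grouping described above can be sketched outside the Snap in a few lines of Python. This is a minimal illustration of textbook K-Means, not the Snap's actual implementation: the function name `kmeans`, the Euclidean distance measure, the fixed iteration count, and the six Iris-style sample rows are all assumptions made for the example. It mimics the Pass through setting by keeping every input field and adding a `pred` field with the cluster index.

```python
import math
import random

FIELDS = ["sepal_length", "sepal_width", "petal_length", "petal_width"]


def nearest(point, centroids):
    """Return the index of the centroid closest to point (Euclidean distance)."""
    return min(range(len(centroids)), key=lambda c: math.dist(point, centroids[c]))


def kmeans(rows, fields, k, iterations=20, seed=0):
    """Hypothetical helper: cluster rows into at most k groups.

    Returns (rows with an added 'pred' field, list of centroids).
    """
    rng = random.Random(seed)
    points = [[row[f] for f in fields] for row in rows]
    # Start from k input points chosen at random.
    centroids = [list(p) for p in rng.sample(points, k)]
    for _ in range(iterations):
        labels = [nearest(p, centroids) for p in points]
        # Move each centroid to the mean of the points assigned to it.
        for c in range(k):
            members = [p for p, label in zip(points, labels) if label == c]
            if members:  # an empty cluster keeps its previous centroid
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    labels = [nearest(p, centroids) for p in points]
    # "Pass through": keep all input fields and append the cluster index.
    return [dict(row, pred=label) for row, label in zip(rows, labels)], centroids


# Illustrative Iris-style measurements (assumed sample data, already numeric).
samples = [
    dict(zip(FIELDS, values))
    for values in [
        (5.1, 3.5, 1.4, 0.2),
        (4.9, 3.0, 1.4, 0.2),
        (7.0, 3.2, 4.7, 1.4),
        (6.4, 3.2, 4.5, 1.5),
        (6.3, 3.3, 6.0, 2.5),
        (5.8, 2.7, 5.1, 1.9),
    ]
]

clustered, centroids = kmeans(samples, FIELDS, k=3)
for row in clustered:
    print(row)
```

Each printed row contains the four original measurements plus `pred`, which corresponds to the $pred column in the preview. The cluster numbers themselves are arbitrary labels and depend on the random starting points.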
The preview from the second output view of the Clustering Snap is as follows:
The output preview shows the model. The model output of the Clustering Snap is converted to JSON using the JSON Formatter Snap and then passed to a File Writer Snap. This model can then be used in another Pipeline with a Clustering Snap to derive predictions.
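To make the idea of reusing a saved model concrete, the following sketch serializes a K-Means model to JSON, restores it, and assigns a new sample to the nearest cluster. The model layout shown here (a `fields` list plus a `centroids` list), the centroid values, and the `predict` helper are assumptions for illustration only; the actual structure of the model produced by the Clustering Snap may differ.

```python
import json
import math

# Hypothetical model layout: one centroid per cluster over the four input fields.
# The centroid values are made up for illustration.
model = {
    "algorithm": "K-Means",
    "fields": ["sepal_length", "sepal_width", "petal_length", "petal_width"],
    "centroids": [
        [5.0, 3.4, 1.5, 0.2],
        [5.9, 2.8, 4.3, 1.3],
        [6.6, 3.0, 5.6, 2.0],
    ],
}

# Serialize the model to JSON text, as a formatter step would before writing a file.
model_json = json.dumps(model)

# Later, possibly in a different process, restore the model from the JSON text.
restored = json.loads(model_json)


def predict(model, row):
    """Return the index of the centroid nearest to the row (Euclidean distance)."""
    point = [row[field] for field in model["fields"]]
    distances = [math.dist(point, centroid) for centroid in model["centroids"]]
    return distances.index(min(distances))


new_sample = {
    "sepal_length": 5.0,
    "sepal_width": 3.6,
    "petal_length": 1.4,
    "petal_width": 0.2,
}
cluster = predict(restored, new_sample)
print(dict(new_sample, pred=cluster))
```

Storing only the centroids is enough for K-Means because a prediction is just a nearest-centroid lookup; other clustering algorithms would generally need a different model representation.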