This pipeline contains the following Snaps:
- CSV Generator: Generates a set of sentences as documents.
- Tokenizer: Converts each sentence into an array of tokens.
The CSV Generator Snap outputs the following sentences:
These sentences are passed to the Tokenizer Snap, which is configured as follows:
We set the Text field property to $text, the field that holds each sentence. The content of this field is tokenized and output as an array of tokens.
When run, the pipeline produces the following output:
Each word in the input sentences is now a token, and the sentence in each input document has become an array of tokens.
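To make the per-document transformation concrete, here is a minimal Python sketch of what the Tokenizer Snap does conceptually. It is not SnapLogic's implementation. The function names (`resolve_field`, `tokenize`, `tokenizer_snap`) and the sample sentences are hypothetical. The tokenization rule is also an assumption: every run of word characters and every punctuation mark becomes a token, and the Snap's actual rules may differ. The sketch further assumes that the token array is written back to the same field that the Text field property points at.

```python
import re
from typing import Any, Dict, Iterable, Iterator, List

# Assumed rule: a token is a run of word characters or a single punctuation mark.
TOKEN_PATTERN = re.compile(r"\w+|[^\w\s]")


def resolve_field(expression: str) -> str:
    """Map a simple '$field' expression (e.g. '$text') to a top-level key.

    Hypothetical helper: only flat field references are handled, not nested paths.
    """
    if not expression.startswith("$") or len(expression) < 2:
        raise ValueError(f"expected an expression like '$text', got {expression!r}")
    return expression[1:]


def tokenize(text: str) -> List[str]:
    """Split a sentence into word and punctuation tokens."""
    return TOKEN_PATTERN.findall(text)


def tokenizer_snap(
    documents: Iterable[Dict[str, Any]], text_field_expr: str = "$text"
) -> Iterator[Dict[str, Any]]:
    """Emit one output document per input document, with the text field tokenized."""
    field = resolve_field(text_field_expr)
    for doc in documents:
        out = dict(doc)  # shallow copy so the input document is left unchanged
        out[field] = tokenize(doc[field])
        yield out


if __name__ == "__main__":
    # Hypothetical sentences standing in for the CSV Generator output.
    docs = [{"text": "Hello world, this is a test."}, {"text": "Tokens are fun!"}]
    for result in tokenizer_snap(docs):
        print(result)
    # {'text': ['Hello', 'world', ',', 'this', 'is', 'a', 'test', '.']}
    # {'text': ['Tokens', 'are', 'fun', '!']}
```

Using a generator mirrors how a Snap streams documents: each input document yields exactly one output document, so the number of documents is unchanged and only the content of the selected field changes from a string to an array.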