Overview
You can use this Snap to apply aggregate functions to input data, with GROUP-BY support. The Snap calculates an aggregate function on a set of values and returns a single scalar value.
This Snap does not support list and map objects referenced in the JSON paths. If the input documents are unsorted and GROUP-BY fields are used, you must use the Sort Snap upstream of the Aggregate Snap to presort the input document stream, and set the Sorted stream field to Ascending or Descending, to prevent an out-of-memory error. However, if the total size of the input documents is expected to be small relative to the available memory, the Sort Snap is not required upstream. Learn more about presorting unsorted input documents to be processed by the Aggregate Snap.
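The reasoning behind this note can be sketched outside the platform. Below is a minimal Python illustration, not SnapLogic code, of a GROUP-BY SUM over a presorted stream: because every document in a group arrives contiguously, each group can be aggregated and emitted before the next begins, so memory use stays flat no matter how long the stream is. The field names (prod, qty) are hypothetical.

```python
from itertools import groupby

# Hypothetical input stream, presorted on the GROUP-BY field "prod",
# standing in for the documents a Sort Snap would emit upstream.
docs = [
    {"prod": "apple", "qty": 2},
    {"prod": "apple", "qty": 5},
    {"prod": "banana", "qty": 3},
]

# Sorted input means groupby() sees each group as one contiguous run,
# so only the current group's running aggregate is held in memory.
for prod, group in groupby(docs, key=lambda d: d["prod"]):
    print({"prod": prod, "result": sum(d["qty"] for d in group)})
# -> {'prod': 'apple', 'result': 7}
# -> {'prod': 'banana', 'result': 3}

# With unsorted input, a running total for every distinct group must be
# kept until the stream ends, which is what can exhaust memory.
```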
...
The following are the commonly used SQL Aggregate functions:
...
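As a rough, platform-independent illustration of what these functions do, here is plain Python over a list of numbers (not Snap configuration):

```python
values = [4, 8, 15, 16, 23, 42]

# Each aggregate collapses the whole collection into a single scalar.
print(len(values))                # COUNT -> 6
print(sum(values))                # SUM   -> 108
print(sum(values) / len(values))  # AVG   -> 18.0
print(min(values))                # MIN   -> 4
print(max(values))                # MAX   -> 42
```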
Snap Type
Aggregate Snap is a Transform-type Snap that transforms, parses, cleans, and formats data from binary to document data.
...
Type | Format | Number of Views | Examples of Upstream and Downstream Snaps | Description |
---|---|---|---|---|
Input | Document | | | Each document should contain values referenced in the Aggregate fields and GROUP-BY fields field sets. If not, the input data is sent to the error view. |
Output | Document | | | Each document contains the mapped data, including key-value entries for each GROUP-BY field name and its value, and a key-value entry for the Result field and its value, if processed successfully. |
Error | | | | Error handling is a generic way to handle errors without losing data or failing the Snap execution. You can handle the errors that the Snap might encounter while running the pipeline by choosing one of the options from the When errors occur list under the Views tab. Learn more about Error handling in Pipelines. |
...
During execution, data processing on Snaplex nodes occurs principally in memory, as streaming, unencrypted data. When larger datasets that exceed the available compute memory are processed, the Snap writes pipeline data to local storage, unencrypted, to optimize performance. These temporary files are deleted when the Snap or pipeline execution completes. You can configure the location of the temporary data in the Global properties table of the Snaplex's node properties, which can also help avoid pipeline errors caused by a lack of space. For more information, see Temporary Folder in Configuration Options.
...
The following example pipeline demonstrates how to use the Aggregate Snap to count the occurrences of a given product name.
...
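For comparison, here is a minimal Python sketch, not the pipeline itself, of the computation this example performs: a COUNT aggregate grouped by product name. The field names (product, count) are illustrative rather than taken from the pipeline's actual settings.

```python
from collections import Counter

# Illustrative input documents; in the pipeline these would arrive
# from the upstream Snap.
docs = [
    {"product": "widget"},
    {"product": "gadget"},
    {"product": "widget"},
]

# COUNT grouped by "product".
for product, count in Counter(d["product"] for d in docs).items():
    print({"product": product, "count": count})
# -> {'product': 'widget', 'count': 2}
# -> {'product': 'gadget', 'count': 1}
```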
Downloads
...