Group By Fields

Snap type:

Transform

Description:

This Snap groups input documents by the values of the specified fields. Each group becomes one output document that holds the list of input Map data at the location specified by the Target field property. Input documents with the same group-by field values are placed in the same output document.


The Snap expects input documents with the same group-by field values to be contiguous; whenever the group-by field values change, the Snap produces a new output document. Therefore, if all input documents with the same group-by field values must be grouped into one output document, place a Sort Snap upstream of the Group By Fields Snap so that the input document stream is sorted by the group-by field values.
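The contiguity requirement can be illustrated outside the Snap with a minimal Python sketch. It is not the Snap's implementation: the helper name `group_contiguous` is hypothetical, and `itertools.groupby` is used only because it likewise starts a new group whenever the key changes.

```python
from itertools import groupby

def group_contiguous(docs, field):
    """Group adjacent documents that share the same value for `field`."""
    return [
        {"groupBy": {field: key}, "group": list(items)}
        for key, items in groupby(docs, key=lambda d: d[field])
    ]

unsorted_docs = [
    {"OrderNumber": 1, "something": "abc"},
    {"OrderNumber": 2, "something": "jkl"},
    {"OrderNumber": 1, "something": "def"},
]

# Without sorting, OrderNumber 1 ends up in two separate output documents.
unsorted_groups = group_contiguous(unsorted_docs, "OrderNumber")

# Sorting first (the role of the Sort Snap) yields one group per value.
sorted_groups = group_contiguous(
    sorted(unsorted_docs, key=lambda d: d["OrderNumber"]), "OrderNumber"
)

print(len(unsorted_groups), len(sorted_groups))  # 3 2
```

The unsorted stream produces three output documents because the second document with OrderNumber 1 is not adjacent to the first.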


  • Expected upstream Snaps: Any Snap with a document output view
  • Expected downstream Snaps: Any Snap with a document input view
  • Expected input: A document with Map data
  • Expected output: A document with a list of input Map data as a value at the location specified by the Target field
Prerequisites:

All input documents should be of Map data type and contain values specified by the Fields property.

Support and limitations:

Does not work in Ultra Pipelines.

Account: 

Accounts are not used with this Snap.

Views:
Input: This Snap has exactly one document input view.
Output: This Snap has exactly one document output view. The Snap is configured with a second output view to get statistics of the input data.
Error: This Snap has at most one document error view and produces zero or more documents in the view. The error view contains error, reason, resolution and stack trace.

Settings

Label


Required. The name for the Snap. You can modify this to be more specific, especially if you have more than one of the same Snap in your pipeline.

Fields



Required. The fields to group by.

Example: $OrderNumber

Default value: [None]

Memory Sensitivity

Required. Indicates the Snap's behavior towards memory changes. Choose one of the available options:

  • None: If selected, it groups input documents by the field values into batches of output documents.

  • Dynamic: If selected, groups may be split into multiple parts, depending on memory availability. The expected group size against which each group is scaled is determined statistically from the groups already processed (mean group size plus one standard deviation).

Default value: [None]
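The statistic mentioned for the Dynamic option (mean group size plus one standard deviation) can be sketched as follows. The function name is hypothetical, and using the population standard deviation (`pstdev`) is an assumption; the source does not say which variant the Snap uses or how it applies the result.

```python
from statistics import mean, pstdev

def estimated_group_size(processed_group_sizes):
    """Return mean + one standard deviation of the group sizes seen so far."""
    if not processed_group_sizes:
        return 0.0  # nothing processed yet, so there is no estimate
    # Assumption: population standard deviation; the Snap may differ.
    return mean(processed_group_sizes) + pstdev(processed_group_sizes)

# Sizes of eight groups already processed: mean is 5, deviation is 2.
print(estimated_group_size([2, 4, 4, 4, 5, 5, 7, 9]))  # 7.0
```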

Min. Part Size

Activated when Memory Sensitivity is set to Dynamic.

Enter the minimum size of each part when the Snap splits a larger group into multiple parts.

This limit does not apply to the last part of a multi-part group, or to a group that fits in a single part smaller than this size.

Example: 100

Default value: 10
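The effect of a part size on a large group can be sketched as below. This is a simplified illustration with a hypothetical helper: it always cuts parts of exactly the given size, whereas the Snap chooses the actual part size from memory availability and treats this setting only as a lower bound.

```python
def split_group(items, part_size):
    """Split one group into consecutive parts of `part_size` items.

    The final part keeps whatever remains, so it may be smaller.
    """
    if part_size < 1:
        raise ValueError("part_size must be at least 1")
    return [items[i:i + part_size] for i in range(0, len(items), part_size)]

# A group of 25 documents with a part size of 10: the last part is smaller.
parts = split_group(list(range(25)), part_size=10)
print([len(p) for p in parts])  # [10, 10, 5]

# A group smaller than the part size stays in a single part.
small = split_group(list(range(4)), part_size=10)
print([len(p) for p in small])  # [4]
```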

Target field


Required. The field name to use as a key in the output document, or a JSON path, under which the list of input Map data is placed.

Example: batch

Default value: group
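The difference between a plain key and a JSON path as the target can be sketched as follows. The helper `place_at_target` is hypothetical and handles only simple dotted paths such as "$group.items"; it does not cover the full JSON path syntax.

```python
def place_at_target(target, batch):
    """Build an output document with `batch` stored at `target`.

    `target` is either a plain key ("batch") or a simple dotted
    path ("$group.items"); array indexes and filters are not handled.
    """
    keys = target.lstrip("$").strip(".").split(".")
    doc = {}
    node = doc
    for key in keys[:-1]:
        node = node.setdefault(key, {})  # create intermediate objects
    node[keys[-1]] = batch
    return doc

batch = [{"OrderNumber": 4, "something": "yz"}]

print(place_at_target("batch", batch))
# {'batch': [{'OrderNumber': 4, 'something': 'yz'}]}

print(place_at_target("$group.items", batch))
# {'group': {'items': [{'OrderNumber': 4, 'something': 'yz'}]}}
```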

Minimum memory (MB)

If the available memory is less than this property value while processing input documents, the Snap stops fetching the next input document until more memory is available. This feature is disabled if this property value is 0.

Example: 500

Default value: 750

Out-of-memory timeout (minutes)

If the Snap pauses longer than this property value while waiting for more memory to become available, it throws an exception to prevent the system from running out of memory.


Example: 30

Default value: 20
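How the Minimum memory (MB) and Out-of-memory timeout (minutes) settings interact can be sketched as a polling loop. Everything here is an assumption made for illustration: the function name, the polling interval, the injected `clock` and `sleep` callables, and the use of `MemoryError` as a stand-in for whatever exception the Snap actually throws.

```python
import time

def wait_for_memory(available_mb, minimum_mb, timeout_minutes,
                    poll_seconds=1.0, clock=time.monotonic, sleep=time.sleep):
    """Block until available_mb() >= minimum_mb, or raise after the timeout."""
    if minimum_mb == 0:
        return  # a value of 0 disables the check
    deadline = clock() + timeout_minutes * 60
    while available_mb() < minimum_mb:
        if clock() >= deadline:
            raise MemoryError("timed out waiting for more memory")
        sleep(poll_seconds)  # pause before checking memory again

# Simulated free-memory readings in MB; sleeping is stubbed out for the demo.
readings = iter([100, 200, 800])
wait_for_memory(lambda: next(readings), minimum_mb=750,
                timeout_minutes=20, sleep=lambda seconds: None)
print("resumed")
```

Injecting the clock and sleep functions keeps the sketch testable without real waiting.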

Snap Execution



Select one of the three modes in which the Snap executes. Available options are:

  • Validate & Execute: Performs limited execution of the Snap, and generates a data preview during Pipeline validation. Subsequently, performs full execution of the Snap (unlimited records) during Pipeline runtime.

  • Execute only: Performs full execution of the Snap during Pipeline execution without generating preview data.

  • Disabled: Disables the Snap and all Snaps that are downstream from it.

Default Value: Execute only
Example: Validate & Execute

Examples


Input and Output Documents Batched by the Group Name and Fields

In this pipeline, the Group By Fields Snap batches input documents that share the same group-by field values into one output document each.

The JSON Generator Snap passes the values to be batched into groups by fields.


The Sort Snap sorts the input documents in ascending order. The output preview:


The Group By Fields Snap groups the documents by group name and fields.

The output preview from the Group By Fields Snap grouped by the order number and the group fields:


The output preview in the table format:


Input and Output Documents by Code

Assume an input stream of nine documents as follows:

[
    {
      "OrderNumber": 1,
      "something": "abc"
    },
    {
      "OrderNumber": 1,
      "something": "def"
    },
    {
      "OrderNumber": 1,
      "something": "ghi"
    },
    {
      "OrderNumber": 2,
      "something": "jkl"
    },
    {
      "OrderNumber": 2,
      "something": "mno"
    },
    {
      "OrderNumber": 2,
      "something": "pqr"
    },
    {
      "OrderNumber": 3,
      "something": "stu"
    },
    {
      "OrderNumber": 3,
      "something": "vwx"
    },
    {
      "OrderNumber": 4,
      "something": "yz"
    }
]


 If we set the Fields property to "$OrderNumber" and the Target field property to "$group.items", there will be four output documents as follows:


[
  {
    "groupBy": {
      "OrderNumber": 1
    },
    "group": {
      "items": [
        {
          "OrderNumber": 1,
          "something": "abc"
        },
        {
          "OrderNumber": 1,
          "something": "def"
        },
        {
          "OrderNumber": 1,
          "something": "ghi"
        }
      ]
    }
  },
  {
    "groupBy": {
      "OrderNumber": 2
    },
    "group": {
      "items": [
        {
          "OrderNumber": 2,
          "something": "jkl"
        },
        {
          "OrderNumber": 2,
          "something": "mno"
        },
        {
          "OrderNumber": 2,
          "something": "pqr"
        }
      ]
    }
  },
  {
    "groupBy": {
      "OrderNumber": 3
    },
    "group": {
      "items": [
        {
          "OrderNumber": 3,
          "something": "stu"
        },
        {
          "OrderNumber": 3,
          "something": "vwx"
        }
      ]
    }
  },
  {
    "groupBy": {
      "OrderNumber": 4
    },
    "group": {
      "items": [
        {
          "OrderNumber": 4,
          "something": "yz"
        }
      ]
    }
  }
]
  

Related Content


Release | Snap Pack Version | Date | Type | Updates
November 2024 | 439patches29078 | | Latest

Fixed an issue with the CSV Parser Snap that introduced unexpected characters into the records and output data because of incorrect handling of the delimiter.

November 2024 | main29029 | | Stable | Updated and certified against the current SnapLogic Platform release.
August 2024 | 438patches28073 | | Latest

Fixed an issue with the JSON Generator and XML Generator Snaps that caused unexpected output displaying '__at__' and '__h__' instead of '@' and '-' respectively because the Snap could not update them to their original values after the Velocity library upgrade.

August 2024 | 438patches27959 | | Latest

Fixed an issue with the Sort Snap where it could not sort files larger than 52 MB. This fix also applies to the Join Snap.

August 2024 | main27765 | | Stable | Upgraded the org.json.json library from v20090211 to v20240303, which is fully backward compatible.
May 2024 | 437patches26643 | | Latest
  • Fixed an issue with the Sort Snap that displayed an error when estimating the size of the input document provided by the upstream S3 Browser Snap.
  • Fixed an issue with the Parquet Formatter Snap that was unable to route errors to the error view.
May 2024 | 437patches26453 | | Latest
  • Added expression support to the Skip lines field in the CSV Parser Snap to enable passing pipeline parameters and upstream values. 

  • Fixed an issue with the XML Parser Snap that caused an error when using the Splitter option in the Snap settings. 

May 2024 | main26341 | | Stable
  • Added Parquet Parser and Parquet Formatter Snaps to the Transform Snap Pack:
    • Parquet Parser: Reads the binary Parquet data and writes document data to the output.
    • Parquet Formatter: Reads the document data and writes it to the output in binary Parquet format.
  • Enhanced the JSON Splitter Snap to capture metadata and lineage information from the input document.

February 2024 | 436patches25564 | | Latest

Fixed an issue with the JSON Formatter Snap that generated incorrect schema.

February 2024 | 436patches25292 | | Latest

Fixed an out-of-memory error issue with the Aggregate Snap. This Snap no longer performs the presort for the input documents.

If the input documents are unsorted and GROUP-BY fields are used, you must use the Sort Snap upstream of the Aggregate Snap to presort the input document stream and set the Sorted stream field Ascending or Descending to prevent the out-of-memory error. However, if the total size of input documents is expected to be relatively small compared to the available memory, then Sort Snap is not required upstream.

Learn more about presorting unsorted input documents to be processed by the Aggregate Snap.

February 2024 | main25112 | | Stable | Updated and certified against the current SnapLogic Platform release.
November 2023 | 435patches24802 | | Latest | Fixed an issue with the Excel Parser Snap that caused a null pointer exception when the input data was an Excel file that did not contain a StylesTable.
November 2023 | 435patches24481 | | Latest

Fixed an issue with the Aggregate Snap where the Snap was unable to produce the desired number of output documents when the input was unsorted and the GROUP-BY fields field set was used.

November 2023 | 435patches24094 | | Latest

Fixed a deserialization issue for a unique function in the Aggregate Snap.

November 2023 | main23721 | | Stable | Updated and certified against the current SnapLogic Platform release.
August 2023 | 434patches23076 | | Latest | Fixed an issue with the Binary to Document Snap where an empty input document with Ignore Empty Stream selected caused the Snap to stop executing.
August 2023 | 434patches23034 | | Latest
  • Fixed an issue with the Transform Snap Pack that caused an error when the input file was a binary JSON file that contained a string value of more than 20,000,000 characters.
  • Fixed a memory issue with the Aggregate Snap that occurred when using GROUP-BY fields.

August 2023 | 434patches22705 | | Latest

Fixed an issue with the JSON Splitter Snap that caused the pipeline to terminate with excessive memory usage on the Snaplex node after the 4.33 GA upgrade. The Snap now consumes less memory.

August 2023 | main22460 | | Stable | Updated and certified against the current SnapLogic Platform release.
May 2023 | 433patches22431 | | Latest
  • Fixed an issue with the Excel Multi Sheet Formatter Snap that caused it to produce binary output data when there was no input document and Ignore empty stream was selected.
  • Introduced the following new Snaps:
    • GeoJSON Parser: Parses geospatial data from binary data input and outputs the contents as a GeoJSON document downstream.

    • WKT Parser: Parses geospatial data from binary data input and outputs the contents as a WKT (Well Known Text) document downstream.

May 2023 | 433patches21779 | | Latest

The Decrypt Field and Encrypt Field Snaps now support CTR (Counter mode) for the AES (Advanced Encryption Standard) block cipher algorithm.

May 2023 | 433patches21586 | | Latest

The Decrypt Field Snap now supports the decryption of various encrypted fields on providing a valid decryption key.

May 2023 | 433patches21461 | | Latest

The following Transform Snaps include new fields to improve memory management: Aggregate, Group By Fields, Group By N, Join, Sort, Unique.

May 2023 | 433patches21336 | | Latest

Fixed an issue with the AutoPrep Snap where dates could potentially be rendered in a currency format because currency format options were displayed for the DOB column.

May 2023 | 433patches21196 | | Latest

Enhanced the In-Memory Lookup Snap with the following new fields to improve memory management and help reduce the possibility of out-of-memory failures:

  • Minimum memory (MB)

  • Out-of-memory timeout (minutes)

These new fields replace the Maximum memory % field.

May 2023 | main21015 | | Stable | Upgraded with the latest SnapLogic Platform release.
February 2023 | 432patches20535 | | Latest

Fixed an issue with the Encrypt Field Snap, where the Snap failed to support an RSA public key to encrypt a message or field. Now the Snap supports the RSA public key to encrypt the message.

February 2023 | 432patches20446 | | Latest

The Join Snap is enhanced with the following:

  • The Pipeline Execution Statistics of the Join Snap now has a status message that displays the parameters - Free disk space, Available memory, and Average document size.

  • The internal sort buffer size is reduced to a minimum of 10MB when the available memory in the node becomes lower than 500MB to avoid the out-of-memory crash.

  • The internal sort buffer size is restored to its original size when the available memory becomes larger than 2GB.

  • We have improved the readability of the error message for the out of disk space on node error. The updated error message now provides clearer information and guidance for users, as shown below:
    Reason: Insufficient free disk space available to stage sort data into temporary files.
    Resolution:  Increase the amount of free disk space and try again.

February 2023 | 432patches20250 | | Latest
  • Fixed an issue with the JSON Splitter Snap that was causing errors when using multiple repeated dots in the JSON Path.
  • The Sort Snap includes the following improvements:

    • The Maximum memory % field is revised to Maximum memory.

    • The Maximum memory unit (new dropdown list) enables you to choose a unit, percentage (%), or MB for better memory control.

February 2023 | 432patches20151 | | Stable/Latest

Fixed an issue that occurred with the JSON Splitter Snap when used in an Ultra pipeline. The request was acknowledged before it was processed by the downstream Snaps, which caused a 400 Bad Request response.

February 2023 | 432patches20062 | | Stable/Latest | Fixed the behavior of the JSON Splitter Snap for some use cases where its behavior was not backward compatible with the 4.31 GA version. These cases involved certain uses of either the Include scalar parents feature or the Include Paths feature.
February 2023 | 432patches19974 | | Stable/Latest

Fixed the "Json Splitter expects a list" error by restoring the JSON Splitter Snap's previous behavior of handling the case where the document element referenced by the JSON Path to Split field is an object instead of a list or array.

Review your pipelines where this error occurred to check your assumptions about the input to the JSON Splitter and whether the value referenced by the JSON Path to Split field will always be a list. If the input is provided by an XML-based or SOAP-based Snap like the Workday or NetSuite Snaps, a result set or child collection that’s an array when there's more than one result or child will be an object when there's only one result or child. In these cases, we recommend using a Mapper Snap and the sl.ensureArray() function to ensure that the value being split by the JSON Splitter is always an array (even for the single element cases).

February 2023 | 432patches19918 | | Stable/Latest
  • Fixed an issue with the CSV Formatter Snap where the Unicode character delimiters using [0-9a-f] did not work.

  • Fixed an issue with the JSON Splitter Snap that was generating null values for empty input data.

February 2023 | main19844 | | Stable | Upgraded with the latest SnapLogic Platform release.
November 2022 | 431patches19441 | | Stable

The Encrypt Field Snap supports decryption of encrypted output in Snowflake Snaps.

November 2022 | 431patches19385 | | Latest

The Join Snap no longer fails with a null pointer exception when you configure the Sorted streams field with Ascending.

November 2022 | 431patches19359 | | Latest | The JSON Splitter Snap includes memory improvements and a new Exclude List from Output Documents checkbox. This checkbox prevents the list that is split from being included in output documents, which also improves memory usage.
November 2022 | main18944 | | Stable
  • The Excel Formatter and Excel Multi Sheet Formatter Snaps now include a Convert formula strings to formulas checkbox.
  • The Mapper Snap now has a Sorted checkbox in the Input Schema and Target Schema panels, which allows you to sort the input and target schemas. When unchecked, the Snap unsorts the input and the target schema.

October 2022 | 430patches18800 | | Latest | The Sort and Join Snaps now have improved memory management, allowing used memory to be released when the Snap stops processing.
October 2022 | 430patches18610 | | Latest

The CSV Formatter and CSV Parser Snaps now support shorter values of Unicode characters.