

Snap type:

Transform


Description:

This Snap groups data from multiple input documents into batches. Each batch is an output document that contains an array of input Map data at the location specified by the Target field property. The size of the array is specified by the Group size property. The number of output documents is the number of input documents divided by the Group size, rounded up. The exceptions are when the Memory Sensitivity property is set to Dynamic, which allows the group size to vary dynamically, and when the Flush Timeout outputs partial groups.
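As a quick check of the arithmetic above, the following Python sketch computes the number of output documents for the fixed-size case (Memory Sensitivity set to None, no partial groups flushed by the Flush Timeout). The function name `expected_output_count` is hypothetical and is not part of any SnapLogic API; it applies the rounded-up division described here, plus the Group size 0 case described under Group size.

```python
def expected_output_count(num_docs: int, group_size: int) -> int:
    """Predict how many output documents a fixed-size grouping produces.

    Hypothetical helper: assumes Memory Sensitivity is None and that the
    Flush Timeout never emits partial groups.
    """
    if num_docs < 0 or group_size < 0:
        raise ValueError("counts must be non-negative")
    if num_docs == 0:
        return 0  # assumption: no input documents, no output documents
    if group_size == 0:
        return 1  # a Group size of 0 groups all input documents into one
    # Integer ceiling division: num_docs / group_size, rounded up.
    return -(-num_docs // group_size)


print(expected_output_count(5, 2))      # 3
print(expected_output_count(105, 100))  # 2
```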

  • Expected upstream Snaps: Any Snap with a document output view
  • Expected downstream Snaps: Any Snap with a document input view
  • Expected input: A document containing Map data
  • Expected output: A document containing an array of input Map data at the location specified by the Target field


Prerequisites:

All input documents should be of Map data type.


Support and limitations:

Does not work in Ultra Pipelines.

Account:

Accounts are not used with this Snap.


Views:


Input: This Snap has exactly one document input view.
Output: This Snap has exactly one document output view.
Error: This Snap has at most one document error view and produces zero or more documents in the view. The error view contains the error, reason, resolution, and stack trace.


Settings

Label


Required. The name for the Snap. You can modify this to be more specific, especially if you have more than one of the same Snap in your pipeline.

Target field

Required. The field name or JSON path at which the array of grouped input documents is placed within each output document.

Example: "grouped_data", "$group", "$group.list"

Default value: "group"
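The examples above suggest how a Target field value maps onto the structure of each output document: a plain name such as "grouped_data" becomes a top-level key, while a path such as "$group.list" creates nested objects. The Python sketch below models that mapping under simplifying assumptions: a leading `$` is treated as the document root and the rest of the path is split on dots. The helper name `set_at_target` is hypothetical, and the sketch ignores bracket notation and any other JSONPath features the Snap may accept.

```python
from typing import Any, Dict


def set_at_target(doc: Dict[str, Any], target: str, value: Any) -> Dict[str, Any]:
    """Place `value` in `doc` at a dot-separated Target field path.

    Hypothetical model: "$" denotes the document root; bracket notation
    and other JSONPath syntax are not handled.
    """
    path = target[1:] if target.startswith("$") else target
    keys = [key for key in path.split(".") if key]
    if not keys:
        raise ValueError(f"empty target path: {target!r}")
    node = doc
    for key in keys[:-1]:
        node = node.setdefault(key, {})  # create intermediate objects as needed
    node[keys[-1]] = value
    return doc


print(set_at_target({}, "$group.list", ["doc1", "doc2"]))
# {'group': {'list': ['doc1', 'doc2']}}
```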


Memory Sensitivity

Required. Indicates the Snap's behavior towards memory changes. Choose one of the available options:

  • None: If selected, the size of every group is the configured Group Size, except for the final group and any partial groups created by the Flush Timeout.

  • Dynamic: If selected, the size of each group varies dynamically based on available memory, ranging from a minimum specified by Min Group Size to a maximum specified by Group Size.

    Note

    This setting is disabled when the Group Size is set to 0.


    Example: Dynamic

    Default value: None

Group size

Required. Enter the number of input documents to be grouped into a single output document. A value of 0 instructs the Snap to group all the input documents into a single document. When Memory Sensitivity is Dynamic, this field specifies the maximum size of the group. 

  • Minimum value: 0
  • Maximum value: No maximum value.
Info

When the input stream ends, the Snap outputs the final group, regardless of the Group Size. For example, if the input stream has 105 documents in it, and the Group Size is 100, the Snap outputs one group of 100 and one group of 5.


Note

This is an expression-enabled property; however, you can only pass data for this property using Pipeline parameters or expressions. This Snap does not support passing upstream data.

Example: 15000


Default value: 10
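The fixed-size grouping described for Group size, including the final group emitted when the input stream ends and the special meaning of 0, can be modelled as a small Python generator. This is a sketch of the documented behavior with Memory Sensitivity set to None and no Flush Timeout, not the Snap's actual implementation; the name `group_by_n` is used here for illustration only.

```python
from typing import Iterable, Iterator, List, TypeVar

T = TypeVar("T")


def group_by_n(docs: Iterable[T], group_size: int) -> Iterator[List[T]]:
    """Yield consecutive groups of `group_size` documents.

    A group_size of 0 collects every document into a single group.
    """
    if group_size < 0:
        raise ValueError("group_size must be >= 0")
    group: List[T] = []
    for doc in docs:
        group.append(doc)
        if group_size and len(group) == group_size:
            yield group  # a full group is emitted as soon as it fills up
            group = []
    if group:
        yield group  # the final group is emitted when input ends, whatever its size


print([len(g) for g in group_by_n(range(105), 100)])  # [100, 5]
```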

Min Group Size

Activated when Memory Sensitivity is set to Dynamic.

Enter the minimum number of input documents to be grouped into a single output document.

Note

We recommend setting this value to 5% or less of the Group Size. It should not be higher than 10% of the Group Size. This setting is not applicable to the last group and flushed groups.
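The sizing guidance in this note can be expressed as a simple check. The following Python sketch classifies a Min Group Size against a Group Size using the 5% recommendation and the 10% upper bound; the function name and its return labels are hypothetical, and the sketch does not model how the Snap actually chooses group sizes in Dynamic mode.

```python
def classify_min_group_size(min_group_size: int, group_size: int) -> str:
    """Compare Min Group Size with the documented sizing guidance.

    Uses integer arithmetic so that exact 5% and 10% boundaries are not
    affected by floating-point rounding.
    """
    if group_size <= 0:
        # Memory Sensitivity is disabled when Group Size is 0.
        raise ValueError("Dynamic mode requires a Group Size greater than 0")
    if min_group_size < 0:
        raise ValueError("Min Group Size must be >= 0")
    if min_group_size * 100 <= group_size * 5:
        return "recommended"  # 5% of Group Size or less
    if min_group_size * 100 <= group_size * 10:
        return "acceptable"   # above 5% but not higher than 10%
    return "too high"         # higher than 10% of Group Size


print(classify_min_group_size(750, 15000))  # recommended
```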


Flush Timeout

Required. Enter a non-zero value to specify the number of seconds that can pass with no new input before the Snap outputs a partial group, that is, a group containing fewer than Group Size input documents.

Info

When the Flush Timeout is 0, the Snap waits until it has received the number of documents specified in the Group Size field (or until the input stream ends) before it outputs a group.

The Flush Timeout is useful in scenarios where the input stream never ends or has long pauses as documents are read from it. In scenarios where an upstream Snap continually polls an external system for new data, such as the Kafka Consumer or Salesforce Subscriber Snaps, you can use the Flush Timeout field to specify a timeout so that the Snap always outputs whatever is available.

For example, if the Group Size is 100 and 105 records are currently available from Kafka, the Snap outputs a group of 100 and, once the Flush Timeout is reached with no new input, a partial group of 5. It then continues to wait for more records. If the upstream Snap then outputs another 15 records, the Snap outputs another group of 15 (or more, if further records arrive in the meantime) after the Flush Timeout is reached.

Example: 10

Default value: 0
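To make the timing in the Flush Timeout example concrete, the following Python sketch replays a stream of timestamped arrivals against a Group Size and a Flush Timeout. It is a discrete-event simulation under stated assumptions, not the Snap's implementation: the input stream is assumed never to end within the observed window, the timeout is measured from the most recent input document, and the names `simulate_grouping` and `observe_until` are hypothetical.

```python
from typing import List, Optional, Sequence, Tuple

Arrival = Tuple[float, str]         # (arrival time in seconds, document)
Emission = Tuple[float, List[str]]  # (emit time in seconds, group)


def simulate_grouping(arrivals: Sequence[Arrival], group_size: int,
                      flush_timeout: float, observe_until: float) -> List[Emission]:
    """Simulate Group Size plus Flush Timeout on a stream that never ends.

    `arrivals` must be sorted by time. A flush_timeout of 0 disables flushing.
    """
    emitted: List[Emission] = []
    group: List[str] = []
    last_input: Optional[float] = None

    def maybe_flush(now: float) -> None:
        nonlocal group
        # Flush a partial group once `flush_timeout` seconds pass with no input.
        if (flush_timeout > 0 and group and last_input is not None
                and now - last_input >= flush_timeout):
            emitted.append((last_input + flush_timeout, group))
            group = []

    for t, doc in arrivals:
        maybe_flush(t)  # did the timeout expire during the gap before this doc?
        group.append(doc)
        last_input = t
        if group_size and len(group) == group_size:
            emitted.append((t, group))  # full group: emitted immediately
            group = []
    maybe_flush(observe_until)
    return emitted


# The scenario from the example: 105 records at t=0, 15 more at t=30 s.
arrivals = [(0.0, f"r{i}") for i in range(105)] + [(30.0, f"s{i}") for i in range(15)]
for when, grp in simulate_grouping(arrivals, 100, 10.0, observe_until=60.0):
    print(when, len(grp))
# 0.0 100
# 10.0 5
# 40.0 15
```

With a Flush Timeout of 0, the same input yields only the first full group of 100, matching the note that the Snap otherwise waits for a complete group.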


Examples



Input and Output Documents in Code

Assume an input stream of five documents as follows:

[
    {
      "OrderNumber": 1,
      "OrderItem": "hamburger",
      "Stuff": [
        "apple",
        "pear",
        "peach"
      ]
    },
    {
      "OrderNumber": 1,
      "OrderItem": "fries",
      "Stuff": [
        "apple",
        "pear",
        "peach"
      ]
    },
    {
      "OrderNumber": 1,
      "OrderItem": "coke",
      "Stuff": [
        "apple",
        "pear",
        "peach"
      ]
    },
    {
      "OrderNumber": 2,
      "OrderItem": "hot dog",
      "Stuff": [
        "apple",
        "pear",
        "peach"
      ]
    },
    {
      "OrderNumber": 2,
      "OrderItem": "sprite",
      "Stuff": [
        "apple",
        "pear",
        "peach"
      ]
    }
  ]


If we set the Group size property to "2" and the Target field property to "$group.list", there will be three output documents as follows:


[
  {
    "group": {
      "list": [
        {
          "OrderNumber": 1,
          "OrderItem": "hamburger",
          "Stuff": [
            "apple",
            "pear",
            "peach"
          ]
        },
        {
          "OrderNumber": 1,
          "OrderItem": "fries",
          "Stuff": [
            "apple",
            "pear",
            "peach"
          ]
        }
      ]
    }
  },
  {
    "group": {
      "list": [
        {
          "OrderNumber": 1,
          "OrderItem": "coke",
          "Stuff": [
            "apple",
            "pear",
            "peach"
          ]
        },
        {
          "OrderNumber": 2,
          "OrderItem": "hot dog",
          "Stuff": [
            "apple",
            "pear",
            "peach"
          ]
        }
      ]
    }
  },
  {
    "group": {
      "list": [
        {
          "OrderNumber": 2,
          "OrderItem": "sprite",
          "Stuff": [
            "apple",
            "pear",
            "peach"
          ]
        }
      ]
    }
  }
]
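The transformation in this example can be reproduced with the following Python sketch, which splits the five input documents into groups of two and places each group at `group.list`, mirroring a Target field of `$group.list`. It models the documented output only: the function name `group_documents` is hypothetical, the sketch handles only a positive Group size, and the Target field path is passed pre-split into keys (`["group", "list"]` for `$group.list`).

```python
import json
from typing import Any, Dict, List

STUFF = ["apple", "pear", "peach"]
orders: List[Dict[str, Any]] = [
    {"OrderNumber": 1, "OrderItem": "hamburger", "Stuff": list(STUFF)},
    {"OrderNumber": 1, "OrderItem": "fries", "Stuff": list(STUFF)},
    {"OrderNumber": 1, "OrderItem": "coke", "Stuff": list(STUFF)},
    {"OrderNumber": 2, "OrderItem": "hot dog", "Stuff": list(STUFF)},
    {"OrderNumber": 2, "OrderItem": "sprite", "Stuff": list(STUFF)},
]


def group_documents(docs: List[Dict[str, Any]], group_size: int,
                    target_keys: List[str]) -> List[Dict[str, Any]]:
    """Batch `docs` into groups and nest each batch under `target_keys`."""
    if group_size <= 0:
        raise ValueError("this sketch only handles a positive group size")
    outputs = []
    for start in range(0, len(docs), group_size):
        out: Dict[str, Any] = {}
        node = out
        for key in target_keys[:-1]:
            node = node.setdefault(key, {})  # build {"group": {...}}
        node[target_keys[-1]] = docs[start:start + group_size]
        outputs.append(out)
    return outputs


result = group_documents(orders, 2, ["group", "list"])
print(json.dumps(result, indent=2))
```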
 




Input and Output Documents in a Pipeline

In this Pipeline, the File Reader Snap reads the input file, the JSON Parser Snap parses it into documents, the Sort Snap sorts those documents, and the Group By N Snap groups the sorted documents into batches of the configured Group size.

The File Reader Snap reads the input file order.json and passes its binary content downstream.

The JSON Parser Snap parses the binary input from the File Reader Snap into documents.

The Sort Snap sorts the input documents in ascending order.

The Group By N Snap is configured with the Target field $group.list and a Group size of 2, so each output document contains a batch of two input documents under $group.list (the final batch may contain fewer).

The output preview of the Group By N Snap shows the input documents grouped into batches of two.
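The four-stage Pipeline above can be approximated in plain Python to show the order of operations: read bytes, parse JSON, sort, then group by 2 under `$group.list`. The contents of `ORDER_JSON` are a hypothetical stand-in for order.json, whose actual contents are not shown here, and the sort key (OrderNumber, then OrderItem) is an assumption, since the Sort Snap's configured key is not stated. All function names are hypothetical stand-ins for the Snaps, not SnapLogic APIs.

```python
import json
from typing import Any, Dict, List

# Hypothetical stand-in for the bytes the File Reader Snap reads from order.json.
ORDER_JSON = b"""[
  {"OrderNumber": 2, "OrderItem": "sprite"},
  {"OrderNumber": 1, "OrderItem": "fries"},
  {"OrderNumber": 1, "OrderItem": "hamburger"},
  {"OrderNumber": 2, "OrderItem": "hot dog"},
  {"OrderNumber": 1, "OrderItem": "coke"}
]"""


def read_file() -> bytes:  # File Reader stand-in
    return ORDER_JSON


def parse_json(data: bytes) -> List[Dict[str, Any]]:  # JSON Parser stand-in
    return json.loads(data)


def sort_ascending(docs: List[Dict[str, Any]]) -> List[Dict[str, Any]]:  # Sort stand-in
    # Assumed sort key: OrderNumber, then OrderItem, both ascending.
    return sorted(docs, key=lambda d: (d["OrderNumber"], d["OrderItem"]))


def group_by_2(docs: List[Dict[str, Any]]) -> List[Dict[str, Any]]:  # Group By N stand-in
    return [{"group": {"list": docs[i:i + 2]}} for i in range(0, len(docs), 2)]


pipeline_output = group_by_2(sort_ascending(parse_json(read_file())))
for doc in pipeline_output:
    print([item["OrderItem"] for item in doc["group"]["list"]])
# ['coke', 'fries']
# ['hamburger', 'hot dog']
# ['sprite']
```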


