Snap type:

Transform

Description:

This Snap joins left and right input data streams. It uses the right input data as an in-memory lookup table.

  • Expected upstream Snaps: Any Snap with a document output view, such as CSV Parser, JSON Parser, Mapper.
  • Expected downstream Snaps: Any Snap with a document input view, such as CSV Formatter, JSON Formatter, Mapper.
  • Expected input:   
    • Input data streams may be unsorted.
    • The right input document stream is loaded into memory as a lookup table, while the left input document stream is streamed through and not stored in the Snap.
    • The JOIN operation starts when the right input document stream ends.
    • All input documents must be flat maps; nested structures are not supported.
  • Expected output: Each left input document is joined with the right input data if a match is found. If no match is found, the left input document is written to the output view unchanged. If the Single document output property is selected, only one document is written to the output view for each left input document, regardless of the number of matches found in the in-memory lookup table.
Prerequisites:

Enough free memory must be available to load all right input data into the in-memory lookup table.

Support and limitations:

Works in Ultra Task Pipelines only when the Single document output field is selected. With that field selected, exactly one document is written to the output view for each input document, which Ultra Pipelines require.

Account: 

Accounts are not used with this Snap.

Views:


Input: This Snap has exactly two document input views. Consider editing the right input view label in the 'Views' section of the Snap, because that label is used as a prefix during the JOIN operation when the same column name also exists in the left input data.
Output: This Snap has exactly one document output view.
Error: This Snap has at most one document error view and produces zero or more documents in the view.
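
The effect of the right input view label on colliding column names can be illustrated with a short Python sketch. This is not the Snap's implementation: the function name `merge_documents`, the label `lookup`, and the underscore separator are all hypothetical, and the exact format of the prefixed field name is not specified on this page.

```python
def merge_documents(left, right, right_label, separator="_"):
    """Merge a right document into a left one.

    Right fields whose names already exist in the left document are stored
    under a prefixed name. The "<label><separator><field>" scheme is an
    assumption made for illustration only.
    """
    merged = dict(left)  # copy so the left document is not modified
    for name, value in right.items():
        if name in merged:
            merged[f"{right_label}{separator}{name}"] = value
        else:
            merged[name] = value
    return merged


left = {"customer_id": 7, "name": "Ada"}
right = {"customer_id": 7, "name": "Ada L.", "city": "London"}
merged = merge_documents(left, right, right_label="lookup")
print(merged)
# {'customer_id': 7, 'name': 'Ada', 'lookup_customer_id': 7,
#  'lookup_name': 'Ada L.', 'city': 'London'}
```

A descriptive label such as the one above makes the origin of the prefixed fields easier to recognize downstream than a generic default label would.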


Settings

Label

Required. The name for the Snap. You can modify this to be more specific, especially if you have more than one of the same Snap in your pipeline.

Join paths

Required. Field names to use for the left and right sides of the join. Each row in the table defines a relationship between a left field and a right field. The Snap supports only flat map data in all input documents; a nested JSON path such as '$customer.address' is not supported.

Left path

Required. Field name in the left input data. It can be selected from the suggested list. This property does not support expressions.

Example: customer_id, $customer_id, _leftFieldName (for pipeline parameter with the expression button enabled)

Default value: None, with the expression button enabled

Right path

Required. Field name in the right input data. Suggestions are not yet available for the Right path, except for pipeline parameters. This property does not support expressions.

Example: customer_id, $customer_id, _rightFieldName (for pipeline parameter with the expression button enabled)

Default value: None, with the expression button enabled
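
How several Join paths rows could combine into one lookup key can be sketched in Python as follows. The sketch rests on two assumptions that this page does not state explicitly: that all rows must match for two documents to join, and that 'customer_id' and '$customer_id' name the same top-level field. The helper names `field_name` and `join_key`, and the sample field names, are hypothetical.

```python
def field_name(path):
    """Strip an optional leading '$' (assumed to be equivalent to no prefix)."""
    return path[1:] if path.startswith("$") else path


def join_key(document, paths):
    """Build one composite key from the flat fields named by the paths."""
    return tuple(document.get(field_name(p)) for p in paths)


# Each tuple mirrors one row of the Join paths table: (Left path, Right path).
join_paths = [("$customer_id", "$cust_id"), ("region", "region")]
left_paths = [left for left, _ in join_paths]
right_paths = [right for _, right in join_paths]

left_doc = {"customer_id": 7, "region": "EU", "amount": 10}
right_doc = {"cust_id": 7, "region": "EU", "tier": "gold"}

# Both documents produce the key (7, 'EU'), so they would be joined.
keys_match = join_key(left_doc, left_paths) == join_key(right_doc, right_paths)
print(keys_match)  # True
```

Because the key is built only from top-level fields, a nested path such as '$customer.address' has nothing to resolve against in this model.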

Single document output

If selected, exactly one document is written to the output view for each left input document. If more than one row in the lookup table matches the left input data, the first one in the list is joined with the left input data. If there is no match, the left input data is written to the output view unchanged. Leave this property selected if the pipeline runs as an Ultra Task Pipeline.

If unselected, each of the matching rows is joined with the left input data. Therefore, the number of output documents may exceed the number of left input documents.
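
The difference between the two modes can be sketched in Python as follows. This is a minimal model of the behavior described above, not the Snap's implementation; the function name `in_memory_lookup` and its parameters are hypothetical. For brevity, a colliding field name keeps the left value here, whereas the Snap prefixes the right field with the right input view label.

```python
from collections import defaultdict


def in_memory_lookup(left_docs, right_docs, left_field, right_field,
                     single_document_output=True):
    """Join each left document against a lookup table built from right_docs."""
    # Phase 1: load the entire right stream into memory, grouped by key.
    table = defaultdict(list)
    for right in right_docs:
        table[right.get(right_field)].append(right)

    # Phase 2: stream the left documents through the table one at a time.
    for left in left_docs:
        matches = table.get(left.get(left_field), [])
        if not matches:
            yield dict(left)  # no match: pass the left document through
            continue
        if single_document_output:
            matches = matches[:1]  # keep only the first matching row
        for right in matches:
            yield {**right, **left}  # left values win on name collisions


left_docs = [{"id": 1, "a": "x"}, {"id": 2, "a": "y"}]
right_docs = [{"id": 1, "b": "p"}, {"id": 1, "b": "q"}]

single = list(in_memory_lookup(left_docs, right_docs, "id", "id", True))
multi = list(in_memory_lookup(left_docs, right_docs, "id", "id", False))
print(len(single), len(multi))  # 2 3
```

With the flag set, two left documents produce two output documents. With it cleared, the left document with `id` 1 matches two rows, so three documents are produced.
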
Minimum memory (MB)

If the available memory falls below this value while the Snap is building the in-memory lookup table from the right input documents, the Snap stops fetching right input documents until more memory is available. This feature is disabled if this value is 0.

This Snap loads all right input documents into the in-memory lookup table before it starts to perform the JOIN operation. Therefore, if the input data in the right input view exceeds the available memory, it may cause an out-of-memory failure. This field helps reduce the possibility of out-of-memory failures.

Default value: 200 MB
Example: 500 MB

Out-of-memory timeout (minutes)

If the Snap pauses longer than this value while waiting for more memory to become available, it throws an exception to prevent the system from running out of memory.

Default value: 30 minutes
Example: 10 minutes
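
The interaction between Minimum memory (MB) and Out-of-memory timeout (minutes) can be sketched in Python as follows. This is a simplified model under stated assumptions, not the Snap's implementation: `available_mb` is a hypothetical callable that reports free memory, the polling interval is invented, and `clock` and `sleep` are injectable only so the sketch can be exercised without real waiting.

```python
import time


class OutOfMemoryTimeout(RuntimeError):
    """Raised when memory stays below the minimum for longer than the timeout."""


def wait_for_memory(available_mb, minimum_mb, timeout_minutes,
                    poll_seconds=1.0, clock=time.monotonic, sleep=time.sleep):
    """Block until available_mb() reaches minimum_mb, or raise on timeout."""
    if minimum_mb == 0:
        return  # a value of 0 disables the check
    deadline = clock() + timeout_minutes * 60
    while available_mb() < minimum_mb:
        if clock() >= deadline:
            raise OutOfMemoryTimeout(
                f"less than {minimum_mb} MB free for {timeout_minutes} minutes"
            )
        sleep(poll_seconds)


def build_lookup_table(right_docs, key_field, available_mb,
                       minimum_mb=200, timeout_minutes=30):
    """Load right documents into a table, pausing while memory is low."""
    table = {}
    documents = iter(right_docs)
    while True:
        # Check memory before fetching the next right input document.
        wait_for_memory(available_mb, minimum_mb, timeout_minutes)
        try:
            doc = next(documents)
        except StopIteration:
            return table
        table.setdefault(doc.get(key_field), []).append(doc)


table = build_lookup_table(
    [{"k": 1}, {"k": 1}, {"k": 2}], "k", available_mb=lambda: 500
)
print(sorted(table))  # [1, 2]
```

Pausing only delays loading; it does not shrink the lookup table. If the right input data cannot fit in memory at all, the timeout turns an eventual out-of-memory failure into an earlier exception.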

Examples


In-Memory Lookup pipeline overview:

Views page with inputs renamed:

Left input data:

Right input data:

Output data with Single document output selected:

Output data with Single document output not selected: