Executing an eXtreme-mode Pipeline From a Standard-mode Pipeline
This example uses a Pipeline Execute Snap in a standard-mode Pipeline to call and execute an eXtreme-mode Pipeline as a child Pipeline. It involves two steps:
- Create an eXtreme-mode Pipeline.
- Create a standard-mode Pipeline to call the eXtreme-mode Pipeline.
This eXtreme-mode Pipeline (azd - numbers) reads a CSV file from Azure Blob Storage (WASB), parses the file as CSV, transforms the data using calculations in the Aggregate - SparkSQL 2.x Snap, formats the output as CSV, and finally writes the file to the specified path.
In the standard-mode Pipeline, we configure the Pipeline Execute Snap to call the eXtreme-mode Pipeline: we specify the child Pipeline name, azd - numbers, and the Snaplex Path, azd_DS3_v2_1w, on which to execute it.
When the standard-mode Pipeline runs, the Pipeline Execute Snap locates and executes the selected eXtreme-mode Pipeline. We can view the statistics of the Pipeline execution as shown below:
Download this Pipeline.
Run a Child Pipeline Multiple Times
This project demonstrates how you can configure the Pipeline Execute Snap to execute a child Pipeline multiple times. The project contains the following Pipelines:
- PE_Multiple_Executions_Child: A simple child Pipeline that writes out a document with a static string and the number of input documents received by the Snap.
- PE_Multiple_Executions_NoReuse_Parent: A parent Pipeline that executes the PE_Multiple_Executions_Child Pipeline five times. You can save the Pipeline to examine the output documents. Note that the output contains a copy of the original document and the $inCount field is always set to one because the Pipeline was separately executed five times.
- PE_Multiple_Executions_Reuse_Parent: A parent Pipeline that executes the PE_Multiple_Executions_Child Pipeline once and feeds the child Pipeline execution five documents. You can save the Pipeline to examine the output documents. Note that the output does not contain a copy of the original document and the $inCount field goes up for each document since the same Snap instance is being used to process each document.
- PE_Multiple_Executions_UltraSplitAggregate_Parent: A parent Pipeline that is an example of using Snaps that are not Ultra-compatible in an Ultra Pipeline. This Pipeline can be turned into an Ultra Pipeline by removing the JSON Generator Snap at the head of the Pipeline and creating an Ultra Task.
- PE_Multiple_Executions_UltraSplitAggregate_Child: A child Pipeline that splits an array field in the input document and sums the values of the $num field in the resulting documents.
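The difference between the no-reuse and reuse parents, and the split-and-sum behavior of the UltraSplitAggregate child, can be illustrated with a small Python sketch. This is not SnapLogic code: the class `ChildPipeline`, the static string `"Hello"`, and the field names `msg`, `original`, and `items` are hypothetical stand-ins chosen for illustration; only `$inCount` and `$num` come from the descriptions above.

```python
from typing import Dict, List


class ChildPipeline:
    """Hypothetical stand-in for one execution of PE_Multiple_Executions_Child."""

    def __init__(self) -> None:
        self.in_count = 0  # documents received by this execution so far

    def process(self, doc: Dict) -> Dict:
        self.in_count += 1
        # "Hello" is an assumed static string; the real value is not documented here.
        return {"msg": "Hello", "inCount": self.in_count}


def run_without_reuse(docs: List[Dict]) -> List[Dict]:
    """One child execution per document, so inCount is always 1."""
    out = []
    for doc in docs:
        child = ChildPipeline()  # fresh execution for every input document
        result = child.process(doc)
        # The output carries a copy of the original document (assumed field name).
        out.append({**result, "original": dict(doc)})
    return out


def run_with_reuse(docs: List[Dict]) -> List[Dict]:
    """A single child execution receives every document, so inCount increases."""
    child = ChildPipeline()
    return [child.process(doc) for doc in docs]


def split_and_sum(doc: Dict, array_field: str = "items") -> Dict:
    """Split an array field into elements and sum their 'num' values."""
    return {"sum": sum(item["num"] for item in doc[array_field])}
```

Feeding five documents to `run_without_reuse` models the NoReuse parent, while `run_with_reuse` models the Reuse parent, where the counter persists across documents because the same child instance handles all of them.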
Propagate a Schema Backward – 1
The project, PE_Backward_Schema_Propagation_Contacts, demonstrates the schema suggest feature of the Pipeline Execute Snap. It contains the following files:
- contact.schema (Schema file)
- test.json (Output file)
The parent Pipeline is shown below:
The child Pipeline is as shown below:
The Pipeline Execute Snap is configured as:
The following schema is provided in the JSON Formatter Snap. It has three properties: $firstName, $lastName, and $age. This schema is back-propagated to the parent Pipeline.
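For orientation, a schema with these three properties might look like the sketch below, written as a Python dictionary in JSON Schema style. The actual contents of contact.schema are not reproduced here, so the property types (string, string, integer) and the helper `target_schema_paths` are assumptions for illustration only.

```python
# Assumed JSON Schema shape; the real contact.schema file may differ.
CONTACT_SCHEMA = {
    "type": "object",
    "properties": {
        "firstName": {"type": "string"},   # assumed type
        "lastName": {"type": "string"},    # assumed type
        "age": {"type": "integer"},        # assumed type
    },
}


def target_schema_paths(schema: dict) -> list:
    """Hypothetical helper: list the paths a parent Mapper would be offered."""
    return ["$" + name for name in schema["properties"]]
```

Calling `target_schema_paths(CONTACT_SCHEMA)` lists the three paths that the back-propagated schema would expose to the Mapper Snap in the parent Pipeline.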
The parent Pipeline must be validated in order for the child Pipeline's schema to be back-propagated to the parent Pipeline. Below is the Mapper Snap in the parent Pipeline:
Notice that the Target Schema section shows the three properties of the schema in the child Pipeline:
Upon execution, the data passed through the Mapper Snap is written to the test.json file by the child Pipeline. The exported project is available in the Downloads section below.
Propagate Schema Backward and Forward
The project, PE_Backward_Forward_Schema_Propagation, demonstrates the Pipeline Execute Snap's capability of propagating schema in both directions – upstream as well as downstream. It contains the following Pipelines:
The parent Pipeline is as shown below:
The Pipeline Execute Snap is configured to call the Pipeline schema-child. This child Pipeline consists of a Mapper Snap that is configured as shown below:
The Mapper Snaps upstream and downstream of the Pipeline Execute Snap, Mapper_InputSchemaPropagation and Mapper_TargetSchemaPropagation, are configured as shown below:
When the Pipeline is executed, data propagation takes place between the parent and child Pipelines:
- The string expression $foo is propagated from the child Pipeline to the Pipeline Execute Snap.
- The Pipeline Execute Snap propagates it to the upstream Mapper Snap (Mapper_InputSchemaPropagation), as visible in the Target Schema section. Here it is assigned the value 123.
- This is passed from the Mapper to the Pipeline Execute Snap that internally passes the value to the child Pipeline. Here $foo is mapped to $bar. $baz is another string expression in the child Pipeline (assigned the value 2).
- $bar and $baz are propagated to the Pipeline Execute Snap and then forward to the downstream Mapper Snap (Mapper_TargetSchemaPropagation). This can be seen in the Input Schema section of that Mapper Snap.
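The flow of values described in the steps above can be traced with a short Python sketch. It only models the data path, not SnapLogic itself: the function names are hypothetical, and representing 123 and 2 as Python integers is an assumption, since the text above gives the values without fixing their types.

```python
def upstream_mapper(doc: dict) -> dict:
    """Models Mapper_InputSchemaPropagation: assigns 123 to $foo."""
    return {**doc, "foo": 123}


def child_pipeline(doc: dict) -> dict:
    """Models the Mapper in schema-child: $foo is mapped to $bar, and $baz is set to 2."""
    return {"bar": doc["foo"], "baz": 2}


def pipeline_execute(doc: dict) -> dict:
    """Models the Pipeline Execute Snap handing the document to the child."""
    return child_pipeline(doc)


def run_parent(doc: dict) -> dict:
    """Upstream Mapper, then Pipeline Execute; the result feeds the downstream Mapper."""
    return pipeline_execute(upstream_mapper(doc))


# Field names exposed in each direction (schema propagation, not data):
CHILD_INPUT_FIELDS = ["foo"]          # suggested backward to the upstream Mapper
CHILD_OUTPUT_FIELDS = ["bar", "baz"]  # suggested forward to the downstream Mapper
```

In this model, the downstream Mapper (Mapper_TargetSchemaPropagation) receives a document whose keys match `CHILD_OUTPUT_FIELDS`, mirroring what its Input Schema section displays.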