Introduction

Pipeline execution flow is the step-by-step process that moves data from the start of a pipeline run to its completion, so that data is processed efficiently across diverse business systems. Within this process, the control plane manages the integration tasks by organizing components across the Snaplex infrastructure, while the Snaplex nodes execute the assigned integration tasks under the guidance of the control plane. A pipeline can be executed by a Triggered, Ultra, or Scheduled Task. Regardless of the task type, the pipeline goes through distinct states during its lifecycle.

Architecture Overview

The pipeline execution architecture consists of a user interface for pipeline design, a control plane that orchestrates tasks and manages the overall workflow, and a distributed Snaplex environment where Snaplex nodes execute integration tasks in parallel. The metadata store in the control plane holds essential pipeline information, and the Snap store provides a catalog of pre-built components. Together, these components handle task execution, error handling, and communication during data integration.

Snaplogic execution.png

Steps in the pipeline execution flow

execution flow.png
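As a rough illustration of the four stages described in this section, the lifecycle can be modeled as a simple linear state machine. This is a minimal sketch, not SnapLogic code: the `Stage` enum, `NEXT_STAGE` table, and helper functions are hypothetical names, and the sketch assumes a run that passes through every stage without failing or being stopped.

```python
from enum import Enum


class Stage(Enum):
    """The four stages named in this document, in lifecycle order."""
    INITIALIZATION = "Initialization"
    PREPARE = "Prepare"
    EXECUTION = "Execution"
    COMPLETION = "Completion"


# Assumption: each stage only advances to the next one; Completion is terminal.
NEXT_STAGE = {
    Stage.INITIALIZATION: Stage.PREPARE,
    Stage.PREPARE: Stage.EXECUTION,
    Stage.EXECUTION: Stage.COMPLETION,
    Stage.COMPLETION: None,
}


def advance(stage):
    """Return the stage that follows `stage`, or raise if it is terminal."""
    nxt = NEXT_STAGE[stage]
    if nxt is None:
        raise ValueError(f"{stage.value} is the final stage")
    return nxt


def lifecycle():
    """List every stage a run passes through, starting at Initialization."""
    stages = [Stage.INITIALIZATION]
    while NEXT_STAGE[stages[-1]] is not None:
        stages.append(advance(stages[-1]))
    return stages
```

Keeping the transitions in a lookup table makes it easy to extend the sketch later, for example with a failure state reachable from any stage.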

Initialization

The initialization phase covers the preparatory steps required to set up and begin pipeline execution. Also called the NoUpdate state, it lays the groundwork for the subsequent stages in the pipeline execution flow, allowing the system to establish the context and allocate resources for the upcoming execution.

This stage involves the following activities:

note

This state is only relevant if the pipeline is executed on the leader node.


Prepare

During the Prepare stage, several tasks lay the groundwork for pipeline execution through communication between the control plane and the data plane (Snaplex). This stage ensures that all necessary components are in place for pipeline execution, and it identifies and addresses potential configuration issues. The Prepare stage involves the following key activities:

Execution

During the Execution stage, the integration tasks are processed and the flow of data defined in the pipeline is carried out in real time. Throughout this process, the Snaplex nodes communicate with the control plane to report the status of task execution, provide updates, and receive further instructions.
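The report-and-instruct exchange described above can be sketched in a few lines. This is an illustrative in-memory model only: the `ControlPlane` class, the `run_on_node` function, the status strings, and the `report_every` interval are all hypothetical and do not correspond to any SnapLogic API.

```python
from dataclasses import dataclass, field


@dataclass
class ControlPlane:
    """Hypothetical in-memory stand-in for the control plane."""
    stop_requested: bool = False
    reports: list = field(default_factory=list)

    def report(self, node, status, processed):
        """Record a status update and reply with an instruction."""
        self.reports.append((node, status, processed))
        return "stop" if self.stop_requested else "continue"


def run_on_node(node, documents, control_plane, report_every=2):
    """Process documents, reporting progress every `report_every` items."""
    processed = 0
    for _ in documents:
        processed += 1  # stand-in for the real per-document work
        if processed % report_every == 0:
            instruction = control_plane.report(node, "running", processed)
            if instruction == "stop":
                # The control plane asked the node to halt early.
                control_plane.report(node, "stopped", processed)
                return processed
    control_plane.report(node, "completed", processed)
    return processed
```

In this sketch the node initiates every exchange, so the control plane can only deliver an instruction in reply to a status report. That is an assumption made to keep the example small.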

Completion

After the pipeline execution is complete, resources are released and the pipeline sends a comprehensive set of execution metrics to the control plane, including:

Pipeline workflow: How control plane and nodes work together

Control plane

The control plane acts as the central management layer that oversees pipeline execution. It manages the lifecycle of data integration tasks from initiation to completion, runs the integration workflows, and coordinates with the Snaplex nodes.

Snaplex nodes

Snaplex nodes are the distributed execution nodes in the SnapLogic architecture. They process the integration tasks assigned by the control plane. Each Snaplex node functions as an independent computing unit capable of executing tasks concurrently, contributing to the scalability and performance of the overall integration process.
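To make the division of labor concrete, the sketch below has a control-plane function assign tasks to the nodes that are available, after which each node processes its share concurrently. It is a simplified model under stated assumptions: the round-robin policy, the `nodes` availability map, and the function names are hypothetical, and threads stand in for separate Snaplex nodes.

```python
from concurrent.futures import ThreadPoolExecutor


def assign_tasks(tasks, nodes):
    """Round-robin tasks across the nodes marked available (assumed policy)."""
    available = [name for name, is_up in nodes.items() if is_up]
    if not available:
        raise RuntimeError("no Snaplex node is available")
    assignment = {name: [] for name in available}
    for i, task in enumerate(tasks):
        assignment[available[i % len(available)]].append(task)
    return assignment


def execute_on_node(node, tasks):
    """Stand-in for real work: tag each task with the node that ran it."""
    return [f"{task}@{node}" for task in tasks]


def run_pipeline(tasks, nodes):
    """Assign tasks, then let each node process its share concurrently."""
    assignment = assign_tasks(tasks, nodes)
    with ThreadPoolExecutor(max_workers=len(assignment)) as pool:
        futures = {
            node: pool.submit(execute_on_node, node, node_tasks)
            for node, node_tasks in assignment.items()
        }
        # Collect each node's results once its work has finished.
        return {node: future.result() for node, future in futures.items()}
```

For example, `run_pipeline(["t1", "t2", "t3"], {"node-a": True, "node-b": False, "node-c": True})` skips the unavailable `node-b` and splits the three tasks between the other two nodes.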

Stage: Initialization

Control plane:

  • Receives the pipeline execution request from the external trigger.

  • Assigns specific tasks within a pipeline to Snaplex nodes based on their availability and capabilities.

  • Initiates the process by identifying the relevant pipeline based on the execution request.

Snaplex nodes:

  • Receives the pipeline execution request from the control plane.

  • Allocates resources like memory and processing power to perform the data integration tasks.

  • Sets up its execution environment based on the pipeline's requirements.

Stage: Prepare

Control plane:

  • Accesses the metadata store to retrieve the required information about the designated pipeline.

  • Performs validation and authorization checks to ensure that the pipeline execution adheres to defined security and access controls.

Snaplex nodes:

  • Establishes connections to relevant data sources and targets. This includes connecting to the databases, APIs, and file systems used in the data integration process.

  • Performs pre-execution checks to validate that the environment is ready and any prerequisites are met.

  • Communicates its readiness and initialization status back to the control plane.

Stage: Execution

Control plane:

  • Loads the metadata into its runtime environment.

  • Monitors the progress of pipeline execution.

  • Captures relevant metrics for auditing, troubleshooting, and performance analysis.

  • Manages errors, based on real-time communication from the Snaplex nodes, either by applying predefined error-handling logic or by stopping the entire pipeline if needed.

Snaplex nodes:

  • Executes the specific tasks assigned to them.

  • Executes tasks in parallel to optimize performance and handle large volumes of data efficiently.

  • Handles errors locally and tries to resolve them before communicating them to the control plane.

Stage: Completion

Control plane:

  • Logs error messages and timestamps.

  • Consolidates the output results of each Snap and the transformed data generated by Snaplex nodes.

  • Performs post-execution cleanup activities like releasing resources and closing connections.

  • Provides a final status report with the status of the pipeline execution and relevant performance metrics.

Snaplex nodes:

  • Releases allocated compute resources, closes open connections, and cleans up temporary files.

  • Sends comprehensive metrics to the control plane for analysis and optimization.

  • Archives detailed execution logs and troubleshooting information for future reference.
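The error-handling split described for the Execution stage (a node first tries to resolve an error locally, then the control plane either applies predefined handling or stops the pipeline) can be sketched as follows. This is a hedged illustration: treating "resolve locally" as a bounded retry is an assumption, and the function names, the outcome dictionary, and the handler lookup keyed by error message are hypothetical.

```python
def run_with_local_recovery(task, max_attempts=3):
    """Node side: retry a failing task locally before escalating."""
    last_error = None
    for _ in range(max_attempts):
        try:
            return {"ok": True, "result": task()}
        except Exception as exc:  # broad on purpose for this sketch
            last_error = exc
    # Local recovery failed; report the error to the control plane.
    return {"ok": False, "error": str(last_error)}


def control_plane_decide(outcome, handlers):
    """Control plane side: apply a predefined handler or stop the pipeline."""
    if outcome["ok"]:
        return "continue"
    handler = handlers.get(outcome["error"])
    if handler is None:
        return "stop_pipeline"
    handler()
    return "continue"
```

A caller would pass the node's outcome to `control_plane_decide` together with a mapping of known error messages to handler callables. Any error without a registered handler stops the pipeline in this sketch.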