...
Pipeline execution flow is a systematic, step-by-step process that moves data from initiation to completion. This orchestrated flow ensures the efficient processing of data across diverse business systems. Within this process, the control plane manages integration tasks by strategically organizing components across the Snaplex infrastructure, while the Snaplex nodes execute the assigned integration tasks under the guidance of the control plane. A pipeline can be executed by Triggered, Ultra, or Scheduled Tasks. Regardless of the task type, the pipeline goes through distinct states during its lifecycle.
...
The pipeline execution architecture involves a user interface for pipeline design, a control plane that orchestrates tasks and manages the overall workflow, and a distributed Snaplex environment where Snaplex nodes execute integration tasks in parallel. The metadata store in the control plane holds essential pipeline information, and the Snap store provides a catalog of pre-built components. Throughout the process, the collaboration between these components ensures efficient task execution, error handling, and communication for seamless data integration.
...
Steps in the pipeline execution flow
Initialization
The initialization phase in pipeline execution involves the preparatory steps required to set up and begin the pipeline execution. Also called the NoUpdate state, this phase lays the groundwork for the subsequent stages in the pipeline execution flow, allowing the system to establish the context and allocate resources effectively for the upcoming pipeline execution.
...
Request processing: The control plane receives the pipeline execution request, which can originate from various sources:
Scheduled task
External triggers like API calls and event notifications
Manual initiation by a user from the Designer or Manager
Leader node decision: In this stage, the control plane assesses the status and resource capacity of the Snaplex nodes, prioritizes nodes based on workload, processing power, and memory, selects the most appropriate node, and prepares it for pipeline execution.
This state is only relevant if the pipeline is executed on the leader node.
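The leader node decision can be pictured as a scoring problem: rank each Snaplex node by workload and headroom, then choose the best candidate. The sketch below is a minimal illustration of that idea; the node attributes, thresholds, and scoring formula are assumptions for demonstration, not SnapLogic's actual selection algorithm.

```python
# Illustrative sketch of node selection: score each Snaplex node by spare
# capacity relative to its current workload, then pick the best candidate.
# Attributes and weights are hypothetical, not SnapLogic's real algorithm.
from dataclasses import dataclass

@dataclass
class SnaplexNode:
    name: str
    active_pipelines: int   # current workload
    cpu_free_pct: float     # spare processing power (0-100)
    mem_free_mb: int        # spare memory

def pick_node(nodes):
    """Return the node with the most spare capacity relative to workload."""
    # Assumed minimum-capacity thresholds: skip nearly saturated nodes.
    healthy = [n for n in nodes if n.cpu_free_pct > 10 and n.mem_free_mb > 512]
    if not healthy:
        raise RuntimeError("no Snaplex node has enough capacity")
    # Higher score = more headroom and fewer pipelines already running.
    return max(healthy, key=lambda n: (n.cpu_free_pct + n.mem_free_mb / 100)
                                      / (1 + n.active_pipelines))

nodes = [
    SnaplexNode("node-a", active_pipelines=5, cpu_free_pct=20, mem_free_mb=1024),
    SnaplexNode("node-b", active_pipelines=1, cpu_free_pct=60, mem_free_mb=4096),
]
print(pick_node(nodes).name)   # node-b: lightly loaded with the most headroom
```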
Prepare
During the Prepare stage, several important tasks lay the groundwork for the execution of the pipeline. This stage involves communication between the control plane and the data plane (Snaplex), ensures that all necessary components are in place for pipeline execution, and proactively identifies and addresses potential configuration issues. The Prepare stage involves the following key activities:
...
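One kind of check the Prepare stage performs is verifying that every component in the pipeline is fully configured before execution begins. The sketch below illustrates that idea against a hypothetical pipeline definition; the `snaps` structure, the `REQUIRED_SETTINGS` fields, and the `prepare` function are all assumptions, not SnapLogic's actual API.

```python
# Illustrative only: the kind of configuration validation the Prepare stage
# performs -- every Snap must have its required settings filled in.
REQUIRED_SETTINGS = {"account", "timeout"}  # assumed required fields

def prepare(pipeline):
    """Return a list of configuration problems; empty means ready to execute."""
    problems = []
    for snap in pipeline["snaps"]:
        missing = REQUIRED_SETTINGS - snap.get("settings", {}).keys()
        if missing:
            problems.append(f"{snap['name']}: missing {sorted(missing)}")
    return problems

pipeline = {"snaps": [
    {"name": "ReadCustomers", "settings": {"account": "db", "timeout": 30}},
    {"name": "WriteOrders",   "settings": {"timeout": 30}},
]}
print(prepare(pipeline))   # the second Snap is missing its account setting
```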
Snap execution: The individual Snaps are activated and begin processing data. They perform the designated task according to the directives provided by the control plane.
Endpoint interactions: The pipeline establishes connections to any required external endpoints (for example, databases, applications, and cloud services) using the specified protocols. This enables the pipeline to process data from these systems.
Data flow orchestration: The pipeline coordinates the flow of data between Snaps and endpoints, ensuring that data moves through the pipeline in the correct sequence and format.
Resource management: The Snaplex node dynamically manages resources (memory, CPU, network) during execution to ensure optimal performance and prevent bottlenecks. The pipeline collects execution metrics, such as processing time, data volume, and error rates.
Pipeline execution statistics: Pipeline execution statistics provide information on the status of a pipeline as it executes. As a pipeline executes, the statistics are updated periodically so that you can monitor its progress. For more details, see Pipeline Execution Statistics.
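The data flow orchestration described above, where documents stream through Snaps in sequence, can be sketched as chained Python generators, each stage consuming the previous stage's output. This is an illustrative model only, not how the SnapLogic engine is implemented; the Snap names and record fields are invented for the example.

```python
# Illustrative only: modeling Snaps as chained generators so documents
# stream through the pipeline in order, one stage feeding the next.
def read_snap(records):
    for r in records:              # source Snap: ingest raw records
        yield r

def transform_snap(docs):
    for d in docs:                 # transform Snap: uppercase a field
        yield {**d, "name": d["name"].upper()}

def filter_snap(docs):
    for d in docs:                 # filter Snap: keep active records only
        if d["active"]:
            yield d

raw = [{"name": "ada", "active": True}, {"name": "bob", "active": False}]
result = list(filter_snap(transform_snap(read_snap(raw))))
print(result)   # [{'name': 'ADA', 'active': True}]
```

Because each stage is lazy, a record flows through the whole chain before the next one is read, which mirrors the streaming behavior described above.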
...
Execution start and end timestamps: These timestamps determine the duration of the pipeline execution and help assess its overall performance.
Data volume processed: This refers to the amount of data that is ingested, transformed, and processed by the pipeline during its execution.
Number of records processed: This refers to the count of individual data records that are ingested, transformed, and processed by the pipeline during its execution.
Success or failure status: A successful status indicates that the pipeline completed its data processing tasks without encountering errors or issues, while a failure status indicates that the pipeline encountered errors or exceptions during execution.
Any errors or warnings encountered: This refers to the issues or notifications that arise during the execution of the pipeline. These errors and warnings can include data validation failures, connectivity issues with endpoints, resource constraints, or any other issues that might impact the successful processing of the data.
Resource usage statistics: This includes the following statistics:
CPU utilization: The percentage of available CPU resources utilized by the pipeline during execution.
Memory usage: The amount of memory (RAM) consumed by the pipeline to store data, code, and intermediate results.
Network usage: The amount of network traffic generated by the pipeline, typically in bytes or megabytes, for data transfer and communication with external systems.
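Taken together, the statistics above can be pictured as a single execution record from which duration and error rate are derived. The sketch below uses illustrative field names, not the exact keys SnapLogic reports.

```python
# Hypothetical shape of a pipeline-execution statistics record; the field
# names are illustrative, not SnapLogic's actual statistics keys.
from datetime import datetime, timezone

stats = {
    "start":  datetime(2024, 1, 1, 12, 0, 0, tzinfo=timezone.utc),
    "end":    datetime(2024, 1, 1, 12, 5, 30, tzinfo=timezone.utc),
    "records_processed": 10_000,
    "status": "Completed",
    "errors": [],
    "resources": {"cpu_pct": 42.5, "memory_mb": 768, "network_mb": 120},
}

# Duration from the start/end timestamps; error rate from records processed.
duration = (stats["end"] - stats["start"]).total_seconds()
error_rate = len(stats["errors"]) / stats["records_processed"]
print(f"{duration:.0f}s, {error_rate:.2%} errors")   # 330s, 0.00% errors
```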
...
Stage | Control Plane | Snaplex nodes
---|---|---
Initialization | |
Prepare | |
Execution | |
Completion | |
...