...

Pipeline execution flow is a systematic, step-by-step process that moves data from initiation to completion. This orchestrated flow ensures efficient processing of data across diverse business systems. Within this process, the control plane manages integration tasks by strategically organizing components across the Snaplex infrastructure, while the Snaplex nodes execute the assigned integration tasks under the control plane's guidance. A pipeline can be executed by Triggered, Ultra, or Scheduled Tasks. Regardless of the task type, the pipeline goes through distinct states during its lifecycle.
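The lifecycle of distinct states can be sketched as a small state machine. The state names below (apart from NoUpdate, which this document mentions for the initialization phase) and the transition table are illustrative assumptions, not the platform's actual internal identifiers:

```python
from enum import Enum, auto

class PipelineState(Enum):
    """Illustrative lifecycle states; actual platform state names may differ."""
    NO_UPDATE = auto()   # initialization: groundwork and resource allocation
    PREPARED = auto()    # metadata retrieved, validation checks passed
    STARTED = auto()     # Snaps actively processing data
    COMPLETED = auto()   # execution finished successfully
    FAILED = auto()      # execution aborted on error

# Legal transitions, regardless of whether the task is Triggered, Ultra, or Scheduled.
TRANSITIONS = {
    PipelineState.NO_UPDATE: {PipelineState.PREPARED, PipelineState.FAILED},
    PipelineState.PREPARED: {PipelineState.STARTED, PipelineState.FAILED},
    PipelineState.STARTED: {PipelineState.COMPLETED, PipelineState.FAILED},
    PipelineState.COMPLETED: set(),
    PipelineState.FAILED: set(),
}

def advance(current: PipelineState, target: PipelineState) -> PipelineState:
    """Move to `target` only if the transition is legal for this lifecycle."""
    if target not in TRANSITIONS[current]:
        raise ValueError(f"illegal transition {current.name} -> {target.name}")
    return target
```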

...

The pipeline execution architecture involves a user interface for pipeline design, a control plane that orchestrates tasks and manages the overall workflow, and a distributed Snaplex environment where Snaplex nodes execute integration tasks in parallel. The metadata store in the control plane holds essential pipeline information, and the Snap store provides a catalog of pre-built components. Throughout the process, collaboration between these components ensures efficient task execution, error handling, and communication for seamless data integration.

...

Steps in the pipeline execution flow

(Image: execution flow.png)

Initialization

The initialization phase in pipeline execution involves the preparatory steps required to set up and begin the pipeline execution. Also called the NoUpdate state, it lays the groundwork for the subsequent stages in the pipeline execution flow, allowing the system to establish context and allocate resources effectively for the upcoming execution.

...

  • Request processing: The control plane receives the pipeline execution request, which can originate from various sources:

    • Scheduled task

    • External triggers like API calls and event notifications

    • Manual initiation by a user from the Designer or Manager

  • Leader node decision: In this stage, the control plane assesses the status and resource capacity of the Snaplex nodes, prioritizes nodes based on workload, processing power, and memory, selects the most appropriate node, and prepares it for pipeline execution.

This state is only relevant if the pipeline is executed on the leader node.
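The leader-node decision described above amounts to filtering and scoring candidate nodes by workload, processing power, and memory. A minimal sketch of that idea follows; the field names, weights, and memory threshold are hypothetical, not the control plane's actual selection logic:

```python
from dataclasses import dataclass

@dataclass
class NodeStatus:
    """Hypothetical snapshot of a Snaplex node's capacity."""
    name: str
    active_pipelines: int   # current workload
    free_cpu_pct: float     # idle CPU, 0-100
    free_mem_mb: int        # available memory

def pick_node(nodes: list[NodeStatus], min_mem_mb: int = 512) -> NodeStatus:
    """Drop nodes without enough free memory, then prefer the least-loaded,
    most-idle node. The weights below are illustrative only."""
    eligible = [n for n in nodes if n.free_mem_mb >= min_mem_mb]
    if not eligible:
        raise RuntimeError("no Snaplex node has enough free memory")

    def score(n: NodeStatus) -> float:
        # Reward idle CPU and free memory; penalize existing workload.
        return n.free_cpu_pct + n.free_mem_mb / 100 - 10 * n.active_pipelines

    return max(eligible, key=score)
```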

Prepare

During the Prepare stage, several important tasks are carried out to set the groundwork for the execution of the pipeline. It involves communication between the control plane and the data plane (Snaplex). This state ensures that all necessary components are in place for pipeline execution and proactively identifies and addresses potential configuration issues. The Prepare stage involves the following key activities:

...

  • Snap execution: The individual Snaps are activated and begin processing data. They perform the designated task according to the directives provided by the control plane.

  • Endpoint interactions: The pipeline establishes connections to any required external endpoints (for example, databases, applications, and cloud services) using the specified protocols. This enables the pipeline to process data from these systems.

  • Data flow orchestration: The pipeline coordinates the flow of data between Snaps and endpoints, ensuring that data moves through the pipeline in the correct sequence and format.

  • Resource management: The Snaplex node dynamically manages resources (memory, CPU, network) during execution to ensure optimal performance and prevent bottlenecks. The pipeline collects execution metrics, such as processing time, data volume, and error rates.

  • Pipeline execution statistics: Pipeline execution statistics provide information about the status of a pipeline as it executes. The statistics are updated periodically so that you can monitor the pipeline's progress. For more details, see Pipeline Execution Statistics.

...

  • Execution start and end timestamps: These timestamps give the duration of the pipeline execution and help assess its overall performance.

  • Data volume processed: This refers to the amount of data that is ingested, transformed, and processed by the pipeline during its execution.

  • Number of records processed: This refers to the count of individual data records that are ingested, transformed, and processed by the pipeline during its execution.

  • Success or failure status: A successful status indicates that the pipeline completed its data processing tasks without encountering errors or issues, while a failure status indicates that the pipeline encountered errors or exceptions during execution.

  • Any errors or warnings encountered: This refers to the issues or notifications that arise during the execution of the pipeline. These errors and warnings can include data validation failures, connectivity issues with endpoints, resource constraints, or any other issues that might impact the successful processing of the data.

  • Resource usage statistics: This includes the following statistics:

    • CPU utilization: The percentage of available CPU resources utilized by the pipeline during execution.

    • Memory usage: The amount of memory (RAM) consumed by the pipeline to store data, code, and intermediate results.

    • Network usage: The amount of network traffic generated by the pipeline, typically in bytes or megabytes, for data transfer and communication with external systems.

...

The following summarizes the responsibilities of the control plane and the Snaplex nodes at each stage.

Initialization

Control plane:

  • Receives the pipeline execution request from the external trigger.

  • Assigns specific tasks within a pipeline to Snaplex nodes based on their availability and capabilities.

  • Initiates the process by identifying the relevant pipeline based on the execution request.

Snaplex nodes:

  • Receives the pipeline execution request from the control plane.

  • Allocates resources, such as memory and processing power, to perform the data integration tasks.

  • Sets up its execution environment based on the pipeline's requirements.

Prepare

Control plane:

  • Accesses the metadata store to retrieve the required information about the designated pipeline.

  • Performs validation and authorization checks to ensure that the pipeline execution adheres to defined security and access controls.

Snaplex nodes:

  • Establishes connections to relevant data sources and targets, including databases, APIs, and file systems used in the data integration process.

  • Performs pre-execution checks to validate that the environment is ready and all prerequisites are met.

  • Communicates its readiness and initialization status back to the control plane.

Execution

Control plane:

  • Loads the pipeline metadata into its runtime environment.

  • Monitors the progress of pipeline execution.

  • Captures relevant metrics for auditing, troubleshooting, and performance analysis.

  • Manages errors based on real-time communication from the Snaplex nodes, either by applying predefined error-handling logic or by stopping the entire pipeline if needed.

Snaplex nodes:

  • Executes the specific tasks assigned to them.

  • Executes tasks in parallel to optimize performance and handle large volumes of data efficiently.

  • Handles errors locally and tries to resolve them before communicating them to the control plane.
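The local-first error handling described for the Snaplex nodes can be sketched as retry-with-escalation: try to resolve a failure locally, and report it upward only if that fails. The retry count, backoff, and `escalate` callback below are illustrative assumptions:

```python
import time
from typing import Callable, Optional, TypeVar

T = TypeVar("T")

def run_with_local_recovery(
    task: Callable[[], T],
    escalate: Callable[[Exception], None],
    retries: int = 3,
    backoff_s: float = 0.0,
) -> Optional[T]:
    """Attempt `task`, retrying locally up to `retries` times. If all
    attempts fail, report the last error via `escalate` (standing in for
    communication back to the control plane) and return None."""
    last_err: Optional[Exception] = None
    for attempt in range(retries):
        try:
            return task()
        except Exception as err:   # broad catch: any task failure
            last_err = err
            time.sleep(backoff_s * attempt)  # optional linear backoff
    escalate(last_err)  # local recovery failed; hand off upward
    return None
```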

Completion

Control plane:

  • Logs error messages and timestamps.

  • Consolidates the output results of each Snap and the transformed data generated by the Snaplex nodes.

  • Performs post-execution cleanup activities, such as releasing resources and closing connections.

  • Provides a final status report with the status of the pipeline execution and relevant performance metrics.

Snaplex nodes:

  • Releases allocated compute resources, closes open connections, and cleans up temporary files.

  • Sends comprehensive metrics to the control plane for analysis and optimization.

  • Archives detailed execution logs for future troubleshooting and reference.
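The completion-stage cleanup (closing open connections, removing temporary files) is the classic use case for try/finally, which guarantees the cleanup runs even when execution fails. A minimal sketch, with hypothetical `run_pipeline` and connection handles:

```python
import os
import tempfile

def execute_and_clean_up(run_pipeline, open_connections):
    """Run the pipeline body, then always release resources: close open
    connections and delete scratch files, whether or not execution failed."""
    tmp = tempfile.NamedTemporaryFile(delete=False)  # stand-in for temp files
    try:
        return run_pipeline(tmp)
    finally:
        for conn in open_connections:
            conn.close()            # close open endpoint connections
        tmp.close()
        os.unlink(tmp.name)         # clean up temporary files
```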

...