Fully managed cloud services increasingly let global enterprises focus on strategic differentiators rather than on maintaining infrastructure, by creating data lakes and performing big data processing in the cloud. SnapLogic supports your move to a fully managed data architecture. SnapLogic eXtreme enables an enterprise's citizen integrators and data integrators to efficiently support and augment data-integration use cases by performing complex transformations on large volumes of data.
SnapLogic eXtreme extends the accessible and easy-to-use SnapLogic Intelligent Integration Platform (IIP) to build and submit powerful Spark-based Pipelines to managed Big Data as a Service (BDaaS) providers, such as Amazon EMR. The SnapLogic visual programming interface eliminates the need for error-prone manual coding, leading to quicker time-to-value without the traditional dependence on complex IT or data-science organizations. Unlike other data integration solutions, which require integrators to have detailed knowledge of how to build and submit Spark jobs, SnapLogic eXtreme allows business users with domain expertise to perform complex processing and transformations on extremely large volumes of data within the enterprise’s existing big data infrastructure.
Spark 2.4.4 with Amazon EMR 5.29
Spark 2.4.5 with Azure Databricks 6.5
Develop Spark Pipelines using visual design (zero code).
Fully managed and automated big data runtime environment with a serverless architecture.
Lifecycle management of cloud-based, transient, big data clusters in AWS/Azure Cloud environments.
Run complex and high-volume data transformation routines at elastic scale for terabytes or petabytes of data.
Perform read-write operations on cloud storage such as S3 and Azure Blob Storage, as well as on cloud data warehouses such as Redshift, Snowflake, and Delta Lake.
Pushdown optimization support for Snowflake and Delta Lake.
Execute PySpark and Java Spark applications.
You can execute an eXtreme-mode Pipeline from a standard-mode Pipeline using the Pipeline Execute Snap. This Snap is equipped to call an eXtreme-mode Pipeline as a child Pipeline. See this example for more information.
The eXtremeplex panel in Dashboard displays incorrect values for Total Nodes and Nodes Active.
Refer to the Requested Instances and Running Instances fields in the eXtremeplex details for the correct information.
In eXtreme pipelines (unlike standard-mode Pipelines), the Snaps' color does not change during validation.
Wait for Pipeline validation to complete for the appropriate color to display on the Snaps.
Validation of a Pipeline with a large dataset in a row (20 MB and higher) fails with a "Kryo Serialization failed" error.
SnapLogic eXtreme is a self-service integration solution that runs in the cloud and facilitates creating fully managed, cloud-resident data lakes. It supports Amazon Elastic MapReduce (EMR), which uses S3 for storing data and EMR for processing it, and Azure Databricks, which uses Delta Lake for storing data. SnapLogic eXtreme thus extends the data integration capability of the SnapLogic Intelligent Integration Platform by enabling data processing at scale with Apache Spark.
Data ingestion is performed via standard-mode Pipelines using SnapLogic's extensive connectivity. At-scale processing uses the data engineering Snaps available in eXtreme and transforms the data on the underlying cloud storage. The first time an eXtreme Pipeline executes, an EMR cluster is spun up using the configuration specified in the eXtremeplex definition. After the EMR cluster starts, the Pipeline is submitted to the cluster for execution. The data stored in your cloud data warehouse is then transformed according to the Pipeline definition, and the results are written back to the storage in your chosen format. After a period of inactivity, the cluster is terminated automatically on your behalf to save operational expenses.
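The cluster lifecycle described above (start on first execution, reuse while active, terminate after inactivity) can be illustrated with a small, self-contained Python model. This is only a sketch for reasoning about the behavior: it does not call any SnapLogic or Amazon EMR API, and the class name `TransientCluster`, the `idle_timeout_s` setting, the Pipeline names, and the 600-second timeout are all hypothetical values chosen for illustration.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class TransientCluster:
    """Toy model of a transient cluster; not the SnapLogic or EMR API."""
    idle_timeout_s: float                  # hypothetical inactivity period, in seconds
    running: bool = False
    last_activity: Optional[float] = None  # time of the most recent submission
    events: List[str] = field(default_factory=list)

    def submit(self, pipeline: str, now: float) -> None:
        # A submission to a stopped cluster starts it first (cold start).
        if not self.running:
            self.running = True
            self.events.append(f"start@{now:g}")
        self.events.append(f"run:{pipeline}@{now:g}")
        self.last_activity = now

    def tick(self, now: float) -> None:
        # Terminate once the cluster has been idle for the full timeout.
        if self.running and now - self.last_activity >= self.idle_timeout_s:
            self.running = False
            self.events.append(f"terminate@{now:g}")


cluster = TransientCluster(idle_timeout_s=600)
cluster.submit("transform_orders", now=0)     # first execution: cluster starts
cluster.submit("transform_returns", now=120)  # reuses the running cluster
cluster.tick(now=300)                         # idle for 180 s: keeps running
cluster.tick(now=720)                         # idle for 600 s: terminated
print(cluster.events)
```

Time is passed in explicitly rather than read from a clock so the sequence is easy to follow and repeat. In this model, a Pipeline submitted after termination would trigger another cold start, so the first execution after an idle period takes longer than subsequent ones.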