eXtreme Execute Snap Pack

Overview

This Snap Pack enables eXtreme users to execute PySpark and Java Spark applications through the SnapLogic Platform. Users who already have Spark applications that transform their data can now run those scripts through their Pipelines.

The Spark driver submits these Snaps as a step on the EMR cluster. Accordingly, the step submission process requires additional permissions on the EMR cluster, granted through an IAM policy that allows the following actions:

  • AddJobFlowSteps
  • RunJobFlow
  • TerminateJobFlows
  • CancelSteps

For details, see Actions, Resources, and Condition Keys for Amazon Elastic MapReduce.
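
The actions listed above can be granted through an IAM policy attached to the role or user that submits the steps. The sketch below is a minimal illustration, not an official policy: the `Sid` and the wildcard `Resource` ARN are placeholders, and you should scope the `Resource` to your own clusters per the AWS reference linked above.

```json
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "SnapLogicEmrStepSubmission",
      "Effect": "Allow",
      "Action": [
        "elasticmapreduce:AddJobFlowSteps",
        "elasticmapreduce:RunJobFlow",
        "elasticmapreduce:TerminateJobFlows",
        "elasticmapreduce:CancelSteps"
      ],
      "Resource": "arn:aws:elasticmapreduce:*:*:cluster/*"
    }
  ]
}
```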

Known Issue

On AWS EMR, the Spark job remains in the running state even after the SnapLogic Designer/Dashboard displays the Pipeline status as Failed.



Snap Pack History


4.27 (main12833)

  • No updates made.

4.26 (main11181)

  • No updates made.

4.25 (main9554)

  • No updates made.

4.24 (main8556)

4.23 (main7430)

  • Accounts now support validation: click Validate in the account settings dialog to verify that your account is configured correctly.

4.22 (main6403)

  • No updates made.

4.21 Patch 421patches5928

  • Adds Hierarchical Data Format v5 (HDF5) support in AWS EMR. With this enhancement, you can read HDF5 files and parse them into JSON files for further data processing. See Enabling HDF5 Support for details.
  • Adds support for a Python virtual environment to the PySpark Script Snap, enabling it to read HDF5 files in an S3 bucket. You can specify the path to the virtual environment's ZIP file in the Snap's settings.

4.21 Patch 421patches5851

  • Optimizes Spark engine execution on AWS EMR, requiring fewer compute resources.

4.21 (snapsmrc542)

  • No updates made.

4.20 (snapsmrc535)

  • Introduced a new account type, Azure Databricks Account. This enhancement makes account configuration mandatory for the PySpark Script and JAR Submit Snaps.
  • Enhanced the PySpark Script Snap to display the Pipeline Execution Statistics after a Pipeline with the Snap executes successfully.

4.19 (snapsmrc528)

  • No updates made.

4.18 (snapsmrc523)

  • No updates made.

4.17 Patch ALL7402

  • Pushed automatic rebuild of the latest version of each Snap Pack to SnapLogic UAT and Elastic servers.

4.17 (snapsmrc515)

  • No updates made. Automatic rebuild with a platform release.

4.16 (snapsmrc508)

  • New Snap Pack. Execute Java Spark and PySpark applications through the SnapLogic platform. Snaps in this Snap Pack are:
    • JAR Submit: Upload your existing Spark Java JAR programs as eXtreme Pipelines.
    • PySpark Script: Upload your existing PySpark scripts as eXtreme Pipelines.