Transform - Spark SQL 2.x

Snap type: Transform
Description:

The Transform Snap transforms the input data by applying configured transformation rules.

Prerequisites: Must be used on an eXtremeplex.
Support and limitations:

This Snap is only available in eXtreme pipelines.

Account:

None

Views:
Input: One document
Output: One document
Error: Not supported

Settings

Label


Name for the Transform Snap.

Pass through: Determines whether the input data is passed through to the output. If unchecked, only the data transformation results defined in the mapping section appear in the output document, and the input data is discarded. If checked, all of the original input data is passed into the output document along with the data transformation results.
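
For illustration only, the sketch below shows the conceptual difference in Python, assuming a hypothetical input document and a single mapping that builds a fullname value (the field names are examples, not part of the Snap):

# Hypothetical input document and mapping result (illustration, not Snap syntax).
input_doc = {"firstname": "Ada", "lastname": "Lovelace", "dept": "Engineering"}
mapped = {"fullname": "Ada Lovelace"}  # result of the configured expression

# Pass through unchecked: only the mapping results appear in the output document.
output_unchecked = mapped

# Pass through checked: the original input data is passed along with the results.
output_checked = {**input_doc, **mapped}
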
Transformations
Input Schema: The schema of the incoming data.
Expression

You can enter a function to transform the data (for example, combine, concat, or flatten).

See Spark Expression Language for more information.
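
As a rough illustration of what such functions do at the Spark level, the PySpark snippet below uses the standard concat_ws and flatten functions on assumed column names (this is plain Spark code, not the Snap's expression syntax):

from pyspark.sql import SparkSession
from pyspark.sql.functions import col, concat_ws, flatten

spark = SparkSession.builder.appName("transform-illustration").getOrCreate()

# Assumed sample data: two string columns and a nested array column.
df = spark.createDataFrame(
    [("Ada", "Lovelace", [[1, 2], [3]])],
    ["firstname", "lastname", "scores"],
)

result = df.select(
    concat_ws(" ", col("firstname"), col("lastname")).alias("fullname"),  # concat-style expression
    flatten(col("scores")).alias("scores_flat"),                          # flatten-style expression
)
result.show()
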

Target path

Transformed data is written out to the configured target path.

Enter the location within the target JSON path where the value from the expression will be written.

Example: $person.firstname writes to the firstname field of the person object.
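
A minimal sketch of how such a dotted target path nests the expression result in the output document, using a hypothetical Python helper (for illustration only):

def write_target_path(doc, path, value):
    # Hypothetical helper: write `value` under a dotted target path such as
    # "$person.firstname", creating intermediate objects as needed.
    keys = path.lstrip("$").split(".")
    current = doc
    for key in keys[:-1]:
        current = current.setdefault(key, {})
    current[keys[-1]] = value
    return doc

output = write_target_path({}, "$person.firstname", "Ada")
# output == {"person": {"firstname": "Ada"}}
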

Target Schema

The schema of the target destination.

Input preview data: Preview of the input data before transformation.
Output preview data: Preview of the output data after transformation.


Examples


Snap Settings

Pipeline

One of the output partition results shows 7 output columns:

367,Parker,97,367,367,48.0,0-30-2014
367,Parker,97,367,367,48.0,0-30-2014
367,Parker,97,367,367,48.0,0-30-2014
367,Parker,97,367,367,48.0,0-30-2014
367,Parker,97,367,367,48.0,0-1-2014
367,Parker,97,367,367,48.0,0-1-2014
367,Parker,97,367,367,48.0,0-1-2014
367,Parker,97,367,367,48.0,0-1-2014
367,Parker,97,367,367,48.0,0-1-2014
367,Parker,97,367,367,48.0,0-1-2014
367,Parker,97,367,367,48.0,0-1-2014

Parameter Configuration

The following screenshots display the Transform Snap expression and settings configuration:

See Also

Snap Pack History

4.25 (main9554)

  • Introduced the SCD2 - Spark SQL 2.x Snap to support Type 2 Slowly Changing Dimensions (SCD2) updates to the target databases in the eXtreme mode.
  • Upgraded the Spark SQL 2.x Snap Pack to support Spark 3.0.1 on the following cloud platform versions:
    • Amazon EMR 6.2.0 (Hadoop distribution: Amazon)
    • Azure Databricks 7.5

4.24 (424patches8724)

  • Fixes an issue where the eXtremeplex is unable to read Parquet files written from a Groundplex (and therefore displays Base64-encoded values in all the output columns upon validation) by changing the data encoding from Base64 to plain text. This issue does not occur during Pipeline execution.

4.24 (main8556)

4.23 (main7430)

  • Accounts now support validation: you can click Validate in the account settings dialog to verify that your account is configured correctly.
  • Enhances multiple Snaps to support Snap suggestions for the file or directory path. You can click the Suggest icon to retrieve a list of available file names, based on your account configuration. The following Snaps have the new suggest functionality:

4.22 (422patches6845)

  • Fixes an issue in the Parquet Formatter Snap where the partitioned sub-folders are not organized in the order of the keys in the Partition by field.

4.22 (main6403)

4.21 (421patches5851)

  • Optimizes Spark engine execution on AWS EMR, requiring fewer compute resources.

4.21 (snapsmrc542)

  • Enhanced the Snap Pack to support Java Database Connectivity (JDBC). This enhancement adds the following Snaps and account type:
    • JDBC Insert: Inserts data into a target table through a JDBC connection.
    • JDBC Select: Fetches data from a target table through a JDBC connection.
    • JDBC Storage Account: Enables you to connect to databases that support JDBC.

4.20 (snapsmrc535)

  • Introduced the Sample Snap, which enables you to generate a sample dataset from the main dataset. You can use the sample dataset to test Pipelines, thereby saving resources while designing Pipelines.
  • Introduced a new account type, Amazon Web Services (AWS) Account, to support object encryption using AWS Key Management Service (KMS). This enhancement makes account configuration mandatory for the Spark 2.x File Writer and File Reader Snaps. 

4.19 (snapsmrc528)

  • No updates made.

4.18 (snapsmrc523)

  • No updates made.

4.17 Patch ALL7402

  • Pushed automatic rebuild of the latest version of each Snap Pack to SnapLogic UAT and Elastic servers.

4.17 (snapsmrc515)

  • No updates made. Automatic rebuild with a platform release.

4.16 (snapsmrc508)

  • No updates made. Automatic rebuild with a platform release.

4.15 (snapsmrc500)

Added the following six new Snaps:

  • Avro Formatter
  • Avro Parser
  • Catalog Reader
  • Catalog Writer
  • JSON Formatter
  • JSON Parser

Also adds support for ingesting the schema in the Spark SQL 2.x CSV and JSON Parser Snaps, either via inferred schema (automatically) or from a Hive Metastore (selected by the user).

4.14 Patch sparksql2x5801

Fixed an issue wherein the Spark SQL 2.x Snap documentation did not open.

4.14 MULTIPLE5756 (Stable)

The Spark SQL 2.x Snap Pack updates deploy these Snaps: Aggregate, Cache, Copy, CSV Formatter, CSV Parser, Diff, Execute, Filter, File Reader, File Writer, Intersect, Join, Limit, LineReader, ORC Formatter, ORC Parser, Parquet Formatter, Parquet Parser, Pivot, Repartition, Router, Sort, Transform, Union, Unique.