File Reader - Spark SQL 2.x

Snap type: Read
Description:

The File Reader Snap reads data from a data store in a configured account and produces a binary data stream in the output.

  • Expected upstream Snaps: None
  • Expected downstream Snaps: Parser Snaps or other Snaps that accept Binary input
Prerequisites: This Snap must be used on an eXtremeplex.
Support and limitations:

This Snap is only available in eXtreme pipelines.

Account:

See Configuring the Spark SQL 2.x Account to configure the File Reader Snap account.

Views:
Input: None
Output: One binary output
Error: Not supported

Settings

Label

The name for the File Reader Snap.

File/Directory path

The path to the file or directory, in the data store associated with your account, from which to read data. The Snap supports the following protocols:

  • wasb:///<container>/<path>
  • abfs:///<container>/<path>
  • wasb://<container>@<accountname>.<endpoint>/<path>
  • abfs://<container>@<accountname>.<endpoint>/<path>

This is a suggestible field. Click the suggest icon to view directory path suggestions. The list varies depending on your account configuration.

Default value: /data

Example: wasb:///sl-bigdata/data/csv/date_dim.dat
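The path forms listed for this field can be broken into their parts with a few lines of Python. The following is only an illustrative sketch, not part of the Snap: the helper name describe_path and the keys of the dictionary it returns are hypothetical, and the pattern covers only the wasb:// and abfs:// forms together with their secure wasbs:// and abfss:// variants.

```python
import re

# Hypothetical helper for illustration only; the Snap itself exposes no such function.
# Accepts the two documented forms:
#   <scheme>:///<container>/<path>
#   <scheme>://<container>@<accountname>.<endpoint>/<path>
_PATH_RE = re.compile(
    r"^(?P<scheme>wasbs?|abfss?)://"
    r"(?:(?P<container>[^/@]+)@(?P<host>[^/]+))?"
    r"/(?P<rest>.*)$"
)


def describe_path(path):
    """Split a wasb/abfs path into scheme, container, account, endpoint and path."""
    match = _PATH_RE.match(path)
    if match is None:
        raise ValueError(f"not a recognised wasb/abfs path: {path!r}")

    scheme = match.group("scheme")
    if match.group("container") is not None:
        # Fully qualified form: the container and account come before the first '/'.
        container = match.group("container")
        account, _, endpoint = match.group("host").partition(".")
        file_path = match.group("rest")
    else:
        # Short form: the container is the first segment after ':///'.
        container, _, file_path = match.group("rest").partition("/")
        account = endpoint = None

    return {
        "scheme": scheme,
        # 'abfs' itself ends in 's', so compare against the secure names explicitly.
        "secure": scheme in ("wasbs", "abfss"),
        "container": container,
        "account": account,
        "endpoint": endpoint,
        "path": file_path,
    }


print(describe_path("wasb:///sl-bigdata/data/csv/date_dim.dat"))
```

This sketch rejects anything without one of these schemes, including the default value /data. That is a choice made for the example and says nothing about how the Snap validates the field.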

  • To use the abfss:// or wasbs:// protocol, you must enable the Secure transfer required property while configuring your ADLS Gen2 Storage/WASB account.
  • Specify a direct folder in the File/Directory path field. To read data from multiple nested folders, append /*/* to the path. For example, s3://extreme/data/nested/*/*.
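To see which objects a two-level wildcard such as /*/* would select, the matching can be sketched in Python. This assumes Hadoop-style glob semantics, in which each * matches within a single path segment and never crosses a / boundary. The Snap's actual matching is not documented here, so treat that as an assumption. The function name matches_glob and the sample object paths are hypothetical.

```python
from fnmatch import fnmatchcase


def matches_glob(pattern, path):
    """Segment-wise glob match (assumed semantics: '*' never crosses a '/')."""
    pattern_parts = pattern.split("/")
    path_parts = path.split("/")
    # Every pattern segment must line up with exactly one path segment.
    if len(pattern_parts) != len(path_parts):
        return False
    return all(
        fnmatchcase(segment, segment_pattern)
        for segment_pattern, segment in zip(pattern_parts, path_parts)
    )


pattern = "s3://extreme/data/nested/*/*"

# Hypothetical object paths, used only to show the depth that the pattern selects.
candidates = [
    "s3://extreme/data/nested/part-0.csv",          # one level too shallow
    "s3://extreme/data/nested/2020/part-0.csv",     # exactly two levels below nested/
    "s3://extreme/data/nested/2020/01/part-0.csv",  # one level too deep
]

for candidate in candidates:
    print(candidate, matches_glob(pattern, candidate))
```

Under these assumed semantics, only the entry exactly two levels below nested/ matches. Deeper layouts would need one additional /* per extra folder level.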

Spark SQL functions do not work in the File Reader Snap, even though you can toggle the expression option on the File/Directory path field.

Examples



See Also

Snap Pack History

4.25 (main9554)

  • Introduced the SCD2 - Spark SQL 2.x Snap to support Type 2 Slowly Changing Dimensions (SCD2) updates to the target databases in the eXtreme mode.
  • Upgraded the Spark SQL 2.x Snap Pack to support Spark 3.0.1 on the following cloud platform versions:
    • Amazon EMR 6.2.0 (Hadoop distribution: Amazon)
    • Azure Databricks 7.5

4.24 (424patches8724)

  • Fixes the issue where the eXtremeplex is unable to read Parquet files written from a Groundplex (and hence displays base64 enabled in all the output columns upon validation), by changing the data encoding from Base64-encoded to Plain text format. This issue does not occur during Pipeline execution.

4.24 (main8556)

4.23 (main7430)

  • Accounts support validation. Thus, you can click Validate in the account settings dialog to validate that your account is configured correctly. 
  • Enhances multiple Snaps to support Snap suggestions for the file or directory path. You can click the suggest icon to retrieve a list of available file names, based on your account configuration. The following Snaps have the new suggest functionality:

4.22 (422patches6845)

  • Fixes an issue in the Parquet Formatter Snap where the partitioned sub-folders are not organized in the order of the keys in the Partition by field.

4.22 (main6403)

4.21 (421patches5851)

  • Optimizes Spark engine execution on AWS EMR so that it requires fewer compute resources.

4.21 (snapsmrc542)

  • Enhanced the Snap Pack to support Java Database Connectivity (JDBC). This enhancement adds the following Snaps and account type:
    • JDBC Insert: Inserts data into a target table through a JDBC connection.
    • JDBC Select: Fetches data from a target table through a JDBC connection.
    • JDBC Storage Account: Enables you to connect to databases that support JDBC.

4.20 (snapsmrc535)

  • Introduced the Sample Snap, which enables you to generate a sample dataset from the main dataset. You can use the sample dataset to test Pipelines, thereby saving resources while designing Pipelines.
  • Introduced a new account type, Amazon Web Services (AWS) Account, to support object encryption using AWS Key Management Service (KMS). This enhancement makes account configuration mandatory for the Spark 2.x File Writer and File Reader Snaps. 

4.19 (snapsmrc528)

  • No updates made.

4.18 (snapsmrc523)

  • No updates made.

4.17 Patch ALL7402

  • Pushed automatic rebuild of the latest version of each Snap Pack to SnapLogic UAT and Elastic servers.

4.17 (snapsmrc515)

  • No updates made. Automatic rebuild with a platform release.

4.16 (snapsmrc508)

  • No updates made. Automatic rebuild with a platform release.

4.15 (snapsmrc500)

Added the following six new Snaps:

  • Avro Formatter
  • Avro Parser
  • Catalog Reader
  • Catalog Writer
  • JSON Formatter
  • JSON Parser

Also added support for ingesting the schema in the Spark SQL 2.x CSV Parser and JSON Parser Snaps, either through an inferred schema (automatic) or from a Hive Metastore (selected by the user).

4.14 Patch sparksql2x5801

Fixed an issue wherein the Spark SQL 2.x Snap documentation did not open.

4.14 MULTIPLE5756 (Stable)

The Spark SQL 2.x Snap Pack updates deploy these Snaps: Aggregate, Cache, Copy, CSV Formatter, CSV Parser, Diff, Execute, Filter, File Reader, File Writer, Intersect, Join, Limit, LineReader, ORC Formatter, ORC Parser, Parquet Formatter, Parquet Parser, Pivot, Repartition, Router, Sort, Transform, Union, Unique.