
Snap type:

Read

Description:

This Snap reads ORC files from SLDB, HDFS, S3, and WASB, and converts the data into documents.

  • Expected upstream Snaps: [None]
  • Expected downstream Snaps: Any data transformation or formatting Snaps, such as Mapper or JSON Formatter Snaps.
  • Expected input: ORC files from SLDB, HDFS, S3, and WASB.
  • Expected output: A document with the columns and data of the ORC file.

This Snap supports the HDFS (non-Kerberos), ABFS (Azure Data Lake Storage Gen 2), WASB (Azure Storage), and S3 protocols.


Prerequisites:

None

Support and limitations:
Account: 

Depending on the source of the data you want to read, provide valid account information for either an AWS S3 Account or an Azure Storage Account.

Views:


  • Input: This Snap has at most one document input view.
  • Output: This Snap has exactly one document output view.
  • Error: This Snap has at most one document error view and produces zero or more documents in the view.


Settings

Label


Required. The name of the Snap. You can modify this to be more specific, especially if you have more than one of the same Snap in your pipeline.

Directory


The path to a directory from which you want the ORC Reader Snap to read data. All files within the directory must be ORC formatted.

Basic directory URI structure

  • HDFS: hdfs://<hostname>:<port>/
  • S3: s3:///<S3_bucket_name>/<file_path>
  • WASB: wasb:///<WASB_directory>/<file_name>
  • ABFS:
    • abfs:///<filesystem>/<path>/
    • abfs://<filesystem>@<accountname>.<endpoint>/<path>
  • ABFSS:
    • abfss:///<filesystem>/<path>/
    • abfss://<filesystem>@<accountname>.<endpoint>/<path>
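As a rough illustration of the URI structures listed above, the following Python sketch checks that a Directory value uses one of the listed schemes before it is entered in the Snap. The helper name `directory_scheme` and the `SUPPORTED_SCHEMES` set are hypothetical and are not part of SnapLogic; the set only mirrors the schemes shown in the list.

```python
from urllib.parse import urlsplit

# Schemes copied from the URI list above. This set and the helper below
# are illustrative assumptions, not part of the Snap itself.
SUPPORTED_SCHEMES = {"hdfs", "s3", "wasb", "abfs", "abfss"}


def directory_scheme(uri: str) -> str:
    """Return the lowercase scheme of a Directory URI, or raise ValueError."""
    scheme = urlsplit(uri).scheme.lower()
    if scheme not in SUPPORTED_SCHEMES:
        raise ValueError(f"unsupported scheme in Directory URI: {uri!r}")
    return scheme


if __name__ == "__main__":
    print(directory_scheme("wasb:///snaplogic/srikanth_test123/RedWoodcity"))
```

A check like this only catches a mistyped or unsupported scheme; it does not verify that the host, bucket, or path actually exists.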

When you use the ABFS protocol to connect to an endpoint, the account name and endpoint details provided in the URL override the corresponding values in the Account Settings fields.
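The precedence rule described above can be pictured with a small Python sketch. It assumes the URL follows the abfs://<filesystem>@<accountname>.<endpoint>/<path> form and splits the host at the first dot; the function name `resolve_abfs_account`, its parameters, and the sample values are hypothetical and do not reflect how the Snap is implemented internally.

```python
from urllib.parse import urlsplit


def resolve_abfs_account(uri: str, settings_account: str, settings_endpoint: str):
    """Return (account_name, endpoint), preferring values embedded in the URI.

    Hypothetical helper: models the documented rule that account details
    in the URL override the Account Settings fields.
    """
    host = urlsplit(uri).hostname  # None for abfs:///<filesystem>/<path>/
    if host and "." in host:
        # <accountname>.<endpoint>: everything before the first dot is
        # treated as the account name, the remainder as the endpoint.
        account, endpoint = host.split(".", 1)
        return account, endpoint
    # No account in the URL, so fall back to the Account Settings values.
    return settings_account, settings_endpoint


if __name__ == "__main__":
    # Sample values are made up for illustration.
    print(resolve_abfs_account(
        "abfs://myfs@urlaccount.dfs.core.windows.net/data",
        "settingsaccount", "dfs.core.windows.net"))
    print(resolve_abfs_account(
        "abfs:///myfs/data/", "settingsaccount", "dfs.core.windows.net"))
```

In the first call the account name comes from the URL; in the second, the short form carries no account, so the values from the Account Settings are used.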

With the ABFS protocol, SnapLogic creates a temporary file to store the incoming data. Therefore, the disk on the node where the JCC is running must have enough free space to temporarily store all the data coming in from ABFS.

The Directory property is not used in the pipeline execution or preview, and is used only in the Suggest operation. When you press the Suggest icon, the Snap displays a list of subdirectories under the given directory. It generates the list by applying the value of the Filter property.

Example:

  • wasb:///snaplogic/srikanth_test123/RedWoodcity

Default value: hdfs://<hostname>:<port>/

Filter

File

Required for standard mode. The file name or a relative path to a file under the directory specified in the Directory property. It must not start with the URL separator "/". The File property can be a JavaScript expression, which is evaluated with values from the input view document. When you click the Suggest icon, the Snap displays a list of regular files under the directory specified in the Directory property. It generates the list by applying the value of the Filter property.

Example: 

  • sample.orc
  • tmp/another.orc
  • _filename

Default value:  [None]

Examples


Reading from a Local Instance of HDFS

You can configure the ORC Reader Snap to read from a specific directory in a local HDFS instance. In the example below, it reads from the file.orc file in the /tmp directory.



Reading from a Local Instance of S3

You can configure the ORC Reader Snap to read from a specific directory in a local S3 instance. In the example below, it reads from the /<file-path>/file.orc file.


Troubleshooting

See Also