Snap type:

Read

Description:

This Snap reads ORC files from SLDB, HDFS, S3, and WASB, and converts the data into documents.

  • Expected upstream Snaps: [None]
  • Expected downstream Snaps: Any data transformation or formatting Snaps, such as Mapper or JSON Formatter Snaps.
  • Expected input: ORC files from SLDB, HDFS, S3, and WASB.
  • Expected output: A document with the columns and data of the ORC file.
Prerequisites:

None

Support and limitations:
  • Works with SLDB, HDFS, S3, and WASB.
Ultra pipelines: Spark: Not supported in /wiki/spaces/SD/pages/1437917 pipelines
Account: 

Depending on the source of the data you want to read, you will need to provide valid account information for either AWS S3 Account or Azure Storage Account.

Views:


Input: This Snap has at most one document input view.
Output: This Snap has exactly one document output view.
Error: This Snap has at most one document error view and produces zero or more documents in the view.


Settings

Label


Required. The name of the Snap. You can modify this to be more specific, especially if you have more than one of the same Snap in your pipeline.

Directory


The path to a directory from which you want the ORC Reader Snap to read data. All files within the directory must be ORC formatted.

Basic directory URI structure

  • HDFS: hdfs://<hostname>:<port>/
  • S3: s3:///<S3_bucket_name>/<file_path>
  • WASB: wasb:///<WASB_directory>/<file_name>
  • ABFS:
    • abfs:///<filesystem>/<path>/
    • abfs://<filesystem>@<accountname>.<endpoint>/<path>
  • ABFSS:
    • abfss:///<filesystem>/<path>/
    • abfss://<filesystem>@<accountname>.<endpoint>/<path>
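
The URI forms above can be checked before they are entered in the Directory field. The following is a minimal sketch, not part of the Snap: the function name `check_directory_uri` and the set of accepted schemes are assumptions drawn only from the list above, and the Snap itself may accept other forms.

```python
from urllib.parse import urlsplit

# Schemes copied from the list above. This set is an assumption for
# illustration; it is not read from the Snap itself.
SUPPORTED_SCHEMES = {"hdfs", "s3", "wasb", "abfs", "abfss"}


def check_directory_uri(uri: str) -> str:
    """Return the lowercase scheme of a Directory URI, or raise ValueError."""
    scheme = urlsplit(uri).scheme.lower()
    if scheme not in SUPPORTED_SCHEMES:
        raise ValueError(f"unsupported scheme in Directory URI: {uri!r}")
    return scheme
```

For example, `check_directory_uri("hdfs://namenode:8020/")` returns `"hdfs"`, while a plain local path such as `/tmp/data` raises `ValueError` because it has no scheme.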

When you use the ABFS protocol to connect to an endpoint, the account name and endpoint details provided in the URL override the corresponding values in the Account Settings fields.
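
The override rule can be illustrated with a short sketch. It assumes a hypothetical `account_settings` dictionary with `account_name` and `endpoint` keys standing in for the Account Settings fields; the function name and the dictionary are illustrative only and do not reflect how the Snap is implemented.

```python
from urllib.parse import urlsplit


def effective_abfs_account(uri, account_settings):
    """Resolve the account name and endpoint for an abfs/abfss URI.

    account_settings is a hypothetical dict with "account_name" and
    "endpoint" keys standing in for the Account Settings fields.
    """
    netloc = urlsplit(uri).netloc  # empty for the abfs:///<filesystem>/ form
    if "@" not in netloc:
        # The URL carries no account details, so the settings apply.
        return account_settings["account_name"], account_settings["endpoint"]
    # Form: <filesystem>@<accountname>.<endpoint>
    _filesystem, host = netloc.split("@", 1)
    account_name, _, endpoint = host.partition(".")
    return account_name, endpoint
```

With this sketch, a URI of the form `abfs://<filesystem>@<accountname>.<endpoint>/<path>` yields the account name and endpoint from the URL, and the `abfs:///<filesystem>/<path>/` form falls back to the values in `account_settings`.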

Note

With the ABFS protocol, SnapLogic creates a temporary file to store the incoming data. Therefore, the hard drive where the JCC is running should have enough space to temporarily store all the account data coming in from ABFS.

The Directory property is used only by the Suggest operation; it is not used during pipeline execution or preview. When you press the Suggest icon, the Snap displays a list of subdirectories under the given directory, filtered by the value of the Filter property.

Example:

  • wasb:///snaplogic/srikanth_test123/RedWoodcity

Default value: hdfs://<hostname>:<port>/

Filter

File

Required for standard mode. The file name or a relative path to a file under the directory given in the Directory property. The value must not start with the URL separator "/". The File property can be a JavaScript expression, which is evaluated with values from the input view document. When you press the Suggest icon, the Snap displays a list of regular files under the directory given in the Directory property, filtered by the value of the Filter property.

Example: 

  • sample.orc
  • tmp/another.orc
  • _filename

Default value:  [None]
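
The constraints on a literal File value can be expressed as a small check. This is a sketch under stated assumptions: `validate_file_property` is a hypothetical helper, it covers only literal values such as `sample.orc` or `tmp/another.orc`, and it does not evaluate expressions such as `_filename`.

```python
def validate_file_property(file: str) -> str:
    """Check a literal File value against the rules described above.

    Hypothetical helper for illustration; expression values are not
    evaluated here.
    """
    if not file:
        # File is required for standard mode.
        raise ValueError("File is required")
    if file.startswith("/"):
        # The value must be relative to the Directory property.
        raise ValueError('File must not start with the URL separator "/"')
    return file
```

Both `sample.orc` and `tmp/another.orc` pass this check unchanged, while `/tmp/another.orc` is rejected because of the leading separator.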
