Snap type:

Read

Description:

This Snap reads ORC files from SLDB, HDFS, S3, and WASB, and converts the data into documents.

  • Expected upstream Snaps: [None]
  • Expected downstream Snaps: Any data transformation or formatting Snaps, such as Mapper or JSON Formatter Snaps.
  • Expected input: ORC files from SLDB, HDFS, S3, and WASB.
  • Expected output: A document with the columns and data of the ORC file.
Note

This Snap supports the HDFS (non-Kerberos), ABFS (Azure Data Lake Storage Gen 2), WASB (Azure storage), and S3 protocols.


Prerequisites:

None

Support and limitations:
Known Issue:

The upgrade of Azure Storage library from v3.0.0 to v8.3.0 has caused the following issue when using the WASB protocol:
When you use invalid credentials for the WASB protocol in Hadoop Snaps (HDFS Reader, HDFS Writer, ORC Reader, Parquet Reader, Parquet Writer), the pipeline does not fail immediately. Instead, it takes 13-14 minutes to display the following error:

reason=The request failed with error code null and HTTP code 0. , status_code=error

SnapLogic® is actively working with Microsoft® Support to resolve the issue.

Account: 

Depending on the source of the data you want to read, you need to provide valid account information for either an AWS S3 Account or an Azure Storage Account.

Views:


Input: This Snap has at most one document input view.
Output: This Snap has exactly one document output view.
Error: This Snap has at most one document error view and produces zero or more documents in the view.


Settings

Label


Required. The name of the Snap. You can modify this to be more specific, especially if you have more than one of the same Snap in your pipeline.

Directory


The path to a directory from which you want the ORC Reader Snap to read data. All files within the directory must be ORC formatted.

Basic directory URI structure

  • HDFS: hdfs://<hostname>:<port>/
  • S3: s3:///<S3_bucket_name>/<file_path>
  • WASB: wasb:///<WASB_directory>/<file_name>
  • ABFS:
    • abfs:///<filesystem>/<path>/
    • abfs://<filesystem>@<accountname>.<endpoint>/<path>
  • ABFSS:
    • abfss:///<filesystem>/<path>/
    • abfss://<filesystem>@<accountname>.<endpoint>/<path>

When you use the ABFS protocol to connect to an endpoint, the account name and endpoint details provided in the URL override the corresponding values in the Account Settings fields.
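The two ABFS URI forms listed above can be told apart by whether the URI carries an authority part. The following is a minimal sketch, not the Snap's actual implementation, that splits such a URI into its components using Python's standard library. The function name parse_abfs_uri and the sample account and filesystem names are hypothetical and used for illustration only.

```python
from urllib.parse import urlsplit


def parse_abfs_uri(uri):
    """Split an abfs:// or abfss:// directory URI into its parts.

    Illustrative helper only; the name and return shape are assumptions.
    """
    parts = urlsplit(uri)
    if parts.scheme not in ("abfs", "abfss"):
        raise ValueError(f"not an ABFS URI: {uri!r}")

    if parts.netloc:
        # Long form: abfs://<filesystem>@<accountname>.<endpoint>/<path>
        if "@" not in parts.netloc:
            raise ValueError(f"missing <filesystem>@ in URI: {uri!r}")
        filesystem, _, host = parts.netloc.partition("@")
        account, _, endpoint = host.partition(".")
        path = parts.path
    else:
        # Short form: abfs:///<filesystem>/<path>/
        # Account name and endpoint are not part of the URI in this form.
        filesystem, _, rest = parts.path.lstrip("/").partition("/")
        account = endpoint = None
        path = "/" + rest

    return {
        "scheme": parts.scheme,
        "filesystem": filesystem,
        "account": account,
        "endpoint": endpoint,
        "path": path,
    }


# Hypothetical sample values, for illustration only.
long_form = parse_abfs_uri("abfss://data@myaccount.dfs.core.windows.net/orc/2020")
short_form = parse_abfs_uri("abfs:///data/orc/")
```

In the long form, the account name and endpoint come from the URI itself, which is the case where they take precedence over the Account Settings fields. In the short form they are returned as None, so they would have to come from the account.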

Note

With the ABFS protocol, SnapLogic creates a temporary file to store the incoming data. Therefore, the hard drive where the JCC is running should have enough space to temporarily store all the account data coming in from ABFS.

The Directory property is used only by the Suggest operation; it is not used during pipeline execution or preview. When you press the Suggest icon, the Snap displays a list of subdirectories under the given directory. It generates the list by applying the value of the Filter property.

Example:

  • wasb:///snaplogic/srikanth_test123/RedWoodcity

Default value: hdfs://<hostname>:<port>/
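As a rough illustration of how a Suggest list could be produced from a directory listing and a filter value, the sketch below applies a pattern to a list of subdirectory names. It assumes the Filter value is a glob-style pattern, which this page does not confirm, and the function name suggest_subdirectories and the sample names are hypothetical.

```python
from fnmatch import fnmatchcase


def suggest_subdirectories(subdirectories, filter_pattern):
    """Return the subdirectory names that match the filter.

    Assumption: filter_pattern is a glob such as "20*" or "*".
    The Snap's real matching rules may differ.
    """
    return [name for name in subdirectories if fnmatchcase(name, filter_pattern)]


# Hypothetical listing of subdirectories under the Directory value.
listing = ["2019", "2020", "logs"]
suggested = suggest_subdirectories(listing, "20*")
everything = suggest_subdirectories(listing, "*")
```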

Filter

See the description of the Filter property in the HDFS Writer Snap documentation.

File

Required for standard mode. The filename, or a path to a file relative to the directory given in the Directory property. It must not start with the URL separator "/". The File property can be a JavaScript expression, which is evaluated with values from the input view document. When you press the Suggest icon, the Snap displays a list of regular files under the directory given in the Directory property. It generates the list by applying the value of the Filter property.

Example: 

  • sample.orc
  • tmp/another.orc
  • _filename

Default value:  [None]
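The relationship between the Directory and File values can be sketched as follows. This is an illustrative helper under stated assumptions, not the Snap's documented resolution logic: the function name resolve_file_uri is hypothetical, and the hostname and port in the usage lines are placeholders.

```python
def resolve_file_uri(directory, file):
    """Join a directory URI and a relative File value into one URI.

    Illustrative only. Enforces the rule that File must not start
    with the URL separator "/".
    """
    if file.startswith("/"):
        raise ValueError('File must not start with the URL separator "/"')
    # Add exactly one separator between the directory and the file.
    if directory.endswith("/"):
        return directory + file
    return directory + "/" + file


# Placeholder HDFS host and port; the WASB directory is the example above.
hdfs_uri = resolve_file_uri("hdfs://namenode:8020/data/", "tmp/another.orc")
wasb_uri = resolve_file_uri(
    "wasb:///snaplogic/srikanth_test123/RedWoodcity", "sample.orc"
)
```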

Snap Execution

Select one of the following three modes in which the Snap executes:

  • Validate & Execute: Performs limited execution of the Snap, and generates a data preview during Pipeline validation. Subsequently, performs full execution of the Snap (unlimited records) during Pipeline runtime.

  • Execute only: Performs full execution of the Snap during Pipeline execution without generating preview data.

  • Disabled: Disables the Snap and all Snaps that are downstream from it.

Default value: Execute only


Example: Validate & Execute

...


See Also

...