
Snap type:

Read

Description:

This Snap reads ORC files from SLDB, HDFS, S3, and WASB, and converts the data into documents.

  • Expected upstream Snaps: [None]
  • Expected downstream Snaps: Any data transformation or formatting Snaps, such as Mapper or JSON Formatter Snaps.
  • Expected input: ORC files from SLDB, HDFS, S3, and WASB.
  • Expected output: A document with the columns and data of the ORC file.
Note

This Snap supports the HDFS (non-Kerberos), ABFS (Azure Data Lake Storage Gen 2), WASB (Azure Storage), and S3 protocols.
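
For reference, the sketch below shows the kind of column-to-document conversion described above, performed outside SnapLogic with the pyarrow library; the library choice and the file name sample.orc are assumptions for illustration only.

  # Illustration only: an ORC file read into one dictionary ("document") per row,
  # roughly the shape of output the Snap produces. Assumes pyarrow is installed.
  import pyarrow.orc as orc

  table = orc.ORCFile("sample.orc").read()   # load all columns into an Arrow table
  documents = table.to_pylist()              # one dict per row, keyed by column name

  for doc in documents[:3]:
      print(doc)                             # e.g. {"id": 1, "name": "alice"}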


Prerequisites:

None

Support and limitations:

  • Works with SLDB, HDFS, S3, and WASB.
  • Ultra pipelines: 
  • Spark: Not supported in Spark pipelines.

Account:

Depending on the source of the data you want to read, you will need to provide valid account information for either an AWS S3 Account or an Azure Storage Account.

    Views:

    • Input: This Snap has at most one document input view.
    • Output: This Snap has exactly one document output view.
    • Error: This Snap has at most one document error view and produces zero or more documents in the view.


    Settings

    Label


    Required. The name of the Snap. You can modify this to be more specific, especially if you have more than one of the same Snap in your pipeline.

    Directory


    The path to a directory from which you want the ORC Reader Snap to read data. All files within the directory must be ORC formatted.

    Basic directory URI structure

    • HDFS: hdfs://<hostname>:<port>/
    • S3: s3:///<S3_bucket_name>/<file_path>
    • WASB: wasb:///<WASB_directory>/<file_name>
    • ABFS:
      • abfs:///<filesystem>/<path>/
      • abfs://<filesystem>@<accountname>.<endpoint>/<path>
    • ABFSS
      • abfss:///<filesystem>/<path>/
      • abfss://<filesystem>@<accountname>.<endpoint>/<path>

    When you use the ABFS protocol to connect to an endpoint, the account name and endpoint details provided in the URL override the corresponding values in the Account Settings fields.

    Note

    With the ABFS protocol, SnapLogic creates a temporary file to store the incoming data. Therefore, the hard drive where the JCC is running must have enough space to temporarily store all the data coming in from ABFS.

    The Directory property is not used during pipeline execution or preview; it is used only in the Suggest operation. When you click the Suggest icon, the Snap displays a list of subdirectories under the given directory. It generates the list by applying the value of the Filter property.

    Example:

    • wasb:///snaplogic/srikanth_test123/RedWoodcity

    Default value: hdfs://<hostname>:<port>/
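
    If you compose ABFS or ABFSS URIs by hand, a small helper like the hypothetical sketch below can make the two documented forms easier to compare; it is not part of SnapLogic, and the filesystem, account, and endpoint values (such as dfs.core.windows.net) are placeholders.

    # Hypothetical helper, not part of SnapLogic: builds a directory URI following
    # the ABFS/ABFSS patterns listed above. All names below are placeholders.
    def abfs_uri(filesystem, account_name=None, endpoint="dfs.core.windows.net",
                 path="", secure=True):
        scheme = "abfss" if secure else "abfs"
        clean_path = path.lstrip("/")
        if account_name:
            # Account name and endpoint in the URI override the Account Settings values.
            return f"{scheme}://{filesystem}@{account_name}.{endpoint}/{clean_path}"
        return f"{scheme}:///{filesystem}/{clean_path}"

    print(abfs_uri("myfilesystem", "myaccount", path="data/orc/"))
    # abfss://myfilesystem@myaccount.dfs.core.windows.net/data/orc/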

    Filter

    A filter pattern (for example, *.orc) applied when generating the list of files and subdirectories displayed by the Suggest operation for the Directory and File properties.

    File

    Required for standard mode. The filename or a relative path to a file under the directory given in the Directory property. It must not start with a URL separator "/". The File property can be a JavaScript expression, which is evaluated with values from the input view document. When you click the Suggest icon, the Snap displays a list of regular files under the directory given in the Directory property. It generates the list by applying the value of the Filter property.

    Example: 

    • sample.orc
    • tmp/another.orc
    • _filename

    Default value:  [None]
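
    As an illustration of how the File value resolves against the Directory value, the sketch below (plain Python, with placeholder paths) joins the two and enforces the "no leading /" rule; the actual evaluation inside SnapLogic may differ.

    # Illustration only: resolving a relative File value under the Directory value.
    import posixpath

    directory = "hdfs://<hostname>:<port>/data/orc/"   # Directory property (placeholder host)
    file_name = "tmp/another.orc"                      # File property: relative, no leading "/"

    if file_name.startswith("/"):
        raise ValueError("File must not start with a URL separator '/'")

    full_path = posixpath.join(directory, file_name)
    print(full_path)   # hdfs://<hostname>:<port>/data/orc/tmp/another.orc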

    Snap Execution

    Specifies the execution mode of the Snap. Select one of the following options: Validate & Execute, Execute only, or Disabled.

    Examples

    ...


    Reading from a Local Instance of HDFS

    You can configure the ORC Reader Snap to read from a specific directory in a local HDFS instance. In this example, the Snap reads the file.orc file in the /tmp directory.
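
    Outside SnapLogic, an equivalent read of /tmp/file.orc from HDFS might look like the sketch below; it assumes the pyarrow library and a placeholder NameNode host and port.

    # Illustration only: reading /tmp/file.orc from HDFS with pyarrow (not the Snap itself).
    import pyarrow.orc as orc
    from pyarrow import fs

    hdfs = fs.HadoopFileSystem(host="namenode-host", port=8020)   # placeholder host and port
    with hdfs.open_input_file("/tmp/file.orc") as source:
        table = orc.ORCFile(source).read()

    print(table.to_pylist()[:3])   # first few rows as documents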


    ...


    Reading from a Local Instance of S3

    You can configure the ORC Reader Snap to read from a specific directory in a local S3 instance. In this example, the Snap reads the /<file-path>/file.orc file.
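
    An equivalent read from S3 outside SnapLogic might look like the sketch below; the region is a placeholder, and the bucket and <file-path> segments are kept as placeholders from the example above.

    # Illustration only: reading an ORC object from S3 with pyarrow (not the Snap itself).
    # Substitute real bucket and path values for the placeholders before running.
    import pyarrow.orc as orc
    from pyarrow import fs

    s3 = fs.S3FileSystem(region="us-east-1")   # placeholder region; credentials come from the environment
    with s3.open_input_file("<S3_bucket_name>/<file-path>/file.orc") as source:
        table = orc.ORCFile(source).read()

    print(table.num_rows)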


    Troubleshooting

    For troubleshooting information that applies to this Snap, see the Troubleshooting section of the Hadoop Directory Browser page.

    Related Information

    ...

    For information about temporary files, see the Temporary Files section of the Join page.

    See Also

    • Hadoop Snap Pack

    ...