
Snap type:

Format


Description:

This Snap formats incoming documents from upstream Snaps into the RCFile (Record Columnar File) format, which stores data in a columnar layout optimized for answering aggregate queries faster.

  • Expected upstream Snaps: The upstream Snap should output table-oriented data with columns and rows.
  • Expected downstream Snaps: The RC File Formatter Snap outputs binary data, so the downstream Snap must be a data output Snap (for example, File Writer or HDFS Writer).
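
To illustrate why a row-columnar layout helps aggregate queries, here is a minimal sketch of the general idea: rows are split into groups, and each group is stored column by column. The function name `to_row_groups`, the sample data, and the group size are assumptions made for illustration; this is not the Snap's implementation and it does not produce real RC file bytes.

```python
from typing import Any, Dict, List


def to_row_groups(
    rows: List[Dict[str, Any]], columns: List[str], group_size: int
) -> List[Dict[str, List[Any]]]:
    """Split rows into groups, then lay out each group column by column."""
    groups = []
    for start in range(0, len(rows), group_size):
        chunk = rows[start:start + group_size]
        # Within a group, all values of one column are stored together.
        groups.append({col: [row.get(col) for row in chunk] for col in columns})
    return groups


# Hypothetical table-oriented input, one dict per upstream document.
rows = [
    {"id": 1, "city": "Oslo", "sales": 10},
    {"id": 2, "city": "Rome", "sales": 25},
    {"id": 3, "city": "Oslo", "sales": 5},
]
groups = to_row_groups(rows, ["id", "city", "sales"], group_size=2)

# An aggregate over one column touches only that column in each group.
total_sales = sum(sum(group["sales"]) for group in groups)
print(total_sales)  # 40
```

Summing `sales` reads only the `sales` list in each group, which is the access pattern a columnar format is designed for.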


Prerequisites:

[None]


Support and limitations:

  • Ultra pipelines: Works in Ultra Task Pipelines.
  • Spark mode: Not supported in Spark mode.

Account:

    Accounts are not used with this Snap. 


    Views:


    Input: This Snap has at most one document input view.
    Output: This Snap has at most one document output view.
    Error: This Snap has at most one document error view and produces zero or more documents in the view.



    Settings



    Label


    Required. The name for the Snap. You can modify this to be more specific, especially if you have more than one of the same Snap in your pipeline.

    Hive Metastore URL



    Hive Metastore URI, such as: thrift://localhost:9083

    Example: thrift://hive.metastore.com:9083

    Default value: [None]
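
    Before entering a metastore URI, it can help to confirm that it has the thrift://host:port shape shown in the examples. The following is a minimal sketch using Python's standard library; the helper name `check_metastore_uri` is hypothetical, and the check only inspects the string. It does not contact the metastore and is not part of the Snap.

```python
from urllib.parse import urlparse


def check_metastore_uri(uri: str) -> tuple:
    """Return (host, port) if the URI looks like thrift://host:port."""
    parsed = urlparse(uri.strip())
    if parsed.scheme != "thrift":
        raise ValueError(f"expected a thrift:// URI, got: {uri!r}")
    # parsed.port raises ValueError itself if the port is not numeric.
    if not parsed.hostname or parsed.port is None:
        raise ValueError(f"URI must include a host and a port: {uri!r}")
    return parsed.hostname, parsed.port


host, port = check_metastore_uri("thrift://localhost:9083")
print(host, port)
```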


    Database


    The Hive database that holds the schema for the outgoing RC file data.

    Example: hive_db
     

    Default value:  [None]


    Table


    The table whose schema should be used for formatting the outgoing RC file data.

    Example: hive_tbl
     

    Default value:  [None]


    Column paths


    Required. Paths where the column values appear in the document.

    Example:
        Column Name: Fun
        Column Path: $column_from_input_data
        Column Type: string
     

    Default value:  [None]
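
    The Column paths example above maps a column name to a location in the incoming document. The sketch below shows the idea with a simplified resolver that handles only plain dotted paths beginning with `$`; the helper `resolve_path`, the sample document, and the dict layout of the setting are assumptions made for illustration and do not reflect how the Snap evaluates paths internally.

```python
from typing import Any, Dict, List


def resolve_path(document: Dict[str, Any], path: str) -> Any:
    """Resolve a simple '$a.b' style path; return None if a key is missing."""
    if not path.startswith("$"):
        raise ValueError(f"path must start with '$': {path!r}")
    value: Any = document
    for key in path[1:].split("."):
        if key == "":
            continue  # tolerate a leading dot, as in '$.a'
        if not isinstance(value, dict) or key not in value:
            return None
        value = value[key]
    return value


# Hypothetical representation of the Column paths setting.
column_paths: List[Dict[str, str]] = [
    {"name": "Fun", "path": "$column_from_input_data", "type": "string"},
]

# Hypothetical incoming document from the upstream Snap.
document = {"column_from_input_data": "hiking"}

row = {col["name"]: resolve_path(document, col["path"]) for col in column_paths}
print(row)  # {'Fun': 'hiking'}
```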


    Snap Execution


    Select one of the following three modes in which the Snap executes:

    • Validate & Execute: Performs limited execution of the Snap, and generates a data preview during Pipeline validation. Subsequently, performs full execution of the Snap (unlimited records) during Pipeline runtime.

    • Execute only: Performs full execution of the Snap during Pipeline execution without generating preview data.

    • Disabled: Disables the Snap and all Snaps that are downstream from it.

    Default value: Execute only
    Example: Validate & Execute

    Troubleshooting
