RC File Parser
Snap type:

Parse


Description:

This Snap parses RC File data and converts it into documents that can be processed by downstream Snaps.

  • Expected upstream Snaps: A binary data source Snap that reads an RC File from a data store.
  • Expected downstream Snaps: The RC File Parser outputs table data as columns and rows; the downstream Snap should be able to consume this information.


Prerequisites:

[None]


Support and limitations: Works in Ultra Tasks.
Account: 

Accounts are not used with this Snap.


Views:
Input: This Snap has exactly one binary input view.
Output: This Snap has exactly one document output view.
Error: This Snap has at most one document error view and produces zero or more documents in the view.

Settings

Label


Required. The name for the Snap. You can modify this to be more specific, especially if you have more than one of the same Snap in your pipeline.

Hive Metastore URL


The Hive Metastore URI, for example: thrift://localhost:9083

Example: thrift://hive.metastore.com:9083

Default value: [None]


Database


The database that holds the schema for the incoming RC File data.
 

Example: hive_db

Default value: [None]


Table


The table whose schema should be used for parsing the incoming RC File data.
 

Example: hive_tbl

Default value: [None]


Column definition


Manually configure the column definition for the incoming RC File data.
 

Example:

Column Name: Fun column 1

Column Type: string

Default value: [None]
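The column definition above can be thought of as a mapping from raw field values to typed document fields. The sketch below is illustrative only, not the Snap's actual implementation; the column names, type names, and helper function are hypothetical examples of how such a definition could be applied.

```python
# Illustrative sketch (NOT the Snap's implementation): applying a manual
# column definition to raw RC File row values to build output documents.
# All names below are hypothetical.

TYPE_CASTS = {
    "string": str,
    "int": int,
    "bigint": int,
    "float": float,
    "double": float,
    "boolean": lambda v: str(v).lower() == "true",
}

def apply_column_definition(columns, raw_rows):
    """Convert each raw row (a list of field values) into a document (dict)
    keyed by column name, casting each value per its declared column type."""
    docs = []
    for row in raw_rows:
        doc = {}
        for (name, col_type), value in zip(columns, row):
            cast = TYPE_CASTS.get(col_type, str)  # fall back to string
            doc[name] = None if value is None else cast(value)
        docs.append(doc)
    return docs

columns = [("Fun column 1", "string"), ("count", "int")]
raw_rows = [["hello", "42"], ["world", "7"]]
print(apply_column_definition(columns, raw_rows))
# → [{'Fun column 1': 'hello', 'count': 42}, {'Fun column 1': 'world', 'count': 7}]
```

Each output dict corresponds to one document emitted on the Snap's output view.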


Snap Execution

Select one of the following three modes in which the Snap executes:

  • Validate & Execute: Performs limited execution of the Snap, and generates a data preview during Pipeline validation. Subsequently, performs full execution of the Snap (unlimited records) during Pipeline runtime.

  • Execute only: Performs full execution of the Snap during Pipeline execution without generating preview data.

  • Disabled: Disables the Snap and all Snaps that are downstream from it.

Default value: Execute only
Example: Validate & Execute

  

Troubleshooting

Writing to S3 files with HDFS version CDH 5.8 or later

When running HDFS version CDH 5.8 or later, the Hadoop Snap Pack may fail to write to S3 files. To resolve this, make the following changes in Cloudera Manager:

  1. Go to HDFS configuration.
  2. In Cluster-wide Advanced Configuration Snippet (Safety Valve) for core-site.xml, add an entry with the following details:
    • Name: fs.s3a.threads.max
    • Value: 15
  3. Click Save.
  4. Restart all the nodes.
  5. Under Restart Stale Services, select Re-deploy client configuration.
  6. Click Restart Now.
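The safety-valve entry from step 2 corresponds to the following core-site.xml property (the value 15 is the one given in the steps above; adjust it for your cluster if needed):

```xml
<!-- Added via the cluster-wide core-site.xml safety valve in Cloudera Manager -->
<property>
  <name>fs.s3a.threads.max</name>
  <value>15</value>
</property>
```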