RC File Parser
Kalpana Malladi
Aparna Tayi (Unlicensed)
Lakshmi Manda (Deactivated)
Snap type: | Parse | |||||||
---|---|---|---|---|---|---|---|---|
Description: | This Snap parses RC file data and converts it into documents that can be processed by downstream Snaps.
| |||||||
Prerequisites: | [None] | |||||||
Support and limitations: | Works in Ultra Tasks. | |||||||
Account: | Accounts are not used with this Snap. | |||||||
Views: |
| |||||||
Settings | ||||||||
Label | Required. The name for the Snap. You can modify this to be more specific, especially if you have more than one of the same Snap in your pipeline. | |||||||
Hive Metastore URL | Hive Metastore URI, such as: thrift://localhost:9083 Example: thrift://hive.metastore.com:9083 Default value: [None] | |||||||
Database | Database that holds the schema for the incoming RC file data. Example: hive_db Default value: [None] | |||||||
Table | Table whose schema should be used for parsing the incoming RC file data. Example: hive_tbl Default value: [None] | |||||||
Column definition | Manually configure the column definition for the incoming RC File data. Example: Column Name: Fun column 1 Column Type: string Default value: [None] | |||||||
Snap Execution | Select one of the following three modes in which the Snap executes:
Default Value: Execute only |
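The Hive Metastore URL setting expects a Thrift URI of the form thrift://host:port. As a minimal sketch, assuming you want to sanity-check a value before entering it in the Snap, the following Python snippet parses such a URI with the standard library. The function name check_metastore_uri is hypothetical and is not part of the Snap or of any SnapLogic API.

```python
from typing import Tuple
from urllib.parse import urlparse


def check_metastore_uri(uri: str) -> Tuple[str, int]:
    """Return (host, port) if uri looks like thrift://host:port.

    Hypothetical helper for illustration only; the Snap performs
    its own validation, which may differ.
    """
    parsed = urlparse(uri)
    if parsed.scheme != "thrift":
        raise ValueError(f"expected a thrift:// URI, got scheme {parsed.scheme!r}")
    # parsed.port itself raises ValueError when the port is not a valid number.
    if not parsed.hostname or parsed.port is None:
        raise ValueError("the URI must include both a host and a port")
    return parsed.hostname, parsed.port


if __name__ == "__main__":
    # Example value taken from the Settings table above.
    print(check_metastore_uri("thrift://hive.metastore.com:9083"))
```

A check like this only confirms the shape of the URI. It does not confirm that a metastore is reachable at that address.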
Troubleshooting
Writing to S3 files with HDFS version CDH 5.8 or later
When running HDFS version CDH 5.8 or later, the Hadoop Snap Pack may fail to write to S3 files. To overcome this, make the following changes in Cloudera Manager:
- Go to HDFS configuration.
- In Cluster-wide Advanced Configuration Snippet (Safety Valve) for core-site.xml, add an entry with the following details:
  - Name: fs.s3a.threads.max
  - Value: 15
- Click Save.
- Restart all the nodes.
- Under Restart Stale Services, select Re-deploy client configuration.
- Click Restart Now.
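After the client configuration is redeployed, you may want to confirm that the new entry reached core-site.xml. The following is a minimal sketch, assuming the standard Hadoop configuration layout (a configuration root element containing property elements with name and value children). The helper name get_property and the inline sample XML are hypothetical and for illustration only. In practice you would read the core-site.xml file deployed on your node.

```python
import xml.etree.ElementTree as ET
from typing import Optional

# Illustrative sample only; not copied from a real cluster.
SAMPLE_CORE_SITE = """<?xml version="1.0"?>
<configuration>
  <property>
    <name>fs.s3a.threads.max</name>
    <value>15</value>
  </property>
</configuration>
"""


def get_property(core_site_xml: str, name: str) -> Optional[str]:
    """Return the value of the named Hadoop property, or None if it is absent."""
    root = ET.fromstring(core_site_xml)
    for prop in root.iter("property"):
        if prop.findtext("name") == name:
            return prop.findtext("value")
    return None


if __name__ == "__main__":
    print(get_property(SAMPLE_CORE_SITE, "fs.s3a.threads.max"))
```

If the function returns None for fs.s3a.threads.max, the safety valve entry has likely not been deployed to that node yet.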
Release | Snap Pack Version | Date | Type | Updates |
---|---|---|---|---|
February 2025 | main29887 | Stable | Updated and certified against the current SnapLogic Platform release. | |
November 2024 | 439patches29616 | Latest | Fixed an issue with the Parquet Writer Snap, where string-formatted timestamps were stored and retrieved as invalid data because of improper handling. Now, the Snap properly handles the string-formatted timestamps through the Timestamp parquet type option. The Timestamp parquet type dropdown option enables you to choose the appropriate Parquet type for your timestamp schema based on the format of the timestamp data. | |
November 2024 | main29029 | Stable | Updated and certified against the current SnapLogic Platform release. | |
August 2024 | main27765 | | Stable | Upgraded the |
May 2024 | 437patches27226 | - | The upgrade of the Azure Storage library from v3.0.0 to v8.3.0 has impacted the Hadoop Snap Pack, causing the following issue when using the WASB protocol. Known Issue: When you use invalid credentials for the WASB protocol in Hadoop Snaps (HDFS Reader, HDFS Writer, ORC Reader, Parquet Reader, Parquet Writer), the pipeline does not fail immediately; instead, it takes 13-14 minutes to display the following error: reason=The request failed with error code null and HTTP code 0. , status_code=error SnapLogic® is actively working with Microsoft® Support to resolve the issue. Learn more about the Azure Storage Library Upgrade. | |
May 2024 | 437patches27471 | Latest | Fixed a resource leak issue with the following Hadoop Snaps, which involved too many stale instances of | |
May 2024 | 437patches26370 | Latest | Enhanced the HDFS Writer Snap with the Write empty file checkbox to enable you to write an empty or a 0-byte file to all the supported protocols that are recognized and compatible with the target system or destination. | |
May 2024 | main26341 | Stable | The Azure Data Lake Account has been removed from the Hadoop Snap Pack because Microsoft retired the Azure Data Lake Storage Gen1 protocol on February 29, 2024. We recommend replacing your existing Azure Data Lake Accounts (in Binary or Hadoop Snap Packs) with other Azure Accounts. | |
February 2024 | 436patches25902 | Latest | Fixed a memory management issue in the HDFS Writer, HDFS ZipFile Writer, ORC Writer, and Parquet Writer Snaps, which previously caused out-of-memory errors when multiple Snaps were used in the pipeline. The Snap now conducts a pre-allocation memory check, dynamically adjusting the write buffer size based on available memory resources when writing to ADLS. | |
February 2024 | 435patches25410 | Latest | Enhanced the AWS S3 Account for Hadoop with an External ID that enables you to access Hadoop resources securely. | |
February 2024 | main25112 | Stable | Updated and certified against the current SnapLogic Platform release. | |
November 2023 | 435patches23904 | Latest |
| |
November 2023 | 435patches23780 | Latest | Fixed an issue related to error routing to the output view. Also fixed a connection timeout issue. | |
November 2023 | main23721 | Stable | Updated and certified against the current SnapLogic Platform release. | |
August 2023 | 434patches23173 | Latest | Enhanced the Parquet Writer Snap with a Decimal Rounding Mode dropdown list that enables you to select the rounding method for decimal values when the number exceeds the required decimal places. | |
August 2023 | 434patches22662 | Latest |
Behavior change: When you select the Use datetime types checkbox in the Parquet Reader Snap, the Snap displays the LocalDate and DateTime in the output for INT32 (DATE) and INT64 (TIMESTAMP_MILLIS) columns. When you deselect this checkbox, the columns retain the previous datatypes and display | |
August 2023 | main22460 | Stable | Updated and certified against the current SnapLogic Platform release. | |
May 2023 | 433patches22180 | Latest | Introduced the HDFS Delete Snap, which deletes the specified file, group of files, or directory from the supplied path and protocol in the Hadoop Distributed File System (HDFS). | |
May 2023 | 433patches21494 | Latest | The Hadoop Directory Browser Snap now returns all the output documents as expected after implementing pagination for the ABFS protocol. | |
May 2023 | main21015 | Stable | Upgraded with the latest SnapLogic Platform release. | |
February 2023 | 432patches20820 | Latest | Fixed an authorization issue that occurs with the Parquet Writer Snap when it receives empty document input. | |
February 2023 | 432patches20209 | Latest | The Apache Commons Compress library has been upgraded to version 1.22. | |
February 2023 | 432patches20139 | Latest | The Kerberos Account that is available for a subset of Snaps in the Hadoop Snap Pack now supports a configuration that enables you to read from and write to the Hadoop Distributed File System (HDFS) managed by multiple Hadoop clusters. You can specify the location of the Hadoop configuration files in the Hadoop config directory field. The value in this field overrides the value that is set on the Snaplex system property used for configuring a single cluster. | |
February 2023 | main19844 | Stable | Upgraded with the latest SnapLogic Platform release. | |
November 2022 | main18944 | Stable | The AWS S3 and S3 Dynamic accounts now support a maximum session duration of an IAM role defined in AWS. | |
August 2022 | main17386 | Stable | Extended the AWS S3 Dynamic Account support to ORC Reader and ORC Writer Snaps to support AWS Security Token Service (STS) using temporary credentials. | |
4.29 Patch | 429patches16630 | Latest |
| |
4.29 | main15993 | Stable | Enhanced the AWS S3 Account for Hadoop account to include the S3 Region field that allows cross-region or proxied cross-region access to S3 buckets in the Parquet Reader and Parquet Writer Snaps. | |
4.28 Patch | 428patches15216 | Latest | Added the AWS S3 Dynamic account for Parquet Reader and Parquet Writer Snaps. | |
4.28 | main14627 | Stable | Upgraded with the latest SnapLogic Platform release. | |
4.27 Patch | 427patches13769 | Latest | Fixed an issue with the Hadoop Directory Browser Snap where the Snap was not listing the files in the given directory for Windows VM. | |
4.27 Patch | 427patches12999 | Latest | Enhanced the Parquet Reader Snap with the int96 As Timestamp checkbox, which, when selected, enables the Date Time Format field. You can use this field to specify a date-time format of your choice for int96 data-type fields. The int96 As Timestamp checkbox is available only when you deselect the Use old data format checkbox. | |
4.27 | main12833 | Stable | Enhanced the Parquet Writer and Parquet Reader Snaps with Azure SAS URI properties, and the Azure Storage Account for Hadoop with the SAS URI Auth Type. This enables the Snaps to use the SAS URI given in the settings if SAS URI is selected as the Auth Type during account configuration. | |
4.26 | 426patches12288 | Latest | Fixed a memory leak issue when using HDFS protocol in Hadoop Snaps. | |
4.26 | main11181 | Stable | Upgraded with the latest SnapLogic Platform release. | |
4.25 Patch | 425patches9975 | Latest | Fixed the dependency issue in the Hadoop Parquet Reader Snap while reading from AWS S3. The issue was caused by conflicting definitions for some of the AWS classes (dependencies) in the classpath. | |
4.25 | main9554 | Stable |
| |
4.24 Patch | 424patches9262 | Latest | Enhanced the AWS S3 Account for Hadoop to support role-based access when you select the IAM role checkbox. | |
4.24 Patch | 424patches8876 | Latest | Fixes the missing library error in Hadoop Snap Pack when running Hadoop Pipelines in JDK11 runtime. | |
4.24 | main8556 | Stable | Upgraded with the latest SnapLogic Platform release. | |
4.23 Patch | 423patches7440 | Latest | Fixes an issue in the HDFS Reader Snap by adding support for reading and writing files larger than 2 GB using the ABFS(S) protocol. | |
4.23 | main7430 | Stable | Upgraded with the latest SnapLogic Platform release. | |
4.22 | main6403 | Stable | Upgraded with the latest SnapLogic Platform release. | |
4.21 Patch | hadoop8853 | Latest | Updates the Parquet Writer and Parquet Reader Snaps to support the yyyy-MM-dd format for the DATE logical type. | |
4.21 | snapsmrc542 | Stable | Upgraded with the latest SnapLogic Platform release. | |
4.20 Patch | hadoop8776 | Latest | Updates the Hadoop Snap Pack to use the latest version of org.xerial.snappy:snappy-java for compression type Snappy, in order to resolve the java.lang.UnsatisfiedLinkError: org.xerial.snappy.SnappyNative.maxCompressedLength(I)I error. | |
4.20 | snapsmrc535 | Stable | Upgraded with the latest SnapLogic Platform release. | |
4.19 Patch | hadoop8270 | Latest | Fixes an issue with the Hadoop Parquet Writer Snap wherein the Snap throws an exception when the input document includes one or all of the following:
| |
4.19 | snaprsmrc528 | Stable | Upgraded with the latest SnapLogic Platform release. | |
4.18 Patch | hadoop8033 | Latest | Fixed an issue with the Parquet Writer Snap wherein the Snap throws an error when working with WASB protocol. | |
4.18 | snapsmrc523 | Stable |
| |
4.17 | ALL7402 | Latest | Pushed automatic rebuild of the latest version of each Snap Pack to SnapLogic UAT and Elastic servers. | |
4.17 | snapsmrc515 | Latest | Added the Snap Execution field to all Standard-mode Snaps. In some Snaps, this field replaces the existing Execute during preview check box. | |
4.16 | snapsmrc508 | Stable | Added a new property, Output for each file written, to handle multiple binary input data in the HDFS Writer Snap. | |
4.15 | snapsmrc500 | Stable |
| |
4.14 Patch | hadoop5888 | Latest | Fixed an issue wherein the Hadoop Snaps were throwing an exception when a Kerberized account is provided, but the Snap is run in a non-Kerberized environment. | |
4.14 | snapsmrc490 | Stable |
| |
4.13 Patch | hadoop5318 | Latest |
| |
4.13 | snapsmrc486 | Stable |
| |
4.12 Patch | hadoop5132 | Latest | Fixed an issue with the HDFS Reader Snap wherein the pipeline becomes stale while writing to the output view. | |
4.12 | snapsmrc480 | Stable | Upgraded with the latest SnapLogic Platform release. | |
4.11 Patch | hadoop4275 | Latest | Addressed an issue with the Parquet Reader Snap leaking file descriptors (connections to HDFS data nodes). The open file descriptor values are now stable. | |
4.11 | snapsmrc465 | Stable | Added Kerberos support to the standard mode Parquet Reader and Parquet Writer Snaps. | |
4.10 Patch | hadoop4001 | Latest | Added support for the HDFS Writer Snap to write to the encryption zone. | |
4.10 Patch | hadoop3887 | Latest | Addressed the suggest issue for the HDFS Reader on Hadooplex. | |
4.10 Patch | hadoop3851 | Latest |
| |
4.10 Patch | hadoop3838 | Latest | Made HDFS Snaps work with Zone encrypted HDFS. | |
4.10 | snapsmrc414 | Stable |
| |
4.9 Patch | hadoop3339 | Latest | Addressed the following issues:
| |
4.9.0 Patch | hadoop3020 | Latest | Added missing dependency org.iq80.snappy:snappy to Hadoop Snap Pack. | |
4.9 | snapsmrc405 | Stable | Upgraded with the latest SnapLogic Platform release. | |
4.8 | snapsmrc398 | Stable | Snap-aware error handling policy enabled for Spark mode in Sequence Formatter and Sequence Parser. This ensures the error handling specified on the Snap is used. | |
4.7.0 Patch | hadoop2343 | Latest | Spark Validation: Resolved an issue with validation failing when setting the output file permissions. | |
4.7 | snapsmrc382 | Stable |
| |
4.6 | snapsmrc362 | Stable |
| |
4.5 | snapsmrc344 | Stable |
| |
4.4.1 | Stable |
| ||
4.4 | Stable |
| ||
4.3.2 | Stable |
|
© 2017-2025 SnapLogic, Inc.