Sequence Parser
- Kalpana Malladi
- Aparna Tayi
- Lakshmi Manda
Snap type: | Parse
---|---
Description: | This Snap parses Hadoop sequence file data and converts it into documents that downstream Snaps can process.
Prerequisites: | None
Support and limitations: | Works in Ultra Task Pipelines.
Account: | Accounts are not used with this Snap.
Views: |
Settings |
Label | Required. The name for the Snap. You can modify this to be more specific, especially if you have more than one of the same Snap in your pipeline.
Key class | Key class used in the sequence file.
Value class | Value class used in the sequence file.
Snap Execution | Select one of the three modes in which the Snap executes. Default Value: Execute only
Troubleshooting
Writing to S3 files with HDFS version CDH 5.8 or later
When running an HDFS version later than CDH 5.8, the Hadoop Snap Pack may fail to write to S3 files. To overcome this, make the following changes in Cloudera Manager:
August 2024 main27765 Stable
Upgraded the org.json.json library from v20090211 to v20240303, which is fully backward compatible.
The upgrade of the Azure Storage library from v3.0.0 to v8.3.0 has impacted the Hadoop Snap Pack causing the following issue when using the WASB protocol.
Known Issue: When you use invalid credentials for the WASB protocol in Hadoop Snaps (HDFS Reader, HDFS Writer, ORC Reader, Parquet Reader, Parquet Writer), the pipeline does not fail immediately, instead it takes 13-14 minutes to display the following error: reason=The request failed with error code null and HTTP code 0. , status_code=error
SnapLogic® is actively working with Microsoft® Support to resolve the issue. Learn more about the Azure Storage Library Upgrade.
Fixed a resource leak issue with the following Hadoop Snaps, which involved too many stale instances of ProxyConnectionManager and significantly impacted memory utilization.
Enhanced the HDFS Writer Snap with the Write empty file checkbox to enable you to write an empty or a 0-byte file to all the supported protocols that are recognized and compatible with the target system or destination.
The Azure Data Lake Account has been removed from the Hadoop Snap Pack because Microsoft retired the Azure Data Lake Storage Gen1 protocol on February 29, 2024. We recommend replacing your existing Azure Data Lake Accounts (in Binary or Hadoop Snap Packs) with other Azure Accounts.
436patches25902
Fixed a memory management issue in the HDFS Writer, HDFS ZipFile Writer, ORC Writer, and Parquet Writer Snaps, which previously caused out-of-memory errors when multiple Snaps were used in the pipeline. The Snap now conducts a pre-allocation memory check, dynamically adjusting the write buffer size based on available memory resources when writing to ADLS.
Enhanced the AWS S3 Account for Hadoop with an External ID that enables you to access Hadoop resources securely.
Fixed an issue with the Parquet Writer Snap that displayed an error Failed to write parquet data when the decimal value passed from the second input view exceeded the specified scale.
Fixed an issue with the Parquet Writer Snap that previously failed to handle the conversion of BigInt/int64 (larger numbers) after the 4.35 GA. The Snap now converts them accurately.
Fixed an issue related to error routing to the output view. Also fixed a connection timeout issue.
Enhanced the Parquet Writer Snap with a Decimal Rounding Mode dropdown list to enable the rounding method for decimal values when the number exceeds the required decimal places.
Enhanced the Parquet Writer Snap with the support for LocalDate and DateTime. The Snap now shows the schema suggestions for LocalDate and DateTime correctly.
Enhanced the Parquet Reader Snap with the Use datetime types checkbox that supports LocalDate and DateTime datatypes.
Behavior change: When you select the Use datetime types checkbox in the Parquet Reader Snap, the Snap displays the LocalDate and DateTime in the output for INT32 (DATE) and INT64 (TIMESTAMP_MILLIS) columns. When you deselect this checkbox, the columns retain the previous datatypes and display string and integer values in the output.
August 2023 main22460 Stable
Updated and certified against the current SnapLogic Platform release.
Introduced the HDFS Delete, which deletes the specified file, group of files, or directory from the supplied path and protocol in the Hadoop Distributed File System (HDFS).
432patches20820
Fixed an authorization issue that occurs with the Parquet Writer Snap when it receives empty document input.
The Apache Commons Compress library has been upgraded to version 1.22.
The Kerberos Account that is available for a subset of snaps in the Hadoop Snap pack now supports a configuration that enables you to read from and write to the Hadoop Distributed File System (HDFS) managed by multiple Hadoop clusters. You can specify the location of the Hadoop configuration files in the Hadoop config directory field. The value in this field overrides the value that is set on the Snaplex system property used for configuring a single cluster.
February 2023
The AWS S3 and S3 Dynamic accounts now support a maximum session duration of an IAM role defined in AWS.
Enhanced the AWS S3 Account for Hadoop account to include the S3 Region field that allows cross-region or proxied cross-region access to S3 buckets in the Parquet Reader and Parquet Writer Snaps.
Fixed an issue with the Hadoop Directory Browser Snap where the Snap was not listing the files in the given directory for Windows VM.
4.27 main12833 Stable
Enhanced the Parquet Writer and Parquet Reader Snaps with Azure SAS URI properties, and Azure Storage Account for Hadoop with SAS URI Auth Type. This enables the Snaps to consider SAS URI given in the settings if the SAS URI is selected in the Auth Type during account configuration.
Fixed a memory leak issue when using HDFS protocol in Hadoop Snaps.
Fixed the dependency issue in Hadoop Parquet Reader Snap while reading from AWS S3. The issue is caused due to conflicting definitions for some of the AWS classes (dependencies) in the classpath.
Enhanced the AWS S3 Account for Hadoop to support role-based access when you select IAM role checkbox.
Fixes the missing library error in Hadoop Snap Pack when running Hadoop Pipelines in JDK11 runtime.
Fixes the issue in HDFS Reader Snap by supporting to read and write files larger than 2GB using ABFS(S) protocol.
Updates the Parquet Writer and Parquet Reader Snaps to support the yyyy-MM-dd format for the DATE logical type.
Updates the Hadoop Snap Pack to use the latest version of org.xerial.snappy:snappy-java for compression type Snappy, in order to resolve the java.lang.UnsatisfiedLinkError: org.xerial.snappy.SnappyNative.maxCompressedLength(I)I error.
Fixes an issue with the Hadoop Parquet Writer Snap wherein the Snap throws an exception when the input document includes one or all of the following:
Fixed an issue with the Parquet Writer Snap wherein the Snap throws an error when working with WASB protocol.
Pushed automatic rebuild of the latest version of each Snap Pack to SnapLogic UAT and Elastic servers.
Added the Snap Execution field to all Standard-mode Snaps. In some Snaps, this field replaces the existing Execute during preview check box.
Added a new property, Output for each file written, to handle multiple binary input data in the HDFS Writer Snap.
Fixed an issue wherein the Hadoop snaps were throwing an exception when a Kerberized account is provided, but the snap is run in a non-kerberized environment.
snapsmrc486
Fixed an issue with the HDFS Reader Snap wherein the pipeline becomes stale while writing to the output view.
snapsmrc480
Addressed an issue with Parquet Reader Snap leaking file descriptors (connections to HDFS data nodes). The Open File descriptor values work stable now.
Added Kerberos support to the standard mode Parquet Reader and Parquet Writer Snaps.
Supported HDFS Writer to write to the encryption zone.
Addressed the suggest issue for the HDFS Reader on Hadooplex.
Made HDFS Snaps work with Zone encrypted HDFS.
snapsmrc414
Addressed the following issues:
Added missing dependency org.iq80.snappy:snappy to Hadoop Snap Pack.
snapsmrc398
Snap-aware error handling policy enabled for Spark mode in Sequence Formatter and Sequence Parser. This ensures the error handling specified on the Snap is used.
Spark Validation: Resolved an issue with validation failing when setting the output file permissions.
snapsmrc382
snapsmrc344

Release | Snap Pack Version | Date | Type | Updates
---|---|---|---|---
May 2024 | 437patches27226 | | - |
May 2024 | 437patches27471 | | Latest |
May 2024 | 437patches26370 | | Latest |
May 2024 | main26341 | | Stable |
February 2024 | | | Latest |
February 2024 | 435patches25410 | | Latest |
February 2024 | main25112 | | Stable | Updated and certified against the current SnapLogic Platform release.
November 2023 | 435patches23904 | | Latest |
November 2023 | 435patches23780 | | Latest |
November 2023 | main23721 | | Stable | Updated and certified against the current SnapLogic Platform release.
August 2023 | 434patches23173 | | Latest |
August 2023 | 434patches22662 | | Latest |
May 2023 | 433patches22180 | | Latest |
May 2023 | 433patches21494 | | Latest | The Hadoop Directory Browser Snap now returns all the output documents as expected after implementing pagination for the ABFS protocol.
May 2023 | main21015 | | Stable | Upgraded with the latest SnapLogic Platform release.
February 2023 | | | Latest |
February 2023 | 432patches20209 | | Latest |
February 2023 | 432patches20139 | | Latest |
| main19844 | | Stable | Upgraded with the latest SnapLogic Platform release.
November 2022 | main18944 | | Stable |
August 2022 | main17386 | | Stable | Extended the AWS S3 Dynamic Account support to ORC Reader and ORC Writer Snaps to support AWS Security Token Service (STS) using temporary credentials.
4.29 Patch | 429patches16630 | | Latest |
4.29 | main15993 | | Stable |
4.28 Patch | 428patches15216 | | Latest | Added the AWS S3 Dynamic account for Parquet Reader and Parquet Writer Snaps.
4.28 | main14627 | | Stable | Upgraded with the latest SnapLogic Platform release.
4.27 Patch | 427patches13769 | | Latest |
4.27 Patch | 427patches12999 | | Latest | Enhanced the Parquet Reader Snap with int96 As Timestamp checkbox, which when selected enables the Date Time Format field. You can use this field to specify a date-time format of your choice for int96 data-type fields. The int96 As Timestamp checkbox is available only when you deselect Use old data format checkbox.
4.26 | 426patches12288 | | Latest |
4.26 | main11181 | | Stable | Upgraded with the latest SnapLogic Platform release.
4.25 Patch | 425patches9975 | | Latest |
4.25 | main9554 | | Stable |
4.24 Patch | 424patches9262 | | Latest |
4.24 Patch | 424patches8876 | | Latest |
4.24 | main8556 | | Stable | Upgraded with the latest SnapLogic Platform release.
4.23 Patch | 423patches7440 | | Latest |
4.23 | main7430 | | Stable | Upgraded with the latest SnapLogic Platform release.
4.22 | main6403 | | Stable | Upgraded with the latest SnapLogic Platform release.
4.21 Patch | hadoop8853 | | Latest |
4.21 | snapsmrc542 | | Stable | Upgraded with the latest SnapLogic Platform release.
4.20 Patch | hadoop8776 | | Latest |
4.20 | snapsmrc535 | | Stable | Upgraded with the latest SnapLogic Platform release.
4.19 Patch | hadoop8270 | | Latest |
4.19 | snaprsmrc528 | | Stable | Upgraded with the latest SnapLogic Platform release.
4.18 Patch | hadoop8033 | | Latest |
4.18 | snapsmrc523 | | Stable |
4.17 | ALL7402 | | Latest |
4.17 | snapsmrc515 | | Latest |
4.16 | snapsmrc508 | | Stable |
4.15 | snapsmrc500 | | Stable |
4.14 Patch | hadoop5888 | | Latest |
4.14 | snapsmrc490 | | Stable |
4.13 Patch | hadoop5318 | | Latest |
4.13 | | | Stable |
4.12 Patch | hadoop5132 | | Latest |
4.12 | | | Stable | Upgraded with the latest SnapLogic Platform release.
4.11 Patch | hadoop4275 | | Latest |
4.11 | snapsmrc465 | | Stable |
4.10 Patch | hadoop4001 | | Latest |
4.10 Patch | hadoop3887 | | Latest |
4.10 Patch | hadoop3851 | | Latest |
4.10 Patch | hadoop3838 | | Latest |
4.10 | | | Stable |
4.9 Patch | hadoop3339 | | Latest |
4.9.0 Patch | hadoop3020 | | Latest |
4.9 | snapsmrc405 | | Stable | Upgraded with the latest SnapLogic Platform release.
4.8 | | | Stable |
4.7.0 Patch | hadoop2343 | | Latest |
4.7 | | | Stable |
4.6 | snapsmrc362 | | Stable |
4.5 | | | Stable |
4.4.1 | | | Stable |
4.4 | | | Stable |
4.3.2 | | | Stable |
Have feedback? Email documentation@snaplogic.com | Ask a question in the SnapLogic Community
© 2017-2024 SnapLogic, Inc.