...
You can use this Snap to convert documents to the Parquet format and write the data to HDFS, ADL (Azure Data Lake), ABFS (Azure Data Lake Storage Gen 2), WASB (Azure storage), or an S3 bucket. This Snap supports nested schemas such as LIST and MAP. You can also use this Snap to write schema information to the Catalog Insert Snap.
This Snap supports the HDFS, ADL (Azure Data Lake), ABFS (Azure Data Lake Storage Gen 2), WASB (Azure storage), and S3 protocols.
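The LIST and MAP types mentioned above are Parquet's nested logical types. As a minimal illustration outside of SnapLogic (a sketch assuming the pyarrow library is available; the column names and output path are made up), a Parquet file with such nested fields could be produced like this:

```python
import pyarrow as pa
import pyarrow.parquet as pq

# Hypothetical columns showing the nested types mentioned above:
# "tags" is a LIST of strings, "attributes" is a MAP of string -> int64.
table = pa.table({
    "id": pa.array([1, 2], type=pa.int64()),
    "tags": pa.array([["red", "blue"], ["green"]], type=pa.list_(pa.string())),
    "attributes": pa.array([[("height", 10)], []], type=pa.map_(pa.string(), pa.int64())),
})

print(table.schema)                       # shows the LIST and MAP fields
pq.write_table(table, "example.parquet")  # hypothetical output path
```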
...
Auto schema generation in this Snap excludes null fields. For example, if the Snap receives ten input documents during preview execution, and four fields contain null values in every one of those documents, those four fields are disregarded during schema generation. The schema includes only fields with at least one non-null value among the preview input documents.
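As a rough illustration of this rule (this is not the Snap's actual implementation, and the documents below are made up), the following sketch keeps only fields that have at least one non-null value across the preview documents:

```python
# Illustration only: drop fields that are null in every preview document.
preview_docs = [
    {"id": 1, "name": "alpha", "note": None},
    {"id": 2, "name": None, "note": None},
]

kept_fields = sorted(
    {key for doc in preview_docs for key, value in doc.items() if value is not None}
)
print(kept_fields)  # ['id', 'name'] -- 'note' is excluded because it is always null
```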
"Generate template" is unsupported for a nested structure like MAP and LIST type. Generate template is a link within the schema editor accessed through the Edit Schema button.
All expression Snap properties can be evaluated (when the '=' button is pressed) from pipeline parameters only, not from the input documents of upstream Snaps. Input documents are the data to be formatted and written to the target files.
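For example, an output path property could use the expression `_outputDir + "/sales.parquet"`, where `_outputDir` is a hypothetical pipeline parameter; an expression that references a field from an incoming document, such as `$region`, cannot be used in these properties.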
The security model configured for the Groundplex (SIMPLE or KERBEROS authentication) must match the security model of the remote server. Due to the limitations of the Hadoop library, we can only create the necessary internal credentials to configure the Groundplex.
Parquet Snaps work well in a Linux environment. However, due to limitations in the Hadoop library on Windows, their functioning in a Windows environment may not always be as expected. We recommend you use a Linux environment for working with Parquet Snaps.
...
1. Download hadoop.dll and winutils.exe from https://github.com/cdarlint/winutils/tree/master/hadoop-3.2.2/bin (SnapLogic's Hadoop version is 3.2.2).
2. Create a temporary directory.
3. Place the hadoop.dll and winutils.exe files in this path: C:\winutils\hadoop\bin
4. Set the environment variable HADOOP_HOME to point to C:\winutils\hadoop
5. Add C:\winutils\hadoop\bin to the environment variable PATH.
6. Add the JVM options in the Windows Snaplex:
   jcc.jvm_options = -Djava.library.path=C:\hadoop\testbin
   If you already have existing jvm_options, then add "-Djava.library.path=C:\hadoop\testbin" after a space. For example:
   jcc.jvm_options = -agentlib:jdwp=transport=dt_socket,server=y,suspend=n,address=8000 -Djava.library.path=C:\hadoop\testbin
7. Restart the JCC for the configurations to take effect.
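As an optional sanity check after completing these steps, a short script such as the following (a sketch that assumes the HADOOP_HOME and file locations used in the steps above) can confirm that the environment variable is set and that the required files are in place:

```python
import os

# Sketch: verify the Windows Hadoop setup described above.
hadoop_home = os.environ.get("HADOOP_HOME")
if not hadoop_home:
    raise SystemExit("HADOOP_HOME is not set")

for filename in ("hadoop.dll", "winutils.exe"):
    path = os.path.join(hadoop_home, "bin", filename)
    print(path, "exists" if os.path.exists(path) else "MISSING")
```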
...
| Error | Reason | Resolution |
|---|---|---|
| Unable to connect to the Hive Metastore. | This error occurs when the Parquet Writer Snap is unable to fetch the schema for a Kerberos-enabled Hive Metastore. | Pass the Hive Metastore's schema directly to the Parquet Writer Snap. |
| Parquet Snaps may not work as expected in the Windows environment. | Because of the limitations in the Hadoop library on Windows, Parquet Snaps do not function as expected. | To use the Parquet Writer Snap on a Windows Snaplex, follow the Windows setup steps described above. |
...