...
1. Download hadoop.dll and winutils.exe from https://github.com/cdarlint/winutils/tree/master/hadoop-3.2.2/bin (SnapLogic’s Hadoop version is 3.2.2).
2. Create a temporary directory.
3. Place the hadoop.dll and winutils.exe files in this path: C:\hadoop\bin
4. Set the environment variable HADOOP_HOME to point to C:\hadoop
5. Add C:\hadoop\bin to the environment variable PATH.
6. Add the JVM options in the Windows Snaplex: jcc.jvm_options = -Djava.library.path=C:\hadoop\bin
   If you already have existing jvm_options, append "-Djava.library.path=C:\hadoop\bin" after a space. For example:
   jcc.jvm_options = -agentlib:jdwp=transport=dt_socket,server=y,suspend=n,address=8000 -Djava.library.path=C:\hadoop\bin
7. Restart the JCC for the configuration to take effect.
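The steps above boil down to three checkable conditions: HADOOP_HOME is set, its bin directory is on PATH, and both native files are present. A minimal sanity-check sketch (not a SnapLogic utility; the function name and inputs are illustrative) that reports which of those conditions a given environment is missing:

```python
import ntpath  # Windows-style path joining, usable on any OS

REQUIRED_FILES = ("hadoop.dll", "winutils.exe")

def check_hadoop_setup(env, existing_files):
    """Return a list of problems with the HADOOP_HOME/PATH setup.

    env            -- mapping of environment-variable names to values
    existing_files -- iterable of file paths known to exist on the node
    """
    home = env.get("HADOOP_HOME")
    if not home:
        return ["HADOOP_HOME is not set"]
    problems = []
    bin_dir = ntpath.join(home, "bin")
    # PATH on Windows is a semicolon-separated, case-insensitive list
    path_entries = {p.lower() for p in env.get("PATH", "").split(";")}
    if bin_dir.lower() not in path_entries:
        problems.append(bin_dir + " is not on PATH")
    existing = {f.lower() for f in existing_files}
    for name in REQUIRED_FILES:
        if ntpath.join(bin_dir, name).lower() not in existing:
            problems.append(name + " is missing from " + bin_dir)
    return problems
```

Running it against a correctly configured environment returns an empty list; each missing piece adds one entry describing the fix.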
...
Learn more about the Azure Storage library upgrade.
Snap Views
| Type | Format | Number of Views | Examples of Upstream and Downstream Snaps | Description |
|---|---|---|---|---|
| Input | Document | | Mapper | This Snap has one or two document input views. When you enable the second input view, the Snap ignores other schema settings (such as the Schema button or the Hive Metastore-related properties) and accepts the schema only from the second input view. When you disable the second input view, the Snap expects to receive the schema from the Hive Metastore URL property. The supported data types are primitive (boolean, integer, float, double, and byte array) and local (map, list). The Snap expects a Hive Execute Snap that contains the "Describe table" statement in the second input view. |
| Output | Document | | Mapper | A document with a filename for each Parquet file written. For example: {"filename" : "hdfs://localhost/tmp/2017/april/sample.parquet"} |
| Error | | | | Error handling is a generic way to handle errors without losing data or failing the Snap execution. You can handle the errors that the Snap might encounter while running the Pipeline by choosing one of the following options from the When errors occur list under the Views tab: Stop Pipeline Execution (stops the current pipeline execution when the Snap encounters an error), Discard Error Data and Continue (ignores the error, discards that record, and continues with the remaining records), and Route Error Data to Error View (routes the error data to an error view without stopping the Snap execution). Learn more about Error handling in Pipelines. |
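The input view description above says the schema arrives as the output of a Hive "Describe table" statement. As an illustration only (the row key names follow typical Hive DESCRIBE TABLE output and are an assumption, not a documented SnapLogic contract), a sketch of turning such rows into a simple column-to-type map:

```python
def describe_rows_to_schema(rows):
    """Turn Hive DESCRIBE TABLE output rows into a {column: type} map.

    Each row is assumed to look like {"col_name": "id", "data_type": "int"};
    these key names are a hypothetical shape based on typical Hive output.
    """
    schema = {}
    for row in rows:
        name = (row.get("col_name") or "").strip()
        if not name or name.startswith("#"):  # skip blank and comment rows
            continue
        schema[name] = (row.get("data_type") or "").strip()
    return schema
```

Comment rows such as "# Partition Information", which Hive appends after the column list, are filtered out so only real columns reach the schema.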
...
| Error | Reason | Resolution |
|---|---|---|
| Unable to connect to the Hive Metastore. | The Parquet Writer Snap is unable to fetch the schema for a Kerberos-enabled Hive Metastore. | Pass the Hive Metastore's schema directly to the Parquet Writer Snap: 1. Enable the Schema View in the Parquet Writer Snap by adding the second input view. 2. Connect a Hive Execute Snap to the Schema View. 3. Configure the Hive Execute Snap to execute the DESCRIBE TABLE command to read the table metadata and feed it to the schema view. |
| Parquet Snaps may not work as expected in the Windows environment. | Because of limitations in the Hadoop library on Windows, Parquet Snaps do not function as expected. | To use the Parquet Writer Snap on a Windows Snaplex: 1. Create a temporary directory, for example C:\testhadoop\. 2. Place the hadoop.dll and winutils.exe files in the newly created directory; download them from https://github.com/cdarlint/winutils/tree/master/hadoop-3.2.2/bin (SnapLogic’s Hadoop version is 3.2.2). 3. Add the JVM options in the Windows Snaplex: jcc.jvm_options = -Djava.library.path=C:\testhadoop. If you already have existing jvm_options, append "-Djava.library.path=C:\testhadoop" after a space. For example: jcc.jvm_options = -agentlib:jdwp=transport=dt_socket,server=y,suspend=n,address=8000 -Djava.library.path=C:\testhadoop. 4. Restart the JCC for the configuration to take effect. |
| Failure: 'boolean org.apache.hadoop.io.nativeio.NativeIO$Windows.access0(java.lang.String, int)' | Because of limitations in the Hadoop library on Windows, Parquet Snaps do not function as expected. | 1. Download hadoop.dll and winutils.exe from https://github.com/cdarlint/winutils/tree/master/hadoop-3.2.2/bin (SnapLogic’s Hadoop version is 3.2.2). 2. Create a temporary directory. 3. Place the hadoop.dll and winutils.exe files in this path: C:\hadoop\bin. 4. Set the environment variable HADOOP_HOME to point to C:\hadoop. 5. Add C:\hadoop\bin to the environment variable PATH (Variable name: PATH; Variable value: %HADOOP_HOME%\bin appended to the existing value). 6. Add the JVM options in the Windows Snaplex: jcc.jvm_options = -Djava.library.path=C:\hadoop\bin. If you already have existing jvm_options, append "-Djava.library.path=C:\hadoop\bin" after a space. For example: jcc.jvm_options = -agentlib:jdwp=transport=dt_socket,server=y,suspend=n,address=8000 -Djava.library.path=C:\hadoop\bin. 7. Restart the JCC for the configuration to take effect. |
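Both Windows resolutions above end in the same Snaplex change. Assuming a standard Windows Snaplex installation (the exact properties file depends on your install, and the path is the example used above), the node configuration ends up containing a line like:

```properties
# Example jcc.jvm_options line for a Windows Snaplex; C:\hadoop\bin is illustrative
jcc.jvm_options = -Djava.library.path=C:\hadoop\bin
```

If other JVM options already exist on that line, the -Djava.library.path entry is appended after a space rather than replacing them.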
...
Understanding the Pipeline
Downloads
1. Download and import the Pipeline into SnapLogic.
2. Configure Snap accounts, as applicable.
3. Provide Pipeline parameters, as applicable.
...