Overview

Use the HDFS ZipFile Read Snap to extract and read archive files in HDFS directories and produce a stream of unzipped documents in the output.
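To make this behavior concrete, the following Python sketch illustrates the general idea: open one ZIP archive and emit one binary document per file entry. It is only an illustration that uses Python's standard `zipfile` module on an in-memory byte buffer; it is not SnapLogic's implementation. The function name `unzip_entries` is hypothetical, and skipping directory entries is an assumption.

```python
import io
import zipfile
from typing import Iterator, Tuple


def unzip_entries(archive: bytes) -> Iterator[Tuple[str, bytes]]:
    """Yield (entry_name, content) for every file entry in a ZIP archive.

    Hypothetical helper: mimics turning one ZIP file into a stream of
    unzipped binary documents, one per archive entry.
    """
    with zipfile.ZipFile(io.BytesIO(archive)) as zf:
        for info in zf.infolist():
            if info.is_dir():
                # Directory entries carry no content, so skip them.
                continue
            yield info.filename, zf.read(info)


if __name__ == "__main__":
    # Build a small archive in memory to stand in for a ZIP file in HDFS.
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as zf:
        zf.writestr("foo.txt", "hello")
        zf.writestr("sub/bar.csv", "a,b\n1,2\n")
    for name, content in unzip_entries(buf.getvalue()):
        print(name, len(content))
```

Entry names keep any subdirectories stored inside the archive (for example, `sub/bar.csv`).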

For the HDFS protocol, use a SnapLogic on-premises Groundplex. Also, ensure that the instance is within the Hadoop cluster and that SSH authentication is established.

Note
This Snap supports the HDFS 2.4.0 protocol.

Expected Input and Output

Expected Input: Documents containing information that identifies the directory and ZIP files that must be read.
Expected Output: A binary stream containing unzipped documents from the specified ZIP files.
Expected Upstream Snaps: Required. Any Snap that offers a list of ZIP files in its output view. Examples: HDFS ZipFile Writer, ZipFile Read.
Expected Downstream Snaps: Any Snap that accepts binary data in its input view. Examples: CSV Parser, HDFS Writer, File Writer.

Prerequisites

The user executing the Snap must have Read permissions on the target Hadoop directory.

Configuring Accounts

This Snap uses account references created on the Accounts page of SnapLogic Manager to handle access to this endpoint. See Configuring Hadoop Accounts for information on setting up this type of account.

Configuring Views

Input	This Snap has at most one document input view.
Output	This Snap has exactly one binary output view.
Error	This Snap has at most one document error view.

Troubleshooting

None at this time.

Limitations and Known Issues

None at this time.

Modes

Ultra Pipelines: Works in Ultra Pipelines.

Snap Settings

Label

Required. The name for the Snap. Modify this to be more specific, especially if the pipeline contains more than one instance of the same Snap.

Directory

The URL for the data source (directory). The Snap supports both the HDFS and ABFS(S) protocols.

Syntax for a typical HDFS URL:

hdfs://hadoopcluster.domain.com:8020/<user>/<folder_details>

Syntax for typical ABFS and ABFSS URLs:

abfs:///<filesystem>/<path>/
abfs://<filesystem>@<accountname>.<endpoint>/<path>
abfss:///<filesystem>/<path>/
abfss://<filesystem>@<accountname>.<endpoint>/<path>

When you use the ABFS protocol to connect to an endpoint, the account name and endpoint details provided in the URL override the corresponding values in the Account Settings fields.
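The override rule can be sketched in Python as follows. This is a minimal illustration, assuming the URL layouts listed above and assuming that the account name is everything before the first dot in the host part; `AbfsTarget`, `parse_abfs_url`, and `effective_account` are hypothetical names, not part of any SnapLogic API, and the account and endpoint values in the usage are placeholders.

```python
from typing import NamedTuple, Optional, Tuple
from urllib.parse import urlsplit


class AbfsTarget(NamedTuple):
    filesystem: str
    account_name: Optional[str]
    endpoint: Optional[str]
    path: str


def parse_abfs_url(url: str) -> AbfsTarget:
    """Split an abfs:// or abfss:// URL into its parts (illustrative only)."""
    parts = urlsplit(url)
    if parts.scheme not in ("abfs", "abfss"):
        raise ValueError(f"not an ABFS(S) URL: {url}")
    if parts.netloc:
        # Form: abfs(s)://<filesystem>@<accountname>.<endpoint>/<path>
        filesystem, _, host = parts.netloc.partition("@")
        account_name, _, endpoint = host.partition(".")
        return AbfsTarget(filesystem, account_name, endpoint, parts.path)
    # Form: abfs(s):///<filesystem>/<path>/ (no account or endpoint in the URL)
    filesystem, _, rest = parts.path.lstrip("/").partition("/")
    return AbfsTarget(filesystem, None, None, "/" + rest)


def effective_account(url: str, account_name: str, endpoint: str) -> Tuple[str, str]:
    """Values found in the URL win; otherwise fall back to the Account Settings."""
    target = parse_abfs_url(url)
    return (target.account_name or account_name, target.endpoint or endpoint)
```

For example, `effective_account("abfss://myfs@acct1.dfs.core.windows.net/data", "acct2", "ep2")` returns the account and endpoint from the URL, while a URL of the form `abfs:///myfs/data/` falls back to the Account Settings values.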

Default value: [None]

File Filter

This field works the same way as the File Filter field in the HDFS Writer Snap; see that Snap's documentation for details.

File

The relative path and name of the file that must be read.

Example:

sample.csv
tmp/another.csv
$filename
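Assuming that the File value is resolved relative to the Directory URL (an assumption; the exact resolution rules are not documented here), the combination can be sketched with Python's standard `posixpath` and `urllib.parse` modules. The helper name `resolve_file_url` is hypothetical, the sketch covers HDFS-style URLs that include a host, and an expression such as `$filename` would already have been evaluated to a plain string before this step.

```python
import posixpath
from urllib.parse import urlsplit, urlunsplit


def resolve_file_url(directory_url: str, relative_file: str) -> str:
    """Join a relative File value onto an HDFS Directory URL (illustrative only)."""
    parts = urlsplit(directory_url)
    # Treat the directory path as a folder and append the relative file path,
    # normalizing any "." or ".." segments.
    joined = posixpath.normpath(posixpath.join(parts.path or "/", relative_file))
    return urlunsplit((parts.scheme, parts.netloc, joined, "", ""))
```

With the Directory `hdfs://hadoopcluster.domain.com:8020/user/data` and the File `tmp/another.csv`, this sketch yields `hdfs://hadoopcluster.domain.com:8020/user/data/tmp/another.csv`.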

Default value: [None]

User Impersonation

This field works the same way as the User Impersonation field in the HDFS Reader Snap; see that Snap's documentation for details.

Prevent URL Encoding

This field works the same way as the Prevent URL Encoding field in the ZipFile Read Snap; see that Snap's documentation for details.

Snap Execution

This field works the same way as the Snap Execution field in other Snaps; see the Snap Execution description in the SOAP Execute Snap documentation for details.

Note

The content-location header of each binary document that the HDFS ZipFile Writer Snap receives as input is the entry name within the ZIP file (for example, foo.txt). This name does not include the base directory, although it can include subdirectories. In contrast, the content-location header of each binary document that the HDFS ZipFile Reader Snap outputs combines the name of the ZIP file, the base directory, and the content location originally provided to the writer. As a result, although each Snap works well on its own, you currently cannot use a Reader > Writer > Reader combination in a pipeline unless intermediate Snaps supply the binary document header information that the next Snap expects.
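The header mismatch can be illustrated with a small Python sketch of the kind of rewrite an intermediate step would need: reducing a reader-style content-location back to the entry name that the writer expects. The location layout used here (the ZIP file name followed by "/" and then the entry name) is a hypothetical assumption for illustration only, as is the helper name `entry_name_from_reader_location`.

```python
def entry_name_from_reader_location(location: str, zip_file_name: str) -> str:
    """Recover the in-archive entry name from a reader-style content-location.

    Hypothetical helper: assumes the entry name is whatever follows the ZIP
    file name and a "/" separator in the location string.
    """
    marker = zip_file_name.rstrip("/") + "/"
    index = location.find(marker)
    if index < 0:
        raise ValueError(f"{zip_file_name!r} not found in {location!r}")
    # Keep subdirectories inside the archive (for example "sub/foo.txt").
    return location[index + len(marker):]
```

Under that assumed layout, `hdfs://hadoopcluster.domain.com:8020/user/out/archive.zip/sub/foo.txt` reduces to `sub/foo.txt`, which matches the writer's input convention of an entry name without the base directory.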

Examples


Writing and Reading a ZIP File in HDFS

The first part of this example demonstrates how to use the HDFS ZipFile Write Snap to zip a new file and write it into HDFS. The second part demonstrates how to unzip the newly created ZIP file and check its contents.

You can download this pipeline from the Downloads section below.

Understanding the Sample Pipeline

Create the pipeline using the Snaps described in the following steps.

The Hadoop Directory Browser Snap

First, use a Hadoop Directory Browser Snap to check the contents of the target directory. Later in the example, this baseline helps you verify that the new file was added to the HDFS directory as expected.

Enter the appropriate Directory URL and set the File filter to *.zip. This instructs the Snap to list all the ZIP files in the target directory.
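The *.zip filter is a glob-style pattern. As a rough illustration only, assuming glob semantics similar to Python's standard `fnmatch` module (the Snap's exact matching rules, including case sensitivity, are not documented here), the filtering step looks like this; `filter_files` is a hypothetical helper name.

```python
from fnmatch import fnmatchcase
from typing import List


def filter_files(names: List[str], pattern: str = "*.zip") -> List[str]:
    """Keep only the names that match a glob-style pattern such as *.zip."""
    # fnmatchcase is case-sensitive on every platform, so "A.ZIP" does not match.
    return [name for name in names if fnmatchcase(name, pattern)]
```

For example, from `["a.zip", "b.csv", "archive.zip"]` this sketch keeps `a.zip` and `archive.zip`.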

If the Snap executes as expected, its output lists the contents of your target directory.

Generating a File for Upload

You now need to choose a file to upload into the target directory. You could either select a file directly or use a JSON Generator Snap coupled with a JSON Formatter Snap, as in the example pipeline.

The HDFS ZipFile Writer Snap

Your file is now ready. Configure the HDFS ZipFile Writer Snap to upload the file as a ZIP file into the target directory in HDFS.

The Hadoop Directory Browser Snap

After the ZIP file is created, use a Copy Snap to perform two tasks: first, check whether the new file was created as expected, and second, read the contents of the newly created ZIP file from the target HDFS directory.

To check whether the new file was created, add a Hadoop Directory Browser Snap to the pipeline.

If the ZIP file was created, it appears in the Snap's output.

HDFS ZipFile Reader

Once you have confirmed that the new ZIP file has been created, use the HDFS ZipFile Reader Snap to read it.

To read the output of the HDFS ZipFile Read Snap, use a File Reader Snap.

If the contents of the new file are the same as the contents of the original file, you know the example works.
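The verification in this example is a round trip: zip some content, unzip it, and compare the result with the original. The following Python sketch reproduces that logic locally with the standard `zipfile` and `json` modules; it does not touch HDFS or SnapLogic. The helper names `zip_document` and `read_entry` and the entry name `sample.json` are hypothetical.

```python
import io
import json
import zipfile


def zip_document(name: str, payload: bytes) -> bytes:
    """Write one entry into a new in-memory ZIP archive (writer side)."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w", compression=zipfile.ZIP_DEFLATED) as zf:
        zf.writestr(name, payload)
    return buf.getvalue()


def read_entry(archive: bytes, name: str) -> bytes:
    """Read one entry back out of a ZIP archive (reader side)."""
    with zipfile.ZipFile(io.BytesIO(archive)) as zf:
        return zf.read(name)


if __name__ == "__main__":
    # Stand-in for the JSON Generator + JSON Formatter output in the example.
    original = json.dumps([{"id": 1, "msg": "hello"}]).encode("utf-8")
    archive = zip_document("sample.json", original)
    # The check succeeds when the unzipped content equals the original input.
    assert read_entry(archive, "sample.json") == original
    print("round trip OK")
```

Comparing raw bytes keeps the check strict: any change introduced by zipping and unzipping, including whitespace, makes the comparison fail.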


Troubleshooting

See the Troubleshooting section of the Hadoop Directory Browser Snap documentation.

Downloads

The example pipeline files (.slp and .zip) are attached to this page.

