
Overview

Use the HDFS ZipFile Write Snap to read incoming data and write it to a ZIP file in an HDFS directory. You can also specify file access permissions for the new ZIP file and configure how the Snap handles it if the destination directory already contains a ZIP file with the same name.

For the HDFS protocol, use a SnapLogic on-premises Groundplex and ensure that its instance is within the Hadoop cluster and that SSH authentication is established.

Note

This Snap supports both the HDFS and ABFS (Azure Data Lake Storage Gen 2) protocols. The supported HDFS version is 2.4.0.


Expected Input and Output

  • Expected Input: Binary data stream containing documents to be written to a ZIP file.
  • Expected Output: Zipped file containing the incoming documents.
  • Expected Upstream Snaps: Required. Any Snap that offers binary data in its output view. Examples: JSON Formatter, HDFS Reader, File Reader.
  • Expected Downstream Snaps: Any Snap that takes document data as input. Examples: Mapper, HDFS Reader.

Prerequisites

The user executing the Snap must have Write permissions on the target directory.

Configuring Accounts

This Snap uses account references created on the Accounts page of SnapLogic Manager to handle access to this endpoint. See Configuring Hadoop Accounts for information on setting up this type of account.

Configuring Views

Input

This Snap has at least one binary input view.

Output

This Snap has at most one document output view.

Error

This Snap has at most one document error view.


Limitations and Known Issues

None at this time.

Modes

  • Ultra Pipelines: Works in Ultra Pipelines.
  • Spark mode: Does not work in Spark mode.

Snap Settings



Label

Required. The name for the Snap. Modify this to be more specific, especially if the pipeline contains more than one instance of the same Snap.

Execute during preview

Select this property to execute the Snap when the pipeline is validated.

Default value: Not selected


Directory

The URL of the target directory. The Snap supports both the HDFS and ABFS(S) protocols.

Syntax for a typical HDFS URL:

hdfs://hadoopcluster.domain.com:8020/<user>/<folder_details> 

Syntax for a typical ABFS and an ABFSS URL:

abfs:///<filesystem>/<path>/
abfs://<filesystem>@<accountname>.<endpoint>/<path>
abfss:///<filesystem>/<path>/
abfss://<filesystem>@<accountname>.<endpoint>/<path>

When you use the ABFS protocol to connect to an endpoint, the account name and endpoint details provided in the URL override the corresponding values in the Account Settings fields.

Note

With the ABFS protocol, SnapLogic creates a temporary file to store the incoming data. Therefore, the hard drive on which the JCC runs should have enough space to temporarily store all of the incoming data.

Default value: [None]
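To make the URL forms above concrete, the following Python sketch splits a Directory value into its parts and shows which components of an ABFS(S) URL carry the account name and endpoint. The helper `describe_directory_url` is hypothetical and not part of SnapLogic. It assumes that `<endpoint>` is everything after the first dot of the host name (for example, `dfs.core.windows.net`) and that, when the URL omits the account, the values come from the Account Settings fields.

```python
from urllib.parse import urlsplit


def describe_directory_url(url: str) -> dict:
    """Hypothetical helper: split a Directory URL into the parts named above."""
    parts = urlsplit(url)
    scheme = parts.scheme.lower()
    if scheme == "hdfs":
        # hdfs://<host>:<port>/<user>/<folder_details>
        return {"scheme": scheme, "host": parts.hostname,
                "port": parts.port, "path": parts.path}
    if scheme in ("abfs", "abfss"):
        if parts.netloc:
            # abfs(s)://<filesystem>@<accountname>.<endpoint>/<path>
            # Assumption: account name and endpoint in the URL take precedence.
            filesystem = parts.username
            account, _, endpoint = (parts.hostname or "").partition(".")
            path = parts.path
        else:
            # abfs(s):///<filesystem>/<path>/
            # Assumption: account name and endpoint come from Account Settings.
            filesystem, _, rest = parts.path.lstrip("/").partition("/")
            account = endpoint = None
            path = "/" + rest
        return {"scheme": scheme, "filesystem": filesystem,
                "account": account, "endpoint": endpoint, "path": path}
    raise ValueError(f"unsupported scheme: {parts.scheme!r}")
```

For example, `describe_directory_url("abfss://data@myacct.dfs.core.windows.net/out/zips")` reports the filesystem `data`, the account `myacct`, and the endpoint `dfs.core.windows.net`.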

File

The relative path and name of the ZIP file to create.

Examples:

  • sample.zip
  • tmp/another.zip
  • $filename

Default value: [None]

User Impersonation

See the User Impersonation property in the HDFS Reader Snap documentation.

File Action
Required. Select the action the Snap takes if a file with the same name already exists in the target directory. Available options are: Overwrite, Ignore, and Error.
  • Overwrite: If the target file exists, the Snap overwrites the file.

  • Ignore: If the file already exists, the Snap neither throws an exception nor overwrites the file; instead, it creates an output document indicating that the new data has been ignored.
  • Error: If the file already exists, the Snap displays an error in the Pipeline Run Log.

Default value: Overwrite
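The three File Action options amount to a single decision that matters only when the target file already exists. The Python sketch below models that decision; `FileAction`, `FileActionError`, and `decide_file_action` are hypothetical names chosen for illustration, not the Snap's implementation.

```python
from enum import Enum


class FileAction(Enum):
    """The three File Action options (hypothetical model)."""
    OVERWRITE = "Overwrite"
    IGNORE = "Ignore"
    ERROR = "Error"


class FileActionError(Exception):
    """Hypothetical stand-in for the error reported in the Pipeline Run Log."""


def decide_file_action(action: FileAction, target_exists: bool) -> str:
    """Return what happens to the new ZIP file for a given File Action."""
    if not target_exists:
        return "write"  # no conflict, so the option has no effect
    if action is FileAction.OVERWRITE:
        return "overwrite"  # replace the existing file
    if action is FileAction.IGNORE:
        return "ignore"  # keep the existing file and report the data as ignored
    raise FileActionError("target file already exists")
```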

File Permissions

The sets of file permissions to assign to the ZIP file. To assign file permissions:

  1. Click the + button next to File permissions. This adds a row to the fieldset.
  2. Click the Suggestible icon in the User type field and select the user type for which you want to enable access. This drop-down offers the following options:
    • Owner: This is the user account under whose name the new file will be created.
    • Group: This is the user group to which the user being impersonated belongs.
    • Others: All other users who have at least Read access to the target directory.
  3. Click the Suggestible icon in the File permissions field and select the permission you want to enable for the user type selected in the User type field.
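HDFS file permissions follow the POSIX owner/group/others model, so each row in the fieldset can be read as setting one bit of a Unix-style mode. The sketch below assumes that the File permissions drop-down offers Read, Write, and Execute (the options are not listed here), and `to_octal_mode` is a hypothetical helper that only illustrates how rows combine.

```python
# Assumed permission names, each mapped to its POSIX bit value.
PERMISSION_BITS = {"read": 4, "write": 2, "execute": 1}
# Bit offset of each user type within a three-digit octal mode.
USER_TYPE_SHIFT = {"owner": 6, "group": 3, "others": 0}


def to_octal_mode(rows):
    """Combine (user type, permission) rows into one numeric mode."""
    mode = 0
    for user_type, permission in rows:
        mode |= PERMISSION_BITS[permission.lower()] << USER_TYPE_SHIFT[user_type.lower()]
    return mode


rows = [("Owner", "Read"), ("Owner", "Write"), ("Group", "Read")]
print(format(to_octal_mode(rows), "03o"))  # 640
```

Three rows (Owner/Read, Owner/Write, Group/Read) therefore correspond to the familiar mode 640, with no access for Others.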
Base directory

The name of the root directory within the ZIP file.

Use input view label

If selected, the Snap uses the input view label as the name of every file it adds to the ZIP file. If not selected, the Snap uses the input view ID instead when the incoming binary stream does not have a content-location in its header. When this option is selected and an input view receives more than one binary input stream, the Snap appends '_n' to the input view label for the second and each subsequent stream. If the label has the format 'name.ext', '_n' is appended to 'name'; for example, name_2.ext for the second input stream.

Example: If this option is selected, Base directory is testFolder, and the input view label is test.csv, the first binary input stream in that input view is written as testFolder/test.csv, the second as testFolder/test_2.csv, the third as testFolder/test_3.csv, and so on.

Default value: Not selected
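The naming rule for Use input view label can be sketched in Python as follows. `entry_names` and `build_zip` are hypothetical helpers that reproduce the documented example (Base directory testFolder, input view label test.csv) with the standard `zipfile` module. They assume that the extension is whatever follows the last dot in the label, so a label such as data.tar.gz would yield data.tar_2.gz.

```python
import io
import posixpath
import zipfile


def entry_names(base_directory, view_label, stream_count):
    """Names for successive streams in one input view: label, label_2, label_3, ..."""
    stem, ext = posixpath.splitext(view_label)  # "test.csv" -> ("test", ".csv")
    names = []
    for n in range(1, stream_count + 1):
        file_name = view_label if n == 1 else f"{stem}_{n}{ext}"
        names.append(posixpath.join(base_directory, file_name))
    return names


def build_zip(base_directory, view_label, streams):
    """Write each stream (bytes) into an in-memory ZIP under its derived name."""
    buffer = io.BytesIO()
    names = entry_names(base_directory, view_label, len(streams))
    with zipfile.ZipFile(buffer, "w", zipfile.ZIP_DEFLATED) as zf:
        for name, data in zip(names, streams):
            zf.writestr(name, data)
    return buffer.getvalue()


print(entry_names("testFolder", "test.csv", 3))
# ['testFolder/test.csv', 'testFolder/test_2.csv', 'testFolder/test_3.csv']
```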

Snap Execution

For a description of this property and its options for Snaps that write data, see the Snap Execution property in the SOAP Execute Snap documentation.


Note
In the input of the HDFS ZipFile Write Snap, the content-location in the binary document header is the file's name within the ZIP file (for example, foo.txt). It does not include the base directory, although it can contain subdirectories. In contrast, the content-location in the binary document header of the HDFS ZipFile Reader output consists of the ZIP file name, the base directory, and the content location provided to the writer. Therefore, although each Snap works well on its own, you currently cannot chain them as Reader > Writer > Reader in a pipeline without intermediate Snaps that supply the binary document header information.

Troubleshooting

See the Troubleshooting section of the Hadoop Directory Browser Snap documentation.

Examples


See the Examples section of the HDFS ZipFile Reader Snap documentation.


Downloads

For instructions on downloading and importing the attached pipeline (.slp) and ZIP files, see the Downloads section of the OpenAPI Snap documentation.

Additional Resources

  • Glossary

  • Getting started with SnapLogic
  • Snap History: See the Snap History section of the Hadoop Snap Pack documentation.