Comment: Addressed Feb 2021 comment in https://mysnaplogic.atlassian.net/browse/SNAP-7269


Snap type:

Write

Description:

This Snap reads a binary data stream from its input view and writes a file to HDFS (Hadoop Distributed File System). It also helps you pick a file by suggesting a list of directories and files. For the hdfs protocol, use a SnapLogic on-premises Groundplex and make sure that its instance is within the Hadoop cluster and that SSH authentication has already been established. The Snap also supports the webhdfs protocol, which does not require a Groundplex and works for all versions of Hadoop. The Snap also supports writing to a Kerberized cluster through the hdfs protocol.

Note
 HDFS 2.4.0 is supported for the HDFS protocol.

Hadoop allows you to configure proxy users to access HDFS on behalf of other users; this is called impersonation. When user impersonation is enabled on the Hadoop cluster, any jobs submitted using a proxy are executed with the impersonated user's existing privilege levels rather than those of the superuser associated with the cluster. For more information on user impersonation in this Snap, see the section on User Impersonation below.

Prerequisites:

[None]

Support and limitations:

Limitations

  • Append (File action) is supported for ADL protocol only.
  • File names with the following special characters are not supported in the HDFS Writer Snap: '+', '?', '/', ':'.

Limitations with File Permissions for Various Users

  • With "File permissions for various users": File names with the following special characters are not supported in the HDFS Writer Snap: '+', '?', '/', ':'.
  • Without "File permissions for various users": File names with the following special characters are not supported in the HDFS Writer Snap: ':', '/'.
Account: 

This Snap uses account references created on the Accounts page of SnapLogic Manager to handle access to this endpoint. This Snap supports Azure storage account, Azure Data Lake account, Kerberos account, or no account. 

This Snap works with the following accounts:

Views:


Input

This Snap has exactly one binary input view. Examples of Snaps that can be connected to this input are CSV Formatter, JSON Formatter, and XML Formatter.

Output

This Snap has exactly one document output view.

The following is an example of the output document map data:

Code Block
{
    "filename": "hdfs://ec2-54-198-212-134.compute-1.amazonaws.com:8020/user/john/input/sample.csv",
    "fileAction": "overwritten"
}

The value of the "fileAction" field can be "overwritten", "created", or "ignored". The value "ignored" indicates that the Snap did not overwrite the existing file because the value of the File action property is "IGNORE".
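As a rough illustration of how a downstream step might interpret this output document, the following Python sketch parses the sample shown above and checks the "fileAction" field. The helper name `was_written` is hypothetical and not part of the Snap; only the three field values listed above are assumed.

```python
import json

# Sample output document, copied from the example above.
sample = """
{
    "filename": "hdfs://ec2-54-198-212-134.compute-1.amazonaws.com:8020/user/john/input/sample.csv",
    "fileAction": "overwritten"
}
"""

KNOWN_ACTIONS = {"overwritten", "created", "ignored"}


def was_written(output_doc: dict) -> bool:
    """Return True when the document reports that the file was written.

    Hypothetical helper: "ignored" means the existing file was left untouched.
    """
    action = output_doc["fileAction"]
    if action not in KNOWN_ACTIONS:
        raise ValueError(f"unexpected fileAction: {action!r}")
    return action != "ignored"


doc = json.loads(sample)
print(doc["filename"], was_written(doc))
```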


Error

This Snap has at most one document error view and produces zero or more documents in the view.


Settings

Label

Required. The name for the Snap. You can modify this to be more specific, especially if you have more than one of the same Snap in your pipeline.

Directory


Required. The URL of the HDFS directory. It should start with a supported file protocol, such as hdfs or webhdfs, in one of the following forms:

  • hdfs://<hostname>:<port>/<path to directory>/
  • webhdfs://<hostname>:<port>/<path to directory>/
  • wasb:///<container name>/<path to directory>/
  • wasbs:///<container name>/<path to directory>/
  • adl://<container name>/<path to directory>/ 
  • abfs(s):///filesystem/<path>/
  • abfs(s)://filesystem@accountname.endpoint/<path>

The Directory property is used only in the Suggest operation, not in pipeline execution or preview. When you click the Suggest icon, the Snap displays a list of subdirectories under the given directory. It generates the list by applying the value of the Filter property.

Example

  • hdfs://ec2-54-198-212-134.compute-1.amazonaws.com:8020/user/john/input/
  • webhdfs://cdh-qa-2.fullsail.snaplogic.com:50070/user/ec2-user/csv/
  • wasb:///snaplogic/testDir/
  • wasbs:///snaplogic/testDir/
  • $dirname
  • adl://snapqa/ 
  • abfs(s):///filesystem2/dir1
  • abfs(s)://filesystem2@snaplogicaccount.dfs.core.windows.net/dir1

Default value:  hdfs://<hostname>:<port>/

Note

SnapLogic automatically appends "azuredatalakestore.net" to the store name you specify when using Azure Data Lake; therefore, you do not need to add 'azuredatalakestore.net' to the URI while specifying the directory.
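The URL forms listed for the Directory property can be checked before a pipeline is configured. The following Python sketch is an illustration only: it assumes that "abfs(s)" stands for the two schemes abfs and abfss, and the function name `check_directory_url` is hypothetical. Expressions such as $dirname cannot be checked this way because they are evaluated at run time.

```python
from urllib.parse import urlsplit

# Schemes taken from the list of URL forms above; "abfs(s)" is assumed
# to mean both "abfs" and "abfss".
SUPPORTED_SCHEMES = {"hdfs", "webhdfs", "wasb", "wasbs", "adl", "abfs", "abfss"}


def check_directory_url(url: str) -> str:
    """Return the URL scheme if it is one of the listed protocols, else raise."""
    scheme = urlsplit(url).scheme.lower()
    if scheme not in SUPPORTED_SCHEMES:
        raise ValueError(f"unsupported or missing protocol in {url!r}")
    return scheme


for candidate in (
    "hdfs://ec2-54-198-212-134.compute-1.amazonaws.com:8020/user/john/input/",
    "wasb:///snaplogic/testDir/",
    "adl://snapqa/",
):
    print(candidate, "->", check_directory_url(candidate))
```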


File Filter



Use glob patterns to display a list of directories or files when you click the Suggest icon in the Directory or File property. A complete glob pattern is formed by combining the value of the Directory property with the Filter property. If the value of the Directory property does not end with "/", the Snap appends one, so that the value of the Filter property is applied to the directory specified by the Directory property.

Default Value: *
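The way the Directory and Filter values combine into one glob pattern can be sketched in a few lines of Python. This is a minimal illustration of the rule described above, not the Snap's own code; the function name `full_glob` is hypothetical.

```python
def full_glob(directory: str, file_filter: str = "*") -> str:
    """Join the Directory value and the Filter value into one glob pattern.

    A "/" is appended to the directory when it is missing, so that the
    filter applies inside the directory rather than to its name.
    """
    if not directory.endswith("/"):
        directory += "/"
    return directory + file_filter


print(full_glob("wasb:///snaplogic/testDir", "*.csv"))
print(full_glob("adl://snapqa/"))
```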

For more information on glob patterns, see the interpretation rules below.



Glob Pattern Interpretation Rules

The following rules are used to interpret glob patterns:

  • The * character matches zero or more characters of a name component without crossing directory boundaries. For example, the *.csv pattern matches a path that represents a file name ending in .csv, and *.* matches all file names that contain a period.

  • The ** characters match zero or more characters across directory boundaries; therefore, they match all files or directories in the current directory and in its subdirectories. For example, /home/** matches all files and directories in the /home/ directory and its subdirectories.

  • The ? character matches exactly one character of a name component. For example, 'foo.?' matches file names that start with 'foo.' and are followed by a single-character extension.

  • The \ character is used to escape characters that would otherwise be interpreted as special characters. The expression \\ matches a single backslash, and \{ matches a left brace, for example.

  • The ! character is used to exclude matching files from the output. 
  • The [ ] characters form a bracket expression that matches a single character of a name component out of a set of characters. For example, '[abc]' matches 'a', 'b', or 'c'. The hyphen (-) may be used to specify a range, so '[a-z]' specifies a range that matches from 'a' to 'z' (inclusive). These forms can be mixed, so '[abce-g]' matches 'a', 'b', 'c', 'e', 'f' or 'g'. If the character after the [ is a ! then it is used for negation, so '[!a-c]' matches any character except 'a', 'b', or 'c'.

    Within a bracket expression, the '*', '?', and '\' characters match themselves. The '-' character matches itself if it is the first character within the brackets, or the first character after the !, if negating.

  • The '{ }' characters are a group of sub-patterns where the group returns a match if any sub-pattern in the group matches the contents of a target directory. The ',' character is used to separate sub-patterns. Groups cannot be nested. For example, the pattern '*.{csv,json}' matches file names ending with '.csv' or '.json'.

  • Leading dot characters in a file name are treated as regular characters in match operations. For example, the '*' glob pattern matches file name ".login".

  • All other characters match themselves.

Examples:

  • '*.csv' matches all files with a csv extension in the current directory only.
  • '**.csv' matches all files with a csv extension in the current directory and in all its subdirectories.
  • *[!{.pdf,.tmp}] excludes all files with the extension PDF or TMP.
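To make the interpretation rules concrete, here is a small Python sketch that translates a glob pattern into a regular expression following the rules for *, **, ?, \, [ ], and { } listed above. It is an illustration written for this page, not the matcher the Snap uses, and it deliberately leaves out the standalone ! exclusion rule. The names `glob_to_regex` and `glob_matches` are hypothetical.

```python
import re


def glob_to_regex(pattern: str):
    """Translate a glob pattern into a compiled regular expression."""
    out = []
    i, n = 0, len(pattern)
    in_group = False
    while i < n:
        c = pattern[i]
        i += 1
        if c == "\\":  # escape: the next character is taken literally
            if i == n:
                raise ValueError("dangling escape at end of pattern")
            out.append(re.escape(pattern[i]))
            i += 1
        elif c == "*":
            if i < n and pattern[i] == "*":  # "**" crosses directory boundaries
                out.append(".*")
                i += 1
            else:  # "*" stays inside one name component
                out.append("[^/]*")
        elif c == "?":  # exactly one character of a name component
            out.append("[^/]")
        elif c == "[":  # bracket expression, "!" right after "[" negates
            negate = i < n and pattern[i] == "!"
            if negate:
                i += 1
            end = pattern.find("]", i)
            if end <= i:  # no closing bracket, or an empty expression
                raise ValueError("malformed bracket expression")
            # '*', '?', and '\' are literal here; escape regex-special ones.
            body = "".join("\\" + ch if ch in "\\[^" else ch for ch in pattern[i:end])
            out.append("(?!/)[" + ("^" if negate else "") + body + "]")
            i = end + 1
        elif c == "{":  # start of a group of sub-patterns
            if in_group:
                raise ValueError("groups cannot be nested")
            out.append("(?:")
            in_group = True
        elif c == "}" and in_group:
            out.append(")")
            in_group = False
        elif c == "," and in_group:  # separates sub-patterns
            out.append("|")
        else:  # all other characters match themselves
            out.append(re.escape(c))
    if in_group:
        raise ValueError("unclosed group")
    return re.compile("".join(out), re.DOTALL)


def glob_matches(pattern: str, name: str) -> bool:
    """Return True when the whole name matches the glob pattern."""
    return glob_to_regex(pattern).fullmatch(name) is not None


for pat, name in [("*.csv", "a.csv"), ("*.csv", "dir/a.csv"), ("**.csv", "dir/a.csv")]:
    print(pat, name, glob_matches(pat, name))
```

The loop at the end mirrors the first two examples above: '*.csv' stays within one directory, while '**.csv' also reaches into subdirectories.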



File




Filename or a relative path to a file under the directory given in the Directory property. It should not start with a URL separator "/". The File property can be a JavaScript expression which will be evaluated with values from the input view document. When you press the Suggest icon, it will display a list of regular files under the directory in the Directory property. It generates the list by applying the value of the Filter property.

Example:

  • sample.csv
  • tmp/another.csv
  • $filename

Default value: [None]

Flush interval (kB)

Enter the flush interval in kilobytes to flush a specified size of data during the file upload. This Snap can flush the output stream each time a given size of data is written to the target file server.

Info

If the Flush interval is 0, the Snap flushes at maximum frequency, that is, after each byte block is written. The larger the flush interval, the less frequent the flushes. This field is useful if the file upload experiences intermittent failures. However, more frequent flushes result in a slower file upload. The default value of -1 indicates no flush during the upload.


Default value: -1
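The three cases of the flush interval (-1, 0, and a positive size) can be sketched as follows. This Python example only illustrates the behavior described above against a generic byte stream; it is not the Snap's implementation, and the function name `write_with_flush` and the block sizes are assumptions made for the example.

```python
import io


def write_with_flush(stream, blocks, flush_interval_kb=-1):
    """Write byte blocks to a stream and return the number of explicit flushes.

    -1: never flush during the upload (the default).
     0: flush after every block.
    >0: flush each time at least that many kilobytes have been written.
    """
    flushes = 0
    pending = 0  # bytes written since the last flush
    threshold = flush_interval_kb * 1024
    for block in blocks:
        stream.write(block)
        pending += len(block)
        if flush_interval_kb < 0:
            continue  # flushing disabled
        if flush_interval_kb == 0 or pending >= threshold:
            stream.flush()
            flushes += 1
            pending = 0
    return flushes


blocks = [b"x" * 1024] * 4  # four 1 kB blocks
for interval in (-1, 0, 2):
    print(interval, write_with_flush(io.BytesIO(), blocks, interval))
```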
Number Of Retries

Specify the maximum number of attempts to be made to receive a response. 

Note
  • The request is terminated if the attempts do not result in a response.
  • A retry, that is, a further attempt to receive a response, occurs only when the Snap loses the connection with the server.

Default value: 0

Retry Interval (seconds)

Specify the time interval between two successive retry requests. A retry happens only when the previous attempt resulted in an exception.

Default value: 1
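How the Number Of Retries and Retry Interval settings interact can be sketched as a simple loop. The Python example below is an illustration under stated assumptions: a value of 0 is read as "one attempt, no retries", a lost connection is modeled as Python's ConnectionError, and the names `call_with_retries` and `flaky_write` are hypothetical.

```python
import time


def call_with_retries(operation, retries=0, interval_seconds=1, sleep=time.sleep):
    """Run operation(), retrying only when the connection is lost.

    Assumption: retries=0 means a single attempt. Other exceptions are
    not retried and propagate immediately.
    """
    attempt = 0
    while True:
        try:
            return operation()
        except ConnectionError:
            if attempt >= retries:
                raise  # retries exhausted: the request is terminated
            attempt += 1
            sleep(interval_seconds)  # wait between successive retries


# Usage: an operation that loses the connection twice before succeeding.
calls = {"count": 0}


def flaky_write():
    calls["count"] += 1
    if calls["count"] <= 2:
        raise ConnectionError("connection lost")
    return "written"


print(call_with_retries(flaky_write, retries=2, interval_seconds=0))
```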

File action


Required. The action to perform if the file already exists. Available options are: Overwrite, Append, Ignore and Error.
  • Overwrite - The Snap attempts to write the file without checking whether it already exists, for better performance, and the "fileAction" field is "overwritten" in the output view data.

  • Append - The Snap appends records in the incoming documents to the existing file.

  • Ignore - If the file already exists, the Snap does not throw an exception and does not overwrite the file, but writes an output document indicating that it has been 'ignored'.

  • Error - If the file already exists, an error is displayed in the Pipeline Run Log.

Default value: Overwrite

Note

Append is supported for ADL and ABFS(S) protocols only.
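The decision logic behind the File action options can be sketched as follows. This Python example is a hedged illustration, not the Snap's code: it assumes that "created" is reported when the file did not exist under Ignore or Error, which the text above does not state explicitly, and it leaves Append out because its output value is not documented here. The function name `resolve_file_action` is hypothetical.

```python
def resolve_file_action(action: str, file_exists: bool) -> str:
    """Return the "fileAction" value a write would report (illustrative only)."""
    action = action.lower()
    if action == "overwrite":
        # Existence is not checked, so the result is always "overwritten".
        return "overwritten"
    if action == "ignore":
        # Assumption: a missing file is written and reported as "created".
        return "ignored" if file_exists else "created"
    if action == "error":
        if file_exists:
            raise FileExistsError("file already exists")
        return "created"  # assumption, see the lead-in
    if action == "append":
        raise NotImplementedError("Append output is not described in this sketch")
    raise ValueError(f"unknown file action: {action!r}")


for action in ("Overwrite", "Ignore"):
    for exists in (True, False):
        print(action, exists, resolve_file_action(action, exists))
```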


User Impersonation


File permissions for various users


File permission sets to be assigned to the file.

User type

It should be 'owner' or 'group' or 'others'. Each row can have only one user type and each user type should appear only once. Please select one from the suggested list.

Example:  owner, group, others

File permissions

It can be any combination of {read, write, execute} separated by '+' character. Please select one from the suggested list.

Example:  read, write, execute, read+write, read+write+execute
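The User type and File permissions rows map naturally onto a POSIX-style octal mode. The Python sketch below shows one way to compute that mode from the rows; the mapping to an octal value is an illustration added here, not something the Snap exposes, and the function name `to_mode` is hypothetical.

```python
# Standard POSIX permission bits and the position of each user type.
BITS = {"read": 4, "write": 2, "execute": 1}
SHIFT = {"owner": 6, "group": 3, "others": 0}


def to_mode(rows):
    """Convert (user type, "perm+perm") rows into an octal-style integer mode."""
    mode = 0
    seen = set()
    for user_type, permissions in rows:
        if user_type not in SHIFT:
            raise ValueError(f"unknown user type: {user_type!r}")
        if user_type in seen:
            raise ValueError(f"user type listed more than once: {user_type!r}")
        seen.add(user_type)
        digit = 0
        for name in permissions.split("+"):
            name = name.strip()
            if name not in BITS:
                raise ValueError(f"unknown permission: {name!r}")
            digit |= BITS[name]
        mode |= digit << SHIFT[user_type]
    return mode


rows = [
    ("owner", "read+write+execute"),
    ("group", "read+execute"),
    ("others", "read"),
]
print(oct(to_mode(rows)))
```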


Output for each file written

Enables you to produce a separate output document for each file that is written. If the Snap receives multiple binary inputs and the File expression property is dynamically evaluated to a filename by using the Content-Location field from the input metadata, each binary input can be written to a different target file.

By default, the Snap produces only one output document with a filename that corresponds to the last file that was written.

Default value: Not selected


Troubleshooting
