...
...
...
...
...
...
...
...
...
...
...
...
Snap type: Write
...
Overview
This Snap reads a binary data stream from its input view and writes a file in HDFS (Hadoop Distributed File System). It also helps pick a file by suggesting a list of directories and files. For the HDFS protocol, use a SnapLogic on-premises Groundplex, and ensure that its instance is within the Hadoop cluster and that SSH authentication has already been established. The Snap also supports writing to a Kerberized cluster through the HDFS protocol. This Snap supports the HDFS, ABFS (Azure Data Lake Storage Gen 2), and WASB (Azure storage) protocols. HDFS 2.4.0 is supported for the HDFS protocol. It also supports writing to HDFS Encryption zones.
...
Hadoop allows you to configure proxy users to access HDFS on behalf of other users; this is called impersonation. When user impersonation is enabled on the Hadoop cluster, any jobs submitted using a proxy are executed with the impersonated user's existing privilege levels rather than those of the superuser associated with the cluster. For more information on user impersonation in this Snap, see the section on User Impersonation below.
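User impersonation itself is standard Hadoop behavior (proxy users). The sketch below shows, in plain Java against the public Hadoop API, what impersonation amounts to on the cluster side; it is an illustration of the concept, not the Snap's internal code, and the user name "etl_user" and the file path are hypothetical. The cluster must also whitelist the proxying superuser through the hadoop.proxyuser.<superuser>.hosts and hadoop.proxyuser.<superuser>.groups properties in core-site.xml.

```java
// Illustration of Hadoop proxy-user (impersonation) semantics using the
// public org.apache.hadoop.security API. Hypothetical names throughout.
import java.security.PrivilegedExceptionAction;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.security.UserGroupInformation;

public class ImpersonationSketch {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();

        // The superuser associated with the cluster (for example, the
        // Kerberos principal the Groundplex logs in as).
        UserGroupInformation superUser = UserGroupInformation.getLoginUser();

        // Work submitted through a proxy runs with the impersonated user's
        // existing privilege levels, not the superuser's.
        UserGroupInformation proxy =
                UserGroupInformation.createProxyUser("etl_user", superUser);
        proxy.doAs((PrivilegedExceptionAction<Void>) () -> {
            FileSystem fs = FileSystem.get(conf);
            fs.create(new Path("/data/out/sample.csv")).close(); // owned by etl_user
            return null;
        });
    }
}
```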
...
...
...
Snap Type
The HDFS Writer Snap is a Write-type Snap.
Prerequisites
None.
Support for Ultra Pipelines
Works in Ultra Pipelines.
Limitations
- Append (File action) is supported for the ADL protocol only.
- With File permissions for various users: file names containing the special characters '+', '?', '/', or ':' are not supported.
- Without File permissions for various users: file names containing the special characters ':' or '/' are not supported.
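If you build file names dynamically (for example, from an upstream expression), it can help to screen them against these limitations before running the Pipeline. The following is a small, hypothetical Java guard; it is not part of the Snap, and the character sets simply mirror the two lists above.

```java
// Hypothetical pre-check mirroring the documented file-name limitations.
import java.util.Set;

public class FileNameCheck {
    // Unsupported characters when "File permissions for various users" is set.
    private static final Set<Character> WITH_PERMS = Set.of('+', '?', '/', ':');
    // Unsupported characters otherwise.
    private static final Set<Character> WITHOUT_PERMS = Set.of(':', '/');

    static boolean isSupported(String fileName, boolean permsConfigured) {
        Set<Character> banned = permsConfigured ? WITH_PERMS : WITHOUT_PERMS;
        return fileName.chars().noneMatch(c -> banned.contains((char) c));
    }

    public static void main(String[] args) {
        System.out.println(isSupported("daily.csv", true));  // true
        System.out.println(isSupported("a+b.csv", true));    // false: '+' unsupported
        System.out.println(isSupported("a+b.csv", false));   // true: only ':' and '/' banned
    }
}
```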
...
Account
This Snap uses account references created on the Accounts page of SnapLogic Manager to handle access to this endpoint. This Snap supports an Azure Storage account, an Azure Data Lake account, a Kerberos account, or no account. It works with the following accounts:
...
Known Issues
The upgrade of the Azure Storage library from v3.0.0 to v8.3.0 has caused the following issue when using the WASB protocol:
When you use invalid credentials for the WASB protocol in Hadoop Snaps (HDFS Reader, HDFS Writer, ORC Reader, Parquet Reader, Parquet Writer), the Pipeline does not fail immediately; instead, it takes 13-14 minutes to display the following error:
reason=The request failed with error code null and HTTP code 0. , status_code=error
SnapLogic® is actively working with Microsoft® Support to resolve the issue.
Learn more about Azure Storage library upgrade.
Snap Views
...
Type | Format | Number of Views | Examples of Upstream and Downstream Snaps | Description |
---|---|---|---|---|
Input | Binary | ... | ... | Binary input data. |
Output | Document | ... | ... | The following is an example of the output document map data: |
...
The value of the "fileAction" field can be "overwritten", "created", or "ignored". The value "ignored" indicates that the Snap did not overwrite the existing file because the value of the File action property is "IGNORE". |
Error | Document | ... | ... | This Snap has at most one document error view and produces zero or more documents in the view. Error handling is a generic way to handle errors without losing data or failing the Snap execution. You can handle the errors that the Snap might encounter while running the Pipeline by choosing one of the options from the When errors occur list under the Views tab. Learn more about Error handling in Pipelines. |
Snap Settings
Field | Field Types | Description |
---|---|---|
Label* | String | Specify a unique name for the Snap. |
Directory Default value: hdfs://<hostname>:<port>/ | String/Expression/Suggestion | Specify the URL for the HDFS directory. It should start with the hdfs or webhdfs file protocol, in the following format: hdfs://<hostname>:<port>/. The Directory property is not used in Pipeline execution or preview; it is used only in the Suggest operation. When you click the Suggestion icon, the Snap displays a list of subdirectories under the given directory. It generates the list by applying the value of the Filter property. |
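For illustration, directory URLs for the supported protocols typically follow the standard Hadoop-style URI forms below. The host names, containers, and account names are placeholders, and the exact form accepted in your environment may differ:

```
hdfs://<hostname>:<port>/<path>/
webhdfs://<hostname>:<port>/<path>/
wasb://<container>@<account>.blob.core.windows.net/<path>/
abfs://<filesystem>@<account>.dfs.core.windows.net/<path>/
```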
...
File filter Default Value: * | String/Expression | Specify the Glob filter pattern. Use glob patterns to display a list of directories or files when you click the Suggest icon in the Directory or File property. A complete glob pattern is formed by combining the value of the Directory property with the Filter property. If the value of the Directory property does not end with "/", the Snap appends one, so that the value of the Filter property is applied to the directory specified by the Directory property. |
Glob Pattern Interpretation Rules
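As an illustration of these rules, the sketch below uses Java's built-in glob support (java.nio.file.PathMatcher), which follows the same general interpretation: '*' matches within one path segment, '?' matches a single character, and '[...]' matches a character range. This is not the Snap's own matcher, and the file names are hypothetical.

```java
// Glob interpretation demo using the JDK's PathMatcher.
import java.nio.file.FileSystems;
import java.nio.file.PathMatcher;
import java.nio.file.Paths;

public class GlobDemo {
    public static void main(String[] args) {
        // '*' matches any run of characters within a single path segment.
        PathMatcher csvOnly = FileSystems.getDefault().getPathMatcher("glob:*.csv");
        System.out.println(csvOnly.matches(Paths.get("sample.csv")));     // true
        System.out.println(csvOnly.matches(Paths.get("tmp/sample.csv"))); // false: '*' does not cross '/'

        // '?' matches exactly one character; '[0-9]' matches one digit.
        PathMatcher monthly = FileSystems.getDefault()
                .getPathMatcher("glob:report-200[0-9].??.csv");
        System.out.println(monthly.matches(Paths.get("report-2009.10.csv"))); // true
    }
}
```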
File Default value: [None] | String/Expression/Suggestion | Specify the filename or a relative path to a file under the directory given in the Directory property. It should not start with a URL separator "/". The File property can be a JavaScript expression, which is evaluated with values from the input view document. When you click the Suggest icon, the Snap displays a list of regular files under the directory in the Directory property. It generates the list by applying the value of the Filter property. Examples: sample.csv, tmp/another.csv, $filename |
Flush interval (kB) Default Value: -1 | Integer/Expression | Specify the flush interval in kilobytes to flush a specified size of data during the file upload. The Snap can flush the output stream each time a given size of data is written to the target file server. If the flush interval is 0, the Snap flushes at maximum frequency, after each byte block is written. The larger the flush interval, the less frequent the flushes. This field is useful if the file upload experiences intermittent failures; however, more frequent flushes result in slower file upload. The default value of -1 indicates no flush during the upload. A minimal sketch of this flushing behavior appears after this table. |
Number Of Retries Default Value: 0 | Integer/Expression | Specify the maximum number of attempts to be made to receive a response. |
Retry Interval (seconds) Default Value: 1 | Integer/Expression | Specify the time interval between two successive retry requests. A retry happens only when the previous attempt resulted in an exception. |
File action* Default value: Overwrite | Dropdown list | Select an action to perform if the specified file already exists:
...
The Append operation is supported for the FILE, SFTP, FTP, FTPS, and ADL protocols only. For any other protocols that are not supported by Append, we recommend that you use the File Operations, File Writer, and File Delete Snaps and follow this procedure. Note: This approach might involve disk overhead; therefore, ensure that you have enough disk space in your system. |
File permissions for various users | Use this field set to select the user and the desired file permission sets to be assigned to the file. See Limitations with File Permissions for Various Users above. | |
User type Default Value: N/A | String/Expression/Suggestion | Specify the user type: 'owner', 'group', or 'others'. Each row can have only one user type, and each user type should appear only once. Select one from the suggested list. |
File permissions Default Value: N/A | String/Expression/Suggestion | Specify the file permissions: any combination of read, write, and execute, separated by the '+' character (for example: read, write, execute, read+write, read+write+execute). Select one from the suggested list (illustrated in the sketch below). |
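To make the owner/group/others model concrete, the sketch below uses Hadoop's public FsPermission and FsAction types to show how three '+'-combined permission sets collapse into one POSIX-style mode. This illustrates the model only; it is not the Snap's implementation.

```java
// How owner/group/others permission sets combine, using Hadoop's types.
import org.apache.hadoop.fs.permission.FsAction;
import org.apache.hadoop.fs.permission.FsPermission;

public class PermissionSketch {
    public static void main(String[] args) {
        FsPermission perm = new FsPermission(
                FsAction.READ_WRITE,  // User type: owner, File permissions: read+write
                FsAction.READ,        // User type: group, File permissions: read
                FsAction.READ);       // User type: others, File permissions: read
        System.out.println(perm);           // rw-r--r--
        System.out.println(perm.toShort()); // 420 (octal 0644)
    }
}
```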
User Impersonation | Checkbox | Select this checkbox to enable user impersonation. Hadoop allows you to configure proxy users to access HDFS on behalf of other users; this is called impersonation. When user impersonation is enabled on the Hadoop cluster, any jobs submitted using a proxy are executed with the impersonated user's existing privilege levels rather than those of the superuser associated with the cluster. For more information on user impersonation in this Snap, see the section on User Impersonation below. |
Output for each file written Default value: Not selected | Checkbox | Select this checkbox to produce a separate output document for each file that is written, for example, when the Snap receives multiple binary inputs and the File expression property dynamically evaluates to a different filename for each. By default, the Snap produces only one output document, with a filename that corresponds to the last file that was written. |
Write empty file Default Value: Deselected | Checkbox | Select this checkbox to write an empty file to any of the supported protocols when the binary input document has no data. |
Snap Execution | Dropdown list | ... |
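The Flush interval setting can be pictured as a counting wrapper around the upload stream: it flushes each time a configured number of kilobytes has been written, with 0 meaning "flush after every block" and -1 meaning "never flush mid-upload". The following is a minimal, hypothetical Java sketch of that idea; it is not the Snap's internal implementation.

```java
// Counting wrapper that flushes every `intervalKb` kilobytes written.
import java.io.FilterOutputStream;
import java.io.IOException;
import java.io.OutputStream;

public class FlushIntervalStream extends FilterOutputStream {
    private final long intervalBytes; // -1 = never flush, 0 = flush every write
    private long sinceFlush = 0;

    public FlushIntervalStream(OutputStream out, long intervalKb) {
        super(out);
        this.intervalBytes = intervalKb < 0 ? -1 : intervalKb * 1024;
    }

    @Override
    public void write(int b) throws IOException {
        out.write(b);
        maybeFlush(1);
    }

    @Override
    public void write(byte[] b, int off, int len) throws IOException {
        out.write(b, off, len);
        maybeFlush(len);
    }

    private void maybeFlush(long written) throws IOException {
        if (intervalBytes < 0) return;       // -1: no flush during the upload
        sinceFlush += written;
        if (intervalBytes == 0 || sinceFlush >= intervalBytes) {
            out.flush();                     // 0: flush after each byte block
            sinceFlush = 0;
        }
    }
}
```

Larger intervals flush less often (faster upload, more data at risk on an intermittent failure); smaller intervals flush more often (slower upload, less data at risk).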
Troubleshooting
...
Examples
Writing to an Azure Storage Layer
Snap Pack History
...
...