Snap type: | Read | |||||||||||||||||||||||||||||
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
Description: | This Snap reads data from HDFS (Hadoop Distributed File System) and produces a binary data stream at the output. For the hdfs protocol, use a SnapLogic on-premises Groundplex and make sure that its instance is within the Hadoop cluster and that SSH authentication has already been established. The Snap also supports the webhdfs protocol, which does not require a Groundplex and works for all versions of Hadoop. The Snap also supports reading from a Kerberized cluster using the HDFS protocol.
Hadoop allows you to configure proxy users to access HDFS on behalf of other users; this is called impersonation. When user impersonation is enabled on the Hadoop cluster, any jobs submitted using a proxy are executed with the impersonated user's existing privilege levels rather than those of the superuser associated with the cluster. For more information on user impersonation in this Snap, see the section on User Impersonation below. | |||||||||||||||||||||||||||||
Prerequisites: | [None] | |||||||||||||||||||||||||||||
Limitations and Known Issues: |
| |||||||||||||||||||||||||||||
Account: | This Snap uses account references created on the Accounts page of SnapLogic Manager to handle access to this endpoint. This Snap supports Azure storage account, Azure Data Lake account, Kerberos account, or no account. Account types supported by each protocol are as follows:
Required settings for account types are as follows:
IAM Roles for Amazon EC2: in global.properties, set jcc.jvm_options = -DIAM_CREDENTIAL_FOR_S3=TRUE. Please note that this feature is supported only in Groundplex nodes hosted in the EC2 environment. For more information on IAM Roles, see http://docs.aws.amazon.com/AWSEC2/latest/UserGuide/iam-roles-for-amazon-ec2.html
Kerberos Account UI Configuration
| |||||||||||||||||||||||||||||
Views: |
| |||||||||||||||||||||||||||||
Settings | ||||||||||||||||||||||||||||||
Label | Required. Specify the name for the Snap. You can modify this to be more specific, especially if you have more than one of the same Snap in your pipeline. | |||||||||||||||||||||||||||||
Directory | Specify the URL for the data source (directory). The Snap supports the following protocols.
When you use the ABFS protocol to connect to an endpoint, the account name and endpoint details provided in the URL override the corresponding values in the Account Settings fields. The Directory property is not used during pipeline execution or preview; it is used only in the Suggest operation. When you press the Suggest icon, it displays a list of subdirectories under the given directory. It generates the list by applying the value of the Filter property. Examples:
Default value: hdfs://<hostname>:<port>/
| |||||||||||||||||||||||||||||
File Filter | Specify the glob filter pattern. The filter is applied to the directory specified by the Directory property. |
The following rules are used to interpret glob patterns:
- The * character matches zero or more characters of a name component without crossing directory boundaries. For example, the *.csv pattern matches a path that represents a file name ending in .csv, and *.* matches all file names that contain a period.
- The ** characters match zero or more characters across directories; therefore, they match all files or directories in the current directory and in its subdirectories. For example, /home/** matches all files and directories in the /home/ directory.
- The ? character matches exactly one character of a name component. For example, 'foo.?' matches file names that start with 'foo.' and are followed by a single-character extension.
- The \ character is used to escape characters that would otherwise be interpreted as special characters. For example, the expression \\ matches a single backslash, and \{ matches a left brace.
- The ! character is used to exclude matching files from the output.
- The [ ] characters form a bracket expression that matches a single character of a name component out of a set of characters. For example, '[abc]' matches 'a', 'b', or 'c'. The hyphen (-) may be used to specify a range, so '[a-z]' specifies a range that matches from 'a' to 'z' (inclusive). These forms can be mixed, so '[abce-g]' matches 'a', 'b', 'c', 'e', 'f' or 'g'. If the character after the [ is a ! then it is used for negation, so '[!a-c]' matches any character except 'a', 'b', or 'c'. Within a bracket expression, the '*', '?', and '\' characters match themselves. The '-' character matches itself if it is the first character within the brackets, or the first character after the !, if negating.
- The '{ }' characters are a group of sub-patterns where the group returns a match if any sub-pattern in the group matches the contents of a target directory. The ',' character is used to separate sub-patterns. Groups cannot be nested. For example, the pattern '*.{csv,json}' matches file names ending with '.csv' or '.json'.
- Leading dot characters in a file name are treated as regular characters in match operations. For example, the '*' glob pattern matches file name ".login".
- All other characters match themselves.
Examples:
- '*.csv' matches all files with a csv extension in the current directory only.
- '**.csv' matches all files with a csv extension in the current directory and in all its subdirectories.
- *[!{.pdf,.tmp}] excludes all files with the extension PDF or TMP.
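The glob rules above can be made concrete with a small translator from glob syntax to a regular expression. This is a minimal sketch in Python, not the Snap's implementation: the function names `glob_to_regex` and `glob_match` are hypothetical, paths are assumed to use "/" as the separator, and the standalone ! exclusion rule is not modeled.

```python
import re

def glob_to_regex(pattern: str) -> str:
    """Translate a glob pattern into a regex string (illustrative sketch only)."""
    out = []
    in_group = False
    i, n = 0, len(pattern)
    while i < n:
        c = pattern[i]
        if c == "\\":
            # Backslash escapes the next character, which is matched literally.
            i += 1
            if i >= n:
                raise ValueError("dangling backslash")
            out.append(re.escape(pattern[i]))
        elif c == "*":
            if i + 1 < n and pattern[i + 1] == "*":
                out.append(".*")       # ** crosses directory boundaries
                i += 1
            else:
                out.append("[^/]*")    # * stays within one name component
        elif c == "?":
            out.append("[^/]")         # exactly one character of a name component
        elif c == "[":
            j = pattern.find("]", i + 1)
            if j == -1:
                raise ValueError("missing ']'")
            body = pattern[i + 1:j]
            negate = body.startswith("!")
            if negate:
                body = body[1:]
            # Inside brackets '\' matches itself; also neutralise regex-only syntax.
            body = body.replace("\\", "\\\\").replace("^", "\\^").replace("[", "\\[")
            # The lookahead keeps a bracket expression from matching a separator.
            out.append("(?!/)[" + ("^" if negate else "") + body + "]")
            i = j
        elif c == "{":
            if in_group:
                raise ValueError("groups cannot be nested")
            in_group = True
            out.append("(?:")
        elif c == "}" and in_group:
            in_group = False
            out.append(")")
        elif c == "," and in_group:
            out.append("|")            # ',' separates sub-patterns in a group
        else:
            out.append(re.escape(c))   # all other characters match themselves
        i += 1
    if in_group:
        raise ValueError("missing '}'")
    return "".join(out)

def glob_match(pattern: str, path: str) -> bool:
    """Return True if the whole path matches the glob pattern."""
    return re.fullmatch(glob_to_regex(pattern), path) is not None
```

For instance, `glob_match("*.csv", "tmp/data.csv")` is false because * does not cross a directory boundary, while `glob_match("**.csv", "tmp/data.csv")` is true. Edge cases such as a `]` placed first inside a bracket expression are deliberately left out of this sketch.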
File
The name of the file to be read. This can also be a relative path under the directory given in the Directory property. It should not start with a URL separator "/".
The File property can be a JavaScript expression, which is evaluated with values from the input view document. When you press the Suggest icon, it displays a list of regular files under the directory specified in the Directory property. It generates the list by applying the value of the Filter property.
If this property is left blank when the Snap is executed, the * wildcard is used and all files under the directory that match the glob filter are read.
Example:
- sample.csv
- tmp/another.csv
- $filename
- _filename
Default value: [None]
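As an illustration of how a relative File value combines with the Directory URL, here is a minimal Python sketch. The function name `resolve_file_url` and the host and port in the usage note are hypothetical, and the Snap's actual resolution logic may differ; the sketch only encodes the rules stated above (no leading "/", blank means the * wildcard).

```python
def resolve_file_url(directory: str, file: str = "") -> str:
    """Join a Directory URL and a relative File value (illustrative sketch only)."""
    if file.startswith("/"):
        # The File value is a relative path and must not start with "/".
        raise ValueError("File must not start with the URL separator '/'")
    # Assumption: a blank File value behaves like the * wildcard.
    name = file or "*"
    return directory.rstrip("/") + "/" + name
```

With a hypothetical directory of `hdfs://namenode.example.com:8020/data/`, the File value `tmp/another.csv` resolves to `hdfs://namenode.example.com:8020/data/tmp/another.csv`.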
Select this check box to enable user impersonation.
Default value: Not selected
For more information on working with user impersonation, click the link below.
Specify the maximum number of attempts to be made to receive a response.
Default value: 0
Specify the time interval between two successive retry requests. A retry happens only when the previous attempt resulted in an exception.
Default value: 1
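The interaction of the two retry settings can be sketched as a simple loop. This is a minimal Python illustration under stated assumptions, not the Snap's code: the names `read_with_retries`, `max_retries`, and `retry_interval` are hypothetical, the count is treated as retries after the first attempt (so 0 means a single attempt), and the interval is assumed to be in seconds.

```python
import time

def read_with_retries(read, max_retries=0, retry_interval=1, sleep=time.sleep):
    """Call read(); on an exception, wait and retry up to max_retries times."""
    retries_done = 0
    while True:
        try:
            return read()
        except Exception:
            # A retry happens only when the previous attempt raised an exception.
            if retries_done >= max_retries:
                raise
            retries_done += 1
            sleep(retry_interval)  # pause between two successive attempts
```

The `sleep` parameter exists only so the waiting behavior can be replaced in tests; with the defaults shown, a failing read is not retried and the exception propagates immediately.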
Select one of the following three modes in which the Snap executes:
- Validate & Execute: Performs limited execution of the Snap and generates a data preview during Pipeline validation; then performs full execution of the Snap (unlimited records) during Pipeline runtime.
- Execute only: Performs full execution of the Snap during Pipeline execution without generating preview data.
- Disabled: Disables the Snap and all Snaps downstream from it.
Troubleshooting