HDFS Delete
Overview
You can use this Snap to delete the specified file, group of files, or directory at the supplied path, using the supplied protocol, in the Hadoop Distributed File System (HDFS), Azure Blob File System (ABFS), Windows Azure Storage Blob (WASB), or Azure Data Lake (ADL).
Snap Type
The Hadoop Distributed File System (HDFS) Delete Snap is a write-type Snap.
Prerequisites
None.
Support for Ultra Pipelines
Supports Ultra Pipelines.
Limitations and Known Issues
None.
Snap Views
Type | Format | Number of Views | Examples of Upstream and Downstream Snaps | Description |
---|---|---|---|---|
Input | Document | | | The file filter, file, and directory details of the file to be deleted. |
Output | Document | | | The deleted file or group of files. |
Error | | | | Error handling is a generic way to handle errors without losing data or failing the Snap execution. You can handle the errors that the Snap might encounter when running the Pipeline by choosing one of the following options from the When errors occur list under the Views tab: Stop Pipeline Execution, Discard Error Data and Continue, or Route Error Data to Error View. Learn more about Error handling in Pipelines. |
Snap Settings
Asterisk ( * ): Indicates a mandatory field.
Suggestion icon: Indicates a list that is dynamically populated based on the configuration.
Expression icon: Indicates the value is an expression (if enabled) or a static value (if disabled). Learn more about Using Expressions in SnapLogic.
Add icon: Indicates that you can add fields in the fieldset.
Remove icon: Indicates that you can remove fields from the fieldset.
Upload icon: Indicates that you can upload files.
Field Name | Field Type | Description |
---|---|---|
Label*
Default Value: HDFS delete
| String | The name for the Snap. You can modify this to be more specific, especially if you have more than one of the same Snap in your pipeline. |
Directory
Default Value: hdfs://<hostname>:<port>/
| String/Expression/Suggestion | Specify the URL of the HDFS directory. It must start with the file protocol, in the following format: hdfs://<hostname>:<port>/
The Directory property is used only in the Suggest operation. When you click the Suggestion icon, the Snap displays a list of subdirectories under the specified directory. It generates the list by applying the value specified in the File filter property. |
File filter
Default Value: *
Example: ?
| String/Expression | Specify the glob filter pattern. A file filter is a criterion for including or excluding specific files when processing data in HDFS. Use glob patterns to display a list of directories or files when you click the Suggestion icon in the Directory or File property. A complete glob pattern is formed by combining the value of the Directory property with the value of the File filter property. If the value of the Directory property does not end with "/", the Snap appends one so that the File filter pattern is applied to the directory specified by the Directory property.
The following rules are used to interpret glob patterns:
The * character matches zero or more characters of a name component without crossing directory boundaries. For example, the *.csv pattern matches a path representing a file name ending in .csv, and *.* matches all file names containing a period.
The ** characters match zero or more characters across directories, so they match all files and directories in the current directory and its subdirectories. For example, /home/** matches all files and directories in the /home/ directory and its subdirectories.
The ? character matches exactly one character of a name component. For example, 'foo.?' matches file names that start with 'foo.' followed by a single-character extension.
The \ character escapes characters that would otherwise be interpreted as special characters. For example, the expression \\ matches a single backslash, and \{ matches a left brace.
The ! character is used to exclude matching files from the output.
The [ ] characters form a bracket expression that matches a single character of a name component out of a set of characters. For example, '[abc]' matches 'a', 'b', or 'c'. The hyphen (-) specifies a range, so '[a-z]' matches any character from 'a' to 'z' (inclusive). These forms can be mixed, so '[abce-g]' matches 'a', 'b', 'c', 'e', 'f', or 'g'. If the character after the [ is a !, it negates the expression, so '[!a-c]' matches any character except 'a', 'b', or 'c'. The '*', '?', and '\' characters match themselves within a bracket expression. The '-' character matches itself if it is the first character within the brackets, or the first character after the ! when negating.
The { } characters enclose a group of sub-patterns; the group matches if any sub-pattern in it matches. The ',' character separates sub-patterns. Groups cannot be nested. For example, the pattern '*.{csv,json}' matches file names ending with '.csv' or '.json'.
Leading dot characters in a file name are treated as regular characters in match operations. For example, the '*' glob pattern matches the file name ".login".
All other characters match themselves.
Examples:
'*.csv' matches all files with a CSV extension in the current directory only.
'**.csv' matches all files with a CSV extension in the current directory and all its subdirectories.
*[!{.pdf,.tmp}] excludes all files with the extension PDF or TMP. |
File
Default Value: N/A
| String/Expression/Suggestion | Specify the file name or a relative path to a file under the directory specified in the Directory property. It should not start with a URL separator "/". The value of the File property depends on the name of the directory specified in the Directory property and the criterion specified in the File filter property. |
User Impersonation
Default Value: Deselected
| Checkbox | Select this checkbox to enable user impersonation. Hadoop allows you to configure proxy users to access HDFS on behalf of other users; this is called impersonation. When user impersonation is enabled on the Hadoop cluster, any jobs submitted using a proxy are executed with the impersonated user's existing privilege levels rather than those of the superuser associated with the cluster. For more information on user impersonation in this Snap, refer to the section on User Impersonation below. |
Delete Directory
Default Value: Deselected
| Checkbox/Expression | Select this checkbox to delete all the paths in the specified directory. |
Number Of Retries
Default Value: 0
Example: 12
| Integer/Expression | Specify the maximum number of attempts to be made to receive a response. |
Retry Interval (seconds)
Default Value: 1
Example: 30
| Integer/Expression | Specify the time interval, in seconds, between two successive retry requests. A retry happens only when the previous attempt resulted in an exception. |
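Snap Execution
Default Value: Execute Only
| Dropdown list | Select one of the following three modes in which the Snap executes:
Validate & Execute: Performs limited execution of the Snap and generates a data preview during Pipeline validation. Subsequently, performs full execution of the Snap (unlimited records) during Pipeline runtime.
Execute only: Performs full execution of the Snap during Pipeline execution without generating preview data.
Disabled: Disables the Snap and all Snaps that are downstream from it. |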
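The glob rules listed for the File filter property can be illustrated with a short Python sketch. This is not how the Snap is implemented: `compose_pattern`, `glob_to_regex`, and `glob_matches` are hypothetical helpers that translate a glob into a regular expression following the rules above (a single `*` stays within a name component, `**` crosses directories, `[!...]` negates, and `{a,b}` alternates). The `!` exclusion rule is not modeled, and the host `nn:8020` used below is a placeholder.

```python
import re

# Hypothetical sketch of the documented File filter glob rules; this is not
# the Snap's implementation, and the "!" exclusion rule is not modeled.


def compose_pattern(directory: str, file_filter: str) -> str:
    """Join Directory and File filter, appending "/" to the directory if needed."""
    if not directory.endswith("/"):
        directory += "/"
    return directory + file_filter


def glob_to_regex(pattern: str) -> str:
    """Translate a glob pattern into a regex for use with re.fullmatch."""
    out = []
    i, n = 0, len(pattern)
    in_group = False
    while i < n:
        c = pattern[i]
        if c == "\\":
            if i + 1 >= n:
                raise ValueError("dangling escape at end of pattern")
            out.append(re.escape(pattern[i + 1]))  # escaped character is literal
            i += 2
            continue
        if c == "*":
            if pattern.startswith("**", i):
                out.append(".*")  # ** crosses directory boundaries
                i += 2
                continue
            out.append("[^/]*")  # * stays within one name component
        elif c == "?":
            out.append("[^/]")  # exactly one character, never "/"
        elif c == "[":
            start = i + 1
            negate = start < n and pattern[start] == "!"
            if negate:
                start += 1
            end = pattern.find("]", start)
            if end == -1 or end == start:
                raise ValueError("empty or unclosed bracket expression")
            # Inside brackets, *, ? and \ are literal; "-" keeps its range
            # meaning and is literal when it comes first, as in regex classes.
            body = "".join(ch if ch == "-" else re.escape(ch) for ch in pattern[start:end])
            out.append("(?!/)[" + ("^" if negate else "") + body + "]")
            i = end + 1
            continue
        elif c == "{":
            if in_group:
                raise ValueError("groups cannot be nested")
            in_group = True
            out.append("(?:")
        elif c == "}" and in_group:
            in_group = False
            out.append(")")
        elif c == "," and in_group:
            out.append("|")  # separates sub-patterns in a group
        else:
            out.append(re.escape(c))  # all other characters match themselves
        i += 1
    if in_group:
        raise ValueError("unclosed group")
    return "".join(out)


def glob_matches(pattern: str, path: str) -> bool:
    """Return True if the whole path matches the glob pattern."""
    return re.fullmatch(glob_to_regex(pattern), path) is not None
```

For instance, `glob_matches(compose_pattern("hdfs://nn:8020/data", "*.csv"), "hdfs://nn:8020/data/a.csv")` is true, while the same pattern does not match `hdfs://nn:8020/data/sub/a.csv`, because a single `*` does not cross the `/` boundary.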
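The Number Of Retries and Retry Interval (seconds) properties describe a retry loop. The Python sketch below assumes that Number Of Retries counts retries after the first attempt (so the default of 0 means a single attempt) and that only an exception triggers a retry, as the Retry Interval description states. `call_with_retries` is a hypothetical helper, not part of SnapLogic.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


# Hypothetical helper; assumes retries are counted after the first attempt.
def call_with_retries(
    action: Callable[[], T],
    number_of_retries: int = 0,
    retry_interval_seconds: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Run action once, then retry it up to number_of_retries more times."""
    retries_done = 0
    while True:
        try:
            return action()
        except Exception:
            # Only a failed attempt (an exception) triggers a retry.
            if retries_done >= number_of_retries:
                raise  # retries exhausted: surface the last error
            retries_done += 1
            sleep(retry_interval_seconds)  # wait between successive attempts
```

Passing `sleep` as a parameter keeps the sketch testable without real waiting; with `number_of_retries=12` and `retry_interval_seconds=30` (the example values above), a persistently failing action would be attempted 13 times under this assumption.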
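For comparison, a delete on HDFS can also be expressed through the Hadoop WebHDFS REST API, where `op=DELETE` takes a `recursive` flag and, on clusters using simple authentication, a proxy user can act for another user through the `user.name` and `doas` parameters. Treating Delete Directory as `recursive` and User Impersonation as `doas` is an assumption for illustration only; the Snap does not necessarily use WebHDFS. The hypothetical `webhdfs_delete_url` helper below only builds the URL and sends nothing; the host `nn.example.com:9870` is a placeholder.

```python
from typing import Optional
from urllib.parse import quote, urlencode


# Hypothetical helper: builds a WebHDFS DELETE URL without sending a request.
def webhdfs_delete_url(
    namenode: str,
    path: str,
    recursive: bool = False,
    user_name: Optional[str] = None,
    doas: Optional[str] = None,
) -> str:
    """Return the WebHDFS URL for deleting path on the given NameNode."""
    if not path.startswith("/"):
        raise ValueError("WebHDFS paths must be absolute")
    params = {"op": "DELETE", "recursive": "true" if recursive else "false"}
    if user_name:
        params["user.name"] = user_name  # authenticated (proxy) user
    if doas:
        params["doas"] = doas  # user being impersonated
    # quote() percent-encodes special characters but keeps "/" separators.
    return f"http://{namenode}/webhdfs/v1{quote(path)}?{urlencode(params)}"
```

Sending an HTTP DELETE to such a URL makes WebHDFS answer with a JSON body such as {"boolean": true}.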
Troubleshooting
Error | Reason | Resolution |
---|---|---|
Remote filesystem access failed. | The user credentials or URL might be incorrect, or the remote server may be inaccessible. This error can also indicate a problem with the communication between the nodes in your Hadoop cluster or an issue with the underlying HDFS. | Check the user credentials and URL, and retry. Check the permissions and access rights of the Hadoop files and directories, and ensure that you have the required permissions to access and modify the data. |
A directory is not a valid string. | The expression or value specified in the Directory property does not exist in HDFS or is not accessible. | Check that a valid expression is entered in the Directory property and that the correct document data is available in the input view. |
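One way to catch a malformed Directory value early is to check it before it reaches the Snap, for example in an upstream step. The Python sketch below is hypothetical: `check_directory` only verifies that the value is a non-empty string with a host and a URI scheme from an assumed list (the Hadoop connector schemes hdfs, abfs/abfss, wasb/wasbs, and adl). It cannot confirm that the directory exists in HDFS or that you have access to it.

```python
from urllib.parse import urlparse

# Assumed set of URI schemes for the file systems named in this article.
SUPPORTED_SCHEMES = {"hdfs", "abfs", "abfss", "wasb", "wasbs", "adl"}


# Hypothetical pre-check; it does not contact the cluster.
def check_directory(value: object) -> str:
    """Return the trimmed directory URL or raise ValueError with a reason."""
    if not isinstance(value, str) or not value.strip():
        raise ValueError("Directory must be a non-empty string")
    url = value.strip()
    parsed = urlparse(url)
    if parsed.scheme.lower() not in SUPPORTED_SCHEMES:
        raise ValueError(f"unsupported protocol: {parsed.scheme or '(none)'}")
    if not parsed.netloc:
        # Matches the hdfs://<hostname>:<port>/ form, which names a host.
        raise ValueError("Directory URL must include a host")
    return url
```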
Deleting multiple JSON files from Azure Data Lake Storage
In this example, multiple JSON files whose names contain special characters are created and uploaded to Azure Data Lake Storage.
Configure the HDFS Writer Snap with the required details, such as the destination directory in Azure Data Lake Storage where the files should be added. The output preview shows that the files are written to Azure Data Lake Storage.
[Figure: HDFS Writer Snap configuration and output preview]
You can then delete the same files from Azure Data Lake Storage with the HDFS Delete Snap.
[Figure: HDFS Delete Snap configuration and output preview]
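When file names contain special characters, as in this example, any glob metacharacters in a name would need to be escaped with \ before the name is used inside a File filter pattern, following the escaping rule described for that property. A minimal sketch, assuming the metacharacter set below (taken conservatively from the File filter rules); `escape_glob` is a hypothetical helper:

```python
# Hypothetical helper: escapes characters that the File filter glob rules treat
# as special, so that a literal file name can be embedded in a pattern.
GLOB_SPECIAL = frozenset("*?[]{}\\!,")


def escape_glob(name: str) -> str:
    """Prefix each glob metacharacter in name with a backslash."""
    return "".join("\\" + ch if ch in GLOB_SPECIAL else ch for ch in name)


print(escape_glob("report[1].json"))  # report\[1\].json
```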