HDFS Delete

Overview

You can use this Snap to delete the specified file, group of files, or directory from the supplied path and protocol in the Hadoop Distributed File System (HDFS), Azure Blob File System (ABFS), Windows Azure Storage Blob (WASB), or Azure Data Lake (ADL).

Snap Type

The Hadoop Distributed File System (HDFS) Delete Snap is a write-type Snap.

Prerequisites

None.

Support for Ultra Pipelines

Supports Ultra Pipelines. 

Limitations and Known Issues

None.

Snap Views

Input

  • Format: Document

  • Number of views: Min: 0, Max: 1

  • Examples of upstream Snaps: HDFS Reader, HDFS Writer

  • Description: The file filter, file, and directory details of the file to be deleted.

Output

  • Format: Document

  • Number of views: Min: 1, Max: 1

  • Examples of downstream Snaps: ORC Writer, Snowflake Insert

  • Description: The deleted file or group of files.

Error

Error handling is a generic way to handle errors without losing data or failing the Snap execution. You can handle the errors that the Snap might encounter when running the Pipeline by choosing one of the following options from the When errors occur list under the Views tab:

  • Stop Pipeline Execution: Stops the current Pipeline execution if the Snap encounters an error.

  • Discard Error Data and Continue: Ignores the error, discards that record, and continues with the remaining records.

  • Route Error Data to Error View: Routes the error data to an error view without stopping the Snap execution.

Learn more about Error handling in Pipelines.

Snap Settings

  • Asterisk (*): Indicates a mandatory field.

  • Suggestion icon: Indicates a list that is dynamically populated based on the configuration.

  • Expression icon: Indicates the value is an expression (if enabled) or a static value (if disabled). Learn more about Using Expressions in SnapLogic.

  • Add icon: Indicates that you can add fields in the fieldset.

  • Remove icon: Indicates that you can remove fields from the fieldset.

  • Upload icon: Indicates that you can upload files.

Field Name

Field Type

Description

Label*

Default Value: HDFS delete
Example: Hadoop delete

String

The name for the Snap. You can modify this to be more specific, especially if you have more than one of the same Snap in your pipeline.

Directory

Default Value: hdfs://<hostname>:<port>/
Example:

  • hdfs://ec2-54-198-212-134.compute-1.amazonaws.com:8020/user/john/input/

String/Expression/Suggestion

Specify the URL for the HDFS directory. It should start with the appropriate file protocol in one of the following formats:

  • hdfs://<hostname>:<port>/<path to directory>/

  • wasb:///<container name>/<path to directory>/

  • wasbs:///<container name>/<path to directory>/

  • adl://<container name>/<path to directory>/

  • abfs(s):///filesystem/<path>/

  • abfs(s)://filesystem@accountname.endpoint/<path>

The Directory property is used only in the Suggest operation. When you click the Suggestion icon, the Snap displays a list of subdirectories under the specified directory. It generates the list by applying the value specified in the File Filter property.
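For illustration, the following is a minimal sketch of how such a glob-filtered listing can be produced with the Hadoop client API. The hostname, port, and path are hypothetical placeholders, and this demonstrates the general glob mechanism, not the Snap's internal implementation:

    import java.net.URI;
    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FileStatus;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;

    public class ListByGlob {
        public static void main(String[] args) throws Exception {
            // Hypothetical directory and filter values; replace with your own cluster details.
            String directory = "hdfs://namenode.example.com:8020/user/john/input/";
            String fileFilter = "*";

            FileSystem fs = FileSystem.get(URI.create(directory), new Configuration());

            // Combine the directory and the filter into a single glob pattern,
            // appending "/" to the directory if it is missing.
            String pattern = directory.endsWith("/")
                    ? directory + fileFilter
                    : directory + "/" + fileFilter;

            FileStatus[] matches = fs.globStatus(new Path(pattern));
            if (matches != null) {
                for (FileStatus status : matches) {
                    System.out.println(status.getPath());
                }
            }
        }
    }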

File filter

Default Value: *

Example: ?

String/Expression

Specify the glob filter pattern. A file filter is a criterion for including or excluding specific files when processing data in HDFS.

Use glob patterns to display a list of directories or files when you click the Suggest icon in the Directory or File property. A complete glob pattern is formed by combining the value of the Directory property with the Filter property. If the value of the Directory property does not end with "/", the Snap appends one so that the value of the Filter property is applied to the directory specified by the Directory property.

The following rules are used to interpret glob patterns:

The * character matches zero or more characters of a name component without crossing directory boundaries. For example, the *.csv pattern matches a path representing a file name ending in .csv, and *.* matches all file names containing a period.

The ** characters match zero or more characters across directories; therefore, they match all files or directories in the current directory and its subdirectories. For example, /home/** matches all files and directories in the /home/ directory.

The ? character matches exactly one character of a name component. For example, 'foo.?' matches file names that start with 'foo.' and are followed by a single-character extension.

The \ character is used to escape characters that would otherwise be interpreted as special characters. For example, the expression \\ matches a single backslash, and \{ matches a left brace.

The ! character is used to exclude matching files from the output. 

The [ ] characters form a bracket expression that matches a single character of a name component out of a set of characters. For example, '[abc]' matches 'a', 'b', or 'c'. The hyphen (-) may be used to specify a range, so '[a-z]' specifies a range that matches from 'a' to 'z' (inclusive). These forms can be mixed, so '[abce-g]' matches 'a', 'b', 'c', 'e', 'f' or 'g'. If the character after the [ is a !, it is used for negation, so '[!a-c]' matches any character except 'a', 'b', or 'c'.

Within a bracket expression, the '*', '?', and '\' characters match themselves. The '-' character matches itself if it is the first character within the brackets, or the first character after the '!' when negating.

The '{ }' characters form a group of sub-patterns, where the group matches if any sub-pattern in the group matches the contents of a target directory. The ',' character is used to separate sub-patterns. Groups cannot be nested. For example, the pattern '*.{csv,json}' matches file names ending with '.csv' or '.json'.

Leading dot characters in a file name are treated as regular characters in match operations. For example, the '*' glob pattern matches the file name ".login".

All other characters match themselves.

Examples:

'*.csv' matches all files with a .csv extension in the current directory only.

'**.csv' matches all files with a .csv extension in the current directory and all its subdirectories.

'*[!{.pdf,.tmp}]' excludes all files with the extension .pdf or .tmp.
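These rules follow the common Java glob syntax, so you can experiment with them locally using java.nio's PathMatcher. A small sketch (the file names here are made up for illustration, and the Snap's matching may differ in details):

    import java.nio.file.FileSystems;
    import java.nio.file.Path;
    import java.nio.file.PathMatcher;

    public class GlobRulesDemo {
        public static void main(String[] args) {
            // '{ }' group: matches names ending in .csv or .json.
            PathMatcher csvOrJson = FileSystems.getDefault().getPathMatcher("glob:*.{csv,json}");
            // '?': matches exactly one character after "foo.".
            PathMatcher singleChar = FileSystems.getDefault().getPathMatcher("glob:foo.?");

            System.out.println(csvOrJson.matches(Path.of("report.csv")));  // true
            System.out.println(csvOrJson.matches(Path.of("report.xml")));  // false
            System.out.println(singleChar.matches(Path.of("foo.c")));      // true
            System.out.println(singleChar.matches(Path.of("foo.csv")));    // false
        }
    }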

File

Default Value: N/A
Example:

  • sample.csv

  • tmp/another.csv

  • $filename

String/Expression/Suggestion

Specify the file name or a relative path to a file under the directory specified in the Directory property. It should not start with a URL separator "/".  The value of the File property depends on the name of the directory specified in the Directory property and the criterion specified in the File filter property.

User Impersonation

Default Value: Deselected

Checkbox

Select this checkbox to enable user impersonation.

Hadoop allows you to configure proxy users to access HDFS on behalf of other users; this is called impersonation. When user impersonation is enabled on the Hadoop cluster, any jobs submitted using a proxy are executed with the impersonated user's existing privilege levels rather than those of the superuser associated with the cluster. For more information on user impersonation in this Snap, refer to the section on User Impersonation below.
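For context, this is roughly what impersonation looks like at the Hadoop API level. A minimal sketch, assuming the login user is configured as a proxy user on the cluster and that "alice" is a hypothetical user to impersonate; it is not the Snap's actual code:

    import java.security.PrivilegedExceptionAction;
    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.security.UserGroupInformation;

    public class ImpersonatedDelete {
        public static void main(String[] args) throws Exception {
            Configuration conf = new Configuration();

            // "alice" is a hypothetical impersonated user. The login (super)user must be
            // allowed to proxy for her via the cluster's hadoop.proxyuser.* settings.
            UserGroupInformation proxyUser =
                    UserGroupInformation.createProxyUser("alice", UserGroupInformation.getLoginUser());

            Boolean deleted = proxyUser.doAs((PrivilegedExceptionAction<Boolean>) () -> {
                FileSystem fs = FileSystem.get(conf);
                // The delete runs with alice's privileges, not the superuser's.
                return fs.delete(new Path("/user/alice/tmp/sample.csv"), false);
            });
            System.out.println("Deleted: " + deleted);
        }
    }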

Delete Directory

Default Value: Deselected

Checkbox/Expression

Select this checkbox to delete all files and subdirectories under the specified directory.
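In Hadoop's FileSystem API, this behavior is analogous to the recursive flag on delete. The sketch below uses a hypothetical path and is an analogy for the setting, not the Snap's implementation:

    import java.net.URI;
    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;

    public class RecursiveDelete {
        public static void main(String[] args) throws Exception {
            String dir = "hdfs://namenode.example.com:8020/user/john/staging"; // hypothetical path
            FileSystem fs = FileSystem.get(URI.create(dir), new Configuration());

            // recursive=true removes the directory and everything under it,
            // comparable to selecting the Delete Directory checkbox.
            // With recursive=false, deleting a non-empty directory fails instead.
            boolean deleted = fs.delete(new Path(dir), true);
            System.out.println("Deleted: " + deleted);
        }
    }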

Number Of Retries

Default Value: 0

Example: 12

Integer/Expression

Specify the maximum number of attempts to be made to receive a response. 

Retry Interval (seconds)

Default Value: 1

Example: 30

Integer/Expression

Specify the time interval between two successive retry requests. A retry happens only when the previous attempt resulted in an exception.
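Together, these two settings describe a simple retry loop: retry only when the previous attempt threw an exception, and wait the configured interval between attempts. A minimal sketch of that behavior (a hypothetical helper, not the Snap's code):

    import java.util.concurrent.Callable;

    public class RetrySketch {
        static <T> T withRetries(Callable<T> action, int maxRetries, long intervalSeconds) throws Exception {
            for (int attempt = 0; ; attempt++) {
                try {
                    return action.call();                  // success: no retry needed
                } catch (Exception e) {
                    if (attempt >= maxRetries) {
                        throw e;                           // retries exhausted; surface the last failure
                    }
                    Thread.sleep(intervalSeconds * 1000L); // Retry Interval (seconds)
                }
            }
        }
    }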

Snap Execution

Default Value: Execute Only
Example: Validate & Execute

Dropdown list

Select one of the following three modes in which the Snap executes:

  • Validate & Execute: Performs limited execution of the Snap and generates a data preview during pipeline validation. Subsequently, it performs full execution of the Snap (unlimited records) during pipeline runtime.

  • Execute only: Performs complete execution of the Snap during pipeline execution without generating preview data.

  • Disabled: Disables the Snap and all Snaps that are downstream from it.

Troubleshooting

Error: Remote filesystem access failed.

Reason: The user credentials or URL might be incorrect, or the remote server might be inaccessible. This indicates a problem with the communication between the nodes in your Hadoop cluster or an issue with the underlying HDFS.

Resolution: Check the user credentials and URL, and retry. Check the permissions and access rights of the Hadoop files and directories, and ensure that you have the required permissions to access and modify the data.

Error: A directory is not a valid string.

Reason: The expression or value specified in the Directory property does not exist in HDFS or is not accessible.

Resolution: Check that a valid expression is entered in the Directory property and that the correct document data is available at the input view.

Deleting multiple JSON files from Azure Data Lake Storage

In this scenario, multiple JSON files with file names containing special characters are created for upload to Azure Data Lake Storage.

Configure the HDFS Writer Snap with the required details, such as the destination directory in Azure Data Lake Storage where the files should be written. The output preview shows that the files are written to Azure Data Lake Storage.

Snap configuration

Output preview

You can then delete the same files from Azure Data Lake Storage with the HDFS Delete Snap.
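A configuration along these lines (the container, account, and path are hypothetical) would delete every JSON file under the target directory:

    Directory:        abfss://mycontainer@myaccount.dfs.core.windows.net/output/
    File filter:      *.json
    Delete Directory: Deselected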

Snap configuration

Output preview

Downloads

HDFS_MultiFile_Delete(abfs).slp (modified Aug 08, 2023 by Kalpana Malladi)
