Hadoop Directory Browser

This page is no longer maintained (May 13, 2026). For the most current information, go to https://docs.snaplogic.com/snaps/sp-hadoop/snap-hadoop-directory-browser.html

Snap type

Read

Description

This Snap browses a given directory path in the Hadoop file system (using the HDFS protocol) and generates a list of all the files in the directory and subdirectories. Use this Snap to identify the contents of a directory before you run any command that uses this information.

Currently, the Hadoop Directory Browser Snap supports URIs that use the HDFS and ABFS (Azure Data Lake Storage Gen2) protocols.

For example, if you need to iteratively run a specific command on a list of files, this Snap can help you view the list of all available files.

For each entry it lists, the Snap reports the following attributes:

  • Path (string): The path to the directory being browsed.

  • Type (string): The type of file.

  • Owner (string): The name of the owner of the file.

  • Creation date (datetime): The date the file was created. In the Hadoop file system, this value is often null because of limited API functionality.

  • Size (in bytes) (int): The size of the file.

  • Permissions (string): The permissions set on the file (read, write, execute).

  • Update date (datetime): The date the file was last updated.

  • Name (string): Name of the file.
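
As an illustration, one listed entry with the attributes above could be modeled as follows. This is only a minimal sketch: the class name `FileEntry`, the snake_case field names, the helper `total_size`, and all sample values are hypothetical and do not represent the Snap's actual output schema.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import List, Optional


@dataclass
class FileEntry:
    """Hypothetical model of one listed entry; names are illustrative only."""
    path: str
    type: str
    owner: str
    creation_date: Optional[datetime]  # may be null on the Hadoop file system
    size: int                          # size in bytes
    permissions: str
    update_date: Optional[datetime]
    name: str


def total_size(entries: List[FileEntry]) -> int:
    """Add up the sizes (in bytes) of all listed entries."""
    return sum(entry.size for entry in entries)


# Sample values below are invented for the example.
entries = [
    FileEntry("/user/data", "FILE", "alice", None, 1024,
              "rw-r--r--", datetime(2024, 1, 15, 9, 30), "core-site.conf"),
    FileEntry("/user/data", "FILE", "alice", None, 2048,
              "rw-r--r--", datetime(2024, 2, 1, 12, 0), "hdfs-site.conf"),
]

print(total_size(entries))
```

A downstream step that runs a command per file could iterate over such entries and skip any whose creation date is null.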

Input and Output

  • Expected upstream Snaps: Any Snap that provides a directory URI. This can even be a CSV Generator with a collection of, for example, file names and their URIs.

  • Expected downstream Snaps: Any Snap that accepts a document listing the attributes of the files contained in the specified directory.

  • Expected input: Directory Path to be browsed and the File Filter Pattern to be applied. For example: Directory Path: hdfs://hadoopcluster.domain.com:8020/<user>/<folder_details>; File Filter: *.conf

  • Expected output: The attributes of the files in the specified directory that match the filter pattern.

Prerequisites

A Groundplex must be configured as a Hadoop client for this integration to work. The user executing the Snap must have at least Read permission on the directory being browsed.

Support and limitations

Works in Ultra Tasks.

Account

This Snap uses account references created on the Accounts page of the SnapLogic Manager to handle access to this endpoint. 

This Snap supports Azure Data Lake Gen2 OAuth2 and Kerberos accounts.

Views

Input

This Snap has at most one document input view. It contains values for the directory path to be browsed and the glob filter to be applied to select the contents.

Output

This Snap has exactly one output view that provides the various attributes (such as Name, Type, Size, Owner, Last Modification Time) of the contents of the given directory path. Only those contents are selected that match the given glob filter.

Error

This Snap has at most one document error view and produces zero or more documents in the view.

Settings

Label

Required. The name for the Snap. You can modify this to be more specific, especially if you have more than one of the same Snap in your pipeline.

Directory 

The URL for the data source (directory). The Snap supports both HDFS and ABFS(S) protocols.

Syntax for a typical HDFS URL:

Syntax for a typical ABFS and an ABFSS URL:

When you use the ABFS protocol to connect to an endpoint, the account name and endpoint details provided in the URL override the corresponding values in the Account Settings fields.

Default value: [None]
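
To make the parts of a directory URL concrete, the sketch below splits a URL with Python's standard `urllib.parse`. It is not how the Snap parses the Directory field. The function name `describe_directory_url`, the sample paths, and the names `mycontainer` and `myaccount` are hypothetical, and the ABFSS sample assumes the general Azure form `abfss://<file_system>@<account>.dfs.core.windows.net/<path>`.

```python
from urllib.parse import urlparse


def describe_directory_url(url):
    """Split a directory URL into protocol, host, port, and path."""
    parts = urlparse(url)
    if parts.scheme not in ("hdfs", "abfs", "abfss"):
        raise ValueError(f"unsupported protocol: {parts.scheme!r}")
    return {
        "protocol": parts.scheme,
        "host": parts.hostname,
        "port": parts.port,
        # Assumption: in an ABFS URL, the file system name precedes the "@".
        "file_system": parts.username,
        "path": parts.path,
    }


# Hypothetical sample URLs.
hdfs_info = describe_directory_url(
    "hdfs://hadoopcluster.domain.com:8020/user/data")
abfs_info = describe_directory_url(
    "abfss://mycontainer@myaccount.dfs.core.windows.net/data")

print(hdfs_info)
print(abfs_info)
```

In the ABFSS sample, the host carries the account name and endpoint, which are the parts the description above says take precedence over the Account Settings fields.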

File filter

Required. The glob pattern to be applied to select the contents (files/sub-folders) of the directory. You cannot recursively navigate the directory structures.

The File filter property can be a JavaScript expression, which is evaluated with the values from the input view document.

Examples:

  • *.txt

  • ab????xx.*x

  • *.[jJ][sS][oO][nN] (as of the May 29th, 2015 release)

Default value: [None]
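
To see how the example patterns above select names, the sketch below applies them with Python's standard `fnmatch` module. This only approximates glob matching: the Snap's own matcher may differ in details, and the helper `filter_names` and the sample file names are hypothetical.

```python
from fnmatch import fnmatchcase


def filter_names(names, pattern):
    """Return the names that match a glob pattern (case-sensitive)."""
    return [name for name in names if fnmatchcase(name, pattern)]


# Hypothetical directory contents.
names = [
    "notes.txt",
    "ab1234xx.docx",
    "abxx.x",
    "data.JSON",
    "data.json",
    "core-site.conf",
]

print(filter_names(names, "*.txt"))               # any name ending in .txt
print(filter_names(names, "ab????xx.*x"))         # each ? is exactly one character
print(filter_names(names, "*.[jJ][sS][oO][nN]"))  # .json in any letter case
```

The bracket form in the last pattern is a common way to get case-insensitive extension matching from a case-sensitive glob.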

User Impersonation

Select this check box to enable user impersonation.

Ignore empty result

If selected, no document will be written to the output view when the result is empty. If this property is not selected and the Snap receives an input document, the input document is passed to the output view. If this property is not selected and there is no input document, an empty document is written to the output view.

Default value: Selected
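
The three cases described for Ignore empty result can be summarized in a short sketch. This models only the behavior stated above and is not the Snap's implementation. The function name `output_documents` and its parameters are hypothetical, and documents are represented as plain dictionaries.

```python
def output_documents(results, input_doc=None, ignore_empty=True):
    """Return the documents written to the output view for one browse."""
    if results:
        # Matching entries were found: they are the output.
        return list(results)
    if ignore_empty:
        # Empty result and the option is selected: write nothing.
        return []
    if input_doc is not None:
        # Option not selected, input document present: pass it through.
        return [input_doc]
    # Option not selected, no input document: write an empty document.
    return [{}]


# Hypothetical input document.
doc = {"directory": "/user/data"}

print(output_documents([], input_doc=doc, ignore_empty=True))
print(output_documents([], input_doc=doc, ignore_empty=False))
print(output_documents([], input_doc=None, ignore_empty=False))
```

With the option cleared, downstream Snaps always receive at least one document, which matters when a later step must run even if the directory has no matches.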

Snap Execution

Select one of the following three modes in which the Snap executes:

  • Validate & Execute: Performs limited execution of the Snap, and generates a data preview during Pipeline validation. Subsequently, performs full execution of the Snap (unlimited records) during Pipeline runtime.

  • Execute only: Performs full execution of the Snap during Pipeline execution without generating preview data.

  • Disabled: Disables the Snap and all Snaps that are downstream from it.

Default value: Execute only




Example: Validate & Execute