File Poller

Overview

You can use this Snap to poll the target directory and find file names matching the specified pattern.

  • The Snap continues polling at the intervals specified in the Polling interval property until the timeout (specified in the Polling timeout property) is reached. After polling is done, the Snap lists all files whose names match the specified pattern.

  • This Snap can be used in situations where an operation must be triggered when a specific file is found in the target directory. The pipeline can be configured with additional Snaps to process the Snap's output and delete the matched file before the Polling interval value is reached.

  • The File Poller Snap applies the filter pattern case-sensitively, regardless of the operating system.

  • This Snap polls only the target directory and ignores any subdirectories. Use the Directory Browser Snap if you want to list files in the directory and all of its subdirectories, and to check a directory only once.
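The behavior described above can be approximated outside the platform with a short Python sketch. This is only an illustration for a local directory, not the Snap's implementation: the function names `list_matches` and `poll`, and the choice to stop when the next poll would start after the deadline, are assumptions made for this example.

```python
import fnmatch
import os
import time


def list_matches(directory, pattern):
    """Return sorted paths of files directly inside `directory` whose names match `pattern`."""
    matches = []
    with os.scandir(directory) as entries:
        for entry in entries:
            # Only files in the target directory itself; subdirectories are skipped.
            # fnmatchcase compares case-sensitively on every operating system.
            if entry.is_file() and fnmatch.fnmatchcase(entry.name, pattern):
                matches.append(entry.path)
    return sorted(matches)


def poll(directory, pattern, interval_s, timeout_s):
    """Yield the list of matching files once per poll until the timeout is reached."""
    deadline = time.monotonic() + timeout_s
    while True:
        yield list_matches(directory, pattern)
        # Assumption: stop when the next poll would start after the deadline.
        if time.monotonic() + interval_s > deadline:
            break
        time.sleep(interval_s)
```

A caller could iterate over `poll("/tmp/inbox", "*.csv", 30, 30)` (a hypothetical directory) and trigger downstream work whenever a yielded list is not empty.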

Snap Type

The File Poller Snap is a Read-type Snap.

Prerequisites

Support for Ultra Pipelines

Works in Ultra Pipelines.

Limitations

For S3 folders, the Snap currently supports polling a target directory that contains a maximum of 10,000 files. If the directory contains more than 10,000 files, the Snap does not provide any output.

Known Issues

The Snap is expected to fail if there is no account selected. However, the Snap may execute successfully without any account if all the following conditions exist:

  • The Snap is executed in an EC2-instance Snaplex where your pipeline runs with an IAM role.

  • The S3 bucket accessed by the Snap includes the necessary permissions for use with the specific IAM role.

  • The following global property is set as a node property in the Snaplex:
    jcc.jvm_options = -DIAM_CREDENTIAL_FOR_S3=TRUE

Behavior Change

The File Poller Snap now honors the value specified in the Polling timeout field instead of polling indefinitely when a file polling operation is slow or unresponsive. To guard against polling operations that never return, the polling is done in a separate thread. When the execution time exceeds the value specified in the Polling timeout field, the Snap writes a timeout exception to the log so that polling does not get stuck. Whether the Snap then continues polling depends on the Polling timeout value:

  • If the Polling timeout value is greater than 0, the Snap polls until the end of the polling window.

  • If it is less than 0, the Snap stops polling.

  • If it is -1, the Snap continues polling.
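The idea of running a polling operation in a separate thread and logging a timeout instead of hanging can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the Snap's code: the helper name `run_with_timeout`, the logger name, and the `(completed, result)` return shape are all invented for this example.

```python
import concurrent.futures
import logging

log = logging.getLogger("file_poller_sketch")  # hypothetical logger name


def run_with_timeout(operation, timeout_s):
    """Run `operation` in a worker thread and wait at most `timeout_s` seconds.

    Returns (True, result) if the operation finished in time,
    or (False, None) after logging the timeout.
    """
    executor = concurrent.futures.ThreadPoolExecutor(max_workers=1)
    future = executor.submit(operation)
    try:
        return True, future.result(timeout=timeout_s)
    except concurrent.futures.TimeoutError:
        # The caller is not blocked any longer; the slow operation keeps
        # running in its worker thread until it returns on its own.
        log.warning("Polling operation exceeded %.2f s and timed out", timeout_s)
        return False, None
    finally:
        # Do not wait for a still-running worker when leaving this function.
        executor.shutdown(wait=False)
```

Python threads cannot be forcibly stopped, so this pattern only frees the caller. It does not cancel the underlying operation.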

Supported Protocols

Account types supported by each protocol are as follows:

  • sldb: no account

  • s3: AWS S3

  • ftp: Basic Auth

  • sftp: Basic Auth, SSH Auth

  • ftps: Basic Auth

  • hdfs: no account

  • smb: SMB

  • wasb: Azure Storage

  • wasbs: Azure Storage

  • gs: Google Storage

  • file: Local file system

The FTPS file protocol works only in explicit mode. Implicit mode is not supported.

Required settings for account types are as follows:

  • Basic Auth: Username, Password

  • AWS S3: Access-key ID, Secret key

  • SSH Auth: Username, Private key, Key Passphrase

  • SMB: Domain, Username, Password

  • Azure Storage: Account name, Primary access key

  • Google Storage: Approval prompt, Application scope, Auto-refresh token. (Read-only properties are Access token, Refresh token, Access token expiration, OAuth2 Endpoint, OAuth2 token, and Access type.)
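As a quick way to answer "which settings do I need for this protocol?", the two lists above can be combined into a small lookup. The sketch below only restates the documented values as Python dictionaries. The names `ACCOUNT_TYPES_BY_PROTOCOL`, `SETTINGS_BY_ACCOUNT_TYPE`, and `required_settings` are hypothetical, and account types that have no documented settings (such as "Local file system") map to an empty list.

```python
# Account types per protocol, copied from the Supported Protocols list.
ACCOUNT_TYPES_BY_PROTOCOL = {
    "sldb": [],  # no account
    "s3": ["AWS S3"],
    "ftp": ["Basic Auth"],
    "sftp": ["Basic Auth", "SSH Auth"],
    "ftps": ["Basic Auth"],
    "hdfs": [],  # no account
    "smb": ["SMB"],
    "wasb": ["Azure Storage"],
    "wasbs": ["Azure Storage"],
    "gs": ["Google Storage"],
    "file": ["Local file system"],
}

# Required settings per account type, copied from the settings list.
SETTINGS_BY_ACCOUNT_TYPE = {
    "Basic Auth": ["Username", "Password"],
    "AWS S3": ["Access-key ID", "Secret key"],
    "SSH Auth": ["Username", "Private key", "Key Passphrase"],
    "SMB": ["Domain", "Username", "Password"],
    "Azure Storage": ["Account name", "Primary access key"],
    "Google Storage": ["Approval prompt", "Application scope", "Auto-refresh token"],
}


def required_settings(protocol):
    """Map each account type usable with `protocol` to its required settings."""
    if protocol not in ACCOUNT_TYPES_BY_PROTOCOL:
        raise ValueError(f"unsupported protocol: {protocol!r}")
    return {
        account_type: SETTINGS_BY_ACCOUNT_TYPE.get(account_type, [])
        for account_type in ACCOUNT_TYPES_BY_PROTOCOL[protocol]
    }
```

For example, `required_settings("sftp")` lists the settings for both Basic Auth and SSH Auth, while `required_settings("hdfs")` returns an empty dictionary because no account is needed.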

Snap Views

Input

  • Format: Document

  • Number of views: Min: 1, Max: 1

  • Examples of upstream Snaps: Mapper, JSON Generator

  • Description: An optional document used to evaluate expressions in the Directory and/or File filter properties. Each input document triggers an execution of the Snap.

Output

  • Format: Document

  • Number of views: Min: 1, Max: 1

  • Examples of downstream Snaps: Mapper, File Reader, JSON Formatter

  • Description: Each output document contains the full path of a matched file as the value of the "path" key. If multiple files match the filter, the Snap writes one document per matched file to the output view after each interval. For example:

[ { "path" : "sftp://sftp.smart.com/home/voo/test1.csv" }, { "path" : "sftp://sftp.smart.com/home/voo/test2.csv" } ]
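To illustrate how a consumer might use these output documents, the following Python sketch parses the sample above and extracts the file name from each "path" value. It is an illustration only: in a pipeline this work would normally be done by downstream Snaps, and the helper name `file_names` is hypothetical.

```python
import json
import posixpath
from urllib.parse import urlsplit

# The sample output shown above, as a JSON string.
sample_output = (
    '[ { "path" : "sftp://sftp.smart.com/home/voo/test1.csv" }, '
    '{ "path" : "sftp://sftp.smart.com/home/voo/test2.csv" } ]'
)


def file_names(documents):
    """Return the last path segment of each document's "path" URL."""
    return [posixpath.basename(urlsplit(doc["path"]).path) for doc in documents]


documents = json.loads(sample_output)
print(file_names(documents))  # ['test1.csv', 'test2.csv']
```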

Error

Error handling is a generic way to handle errors without losing data or failing the Snap execution. You can handle the errors that the Snap might encounter when running the pipeline by choosing one of the following options from the When errors occur list under the Views tab:

  • Stop Pipeline Execution: Stops the current pipeline execution if the Snap encounters an error.

  • Discard Error Data and Continue: Ignores the error, discards that record, and continues with the remaining records.

  • Route Error Data to Error View: Routes the error data to an error view without stopping the Snap execution.

Learn more about Error handling in Pipelines.

Snap Settings

 

Field Name

Field Type

Description

Label*


Default Value: File Poller
Example: File Poller

String

Specify the name for the Snap. You can modify this to be more specific, especially if you have more than one of the same Snap in your pipeline.

Directory 

 

Default Value: N/A
Example:

  • s3:///<S3_bucket_name>@s3.<region_name>.amazonaws.com/<path>

    For region names and their details, see AWS Regions and Endpoints.

  • sftp://ftp.snaplogic.com:22/home/test/dir

  • ftp://ftp.snaplogic.com/test/csv

  • $directory 

  • _directory (Define a pipeline parameter with the key "directory". Ensure that the '=' button is enabled when using parameters.)

  • file:///D:/testFolder/  (if the Snap is executed in a Windows Groundplex and needs to access the D: drive)

  • wasb:///Snaplogic/testDir/ or wasbs:///Snaplogic/testDir/

  • gs:///testBucket/testDir/ 

 

 

String/Expression

Specify the URL of the directory to search for files, in the following format:

  [protocol]://[host][:port]/[path]

The supported file protocols are:

  • s3:

  • file:

  • ftp:

  • ftps:

  • sftp: 

  • hdfs:

  • sldb: 

  • smb:

  • wasb:

  • wasbs:

  • gs:
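The [protocol]://[host][:port]/[path] format can be checked with Python's standard URL parser before a value is entered in the Directory field. The sketch below is an assumption-labeled illustration, not how the Snap parses the field: `SUPPORTED_PROTOCOLS` restates the list above and `parse_directory` is a hypothetical helper. Expressions such as $directory or _directory are resolved by the platform first and are not handled here.

```python
from urllib.parse import urlsplit

# The protocols listed above.
SUPPORTED_PROTOCOLS = {
    "s3", "file", "ftp", "ftps", "sftp", "hdfs", "sldb", "smb", "wasb", "wasbs", "gs",
}


def parse_directory(url):
    """Split a directory URL into protocol, host, port, and path."""
    parts = urlsplit(url)
    if parts.scheme not in SUPPORTED_PROTOCOLS:
        raise ValueError(f"unsupported protocol: {parts.scheme!r}")
    return {
        "protocol": parts.scheme,
        "host": parts.hostname,  # None when the URL has no host part
        "port": parts.port,      # None when no port is given
        "path": parts.path,
    }
```

For example, `parse_directory("sftp://ftp.snaplogic.com:22/home/test/dir")` yields the protocol `sftp`, the host `ftp.snaplogic.com`, the port `22`, and the path `/home/test/dir`.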

File filter*


Default Value: N/A
Example:

  • *.txt

  • ab????xx.csv

String/Expression

Specify a glob pattern that selects one or more files in the directory. The File filter property can be a JavaScript expression, which is evaluated with values from the input view document.
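To get a feel for what the example patterns select, they can be tried with Python's `fnmatch.fnmatchcase`, which performs case-sensitive glob matching. This is only an approximation for experimenting: the Snap's glob implementation may differ from Python's in details, and the file names below are made up for illustration.

```python
import fnmatch

# (file name, pattern) pairs; the file names are hypothetical.
checks = [
    ("report.txt", "*.txt"),           # * matches any run of characters
    ("report.TXT", "*.txt"),           # matching is case-sensitive
    ("ab1234xx.csv", "ab????xx.csv"),  # each ? matches exactly one character
    ("ab123xx.csv", "ab????xx.csv"),   # only three characters where four are required
]

results = [fnmatch.fnmatchcase(name, pattern) for name, pattern in checks]
print(results)  # [True, False, True, False]
```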

Polling interval in seconds*


Default Value: 30
Example: 10

Integer

Specify the time, in seconds, between consecutive poll requests.

 

Polling timeout*

 

Default value: 30
Example: 20

Integer

Specify a period of time after which file polling must end. If the Polling timeout is set to:

  • Greater than