...

 

Snap type:

Read

 

Description:

This Snap polls a directory looking for files matching the specified pattern.

  • Expected upstream Snaps: Any Snap with a document output view, such as Mapper, JSON Generator.
  • Expected downstream Snaps: Any Snap with a document input view, such as File Reader, Mapper, JSON Formatter.
  • Expected input: An optional document to evaluate expressions in the Directory and/or File filter properties. Please note that each input document will trigger the execution of the Snap.
  • Expected output: Each output document contains the full path of a matching file as the value of the "path" key. If multiple files match the filter, one document per file is written to the output view after each interval.


Example output:

[
    {
        "path": "sftp://sftp.smart.com/home/voo/test1.csv"
    },
    {
        "path": "sftp://sftp.smart.com/home/voo/test2.csv"
    }
]
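As a rough illustration of how a consumer might read this output, the following Python sketch extracts the file name from each "path" value. The helper name file_names is hypothetical and not part of the Snap; it only assumes the output shape shown above.

```python
import json
from posixpath import basename
from urllib.parse import urlsplit

def file_names(output_json):
    """Return the file name from the "path" value of each output document.

    Hypothetical helper for illustration; assumes a JSON array of
    documents, each with a "path" key holding a full URL.
    """
    docs = json.loads(output_json)
    return [basename(urlsplit(doc["path"]).path) for doc in docs]

sample = """
[
    {"path": "sftp://sftp.smart.com/home/voo/test1.csv"},
    {"path": "sftp://sftp.smart.com/home/voo/test2.csv"}
]
"""
print(file_names(sample))
```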
 
Prerequisites:

IAM Roles for Amazon EC2

The 'IAM_CREDENTIAL_FOR_S3' feature lets an EC2 Groundplex access S3 files without an Access-key ID and Secret key in the Snap's AWS S3 account. The IAM credential stored in the EC2 metadata is used to gain access rights to the S3 buckets. To enable this feature, add the following line to global.properties and restart the JCC (node):
jcc.jvm_options = -DIAM_CREDENTIAL_FOR_S3=TRUE

Please note this feature is supported in the EC2-type Groundplex only.

For more information on IAM Roles, see http://docs.aws.amazon.com/AWSEC2/latest/UserGuide/iam-roles-for-amazon-ec2.html

 

Support and limitations:
  • Ultra pipelines: May work in Ultra Pipelines.
  • Spark mode: Not supported in Spark mode.

 

Account: 

This Snap uses account references created on the Accounts page of SnapLogic Manager to handle access to this endpoint. This Snap supports several account types, as listed in the table below, or no account. See Binary Account for information on setting up these types of accounts. Account types supported by each protocol are as follows:

 

Protocol  Account types
sldb      no account
s3        AWS S3
ftp       Basic Auth
sftp      Basic Auth, SSH Auth
ftps      Basic Auth
hdfs      no account
webhdfs   no account
smb       SMB
wasb      Azure Storage
wasbs     Azure Storage
gs        Google Storage
file      Local file system

Required settings for account types are as follows:

 

Account Type    Settings
Basic Auth      Username, Password
AWS S3          Access-key ID, Secret key
SSH Auth        Username, Private key
SMB             Domain, Username, Password
Azure Storage   Account name, Primary access key
Google Storage  Approval prompt, Application scope, Auto-refresh token
                (Read-only properties are Access token, Refresh token, Access token expiration, OAuth2 Endpoint, OAuth2 token and Access type.)


 

Views:


Input   This Snap has at most one document input view.
Output  This Snap has exactly one document output view.
Error   This Snap has at most one document error view and produces zero or more documents in the view. If the Snap fails during the operation, an error document is sent to the error view containing the fields error, reason, resolution, and stacktrace.

 

Settings

Label


 

Required. The name for the Snap. You can modify this to be more specific, especially if you have more than one of the same Snap in your pipeline.

Directory 

 

This property is a URL path to the directory where files will be searched. The supported file protocols are:

  • s3:
  • file:
  • ftp:
  • ftps:
  • sftp:
  • hdfs:
  • webhdfs:
  • sldb:
  • smb:
  • wasb:
  • wasbs:
  • gs:

The property can be a JavaScript expression which will be evaluated with values from the input view document and the pipeline parameters. The property should have the syntax:

        [protocol]://[host][:port]/[path]

Please note that "://" separates the file protocol from the rest of the URL, and that the host name and the port number must appear between "://" and the following "/". If the port number is omitted, the default port for the protocol is used. The host name and port number are omitted for the sldb and s3 protocols.

 

This property should be an absolute path for all protocols except sldb. For sldb, the Snap can access only the same project directory or the shared project directory, and cannot access other project directories.


You may leave this property blank to indicate the current sldb project to which the pipeline belongs.

 

Example

  • If you want this property to refer to the sldb project (or shared project) directory to which this Snap's pipeline belongs, enter "sldb:///" or leave it blank.
  • If the pipeline is created in a project other than the shared project and you want this property to refer to the shared project, enter "shared" or "sldb:///shared".
  • s3:///[bucket_name]/[dir_path]
  • sftp://ftp.snaplogic.com:22/home/test/dir
  • ftp://ftp.snaplogic.com/test/csv
  • $directory (The value of $directory is obtained from the input document, which must have an entry with the "directory" key. You must press the '=' button.)
  • _directory (A key/value pair with the "directory" key must be defined as a pipeline parameter. You must press the '=' button.)
  • file:///D:/testFolder/  (if the Snap is executed on a Windows Groundplex and needs to access the D: drive)
  • wasb:///Snaplogic/testDir/ or wasbs:///Snaplogic/testDir/ (if the name of the container is 'Snaplogic')
  • gs:///testBucket/testDir/ (if the bucket name is 'testBucket')

Default value:  [None]
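The [protocol]://[host][:port]/[path] syntax can be checked outside the Snap with a short Python sketch such as the one below. The function name split_directory_url and the port table are assumptions made for illustration only: the ports listed are the conventional well-known ones for FTP and SFTP, and the defaults the Snap itself applies are not stated here. Bare values such as "shared" or a blank property are not handled by this sketch.

```python
from urllib.parse import urlsplit

# Conventional well-known ports, listed for illustration only. The defaults
# the Snap applies for each protocol are an assumption, not documented here.
ASSUMED_DEFAULT_PORTS = {"ftp": 21, "sftp": 22}

def split_directory_url(url):
    """Split a Directory value into (protocol, host, port, path).

    Host and port are None when the URL omits them, as in the
    sldb:/// and s3:/// examples.
    """
    parts = urlsplit(url)
    if not parts.scheme or "://" not in url:
        raise ValueError(
            f"expected [protocol]://[host][:port]/[path], got {url!r}"
        )
    port = parts.port
    if port is None:
        port = ASSUMED_DEFAULT_PORTS.get(parts.scheme)
    return parts.scheme, parts.hostname, port, parts.path

for example in (
    "sftp://ftp.snaplogic.com:22/home/test/dir",
    "ftp://ftp.snaplogic.com/test/csv",
    "sldb:///shared",
    "file:///D:/testFolder/",
):
    print(split_directory_url(example))
```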

 

File filter

Required.

A GLOB pattern to be applied to select one or more files in the directory. The File filter property can be a JavaScript expression which will be evaluated with values from the input view document.
Example:

  • *.txt
  • ab????xx.csv

Default value: [None]
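To get a feel for how the example patterns select files, the sketch below uses Python's fnmatch module, where "*" matches any run of characters and "?" matches exactly one character. This only approximates the behavior: the glob implementation the Snap uses may differ in details such as case sensitivity, and select_files is a hypothetical helper name.

```python
from fnmatch import fnmatchcase

def select_files(names, file_filter):
    """Return the file names that match a glob pattern (case-sensitive)."""
    return [name for name in names if fnmatchcase(name, file_filter)]

# Hypothetical directory listing used only for this illustration.
names = ["report.txt", "ab1234xx.csv", "ab12xx.csv", "data.csv"]

print(select_files(names, "*.txt"))
print(select_files(names, "ab????xx.csv"))
```

Each "?" stands for exactly one character, so "ab????xx.csv" selects "ab1234xx.csv" but not "ab12xx.csv".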

 

Polling interval in seconds

Required.

The time interval, in seconds, between searches of the directory.

Example: 10

Default value: 30

 

Polling timeout

Required.

The length of time after which polling ends. Its unit is selected in the next property.

Example:

  • -1 (to poll indefinitely)
  • 0 (to poll once)
  • 60 (in the unit selected in the next property)

Default value: 30

 

Polling-timeout unit

Unit for polling timeout. Allowed values are SECONDS, MINUTES and HOURS.

Example: SECONDS

Default value: MINUTES
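The interaction of the three polling properties can be pictured with the following Python sketch. It is not the Snap's implementation: the function poll, its parameters, and the injected clock and sleep callables are hypothetical, and the choice to stop when the next pass would fall after the timeout is an assumption. It only models the documented meanings: -1 polls indefinitely, 0 polls once, and a positive value is a timeout in the selected unit.

```python
import time

UNIT_SECONDS = {"SECONDS": 1, "MINUTES": 60, "HOURS": 3600}

def poll(list_matches, interval_s=30, timeout=30, unit="MINUTES",
         clock=time.monotonic, sleep=time.sleep):
    """Yield one list of matching paths per polling pass.

    list_matches is any callable that returns the files currently
    matching the filter (hypothetical; supplied by the caller).
    """
    # A negative timeout means there is no deadline (poll indefinitely).
    deadline = None
    if timeout >= 0:
        deadline = clock() + timeout * UNIT_SECONDS[unit]
    while True:
        yield list_matches()
        if timeout == 0:
            return  # 0 means a single pass
        if deadline is not None and clock() + interval_s > deadline:
            return  # assumption: stop if the next pass would be too late
        sleep(interval_s)
```

Passing the clock and sleep functions as parameters keeps the sketch easy to try without waiting in real time.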

 

...