...
...
Support for Ultra Pipelines
Works in Ultra Pipelines.
Limitations
For S3 folders, the Snap currently supports polling a target directory that contains a maximum of 10,000 files. If the directory contains more than 10,000 files, the Snap does not provide any output.
...
Type | Format | Number of Views | Examples of Upstream and Downstream Snaps | Description |
---|
Input | Document | | | An optional document to evaluate expressions in the Directory and/or File filter properties. Note that each input document will trigger the execution of the Snap. |
Output | Document | | Mapper, File Reader, JSON Formatter | A full path in each document as the value of the key "path". If multiple files match the filter, the same number of documents is provided in the output view after each interval. For example:
[
{
"path" : "sftp://sftp.smart.com/home/voo/test1.csv"
},
{
"path" : "sftp://sftp.smart.com/home/voo/test2.csv"
}
] |
Error | Error handling is a generic way to handle errors without losing data or failing the Snap execution. You can handle the errors that the Snap might encounter when running the pipeline by choosing one of the following options from the When errors occur list under the Views tab:
- Stop Pipeline Execution: Stops the current pipeline execution if the Snap encounters an error.
- Discard Error Data and Continue: Ignores the error, discards that record, and continues with the remaining records.
- Route Error Data to Error View: Routes the error data to an error view without stopping the Snap execution.
Learn more about Error handling in Pipelines. |
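As an illustration of the output format described above, the following Python sketch parses the sample output and splits each "path" value into its protocol, host, and file name. It runs outside SnapLogic and is not part of the Snap; the names `SAMPLE_OUTPUT` and `describe` are hypothetical and exist only for this example.

```python
import json
import posixpath
from urllib.parse import urlsplit

# Sample output copied from the Output view description above.
SAMPLE_OUTPUT = """
[
  {"path": "sftp://sftp.smart.com/home/voo/test1.csv"},
  {"path": "sftp://sftp.smart.com/home/voo/test2.csv"}
]
"""


def describe(document):
    """Split one output document's "path" value into (protocol, host, file name)."""
    parts = urlsplit(document["path"])
    return parts.scheme, parts.hostname, posixpath.basename(parts.path)


summaries = [describe(doc) for doc in json.loads(SAMPLE_OUTPUT)]
for protocol, host, file_name in summaries:
    print(protocol, host, file_name)
```

For URLs that omit the host, such as those written with "///", `urlsplit` reports the host as `None`, so downstream code should not assume that a host is always present.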
Snap Settings
Info |
---|
- Asterisk ( * ): Indicates a mandatory field.
- Suggestion icon: Indicates a list that is dynamically populated based on the configuration.
- Expression icon: Indicates the value is an expression (if enabled) or a static value (if disabled). Learn more about Using Expressions in SnapLogic.
- Add icon: Indicates that you can add fields in the field set.
- Remove icon: Indicates that you can remove fields from the field set.
- Upload icon: Indicates that you can upload files.
|
...
Field Name | Field Type | Description |
---|
Label* Default Value: File Poller Example: File Poller
| String | Specify the name for the Snap. You can modify this to be more specific, especially if you have more than one of the same Snap in your pipeline. |
---|
Directory Default Value: N/A Examples:
- s3:///<S3_bucket_name>@s3.<region_name>.amazonaws.com/<path> (For region names and their details, see AWS Regions and Endpoints.)
- sftp://ftp.snaplogic.com:22/home/test/dir
- ftp://ftp.snaplogic.com/test/csv
- $directory
- _directory (A key-value pair with the "directory" key must be defined as a pipeline parameter. Ensure that the '=' button is enabled when using parameters.)
- file:///D:/testFolder/ (if the Snap is executed in a Windows Groundplex and needs to access the D: drive)
- wasb:///Snaplogic/testDir/ or wasbs:///Snaplogic/testDir/
- gs:///testBucket/testDir/
| String/Expression | Specify the URL path to the directory where files will be searched, in the following format: [protocol]://[host][:port]/[path] The supported file protocols are: s3:, file:, ftp:, ftps:, sftp:, hdfs:, sldb:, smb:, wasb:, wasbs:, gs:
Note |
---|
- The protocol and the rest of the URL must be separated by "://".
- The host name and the port number must appear between "://" and "/". If the port number is omitted, the default port for the protocol is used.
- Not all file protocols support "//"; for those protocols, use "///" instead, for example when polling files in SLDB or S3 (see the examples shown above). The host name and port number are omitted in the SLDB and S3 protocols.
- This property must be an absolute path for all protocols except SLDB. For SLDB, the Snap can access only the same project directory or the shared project directory, and cannot access other project directories.
- To refer to the SLDB project (or shared project) directory that the pipeline containing this Snap belongs to, enter "sldb:///" or leave this property blank. If the pipeline is created in a project other than the shared project and you want this property to refer to the shared project, enter "shared" or "sldb:///shared".
- Ensure that the file name, folder name, and file path do not contain the '?' character, because it is not fully supported and the Snap might fail when it is present.
- This Snap supports S3 Virtual Private Cloud (VPC) endpoints. For example, s3://my-bucket@bucket.vpce-028b7814794578709-vu0vvauy.s3.us-west-2.vpce.amazonaws.com
|
|
---|
File filter* Default Value: N/A Example:
| String/Expression | Specify a glob pattern to select one or more files in the directory. The File filter property can be a JavaScript expression, which is evaluated with values from the input view document.
Glob Pattern Interpretation Rules
Use glob patterns in this filter to select one or more files in the directory. For example:
- *.java Matches file names ending in .java.
- *.* Matches file names containing a dot.
- *.{java,class} Matches file names ending with .java or .class.
- foo.? Matches file names starting with foo. and a single character extension.
The following rules are used to interpret glob patterns:
- The * character matches zero or more characters of a name component without crossing directory boundaries.
- The ? character matches exactly one character of a name component.
- The backslash character (\) is used to escape characters that would otherwise be interpreted as special characters. For example, the expression \\ matches a single backslash, and "\{" matches a left brace.
- The ! character is used to exclude matching files from the output.
- The [ ] characters are a bracket expression that matches a single character of a name component out of a set of characters. For example, [abc] matches 'a', 'b', or 'c'. The hyphen (-) may be used to specify a range; so, [a-z] specifies a range that matches from 'a' to 'z' (inclusive). These forms can be mixed; so, [abce-g] matches 'a', 'b', 'c', 'e', 'f' or 'g'. If the character after the '[' is an '!', then it is used for negation; so, [!a-c] matches any character except 'a', 'b', or 'c'.
- Within a bracket expression, the *, ?, and \ characters match themselves. The (-) character matches itself if it is the first character within the brackets, or the first character after the '!', if negating.
- The { } characters are a group of subpatterns, where the group matches if any subpattern in the group matches. The ',' character is used to separate subpatterns. Groups cannot be nested.
- Leading period / dot characters in file names are treated as regular characters in match operations. For example, the '*' glob pattern matches file name '.login'.
- Some special characters are not supported. A partial list of unsupported special characters: #, ^, â, ê, î, ç, ¿, SPACE.
|
|
|
---|
Polling interval in seconds* Default value: 30 Example: 10
| Integer | Specify the time gap between each poll request (in seconds). |
---|
Polling timeout* Default value: 30 Example: 20 | Integer | Specify the period of time after which file polling must end. If the Polling timeout is set to:
- A value greater than 0, for example, 60 seconds: the polling stops after 60 seconds.
- 0: the Snap processes only one poll.
- -1: the Snap polls continually.
Note |
---|
Configure this field based on the expected number of files in the target directory. If there are many files and this field's value is small, the Snap may complete the operation and stop before the file is found. |
|
---|
Polling-timeout unit Default value: MINUTES Example: SECONDS
| Dropdown list | Specify the unit of time for the Polling timeout value. |
---|
Only Output on Change Default value: Selected | Checkbox | Select this check box to instruct the Snap to provide an output only when there is a change in the contents of the polled directory. When selected, the Snap provides an output during its initial run if it finds matching files. However, it provides polling results in the next run only if the polled directory has newer files that match the specified pattern. |
---|
Number of retries Minimum value: 0 Default value: 0 Example: 3 | Integer | Specify the maximum number of retry attempts that the Snap must make in case there is a network failure, and the Snap is unable to read the target file. If the value is larger than 0, the Snap first downloads the target file into a temporary local file. If any error occurs during the download, the Snap waits for the time specified in the Retry interval and attempts to download the file again from the beginning. When the download is successful, the Snap streams the data from the temporary file to the downstream Pipeline. All temporary local files are deleted when they are no longer needed. Ensure that the local drive has sufficient free disk space to store the temporary local file. |
---|
Retry interval (seconds) Minimum value: 1 Default value: 1 Example: 3 | Integer | Specify the minimum number of seconds for which the Snap must wait before attempting recovery from a network failure.
|
---|
Advanced properties | Use this field set to define specific settings for polling files. |
---|
Properties | Dropdown list | Choose either of the following options: SAS URI, Exit on first matches
|
Values | String/Expression | Note |
---|
Ensure that the URI is specified in the format described here. If you provide the SAS URI in this field, the Primary access key given in the account settings is overridden during authentication: only this URI is used, and the Snap ignores the SAS URI settings that you have configured in the associated account. If you do not provide the SAS URI, the Snap uses the Primary access key from the account settings.
|
If you choose Exit on first matches, set this field to true to stop the Snap from executing after writing the first list of file paths that match the filter pattern to the output view. If the field is not configured or is set to false, then the Snap continues to poll the directory until the Polling timeout is reached.
|
Snap execution
Default Value: Validate & Execute Example: Execute only | Dropdown list | Select one of the following three modes in which the Snap executes:
- Validate & Execute: Performs limited execution of the Snap, and generates a data preview during Pipeline validation. Subsequently, performs full execution of the Snap (unlimited records) during Pipeline runtime.
- Execute only: Performs full execution of the Snap during Pipeline execution without generating preview data.
- Disabled: Disables the Snap and all Snaps that are downstream from it.
|
---|
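The Polling interval, Polling timeout, Polling-timeout unit, Only Output on Change, and Exit on first matches settings interact in ways that are easier to follow as a loop. The sketch below is one reading of the descriptions above, not the Snap's implementation. It makes several assumptions: the `poll` function and its parameter names are hypothetical, only the two units shown in this article are modeled, Python's `fnmatch` stands in for the Snap's glob matching, and Only Output on Change is assumed to report only paths that were not seen in an earlier poll.

```python
import fnmatch
import time

# Only the two units that appear in this article; other units are not assumed.
UNIT_SECONDS = {"SECONDS": 1, "MINUTES": 60}


def poll(list_paths, file_filter, interval, timeout, unit="MINUTES",
         only_output_on_change=True, exit_on_first_matches=False,
         clock=time.monotonic, sleep=time.sleep):
    """Yield one list of matching paths per poll that has something to report."""
    deadline = None if timeout == -1 else clock() + timeout * UNIT_SECONDS[unit]
    seen = set()
    while True:
        matches = sorted(
            path for path in list_paths()
            if fnmatch.fnmatchcase(path.rsplit("/", 1)[-1], file_filter)
        )
        # Assumption: with Only Output on Change, report only unseen paths.
        batch = [p for p in matches if p not in seen] if only_output_on_change else matches
        seen.update(matches)
        if batch:
            yield batch
            if exit_on_first_matches:
                return
        if timeout == 0:
            return  # 0 means a single poll
        sleep(interval)
        if deadline is not None and clock() >= deadline:
            return  # a positive timeout has elapsed; with -1 there is no deadline


# Deterministic demo with a fake clock and a hypothetical directory listing.
now = [0.0]
listings = [["/in/a.csv"], ["/in/a.csv"], ["/in/a.csv", "/in/b.csv", "/in/c.txt"]]
calls = [0]


def fake_list():
    index = min(calls[0], len(listings) - 1)
    calls[0] += 1
    return listings[index]


def fake_sleep(seconds):
    now[0] += seconds


batches = list(poll(fake_list, "*.csv", interval=10, timeout=30, unit="SECONDS",
                    clock=lambda: now[0], sleep=fake_sleep))
print(batches)  # [['/in/a.csv'], ['/in/b.csv']]
```

In the demo, the second poll finds nothing new and therefore produces no batch, and the third poll reports only the newly arrived b.csv. Injecting the clock and sleep functions keeps the example deterministic; a real caller would rely on the defaults.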
...
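The glob rules listed for the File filter field can be tried out locally before configuring the Snap. The following Python sketch approximates a subset of them with the standard `fnmatch` module, which handles `*`, `?`, `[abc]`, and `[!a-c]` but has no `{a,b}` groups, so a small hypothetical helper, `expand_braces`, expands non-nested groups first. This is an approximation only: backslash escapes and the `!` exclusion rule are not modeled, and the Snap's own matcher may differ in edge cases.

```python
import fnmatch
import re


def expand_braces(pattern):
    """Expand non-nested {a,b} groups into one plain pattern per alternative."""
    group = re.search(r"\{([^{}]*)\}", pattern)
    if group is None:
        return [pattern]
    head, tail = pattern[:group.start()], pattern[group.end():]
    expanded = []
    for option in group.group(1).split(","):
        expanded.extend(expand_braces(head + option + tail))
    return expanded


def glob_matches(file_name, pattern):
    """Return True if file_name matches any brace-expanded form of pattern."""
    return any(fnmatch.fnmatchcase(file_name, p) for p in expand_braces(pattern))


for name, pattern in [
    ("Main.java", "*.java"),
    ("Main.class", "*.{java,class}"),
    ("foo.c", "foo.?"),
    (".login", "*"),
    ("d", "[!a-c]"),
]:
    print(name, pattern, glob_matches(name, pattern))
```

Every pair in the loop is taken from the examples and rules above and evaluates to a match. Because `fnmatchcase` is case-sensitive and does not treat a leading dot specially, it lines up with the rule that '*' matches the file name '.login'.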
Error | Reason | Resolution |
---|
Algorithm negotiation fail: algorithmName="server_host_key" jschProposal="<algorithms>" serverProposal="ssh-rsa"
| The library that we use for SFTP connections no longer supports deprecated signature protocols by default. This changed with the 4.33 GA release. | Add the algorithm to the serverProposal in the global.properties file. You can also enable support for RSA-SHA1 authentication in the Node Properties tab of the Update Snaplex dialog in SnapLogic Manager. In the Node Properties tab of your target Snaplex, add the following key/value pair under Global Properties: Key: jcc.jvm_options Value: -Djsch.server_host_key=ssh-rsa -Djsch.client_pubkey=ssh-rsa Click Update, and then restart the Snaplex node.
Learn more: Configuration Options |
com.amazonaws.AbortedException - Cannot access AWS S3 service
| If the Polling Timeout value is set to only a few seconds, the S3 request is canceled. | Increase the value of Polling Timeout (in seconds) for the Snap to work successfully. We recommend that you set the Polling Timeout value to the default value of 30 minutes or more to fetch all the data from S3. |
Examples
...
Write a List of Files in a Specific Directory
...