File Reader

Snap type:

Read

Description:

This Snap reads any type of data from various sources (such as SLDB, HTTP, S3, SFTP, HDFS, etc.) and produces a binary data stream at the output.

  • Expected upstream Snaps: Upstream Snap is optional. Any Snap with a document output view can be connected upstream.

  • Expected downstream Snaps: Any Snap with a binary input view can be connected downstream, such as File Writer, CSV Parser, JSON Parser, XML Parser.

  • Expected input: The Snap does not require input data. Input documents may be used to evaluate any JavaScript expression in the File property.

  • Expected output: Binary data read from the source specified in the File property with header information about the binary stream. The binary data and header information can be previewed at the output of the Snap.

An example of the output preview for the File property value "http://www.facebook.com" is as follows:

Click the "Preview binary0..." link to preview the content of the binary output data, which is HTML text in this example.


Prerequisites:

IAM Roles for Amazon EC2

The 'IAM_CREDENTIAL_FOR_S3' feature lets an EC2 Groundplex access S3 files without an Access-key ID and Secret key in the AWS S3 account used by the Snap. The IAM credential stored in the EC2 metadata is used to gain access rights to the S3 buckets. To enable this feature, add the following line to global.properties and restart the jcc (node):
jcc.jvm_options = -DIAM_CREDENTIAL_FOR_S3=TRUE

Please note this feature is supported in the EC2-type Groundplex only.

For more information on IAM Roles, see http://docs.aws.amazon.com/AWSEC2/latest/UserGuide/iam-roles-for-amazon-ec2.html

Support and limitations:

  • Works in Ultra Task Pipelines.

  • For most file protocols, the Snap behaves the same in both Snaplex and Groundplex. However, the HDFS protocol works only in the Groundplex. The Hadoop cluster must be open to the Groundplex server instance without any authentication.

  • When reading a file over HTTP, the File Reader Snap displays an error if the number of bytes consumed does not match the Content-Length header value present in the response.
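The Content-Length check described in the last item can be pictured with a minimal sketch. It is not the Snap's implementation: the function name `verify_content_length` and the plain-dict representation of the response headers are assumptions made for illustration.

```python
def verify_content_length(headers, body):
    """Raise an error if the body size differs from the Content-Length header.

    `headers` is a plain dict of response headers (hypothetical representation);
    `body` is the bytes actually consumed from the response.
    """
    # Header names are case-insensitive, so normalize before the lookup.
    normalized = {name.lower(): value for name, value in headers.items()}
    declared = normalized.get("content-length")
    if declared is None:
        # No header present: there is nothing to compare against.
        return len(body)
    expected = int(declared)
    if len(body) != expected:
        raise IOError(
            f"Expected {expected} bytes per Content-Length, consumed {len(body)}"
        )
    return len(body)
```

A truncated transfer, for example a declared length of 10 with only 5 bytes received, raises the error instead of silently passing partial data downstream.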

Known Issues: This Snap fails for an SMB file path with the error: unable to create new native thread.

Account: 

This Snap uses account references created on the Accounts page of SnapLogic Manager to handle access to this endpoint. This Snap supports several account types, as listed in the table below, or no account. See Configuring Binary Accounts for information on setting up accounts that work with this Snap.

Account types supported by each protocol are as follows:  

Protocol    Account types
sldb        no account
s3          AWS S3, S3 Dynamic
ftp         Basic Auth
sftp        Basic Auth, SSH Auth
ftps        Basic Auth
hdfs        no account
http        no account, Basic Auth
https       no account, Basic Auth
smb         SMB
file        no account
wasb        Azure Storage
wasbs       Azure Storage
gs          Google Storage
adl         Azure Data Lake
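The protocol-to-account pairing listed above lends itself to a simple lookup. The sketch below transcribes that list into a Python dictionary; the names `SUPPORTED_ACCOUNTS` and `is_account_supported` are hypothetical helpers for illustration and are not part of SnapLogic.

```python
# Protocol-to-account-type mapping transcribed from the list above.
# None stands for "no account".
SUPPORTED_ACCOUNTS = {
    "sldb": {None},
    "s3": {"AWS S3", "S3 Dynamic"},
    "ftp": {"Basic Auth"},
    "sftp": {"Basic Auth", "SSH Auth"},
    "ftps": {"Basic Auth"},
    "hdfs": {None},
    "http": {None, "Basic Auth"},
    "https": {None, "Basic Auth"},
    "smb": {"SMB"},
    "file": {None},
    "wasb": {"Azure Storage"},
    "wasbs": {"Azure Storage"},
    "gs": {"Google Storage"},
    "adl": {"Azure Data Lake"},
}


def is_account_supported(protocol, account_type=None):
    """Return True if the account type is listed for the protocol."""
    allowed = SUPPORTED_ACCOUNTS.get(protocol.lower())
    if allowed is None:
        raise ValueError(f"Unsupported protocol: {protocol}")
    return account_type in allowed
```

For example, `is_account_supported("sftp", "SSH Auth")` is true, while `is_account_supported("ftp", "SSH Auth")` is false because only Basic Auth is listed for ftp.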

Required settings for account types are as follows:

Account Type       Settings
Basic Auth         Username, Password
AWS S3             Access-key ID, Secret key, Server-side encryption
S3 Dynamic         Access-key ID, Secret key, Security token, Server-side encryption
SSH Auth           Username, Private key, Key Passphrase
SMB                Domain, Username, Password
Azure Storage      Account name, Primary access key
Google Storage     Approval prompt, Application scope, Auto-refresh token
Azure Data Lake    Tenant ID, Access ID, Secret Key

Views:

Input

This Snap has at most one document input view. It may contain value(s) to evaluate the JavaScript expression in the File property.

Output

This Snap has exactly one binary output view and provides the binary data stream read from the specified source.

Error

This Snap has at most one document error view and produces zero or more documents in the view. 

If the Snap fails during the operation, an error document is sent to the error view containing the fields error, reason, resolution, and stacktrace:

{
  resolution: "Check for URL syntax and file access permission"
  stacktrace: "java.io.FileNotFoundException: ..."
  error: "Unable to read from shared/lead.s.csv"
  reason: "File not found on Snapxl.elastic.Snaplogic.com at /api/1/rest/slfs/QA/shared/lead.s.csv"
}

Settings

Label

Required. The name for the Snap. You can modify this to be more specific, especially if you have more than one of the same Snap in your Pipeline.

File

Required. This property is a URL for a regular file. It should start with a file protocol.

The supported file protocols are:

  • http:

  • https:

  • s3:

  • ftp:

  • ftps:

  • sftp: 

  • hdfs:

  • sldb: 

  • smb:

  • file: (only for use with a Groundplex)

  • wasb:

  • wasbs:

  • gs:

  • adl:

The File property can be a JavaScript expression which will be evaluated with values from the input view document and the pipeline parameters. The File property has the syntax:

        [protocol]://[host][:port]/[path]

Note that "://" separates the file protocol from the rest of the URL, and the host name and port number go between "://" and the next "/". The host name and port number are omitted in the sldb and s3 protocols. If the port number is omitted, the default port for the protocol is used.
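To make the [protocol]://[host][:port]/[path] layout concrete, here is a minimal sketch that splits a File value into its parts with Python's standard `urllib.parse.urlsplit`. The helper name `split_file_url` is hypothetical, and the `DEFAULT_PORTS` table holds the conventional ports for these protocols; which defaults the Snap actually applies is an assumption, since this page does not list them.

```python
from urllib.parse import urlsplit

# Conventional default ports; which defaults the Snap actually applies is an
# assumption here, not something this page lists.
DEFAULT_PORTS = {"http": 80, "https": 443, "ftp": 21, "sftp": 22, "smb": 445}


def split_file_url(url):
    """Split a File value into (protocol, host, port, path)."""
    parts = urlsplit(url)
    if not parts.scheme:
        raise ValueError("File URL must start with a protocol, e.g. sftp://")
    host = parts.hostname or ""  # empty for sldb:/// style URLs
    port = parts.port or DEFAULT_PORTS.get(parts.scheme)
    return parts.scheme, host, port, parts.path
```

With this helper, "sldb:///shared/asset.json" yields an empty host and no port, which matches the statement that host name and port are omitted for sldb.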

The File property should be an absolute path for all protocols except sldb. For sldb files, the Snap can access only files in the same project directory or the shared project directory, and cannot access files in other projects. 

The file:/// protocol is supported only on Groundplex. In Cloudplex configurations, use sldb or other file protocols. When using the file:/// protocol, file access is performed with the permissions of the user in whose name the Snaplex is running (by default Snapuser). Use file system access with caution; it is the customer's own responsibility to ensure that the file system is cleaned up after use.

Example

  • If a pipeline is created in a project other than the shared project and you want to read the "asset.json" file from the same project, enter "asset.json" or "sldb:///asset.json".

  • If a pipeline is created in a project other than the shared project and you want to read the "asset.json" file from the shared project, enter "shared/asset.json" or "sldb:///shared/asset.json".

  • If a pipeline is created in the shared project and you want to read the "asset.json" file from the shared project, enter "asset.json" or "sldb:///asset.json".

  • s3:///<S3_bucket_name>@s3.<region_name>.amazonaws.com/<path>/<file_name>

    For region names and their details, see AWS Regions and Endpoints.
    Example: s3:///mybucket@s3.eu-west-1.amazonaws.com/test.json

  • sftp://ftp.snaplogic.com:22/dir/filename

  • smb://smb.snaplogic.com:445/test_files/csv/input.csv

  • $filename (The value of the $filename is obtained from the input document and the document should have an entry with the "filename" key.)

  • _filename (A key/value pair with "filename" key should be defined as a pipeline parameter.)

  • file:///D:/testFolder/  (if the Snap is executed in the Windows Groundplex and needs to access D: drive)

  • wasb:///Snaplogic/testDir/sample.csv  (to read 'sample.csv' file in the 'testDir' folder in the 'Snaplogic' container)

  • gs:///mybucket/csv/test.csv (to read 'test.csv' file in the 'csv/' folder of the 'mybucket' bucket)

  • adl://storename/folder/filename (to read the file from a location of the storage)
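The S3 and sldb forms in the list above can be assembled programmatically. The following is a small sketch under the assumption that the bucket, region, and path are already known; `s3_file_url` and `sldb_file_url` are hypothetical helper names, not SnapLogic functions.

```python
def s3_file_url(bucket, region, key):
    """Build a File value of the form s3:///<bucket>@s3.<region>.amazonaws.com/<path>."""
    return f"s3:///{bucket}@s3.{region}.amazonaws.com/{key.lstrip('/')}"


def sldb_file_url(path):
    """Add the sldb:/// prefix to a bare project-relative path."""
    if path.startswith("sldb:///"):
        return path  # already in the explicit form
    return f"sldb:///{path.lstrip('/')}"
```

For instance, `s3_file_url("mybucket", "eu-west-1", "test.json")` reproduces the S3 example given above, and `sldb_file_url("shared/asset.json")` gives the explicit sldb form of the shared-project example.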

Default value:  [None]

Prevent URL encoding

When enabled, this will prevent the Snap from automatically URL encoding the file path (including the query string if it exists). Enable this setting to use the file path value as-is.  

When disabled, the following are some of the common characters that are automatically encoded by the Snap: 

Character name    Character    URL-encoded value
Backslash         \            %5C
Pound             #            %23
Space             (space)      %20
Percent           %            %25
Left-angle        <            %3C
Right-angle       >            %3E
Left-square       [            %5B
Right-square      ]            %5D
Left-curly        {            %7B
Right-curly       }            %7D

And these are some of the characters that are not automatically encoded by the Snap:

Character name    Character    URL-encoded value
Semi-colon        ;            %3B
Question mark     ?            %3F
Forward slash     /            %2F
Colon             :            %3A
Ampersand         &            %26
Equals            =            %3D
Plus              +            %2B
Dollar            $            %24
Comma             ,            %2C

Default value: Not selected  
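The effect of this setting on the listed characters can be approximated with Python's standard `urllib.parse.quote`. This is only an approximation for illustration, not the Snap's encoder: the function name `encode_file_path` and the `prevent_url_encoding` flag are hypothetical, and characters that appear in neither list follow Python's defaults, which may differ from the Snap's behavior.

```python
from urllib.parse import quote

# Characters listed above as NOT automatically encoded.
NOT_ENCODED = ";?/:&=+$,"


def encode_file_path(path, prevent_url_encoding=False):
    """Approximate the described encoding with urllib.parse.quote."""
    if prevent_url_encoding:
        return path  # use the value as-is
    return quote(path, safe=NOT_ENCODED)
```

With the setting off, `encode_file_path("/my dir/a#b.csv")` turns the space into %20 and the pound sign into %23, while a query string such as "q?x=1&y=2" passes through unchanged.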

Enable staging

If selected, the Snap downloads the source file into a local temporary file. When the download is complete, it streams the data from the temporary file to the output view. This property prevents the Snap from being blocked by a slow downstream Pipeline. The local disk must have at least as much free space as the expected file size.

Default value: Not selected
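The two phases of staging, download first and stream afterwards, can be sketched as follows. This is an illustrative model only: `read_with_staging` is a hypothetical name, and `source` stands in for the remote file as any binary file-like object.

```python
import shutil
import tempfile


def read_with_staging(source, chunk_size=64 * 1024):
    """Copy `source` fully into a temporary file, then yield it in chunks.

    `source` is any binary file-like object (a stand-in for the remote file).
    """
    with tempfile.TemporaryFile() as staged:
        # Phase 1: finish the download before anything is sent downstream.
        shutil.copyfileobj(source, staged)
        staged.seek(0)
        # Phase 2: stream from local disk at the consumer's pace.
        while True:
            chunk = staged.read(chunk_size)
            if not chunk:
                break
            yield chunk
    # The temporary file is removed when the with-block exits.
```

Because the remote connection is only needed during phase 1, a slow consumer in phase 2 no longer holds it open; the cost is local disk space equal to the file size.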

Number of retries

Specifies the maximum number of retry attempts that the Snap must make in case there is a network failure, and the Snap is unable to read the target file.

If the value is larger than 0, the Snap first downloads the target file into a temporary local file. If any error occurs during the download, the Snap waits for the time specified in the Retry interval and attempts to download the file again from the beginning. When the download is successful, the Snap streams the data from the temporary file to the downstream Pipeline. All temporary local files are deleted when they are no longer needed.

Example:  3

Minimum value: 0

Default value: 0

Retry interval (seconds)

Specifies the minimum number of seconds for which the Snap must wait before attempting recovery from a network failure.

Example:  3

Minimum value: 1

Default value: 1
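The interplay of Number of retries and Retry interval can be sketched as a loop that restarts the download from the beginning after each failure. This is a simplified model, not the Snap's code: `download_with_retries`, the `open_source` callable, and the injectable `sleep` parameter are hypothetical, and unlike the Snap this sketch uses a temporary file even when the retry count is 0.

```python
import tempfile
import time


def download_with_retries(open_source, retries=0, retry_interval=1, sleep=time.sleep):
    """Return the file content, restarting the download on I/O errors.

    `open_source` is a hypothetical callable returning a fresh binary stream.
    """
    attempt = 0
    while True:
        try:
            with tempfile.TemporaryFile() as staged:
                with open_source() as source:
                    # Copy the whole source into the temporary file first.
                    for chunk in iter(lambda: source.read(64 * 1024), b""):
                        staged.write(chunk)
                staged.seek(0)
                # A real reader would stream this downstream instead.
                return staged.read()
        except OSError:
            if attempt >= retries:
                raise  # retries exhausted: surface the error
            attempt += 1
            sleep(retry_interval)  # wait before starting over
```

With `retries=3` and `retry_interval=3`, a source that fails twice and then succeeds is read on the third attempt after two 3-second waits; with `retries=0`, the first failure is raised immediately.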

Advanced properties

Use this field set to define specific settings for polling files. Click to add a new row for defining an advanced property. This field set comprises the following fields:
  • SAS URI
  • Exit on first matches

SAS URI

Specifies the URI of the Shared Access Signature (SAS) to be accessed. Click '+' to add the SAS URI.

Supported SAS types are:

  • Service SAS on container
  • Service SAS on blob
  • Account SAS

Example: https://myaccount.blob.core.windows.net/sascontainer/sasblob.txt?sv=2015-04-05&st=2015-04-29T22%3A18%3A26Z&se=2015-04-30T02%3A23%3A26Z&sr=b&sp=rw&sip=168.1.5.60-168.1.5.70&spr=https&sig=Z%2FRHIX5Xcg0Mq2rqI3OlWTjEg2tYkboXr1P9ZUXDtkk%3D

Default value: N/A
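A SAS URI carries its type in the query string, so the three supported types can be told apart by inspecting the parameters. The sketch below assumes the Azure Storage parameter conventions (`sr=b` for a blob, `sr=c` for a container, `srt` present for an account SAS); the function name `classify_sas_uri` is hypothetical, and the Snap may distinguish the types differently.

```python
from urllib.parse import parse_qs, urlsplit


def classify_sas_uri(uri):
    """Guess the SAS type from the query parameters of a SAS URI."""
    query = parse_qs(urlsplit(uri).query)
    if "sig" not in query:
        raise ValueError("No signature (sig) parameter found")
    if "srt" in query:  # signed resource types: assumed to mark an account SAS
        return "Account SAS"
    resource = query.get("sr", [""])[0]
    if resource == "b":
        return "Service SAS on blob"
    if resource == "c":
        return "Service SAS on container"
    return "Unknown"
```

Applied to the example URI above, which contains `sr=b`, the function reports a service SAS on a blob.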

Snap execution

Indicates how the Snap must be executed. Available options are:

  • Validate & Execute: Performs limited execution of the Snap (up to 50 records) during Pipeline validation; performs full execution of the Snap (unlimited records) during Pipeline execution.
  • Execute only: Performs full execution of the Snap during Pipeline execution; does not execute the Snap during Pipeline validation.
  • Disabled: Disables the Snap and, by extension, its downstream Snaps.

Default value: Validate & Execute

Examples

SFTP File Read

An example Pipeline for an SFTP file read is shown below:


Sample for AWS S3 Support


See Also

Snap Pack History