Snap type:
Read
Description:
This Snap reads any type of data from various sources (such as SLDB, HTTP, S3, SFTP, HDFS, etc.) and produces a binary data stream at the output.
Expected upstream Snaps: Upstream Snap is optional. Any Snap with a document output view can be connected upstream.
Expected downstream Snaps: Any Snap with a binary input view can be connected downstream, such as File Writer, CSV Parser, JSON Parser, XML Parser.
Expected input: The Snap does not require input data. Input documents may be used to evaluate any JavaScript expression in the File property.
Expected output: Binary data read from the source specified in the File property with header information about the binary stream. The binary data and header information can be previewed at the output of the Snap.
The following is an example of the output preview when the File property is set to "http://www.facebook.com":
```json
[
  {
    "": "Preview binary0...",
    "content-type": "text/html; charset=utf-8",
    "x-frame-options": "DENY",
    "connection": "keep-alive",
    "transfer-encoding": "chunked",
    "date": "Thu, 23 Oct 2014 00:24:40 GMT",
    "content-location": "https://www.facebook.com",
    "pragma": "no-cache",
    "p3p": "CP=\"Facebook does not have a P3P policy. Learn why here: http://fb.me/p3p\"",
    "cache-control": "private, no-cache, no-store, must-revalidate",
    "x-xss-protection": "0",
    "x-content-type-options": "nosniff",
    "x-fb-debug": "N6wiHWAvz9kzpPUoM5vTm+yZzCZyiSrHXFXumHQixfMd0Qi+VDm514PkrrmQu2ISuuMTTFtUTqDZgDVG4blPTw==",
    "expires": "Sat, 01 Jan 2000 00:00:00 GMT",
    "set-cookie": "reg_ext_ref=deleted; expires=Thu, 01-Jan-1970 00:00:01 GMT; Max-Age=0; path=/; domain=.facebook.com"
  }
]
```
Click the "Preview binary0..." link to preview the content of the binary output data, which is HTML text in this example.
Prerequisites:
IAM Roles for Amazon EC2
The IAM_CREDENTIAL_FOR_S3 feature lets you access S3 files from an EC2 Groundplex without specifying the Access-key ID and Secret key in the AWS S3 account in the Snap. Instead, the IAM credential stored in the EC2 metadata is used to gain access rights to the S3 buckets. To enable this feature, add the following line to global.properties and restart the JCC (node):
This feature is supported only in EC2-type Groundplexes. For more information on IAM Roles, see http://docs.aws.amazon.com/AWSEC2/latest/UserGuide/iam-roles-for-amazon-ec2.html
Support and limitations:
Works in Ultra Task Pipelines.
For most file protocols, the Snap behaves the same in both Snaplex and Groundplex. However, the HDFS protocol works only in the Groundplex. The Hadoop cluster must be open to the Groundplex server instance without any authentication.
When reading a file over HTTP, the File Reader Snap displays an error if the number of bytes consumed does not match the Content-Length header value present in the response.
Known issue: This Snap fails for SMB file paths with the error: unable to create new native thread.
Views:
Input | This Snap has at most one document input view. It may contain value(s) to evaluate the JavaScript expression in the File property. |
---|---|
Output | This Snap has exactly one binary output view and provides the binary data stream read from the specified source. |
Error | This Snap has at most one document error view and produces zero or more documents in the view. If the Snap fails during the operation, an error document containing the fields error, reason, resolution, and stacktrace is sent to the error view. |
Snap Settings
Label*
Specify the name for the Snap. You can modify this to be more specific, especially if you have more than one of the same Snap in your Pipeline.
File*
Specify the URL for a regular file. It should start with a file protocol.
The supported file protocols are:
http:
https:
s3:
ftp:
ftps:
sftp:
hdfs:
sldb:
smb:
file: (only for use with a Groundplex)
wasb:
wasbs:
gs:
adl:
Overview
This Snap reads any type of data from various sources (such as SLDB, HTTP, S3, SFTP, HDFS, etc.) and produces a binary data stream at the output.
Snap Type
The File Reader Snap is a Read type Snap.
Prerequisites
IAM Roles for Amazon EC2
The IAM_CREDENTIAL_FOR_S3 feature lets you access S3 files from an EC2 Groundplex without specifying the Access-key ID and Secret key in the AWS S3 account in the Snap. Instead, the IAM credential stored in the EC2 metadata is used to gain access rights to the S3 buckets. To enable this feature, add the following line to global.properties and restart the JCC (node):
This feature is supported only in EC2-type Groundplexes. For more information on IAM Roles, see http://docs.aws.amazon.com/AWSEC2/latest/UserGuide/iam-roles-for-amazon-ec2.html
Support for Ultra Pipelines
Works in Ultra Pipelines.
Limitations
For most file protocols, the Snap behaves the same in both Snaplex and Groundplex. However, the HDFS protocol works only in the Groundplex. The Hadoop cluster must be open to the Groundplex server instance without any authentication.
When reading a file over HTTP, the File Reader Snap displays an error if the number of bytes consumed does not match the Content-Length header value present in the response.
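The Content-Length check described above can be pictured with a short sketch. It is not the Snap's implementation. It assumes the response body is exposed as a binary file-like object and the header has already been parsed to an integer. `read_checked` and `ContentLengthMismatch` are hypothetical names.

```python
import io


class ContentLengthMismatch(Exception):
    """Raised when the body size differs from the Content-Length header."""


def read_checked(stream, content_length, chunk_size=8192):
    """Read `stream` to the end and compare the byte count with Content-Length.

    `content_length` is the integer header value, or None when the response
    has no Content-Length header (for example, chunked transfer encoding).
    """
    chunks = []
    consumed = 0
    while True:
        chunk = stream.read(chunk_size)
        if not chunk:
            break
        consumed += len(chunk)
        chunks.append(chunk)
    if content_length is not None and consumed != content_length:
        raise ContentLengthMismatch(
            f"expected {content_length} bytes but consumed {consumed}")
    return b"".join(chunks)


if __name__ == "__main__":
    # A truncated body (3 of 5 declared bytes) is reported as an error.
    try:
        read_checked(io.BytesIO(b"abc"), content_length=5)
    except ContentLengthMismatch as exc:
        print("error:", exc)
```

A response without a Content-Length header (such as the chunked response in the preview example) is passed through unchecked in this sketch.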
Known Issues
This Snap fails for SMB file paths with the error: unable to create new native thread.
Snap Views
View | Format | Description |
---|---|---|
Input | Document | Upstream Snap is optional. Any Snap with a document output view can be connected upstream. Input may contain value(s) to evaluate the JavaScript expression in the File property. |
Output | Binary | Binary data read from the source specified in the File property, with header information about the binary stream. For an example, see the output preview for the File value "http://www.facebook.com" earlier on this page. |
Error | Document | Error handling is a generic way to handle errors without losing data or failing the Snap execution. You can handle the errors that the Snap might encounter while running the Pipeline by choosing an option from the When errors occur list under the Views tab. Learn more about Error handling in Pipelines. |
Snap Settings
Field | Field Type | Description |
---|---|---|
Label* Default Value: File Reader | String | Specify the name for the Snap. You can modify this to be more specific, especially if you have more than one of the same Snap in your Pipeline. |
File*
Default Value: N/A
Note |
---|
The File value should be an absolute path for all protocols except SLDB. For files in SLDB, the Snap can read only files in the same Project Directory or the Shared Project Directory; it cannot access files from other Projects. Typically, the file names in Reader Snaps are read from the incoming document, which might have a structure different from the relative path. For optimal results, we recommend that you build absolute paths to your projects and then append the file name. |
Specify the URL for a regular file that must begin with a file protocol. The supported file protocols are:
http:
https:
s3:
ftp:
ftps:
sftp:
hdfs:
sldb:
smb:
file: (only for use with a Groundplex)
wasb:
wasbs:
gs:
adl:
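A File value can be pre-checked against the protocol list above with a sketch like the following. It is illustrative only, not the Snap's validation logic. `check_file_protocol` is a hypothetical helper. Treating a bare name as an SLDB path is an assumption drawn from the "asset.json" example later on this page. Marking hdfs: as Groundplex-only follows the Limitations section.

```python
from urllib.parse import urlsplit

# Protocols listed for the File property.
SUPPORTED_PROTOCOLS = {
    "http", "https", "s3", "ftp", "ftps", "sftp", "hdfs", "sldb",
    "smb", "file", "wasb", "wasbs", "gs", "adl",
}
# file: is documented as Groundplex-only; the Limitations section says the
# same of hdfs:.
GROUNDPLEX_ONLY = {"file", "hdfs"}


def check_file_protocol(file_value, on_groundplex):
    """Return the protocol of `file_value` or raise ValueError."""
    scheme = urlsplit(file_value).scheme.lower()
    if not scheme:
        # Assumption: a bare name such as "asset.json" is an SLDB path.
        return "sldb"
    if scheme not in SUPPORTED_PROTOCOLS:
        raise ValueError(f"unsupported protocol: {scheme}:")
    if scheme in GROUNDPLEX_ONLY and not on_groundplex:
        raise ValueError(f"{scheme}: requires a Groundplex")
    return scheme
```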
This Snap supports S3 Virtual Private Cloud (VPC) endpoint.
Info |
---|
Reading files from Project and Shared Project Spaces
If a pipeline is created in the shared project and you want to read the "asset.json" file from the shared project, enter "asset.json" or "sldb:///asset.json".
To read from S3 through a region-specific endpoint, use the following format:
s3:///<S3_bucket_name>@s3.<region_name>.amazonaws.com/<path>/<file_name>
Example: s3:///mybucket@s3.eu-west-1.amazonaws.com/test.json
Other examples of File values:
sftp://ftp.snaplogic.com:22/dir/filename
smb://smb.snaplogic.com:445/test_files/csv/input.csv
$filename (The value of $filename is obtained from the input document, which must contain an entry with the "filename" key.)
_filename (A key/value pair with the "filename" key must be defined as a Pipeline parameter.)
file:///D:/testFolder/ (if the Snap is executed in a Windows Groundplex and needs to access the D: drive)
wasb:///Snaplogic/testDir/sample.csv (to read the 'sample.csv' file in the 'testDir' folder of the 'Snaplogic' container)
gs:///mybucket/csv/test.csv (to read the 'test.csv' file in the 'csv/' folder of the 'mybucket' bucket)
adl://storename/folder/filename (to read the file from a location in the storage)
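The region-specific S3 format above can be split into its parts with a regular expression. This is a hedged sketch that assumes exactly the documented layout. `parse_s3_regional` is a hypothetical helper and not part of SnapLogic.

```python
import re

# Region-specific S3 form documented above:
# s3:///<S3_bucket_name>@s3.<region_name>.amazonaws.com/<path>/<file_name>
S3_REGIONAL = re.compile(
    r"^s3:///(?P<bucket>[^@/]+)@s3\.(?P<region>[a-z0-9-]+)"
    r"\.amazonaws\.com/(?P<key>.+)$"
)


def parse_s3_regional(url):
    """Return (bucket, region, key) for a region-specific S3 File value."""
    match = S3_REGIONAL.match(url)
    if match is None:
        raise ValueError(f"not a region-specific S3 URL: {url!r}")
    return match.group("bucket"), match.group("region"), match.group("key")
```

For the example on this page, `parse_s3_regional("s3:///mybucket@s3.eu-west-1.amazonaws.com/test.json")` yields the bucket `mybucket`, the region `eu-west-1`, and the key `test.json`.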
Info |
---|
File value as an Expression: The File value can be a JavaScript expression, which is evaluated with values from the input view document and the Pipeline parameters. |
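The two reference forms in the examples ($filename from the input document, _filename from a Pipeline parameter) can be modeled with a deliberately simplified resolver. The real SnapLogic expression language is much richer, and this sketch does not evaluate general expressions. `resolve_file_value` is a hypothetical name, and in this toy version any literal that starts with "_" or "$" is treated as a reference.

```python
def resolve_file_value(value, document, pipeline_params):
    """Hypothetical, simplified resolution of a File value.

    Handles only the bare-reference forms shown in the examples:
    "$name" reads a field of the input document and "_name" reads a
    Pipeline parameter. Anything else is returned as a literal URL.
    """
    if value.startswith("$"):
        key = value[1:]
        if key not in document:
            raise KeyError(f"input document has no {key!r} field")
        return document[key]
    if value.startswith("_"):
        key = value[1:]
        if key not in pipeline_params:
            raise KeyError(f"no Pipeline parameter named {key!r}")
        return pipeline_params[key]
    return value  # e.g. "sftp://ftp.snaplogic.com:22/dir/filename"
```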
Prevent URL encoding
Default value: Deselected
If selected, the Snap does not automatically URL-encode the file path (including the query string, if it exists); the file path value is used as-is.
If deselected, the Snap automatically encodes common characters such as the following:
Character name | Character | URL Encoded value |
---|---|---|
backslash | \ | %5C |
Pound | # | %23 |
space | (space) | %20 |
percent | % | %25 |
Left-angle | < | %3C |
Right-angle | > | %3E |
Left-square | [ | %5B |
Right-square | ] | %5D |
Left-curly | { | %7B |
Right-curly | } | %7D |
The following are some of the characters that the Snap does not automatically encode:
Character name | Character | URL Encoded value |
---|---|---|
semi-colon | ; | %3B |
question mark | ? | %3F |
forward slash | / | %2F |
colon | : | %3A |
ampersand | & | %26 |
equals | = | %3D |
plus | + | %2B |
dollar | $ | %24 |
comma | , | %2C |
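The two tables can be approximated with Python's standard `urllib.parse.quote`. The characters listed as not encoded are passed in the `safe` argument. This sketch only mirrors the documented examples. The Snap's own encoder may treat characters not listed in the tables differently. `encode_file_path` is a hypothetical helper.

```python
from urllib.parse import quote

# Characters the tables list as left unencoded. quote() also leaves the
# unreserved characters (letters, digits, "-", ".", "_", "~") alone and
# percent-encodes everything else, including "%" itself.
NOT_ENCODED = ";?/:&=+$,"


def encode_file_path(path, prevent_url_encoding=False):
    """Return the File value as the sketch would send it."""
    if prevent_url_encoding:
        return path  # Prevent URL encoding selected: use the value as-is
    return quote(path, safe=NOT_ENCODED)
```

Because "%" is itself encoded, a path that is already percent-encoded (such as `my%20file`) becomes `my%2520file` when the option is deselected. This is one reason to select Prevent URL encoding for pre-encoded paths.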
Enable staging
Default value: Deselected
If selected, the Snap downloads the source file into a local temporary file. When the download is complete, the Snap streams the data from the temporary file to the output view. This prevents the Snap from being blocked by a slow downstream Pipeline.
The local disk must have free space at least as large as the expected file size.
Note |
---|
Some Snaps may take a long time to process large amounts of data. This, in turn, could lead to connection timeouts, causing the Pipeline to fail. Selecting this property saves the data on your local disk, enabling you to avoid such timeouts. |
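The staging behavior amounts to "download everything first, then stream from disk," as in the following sketch. It is a sketch of the idea, not the Snap's code. The source is assumed to be a binary file-like object. `staged_chunks` and its `expected_size` parameter are hypothetical, and the free-space check reflects the disk-space requirement stated above.

```python
import io
import shutil
import tempfile


def staged_chunks(source, expected_size=None, chunk_size=64 * 1024):
    """Stage `source` in a local temporary file, then yield it in chunks."""
    if expected_size is not None:
        free = shutil.disk_usage(tempfile.gettempdir()).free
        if free < expected_size:
            raise OSError(f"need {expected_size} bytes, only {free} free")
    with tempfile.TemporaryFile() as tmp:
        shutil.copyfileobj(source, tmp)  # finish the whole download first
        tmp.seek(0)
        while True:
            chunk = tmp.read(chunk_size)
            if not chunk:
                break
            yield chunk  # a slow consumer no longer holds the source open
    # Leaving the with-block closes and deletes the temporary file.


if __name__ == "__main__":
    data = b"x" * 200_000
    assert b"".join(staged_chunks(io.BytesIO(data), len(data))) == data
```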
Number of retries
Default Value: 0
Example: 3
Specify the maximum number of retry attempts that the Snap must make if a network failure occurs and the Snap is unable to read the target file.
If the value is larger than 0, the Snap first downloads the target file into a temporary local file. If any error occurs during the download, the Snap waits for the time specified in the Retry interval and attempts to download the file again from the beginning. When the download is successful, the Snap streams the data from the temporary file to the downstream Pipeline. All temporary local files are deleted when they are no longer needed.
Info |
---|
Ensure that the local drive has sufficient free disk space to store the temporary local file. |
Minimum value: 0
Retry interval (seconds)
Default Value: 1
Example: 3
Specify the minimum number of seconds for which the Snap must wait before attempting recovery from a network failure.
Minimum value: 1
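The interplay of Number of retries and Retry interval can be sketched as a loop. Each attempt downloads into a fresh temporary file. After a failure, the loop waits and then restarts from the beginning. It is illustrative only. `download_with_retries` and the `fetch` callable are hypothetical. Unlike the Snap, which stages only when Number of retries is larger than 0, this sketch always stages.

```python
import tempfile
import time


def download_with_retries(fetch, number_of_retries=0, retry_interval=1,
                          sleep=time.sleep):
    """Download into a temporary file, retrying from the beginning on failure.

    `fetch(out)` is a hypothetical callable that writes the complete file into
    the binary file object `out` and raises OSError on a network failure.
    Returns the temporary file positioned at its start; the caller streams it
    downstream and then closes it, which deletes it.
    """
    if number_of_retries < 0:
        raise ValueError("Number of retries must be at least 0")
    if retry_interval < 1:
        raise ValueError("Retry interval must be at least 1 second")
    attempt = 0
    while True:
        tmp = tempfile.TemporaryFile()
        try:
            fetch(tmp)
        except OSError:
            tmp.close()  # discard the partial download
            if attempt >= number_of_retries:
                raise  # retries exhausted: surface the last error
            attempt += 1
            sleep(retry_interval)  # wait, then start over from byte 0
            continue
        tmp.seek(0)
        return tmp
```

Passing `sleep` as a parameter keeps the waiting logic testable without real delays.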
Advanced properties
Properties
The URI of the Shared Access Signature (SAS) to be accessed.
Values
Specify the value for the SAS URI.
Default Value: N/A
Example: https://myaccount.blob.core.windows.net/sascontainer/sasblob.txt?sv=2015-04-05&st=2015-04-29T22%3A18%3A26Z&se=2015-04-30T02%3A23%3A26Z&sr=b&sp=rw&sip=168.1.5.60-168.1.5.70&spr=https&sig=Z%2FRHIX5Xcg0Mq2rqI3OlWTjEg2tYkboXr1P9ZUXDtkk%3D
Note |
---|
Ensure that the URI is specified in the format shown in the example above. If the SAS URI value is provided in the Snap settings, the settings provided in the account (if an account is attached) are ignored. |
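A SAS URI is a resource URL followed by a query-string token. Before you paste a value, a sketch like this can confirm that it splits cleanly and carries a signature. `split_sas_uri` is a hypothetical helper that uses only the standard `urllib.parse` module. The field names in the comment are the standard Azure SAS parameters.

```python
from urllib.parse import parse_qs, urlsplit


def split_sas_uri(sas_uri):
    """Split a SAS URI into the resource URL and its decoded token fields.

    Typical Azure fields: sv (service version), st/se (start/expiry),
    sr (resource type, b = blob), sp (permissions), sip (allowed IP range),
    spr (allowed protocol), sig (signature).
    """
    parts = urlsplit(sas_uri)
    resource = f"{parts.scheme}://{parts.netloc}{parts.path}"
    token = {key: values[0] for key, values in parse_qs(parts.query).items()}
    if "sig" not in token:
        raise ValueError("SAS URI has no 'sig' (signature) parameter")
    return resource, token
```

Applied to the example value, the resource is `https://myaccount.blob.core.windows.net/sascontainer/sasblob.txt` and the decoded expiry `se` is `2015-04-30T02:23:26Z`.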
Snap Execution
Examples
HDFS
For hdfs:// file access, use a SnapLogic on-premises Groundplex, and make sure that its instance is within the Hadoop cluster and that SSH authentication has already been established. You can access HDFS files in the same way as other file protocols in the File Reader and File Writer Snaps. No account is needed in the Snap.
An example for HDFS is:
hdfs://ec2-54-198-212-134.compute-1.amazonaws.com:8020/user/john/input/sample.csv
SFTP File Read
The following is an example Pipeline for an SFTP file read:
Sample for AWS S3 Support
See Also