S3 File Reader

Overview

The S3 File Reader Snap reads data from an S3 bucket. When you provide a Version ID, the Snap reads a specific version of an S3 file object.

Important:

We plan to introduce additional S3 features exclusively in the Amazon S3 Snaps; Binary Snaps with S3 support will not receive these updates. We therefore recommend that you use the Amazon S3 Snap Pack for all S3 operations in your pipelines. Binary Snaps are retained as is for backward compatibility, but we will no longer provide S3 support for them. Learn more: Migration from Binary Snaps to Amazon S3 Snaps.

Snap Type

The S3 File Reader Snap is a Read-type Snap that reads data from an S3 bucket.

Prerequisites

  • IAM Roles for Amazon EC2.

  • The IAM_CREDENTIAL_FOR_S3 feature lets Groundplex nodes hosted in the EC2 environment access S3 files. No Access-key ID or Secret key from the AWS S3 account is needed.

  • The IAM credential stored in the EC2 metadata provides access rights to the S3 buckets. 

    • IAM role is supported only in the Groundplex nodes hosted in the EC2 environment.

    • The IAM Role stored on the EC2 instance requires List, Read, and Write permissions.

    • S3 account validation is not supported when you enable the IAM role property. 

Learn more about IAM Roles for Amazon EC2.

To enable the IAM_CREDENTIAL_FOR_S3 feature on an EC2-based Groundplex:

  1. Open Manager.

  2. Open the Snaplexes tab of the project that contains the EC2-based Groundplex.

  3. Click the Groundplex to open its Properties.

  4. Open the Node Properties tab.

  5. Click + to add a new row in the Global properties section.

  6. Enter jvm_options in Key and -DIAM_CREDENTIAL_FOR_S3=TRUE in Value.

  7. Restart the JCC (node).

Support for Ultra Pipelines

Works in Ultra Pipelines.

Limitations

The current Snap functionality supports AWS S3 Cloud Service and applies to the AWSGovCloud setup.

Snap Views

Type

Format

Number of Views

Examples of Upstream and Downstream Snaps

Description

Input 

Document

  • Min: 0

  • Max: 1

  • Mapper

  • File Writer

An upstream Snap is optional. Any Snap with a document output view can be connected upstream, such as Mapper or File Writer. Each input document supplies key-value pairs for evaluating expression properties in the S3 File Reader Snap, and each input document, if any, triggers one read operation.

Output

Binary

 

  • Min: 0

  • Max: 1

  • CSV Parser

  • JSON Parser

  • XML Parser

Any Snap with a binary input view can be connected downstream, such as CSV Parser, JSON Parser, or XML Parser. The output is the binary data read from the AWS S3 file specified in the File property, with header information about the binary stream. Both the binary data and the header information can be previewed at the output of the Snap.

{
  "content-length": "96258",
  "last-modified": { "_snaptype_datetime": "2014-06-26T23:27:01.000 UTC" },
  "content-disposition": "attachment; filename=\"leads.csv\"",
  "content-location": "s3:///mr_test/leads.csv",
  "content-type": "text/csv",
  "etag": "730145bec198288e9f428193fde851b7"
}
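As a rough illustration of how a downstream consumer might interpret this header, the following Python sketch parses the sample values shown above. The timestamp format string is an assumption inferred from this single sample; headers produced by other Snap versions may differ.

```python
import json
from datetime import datetime, timezone

# Header values copied from the sample above.
header_json = """
{
  "content-length": "96258",
  "last-modified": {"_snaptype_datetime": "2014-06-26T23:27:01.000 UTC"},
  "content-disposition": "attachment; filename=\\"leads.csv\\"",
  "content-location": "s3:///mr_test/leads.csv",
  "content-type": "text/csv",
  "etag": "730145bec198288e9f428193fde851b7"
}
"""

header = json.loads(header_json)

# content-length arrives as a string, so convert it before doing arithmetic.
size_bytes = int(header["content-length"])

# The timestamp is wrapped in a "_snaptype_datetime" object. The trailing
# " UTC" is stripped and the zone attached explicitly (assumed format).
raw = header["last-modified"]["_snaptype_datetime"]
last_modified = datetime.strptime(raw.replace(" UTC", ""), "%Y-%m-%dT%H:%M:%S.%f")
last_modified = last_modified.replace(tzinfo=timezone.utc)

print(size_bytes, last_modified.isoformat(), header["content-type"])
```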

 

Error

Error handling is a generic way to handle errors without losing data or failing the Snap execution. You can handle the errors that the Snap might encounter when running the pipeline by choosing one of the following options from the When errors occur list under the Views tab:

  • Stop pipeline Execution: Stops the current pipeline execution if the Snap encounters an error.

  • Discard Error Data and Continue: Ignores the error, discards that record, and continues with the remaining records.

  • Route Error Data to Error View: Routes the error data to an error view without stopping the Snap execution.

Learn more about Error handling in Pipelines.

Snap Settings

  • Asterisk ( * ): Indicates a mandatory field.

  • Suggestion icon: Indicates a list that is dynamically populated based on the configuration.

  • Expression icon: Indicates the value is an expression (if enabled) or a static value (if disabled). Learn more about Using Expressions in SnapLogic.

  • Add icon: Indicates that you can add fields in the fieldset.

  • Remove icon: Indicates that you can remove fields from the fieldset.

  • Upload icon: Indicates that you can upload files.

Field Name

Field Type

 

Description

Label*

 

Default Value: S3 File Reader
Example: S3 File Reader

String

Specify a name for the Snap. You can modify this to be more specific, especially if you have more than one of the same Snap in your pipeline.

 

File

 

Default Value: s3:///
Example:

String/Expression/Suggestion

Specify the URL of the S3 file from which the binary data is read. It must start with "s3:///". Use the suggest feature to view the list of buckets, subdirectories, and files. Bucket names are suggested if the property is empty or "s3:///". Once a bucket is selected, the subdirectories and files immediately below the bucket are listed. Names of subdirectories end with a forward slash ("/"). The suggest feature is not supported if the properties in the S3 Dynamic account are parameterized.

This Snap supports S3 Virtual Private Cloud (VPC) endpoint.

 

Using Expressions:

Click the Expression Enabler to enable the expressions.

For example, if the File property is "s3:///mybucket/out_" + Date.now() + ".csv" then the evaluated filename is s3:///mybucket/out_2013-11-13T00:22:31.880Z.csv.

Syntax:

s3:///<S3_bucket_name>@s3.<region_name>.amazonaws.com/<path>

For region names and their details, see AWS Regions and Endpoints.
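The URL syntax above can be checked outside the Snap with a small parser. The following Python sketch is only an illustration: the regular expression is derived from the syntax line and the "s3:///" rule in this section, the region part is treated as optional, and the names S3Location and parse_s3_url are hypothetical. It is not the Snap's own validation logic.

```python
import re
from typing import NamedTuple, Optional


class S3Location(NamedTuple):
    bucket: str
    region: Optional[str]  # None when the URL has no "@s3.<region>..." part
    path: str


# Assumed pattern: s3:///<bucket>[@s3.<region>.amazonaws.com][/<path>]
_S3_URL = re.compile(
    r"^s3:///(?P<bucket>[^/@]+)"
    r"(?:@s3\.(?P<region>[^./]+)\.amazonaws\.com)?"
    r"(?:/(?P<path>.*))?$"
)


def parse_s3_url(url: str) -> S3Location:
    """Split an S3 file URL into bucket, optional region, and path."""
    match = _S3_URL.match(url)
    if match is None:
        raise ValueError(f'expected a URL starting with "s3:///" and a bucket name: {url!r}')
    return S3Location(match["bucket"], match["region"], match["path"] or "")


print(parse_s3_url("s3:///mybucket/out_2013-11-13T00:22:31.880Z.csv"))
print(parse_s3_url("s3:///mybucket@s3.us-west-2.amazonaws.com/reports/leads.csv"))
```

The region name us-west-2 and the path reports/leads.csv in the second call are placeholder values chosen for the example.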

Version ID

 

Examples:   xvcnB8gPi37l3hbOzlsRFxjVwQ.numQz

Default value:  [None]

String/Expression/Suggestion

Enter or select the S3 file version ID. If the property is empty, the Snap reads the latest version. Use the suggest feature to view the list of version IDs for the S3 file in the File property. The suggest feature is not supported if the properties in the S3 Dynamic account are parameterized. Each line in the suggested list also includes the last modified date and the file size to help you select a version. When you enter the property value manually, only the version ID is required; the Snap ignores the last modified date and size information of a version when it reads the file. If versioning is not enabled on the S3 bucket, no version ID is suggested. The following versions are omitted from the suggested list because their files cannot be downloaded:

  • Versions of files that existed before versioning was enabled, because no version ID is assigned to them.

  • Version IDs with the 'Delete Marker' resource type.
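The omission rules above can be sketched in Python against a listing shaped like the response of the boto3 list_object_versions call (keys Versions, DeleteMarkers, VersionId, LastModified, Size). This is an assumption-laden illustration of the rules, not the Snap's implementation: the function name suggestible_versions and the display format are hypothetical, and the sample listing is invented for the example.

```python
from typing import Any, Dict, List


def suggestible_versions(listing: Dict[str, Any]) -> List[str]:
    """Return one display line per version that can be downloaded."""
    lines = []
    # Delete markers are reported under a separate "DeleteMarkers" key,
    # so reading only "Versions" already leaves them out.
    for version in listing.get("Versions", []):
        # Objects stored before versioning was enabled carry the literal
        # version ID "null" and are skipped.
        if version["VersionId"] == "null":
            continue
        lines.append(
            f'{version["VersionId"]}  {version["LastModified"]}  {version["Size"]}'
        )
    return lines


# Invented sample data; boto3 itself returns datetime objects for LastModified.
sample_listing = {
    "Versions": [
        {"VersionId": "xvcnB8gPi37l3hbOzlsRFxjVwQ.numQz",
         "LastModified": "2017-09-28T10:00:00Z", "Size": 96258},
        {"VersionId": "null",
         "LastModified": "2016-01-01T00:00:00Z", "Size": 100},
    ],
    "DeleteMarkers": [
        {"VersionId": "hypotheticalMarkerId", "LastModified": "2017-10-01T00:00:00Z"},
    ],
}

for line in suggestible_versions(sample_listing):
    print(line)
```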

Version ID suggestion interval

Use this field set to narrow the Version ID suggestions to a time interval, which helps you select a specific version of an S3 file object. Enter two rows to provide a start date and an end date. If only one row is provided, the interval runs from that date until now. If left empty, all version IDs are suggested. This property can be useful when a specific S3 file has many versions. It is used for the Version ID suggestion only, not during Snap preview or execution.

Year

 

Default Value: None
Example:  2017

 

Integer

Enter the year as a 4-digit integer.

 

Month

 

Default Value:  None
Example:  9, 09, 12

 

Integer

Enter the month as an integer.

 

 

Date

 

Default Value: None
Examples:  28, 09, 12

Integer

Enter the day of the month.

 

 

Zone


Default Value: None
Example:  US/Pacific

Suggestion

Enter or select a time zone ID from the suggested list. Leave empty for UTC. Only zone IDs in the suggested list are supported.
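The row rules for the Version ID suggestion interval (no rows, one row, or two rows) can be sketched as follows. This Python example is a simplified illustration under stated assumptions: the Zone field is taken to be empty (UTC), each date is interpreted as midnight at the start of that day, and the function name suggestion_interval is hypothetical. How the Snap treats the end date internally is not stated in this document.

```python
from datetime import datetime, timezone
from typing import List, Optional, Tuple

Row = Tuple[int, int, int]  # (Year, Month, Date) as entered in one row


def suggestion_interval(
    rows: List[Row], now: Optional[datetime] = None
) -> Optional[Tuple[datetime, datetime]]:
    """Return (start, end) for the suggestion filter, or None for no filter."""
    now = now or datetime.now(timezone.utc)
    if not rows:
        return None  # empty field set: all version IDs are suggested
    start = datetime(*rows[0], tzinfo=timezone.utc)
    # A single row means "from that date until now".
    end = datetime(*rows[1], tzinfo=timezone.utc) if len(rows) > 1 else now
    if start > end:
        raise ValueError("start date must not be later than end date")
    return start, end


print(suggestion_interval([]))
print(suggestion_interval([(2017, 9, 28)]))
print(suggestion_interval([(2017, 9, 1), (2017, 9, 28)]))
```

A non-empty Zone such as US/Pacific could be handled with the standard zoneinfo module, which is omitted here to keep the sketch small.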

 

Enable staging

 

Default Value: Deselected

 

Checkbox

If selected, the Snap downloads the source file into a local temporary file. When the download is complete, it streams the data from the temporary file to the output view. This prevents the Snap from being blocked by a slow downstream pipeline. The local disk must have enough free space for the expected file size.

 

 

Number of retries

 

Default value: 0
Example:  3

Minimum value: 0

Integer/Expression

Specify the maximum number of retry attempts that the Snap must make in case there is a network failure, and the Snap is unable to read the target file.

If the value is larger than 0, the Snap overrides the Enable staging value to true and downloads the S3 file to a temporary local file. If any error occurs during the download, the Snap waits for the time specified in the Retry interval and attempts to download the file again from the beginning. When the download is successful, the Snap starts to stream the data from the temporary file to the downstream Pipeline. All temporary local files are deleted when they are no longer needed.

Retry interval (seconds)

 

Default Value: 1
Example: 3

Minimum value: 1

Integer/Expression

Specify the minimum number of seconds for which the Snap must wait before attempting recovery from a network failure.
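The interaction of Number of retries, Retry interval, and staging described above can be sketched in Python. This is a minimal illustration of the described behavior, not the Snap's code: download is a hypothetical callable that writes the whole object to the given path from the beginning and raises OSError on a network failure, and read_with_retries is an invented name.

```python
import os
import tempfile
import time
from typing import Callable, Iterator


def read_with_retries(
    download: Callable[[str], None],
    retries: int = 0,
    retry_interval: float = 1.0,
    chunk_size: int = 8192,
) -> Iterator[bytes]:
    """Stage a download in a temporary file, retrying on failure, then stream it."""
    fd, tmp_path = tempfile.mkstemp()
    os.close(fd)
    try:
        for attempt in range(retries + 1):
            try:
                download(tmp_path)  # each attempt restarts from the beginning
                break
            except OSError:
                if attempt == retries:
                    raise  # retries exhausted: surface the error
                time.sleep(retry_interval)
        # Streaming starts only after the download has fully succeeded.
        with open(tmp_path, "rb") as staged:
            while chunk := staged.read(chunk_size):
                yield chunk
    finally:
        os.remove(tmp_path)  # temporary files are deleted when no longer needed


# Hypothetical download that fails twice before succeeding.
attempts = {"count": 0}

def flaky_download(path: str) -> None:
    attempts["count"] += 1
    if attempts["count"] < 3:
        raise OSError("simulated network failure")
    with open(path, "wb") as out:
        out.write(b"id,name\n1,Ada\n")

data = b"".join(read_with_retries(flaky_download, retries=3, retry_interval=0))
print(len(data), attempts["count"])
```

Staging the whole file before streaming is what allows a failed attempt to be repeated without sending partial data downstream.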

 

Get Object Tags

 

Default value: Deselected

Checkbox

Select this checkbox to include object tags in the header of the output binary data. See Object Tagging for more information on object tags.

You must have the S3:GetObjectTagging permission to be able to use this feature.

 

Snap Execution

Default Value: Validate & Execute
Example: Execute only

Dropdown list

Select one of the three modes in which the Snap executes. Available options are:

  • Validate & Execute: Performs limited execution of the Snap, and generates a data preview during Pipeline validation. Subsequently, performs full execution of the Snap (unlimited records) during Pipeline runtime.

  • Execute only: Performs full execution of the Snap during Pipeline execution without generating preview data.

  • Disabled: Disables the Snap and all Snaps that are downstream from it.

 

Optional Configuration

Account & Access

This Snap uses account references created on the Accounts page of SnapLogic Manager to handle access to this endpoint. See Configuring Binary Accounts for information on setting up accounts that work with this Snap.

Required settings for account types are as follows: 

  • AWS S3 - Access-key ID, Secret key, Security token.

  • S3 Dynamic - Access-key ID, Secret key, Security token, Server-side encryption.

Example

Read Data from the S3 Bucket

This example pipeline demonstrates how to read data from an S3 bucket using the S3 File Reader Snap.

Step 1: Configure pipeline parameters as shown below.

Step 2: Configure the attributes to pass raw data in the JSON Generator Snap under Edit JSON.

Step 3: Configure the JSON Formatter Snap to format the document data to binary format.

Step 4: Configure the S3 File Writer Snap to write the data into the S3 bucket.

Step 5: Configure the S3 File Reader Snap to read the specific object data from the S3 bucket.

You can view the document data in the S3 File Reader Output.

Typical Snap Configurations

The key configuration of the Snap lies in how the values are passed.

  • Without Expressions: Values are passed directly in the Snap.

  • With Expressions:

    • Using pipeline parameters: Values are passed as pipeline parameters:

 

Here are a few examples of how Snap's suggestion works:

 

 

 

 

 

Downloads

 

  File Modified

File Read data from S3 Bucket.slp

Sept 20, 2023 by Rachana Mannath

 
