In this article

Overview

You can use this Snap to execute a SELECT query on CSV, JSON, and Parquet S3 objects, which may be compressed in GZIP or BZIP2 format.

note
  • AWS S3 does not define folder objects; the '/' character is part of S3 object key names.

  • This Snap captures metadata and lineage information from the input document.


S3 Select Snap settings

Snap Type

The S3 Select Snap is a Read-type Snap that reads a subset of your S3 data based on a SELECT query.

Prerequisites

A valid S3 account with the required permissions.

Support for Ultra Pipelines

Works in Ultra Pipelines

Limitations

The conditional fields Field Delimiter, Record Delimiter, Quote Character, Quote Escape Character, and File Header Info associated with the CSV Data Format are not displayed for non-CSV data formats. They are also not displayed if the Data Format field is expression-enabled. If you plan to make the Data Format expression-enabled to process the CSV-related properties, follow these steps:

  1. Set CSV as the Data Format.

  2. Configure the conditional fields.

  3. Enable the expression.

You must follow these steps because the values of these properties are used during the execution only if the value of the Data Format is set to CSV, even when the associated conditional fields are not displayed.

Known Issues

None.

Snap Views

Type

Format

Number of Views

Examples of Upstream and Downstream Snaps

Description

Input 

Document

  • Min: 0

  • Max: 1

  • Mapper

  • JSON Parser

An upstream Snap is optional. Any document with key-value pairs can be used to evaluate expression properties.

Output

Binary

  • Min: 0

  • Max: 1

  • CSV Parser

  • JSON Parser

  • Mapper (binary input view)

  • File Writer

An example of the output binary data is as follows:

Output preview of table containing preview in text format

Error

Error handling is a generic way to handle errors without losing data or failing the Snap execution. You can handle the errors that the Snap might encounter while running the Pipeline by choosing one of the following options from the When errors occur list under the Views tab:

  • Stop Pipeline Execution: Stops the current pipeline execution if the Snap encounters an error.

  • Discard Error Data and Continue: Ignores the error, discards that record, and continues with the remaining records.

  • Route Error Data to Error View: Routes the error data to an error view without stopping the Snap execution.

Learn more about Error handling in Pipelines.

Snap Settings

  • Asterisk (*): Indicates a mandatory field.

  • Suggestion icon: Indicates a list that is dynamically populated based on the configuration.

  • Expression icon: Indicates the value is an expression (if enabled) or a static value (if disabled). Learn more about Using Expressions in SnapLogic.

  • Add icon: Indicates that you can add fields in the fieldset.

  • Remove icon: Indicates that you can remove fields from the fieldset.

  • Upload icon: Indicates that you can upload files.

Field Name

Field Type

Field Dependency

Description

Label*

Default Value: S3 Select
Example: Select Invoices from Q4 2018

String

None

 

The name for the Snap. You can modify this to be more specific, especially if you have more than one of the same Snap in your Pipeline.

Bucket*

Default Value: None
Example:

String/Expression/Suggestion

None

Specify the S3 bucket name where S3 objects are located.

Do not add s3:// before the bucket name, because the Snap can fail.

note
  • Bucket names are globally unique and can usually be accessed without a region name. If you cannot access a bucket without its region name, specify the region using the following syntax:

    • <S3_bucket_name>@<region_name>

  • You can also access the S3 bucket through an S3 Virtual Private Cloud (VPC) endpoint by specifying the bucket name in the following syntax:

    • <S3_bucket_name>@<VPC_S3_endpoint>

Note: If you enter an incorrect region name, but the bucket name is valid, the AWS S3 service may successfully access the bucket without any error.


Object Key*

Default Value: None
Examples

  • test.csv

  • abc/test.json

  • abc/xyz/test.xml

String/Expression/Suggestion

None

Specify the S3 object key name, which may include one or more forward-slash ('/') characters.

note

The forward-slash character is part of the S3 object key name; AWS S3 does not define folder objects. The suggestions list displays a maximum of 1,000 object keys.


Select Query*

Default Value: None
Example: select * from S3Object

String/Expression

None

Enter a SELECT query to execute on the S3 object. For a detailed description of the SELECT command, refer to the Amazon documentation: https://docs.aws.amazon.com/AmazonS3/latest/userguide/s3-glacier-select-sql-reference-select.html
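For reference outside SnapLogic, the same kind of query can be issued directly against the S3 Select API. Below is a minimal sketch using the AWS SDK for Python (boto3); the helper name and bucket/key values are illustrative, and it assumes boto3 is installed with AWS credentials configured:

```python
def s3_select_csv(bucket, key, expression):
    """Run an S3 Select SQL expression over a CSV object and return the result bytes.

    Sketch only: assumes boto3 is installed and AWS credentials are configured.
    """
    import boto3  # AWS SDK for Python

    s3 = boto3.client("s3")
    resp = s3.select_object_content(
        Bucket=bucket,
        Key=key,
        Expression=expression,
        ExpressionType="SQL",
        InputSerialization={"CSV": {"FileHeaderInfo": "USE"}, "CompressionType": "NONE"},
        OutputSerialization={"CSV": {}},
    )
    # The response payload is an event stream; collect only the Records events.
    return b"".join(
        event["Records"]["Payload"] for event in resp["Payload"] if "Records" in event
    )

# Usage (hypothetical bucket and key):
# rows = s3_select_csv("my-bucket", "abc/test.csv", "SELECT * FROM S3Object s")
```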

Input Data Format

Compression Type*

Default Value: None
Example: GZIP

String/Expression/Dropdown list

None

Select one of the following compression types: NONE, GZIP, or BZIP2.

note

The Amazon S3 Select service does not support the BZIP2 or GZIP compression types for the Parquet input data format.


Data format*

Default Value: CSV
Example: JSON

String/Expression/Dropdown list

None

Select one of the following data formats for the input data: CSV, JSON, or Parquet.

Field Delimiter

Default Value: , (Comma)
Examples

  • | (Pipe)

  • \t (tab)

String/Expression

Appears when you select CSV for Data Format.

Enter the field delimiter character used in the input data.

Record Delimiter

Default Value: LF (new line)
Example: CRLF

String/Expression/Suggestion

Appears when you select CSV for Data Format.

Enter the record delimiter used in the input data.

Quote Character

Default Value: " (double quote)
Example: ' (single quote)

String/Expression

Appears when you select CSV for Data Format.

Enter the quote character used in the input data.

Quote Escape Character

Default Value: \
Example: " (double quote)

String/Expression

Appears when you select CSV for Data Format.

Enter the escape character used to escape quote characters inside values, where the values are wrapped in quote characters in the input data. Note that some CSV files use a doubled quote character to escape double quotes inside values. For example, "He said ""Yes""" is parsed as: He said "Yes".
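Both escaping styles can be checked with Python's csv module; a minimal sketch:

```python
import csv
import io

# Style 1: a doubled quote character escapes quotes inside a quoted value
# (the default behavior of Python's csv module).
doubled = '"He said ""Yes""",42\n'
row1 = next(csv.reader(io.StringIO(doubled)))
# row1[0] is: He said "Yes"

# Style 2: a backslash escape character (the Snap's default Quote Escape Character).
backslash = '"He said \\"Yes\\"",42\n'
row2 = next(csv.reader(io.StringIO(backslash), escapechar="\\", doublequote=False))
# row2[0] is also: He said "Yes"
```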

File Header Info

Default Value: USE
Examples

  • NONE

  • IGNORE

Dropdown list

Appears when you select CSV for Data Format.

Specify whether to use the header information in the SELECT query.

Show Advanced Properties

Default Value: Deselected

Checkbox

None

Select this checkbox to display the advanced properties.
Deselect this checkbox to hide the properties.

Maximum Retries*

Default Value: 3
Example: 2

Integer/Expression

Appears when you select the Show Advanced Properties checkbox.

Specify the maximum number of retry attempts to perform in case of a temporary network loss.
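A retry policy of this kind is commonly implemented as a bounded retry loop with exponential backoff. The sketch below is a generic illustration (the helper name and delays are assumptions, not the Snap's internals):

```python
import time

def with_retries(fn, max_retries=3, base_delay=1.0, sleep=time.sleep):
    """Call fn(); on failure, retry up to max_retries times with exponential backoff."""
    attempt = 0
    while True:
        try:
            return fn()
        except Exception:
            attempt += 1
            if attempt > max_retries:
                raise  # retries exhausted; surface the last error
            sleep(base_delay * 2 ** (attempt - 1))
```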

Snap Execution

Default Value
Example: Validate & Execute

Dropdown list

None

Select one of the following three modes in which the Snap executes:

  • Validate & Execute: Performs limited execution of the Snap, and generates a data preview during Pipeline validation. Subsequently, performs full execution of the Snap (unlimited records) during Pipeline runtime.

  • Execute only: Performs full execution of the Snap during Pipeline execution without generating preview data.

  • Disabled: Disables the Snap and all Snaps that are downstream from it.

Examples

Selecting a Subset of Data from an Amazon S3 Object (CSV file)

This example Pipeline demonstrates how to use the Amazon S3 Select Snap to select a subset of data from a CSV file.

Pipeline shows multiple outputs for S3 Select

Prerequisites: A valid AWS S3 Account

In this example we take a CSV file and use S3 Select to output different subsets of the data.

Overview of steps:

  1. Configure the CSV Generator Snap to generate a new CSV document for the downstream Snap in the Pipeline.

     

  2. Configure the CSV Formatter Snap to format the data as specified in the Snap's settings.

    CSV Formatter Settings

     

  3. Configure the S3 Upload Snap to upload the object (select/demo.csv) to the S3 bucket.

    S3 Upload configuration Settings

     

  4. Use the Copy Snap to send the same information to multiple endpoints. Configure the Copy Snap to copy the S3 object document stream to the Snap's output views. In this example we configured four different output views.

  5. Configure the S3 Select Snap to select a subset of the data from the S3 object. On validation, the Snap retrieves the data based on the SELECT statement.

    Under Settings, expand the Input Data Format section to select options for File Header Info. This is where you specify whether to use the header information in the SELECT statement.

    S3 Select showing File Header Info option selected is USE

  6. Add a CSV Parser Snap after each S3 Select Snap to read the CSV binary data from its input view, parse it, and then write it to its output view.
    In this example we configured different output views that use different settings for File Header Info:

    1. Output 1: USE means the header data is used in the SELECT statement. In this example we are referencing a column name in the header (variety = 'Setosa').
      SELECT * FROM s3Object s WHERE s.variety = 'Setosa'

      CSV Parser 1 output table

       

    2. Output 2: IGNORE means the header data is ignored in the SELECT statement. Here, we are using a column index instead of the column name to refer to the same column in the header (s._5 = 'Setosa').
      SELECT * FROM s3Object s WHERE s._5 = 'Setosa'

      CSV Parser 2 output table

       

    3. Output 3: NONE means there is no header data in the SELECT statement.
      SELECT * FROM s3Object s

      CSV Parser 3 output table

       
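The three header modes above can be illustrated locally with Python's csv module. The sample data below is a small stand-in for the example's CSV (in the example's actual data, variety is the fifth column, hence s._5):

```python
import csv
import io

# Stand-in CSV with three columns; "variety" is the third column here.
data = "sepal,petal,variety\n5.1,1.4,Setosa\n7.0,4.7,Versicolor\n"

# USE: the header row names the columns, so filter by name (like s.variety = 'Setosa').
by_name = [r for r in csv.DictReader(io.StringIO(data)) if r["variety"] == "Setosa"]

# IGNORE: skip the header row and filter by position (like s._3 = 'Setosa' here).
rows = list(csv.reader(io.StringIO(data)))[1:]
by_index = [r for r in rows if r[2] == "Setosa"]

# NONE: every row, header included, is treated as data.
all_rows = list(csv.reader(io.StringIO(data)))
```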

Download this Pipeline.


Selecting a Subset of Data from an Amazon S3 Object (JSON file)

This example Pipeline demonstrates how to use the Amazon S3 Select Snap to select a subset of data from a JSON file.

Pipeline showing the Snaps in this example

Prerequisites: A valid AWS S3 Account

In this example we take a JSON file and use S3 Select to select a subset of the data (in this case, employees from the Sales department).

Overview of steps:

  1. Use the JSON Generator Snap to generate a new JSON document for the next Snap in the Pipeline.

  2. Configure the JSON Formatter Snap to format the data as specified in the Snap's settings.

  3. Configure the S3 Upload Snap to upload the S3 object (employees.json) to the S3 object bucket.

    S3 Upload configuration Settings

     

  4. Configure the S3 Select Snap to select a subset of the data from the S3 object (in this case, we are selecting employees from the Sales department). On validation, the Snap retrieves the data based on the SELECT statement.

    S3 Select configuration Settings

     

  5. Use the JSON Parser Snap to read the JSON binary data from its input view, parse it, and then write it to its output view.
    The output shows the list of employees from the Sales department.
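The department filter in this example can be sketched in plain Python. The record shape below (an array of employee objects with a department field) is an assumption based on the example's description, not the actual contents of employees.json:

```python
import json

# Hypothetical records mirroring the example's employees.json
raw = json.dumps([
    {"name": "Ana", "department": "Sales"},
    {"name": "Ben", "department": "Engineering"},
    {"name": "Cara", "department": "Sales"},
])

# Equivalent of: SELECT * FROM S3Object[*] s WHERE s.department = 'Sales'
employees = json.loads(raw)
sales = [e for e in employees if e["department"] == "Sales"]
```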

Download this Pipeline.

Downloads

  1. Download and import the Pipeline into SnapLogic.

  2. Configure Snap accounts, as applicable.

  3. Provide Pipeline parameters, as applicable.

 

Snap Pack History


Related Links