...
You can use this Snap to execute a SELECT query on CSV, JSON, and Parquet S3 objects. CSV and JSON objects may be compressed in GZIP or BZIP2 format.
AWS S3 does not define folder objects; forward-slash ('/') characters are simply part of S3 object key names.
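Because keys are flat strings, a "folder" is only a shared key prefix. The sketch below is a local illustration, not a call to AWS: it groups keys the way the Delimiter / CommonPrefixes option of the S3 ListObjectsV2 API does. The function name and sample keys are hypothetical.

```python
def common_prefixes(keys, prefix="", delimiter="/"):
    """Group flat S3-style keys into direct objects and rolled-up prefixes.

    Hypothetical helper that mimics delimiter-based listing: keys with no
    further delimiter after `prefix` are objects; the rest are collapsed into
    the shared prefix up to and including the next delimiter.
    """
    objects, prefixes = [], []
    for key in sorted(keys):
        if not key.startswith(prefix):
            continue  # key is outside the requested prefix
        rest = key[len(prefix):]
        idx = rest.find(delimiter)
        if idx == -1:
            objects.append(key)  # no further '/', so it is a direct object
        else:
            p = prefix + rest[: idx + len(delimiter)]
            if p not in prefixes:
                prefixes.append(p)  # folder-like prefix; no object exists for it
    return objects, prefixes


keys = ["sales/2023/q1.csv", "sales/2023/q2.csv", "sales/summary.csv", "readme.txt"]
print(common_prefixes(keys))            # prints (['readme.txt'], ['sales/'])
print(common_prefixes(keys, "sales/"))  # prints (['sales/summary.csv'], ['sales/2023/'])
```

Note that neither `sales/` nor `sales/2023/` is an object; each exists only because some key begins with it.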
...
This Snap captures metadata and lineage information from the input document.
...
Snap Type
The S3 Select Snap is a Read-type Snap that reads a subset of your S3 data based on a SELECT query.
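To illustrate what "a subset" means, the following sketch emulates locally, with Python's `csv` module, the filtering that a query such as `SELECT s.name, s.age FROM S3Object s WHERE s.city = 'Seattle'` performs on a CSV object that has a header row. This is an illustration only: S3 Select evaluates the query on the AWS side and returns just the matching data, so the Snap does not download and filter the whole object this way. The function name and sample data are hypothetical.

```python
import csv
import io


def select_where(csv_text, columns, column, value):
    """Locally emulate
    SELECT s.<columns...> FROM S3Object s WHERE s.<column> = '<value>'
    on CSV text whose first line is a header row (File Header Info = USE).
    """
    # DictReader takes column names from the header row, similar to USE.
    reader = csv.DictReader(io.StringIO(csv_text))
    # Keep only matching rows, and only the requested columns of each row.
    return [[row[c] for c in columns] for row in reader if row[column] == value]


data = "name,city,age\nAlice,Seattle,34\nBob,Austin,28\nCarol,Seattle,41\n"
print(select_where(data, ["name", "age"], "city", "Seattle"))
# prints [['Alice', '34'], ['Carol', '41']]
```

As in S3 Select, CSV values come back as strings here; numeric comparisons in an S3 Select query need an explicit `CAST`.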
...
Known Issues
None.
Snap Views
Type | Format | Number of Views | Examples of Upstream and Downstream Snaps | Description |
---|---|---|---|---|
Input | Document | | | An upstream Snap is optional. The input can be any document with key-value pairs used to evaluate expression properties. |
Output | Binary | | | The binary data returned by the SELECT query. |
Error | | | | Error handling is a generic way to handle errors without losing data or failing the Snap execution. To handle errors that the Snap might encounter while running the Pipeline, choose an option from the When errors occur list under the Views tab. Learn more about Error handling in Pipelines. |
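The Output view emits binary data. Assuming the Snap wraps the Amazon S3 SelectObjectContent API, the query result arrives as an event stream in which only `Records` events carry data, while events such as `Stats`, `Progress`, and `End` carry metadata. The sketch below uses hypothetical, hard-coded events in the dictionary shape boto3 yields from `select_object_content(...)["Payload"]` to show how the payload chunks form a single binary result; whether the Snap assembles its output exactly this way is an assumption.

```python
def collect_records(events):
    """Concatenate the Records payloads of an S3 Select-style event stream.

    `events` is any iterable of dicts shaped like boto3's event stream items;
    metadata events are skipped and iteration stops at the End event.
    """
    chunks = []
    for event in events:
        if "Records" in event:
            chunks.append(event["Records"]["Payload"])  # raw bytes of the result
        elif "End" in event:
            break  # the service signals that the result is complete
    return b"".join(chunks)


# Hypothetical events standing in for a real response stream.
events = [
    {"Records": {"Payload": b"Alice,34\n"}},
    {"Stats": {"Details": {"BytesScanned": 100, "BytesProcessed": 100, "BytesReturned": 18}}},
    {"Records": {"Payload": b"Carol,41\n"}},
    {"End": {}},
]
result = collect_records(events)  # result == b"Alice,34\nCarol,41\n"
```

A record can be split across two `Records` events, which is why the chunks are joined as bytes before any decoding or parsing.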
Snap Settings
Field Name | Field Type | Field Dependency | Description |
---|---|---|---|
Label* Default Value: S3 Select | String | None | The name for the Snap. You can modify this to be more specific, especially if you have more than one of the same Snap in your Pipeline. |
Bucket* Default Value: None | String/Expression/Suggestion | None | Specify the name of the S3 bucket where the S3 objects are located. Note: If you enter an incorrect region name but the bucket name is valid, the AWS S3 service may still access the bucket without any error. |
Object Key* Default Value: None | String/Expression/Suggestion | None | Specify the S3 object key name, which may include one or more forward-slash ('/') characters. The forward slash is part of the S3 object key name; AWS S3 does not define folder objects. The suggestion list shows at most 1,000 keys. |
Select Query* Default Value: None | String/Expression | None | Enter the SELECT query to execute on the S3 object. For details on the SELECT command, see the Amazon documentation: https://docs.aws.amazon.com/AmazonS3/latest/userguide/s3-glacier-select-sql-reference-select.html |
Input Data Format | | | |
Compression Type* Default Value: None | String/Expression/Dropdown list | None | Select one of the following compression types: NONE, GZIP, or BZIP2. Amazon S3 Select does not support GZIP or BZIP2 compression for Parquet input, so use NONE for Parquet objects. |
Data Format* Default Value: CSV | String/Expression/Dropdown list | None | Select the format of the input data: CSV, JSON, or Parquet. |
Field Delimiter Default Value: | String/Expression | Appears when you select CSV for Data Format. | Enter the field delimiter character used in the input data. |
Record Delimiter Default Value: LF (new line) | String/Expression/Suggestion | Appears when you select CSV for Data Format. | Enter the record delimiter used in the input data. |
Quote Character Default Value: | String/Expression | Appears when you select CSV for Data Format. | Enter the quote character used in the input data. |
Quote Escape Character Default Value: | String/Expression | Appears when you select CSV for Data Format. | Enter the character used to escape quote characters inside quoted values in the input data. Some CSV files escape a double quote inside a value with another double quote; for example, "He said ""hi""" represents the value He said "hi". |
File Header Info Default Value: USE | Dropdown list | Appears when you select CSV for Data Format. | Specify whether to use the header information in the SELECT query. |
Show Advanced Properties Default Value: Deselected | Checkbox | None | Select this checkbox to display the advanced properties. |
Maximum Retries* Default Value: 3 | Integer/Expression | Appears when you select the Show Advanced Properties checkbox. | Specify the maximum number of retry attempts if a temporary network loss occurs. |
Snap Execution Default Value: | Dropdown list | None | Select the mode in which the Snap executes. |
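The input settings above correspond closely to the InputSerialization structure of the Amazon S3 SelectObjectContent API. The sketch below, assuming the Snap builds an equivalent request, maps Data Format, Compression Type, and the CSV options onto that structure and rejects compressed Parquet input. The helpers `build_input_serialization` and `build_select_request` are hypothetical, as are the bucket and key; the JSON `LINES` type and the CSV output format are assumptions, because the Snap does not expose those settings.

```python
def build_input_serialization(data_format, compression="NONE", csv_options=None):
    """Map Snap-style input settings onto the InputSerialization structure
    of the Amazon S3 SelectObjectContent API (the shape boto3 accepts)."""
    fmt = data_format.upper()
    comp = compression.upper()
    if comp not in ("NONE", "GZIP", "BZIP2"):
        raise ValueError(f"unsupported compression type: {compression}")
    if fmt == "PARQUET":
        # S3 Select does not support GZIP or BZIP2 compression for Parquet input.
        if comp != "NONE":
            raise ValueError("Parquet input requires compression type NONE")
        return {"Parquet": {}, "CompressionType": comp}
    if fmt == "CSV":
        # Defaults mirror the documented Snap defaults:
        # File Header Info = USE, Record Delimiter = LF.
        options = {"FileHeaderInfo": "USE", "RecordDelimiter": "\n"}
        options.update(csv_options or {})
        return {"CSV": options, "CompressionType": comp}
    if fmt == "JSON":
        # Assumption: one JSON object per line; the API also accepts "DOCUMENT".
        return {"JSON": {"Type": "LINES"}, "CompressionType": comp}
    raise ValueError(f"unsupported data format: {data_format}")


def build_select_request(bucket, key, query, input_serialization):
    """Assemble keyword arguments for a SelectObjectContent call.
    Assumption: results are requested as CSV."""
    return {
        "Bucket": bucket,
        "Key": key,
        "ExpressionType": "SQL",
        "Expression": query,
        "InputSerialization": input_serialization,
        "OutputSerialization": {"CSV": {}},
    }


params = build_select_request(
    "my-bucket",             # hypothetical bucket
    "sales/2023/q1.csv.gz",  # hypothetical object key
    "SELECT s.name FROM S3Object s WHERE s.city = 'Seattle'",
    build_input_serialization(
        "CSV", "GZIP",
        {"FieldDelimiter": ",", "QuoteCharacter": '"', "QuoteEscapeCharacter": '"'},
    ),
)
```

With boto3 installed and AWS credentials configured, these arguments could be passed as `boto3.client("s3").select_object_content(**params)`. Validating the Parquet and compression combination before sending the request surfaces the error early instead of waiting for the service to reject it.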
Examples
Selecting a Subset of Data from an Amazon S3 Object (CSV file)
...