Redshift - Select

Snap type

Read

Description

This Snap fetches data from a database, given a table name and a configured connection. The Snap writes the records from the database to its output view, where they can be processed by a downstream Snap.

ETL Transformations & Data Flow

This Snap enables the following ETL operations:

Fetch data from an existing Redshift table using the user configuration, and feed it to downstream Snaps.

JSON paths can be used in a query and will have values from an incoming document substituted into the query. However, documents missing values for a given JSON path will be written to the Snap's error view. After a query is executed, the query's results are merged into the incoming document overwriting any existing keys' values. The original document is output if there are no results from the query.
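The merge behavior described above can be sketched in a few lines of Python. This is only an illustration of the paragraph, not the Snap's implementation: `merge_results` is a hypothetical helper, and documents and result rows are modeled as plain dictionaries.

```python
def merge_results(incoming, rows):
    """Model the documented merge: one output document per result row."""
    if not rows:
        # No results from the query: the original document is output.
        return [dict(incoming)]
    # Keys from the query result overwrite keys already in the incoming document.
    return [{**incoming, **row} for row in rows]


incoming = {"id": "7", "name": "stale"}
print(merge_results(incoming, [{"name": "Tom", "inst_dt": "2017-06-12"}]))
print(merge_results(incoming, []))
```

In the first call the value of `name` comes from the query result; in the second call the incoming document is returned unchanged.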


Queries produced by the Snap are equivalent to the following format:

SELECT * FROM [table] WHERE [where clause] ORDER BY [ordering] LIMIT [limit] OFFSET [offset]


If more powerful functionality is desired, then the Execute Snap should be used.
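As a rough sketch of how the settings map onto the query format shown in this section, the following Python function assembles a statement from optional parts. `build_select` and its parameter names are hypothetical and are not part of SnapLogic; the sketch does no identifier quoting or escaping and is meant only to show which clause each setting contributes.

```python
def build_select(table, where=None, order_by=None, limit=None, offset=None, fields=None):
    """Assemble a SELECT statement from optional settings (illustrative only)."""
    # An empty field list corresponds to selecting all columns.
    columns = ", ".join(fields) if fields else "*"
    parts = [f"SELECT {columns} FROM {table}"]
    if where:
        parts.append(f"WHERE {where}")
    if order_by:
        parts.append("ORDER BY " + ", ".join(order_by))
    if limit is not None:
        parts.append(f"LIMIT {limit}")
    if offset is not None:
        parts.append(f"OFFSET {offset}")
    return " ".join(parts)


print(build_select("demo_guest", where="name = 'Tom'", order_by=["inst_dt"], limit=10, offset=0))
# SELECT * FROM demo_guest WHERE name = 'Tom' ORDER BY inst_dt LIMIT 10 OFFSET 0
```

Settings that are left empty simply drop their clause, so `build_select("demo_guest")` yields `SELECT * FROM demo_guest`.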

Input & Output

  • Input: This Snap can have an upstream Snap with a document output view, such as Mapper or JSON Generator.

  • Output: A document or a set of documents that contains the query result for each input document. If no input document is provided, the query is executed only once.

Limitations and Known Issues

If you use the PostgreSQL driver (org.postgresql.Driver) with the Redshift Snap Pack, it could result in errors if the data type provided to the Snap does not match the data type in the Redshift table schema. Either use the Redshift driver (com.amazon.redshift.jdbc42.Driver) or use the correct data type in the input document to resolve these errors.

Prerequisites

None.

Support for Ultra Pipelines

Works in Ultra Task Pipelines.

Behavior Change

Starting with version main22460, in the Redshift Select Snap:

  • When you create a table in Redshift, by default, all column names are displayed in lowercase in the output.
  • When you enter column names in uppercase in the Output Field property, the column names are displayed in lowercase in the output.


As of the March 2023 release, when you configure Output fields in the Redshift Select Snap and deselect the Match data types checkbox, the label shown for the timestamptz data type in the output preview depends on the JDBC driver. With the Redshift JDBC driver, the Snap prefixes Redshift to the Timestamp label. With the PostgreSQL JDBC driver, the Snap displays the labels as configured in the Snap settings. This does not impact the performance of the Snap.

Note: When you select the Match data types checkbox, the behavior of the Snap is the same with either the PostgreSQL or the Redshift driver: the label names are displayed as configured in the Snap settings.

Configurations

Account and Access

This Snap uses account references created on the Accounts page of SnapLogic Manager to handle access to this endpoint. See Redshift Account for information on setting up this type of account.

Views

Input

This Snap allows zero or one input view. If the input view is defined, then the where clause can substitute incoming values for a given expression (for example, to use the Snap as a lookup).

Output

This Snap has one output view by default and produces one document for each row in the table. A second view can be added to output the table metadata as a document. The metadata document can then be fed into the second input view of the Redshift Insert or Bulk Load Snap so that the table is created in Redshift with a schema similar to that of the source table. See the Redshift Snaps for more information.

Error

This Snap has at most one error view and produces zero or more documents in the view.
 

Settings

Label


Required The name for the Snap. You can modify this to be more specific, especially if you have more than one of the same Snap in your pipeline.

Schema name


The database schema name. Selecting a schema filters the Table name list to show only those tables within the selected schema. The property is suggestible and will retrieve available database schemas during suggest values.

Table name


Required. Name of the table to execute the select on.
Example: people

Where clause 

Where clause of the select statement. This supports document value substitution (for example, $person.firstname is substituted with the value found in the incoming document at that path). However, you cannot use a value substitution after the word "IS" or "is". See the examples below:

Examples

  • email = 'you@example.com' or email = $email
  • email IS NOT NULL
  • email IS NULL
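To make the substitution idea concrete, the sketch below replaces each `$path` reference in a where clause with a `?` placeholder and collects the matching values from the incoming document. How the Snap performs substitution internally is not stated here, so the use of placeholders is an assumption, and `bind_where`, `lookup`, and the regular expression are hypothetical. A missing path raises `KeyError`, standing in for a document being routed to the error view.

```python
import re

# Matches JSON-path style references such as $email or $person.firstname.
PATH = re.compile(r"\$([A-Za-z_]\w*(?:\.[A-Za-z_]\w*)*)")


def lookup(doc, path):
    """Walk a dotted path through nested dictionaries; KeyError if a key is absent."""
    value = doc
    for key in path.split("."):
        value = value[key]
    return value


def bind_where(where, doc):
    """Replace each $path with '?' and return the clause plus the values to bind."""
    values = []

    def replace(match):
        values.append(lookup(doc, match.group(1)))
        return "?"

    return PATH.sub(replace, where), values


clause, params = bind_where(
    "email = 'you@example.com' or email = $email",
    {"email": "me@example.com"},
)
print(clause, params)
```

The sketch does not parse SQL, so a `$` inside a quoted string literal would also be treated as a path; a real implementation would need to account for that.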
 

Order by: Column names 

Enter the columns in the order in which you want the results sorted. The default database sort order is used.

Example

name

email  

Limit offset

Starting row for the query.
Example: 0
Default value: [None] 

Limit rows 

Number of rows to return from the query.
Example: 10

Default value: [None] 
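Limit offset and Limit rows together select a window of rows. As a small illustration, the hypothetical helper below computes the two values for a zero-based page number; it is a sketch of the arithmetic only and is not something the Snap provides.

```python
def page_window(page, page_size):
    """Return (limit_offset, limit_rows) for a zero-based page number."""
    if page < 0 or page_size <= 0:
        raise ValueError("page must be >= 0 and page_size must be > 0")
    return page * page_size, page_size


# Third page of 10 rows: skip the first 20 rows, then return 10.
print(page_window(2, 10))  # (20, 10)
```

Paging like this is only repeatable when an Order by column is set, because SQL does not guarantee row order without an ORDER BY clause.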

Output fields

Enter or select output field names for SQL SELECT statement. To select all fields, leave it at default.

Example: email, address, first, last, etc.

Default value: [None]

Fetch Output Fields In Schema

Select this check box to include only the selected fields or columns in the Output Schema (second output view). If you do not provide any Output fields, all the columns are visible in the output.
If you provide output fields, we recommend that you select the Fetch Output Fields In Schema check box.

Default value: Not selected

Pass through


If checked, the input document will be passed through to the output view under the key 'original'.

Default value: Selected

Ignore empty result


If selected, no document will be written to the output view when a SELECT operation does not produce any result. If this property is not selected and the Pass through property is selected, the input document will be passed through to the output view.

Default value: Not selected
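The interaction between Pass through and Ignore empty result when a query returns no rows can be summarized as a small decision function. This is a sketch under stated assumptions: `on_empty_result` is hypothetical, wrapping the input under the key 'original' follows the Pass through description, and the case where both properties are deselected is not described here, so the sketch assumes nothing is written.

```python
def on_empty_result(incoming, pass_through=True, ignore_empty=False):
    """Return the documents written to the output view when the query has no rows."""
    if ignore_empty:
        # Ignore empty result selected: no document is written.
        return []
    if pass_through:
        # Pass through selected: the input document is passed through.
        # Assumption: it appears under the key 'original'.
        return [{"original": incoming}]
    # Assumption: this combination is not documented; emit nothing.
    return []


doc = {"id": "42"}
print(on_empty_result(doc))
print(on_empty_result(doc, ignore_empty=True))
```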

Auto commit

Select one of the options for this property to override the state of the Auto commit property on the account. Auto commit at the Snap level has three values: True, False, and Use account setting. The expected behavior for each mode is:

  • True - The Snap executes with auto-commit enabled regardless of the value set for Auto commit in the Account used by the Snap.
  • False - The Snap executes with auto-commit disabled regardless of the value set for Auto commit in the Account used by the Snap.
  • Use account setting - The Snap executes with the Auto commit value inherited from the Account used by the Snap.

Default value: False
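The three modes reduce to a simple precedence rule, sketched below. `effective_auto_commit` is a hypothetical helper used only to illustrate the rule; the option strings mirror the labels listed above.

```python
def effective_auto_commit(snap_setting, account_auto_commit):
    """Resolve the auto-commit state from the Snap-level and account-level settings."""
    if snap_setting == "Use account setting":
        # Inherit whatever the account is configured with.
        return account_auto_commit
    if snap_setting == "True":
        return True
    if snap_setting == "False":
        return False
    raise ValueError(f"unknown Auto commit option: {snap_setting!r}")


print(effective_auto_commit("False", account_auto_commit=True))                # False
print(effective_auto_commit("Use account setting", account_auto_commit=True))  # True
```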

Number of retries

Specifies the maximum number of attempts to be made to receive a response. The request is terminated if the attempts do not result in a response.

If the value is larger than 0, the Snap first downloads the target file into a temporary local file. If any error occurs during the download, the Snap waits for the time specified in the Retry interval and attempts to download the file again from the beginning. When the download is successful, the Snap streams the data from the temporary file to the downstream Pipeline. All temporary local files are deleted when they are no longer needed.

Ensure that the local drive has sufficient free disk space to store the temporary local file.

Example: 3

Default value: 0

Retry interval (seconds)

Specifies the time interval between two successive retry requests. A retry happens only when the previous attempt resulted in an exception. 

Example:  10

Default value: 1
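Number of retries and Retry interval describe a conventional retry loop: try once, and after each exception wait for the interval before trying again, up to the configured number of retries. The sketch below shows that pattern in Python. `run_with_retries` is hypothetical, its defaults mirror the defaults listed above, and it assumes that a value of 0 means a single attempt with no retry.

```python
import time


def run_with_retries(operation, retries=0, interval_seconds=1, sleep=time.sleep):
    """Call operation(); on an exception, wait and retry up to `retries` times."""
    attempt = 0
    while True:
        try:
            return operation()
        except Exception:
            if attempt >= retries:
                # Retries exhausted: let the last exception propagate.
                raise
            attempt += 1
            sleep(interval_seconds)


attempts = {"count": 0}


def flaky_query():
    """Fail twice, then succeed, to exercise the retry path."""
    attempts["count"] += 1
    if attempts["count"] < 3:
        raise RuntimeError("temporary failure")
    return "rows"


print(run_with_retries(flaky_query, retries=3, interval_seconds=0))
```

The `sleep` parameter exists only so the wait can be replaced in tests; in normal use the default `time.sleep` applies.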

Match data types

Conditional. This property applies only when the Output fields property is provided with any field value(s).

If this property is selected, the Snap tries to match the output data types to those produced when the Output fields property is empty (SELECT * FROM ...). The output preview is then in the same format as when SELECT * FROM is implied and all the contents of the table are displayed.

Default value: Not selected

Staging mode

Required when the value in the Number of retries field is greater than 0. 

Specify the location from the following options to store input documents between retries:

  • In memory: The query results are stored in the Snaplex memory. If the results are too large to fit in the available memory, the Snap may fail; in that case, choose the On disk option.
  • On disk: The query results are stored on the disk in a temporary (tmp) directory that is managed by the SnapLogic platform. This directory is deleted automatically when the Snap terminates.

Snap Execution


Default Value: Validate & Execute
Example: Execute only

Select an option to specify how the Snap must be executed. Available options are:

  • Validate & Execute: Performs limited execution of the Snap (up to 50 records) during Pipeline validation; performs full execution of the Snap (unlimited records) during Pipeline execution.

  • Execute only: Performs full execution of the Snap during Pipeline execution; does not execute the Snap during Pipeline validation.

  • Disabled: Disables the Snap and, by extension, its downstream Snaps.

The Suggest feature for the Order by columns and Output fields properties requires the Table name property to hold an actual table name rather than an expression. If it is an expression, clicking Suggest displays the error message "Could not evaluate accessor:  ...". This is because the input document is not available for the Snap to evaluate the expression at the time Suggest is clicked; the input document is available only during preview or execution.

Troubleshooting

Error: type "e" does not exist

Reason: This issue occurs due to incompatibilities with the recent upgrade in the Postgres JDBC drivers.

Resolution: Download the latest 4.1 Amazon Redshift driver, use this driver in your Redshift Account configuration, and retry running the Pipeline.

Basic Use Case


The following example uses several of the Snap's properties to select two rows of data from the table account in the schema public with a where clause.

Typical Snap Configurations


The key configurations for the Snap are:

  • Without Expression
  • With Expression

The following examples use the sample data in demo_guest.csv (available in Downloads below). Use the Redshift Bulk Load Snap to load the file into the Redshift instance, or create a table using:

CREATE TABLE "public"."demo_guest" (
    id varchar(20),
    name varchar(20),
    inst_dt timestamp
)

The pipeline can be found here: redshift-select_2017_06_12.slp (available in Downloads below)

  • Without Expressions: Select from a table with a WHERE condition and return the results in order. The configuration below is equivalent to the query:

SELECT * FROM "public"."demo_guest" WHERE "name" = 'Tom' ORDER BY "inst_dt";
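To try the query above without a Redshift cluster, the following sketch runs it against an in-memory SQLite database using Python's standard `sqlite3` module. SQLite is only a stand-in here and differs from Redshift in typing and SQL dialect; the attached database named `public` imitates the schema, and the three rows are made-up sample data, not the contents of demo_guest.csv.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# SQLite has no "public" schema, so attach a second in-memory database under that name.
conn.execute("ATTACH DATABASE ':memory:' AS public")
conn.execute(
    'CREATE TABLE "public"."demo_guest" ('
    "id varchar(20), name varchar(20), inst_dt timestamp)"
)
# Hypothetical sample rows for illustration only.
conn.executemany(
    'INSERT INTO "public"."demo_guest" VALUES (?, ?, ?)',
    [
        ("1", "Tom", "2017-06-12 10:00:00"),
        ("2", "Ann", "2017-06-11 09:00:00"),
        ("3", "Tom", "2017-06-10 08:00:00"),
    ],
)
rows = conn.execute(
    'SELECT * FROM "public"."demo_guest" WHERE "name" = \'Tom\' ORDER BY "inst_dt"'
).fetchall()
for row in rows:
    print(row)
```

Only the two rows named Tom are returned, with the earlier inst_dt value first.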


  • With Expressions:
    • Query statement from the upstream

Query a table according to the input document. The Mapper Snap connects to the Snap and provides the needed upstream input document to the Redshift Select Snap. 

The Mapper configuration:

    • Redshift Select Snap configuration:

    • Pipeline Parameter

Query a table according to a pipeline parameter. The following example pipeline uses the 'id' defined as a pipeline parameter to query the table.

 

Advanced Use Case


The following describes a pipeline with broader business logic involving multiple ETL transformations. An example use case is moving data from on-premises systems to the cloud. The sample pipeline is as follows.

In this example, the goal is to move all account data from on-premises instances to Redshift CDW so that users can run analytics on it. Account details stored in MySQL are pushed by a producer to a particular topic in Confluent Kafka. A File Reader reads another file (account/leads), which is pushed to the same topic. A consumer reads from that topic, and the data is then moved to Redshift. Redshift Select can be used to verify the moved data, and Tableau can then consume it for analytics.

The ETL Transformations

In the pipeline #1:

  1. Extract: The MySQL Select Snap reads the documents from the MySQL Database.

  2. Load:  The Confluent Kafka Producer Snap loads the documents into a topic.


In the pipeline #2:

  1. Extract: The File Reader Snap reads the records to be pushed to the Confluent Kafka topic.
  2. Transform: The Excel Parser Snap parses the records in an .xls format.
  3. Load: The Confluent Kafka Producer Snap loads the .xls documents into a topic.


In the pipeline #3:

  1. Extract: The Confluent Kafka Consumer Snap reads the documents from the same topic.
  2. Transform: The Mapper Snap maps the input documents to the Redshift Database.
  3. Load: The Redshift Bulk Load Snap loads the documents into a table. 
  4. Read: The Redshift Select Snap reads the newly loaded documents.


Downloads

Important steps to successfully reuse Pipelines

  1. Download and import the pipeline into the SnapLogic application.
  2. Configure Snap accounts as applicable.
  3. Provide pipeline parameters as applicable.

File: redshift-select_2017_06_12.slp

Modified: Aug 11, 2017 by Aparna Tayi


Snap Pack History

Release | Snap Pack Version | Date | Type | Updates

August 2024

main27765

 

Stable

  • Upgraded the org.json.json library from v20090211 to v20240303, which is fully backward compatible.
  • Upgraded the JDBC driver for the Redshift Snap Pack to v2.1.0.29 to address the SQL Injection vulnerabilities. Pipelines using the Redshift Snaps are not impacted after the driver upgrade, because the latest JDBC driver is fully backward compatible.

May 2024

437patches26634

Latest

Fixed an issue with Redshift - Execute Snap that produced logs causing node crashes.

May 2024

main26341

Stable

Updated the Delete Condition (Truncates a Table if empty) field in the Redshift - Delete Snap to Delete condition (deletes all records from a table if left blank) to indicate that all entries will be deleted from the table when this field is blank, but no truncate operation is performed.

February 2024

main25112

Stable

Updated and certified against the current SnapLogic Platform release.

November 2023

main23721

Stable

Updated and certified against the current SnapLogic Platform release.

August 2023

main22460

Stable

  • The Redshift-Bulk Load and Redshift-Bulk Upsert Snaps now support expression enablers for the Additional options field that enables you to use parameters.
  • The Redshift - Execute Snap now includes a new Query type field. When Auto is selected, the Snap tries to determine the query type automatically.


Behavior Change

Starting with version main22460, in the Redshift Select Snap:

  • When you create a table in Redshift, by default, all column names are displayed in lowercase in the output.
  • When you enter column names in uppercase in the Output Field property, the column names are displayed in lowercase in the output.

May 2023

main21015 

Stable

Upgraded with the latest SnapLogic Platform release.

February 2023

432patches20500

 Latest

The Redshift Account no longer fails when a URL is entered in the JDBC URL field and no driver is specified.

February 2023

432patches20166

Latest

Updated the description for S3 Security Token field as follows:

Specify the S3 security token part of AWS Security Token Service (STS) authentication. It is not required unless a particular S3 credential is configured to require it.

February 2023

432patches20101

Latest

  • The JDBC driver class for Redshift accounts is bundled with com.amazon.redshift.jdbc42.Driver as the default driver. This upgrade is backward-compatible. Existing pipelines continue to work as expected, and new pipelines use the Redshift driver by default. SnapLogic will provide fixes for issues you might encounter with accounts that use the PostgreSQL driver only until November 2023.
    After November 2023, SnapLogic will not provide support for issues with the PostgreSQL driver. Therefore, we recommend that you migrate from the PostgreSQL JDBC driver to the Redshift JDBC driver. Learn more about migrating from the PostgreSQL JDBC Driver to the Amazon Redshift Driver. (432patches20101)

  • The Instance type option in the Redshift Bulk Load Snap enables you to use the Amazon EC2 R6a instance. This property appears only when the parallelism value is greater than one.

February 2023

432patches20035

Latest

The Redshift Snaps that earlier supported only Redshift Cluster now support Redshift Serverless as well. With Redshift Serverless, you can avoid setting up and managing data warehouse infrastructure when you run or scale analytics.

February 2023

main19844

Stable

Upgraded with the latest SnapLogic Platform release.

November 2022

main18944

Stable

The Redshift - Insert Snap now creates the target table only from the table metadata of the second input view when the following conditions are met:

  • The Create table if not present checkbox is selected.

  • The target table does not exist.

  • The table metadata is provided in the second input view.

August 2022

430patches17189

Latest

August 2022

main17386

Stable

The Redshift accounts support:

  • Expression enabler to pass values from Pipeline parameters.

  • Security Token for S3 bucket external staging.

4.29 Patch

429patches16908

Latest

  • Enhanced the Redshift accounts with the following:

    • Expression enabler to pass values from Pipeline parameters.

    • Support for Security Token for S3 bucket external staging.

  • Fixed an issue with Redshift - Execute Snap where the Snap failed when the query contained comments with single or double quotes in it. Now the Pipeline executes without any error if the query contains a comment.

4.29 Patch

429patches15806

 Latest

Fixed an issue with Redshift Account and Redshift SSL Account where the Redshift Snaps failed when the S3 Secret key or S3 Access-key ID contained special characters, such as +.

4.29

main15993

 

Stable

Upgraded with the latest SnapLogic Platform release.

4.28

main14627

Stable

Updated the label for Delete Condition to Delete Condition (Truncates Table if empty) in the Redshift Delete Snap.

4.27 Patch

427patches12999

Latest

Fixed an issue with the Redshift Bulk Load Snap, where the temporary files in S3 were not deleted for aborted or interrupted Pipelines.

4.27

main12833

Stable

Enhanced the Redshift - Execute Snap to invoke stored procedures.

4.26

main11181

Stable

Upgraded with the latest SnapLogic Platform release.

4.25 Patch

425patches11008

Latest

Updated the AWS SDK from version 1.11.688 to 1.11.1010 in the