Redshift - Execute

Overview

You can use the Redshift Execute Snap to execute arbitrary SQL queries. It executes DML (SELECT, INSERT, UPDATE, DELETE) type statements. This Snap works best with single queries.

Snap type

The Redshift Execute Snap is a Write-type Snap that writes the results of the executed SQL queries.

Prerequisites

Supported Redshift database environment and Redshift database account with valid access control.

Support for Ultra Pipelines

Works in Ultra Pipelines

Limitations

If you use the PostgreSQL driver (org.postgresql.Driver) with the Redshift Snap Pack, it could result in errors if the data type provided to the Snap does not match the data type in the Redshift table schema. Either use the Redshift driver (com.amazon.redshift.jdbc42.Driver) or use the correct data type in the input document to resolve these errors.

  • When the SQL statement field is an expression, the pipeline parameters are shown in the suggestion list, but the input schema is not.

  • Multiple queries might not work, because the underlying JDBC driver does not support them. We recommend using the Redshift - Multi Execute Snap to run multiple queries.

Snap Views

Type

Format

Number of Views

Examples of Upstream and Downstream Snaps

Description

Input 

Document

  • Min: 0

  • Max: 1

  • Mapper

  • JSON Generator

A document that contains the data to be used with the JSON paths defined in the SQL statement, if any, or to be passed through.

If the input view is defined, the WHERE clause substitutes incoming values into a specific query.
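For instance, a minimal sketch of such a substitution (the EMP table is assumed for illustration; EMPNO appears in the example elsewhere in this article):

```sql
-- Illustrative only. Input document: { "EMPNO": 7839 }
-- $EMPNO is a JSON path that the Snap binds as a prepared-statement parameter.
SELECT * FROM EMP WHERE EMPNO = $EMPNO
```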

Output

Document

 

  • Min: 0

  • Max: 1

  • JSON Formatter

  • Mapper

A document with the result set output, if any, and the update count as the return status from the Redshift SQL. The Snap outputs all records of a batch (as configured in your account settings) to the error view if the write fails during batch processing.

If an output view is available and an update/insert/merge/delete statement was executed, then the original document that was used to create the statement will be output with the status of the executed statement.

Error

Error handling is a generic way to handle errors without losing data or failing the Snap execution. You can handle the errors that the Snap might encounter when running the Pipeline by choosing one of the following options from the When errors occur list under the Views tab:

  • Stop Pipeline Execution: Stops the current Pipeline execution when the Snap encounters an error.

  • Discard Error Data and Continue: Ignores the error, discards that record, and continues with the remaining records.

  • Route Error Data to Error View: Routes the error data to an error view without stopping the Snap execution.

Learn more about Error handling in Pipelines.

Snap Settings

  • Asterisk ( * ): Indicates a mandatory field.

  • Suggestion icon: Indicates a list that is dynamically populated based on the configuration.

  • Expression icon: Indicates the value is an expression (if enabled) or a static value (if disabled). Learn more about Using Expressions in SnapLogic.

  • Add icon: Indicates that you can add fields in the field set.

  • Remove icon: Indicates that you can remove fields from the field set.

Field Name

Field Type

Description

Label*


Default Value: Redshift - Execute
Example: Redshift - Execute

String

Specify the name for the Snap.

SQL statement*


Default Value: N/A
Example: SELECT * FROM EMP WHERE EMPNO=$EMPNO and ENAME=$EMPNAME

 

String/Expression

Specify the SQL statement to execute on the server. 

  • Redshift allows using \ (backslash) or ' (single quote) to escape special characters in the SQL. We recommend using ' (single quote) in the SQL statement to escape special characters.

  • We recommend adding a single query in the SQL statement field.

Valid JSON paths defined in the WHERE clause of queries/statements are substituted with values from the incoming document. If the document is missing a value to be substituted into the query/statement, the document is written to the error view.

If '$' is not part of a JSON path, escape it as \$ so that it is executed as-is. For example, SELECT \$2, \$3 FROM mytable. If the character before '$' is alphanumeric, then '$' does not have to be escaped (for example, SELECT metadata$filename ...).

Following is an example of the procedure:

CREATE OR REPLACE PROCEDURE sp_inout_proc (INOUT a int, b int, INOUT c int)
AS \$\$
BEGIN
a := b * a;
c := b * c;
END;
\$\$

Query type

 

Default Value: Auto
Example: Read

Dropdown list/Expression

Select the type of query for your SQL statement (Read or Write).

When Auto is selected, the Snap tries to determine the query type automatically.
If the execution result of the query is not as expected, you can change the query type to Read or Write.

Pass through


Default Value: Selected

Checkbox

Select this checkbox to pass the input document through to the output view under the key 'original'. This field applies only when the Snap executes a SELECT statement.

 

Ignore empty result


Default Value: Deselected

Checkbox

Select this checkbox to ignore an empty result: no document is written to the output view when a SELECT operation produces no results.
If you deselect this checkbox and select the Pass through checkbox, the input document is passed through to the output view.

Number of retries

 

Default Value: 0
Example: 3

Integer/Expression

Specify the maximum number of retry attempts the Snap must make in case of a network failure that prevents it from reading the target data. The request is terminated if the attempts do not result in a response.

  • If the Number of retries value is set to 0 (the default value), the retry option is disabled and the Snap does not initiate a retry. Any failure encountered during the database operation immediately fails the pipeline without any retry attempts to recover from the error.

  • If the Number of retries value is greater than 0, the Snap downloads the target data into a temporary local file. If any error occurs during the download, the Snap waits for the time specified in the Retry interval and then attempts the download again from the beginning. After the download succeeds, the Snap streams the data from the temporary file to the downstream pipeline. All temporary local files are deleted when they are no longer needed.

Retry interval (seconds)

 

Default Value: 1
Example: 10

Integer/Expression

Specify the time interval between two successive retry requests. A retry happens only when the previous attempt resulted in an exception. 

 

Auto commit


Default Value: Use account setting
Example: True

Dropdown list

Select one of the following options:

  • True: The Snap enables the auto-commit. The value set on this field overrides the Auto commit property set at the account level.

  • False: The Snap disables the auto-commit. The value set on this field overrides the Auto commit property set at the account level.

  • Use account setting: The Snap uses the Auto commit value set in the Account. When you select this option, you must enable the Auto commit option in the account settings.

Snap Execution


Default Value: Execute only
Example: Validate & Execute

Dropdown list

Select one of the three modes in which the Snap executes. Available options are:

  • Validate & Execute: Performs limited execution of the Snap, and generates a data preview during Pipeline validation. Subsequently, performs full execution of the Snap (unlimited records) during Pipeline runtime.

  • Execute only: Performs full execution of the Snap during Pipeline execution without generating preview data.

  • Disabled: Disables the Snap and all Snaps that are downstream from it.

Troubleshooting

Error

Reason

Resolution

Error

Reason

Resolution

type "e" does not exist

This issue occurs due to incompatibilities with the recent upgrade in the Postgres JDBC drivers.

Download the latest 4.1 Amazon Redshift JDBC driver, use this driver in your Redshift account configuration, and retry running the pipeline.


Additional Information

Scenarios to successfully execute your SQL statements

  • The non-expression form uses bind parameters, so it is much faster than executing N arbitrary SQL expressions.

  • Using expressions that join strings together to create SQL queries or conditions carries a potential SQL injection risk and is hence unsafe. Ensure that you understand all implications and risks involved before concatenating strings with the expression (=) toggle enabled.

  • The '$' sign and identifier characters, such as double quotes (“), single quotes ('), or back quotes (`), are reserved characters and should not be used in comments or for purposes other than their originally intended purpose.
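To illustrate the injection risk described above (the EMP table and ENAME column are reused from the example elsewhere in this article; the queries are illustrative, not the article's own):

```sql
-- Unsafe (expression enabled): the value of $name is spliced into the SQL text,
-- so an input such as  x' OR '1'='1  changes the meaning of the query:
--   "SELECT * FROM EMP WHERE ENAME = '" + $name + "'"

-- Safer: reference the value as a JSON path, so the Snap binds it
-- as a prepared-statement parameter instead of concatenating text.
SELECT * FROM EMP WHERE ENAME = $ENAME
```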

Single quotes in values must be escaped

Any relational database (RDBMS) treats single quotes (') as special symbols, so single quotes in the data or values passed through a DML query may cause the Snap to fail when the query is executed. To escape a single quote in such values, pass two consecutive single quotes in its place. For example:

  • If the string has no single quotes: to pass the value Schaum Series, use 'Schaum Series'.

  • If the string contains single quotes: to pass the value O'Reilly's Publication, use 'O''Reilly''s Publication'.
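Applied to a hypothetical INSERT (the books table and its columns are illustrative, not from this article):

```sql
-- Each embedded single quote is doubled to escape it.
INSERT INTO books (title, publisher)
VALUES ('O''Reilly''s Publication', 'Schaum Series');
```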

Recommendations

  • Be cautious when running your queries, because you can drop your database and lock tables while executing SQL statements.

  • Running multiple queries might not work with the Redshift - Execute Snap. If you need to run multiple queries, we recommend using the Redshift - Multi Execute Snap.

ETL Transformations and Data Flow

This Snap enables the following ETL operations/flows:

  1. Extract data from an existing Redshift Table.

  2. Transform SnapLogic data types in the input document to Redshift JDBC types, and transform Redshift JDBC types in the result set back to SnapLogic types for the output.

  3. Load data in the Redshift table.

The SQL (to be executed) is passed to Redshift. Here’s the detailed data flow:

  1. The Snap collects the account information, the SQL statement (after any expression evaluation), and any JDBC JARs defined in the Redshift database account. JDBC JARs defined in the Redshift database account are at the customer's discretion and should be Redshift approved/supported.

  2. Valid JSON paths defined in the WHERE clause of queries/statements are substituted with values from the incoming document as a prepared statement. The substituted values are transformed from SnapLogic type values to the appropriate JDBC type values based on the database column type. If there are no JSON paths, a plain JDBC query is used instead of a prepared statement.

  3. Successful execution may create a result set. The result set columns will be transformed from the JDBC type value to the SnapLogic type value.

  4. Data errors may occur, so an error view should be created to handle these conditions. If a batch has a data error, the error data is written to the error view and the rest of that batch is not processed. However, a batch data error does not stop subsequent batches from executing.

  5. SELECT SQL statements do not use auto-commit. For non-SELECT SQL, the commit happens at successful batch completion when the database account has Auto commit enabled. If the database account does not have Auto commit enabled, the commit happens at the end of a successful Snap run. Therefore, the Auto commit setting must be configured for the desired behavior. For example, if a downstream Snap needs to see the data in the database, auto-commit should be enabled.

  6. The database account uses a shared connection pool for efficiency and to prevent opening too many connections to a database. Another Snap with the same database account settings may reuse the same connection as the Redshift Execute Snap. To avoid reusing another Snap's connection, for example to isolate DML operations or to debug connection behavior, use a different database account with different settings; this makes the database connection unique to the Redshift Execute Snap.

Examples

Redshift - Execute Snap’s functionality as a standalone Snap in a pipeline

The use case requirement for this basic example is to leverage the database query engine to provide uniqueness, a different output column, and a literal output value, which the Redshift Select Snap cannot do by itself. This example demonstrates how you can leverage other database query engine capabilities (functions, joins, and so on) to create more complex results than simply getting filtered data out of a table.
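A minimal sketch of such a statement (the EMP table and ENAME column are assumed for illustration): DISTINCT provides uniqueness, a column alias renames the output column, and a string literal supplies a constant output value.

```sql
SELECT DISTINCT                    -- DISTINCT removes duplicate rows
       ENAME   AS employee_name,   -- alias: a different output column name
       'active' AS status          -- literal output value
FROM   EMP;
```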

 


Typical Snap Configurations

The key configuration of the Snap lies in how you pass the SQL statements. 

  • Without Expression: Directly passing the required SQL statement in the Redshift Execute Snap.

In the JSON path example below, a statement is prepared, the parameter values are bound from the matching input document fields, and the statement is executed using the database account's batch configuration with a batch size of 2. The JSON Generator preview shows the input values flowing into the Redshift Execute Snap. The image at the right shows two output records. Note that only the input document with $name = "Danila" found a matching result in Redshift. The input document with $name = "Prasanna" was executed but returned no rows because the comparison is case-sensitive and did not match the database value of "prassanna". Because the Pass through checkbox was selected, the original document is preserved in the output view.

The error view was enabled above so that processing can continue to completion. Below is the error in the error output view.

  • With Expressions

An input document field named $tracked is used instead of hard-coding "true" as in the basic example above. An example value of $tracked is "'true'".

 

  • Pipeline Parameter: A pipeline parameter is set to pass the required SQL statement to the Redshift Execute Snap.

This is the same expression example, this time using a pipeline parameter instead of an input document field reference.

      


Inserting Precision Numbers Into a Redshift Table

This example demonstrates how to use the Redshift Execute Snap to insert precision numbers into a Redshift table using JSON path expressions. The Snap creates prepared statements, binds the input document field references, and executes them in batches, with the Redshift database account configured with a batch size of 2. Below is a screenshot showing the more complex pipeline, the Redshift Execute configuration, and the inputs coming into the Redshift Execute Snap.

  


The following image shows the error output view data:

Here is the regular output view data:




Executing a Stored Procedure

This example pipeline demonstrates how to call a stored procedure using the Redshift Execute Snap.

Create a stored procedure sp_inout_proc as shown below. This stored procedure takes three parameters: a, b, and c.
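After creating sp_inout_proc (its definition appears earlier in this article), a second Redshift Execute Snap can invoke it with CALL; Redshift then returns the INOUT parameter values. The argument values below are placeholders, not from the original example.

```sql
-- Placeholder argument values: a and c are INOUT parameters, b is an input.
CALL sp_inout_proc(5, 2, 10);
```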



On validating the Pipeline, the stored procedure is executed, and you can view only the status, message, and message description.


Next, we configure another Redshift Execute Snap to call the same sp_inout_proc stored procedure with these three values. 


After validating the Pipeline, the Snap gets the values from the procedure and the output of the stored procedure is displayed as follows:

