Redshift - S3 Upsert


Snap Type:



This Snap upserts (inserts or updates) data directly from a file (source) at a specified Amazon S3 location into the target Redshift table. A temporary table is created on Redshift with the contents of the staging file. An update operation then updates existing records in the target table, and/or an insert operation inserts new records into the target table.

Refer to AWS Amazon documentation for more information.

ETL Transformations & Data Flow

The Redshift S3 Upsert Snap loads the data from the given list of S3 files using the COPY command. It inserts the data using an INSERT ALL query if it is not already in the Redshift table, or updates the record if it exists.
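The flow described above (stage the file in a temporary table, update matching rows, insert the rest) can be illustrated with a minimal sketch. It only assembles SQL text and is not the Snap's actual implementation: the function name, the staging table name, and the use of a plain INSERT ... SELECT anti-join instead of the statement the Snap issues are all assumptions made for illustration.

```python
def build_upsert_statements(target, staging, columns, key_columns, s3_uri, iam_role):
    """Return SQL statements for a staging-table upsert, in execution order.

    All names are hypothetical; this is a sketch, not the Snap's real SQL.
    """
    non_keys = [c for c in columns if c not in key_columns]
    join = " AND ".join(f"{target}.{k} = {staging}.{k}" for k in key_columns)
    statements = [
        # 1. Temporary table with the same layout as the target.
        f"CREATE TEMP TABLE {staging} (LIKE {target})",
        # 2. Bulk-load the S3 file into the temporary table.
        f"COPY {staging} FROM '{s3_uri}' IAM_ROLE '{iam_role}' CSV",
    ]
    if non_keys:
        # 3. Update rows whose key columns already exist in the target.
        set_clause = ", ".join(f"{c} = {staging}.{c}" for c in non_keys)
        statements.append(
            f"UPDATE {target} SET {set_clause} FROM {staging} WHERE {join}"
        )
    # 4. Insert staged rows that have no match in the target (anti-join).
    col_list = ", ".join(columns)
    staged_cols = ", ".join(f"{staging}.{c}" for c in columns)
    statements.append(
        f"INSERT INTO {target} ({col_list}) SELECT {staged_cols} FROM {staging} "
        f"LEFT JOIN {target} ON {join} WHERE {target}.{key_columns[0]} IS NULL"
    )
    return statements


if __name__ == "__main__":
    for sql in build_upsert_statements(
        "people", "people_stage", ["id", "name"], ["id"],
        "s3://example-bucket/people.csv",
        "arn:aws:iam::123456789012:role/example-role",
    ):
        print(sql + ";")
```

Running the update before the insert means the anti-join in the last statement only picks up rows that are genuinely new.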

Input & Output:

Input: This Snap can have an upstream Snap that can pass values required for expression fields.

Output: A document that contains the result, providing the number of documents inserted, updated, or failed.


  • The Redshift account must specify the Endpoint, Database name, Username, and Password.
  • The Redshift account must specify the S3 Access-key ID, S3 Secret key, S3 Bucket, and S3 Folder.
  • The Redshift account security settings must allow access from the IP address of the Cloudplex or Groundplex.

IAM Roles for Amazon EC2

The 'IAM_CREDENTIAL_FOR_S3' feature is used to access S3 files from EC2 Groundplex, without Access-key ID and Secret key in the AWS S3 account in the Snap. The IAM credential stored in the EC2 metadata is used to gain access rights to the S3 buckets. To enable this feature, the following line should be added to and the jcc (node) restarted:
jcc.jvm_options = -DIAM_CREDENTIAL_FOR_S3=TRUE

Please note this feature is supported in the EC2-type Groundplex only.

For more information on IAM Roles, see

Limitations and Known Issues:

None at the moment.


Account & Access

This Snap uses account references created on the Accounts page of SnapLogic Manager to handle access to this endpoint. The S3 Bucket, S3 Access-key ID, and S3 Secret key properties are required for the Redshift - S3 Upsert Snap. The S3 Folder property may be used for the staging file. If the S3 Folder property is left blank, the staging file is stored in the bucket. See Redshift Account for information on setting up this type of account.


Input: This Snap has one input view for the data and a second optional input view for the target table schema.
Output: This Snap has at most one output view.
Error: This Snap has at most one document error view and produces zero or more documents in the view. If you open an error view and expect all failed records to be routed to it, you must increase the error count value using the Maximum error count field. If the number of failed records exceeds the Maximum error count, the pipeline execution fails with an exception and the failed records are not routed to the error view.
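The interaction between the error view and Maximum error count can be pictured with a small sketch. The function and exception names are hypothetical and the Snap's internal logic is not documented here; the sketch only mirrors the rule stated above.

```python
class MaxErrorCountExceeded(Exception):
    """Raised when more records fail than Maximum error count allows."""


def route_failed_records(failed_records, max_error_count):
    """Return the records to send to the error view, or raise.

    Mirrors the documented rule: if the number of failures exceeds the
    limit, execution fails and nothing is routed to the error view.
    """
    if len(failed_records) > max_error_count:
        raise MaxErrorCountExceeded(
            f"{len(failed_records)} failed records exceed the limit of {max_error_count}"
        )
    return list(failed_records)


if __name__ == "__main__":
    failures = [{"line": n} for n in range(6)]
    print(len(route_failed_records(failures, max_error_count=100)))
```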

None at the moment.



Label

Required. The name for the Snap. You can modify this to be more specific, especially if you have more than one of the same Snap in your pipeline.

Schema name

Required. The database schema name. Selecting a schema filters the Table name list to show only those tables within the selected schema.

The values can be passed using the pipeline parameters but not the upstream parameter.

Example: schema123

Default value: [None]

Table name

Required. The table on which to execute the upsert operation. The property can be given in the format of either <schema>.<table_name> or <table_name>. This field is suggestible: it retrieves the available tables under the schema (if given) when you request suggested values.

The values can be passed using the pipeline parameters but not the upstream parameter.


Examples:

  • people
  • "public"."people"

Default value: [None]
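As an illustration of the two accepted formats, the following sketch splits a Table name value into its schema and table parts. The helper is hypothetical and deliberately simple: it assumes identifiers do not contain dots inside the double quotes.

```python
def split_table_name(value, default_schema=None):
    """Split '<schema>.<table_name>' or '<table_name>' into (schema, table).

    Hypothetical helper: strips surrounding double quotes and does not
    handle dots embedded inside quoted identifiers.
    """
    parts = [part.strip().strip('"') for part in value.split(".")]
    if len(parts) == 1:
        return default_schema, parts[0]
    if len(parts) == 2:
        return parts[0], parts[1]
    raise ValueError(f"Unexpected table name format: {value!r}")


if __name__ == "__main__":
    print(split_table_name("people"))
    print(split_table_name('"public"."people"'))
```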

Key columns

Required. Columns to use to check for existing entries in the target table.

Example: id

Default value: [None] 

S3 file list

Required. List of S3 files to be loaded into the target table as file names or as expressions.


Default value:  [None] 

IAM Role

Select this property if the bulk load/unload needs to be performed using an IAM role. If selected, ensure the properties (AWS account ID, role name and region name) are provided in the account.

Default value: Not selected

Server-side encryption

This defines the S3 encryption type to use when temporarily uploading the documents to S3 before inserting them into Redshift.

Default value: Not selected

KMS Encryption type

Specifies the type of KMS S3 encryption to be used on the data. The available encryption options are:

  • None - Files do not get encrypted using KMS encryption
  • Server-Side KMS Encryption - If selected, the output files on Amazon S3 are encrypted using this encryption with an Amazon S3-generated KMS key.

Default value: None

If both the KMS and Client-side encryption types are selected, the Snap gives precedence to the SSE and displays an error prompting the user to select only one of the options.

KMS key

Conditional. This property applies only when the encryption type is set to Server-Side Encryption with KMS. This is the KMS key to use for the S3 encryption. For more information about the KMS key, refer to AWS KMS Overview and Using Server Side Encryption.

Default value: [None]

Truncate data

Truncate existing data before performing data load. 

With the Bulk Update Snap, instead of doing truncate and then update, a Bulk Insert would be faster.

Default value: Not selected

Update statistics

Update table statistics after data load by performing an analyze operation on the table.

Default value: Not selected

Accept invalid characters

Accept invalid characters in the input. Invalid UTF-8 characters are replaced with a question mark when loading.

Default value: Selected
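The replacement behavior can be approximated in a few lines. This is only a sketch of the idea, not Redshift's implementation: the function name is hypothetical, and Python's decoder may group invalid bytes differently than Redshift does.

```python
def replace_invalid_utf8(raw: bytes, replacement: str = "?") -> str:
    """Decode bytes as UTF-8, substituting `replacement` for invalid sequences.

    Sketch only: a legitimate U+FFFD character already present in the input
    would also be replaced, which a real loader would need to distinguish.
    """
    decoded = raw.decode("utf-8", errors="replace")  # invalid bytes become U+FFFD
    return decoded.replace("\ufffd", replacement)


if __name__ == "__main__":
    print(replace_invalid_utf8(b"ab\xffcd"))
```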

Maximum error count

Required. The maximum number of rows that can fail before the bulk load operation is stopped. By default, the load stops on the first error.

Example: 10 (if you want the pipeline execution to continue as long as the number of failed records is less than 10)
Default value: 100

Truncate columns

Truncate column values which are larger than the maximum column length in the table.

Default value: Selected

Load empty strings

If selected, empty string values in the input documents are loaded as empty strings to the string-type fields. Otherwise, empty string values in the input documents are loaded as null. Null values are loaded as null regardless.

Default value: Not selected
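The rule for empty strings and nulls can be written as a small function. The name and signature are hypothetical; the sketch only restates the behavior described for this property.

```python
def normalize_string_value(value, load_empty_strings=False):
    """Return the value that would be loaded into a string-type field.

    Null stays null in both modes; an empty string stays empty only
    when the Load empty strings option is selected.
    """
    if value is None:
        return None
    if value == "" and not load_empty_strings:
        return None
    return value


if __name__ == "__main__":
    print(normalize_string_value("", load_empty_strings=True) == "")
    print(normalize_string_value("") is None)
```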

Compression format

The format in which the provided S3 files are compressed. The available options are:

  • Uncompressed
  • GZIP
  • BZIP2
  • LZOP

Example: GZIP

Default value: Uncompressed

File type

The type of input files. The available options are:

  • CSV
  • JSON
  • AVRO
  • Undefined

Example: JSON

Default value: CSV

Ignore header

Required. Treats the specified number of rows as file headers and does not load them. 

Example: 1

Default value: 0


Delimiter

The single ASCII character that is used to separate fields in the input file, such as a pipe character ( | ), a comma ( , ), or a tab ( \t ). Non-printing ASCII characters are supported. ASCII characters can also be represented in octal, using the format '\ddd', where 'd' is an octal digit (0–7). The default delimiter is a pipe character ( | ), unless the CSV parameter is used, in which case the default delimiter is a comma ( , ). The AS keyword is optional. DELIMITER cannot be used with FIXEDWIDTH.


Default value: pipe character ( | )
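The octal form '\ddd' described above maps to a character code in base 8. The following sketch resolves a delimiter specification to the actual character; the function is hypothetical and exists only to show the conversion (for example, '\011' is octal for 9, the tab character).

```python
import re


def parse_delimiter(spec: str) -> str:
    """Resolve a delimiter spec ('|', ',', '\\t', or octal '\\ddd') to one ASCII character."""
    if re.fullmatch(r"\\[0-7]{3}", spec):
        code = int(spec[1:], 8)  # interpret the three digits as base 8
        if code > 127:
            raise ValueError(f"Octal value {spec} is outside the ASCII range")
        return chr(code)
    if spec == r"\t":
        return "\t"
    if len(spec) == 1 and ord(spec) < 128:
        return spec
    raise ValueError(f"Delimiter must be a single ASCII character: {spec!r}")


if __name__ == "__main__":
    print(repr(parse_delimiter("|")))
    print(repr(parse_delimiter(r"\011")))
```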

Additional options

Additional options to be passed to the COPY command. 

Refer to AWS Amazon - COPY documentation for available options.


Default value:  [None] 
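To give a feel for how the settings above could translate into COPY parameters, here is a minimal sketch that assembles an option string. The mapping is an assumption for illustration and may differ from what the Snap generates; the parameter keywords themselves (CSV, JSON, AVRO, GZIP, BZIP2, LZOP, IGNOREHEADER, DELIMITER, MAXERROR, TRUNCATECOLUMNS, ACCEPTINVCHARS) are standard Redshift COPY parameters.

```python
def build_copy_options(file_type="CSV", compression="Uncompressed", ignore_header=0,
                       delimiter=None, max_error_count=100, truncate_columns=True,
                       accept_invalid_chars=True, additional_options=""):
    """Assemble a COPY option string from Snap-like settings (hypothetical mapping)."""
    options = []
    # File type: 'Undefined' adds no format clause.
    fmt = {"CSV": "CSV", "JSON": "JSON 'auto'", "AVRO": "AVRO 'auto'"}.get(file_type)
    if fmt:
        options.append(fmt)
    if compression in ("GZIP", "BZIP2", "LZOP"):
        options.append(compression)
    if ignore_header > 0:
        options.append(f"IGNOREHEADER {ignore_header}")
    if delimiter is not None:
        options.append(f"DELIMITER '{delimiter}'")
    if max_error_count > 0:
        options.append(f"MAXERROR {max_error_count}")
    if truncate_columns:
        options.append("TRUNCATECOLUMNS")
    if accept_invalid_chars:
        options.append("ACCEPTINVCHARS")
    # Anything typed into Additional options is appended verbatim.
    if additional_options.strip():
        options.append(additional_options.strip())
    return " ".join(options)


if __name__ == "__main__":
    print(build_copy_options(compression="GZIP", ignore_header=1, delimiter=","))
```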

Vacuum type

Reclaims space and sorts rows in a specified table after the upsert operation. The available options to activate are FULL, SORT ONLY, DELETE ONLY and REINDEX. Please refer to the AWS Amazon - VACUUM documentation for more information.


Default value:  [None] 

Vacuum threshold (%)

Specifies the threshold above which VACUUM skips the sort phase. If this property is left empty, Redshift sets it to 95% by default.

Default value:  [None] 
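The two vacuum properties correspond to parts of the Redshift VACUUM statement. The sketch below builds such a statement from a vacuum type and an optional threshold. The function name is hypothetical, and omitting the threshold for REINDEX reflects my understanding that TO threshold PERCENT is not accepted with REINDEX; check the VACUUM documentation before relying on it.

```python
VACUUM_TYPES = ("FULL", "SORT ONLY", "DELETE ONLY", "REINDEX")


def build_vacuum_statement(table, vacuum_type, threshold_percent=None):
    """Build a VACUUM statement; the threshold is optional (Redshift defaults to 95)."""
    if vacuum_type not in VACUUM_TYPES:
        raise ValueError(f"Unsupported vacuum type: {vacuum_type!r}")
    statement = f"VACUUM {vacuum_type} {table}"
    # Assumption: the threshold clause is skipped for REINDEX.
    if threshold_percent is not None and vacuum_type != "REINDEX":
        if not 0 <= threshold_percent <= 100:
            raise ValueError("Threshold must be between 0 and 100")
        statement += f" TO {threshold_percent} PERCENT"
    return statement


if __name__ == "__main__":
    print(build_vacuum_statement("people", "FULL", 75))
```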

Snap Execution

Select one of the three modes in which the Snap executes. Available options are:

  • Validate & Execute: Performs limited execution of the Snap, and generates a data preview during Pipeline validation. Subsequently, performs full execution of the Snap (unlimited records) during Pipeline runtime.
  • Execute only: Performs full execution of the Snap during Pipeline execution without generating preview data.
  • Disabled: Disables the Snap and all Snaps that are downstream from it.

Redshift's Vacuum Command

In Redshift, when rows are DELETED or UPDATED against a table they are simply logically deleted (flagged for deletion), not physically removed from disk. This causes the rows to continue consuming disk space and those blocks are scanned when a query scans the table. This results in an increase in table storage space and degraded performance due to otherwise avoidable disk IO during scans. A vacuum recovers the space from deleted rows and restores the sort order. 

Groundplex System Clock and Multiple Snap Instances with the same 'S3 file list' property

  1. The system clock of the Groundplex should be accurate to less than a second. The Snap executes the Redshift COPY command to have Redshift load CSV data from S3 files into a temporary table created by the Snap. If Redshift fails to load any record, it stores the error information for each failed CSV record in the system error table in the same Redshift database. Since all errors from all executions go to the same system error table, the Snap executes a SELECT statement to find the errors related to a specific COPY statement execution. It uses a WHERE clause that includes the CSV filenames, start time, and end time. If the system clock of the Groundplex is not accurate to less than a second, the Snap might fail to find error records in the error table.
  2. If multiple instances of the Redshift - S3 Upsert Snap have the same S3 file list property value and execute at almost the same time, the Snap will fail to report correct error documents in the error view. Users should make sure that Redshift - S3 Upsert Snap instances with the same S3 file list execute one at a time.
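Both points follow from how error rows are matched to a COPY execution: by filename and time window. The sketch below simulates that lookup in memory to show how a skewed clock causes error rows to be missed. The field names (filename, starttime) and the function are illustrative assumptions, not the Snap's actual query.

```python
from datetime import datetime, timedelta


def find_load_errors(error_rows, filenames, start, end):
    """Return error rows for the given files logged within [start, end]."""
    wanted = set(filenames)
    return [
        row for row in error_rows
        if row["filename"] in wanted and start <= row["starttime"] <= end
    ]


if __name__ == "__main__":
    logged_at = datetime(2017, 10, 16, 12, 0, 5)  # time recorded by Redshift
    rows = [{"filename": "s3://example-bucket/people.csv", "starttime": logged_at}]
    files = ["s3://example-bucket/people.csv"]

    # Accurate Groundplex clock: the window brackets the error row.
    start = datetime(2017, 10, 16, 12, 0, 0)
    end = datetime(2017, 10, 16, 12, 0, 10)
    print(len(find_load_errors(rows, files, start, end)))

    # Clock running 30 seconds fast: the same row falls outside the window.
    skew = timedelta(seconds=30)
    print(len(find_load_errors(rows, files, start + skew, end + skew)))
```

The same lookup also shows why two concurrent executions with an identical file list cannot be told apart: their filename sets match and their time windows overlap.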


Basic Use Case

The following pipeline describes how the Snap functions as a standalone Snap in a pipeline:

Step 1: Provide a valid Redshift account and database/table name to upload all the documents into the target table.
Step 2: Provide valid S3 account parameters and the S3 file list parameter to copy the list of documents.
Step 3: Make sure to choose the proper parameters in the Settings tab and invoke the pipeline.

Below is a preview of the output from the Redshift S3 Upsert Snap depicting that four records have been inserted into the table:


We can also verify the same by checking it in Redshift database:

Refer to the "Redshift - S3 Upsert_2017_10_16.slp" in the Download section for more reference

Typical Snap Configurations

The key configuration of the Snap lies in how the S3 file list is passed to perform the upsert of the records. The S3 file list can be passed:

  • Without Expressions:

The values are passed directly to the Snap.

  • With Expressions:
    • Using Pipeline parameters:

The Table name, Key columns, and S3 file list can be passed as pipeline parameters.

The Redshift S3 Upsert Snap with the pipeline parameter as the Table name/ S3 file list/ Key column:

Advanced Use Case

The following pipeline describes how the Snap functions as a standalone Snap in a pipeline when some invalid values are passed (invalid UTF-8 characters for a bpchar(1) column in the Redshift table).

Below is the table structure in Redshift: 

We then pass a CSV file that contains some invalid characters (six invalid records). The CSV file "chartablefors3_new.csv" has been attached in the Downloads section. We also try to accept the invalid characters in the Redshift table by specifying ACCEPTINVCHARS in "Additional options", as shown below in the Snap configuration:

ACCEPTINVCHARS [AS] ['replacement_char'] enables loading of data into VARCHAR columns even if the data contains invalid UTF-8 characters. When ACCEPTINVCHARS is specified, COPY replaces each invalid UTF-8 character with a string of equal length consisting of the character specified by replacement_char. ACCEPTINVCHARS is valid only for VARCHAR columns. Refer to the AWS Amazon documentation - Data Conversion for more information.

Because the column's data type is bpchar rather than VARCHAR, the load throws an error for six records.

Below is a preview of the output from the Redshift S3 Upsert Snap showing that three records have been updated in the table and the other six records have failed:

Errors are shown below for your reference:

Refer to the "Redshift S3 Upsert - additional options.slp" in the Download section for more reference



Related Information

This section gives you a consolidated list of internal and external resources related to this Snap:

Older Examples

  Redshift S3 Upsert with a single S3 CSV file

In this pipeline, the Redshift Execute Snap creates a table on a Redshift Database server. The Redshift S3 Upsert Snap upserts the records into the table from a file on an S3 location.

The Redshift Execute Snap creates the table, snap1222_emp2 on the Redshift Database server.

The success status is as displayed below:

The S3 Upsert Snap upserts the records from the file Emp_S3.csv , from an S3 location on to the table, snap1222_emp2 on the Redshift Database.

Additionally, because the IAM role is selected, ensure that the table structure is the same as that of the file on the S3 location.

The successful execution of the pipeline displays the below output preview:

 Upsert data into a file on Amazon S3.

Example # 1

In this example, the table,  customer_interleaved in the schema, prasanna, is read and the s3:///patan/customer_interleaved.csv file is upserted (updated or inserted) with records. The table that is read and the file that is updated exists in Amazon S3. 

The successful execution of the pipeline displays the output preview where 2 records have been updated:

Example # 2

The example assumes that you have configured & authorized a valid Redshift account (see Redshift Account) to be used with this Snap. In the following example, employee_1 table in the schema, space in schema, is read and the employee_1.csv file is upserted (updated or inserted) with records. The table that is read and the file that is updated exists in Amazon S3.

See Also

Snap Pack History

Release | Snap Pack Version | Date | Type | Updates
4.29 Patch 429patches16908 Latest
  • Enhanced the Redshift accounts with the following:

    • Expression enabler to pass values from Pipeline parameters.

    • Support for Security Token for S3 bucket external staging.

  • Fixed an issue with Redshift - Execute Snap where the Snap failed when the query contained comments with single or double quotes in it. Now the Pipeline executes without any error if the query contains a comment.

4.29 Patch



Fixed an issue with Redshift Account and Redshift SSL Account where the Redshift Snaps failed when the S3 Secret key or S3 Access-key ID contained special characters, such as +.





Upgraded with the latest SnapLogic Platform release.

4.28 main14627 Stable

Updated the label for Delete Condition to Delete Condition (Truncates Table if empty) in the Redshift Delete Snap.
4.27 Patch 427patches12999 Latest

Fixed an issue with the Redshift Bulk Load Snap, where the temporary files in S3 were not deleted for aborted or interrupted Pipelines.
4.27 main12833 Stable

Enhanced the Redshift - Execute Snap to invoke stored procedures.

4.26 main11181 Stable

Upgraded with the latest SnapLogic Platform release.
4.25 Patch 425patches11008 Latest

Updated the AWS SDK from version 1.11.688 to 1.11.1010 in the Redshift Snap Pack and added a custom SnapLogic User Agent header value.

Stable

Upgraded with the latest SnapLogic Platform release.

Fixed an issue with the Redshift Bulk Load Snap that fails while displaying a Failed to commit transaction error.

Stable

Upgraded with the latest SnapLogic Platform release.
4.21 Patch 421patches6144 Latest

Fixed the following issues with DB Snaps:

  • The connection thread waits indefinitely causing the subsequent connection requests to become unresponsive.
  • Connection leaks occur during Pipeline execution.
4.21 Patch MULTIPLE8841 Latest

Fixed the connection issue in Database Snaps by detecting and closing open connections after the Snap execution ends. 



Stable

Upgraded with the latest SnapLogic Platform release.
4.20 Patch db/redshift8774

Fixed the Redshift - Execute Snap that hangs if the SQL statement field contains only a comment ("-- comment"). 

Stable

Upgraded with the latest SnapLogic Platform release.
4.19 Patch db/redshift8410 Latest

Fixed an issue with the Redshift - Update Snap wherein the Snap is unable to perform operations when:

  • An expression is used in the Update condition property.
  • Input data contain the character '?'.
Stable

Upgraded with the latest SnapLogic Platform release.
4.18 Patch db/redshift8043 Latest

Enhanced the Snap Pack to support AWS SDK 1.11.634 to fix the NullPointerException issue in the AWS SDK. This issue occurred in AWS-related Snaps that had HTTP or HTTPS proxy configured without a username and/or password. 

4.18 Patch MULTIPLE7884 Latest

Fixed an issue with the PostgreSQL grammar to better handle the single quote characters.

4.18 Patch MULTIPLE7778 Latest

Updated the AWS SDK library version to default to Signature Version 4 Signing process for API requests across all regions.

Stable

Upgraded with the latest SnapLogic Platform release.
4.17 Patch db/redshift7433 Latest

Fixed an issue with the Redshift Bulk Load Snap wherein the Snap fails to copy the entire data from source to the Redshift table without any statements being aborted.


Pushed automatic rebuild of the latest version of each Snap Pack to SnapLogic UAT and Elastic servers.

  • Fixed an issue with the Redshift Execute Snap wherein the Snap would send the input document to the output view even if the Pass through field is not selected in the Snap configuration. With this fix, the Snap sends the input document to the output view, under the key original, only if you select the Pass through field.
  • Added the Snap Execution field to all Standard-mode Snaps. In some Snaps, this field replaces the existing Execute during preview checkbox.
4.16 Patch db/redshift6821 Latest

Fixed an issue with the Lookup Snap passing data simultaneously to output and error views when some values contained spaces at the end.

Stable

Upgraded with the latest SnapLogic Platform release.
4.15 Patch db/redshift6286 Latest

Fixed an issue with the Bulk Upsert Snap wherein there was no output for any input schema.

4.15 Patch db/redshift6334 Latest

Replaced Max idle time and Idle connection test period properties with Max life time and Idle Timeout properties, respectively, in the Account configuration. The new properties fix the connection release issues that were occurring due to default/restricted DB Account settings.

Stable

Upgraded with the latest SnapLogic Platform release.
4.14 Patch db/redshift5786 Latest

Fixed an issue wherein the Redshift Upload snap logged the access and secret keys without encryption in the error logs. The keys are now masked.

4.14 Patch db/redshift5667 Latest
  • Added "Validate input data" property in the Redshift Bulk Load Snap to enable users to troubleshoot input data schema.
  • Enhanced a check to identify whether the Provided Query in the Redshift Execute Snap is of read or write type.
Stable

Upgraded with the latest SnapLogic Platform release.
4.13 Patch db/redshift/5303 Latest

Added a new property "Validate input data" in the Redshift Bulk Load Snap to help users troubleshoot the input data schema.

4.13 Patch db/redshift5186 Latest

Fixed the Bulk Load and Unload Snaps wherein the KMS encryption type property is failing with validation error.




Added KMS encryption support to these Snaps: Redshift Unload, Redshift Bulk Load, Redshift Bulk Upsert, and Redshift S3 Upsert.

4.12 Patch db/redshift5027 Latest

Fixed an issue wherein the Redshift Snaps timeout and fail to retrieve a database connection.

4.12 Patch

MULTIPLE4967 Latest

Provided an interim fix for an issue with the Redshift accounts by re-registering the driver for each account validation. The final fix is being shipped in a separate build.

4.12 Patch

MULTIPLE4744 Latest

Added support for Redshift grammar to recognize window functions as being part of the query statement.



Stable

Upgraded with the latest SnapLogic Platform release.
4.11 Patch db/redshift4589 Latest

Fixed an issue when creating a Redshift table via the second/metadata input view for the Redshift Bulk Load Snap.


Added SSL support to the Configuring Redshift Accounts.

4.10 Patch db/redshift4115 Latest

The Upsert or BulkUpdate/BulkLoad shall not execute and produce output when no inputView has been provided.

4.10 Patch redshift3936 Latest

Addressed an issue in Redshift Execute with a Select that hangs after extracting 13 million in the morning or 30 million in the evening 




Added Auto commit property to the Select and Execute Snaps at the Snap level to support overriding of the Auto commit property at the Account level.

4.9.0 Patch

redshift3229 Latest

Addressed an issue in Redshift Multiple Execute where INSERT INTO SELECT statement generated a 'transaction, commit and rollback statements are not supported' exception.

4.9.0 Patch

redshift3073 Latest

Fixed an issue regarding connection not closed after login failure; Expose autocommit for "Select into" statement in PostgreSQL Execute Snap and Redshift Execute Snap

  • Updated the Bulk Load, Bulk Upsert and S3 Upsert Snaps with the properties Vacuum type & Vacuum threshold (%) (replaced the original Vacuum property).

  • Updated the S3 Upsert Snap with the properties IAM role and Server-side encryption to support data upsert across two VPCs.

  • Added support for the Redshift driver under the account setting for JDBC jars.

4.8.0 Patch redshift2852 Latest
  • Addressed an issue with Redshift Insert failing with 'casts smallint as varchar'

  • Addressed an issue with Redshift Bulk Upsert fails to drop temp table

4.8.0 Patch redshift2799 Latest
  • Addressed an issue with Redshift Snaps with the default driver failing with could not load JDBC driver for url file.

  • Added the properties, JDBC Driver Class, JDBC jars and JDBC Url to enable the users to upload the Redshift JDBC drivers that can override the default driver.
4.8.0 Patch redshift2758 Latest

Potential fix for JDBC deadlock issue.

4.8.0 Patch

redshift2713 Latest

Fixed Redshift Snap Pack rendering dates that are one hour off from the date returned by database query for non-UTC Snaplexes

4.8.0 Patch

redshift2697 Latest

Addresses an issue where some changes made in the platform patch MRC294 to improve performance caused Snaps in the listed Snap Packs to fail.



  • Redshift MultiExecute Snap introduced in this release.

  • Redshift Account: Info tab added to accounts.

  • Database accounts now invalidate connection pools if account properties are modified and login attempts fail.

4.7.0 Patch redshift2434 Latest

Replaced newSingleThreadExecutor() with a fixed thread pool.

4.7.0 Patch

redshift2387 Latest

Addressed an issue in the Redshift Bulk Load Snap where the Load empty strings setting was not working after release.

4.7.0 Patch

redshift2223 Latest

Auto-commit is turned off automatically for SELECT

4.7.0 Patch




Fixed an issue for database Select Snaps regarding Limit rows not supporting an empty string from a pipeline parameter.



  • Updated the Redshift Snap Account Settings with the IAM properties that include AWS account ID , IAM role name, and Region name.

  • Redshift Bulk Load Snap updated with the properties IAM Role & Server-side encryption.

  • Redshift Bulk Upsert Snap updated with the properties Load empty strings, IAM Role & Server-side encryption.

  • Updated the Redshift Upsert Snap with Load empty strings property.

  • Updated the Redshift Unload Snap with the property IAM role.

  • Redshift Execute Snap enhanced to fully support SQL statements with/without expressions & SQL bind variables.

  • Resolved an issue in Redshift Execute Snap that caused errors when executing a command Select current_schemas(true).

  • Resolved an issue in Redshift Execute Snap that caused errors when a Select * from <table_name> into statement was executed.

  • Enhanced error reporting in Redshift Bulk Load Snap to provide appropriate resolution messages.



  • Redshift S3 Upsert Snap introduced in this release.

  • Resolved an issue that occurred while inserting mismatched data type values in Redshift Insert Snap.



  • Resolved an issue in Redshift Bulk Upsert Snap that occurred when purging temp tables.

  • Resolved an issue in Redshift Upload/Upsert Snap that occurred when using IAM credentials in an EC2 instance with an S3 bucket.

4.4.1 NA Latest

Resolved an issue with numeric precision when trying to use create table if not present in Redshift Insert Snap.

4.4 NA Stable

Upgraded with the latest SnapLogic Platform release.
4.3.2 NA Stable
  • Redshift Select Where clause property now has expression support.

  • Redshift Update Update condition property now has expression support.

  • Resolved an issue with Redshift Select Table metadata being empty if the casing is different from the suggested one for table name.

4.3 NA Stable
  • Table List Snap: A new option, Compute table graph, now lets you determine whether or not to generate dependents data into the output.

  • Redshift Unload Snap Parallel property now explicitly adds 'PARALLEL [OFF|FALSE]' to the UNLOAD query.

4.2 NA Latest
  • Resolved an issue where Redshift SCD2 Snap historized the current row when no Cause-historization fields had changed.

  • Ignore empty result added to Execute and Select Snaps. When selected, no document is written to the output view for SELECT statements that produce no result.

  • Resolved an issue with Redshift Select Snap returning a Date object for DATE column data type instead of a LocalDate object.

  • Resolved an issue in Redshift SCD2 failing to close database cursor connection.

  • Resolved an issue with Redshift Lookup Snap not handling values with spaces in the prefix.

  • Updated driver not distributed with the Redshift Snap Pack.

  • Output fields table property added to Select Snap.

  • Resolved an issue with Redshift - Bulk Loader incorrectly writing to wrong location on S3 and disable data compression not working

  • Resolved an issue in Execute and Select Snaps where the output document was the same as the input document if the query produced no data. When there is no result from the SELECT query, the input document is passed through to the output view as the value of the 'original' key. A new property, Pass through, has been added with a default value of true.

  • Redshift Account: Enhanced error messaging

  • Redshift SCD2: Bug fixes with compound keys

  • Redshift Lookup: Bug fixes on lookup failures; Pass-through on no lookup match property added to allow you to pass the input document through to the output view when there is no lookup match.