On this Page

Snap Type:

Read

Description:

This Snap unloads the result of a query to one or more files on Amazon S3, using Amazon S3 server-side encryption (SSE)/server-side encryption with KMS/client-side encryption with CMK. The target Redshift table is not modified.

ETL Transformations & Data Flow

https://github.com/awslabs/amazon-redshift-utils/tree/master/src/UnloadCopyUtility.

[
    {
        "entries": [
            {
                "url": "s3:///mybucket/unload_folder/test0000_part_00"
            },
            {
                "url": "s3:///mybucket/unload_folder/test0001_part_00"
            },
            {
                "url": "s3:///mybucket/unload_folder/test0002_part_00"
            },
            {
                "url": "s3:///mybucket/unload_folder/test0003_part_00"
            }
        ],
        "status": "success",
        "unloadQuery": "UNLOAD ('SELECT * FROM public.company') TO 's3://mybucket/unload_folder/test' 
CREDENTIALS 'aws_access_key_id=;aws_secret_access_key=' MANIFEST ESCAPE ALLOWOVERWRITE PARALLEL"
    }
]

Upon the successful preview, the expected output data is as follows:

[
    {
        "entries": [
            {
                "url": "s3:///<bucket>/<folder>/<prefix>000n_part_00"
            }
        ],
        "status": "preview",
        "unloadQuery": "UNLOAD ('SELECT * FROM public.company') TO 's3://mybucket/unload_folder/test' 
CREDENTIALS 'aws_access_key_id=;aws_secret_access_key=' MANIFEST ESCAPE ALLOWOVERWRITE PARALLEL"
    }
]

The Snap behaves the same on a Groundplex as it does in a Cloudplex.

Expected upstream Snaps: Any Snap with a document output view

Expected downstream Snaps: Any Snap with a document input view, such as JSON Formatter, Mapper, and so on. The CSV Formatter Snap cannot be connected directly to this Snap since the output document map data is not flat.

This Snap provides a front end to the Amazon Redshift Unload/Copy Utility. The Snap allows data to be efficiently moved from one Redshift instance to an optionally encrypted S3 bucket. The COPY snap moves data from an optionally encrypted S3 bucket into a second Redshift instance. Alternately the data may be downloaded to another system using the S3 READ Snap.

By default the data will be written to a separate CSV-encoded compressed file per Redshift 'slice'. It is possible to override this behavior but the resulting file must be smaller than 6.2 GB.

The default CSV delimiter is a carat ('^'), not a comma. (Note: elsewhere I've seen the default delimiter listed as a pipe ('|') character.)

Input & Output

Input: Key-value map data to evaluate expression properties of the Snap.

Input Schema Provided: No

Output: This snap produces one document. The fields are:

entries - a list of S3 URLs.
status - "success" or "unsuccess"
unloadQuery - the actual SQL command executed.

Output Schema Provided: No

Preview Supported: Yes

Passthrough Supported: No

Output Examples:

- Upon the successful execution, the expected output data is as follows:

Prerequisites:

The Redshift account should contain S3 Access-key ID, S3 Secret key, S3 Bucket and S3 Folder.
The Redshift account security settings should allow access from the IP Address of the cloudplex or groundplex.
The Amazon S3 bucket where Amazon Redshift will write the output files must reside in the same region as your cluster.
The Amazon S3 bucket where Amazon Redshift should allow write access from the IP address of the cloudplex or groundplex.

Limitations and Known Issues:

Works in Ultra Tasks.

If you use the PostgreSQL driver (org.postgresql.Driver) with the Redshift Snap Pack, it could result in errors if the data type provided to the Snap does not match the data type in the Redshift table schema. Either use the Redshift driver (com.amazon.redshift.jdbc42.Driver) or use the correct data type in the input document to resolve these errors.

Configurations:

Account & Access

This Snap uses account references created on the Accounts page of SnapLogic Manager to handle access to this endpoint. This Snap requires a Redshift Account with S3 properties. See Configuring Redshift Accounts for information on setting up this type of account.

Redshift IAM Account Setup

If the EC2 plex (where your pipeline is running with an IAM role), Redshift cluster, and S3 bucket are in the same AWS account, then you must use the SnapLogic Redshift Account (regular IAM Account).
If the EC2 plex (where your pipeline is running with an IAM role) is in one account and the Redshift cluster and S3 bucket are in a different AWS account, you must use the SnapLogic Redshift Cross-Account IAM Role Account to run your pipelines successfully.

This applies only to the Redshift - Bulk Load, Redshift - Unload, and Redshift - S3 Upsert Snaps.

Views:

Input

This Snap has at most one input view.

Output

This Snap has at most one document output view.
The output document contains map data:

a "status" field with "success" or "preview",
a "unloadQuery" field with the query sent to Redshift, and
an "entries" field with a list of S3 URL's written by the Redshift UNLOAD operation.

Error

This Snap has at most one document error view and produces zero or more documents in the view.

The error view would contain error, reason, resolution and stack trace.

Troubleshooting:

The preview on this Snap will not execute the Redshift UNLOAD operation. Connect a JSON Formatter and a File Writer Snaps to the error view and then execute the pipeline. If there is any error, you will be able to preview the output file in the File Writer Snap for the error information.

Settings

Label

Required The name for the Snap. You can modify this to be more specific, especially if you have more than one of the same Snap in your pipeline.

Select query

Required.

Defines a SELECT query. The results of the query are unloaded. In most cases, it is worthwhile to unload data in sorted order by specifying an ORDER BY clause in the query; this approach will save the time required to sort the data when it is reloaded.

Example: SELECT * FROM public.company ORDER BY id

Default value: SELECT * FROM

S3 file prefix

The prefix of AWS S3 file names which are used by Redshift to write data. The Snap uses S3 Bucket and S3 Folder in the RedShift Account to format the full S3 path. File names created by RedShift are in the format: s3://<bucket>/<folder>/<s3-file-prefix><slice-number>_part_<file-number>, where <s3-file-prefix> is the value of this property. The Amazon S3 bucket where Amazon Redshift will write the output files must reside in the same region as your cluster.

Example: test

Default value: [None]

Delimiter

Required. Single ASCII character that is used to separate fields in the output file. DELIMITER will be ignored if FIXEDWIDTH is specified. If the data contains the delimiter character, you will need to specify the ESCAPE option to escape the delimiter, or use ADDQUOTES to enclose the data in double quotes. Alternatively, specify a delimiter that is not contained in the data.

Example: a pipe character ( | ), a comma ( , ), or a tab ( \t )

Default value: a pipe character ( | )

Fixed width

Specifies FIXEDWIDTH spec., in which Redshift unloads the data to a file where each column width is a fixed length, rather than separated by a delimiter. The FIXEDWIDTH spec is a string that specifies the number of columns and the width of the columns. DELIMITER is ignored if FIXEDWIDTH is specified. Because FIXEDWIDTH does not truncate data, the specification for each column in the UNLOAD query needs to be at least as long as the length of the longest entry for that column. The format for fixedwidth_spec is: 'colID1:colWidth1,colID2:colWidth2, ...'

Example: colID1:36,colID2:36,colID3:256

Default value: [None]

Null as

Specifies a string that represents a null value in unload files. If this option is used, all output files contain the specified string in place of any null values found in the selected data. If this option is not specified, null values are unloaded as:
Zero-length strings for delimited output
Whitespace strings for fixed-width output
If a null string is specified for a fixed-width unload and the width of an output column is less than the width of the null string, the following behavior occurs:
An empty field is output for non-character columns
An error is reported for character columns
Example: null

Default value: [None]

Anaplan server may ignore this setting and try to auto-detect. See Anaplan for details.

Escape

If selected, for CHAR and VARCHAR columns in delimited unload files, an escape character (\) is placed before every occurrence of the following characters:
Linefeed: \n
Carriage return: \r
The delimiter character specified for the unloaded data.
The escape character: \
A quote character: " or ' (if both ESCAPE and ADDQUOTES are selected).
We strongly recommend that you always select this property unless you are certain that your data does not contain any delimiters or other characters that might need to be escaped.

Important: If you loaded your data using a COPY with the ESCAPE option, you must also specify the ESCAPE option with your UNLOAD command to generate the reciprocal output file. Similarly, if you UNLOAD using the ESCAPE option, you will need to use ESCAPE when you COPY the same data.

Default value: Selected

Add quotes

If checked, RedShift places quotation marks around each unloaded data field, so that Amazon Redshift can unload data values that contain the delimiter itself.