In this article

...

  • Unlike other Write-type Snaps, this Snap does not display the final SQL query in its output preview (when it is the terminal Snap) if it is connected to a BigQuery instance. This is because BigQuery uses APIs instead of an SQL query to load data.

...

Info: SQL Functions and Expressions for ELT

You can use the SQL Expressions and Functions supported for ELT to define your Snap or Account settings with the Expression symbol (=) enabled, where available. This list is common to all supported target CDWs. You can also use other expressions/functions that your target CDW supports.

...

Each Snap setting below is listed with its data type, description, default value, and an example.
Label (String)
The name for the Snap. Modify the default label to be more specific, especially if your Pipeline contains more than one of this Snap.
Default value: ELT Load
Example: S3 Load
Get preview data (Checkbox)
Select this checkbox to include a preview of the query's output in the Snap's output preview during Pipeline validation.
Default value: Not selected
Example: Selected
Database Name (String)
Enter the name of the database in which the target table exists. Leave this field blank to use the database name specified in the Default Database Name field in the account settings.
Default value: N/A
Example: TESTDB
Schema Name (String; not applicable to Databricks Lakehouse Platform)
Required. Enter the name of the database schema. If you do not specify a schema name, clicking the Suggest button retrieves all schema names in the specified database.
Default value: N/A
Example: SALES
Target Table Name (String)
Required. Enter the name of the table in which you want to perform the load operation.

Note:
  • If the target table does not exist, the Snap creates one with the name that you specify in this field and then writes the data into it. However, this happens only if you select the Drop and Create table option in the Load Action field.
  • You can specify the table name without double quotes (""). Double quotes are required only when the table name contains special characters, such as hyphens (-).
  • A table name must always start with a letter.
  • Integers and underscores (_) can also be part of the table name.
  • The Snap automatically converts all characters to uppercase. Use double quotes to retain lowercase.

Default value: N/A
Example: SALES_ORDERS
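To illustrate the naming and quoting rules above, here is a minimal DDL sketch (Snowflake-style syntax; the table and column names are hypothetical):

  -- Unquoted table names are stored in uppercase, for example SALES_ORDERS
  CREATE TABLE sales_orders (ID INT, FNAME VARCHAR);
  -- Double quotes allow special characters such as hyphens and retain lowercase
  CREATE TABLE "sales-orders" (ID INT, FNAME VARCHAR);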
Target Table Hash Distribution Column (String/Expression; Azure Synapse only)
Specify the hash distribution column name for the target table in Azure Synapse, if you choose Drop and Create table as the Load Action.

Info: This field applies only to Azure Synapse. Azure Synapse requires a table to be hash-distributed for improved query performance. If the target table is created outside the Snap, you need not specify the target table column name.

Default value: N/A
Example: var table
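If the Snap creates the table, the resulting DDL would resemble the following sketch (Azure Synapse syntax; the table and column names are hypothetical):

  CREATE TABLE SALES_ORDERS
  (
      ORDER_ID INT,
      CUST_ID  INT
  )
  WITH (DISTRIBUTION = HASH(CUST_ID));  -- hash distribution column specified in this field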
Load Action (Drop-down list)
Select the load action to perform on the target table.

Available options are:

  • Append rows to existing table. Loads the data from the source files into the specified target table. Ensure that the target table exists; otherwise, the Snap displays an error.
  • Overwrite existing table. Overwrites the target table with the data from the source files. Ensure that the target table exists; otherwise, the Snap displays an error.
  • Drop and Create table. Drops the target table if it exists, creates a new table with the columns provided in the Table Columns field set, and then loads the data from the source files into the target table. Activates the Data Type field in the Table Columns field set.
  • Alter table. Modifies the schema of the target table based on the configuration of the Modifier field in the Table Columns field set and loads the data from the source files into the target table. Activates the Data Type and Modifier fields in the Table Columns field set. Ensure that the target table exists; otherwise, the Snap displays an error.

Note: The Snap does not modify existing tables during Pipeline validation. However, if you choose the Drop and Create table option in the Load Action field and the target table does not exist, the Snap creates a new (empty) target table based on the schema specified in the Snap.

Default value: Append rows to existing table
Example: Alter table
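To illustrate, the Drop and Create table action is roughly equivalent to the following sequence of statements (Snowflake-style syntax; the table, stage, and column names are hypothetical, and the statements the Snap actually generates can differ by target CDW):

  DROP TABLE IF EXISTS SALES_ORDERS;                  -- drop the target table if it exists
  CREATE TABLE SALES_ORDERS (ID INT, FNAME VARCHAR);  -- columns from the Table Columns field set
  COPY INTO SALES_ORDERS FROM @my_stage/app_data/;    -- load the staged source files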
Table Option List

This field set enables you to define the table options for creating the target table before performing the load operation. You must specify each table option in a separate row. Click + to add a new row.

This field set consists of the following field:

  • Table Option

Table Option (Expression/Suggestion)
Click the Suggest icon and choose the table option that you want to use from the suggested values. Specify the required values for the selected option.

The table options you can use are:

  • PARTITION BY
  • CLUSTER BY
  • OPTIONS
  • COMMENT

Default value: N/A
Example: OPTIONS(custID=03761)
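For instance, if the Snap creates the target table with PARTITION BY, CLUSTER BY, and OPTIONS values from this field set, the generated DDL would resemble the following sketch (BigQuery syntax; the dataset, table, and column names are hypothetical):

  CREATE TABLE sales.sales_orders (
      order_id   INT64,
      order_date DATE,
      cust_id    INT64
  )
  PARTITION BY order_date              -- PARTITION BY table option
  CLUSTER BY cust_id                   -- CLUSTER BY table option
  OPTIONS (description = 'ELT Load');  -- OPTIONS table option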
Table Columns

This field set enables you to configure the schema of the target table. You can use this field set to add or drop columns in the target table if you select the Drop and Create table or Alter table option in the Load Action field.

This field set consists of the following fields:

  • Column
  • Data Type
  • Modifier

Column (String)
Enter the name of the column that you want to load in the target table. You can also specify the columns to drop if you select the Alter table option in the Load Action field.
Default value: N/A
Examples: ID, FNAME
Data Type (String)
Activates when you select the Drop and Create table or Alter table option in the Load Action field.
Enter the data type of the values in the specified column.
Default value: N/A
Examples: INT, VARCHAR
Modifier (String)
Activates when you select the Alter table option in the Load Action field.
Select whether you want to add or drop the specified column.

Available options are:

  • Add
  • Drop

Default value: Add
Example: Drop
Redshift Schema Name (String/Expression; applicable only when the target is BigQuery and the source location is Redshift)
Applicable only when you are loading data from a Redshift source to a BigQuery target.
Specify the schema name of the source table in Redshift.
Default value: N/A
Examples: PUBLIC, DEF_SCHEMA
Redshift Table Name (String/Expression; applicable only when the target is BigQuery and the source location is Redshift)
Applicable only when you are loading data from a Redshift source to a BigQuery target.
Specify the name of the Redshift source table (from the schema selected above).
Default value: N/A
Example: DataLoad_Source01
Cloud Storage Path (String/Expression)
Specify the fully qualified path to the folder in the cloud storage if you want to override the default path defined in the Snap's account settings. For example:

  • AWS S3 path: s3://<bucket name>/<folder>
  • DBFS folder: /<dbfs folder name>
  • Azure: azure://<accountname>.blob.core.windows.net/<containername>/<azurefolder>
  • Google Cloud Storage: gcs://<bucketname>/<folder>/

Default value: N/A
Example: s3://bigbucket/app_data
File List

This field set enables you to specify the staged files to load. You must specify each file in a separate row. Click + to add a new row.

This field set consists of the following field:

  • File

File (String)
Enter the name of the staged file to load.
Default value: N/A
Example: TestData.csv
File Name Pattern (String)
Enter the pattern to use to match the file name and/or absolute path. You can specify this as a regular expression pattern string, enclosed in single quotes.

Info: For Redshift tables
  • Redshift does not natively support regular expressions. However, Snaps in the ELT platform provide this ability through a key-based search mechanism.
  • If your search is based on AWS IAM roles, you cannot use regular expressions with Redshift tables.

Note: Using Wildcard Patterns in BigQuery
BigQuery has its own set of rules to resolve wildcard patterns such as *.* that you provide in this field. Refer to Batch loading data > Load Wildcards in the BigQuery documentation for the rules around using wildcards in BigQuery.

See Loading Using Pattern Matching for details on Snowflake regular expressions, Examples of COPY INTO (Delta Lake on Databricks) for DLP, or Regular Expressions and the Java Programming Language for using regular expressions with Redshift, Azure Synapse, or BigQuery in this field.

Default value: N/A
Examples:
  Snowflake: '.*key.*[.]csv'
  Redshift: './key.*csv'
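For Snowflake, for example, the pattern from this field roughly maps to the PATTERN clause of the COPY INTO statement, as in this sketch (the table and stage names are hypothetical):

  COPY INTO SALES_ORDERS
  FROM @my_stage/app_data/
  PATTERN = '.*key.*[.]csv';  -- load only the staged files whose names match the pattern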

File Format Type (String/Expression/Suggestion)
Select the format of the file. The options vary based on the database type (Snowflake/Redshift/Azure Synapse) in your account settings.

The supported file formats are:

  • Snowflake: AVRO, CSV, JSON, ORC, PARQUET, XML
  • Redshift: AVRO, CSV, JSON, ORC, PARQUET
  • Azure Synapse: CSV, ORC, PARQUET

Default value: None
Example: AVRO
File Format Option List

This field set enables you to define the file format options for the load operation. You must specify each file format option in a separate row. Click + to add a new row.

This field set consists of the following field:

  • Format Option

See Snowflake Data Loading Commands, Redshift Data Format Parameters, or COPY INTO (Transact-SQL) for Azure Synapse Analytics for the list of options that you can use based on the target database.
Format Option (String/Expression/Suggestion)
Click the Suggest icon and choose the file format option that you want to use during the load operation from the suggested values. Specify the required value for the selected option.
Default value: N/A
Example: SKIP_HEADER = 1
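As an illustration, a SKIP_HEADER = 1 format option for a Snowflake target would appear in the generated load statement roughly as follows (a sketch; the table, stage, and file names are hypothetical):

  COPY INTO SALES_ORDERS
  FROM @my_stage/app_data/
  FILES = ('TestData.csv')
  FILE_FORMAT = (TYPE = CSV SKIP_HEADER = 1);  -- skip the header row of the CSV file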
Encryption Type (Drop-down list)
Select the encryption type that you want to use for the loaded data and/or files.

Available options are:

  • None
  • Server-Side Encryption (Not supported by Azure Synapse)
  • Server-Side KMS Encryption (Not supported by Azure Synapse)

Note: Server-side encryption is available only for S3 accounts and Snowflake.

Default value: None
Example: Server-Side Encryption
KMS Key (String)
Activates when you select the Server-Side KMS Encryption option in the Encryption Type field.
Enter the KMS key to use to encrypt the files.

Note: Not required for Redshift databases. The load operation automatically recognizes and loads the encrypted files. See Loading encrypted files from Amazon S3 for details.

Default value: N/A
Example: <require input>
Copy Option List

This field set enables you to define the COPY options for the load operation. You must specify each COPY option in a separate row. Click + to add a new row.

This field set comprises the following field:

  • Copy Option

See COPY Options for Snowflake, COPY Parameter Reference for Redshift, or COPY INTO (Transact-SQL) for Azure Synapse Analytics.

Copy Option (String/Expression/Suggestion)
Click the Suggest icon and choose the COPY option that you want to use if there is an error during the load operation from the suggested values. Specify the required value for the selected option.
Default value: N/A
Example: ON_ERROR = ABORT_STATEMENT
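For example, an ON_ERROR copy option for a Snowflake target is appended to the generated load statement roughly as follows (a sketch; the table and stage names are hypothetical):

  COPY INTO SALES_ORDERS
  FROM @my_stage/app_data/
  ON_ERROR = ABORT_STATEMENT;  -- abort the COPY statement if any error is encountered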
Validation Mode (String)
Select an output preview type for the data validation performed as part of the Pipeline validation. The options vary based on the database type (Snowflake/Redshift) in your account settings.

Available options for Snowflake are:

  • None
  • RETURN_n_ROWS
  • RETURN_ERRORS
  • RETURN_ALL_ERRORS

Available options for Redshift are:

  • None
  • NOLOAD

Info: No specific validation modes are available for Azure Synapse.

For more details, see Validating Staged Files for Snowflake and Validating Input Data for Redshift.

Default value: None
Example: RETURN_n_ROWS
Rows To Return (Integer)
Activates when you select the RETURN_n_ROWS option in the Validation Mode field.
Enter the number of records to retrieve from the target table to validate whether the load operation was successful.
Default value: N/A
Example: 20
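Taken together with the Validation Mode field, a value of 20 here roughly corresponds to the following Snowflake statement during Pipeline validation (a sketch; the table and stage names are hypothetical):

  COPY INTO SALES_ORDERS
  FROM @my_stage/app_data/
  VALIDATION_MODE = RETURN_20_ROWS;  -- validate the staged files and return 20 rows instead of loading them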
Enable source column position (Checkbox; Snowflake and Azure Synapse only)
Select this checkbox to include the column position from the source table in the case of Snowflake or Azure Synapse.
For other target CDWs (Redshift, DLP, and BigQuery), you can leave this checkbox deselected.
Default value: Not selected
Example: Selected
Target to Source Column Map

This field set enables you to define the mapping between the columns in the target table and the corresponding columns in the source file. You must define each mapping in a separate row. Click + to add a new row.

This field set consists of the following fields:

  • Target Table Column Name
  • Source File Column Name

Target Table Column Name (String/Expression/Suggestion)
Click the Suggest icon and choose a column from the list of target table columns displayed. Ensure that you have defined the Schema Name or the Target Table Name to see the list of columns in this field's suggestions.
Default value: N/A
Example: ORDER_ID

Source File Column Name (String/Expression/Suggestion)
Click the Suggest icon and choose a column from the list of source file columns displayed. Ensure that you have defined the Schema Name or the source file path to see the list of columns in this field's suggestions.
Default value: N/A
Example: ORD_ID

Snap Behavior During Pipeline Validation

...

Each error below is listed with its reason and resolution.
Error: Time '"01:01:23"' is not recognized.
Reason: The CSV file you are trying to load into the Snowflake database contains single or double quotes for time and/or date values. Hence, the ELT Load Snap is unable to read and insert these values into the target table.
Resolution: After you specify the CSV files in the ELT Load Snap's settings, add an entry in the File Format Options field set as FIELD_OPTIONALLY_ENCLOSED_BY='\'' or FIELD_OPTIONALLY_ENCLOSED_BY='"' to match the quotes used for the date and/or time values in the CSV files. The Snap then ignores these quotes when loading the values.
Error: File read error while uploading data from a CSV file.
Reason: This can happen when the source CSV file contains a header row that the target database cannot recognize or interpret.
Resolution: Depending on your target database, use the respective File Format Option in the ELT Load Snap:

  • For Snowflake: SKIP_HEADER = 1
  • For Azure Synapse: FIRSTROW = 2
  • For Databricks Lakehouse Platform: 'header' = 'true'

Error: Invalid DBFS folder name. (Target CDW: Databricks Lakehouse Platform)
Reason: You have not specified an appropriate DBFS path in the DBFS Folder path field in the Snap's (DLP) account settings.
Resolution: Edit the Snap's account settings to ensure that you have provided a valid DBFS Folder path.

Error: Database cannot be blank. (when seeking the suggested list for the Schema Name field)
Reason: Suggestions in the Schema Name and Target Table Name fields do not work when you have not specified a valid value for the Database Name field in this Snap.
Resolution: Specify the target Database Name in this Snap to view and choose from the suggested lists in the Schema Name and Target Table Name fields.

Error: Unknown Source / Not found: <path> (Target CDW: BigQuery)
Reason: BigQuery has a unique way of resolving wildcard patterns such as *.* that you provide in the File or File Name Pattern fields of this Snap.
Resolution: Make sure that the file name pattern you provide conforms to the rules defined in the Batch loading data > Load Wildcards section of the BigQuery documentation.

Error: Loading data into a Redshift table using the RegEx pattern *.* in the File Name Pattern field results in the operation being performed twice.
Reason: Unlike the other CDWs, the Redshift COPY INTO SQL does not natively support a file pattern option, so SnapLogic supports this externally through Java Regular Expressions (RegEx). The RegEx *.* returns matching results from the root location instead of the folder location specified before *.* in the File Name Pattern field.
Resolution: Use a RegEx such as csv/person/.+\..+ in the File Name Pattern field to load data from all the files available in a specific source folder.

Error: Keyword RANGE is not acceptable as a column name. (Target CDW: Databricks Lakehouse Platform)
Reason: This can happen with any reserved keyword if it is used as a column/field name in the table to be created.
Resolution: Ensure that you enclose such column names (reserved keywords) between backticks (`). For example: `RANGE` STRING.

Error: Keyword ORDER is not acceptable as a column name in the JSON file. (Target CDW: BigQuery)
Reason: This can happen with any reserved keyword if it is used as a column/field name in the table to be created.
Resolution: Ensure that you enclose such column names (reserved keywords) between backticks (`). For example: `ORDER`.
Error: SQL exception from Snowflake: Syntax error in one or more positions in the SQL query.
Reason: Column names in Snowflake tables are case-sensitive. Snowflake stores all columns in uppercase unless they are surrounded by quotes at the time of creation, in which case the exact casing is preserved. See Identifier Requirements — Snowflake Documentation.
Resolution: Ensure that you follow the same casing for the column names across the Pipeline.

Error: [Simba][SparkJDBCDriver](500051) ERROR processing query/statement. Error Code: 0. Cannot create table ('`<schema name>`.`<table name>`'). The associated location (`…<table name>`) is not empty but it's not a Delta table. (Target CDW: Databricks Lakehouse Platform)

Reason: The specified location contains one of the following:

  • A non-Delta table (such as CSV, ORC, JSON, PARQUET)
  • A corrupted table
  • A Delta table with a different table schema

So, the Snap/Pipeline cannot overwrite this table with the target table as needed.

Resolution: Ensure that you take appropriate action (mentioned below) on the existing table before running your Pipeline again (to create another Delta table at this location).

Move or drop the existing table from the schema manually using one of the following commands:

Access the DBFS through a terminal and run:

  • dbfs mv dbfs:/<current_table_path> dbfs:/<new_table_path> to move the table, or
  • dbfs rm -r dbfs:/<table_path> to drop the table.

OR

Use a Python notebook and run:

  • dbutils.fs.mv(from: String, to: String, recurse: boolean = false): boolean to move the table/file/directory, or
  • dbutils.fs.rm(dir: String, recurse: boolean = false): boolean to drop the table/file/directory.

...

For the Pipeline to work as expected with a CSV file created on macOS with the default (macOS) line endings, define the value of the file format option ROWTERMINATOR as '0X0A' instead of the usual '\n'. With '\n' as the ROWTERMINATOR, the Pipeline runs successfully but does not insert the data from the CSV file into the newly created table in Azure Synapse.

...