ELT Select

An account for the Snap

You must define an account for this Snap to communicate with your target CDW. See the account documentation specific to your target CDW for more information.

Overview

Use this Snap to build SELECT SQL queries that fetch records from the specified table. In its default state, after you provide the database, schema, and table names, the Snap builds a standard SELECT * FROM query. However, you can use the built-in SQL query editor to build complex queries that span multiple tables and use operations such as JOIN, AGGREGATE, LIMIT, and DISTINCT. This Snap can also connect to multiple upstream Snaps, which enables you to build a query that operates on multiple tables together or separately at the same time.
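For instance, using the illustrative database, schema, and table names from the settings examples later in this article (TESTDB, PUBLIC, and EMPLOYEES), the default query the Snap builds is along the lines of:

SELECT * FROM TESTDB.PUBLIC.EMPLOYEES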

Prerequisites

A valid SnapLogic account to connect to the database in which you want to execute the query, if you also want to preview the data.

SnapLogic accounts are also required if you want to use the Snap's suggest feature to fetch schema and table names.

Limitation

The ELT Snap Pack does not support the Legacy SQL dialect of Google BigQuery. We recommend that you use only BigQuery's Standard SQL dialect in this Snap.

Known Issue

  • Suggestions displayed for the Schema Name field in this Snap are from all databases that the Snap account user can access, instead of only the specific database selected in the Snap's account or Settings.
  • When you specify an SQL statement in the Snap's SQL Query Editor as an expression, the dynamic validation for the expression displays inline errors if more than one incoming document (without the '__sql__' key) reaches the Snap, the Get Preview Data checkbox is selected in the previous Snap, and the Preview Document Count in your user settings is set to a value greater than 1.

    • To prevent this error and similar ones, do not select the Get Preview Data checkbox in the previous Snap, set the Preview Document Count in your user settings to 1, or append the condition WHERE 1 = 0 to the SQL statement with the Get Preview Data checkbox selected, as illustrated below.
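A hedged sketch of that last option (the table name is hypothetical): the appended condition returns no rows, so the preview executes without triggering the inline errors.

SELECT * FROM MYTABLE WHERE 1 = 0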

In any of the supported target databases, this Snap does not correctly identify or render column references that begin with an _ (underscore) inside SQL queries/statements that use the following constructs and contexts (the Snap works as expected in all other scenarios):

  • WHERE clause (ELT Filter Snap)
  • WHEN clause
  • ON condition (ELT Join, ELT Merge Into Snaps)
  • HAVING clause
  • QUALIFY clause
  • Insert expressions (column names and values in ELT Insert Select, ELT Load, and ELT Merge Into Snaps)
  • Update expressions list (column names and values in ELT Merge Into Snap)
  • Secondary AND condition
  • Inside SQL query editor (ELT Select and ELT Execute Snaps)

Workaround

As a workaround while using these SQL query constructs, you can:

  • Precede this Snap with an ELT Transform Snap to re-map the underscore-prefixed column references to suitable column names (that do not begin with an _) and reference the new column names in the next Snap, as needed.
  • In the case of the Databricks Lakehouse Platform, where CSV files do not have a header (column names), a simple query like SELECT * FROM CSV.`/mnt/csv1.csv` returns default names such as _c0, _c1, _c2 for the columns, which this Snap cannot interpret. To avoid this scenario, you can:
    • Write the data in the CSV file to a DLP table beforehand, as in: CREATE TABLE csvdatatable (a1 int, b1 int,…) USING CSV `/mnt/csv1.csv`, where a1, b1, and so on are the new column names.
    • Then, read the data from this new table (with column names a1, b1, and so on) using a simple SELECT statement (see the sketch after this list).
  • In the case of the Databricks Lakehouse Platform, the preview data of all ELT Snaps (during validation) contains values with precision higher than that of the actual floating point values (float data type) stored in the Delta table. For example, 24.123404659344 instead of 24.1234. However, the Snap reflects the exact values during Pipeline executions.
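A minimal sketch of the two-step DLP workaround above, with an illustrative file path, table name, and column names, following the CREATE TABLE form shown in that list:

-- Step 1: Load the headerless CSV data into a DLP table with explicit column names.
CREATE TABLE csvdatatable (a1 int, b1 int) USING CSV `/mnt/csv1.csv`;

-- Step 2: Read from the new table so that no column reference begins with an underscore.
SELECT a1, b1 FROM csvdatatable;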

Snap Input and Output


Input
  • Type of View: Document
  • Number of Views: Min: 0, Max: No limit
  • Examples of Upstream Snaps: ELT Transform, ELT Copy
  • Description: The database, schema, and table names in which the query must be executed. You can also pass this information using an upstream ELT Snap. Additionally, you can pass such information for several tables to the Snap by using a separate input view for each table.

Output
  • Type of View: Document
  • Number of Views: Min: 1, Max: 1
  • Examples of Downstream Snaps: ELT Copy, ELT Intersect
  • Description: A SELECT SQL query for the table specified in the Snap. If you use the SQL editor, an SQL query built based on your inputs in the editor. Optionally, the output also includes a preview of the query's output if the Get preview data checkbox is selected.

Snap Settings

SQL Functions and Expressions for ELT

You can use the SQL Expressions and Functions supported for ELT to define your Snap or Account settings with the Expression symbol (=) enabled, where available. This list is common to all supported target CDWs. You can also use other expressions/functions that your target CDW supports.

Each setting below is listed with its data type, description, default value, and an example.

Label (String)

Specify a name for the Snap. You can modify this to be more specific, especially if you have more than one of the same Snap in your pipeline.

Default Value: ELT Select
Example: Fetch Sales Data
Get preview data (Checkbox)

Select this checkbox to include a preview of the query's output. The Snap performs limited execution and generates a data preview during Pipeline validation.

In the case of ELT Pipelines, only the SQL query flows through the Snaps but not the actual source data. Hence, the preview data for a Snap is the result of executing the SQL query that the Snap has generated in the Pipeline.

The number of records displayed in the preview (upon validation) is the smaller of the following:

  • Number of records available upon execution of the SQL query generated by the Snap.

  • The value set in the Preview Data Count setting (default is 50 records).

Rendering Complex Data Types in Databricks Lakehouse Platform

Based on the data types of the fields in the input schema, the Snap renders complex data types such as map and struct as the object data type, and array as the array data type. It renders all other incoming data types as-is, except that values in binary fields are displayed as base64-encoded strings of the string data type.

If you select this checkbox, you must ensure that the database name is also provided, either in the Snap's Database Name field or in the Account. Otherwise, the Snap's output contains only the SQL built by the Snap, which does not indicate that the SQL is correct.

Default Value: Not selected
Example: Selected
Database Name (String)

The name of the database in which the target/source tables are located. Leave it blank to use the database name specified in the account settings.

You must ensure that the database name is correct. Otherwise, even though the Snap builds an SQL query, you will receive an error when executing the Snap since the query executes only in the target database.  

If your target database is Databricks Lakehouse Platform (DLP), you can alternatively specify the file format type for your table path in this field, for example, DELTA, CSV, JSON, ORC, or AVRO. See the Table Path Management for DLP section below to understand the Snap's behavior with table paths.

Default Value: N/A
Examples: TESTDB, DELTA, ORC

Schema Name (String; not applicable to Databricks Lakehouse Platform)

The name of the database schema. If it is not defined, the table name suggestion retrieves the names of all tables in all schemas of the specified database (except for DLP) when you use the suggest feature.

  • Ensure that you use exactly the same schema name, including double quotes if used, when you repeat the schema name in the Table Name field.
  • Leave this field blank if your target database is Databricks Lakehouse Platform.

Default Value: N/A
Example: PUBLIC
Table Name (String)

The name of the table or view for which you want to build the SQL queries. If it is not defined, the Snap retrieves all the table names and views associated with the specified schema when you use the suggest feature.

  • Ensure that you include exactly the same schema name, including double quotes if used, as specified in the Schema Name field.

If your target database is Databricks Lakehouse Platform (DLP), you can alternatively specify the table path in this field, for example, `/mnt/elt/mytabletarget`. Learn more about the Snap's behavior toward table paths in the Table Path Management for DLP section.

When your target database is BigQuery, this Snap supports wildcard search in this field, as illustrated after the examples below.

Learn more about wildcard search in BigQuery: Querying wildcard tables.

Default Value: N/A
Examples: PUBLIC.EMPLOYEES, `/mnt/elt/emp`
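As a hedged illustration of the wildcard support (the dataset and table name prefix are hypothetical), a Table Name value such as sales_2023* makes the Snap fetch from every table whose name begins with that prefix, similar to the following Standard SQL, which requires backticks around a wildcard table reference:

SELECT * FROM `mydataset.sales_2023*`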

Enable SQL query editor (Checkbox)

Select this checkbox to enable the SQL query editor. You can use this editor to build all types of SQL queries that execute on multiple tables. Enabling the SQL query editor activates the Input View to Virtual Table Map field set in the Snap.

  • To perform operations upon multiple tables, you must pass the table names through upstream Snaps. Specify each table name in a separate input view.
  • Be careful when phrasing the query to prevent SQL injection. See Preventing SQL Injection for details.

Default Value: Not selected
Example: Selected
Input View to Virtual Table Map

Activates when you select the Enable SQL query editor checkbox. This is useful when you enable multiple input views in the ELT Select Snap. Use this field set to map each input view to a dummy table. You will reference the table name specified in each input view with this dummy/virtual table name. 

  • Each input view corresponds to the SQL coming from the upstream Snap and that SQL is referenced by the dummy table name.
  • Virtual table names must not have duplicates.
  • One input view must not be associated with different table names. 

Click + to add rows. Each input view must be specified in a separate row.

This field set consists of the following fields:

  • Input View
  • Virtual Table Name
Input View (String)

Select the input view whose table reference is to be mapped.

Default Value: N/A
Example: input2
Virtual Table Name (String)

Specify the dummy/virtual table name that will be assigned to the table reference in the selected input view.

Default Value: N/A
Example: tbl1
SQL Query Editor (Query editor)

Enter the SQL queries that you want to build using this Snap. Even though the Snap's name is ELT Select, you can use this SQL query editor to build any SQL query.

SQL Comments are allowed

You can include inline comments and multi-line comments before, inside, or after your statement in this editor. It supports all standard SQL comment syntaxes as listed below:

-- comment text
# comment text
/* multi-line
comment text */
// comment text

Default Value: N/A
Example: SELECT DISTINCT  FROM MYTABLE
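For instance, if two input views are mapped to the virtual table names tbl1 and tbl2 (tbl2 and the column names below are hypothetical; tbl1 follows the example above), a query in the SQL query editor might look like this sketch:

SELECT t1.id, t2.amount
FROM tbl1 t1
JOIN tbl2 t2 ON t1.id = t2.id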

Table Path Management for DLP

# | File Format Type (Database Name field) | Table Path exists? # | All other requirements are valid? | Snap Operation Result
1 | DELTA | Yes | Yes | Success
2 | DELTA | No | Yes | Failure. Snap displays error message.
3 | DELTA | Yes | No | Failure. Snap displays error message.
4 | AVRO/CSV/JSON/ORC/other | Yes | Yes | Success

# We recommend that you specify a table path that resolves to a valid data file. Create the required target file, if need be, before running your Pipeline.

Preventing SQL Injection

You can pass Pipeline parameters as values in an SQL query; however, if you do not phrase the query properly, the parameter's name can be bound as a value in the database. This insertion of potentially incorrect information into the database is known as SQL injection. It is thus necessary to take precautions when including Pipeline parameters in your SQL query. Based upon the intended use of the Pipeline parameter, use one or both of the following methods to prevent accidental SQL injection:

Method-1: Simple Substitutions

You can reference the Pipeline parameter directly with a JSON-path without enabling expressions.

For example, consider a Pipeline parameter name that contains the value to match against a column. You want to write a SELECT query to fetch records from the table mytable where the column colname matches the value in the Pipeline parameter.

SELECT * FROM mytable WHERE colname = _name

Method-2: Dynamic Substitutions

You must enable expressions when referencing table names using Pipeline parameters. Format the query, except the Pipeline parameter's reference, as a string. 

For example, if you want to write a SELECT query to fetch records from a table and you are passing the table's name in the Pipeline parameter table:

"SELECT * FROM" + _table

In the example above, you must still use JSON-paths if you want to use another Pipeline parameter for substituting values in the SQL:

"SELECT * FROM " + _table + " WHERE colname = _name"

The Snap evaluates the expression and also carries out path substitutions.

Here is how it works

The Snap pre-processes the query to extract any JSON-Paths and converts them to bound parameters. For example, consider the following query:

SELECT * FROM mytable WHERE name = _name

The Snap converts this query into the following before turning it into a prepared statement for the database:

SELECT * FROM mytable WHERE name = ?

The Snap evaluates the JSON-Path to get the value to bind the Pipeline parameter in the prepared statement. 
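For instance, if the Pipeline parameter name held the value Smith (a hypothetical value), the prepared statement would effectively execute as:

SELECT * FROM mytable WHERE name = 'Smith'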

Using escape characters

When expressions are disabled, use \ as an escape character to treat an underscore (_) as a literal string rather than the start of a Pipeline parameter reference.

For example:

SELECT \_2, \_3 FROM mytable WHERE colname = \_name 

Troubleshooting

Error | Reason