ELT Insert-Select
In this article
An account for the Snap
You must define an account for this Snap to communicate with your target CDW. Click the account specific to your target CDW below for more information:
Overview
Use this Snap to perform the INSERT INTO SELECT operation on the specified table. The Snap creates a table and inserts the data if the target table does not exist. After successfully running a Pipeline that contains this Snap, you can check for the data updates made to the target table in one of the following ways:
- Validate the Pipeline and review the Snap's output preview data.
- Query the target table using ELT Select Snap for the latest data available.
- Open the target database and check for the new data in the target table.
Prerequisites
- A valid SnapLogic account to connect to the database in which you want to perform the INSERT INTO SELECT operation.
- Your database account must have the following permissions:
- SELECT privileges for the source table whose data you want to insert into the target table.
- CREATE TABLE privileges for the database in which you want to create the table.
- INSERT privileges to insert data into the target table.
Limitations
- The input data must correspond to the specified table's schema. You can use the ELT Transform Snap to ensure this.
- ELT Snap Pack does not support Legacy SQL dialect of Google BigQuery. We recommend that you use only the BigQuery's Standard SQL dialect in this Snap.
Known Issues
If the last Snap in the Pipeline takes 2 to 5 seconds to update the runtime, the ELT Pipeline statistics are not displayed even after the Pipeline is completed. The UI does not auto-refresh to display the statistics after the runtime.
Workaround: Close the Pipeline statistics window and reopen it to see the ELT Pipeline statistics.When you return to the Snap Statistics tab from the Extra Details tab in the Pipeline Execution Statistics pane, it contains the status bar (Pipeline execution status) instead of the Download Query Details hyperlink and the individual counts of Records Added, Records Updated, and Records Deleted.
- When your Databricks Lakehouse Platform instance uses Databricks Runtime Version 8.4 or lower, ELT operations involving large amounts of data might fail due to the smaller memory capacity of 536870912 bytes (512MB) allocated by default. This issue does not occur if you are using Databricks Runtime Version 9.0.
- Suggestions displayed for the Schema Name field in this Snap are from all databases that the Snap account user can access, instead of the specific database selected in the Snap's account or Settings.
In any of the supported target databases, this Snap does not appropriately identify nor render column references beginning with an _ (underscore) inside SQL queries/statements that use the following constructs and contexts (the Snap works as expected in all other scenarios):
WHERE
clause (ELT Filter Snap)WHEN
clauseON
condition (ELT Join, ELT Merge Into Snaps)HAVING
clauseQUALIFY
clause- Insert expressions (column names and values in ELT Insert Select, ELT Load, and ELT Merge Into Snaps)
- Update expressions list (column names and values in ELT Merge Into Snap)
- Secondary
AND
condition Inside SQL query editor (ELT Select and ELT Execute Snaps)
Workaround
As a workaround while using these SQL query constructs, you can:
- Precede this Snap with an ELT Transform Snap to re-map the '_' column references to suitable column names (that do not begin with an _ ) and reference the new column names in the next Snap, as needed.
- In case of Databricks Lakehouse Platform where CSV files do not have a header (column names), a simple query like
SELECT * FROM CSV.`/mnt/csv1.csv`
returns default names such as _c0, _c1, _c2 for the columns which this Snap cannot interpret. To avoid this scenario, you can:- Write the data in the CSV file to a DLP table beforehand, as in:
CREATE TABLE csvdatatable (a1 int, b1 int,…) USING CSV `/mnt/csv1.csv`
where a1, b1, and so on are the new column names. - Then, read the data from this new table (with column names a1, b1, and so on) using a simple SELECT statement.
- Write the data in the CSV file to a DLP table beforehand, as in:
- In case of Databricks Lakehouse Platform, all ELT Snaps' preview data (during validation) contains a value with precision higher than that of the actual floating point value (float data type) stored in the Delta. For example, 24.123404659344 instead of 24.1234. However, the Snap reflects the exact values during Pipeline executions.
Snap Input and Output
Input/Output | Type of View | Number of Views | Examples of Upstream and Downstream Snaps | Description |
---|---|---|---|---|
Input | Document |
|
| The data to be inserted into the target table. Ensure that the data corresponds to the target table's schema. |
Output | Document |
|
| A document containing the SQL SELECT query executed on the target database. |
Snap Settings
SQL Functions and Expressions for ELT
You can use the SQL Expressions and Functions supported for ELT to define your Snap or Account settings with the Expression symbol = enabled, where available. This list is common to all target CDWs supported. You can also use other expressions/functions that your target CDW supports.
Parameter Name | Data Type | Description | Default Value | Example |
---|---|---|---|---|
Label | String | Specify a name for the Snap. You can modify this to be more specific, especially if you have more than one of the same Snap in your pipeline. | ELT Insert-Select | Insert Employee Records |
Get preview data | Checkbox | Select this checkbox to include a preview of the query's output. The Snap performs limited execution and generates a data preview during Pipeline validation. In the case of ELT Pipelines, only the SQL query flows through the Snaps but not the actual source data. Hence, the preview data for a Snap is the result of executing the SQL query that the Snap has generated in the Pipeline. The number of records displayed in the preview (upon validation) is the smaller of the following:
Rendering Complex Data Types in Databricks Lakehouse Platform Based on the data types of the fields in the input schema, the Snap renders the complex data types like map and struct as object data type and array as an array data type. It renders all other incoming data types as-is except for the values in binary fields are displayed as a base64 encoded string and as string data type. | Not selected | Selected |
Database Name | String | Required. Enter the name of the database in which the target table is located. Leave it blank to use the database name specified in the account settings. If your target database is Databricks Lakehouse Platform (DLP), you can, alternatively, mention the file format type for your table path in this field. For example, DELTA, CSV, JSON, ORC, AVRO. See Table Path Management for DLP section below to understand the Snap's behavior towards table paths. | N/A | TESTDB |
Schema Name (Not applicable to Databricks Lakehouse Platform) | String | Required. Enter the name of the database schema. In case it is not defined, then the suggestion for the schema name retrieves all schema names in the specified database when you click .
| N/A | "TEST_DATA" |
Target Table Name | String | Required. The name of the table or view into which you want to insert the data. Only views that can be updated (have new rows) are listed as suggestions. So, Join views are not included. This also implies that the Snap account user has the Insert privileges on the views listed as suggestions. If your target database is Databricks Lakehouse Platform (DLP), you can, alternatively, mention the target table path in this field. Enclose the DBFS table path between two If you choose to include the schema name as part of the target table name, ensure that it is the same as specified in the Schema Name field, including the double quotes. For example, if the schema name specified is "table_schema_1", then the table name should be "table_schema_1"."tablename". If the target table or view does not exist during run-time, the Snap creates one with the name that you specify in this field and writes the data into it. During Pipeline validation, the Snap creates the new table or view but does not write any records into it.
| N/A | "TEST_DATA"."DIRECT" EMPLOYEE_DATA EMPLOYEE_123_DATA REVENUE"-"OUTLET "net_revenue" |
Advanced Options | Checkbox | Select this checkbox to define the mapping of your source and target table columns scenario with or without the Insert Expressions list. When selected, it activates the Operation Types field. | Deselected | Selected |
Operation Types | Dropdown list | Choose one of the following options that best describes your source data and the INSERT preference:
| Source Columns Order | Some Source and Target Column names are identical |
Table Option List | This field set enables you to specify the table options you want to use on the target table. These options are populated based on the Snap Account (target CDW) selected. You must specify each table option in a separate row. Click to add rows. This field set contains one field:
| |||
Table Option | String/Suggestion | Click the Suggestions icon ( ) to view and select the table option you want to apply for loading data into the target table. | N/A | DISTRIBUTION = HASH ( cust_name ) |
Insert Expression List | This field set enables you to specify the values for a subset of the columns in the target table. The remaining columns are assigned null values automatically. You must specify each column in a separate row. Click to add rows. This field set is disabled if you select All Source and Target Column names are identical in the Operation Types field. This field set consists of the following fields:
You can use this field set to insert data only into an existing table. | |||
Insert Column | String | Enter the name of the column in the target table to assign values. | N/A | ORD_AMOUNT |
Insert Value | String | Enter the value to assign in the specified column. Repeat the column name if you want to use the values in the source table. You can also use expressions to transform the values. | N/A | ORD_AMOUNT ORD_AMOUNT+20 |
Overwrite | Checkbox | Select to overwrite the data in the target table. If not selected, the incoming data is appended. | Not selected | Selected |
Snap behavior in different source and target table column scenarios
Data match (operation) type | Number of source table columns | Number of target table columns | Insert expression list specified? | Snap behavior |
---|---|---|---|---|
Not specified (Advanced Options checkbox is not selected) | Less | More | Yes |
|
Not specified (Advanced Options checkbox is not selected) | Same | Same | Yes |
|
Not specified (Advanced Options checkbox is not selected) | More | Less | Yes |
|
Not specified (Advanced Options checkbox is not selected) | Less | More | No |
|
Not specified (Advanced Options checkbox is not selected) | Same | Same | No |
|
Not specified (Advanced Options checkbox is not selected) | More | Less | No |
|
Source Columns Order | Less | More | No |
|
Source Columns Order | Less | More | Yes |
|
Source Columns Order | Same | Same | No |
|
Source Columns Order | Same | Same | Yes |
|
All Source and Target Column names are identical | Less | More | Not displayed |
|