PolyBase Bulk Load

Snap type:

Write 

Description:

This Snap performs a bulk load operation from the input view document stream to the target table. The Snap supports SQL Server databases with the PolyBase feature, which includes SQL Server 2016 (on-premises) and Data Warehouse. It first formats the input view document stream into a temporary CSV file in Azure Blob storage and then sends a bulk load request to the database to load the temporary blob file into the target table.
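The staging step described above can be pictured with a minimal Python sketch. It only illustrates turning a stream of documents into CSV text; the function name `documents_to_csv`, the fixed column list, and the absence of a header row are assumptions made for illustration, not the Snap's documented file layout.

```python
import csv
import io


def documents_to_csv(documents, columns):
    """Serialize dict documents into CSV text, one row per document."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    for doc in documents:
        # Column order is fixed by `columns`; absent or None values
        # are written as empty fields.
        writer.writerow(["" if doc.get(col) is None else doc[col] for col in columns])
    return buffer.getvalue()


docs = [{"id": 1, "name": "Ada"}, {"id": 2, "name": "Lin, Wei"}]
csv_text = documents_to_csv(docs, ["id", "name"])
print(csv_text)
```

Fields that contain the delimiter are quoted by the `csv` module, which is why a value such as `Lin, Wei` survives as a single column.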

ETL Transformations & Data Flow

This Snap enables the following ETL operations/flows:

Loads data into a temporarily created Azure blob, then executes the SQL Server commands to load that blob into the target table.

  • The Snap reads all the incoming documents and writes them to a temporarily created blob on the Azure storage
  • The Snap executes the following DB commands in sequence:
    • Create a master key, only if it does not exist
    • Create the Database scoped credentials, only if it does not exist
    • Create an external data source
    • Create an external file format
    • Create an external table (the blob is copied to this external table)
    • Copy the data from external table to the destination table
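The command sequence above can be sketched as a Python helper that assembles T-SQL in the same order. This is an approximation built from standard PolyBase DDL, not the exact statements the Snap issues. The helper name, all object names (`snap_credential`, `snap_blob_source`, `snap_csv_format`), the `wasbs://` location, and the placeholder secrets are hypothetical, and catalog view names may differ between SQL Server and Data Warehouse.

```python
def polybase_load_statements(target_table, external_table, column_ddl,
                             storage_account, container, blob_folder,
                             credential="snap_credential",
                             data_source="snap_blob_source",
                             file_format="snap_csv_format"):
    """Return illustrative T-SQL statements for a PolyBase load, in order.

    All object names are hypothetical; secrets are left as placeholders.
    """
    location = f"wasbs://{container}@{storage_account}.blob.core.windows.net"
    return [
        # 1. Create a master key only if the database has none.
        "IF NOT EXISTS (SELECT 1 FROM sys.symmetric_keys "
        "WHERE name = '##MS_DatabaseMasterKey##') "
        "CREATE MASTER KEY ENCRYPTION BY PASSWORD = '<strong-password>';",
        # 2. Create the database scoped credential only if it is missing.
        "IF NOT EXISTS (SELECT 1 FROM sys.database_scoped_credentials "
        f"WHERE name = '{credential}') "
        f"CREATE DATABASE SCOPED CREDENTIAL {credential} "
        "WITH IDENTITY = 'user', SECRET = '<storage-account-key>';",
        # 3. External data source pointing at the Azure Blob container.
        f"CREATE EXTERNAL DATA SOURCE {data_source} WITH "
        f"(TYPE = HADOOP, LOCATION = '{location}', CREDENTIAL = {credential});",
        # 4. External file format describing the staged CSV.
        f"CREATE EXTERNAL FILE FORMAT {file_format} WITH "
        "(FORMAT_TYPE = DELIMITEDTEXT, "
        "FORMAT_OPTIONS (FIELD_TERMINATOR = ',', STRING_DELIMITER = '\"'));",
        # 5. External table over the staged blob.
        f"CREATE EXTERNAL TABLE {external_table} ({column_ddl}) WITH "
        f"(LOCATION = '{blob_folder}', DATA_SOURCE = {data_source}, "
        f"FILE_FORMAT = {file_format});",
        # 6. Copy from the external table into the destination table.
        f"INSERT INTO {target_table} SELECT * FROM {external_table};",
    ]


statements = polybase_load_statements(
    "dbo.users", "dbo.users_ext", "id INT, name VARCHAR(100)",
    "mystorageaccount", "staging", "/snap_tmp/")
for sql in statements:
    print(sql)
```

Keeping the statements in a list makes the ordering explicit: each external object depends on the one created before it.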

Input & Output

  • Input: This Snap must have an upstream Snap that can pass a document output view, such as Structure or JSON Generator.

  • Output: The Snap outputs one document specifying the records that have been inserted successfully. The records that are not written to the blob successfully are routed to the error view.

Prerequisites:

Bulk load requires a minimum of SQL Server 2016 to work properly.

The database must have the PolyBase feature enabled.

Support and limitations:
  • Works in Ultra Pipelines.
  • If the Snap fails while loading the blob into the DB, the temporary blob is not deleted, so the data is not lost.
  • Microsoft PolyBase does not support varchar entries which contain more than 1000 characters. As a workaround, if any row contains a varchar entry with more than 1000 characters, use the Azure SQL - Bulk Load Snap instead.
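As a rough illustration of the workaround above, the following Python sketch checks incoming documents for string values longer than 1000 characters before choosing a Snap. The function names and the idea of pre-screening documents this way are assumptions; the Snap itself does not expose such a check.

```python
POLYBASE_MAX_CHARS = 1000  # varchar limit stated in this section


def exceeds_polybase_limit(document, limit=POLYBASE_MAX_CHARS):
    """Return True if any string value in the document is longer than `limit`."""
    return any(isinstance(value, str) and len(value) > limit
               for value in document.values())


def choose_snap(documents):
    """Suggest which Snap to use for a batch of documents."""
    if any(exceeds_polybase_limit(doc) for doc in documents):
        return "Azure SQL - Bulk Load"
    return "PolyBase Bulk Load"


print(choose_snap([{"id": 1, "note": "short"}]))
print(choose_snap([{"id": 2, "note": "x" * 1001}]))
```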
Account: 

This Snap uses account references created on the Accounts page of SnapLogic Manager to handle access to this endpoint. See Configuring Azure SQL Accounts for information on setting up this type of account.


Views:
Input: This Snap has exactly one document input view.
Output: This Snap has at most one document output view. If an output view is available, it conveys that the bulk load operations were carried out successfully.
Error: This Snap has at most one document error view and produces zero or more documents in the view.
Troubleshooting:
  • Ensure the database that the Snap interacts with is at least SQL Server 2016 with the PolyBase feature enabled.
  • Ensure the DB credentials provided are valid.
  • Ensure the Azure blob storage account is set up properly.
  • Ensure the blob account credentials are valid.
  • If the Snap fails when writing to a data warehouse, it writes a new blob in the Azure container. This new blob highlights the first invalid row that caused the bulk load operation to fail.

Settings

Label

Required. The name for the Snap. You can modify this to be more specific, especially if you have more than one of the same Snap in your pipeline.

Schema Name


The database schema name. If it is not defined, the suggestions for the table name include the tables of all schemas. This property is suggestible and retrieves the available database schemas when you request suggested values.

The values can be passed using the pipeline parameters but not the upstream parameter.

Example: SYS

Default value: None 

Table Name


Required. The target table to load the incoming data into.

The values can be passed using the pipeline parameters but not the upstream parameter.

Example: users

Default value: None 

Create table if not present


Select this property to create the target table if it does not exist; otherwise, the system throws a "table not found" error.

Example: table1

Default value: None

Schema source

Specifies if the schema must be fetched from the input document or from the existing table while loading data into the temporary blob at the time of bulk upload. The options available are: Schema from provided input and Schema from existing table.

Default value: Schema from provided input

Use type default

Specifies how to handle any missing values in input documents. The options available are TRUE and FALSE. If you select TRUE, the Snap replaces every missing value in the input document with its default value in the external table. Supported data types and their default values are:

  • Numeric - 0
  • String - ""
  • Date - 1900-01-01

If you select FALSE, the Snap replaces every missing value in the input document with a null value in the external table.

Default value: TRUE
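The effect of this setting can be sketched in Python as follows. The defaults match the list above; the function name `fill_missing`, the `column_types` mapping, and the type labels are hypothetical, and in practice the substitution presumably happens in the database when the external table is read rather than in client code.

```python
# Type defaults as listed in this section.
TYPE_DEFAULTS = {"numeric": 0, "string": "", "date": "1900-01-01"}


def fill_missing(document, column_types, use_type_default=True):
    """Return a row with missing values replaced by a type default or None."""
    filled = {}
    for column, kind in column_types.items():
        value = document.get(column)
        if value is None:
            # TRUE: substitute the type default; FALSE: keep a null.
            value = TYPE_DEFAULTS[kind] if use_type_default else None
        filled[column] = value
    return filled


types = {"id": "numeric", "name": "string", "joined": "date"}
print(fill_missing({"id": 7}, types, use_type_default=True))
print(fill_missing({"id": 7}, types, use_type_default=False))
```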

Bulk insert mode


Specifies if the incoming data should be appended to the target table or overwrite the existing data in that table. The options available are: Append and Overwrite.

Example: Append, Overwrite

Default value: Append 

If you select Overwrite, the Snap overwrites the existing table and schema with the input data.
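The difference between the two modes can be illustrated with a small Python sketch that uses an in-memory SQLite database as a stand-in for SQL Server. Dropping and recreating the table for Overwrite is an assumption that mirrors the statement that both table and schema are replaced; how the Snap actually implements it is not described here, and `bulk_insert` is a hypothetical helper.

```python
import sqlite3


def bulk_insert(conn, table, rows, mode="Append"):
    """Insert rows; in Overwrite mode, replace the table and its schema first."""
    columns = list(rows[0].keys())
    column_list = ", ".join(columns)
    if mode == "Overwrite":
        # Assumption: overwrite is modeled as drop and recreate.
        conn.execute(f"DROP TABLE IF EXISTS {table}")
    conn.execute(f"CREATE TABLE IF NOT EXISTS {table} ({column_list})")
    placeholders = ", ".join("?" for _ in columns)
    conn.executemany(
        f"INSERT INTO {table} ({column_list}) VALUES ({placeholders})",
        [tuple(row[col] for col in columns) for row in rows])


conn = sqlite3.connect(":memory:")
bulk_insert(conn, "users", [{"id": 1}, {"id": 2}])
bulk_insert(conn, "users", [{"id": 3}], mode="Append")
count_after_append = conn.execute("SELECT COUNT(*) FROM users").fetchone()[0]

bulk_insert(conn, "users", [{"id": 9, "name": "Ada"}], mode="Overwrite")
rows_after_overwrite = conn.execute("SELECT * FROM users").fetchall()
print(count_after_append, rows_after_overwrite)
```

After the Overwrite call the table holds only the new row and has gained the `name` column, which is the schema replacement the setting describes.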

Database scoped credential


The scoped credential is used to execute the queries in the bulk load operation. Bulk loading through a storage blob requires external database resources to be created, which in turn requires a "Database Scoped Credential". Refer to https://msdn.microsoft.com/en-us/library/mt270260.aspx for additional information.

Provide the scoped credential if one exists on the DB; otherwise, the Snap creates a temporary scoped credential and deletes it once the operation is complete.

Default value: None 

Encoding

The encoding standard of the input data to be loaded into the database. The available options are:

None - Select this option only when using the PolyBase Bulk Load Snap with SQL Server 2016.

UTF-8 - Select this option when the input data is encoded in UTF-8 and you use the Snap with an Azure database.

UTF-16 - Select this option when the input data is encoded in UTF-16 and you use the Snap with an Azure database.

Default value: None
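To show why the selected encoding has to match the data, here is a short Python sketch that encodes the same CSV row both ways. The mapping of the option names to Python codecs is hypothetical; in particular, little-endian UTF-16 without a byte order mark is an assumption, and the behavior of the None option is not modeled.

```python
# Hypothetical mapping from the Encoding options to Python codec names.
# "utf-16-le" (little-endian, no byte order mark) is an assumption.
CODECS = {"UTF-8": "utf-8", "UTF-16": "utf-16-le"}


def encode_staged_csv(csv_text, encoding_option):
    """Encode CSV text with the codec that corresponds to the chosen option."""
    return csv_text.encode(CODECS[encoding_option])


row = "1,Zo\u00eb\n"  # contains one non-ASCII character
utf8_bytes = encode_staged_csv(row, "UTF-8")
utf16_bytes = encode_staged_csv(row, "UTF-16")
print(len(utf8_bytes), len(utf16_bytes))
```

The two byte sequences differ in length and content, so a file written in one encoding and read as the other yields garbled or rejected rows.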

Snap execution

Select one of the three modes in which the Snap executes. Available options are:

  • Validate & Execute: Performs limited execution of the Snap, and generates a data preview during Pipeline validation. Subsequently, performs full execution of the Snap (unlimited records) during Pipeline runtime.
  • Execute only: Performs full execution of the Snap during Pipeline execution without generating preview data.
  • Disabled: Disables the Snap and all Snaps that are downstream from it.

Basic Use Case

In the example below, the Mapper Snap maps the input schema from the upstream Snap to the target metadata of the PolyBase Bulk Load Snap.

The PolyBase Bulk Load Snap loads the records into the table "dbo"."9181". On successful execution, the pipeline displays the success status of the loaded records.

The pipeline performs the below ETL transformations:

Extract: The JSON Generator Snap gets the records to be loaded into the PolyBase table

Transform: The Mapper Snap maps the input schema to the metadata of the PolyBase Bulk Load Snap

Load: The PolyBase Bulk Load Snap loads the records into the required table.

Typical Snap Configurations

The key configuration of the Snap lies in how you pass the statement to write the records. In SnapLogic, you can pass SQL statements in the following ways:

  • Without Expression: Directly passing the required statement in the PolyBase Bulk Load Snap.

  • With Expressions
    • Pipeline Parameter: Pipeline parameter set to pass the required values to the PolyBase Bulk Load Snap.

The Mapper Snap maps the input schema to the target fields on the PolyBase table.

The pipeline properties set to be passed into the Snap:

Advanced Use Case

The following describes a pipeline with broader business logic involving multiple ETL transformations. It shows how PolyBase functionality is typically used in an enterprise environment. The pipeline download link is below.

In this pipeline, the Oracle Select Snap extracts the data from a table in the Oracle DB, and the PolyBase Bulk Load Snap bulk loads it into the PolyBase table. The output preview displays the status of the execution.

  1. Extract: The Oracle Select Snap reads the records from the Oracle Database. 

  2. Load: The PolyBase Bulk Load Snap loads the records into the Azure SQL Database.

Downloads

Important steps to successfully reuse Pipelines

  1. Download and import the pipeline into the SnapLogic application.
  2. Configure Snap accounts as applicable.
  3. Provide pipeline parameters as applicable.

File | Modified
Polybase- bulk load Basic test case.slpBasic.slp | Aug 07, 2017 by Aparna Tayi
Polybase- bulk load test case.slp | Aug 07, 2017 by Aparna Tayi
PolyBase Bulk Load _Advance Use case.slp | Aug 07, 2017 by Aparna Tayi

Snap Pack History

Release | Snap Pack Version | Date | Type | Updates
August 2024 | main27765 | | Stable | Upgraded the org.json.json library from v20090211 to v20240303, which is fully backward compatible.
May 2024 | 437patches27180 | | Latest | Fixed the following issues with the Azure SQL - Bulk Load Snap:
  • The Snap displayed an error when the DateTime was of the LocalDateTime type.
  • The Snap lost milliseconds when the DateTime was in the String data type because the given DateTime format was parsed into a Date object and the timestamp object was created from that date.
May 2024 | main26341 | | Stable | The following Azure SQL Active Directory Accounts have been renamed because Microsoft has rebranded Azure Active Directory to Microsoft Entra ID.
February 2024 | 436patches25468 | | Latest | The Azure SQL Bulk Extract Snap now supports Azure SQL Active Directory and Azure SQL Active Directory Dynamic Accounts.
February 2024 | main25112 | | Stable | Updated and certified against the current SnapLogic Platform release.
November 2023 | main23721 | | Stable | Updated and certified against the current SnapLogic Platform release.
August 2023 | main22460 | | Stable | The Azure SQL Execute Snap now includes a new Query type field. When Auto is selected, the Snap determines the query type automatically.
May 2023 | main21015 | | Stable | Upgraded with the latest SnapLogic Platform release.
May 2023 | 432patches20967 | | Latest | Fixed an issue with the connection pool in the Azure SQL accounts, which was affecting the Snap Pack's performance. You should now experience improved performance when using these accounts.
March 2023 | 432patches20318 | | Latest | The Azure SQL - Bulk Extract Snap no longer fails with a java.lang.NumberFormatException error.
March 2023 | 432patches20219 | | Latest | Fixed an issue with the Azure SQL - Bulk Load Snap involving special characters in JDBC URL properties, such as passwords. Special characters are properly escaped now.
March 2023 | 432patches20049 | | Latest | Intermittent connectivity issues no longer occur when using some Snaps in the Azure SQL Snap Pack. These issues caused the following message to display: The connection is broken and recovery is not possible. The connection is marked by the client driver as unrecoverable. No attempt was made to restore the connection
February 2023 | main19844 | | Stable | Upgraded with the latest SnapLogic Platform release.
January 2023 | 431patches19493 | | Latest | The Azure SQL Active Directory and the Azure SQL Active Directory Dynamic accounts now include an Authentication Mode dropdown list, which allows you to choose the Active Directory authentication mode you would like to use. This enhancement supports Active Directory Service Principal authentication for the Snap Pack.
December 2022 | 431patches19410 | | Stable/Latest | Fixed the Azure SQL - Execute Snap using the Azure SQL Active Directory Account that failed with SQL operation failed errors in environments using federated authentication.
December 2022 | 431patches19263 | | Latest | The Azure Synapse SQL Insert Snap no longer includes the Preserve case-sensitivity checkbox because the database is case-insensitive. The database stores the data regardless of whether the columns in the target table and the input data are in mixed, lower, or upper case.
November 2022 | main18944 | | Stable | The Azure SQL Snap Pack uses the 11.2x driver by default. If you specify any specific driver, ensure that you provide a version higher than 9.1 that is compatible with Microsoft Authentication Library for Java, as this Snap Pack uses the MSAL4J. Otherwise, you may run into issues.
August 2022 | main17386 | | Stable | The Azure Active Directory Search Entries Snap includes a Display Properties field where you can specify the properties to display in the output for the user or group. For the Snap to correctly return the attributes in the output, you must specify the attribute name as described in User profile attributes in Azure Active Directory B2C. Learn more about Properties for a user and Properties for a group.
4.29 Patch | 429patches16545 | | Latest | Fixed an issue with the Azure SQL-Stored Procedure Snap where the Snap failed with an Invalid value type error when the stored procedure contained an NCHAR data type.
4.29 Patch | 429patches16460 | | Latest | Fixed an issue with Azure SQL Bulk Load Snap where the Snaplex exited due to insufficient memory when a large number of rows are loaded into the target table and the input data contained a null value for a non-nullable column.
4.29 | main15993 | | Stable | Enhanced the Azure SQL Account and Azure SQL Active Directory Account with the Disable connection pooling checkbox that allows you to manage session state sharing.
4.28 Patch | 428patches15164 | | Latest | Fixed an issue with the Azure SQL - Update Snap where the Snap failed with an Incorrect syntax error when a column in a table is of NVARCHAR, NCHAR, or NTEXT data type and this column is part of another NVARCHAR, NCHAR, or NTEXT data type column name and the update condition is specified as an expression.
4.28 Patch | 428patches15114 | | Latest | Fixed an issue with the Azure SQL - Bulk Load Snap where the decimal values lost precision when they were inserted into the database.
4.28 | main14627 | | Stable | Upgraded with the latest SnapLogic Platform release.
4.27 | main12833 | | Stable | Enhanced the Azure SQL - Execute Snap to invoke stored procedures.
4.26 | main11181 | | Stable | Upgraded with the latest SnapLogic Platform release.
4.25 | main9554 | | Stable | Upgraded with the latest SnapLogic Platform release.
4.24 | main8556 | | Stable | Enhances the Azure SQL - Stored Procedure to accept parameters from input documents by column keys. If the values are empty, the parameters are populated based on the column keys for easier mapping in the upstream Mapper Snap.
4.23 | main7430 | | Stable | Enhances the Azure SQL - Bulk Extract Snap by adding a new check box Enable UTF-8 encoding to support UTF-8 encoded characters. This check box allows the Snap to update the BCP command to read the special characters.
4.22 Patch | 422Patches6751 | | Latest | Enhances the Azure SQL - Bulk Extract Snap by adding a new check box Enable UTF-8 encoding to support UTF-8 encoded characters. Selected by default, this check box allows the Snap to update the BCP command to read these special characters.
4.22 | main6403 | | Stable | Upgraded with the latest SnapLogic Platform release.
4.21 Patch | 421patches6272 | | Latest | Fixes the issue where Snowflake SCD2 Snap generates two output documents despite no changes to Cause-historization fields with DATE, TIME and TIMESTAMP Snowflake data types, and with Ignore unchanged rows field selected.
4.21 Patch | 421patches6144 | | Latest | Fixes the following issues with DB Snaps:
  • The connection thread waits indefinitely causing the subsequent connection requests to become unresponsive.
  • Connection leaks occur during Pipeline execution.
4.21 Patch | 421patches5864 | | Latest | Adds support for UTF_8 characters with BCP (bulk copy program) command to the Azure SQL Bulk Extract Snap.
4.21 Patch | MULTIPLE8841 | | Latest | Fixes the connection issue in Database Snaps by detecting and closing open connections after the Snap execution ends.
4.21 | snapsmrc542 | | Stable | Upgraded with the latest SnapLogic Platform release.
4.20 | snapsmrc535 | | Stable | Upgraded with the latest SnapLogic Platform release.
4.19 Patch | db/azuresql8403 | | Latest | Fixes an issue with the Azure SQL - Update Snap wherein the Snap is unable to perform operations when:
  • An expression is used in the Update condition property.
  • Input data contain the character '?'.
4.19 | snaprsmrc528 | | Stable | Enhanced the error handling in PolyBase Bulk Load Snap when writing to a data warehouse. The Snap writes a new blob in the Azure container. This new blob highlights the first invalid row that caused the bulk load operation to fail.
4.18 | snapsmrc523 | | Stable | Upgraded with the latest SnapLogic Platform release.
4.17 | ALL7402 | | Latest | Pushed automatic rebuild of the latest version of each Snap Pack to SnapLogic UAT and Elastic servers.
4.17 | snapsmrc515 | | Latest |
  • Fixes an issue with the Azure SQL Execute Snap wherein the Snap would send the input document to the output view even if the Pass through field is not selected in the Snap configuration. With this fix, the Snap sends the input document to the output view, under the key original, only if you select the Pass through field.
  • Added the Snap Execution field to all Standard-mode Snaps. In some Snaps, this field replaces the existing Execute during preview check box.
4.16 Patch | db/azuresql7179 | | Latest | Fixes an issue with the Azure SQL Bulk Extract Snap wherein the Snap fails to process all the metadata information of the input table and schema.
4.16 | snapsmrc508 | | Stable | Upgraded with the latest SnapLogic Platform release.
4.15 Patch | db/azuresql6327 | | Latest | Replaced Max idle time and Idle connection test period properties with Max life time and Idle Timeout properties respectively, in the Account configuration. The new properties fix the connection release issues that were occurring due to default/restricted DB Account settings.
4.15 | snapsmrc500 | | Stable | Upgraded with the latest SnapLogic Platform release.
4.14 | snapsmrc490 | | Stable | Upgraded with the latest SnapLogic Platform release.
4.13 | snapsmrc486 | | Stable | Upgraded with the latest SnapLogic Platform release.
4.12 | snapsmrc480 | | Stable | Upgraded with the latest SnapLogic Platform release.
4.11 Patch | azuresql4631 | | Latest | Fixes an issue with the Azure Polybase Bulk Load Snap that failed with "Parse error" when there was no input.
4.11 Patch | db/azuresql4326 | | Latest |
  • Fixes an issue with the Azure SQL Polybase Bulk Load Snap, that allowed the Snap to load data into a table with identity columns for Azure SQL Data Warehouse instance.
  • Fixes an encoding issue when using a Windows plex, and adds an "Encoding" Snap property that allows users to choose the input data's encoding from UTF-8 and UTF-16.
4.11 | snapsmrc465 | | Stable | Upgraded with the latest SnapLogic Platform release.
4.10 | snapsmrc414 | | Stable |
  • Renamed the Azure SQL Bulk Load Snap to Polybase Bulk Load as it supports Azure SQL DW and SQL Server (starting with 2016).
  • The new Snap, Azure SQL Bulk Load, has been developed to carry out the bulk load function extensively for Azure SQL DB. (The old Azure Bulk Load has been renamed to Polybase Bulk Load, which works for on-premise SQL Server and Azure SQL Data Warehouse with PolyBase functionality.)
  • Azure SQL Bulk Load, Table List, Execute, Stored Procedure, and Update Snaps are released in this release.
  • Added Auto commit property to the Select and Execute Snaps at the Snap level to support overriding of the Auto commit property at the Account level.
  • Added the below accounts:
    • Azure SQL Active Directory Account
    • Azure SQL Active Directory Dynamic Account
4.9 Patch azuresql3078