Azure SQL - Bulk Load

Overview

You can use this Snap to perform a bulk load operation from the input view document stream into the target table by using the SQLServerBulkCopy API. The Snap sends records to the target table through a memory buffer instead of a temporary CSV file. You can tune performance and memory usage with the Batch size and Bulk copy timeout values.
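For context, the following minimal Java sketch shows how the underlying SQLServerBulkCopy API performs such a load; the connection strings and table names are hypothetical placeholders, and the Snap's Batch size and Bulk copy timeout fields correspond to the two option setters shown.

    import com.microsoft.sqlserver.jdbc.SQLServerBulkCopy;
    import com.microsoft.sqlserver.jdbc.SQLServerBulkCopyOptions;
    import java.sql.Connection;
    import java.sql.DriverManager;
    import java.sql.ResultSet;
    import java.sql.Statement;

    public class BulkLoadSketch {
        public static void main(String[] args) throws Exception {
            // Hypothetical connection strings; substitute real servers and credentials.
            String srcUrl = "jdbc:sqlserver://source;databaseName=src;user=u;password=p";
            String dstUrl = "jdbc:sqlserver://target;databaseName=dst;user=u;password=p";
            try (Connection src = DriverManager.getConnection(srcUrl);
                 Connection dst = DriverManager.getConnection(dstUrl);
                 Statement stmt = src.createStatement();
                 ResultSet rows = stmt.executeQuery("SELECT * FROM dbo.source_table");
                 SQLServerBulkCopy bulkCopy = new SQLServerBulkCopy(dst)) {
                SQLServerBulkCopyOptions options = new SQLServerBulkCopyOptions();
                options.setBatchSize(10000);    // rows per batch (the Snap's Batch size)
                options.setBulkCopyTimeout(60); // seconds per batch (the Snap's Bulk copy timeout)
                bulkCopy.setBulkCopyOptions(options);
                bulkCopy.setDestinationTableName("dbo.target_table");
                // Rows stream through a memory buffer; no temporary CSV file is written.
                bulkCopy.writeToServer(rows);
            }
        }
    }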

This Snap supports a dot (.) as the separator for milliseconds in date and time formats, which helps when handling date and time data types and functions in SQL Server (Transact-SQL). Learn more: https://learn.microsoft.com/en-us/sql/t-sql/functions/date-and-time-data-types-and-functions-transact-sql?view=sql-server-ver16#date-and-time-data-types
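For example, a timestamp such as 2023-05-01 13:45:30.123 separates seconds from milliseconds with a dot. A minimal Java sketch of parsing such a value (the pattern and sample value are illustrative assumptions):

    import java.time.LocalDateTime;
    import java.time.format.DateTimeFormatter;

    public class DotMillisExample {
        public static void main(String[] args) {
            // Dot-separated milliseconds, matching the SQL Server date/time style.
            DateTimeFormatter fmt = DateTimeFormatter.ofPattern("yyyy-MM-dd HH:mm:ss.SSS");
            LocalDateTime ts = LocalDateTime.parse("2023-05-01 13:45:30.123", fmt);
            System.out.println(ts); // prints 2023-05-01T13:45:30.123
        }
    }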


Snap Type

The Azure SQL Bulk Load Snap is a Write-type Snap that performs a bulk load operation.

Prerequisites

The Azure SQL database account requires the SQL Server JDBC driver. Versions 4.1 and older do not support the SQLServerBulkCopy API.

Support for Ultra Pipelines

Does not support Ultra Pipelines

Limitation

  • Microsoft does not support the DateTime data type when writing to Azure Data Warehouse. As a workaround, change the data type from DateTime to varchar when writing to Azure Data Warehouse. Note that if the table does not exist in the database, enabling the Create table if not present property automatically converts all DateTime fields to varchar fields. For more information on this known issue, see Known Limitations for the batch insert operation.

  • You cannot modify certain columns in Azure SQL because they might either be computed columns or the result of a UNION operator, such as "InventoryValue."

  • The Azure SQL Bulk Load Snap supports the money and smallmoney data types only within the following ranges:

Data Type     Range
money         -922,337,203,685,477.5808 to 922,337,203,685,477.5807
smallmoney    -214,748.3648 to 214,748.3647

When you insert a value beyond this range, the Snap does not fail but loads an incorrect value. For example, the table below shows the incorrect loaded value for each inserted value:

Data Type     Inserted value            Loaded value
smallmoney    -214758.3648              -214738.3649
money         922337203685487.5808      922337203685467.5808
smallmoney    214758.3647               214738.3648
money         922337203685487.5807      -922337203685467.5809

This issue is caused by the MSSQL-JDBC driver from Microsoft®, which converts smallmoney/money values to int/long values. SnapLogic® has reported a bug in the Microsoft® MSSQL-JDBC GitHub repository. You can track the issue here.
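Until the driver issue is resolved, one possible safeguard is to validate each value against the documented ranges before bulk loading. The following minimal Java sketch (the class and method names are illustrative; the ranges come from the table above) shows such a pre-load check:

    import java.math.BigDecimal;

    public class MoneyRangeCheck {
        static final BigDecimal MONEY_MIN = new BigDecimal("-922337203685477.5808");
        static final BigDecimal MONEY_MAX = new BigDecimal("922337203685477.5807");
        static final BigDecimal SMALLMONEY_MIN = new BigDecimal("-214748.3648");
        static final BigDecimal SMALLMONEY_MAX = new BigDecimal("214748.3647");

        // Returns true when the value fits a money column.
        static boolean fitsMoney(BigDecimal v) {
            return v.compareTo(MONEY_MIN) >= 0 && v.compareTo(MONEY_MAX) <= 0;
        }

        // Returns true when the value fits a smallmoney column.
        static boolean fitsSmallMoney(BigDecimal v) {
            return v.compareTo(SMALLMONEY_MIN) >= 0 && v.compareTo(SMALLMONEY_MAX) <= 0;
        }

        public static void main(String[] args) {
            System.out.println(fitsSmallMoney(new BigDecimal("-214758.3648"))); // false: out of range
            System.out.println(fitsSmallMoney(new BigDecimal("100.50")));       // true
        }
    }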

Snap Views

Type: Input
Format: Document
Number of Views: Min: 0, Max: 2
Example of Upstream Snap: Mapper
Description:

  • All input documents must contain map data as key-value pairs. All keys must match the column names in the target table (case-sensitive). Input documents must not contain any data other than the data to be bulk loaded.

  • All input documents must have the same keys. If the number of keys is smaller than the number of columns in the target table, the Snap fills the missing keys with null values. If the key names in the input documents differ from the column names, use a Mapper Snap to map the key names to the column names.

Type: Output
Format: Document
Number of Views: Min: 0, Max: 1
Example of Downstream Snap: JSON Formatter
Description:

  • The input document stream is converted into multiple batches, which are bulk loaded into the target table by using the SQLServerBulkCopy API. The Snap converts the input data values into Java class objects that SQLServerBulkCopy accepts, according to the corresponding SQL Server column data types.

  • Outputs the bulk-load result as a key-value pair, for example:

{"status" : "34687 records loaded"}

If no record is loaded because of errors, no output document is produced.

When accessing a column name that contains special characters supported by Azure SQL, such as $, #, or @, enclose the field name in square brackets (for example, a column named Order$Total becomes [Order$Total]).

Type: Error

Error handling is a generic way to handle errors without losing data or failing the Snap execution. You can handle the errors that the Snap might encounter while running the Pipeline by choosing one of the following options from the When errors occur list under the Views tab. The available options are:

  • Stop Pipeline Execution: Stops the current pipeline execution when the Snap encounters an error.

  • Discard Error Data and Continue: Ignores the error, discards that record, and continues with the rest of the records.

  • Route Error Data to Error View: Routes the error data to an error view without stopping the Snap execution.

Learn more about Error handling in Pipelines.

Snap Settings

  • Asterisk (*): Indicates a mandatory field.

  • Suggestion icon: Indicates a list that is dynamically populated based on the configuration.

  • Expression icon: Indicates whether the value is an expression (if enabled) or a static value (if disabled). Learn more about Using Expressions in SnapLogic.

  • Add icon: Indicates that you can add fields in the field set.

  • Remove icon: Indicates that you can remove fields from the field set.

Field Name

Field Type

Description


Label*

String

Specify the name for the Snap. You can modify this to be more specific, especially if you have more than one of the same Snap in your pipeline.

Default value: Azure SQL - Bulk Load
Example: Bulk_Load

Schema Name

String/Expression/Suggestion

Specify the database schema name. If it is not defined, the Table Name suggestion retrieves the table names of all schemas. This field is suggestible and retrieves the associated database schema values.

Default value: None
Example: SYS

Table Name*

String/Expression/Suggestion

Specify the target table to load the incoming data into.

Default value: None
Example: users

Create table if not present

Checkbox

Select this checkbox to create the target table if it does not exist; otherwise, the Snap displays a "table not found" error.

In the absence of a second input view (the schema/metadata document), the Snap creates a table based on the data types of the columns generated from the first row of the input document (first input view).

Default value: Deselected

Batch size

Integer/Expression

Sets the number of rows in each batch.

Default value: 10000
Example: 1000

Bulk copy timeout (sec)

Integer/Expression

Sets the number of seconds for each batch operation to complete before it times out.

Default value: 60

Advanced properties

Use this field set to configure advanced properties.

Properties

Dropdown list

Choose an option for SQLServerBulkCopy. The available options are:

  • Check constraints - Sets whether constraints are checked while data is being inserted.

  • Fire triggers - Sets whether the server fires insert triggers for rows being inserted into the database.

  • Keep identity - Sets whether to preserve source identity values.

  • Keep nulls - Sets whether to preserve null values in the destination table regardless of the settings for default values, or to replace them with default values (where applicable).

  • Table lock - Sets whether SQLServerBulkCopy holds a bulk update lock for the duration of the bulk copy operation.

  • Use internal transaction - Sets whether each batch of the bulk copy operation occurs within a transaction.

Learn more: Microsoft document

Values

String/Expression

The following are the default values for the Properties:

  • Check constraints = false

  • Keep identity = false

  • Keep nulls = false

  • Table lock = false

  • Use internal transaction = false
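For reference, these Advanced properties correspond to setters on the driver's SQLServerBulkCopyOptions class. The following minimal Java sketch applies the defaults listed above (the Fire triggers value is an assumption, since its default is not listed):

    import com.microsoft.sqlserver.jdbc.SQLServerBulkCopyOptions;

    public class AdvancedOptionsSketch {
        public static void main(String[] args) {
            SQLServerBulkCopyOptions options = new SQLServerBulkCopyOptions();
            options.setCheckConstraints(false);       // Check constraints
            options.setFireTriggers(false);           // Fire triggers (default not listed above; assumed false)
            options.setKeepIdentity(false);           // Keep identity
            options.setKeepNulls(false);              // Keep nulls
            options.setTableLock(false);              // Table lock
            options.setUseInternalTransaction(false); // Use internal transaction
        }
    }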


Troubleshooting

If you encounter a com.microsoft.sqlserver.jdbc.SQLServerException error, refer to Errors 4000 - 4999 for more details on the error code.
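When reproducing the problem outside SnapLogic, the exception's error code identifies the server error; a minimal Java sketch of surfacing it (the connection string is a hypothetical placeholder):

    import java.sql.Connection;
    import java.sql.DriverManager;
    import java.sql.SQLException;

    public class ErrorCodeSketch {
        public static void main(String[] args) {
            // Hypothetical connection string; substitute real server details.
            String url = "jdbc:sqlserver://target;databaseName=dst;user=u;password=p";
            try (Connection conn = DriverManager.getConnection(url)) {
                // ... bulk load work that may fail ...
            } catch (SQLException e) {
                // getErrorCode() returns the SQL Server error number to look up
                // in the Errors 4000 - 4999 reference.
                System.err.println("SQL Server error " + e.getErrorCode() + ": " + e.getMessage());
            }
        }
    }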

Examples

Basic Use Case

In this Pipeline, the Azure SQL Bulk Load Snap loads the data from the input stream. The data is bulk loaded into the table "dbo"."datatypetest".

The successful execution of the Pipeline displays the following output preview with the status of the records loaded:

 

Typical Snap Configurations


The key configurations for the Snap are:

  • Without Expression: Directly passing the values via the CSV Generator and Mapper Snaps.

In the below Pipeline, the values are passed from the upstream Snaps for the Azure SQL Bulk Load Snap to update the table "dbo"."@prasanna1" on Azure SQL.

 

The Azure SQL Bulk Load Snap loads the data into the table, and the Azure SQL Execute Snap reads the table contents:

 

  • With Expressions

    • Pipeline Parameter: Pipeline parameters are set to pass the required table name to the Azure SQL Bulk Load Snap.

In the below pipeline:

 

  1. The JSON Generator Snap passes the values to be added to the table, intcol.

  2. The Mapper Snap maps the values to intcol, and "dbo"."inttable" to the target table, tablename.

  3. The Pipeline parameters are set with values for Tablename, Batchsize, and Timeout.

  4. The Bulk Load Snap loads the records into _tablename, with _batchsize (as 1) and _timeout (as 60), using the Pipeline parameter values.

  5. The Execute Snap reads the data from the table, inttable. The output preview displays the three records added via the JSON Generator Snap.

 

 

Advanced Use Case

The following describes a Pipeline with broader business logic involving multiple ETL transformations, showing how the Azure SQL Bulk Load functionality is typically used in an enterprise environment. The Pipeline download is available below.

In the below Pipeline, the records from a table on SQL Server are loaded into a table on Azure SQL. The Azure SQL Execute Snap then reads the loaded records from the Azure SQL table.

  1. Extract: The SQL Server Select Snap reads the records from a table on SQL Server.

  2. Transform: The Mapper Snap maps the metadata from the input schema (SQL Server) to the output schema (Azure SQL).

  3. Load: The Azure SQL Bulk Load Snap loads the records into the Azure SQL table.

  4. Read: The Azure SQL Execute Snap reads the loaded records from the Azure SQL table.

In a similar enterprise scenario, the records from the Oracle server are loaded into Azure SQL. The loaded records are transformed to JSON and written to a file. The Azure SQL Execute Snap reads the records from the table on Azure SQL. The Pipeline download is available below.

   

  1. Extract: The Oracle Select Snap reads the records from a table on the Oracle server.

  2. Transform: The JSON Formatter Snap transforms the output records into JSON format and writes them to a file using the File Writer Snap.

  3. Load: The Azure SQL Bulk Load Snap loads the records into the Azure SQL table.

  4. Read: The Azure SQL Execute Snap reads the loaded records from the Azure SQL table.

 

Downloads
