BigQuery Bulk Load (Streaming)

On this Page

Snap Type:Write
Description:

This Snap executes a bulk load of the incoming documents into the Google BigQuery database directly without any intermediary such as Google Cloud Storage. The data is loaded in batches of configurable size. Once a batch is successfully loaded into the target table, the next batch is loaded. The batch load is also subject to a timeout. If the timeout is reached before all the documents in the batch are loaded, the batch is flushed and the next batch's bulk load is started. If the bulk load fails, then the Snap will try to load the batch again after a time delay. Retry occurs for a set number of times. The batch timeout limit, size of a batch, the time delay between retries, and the number of retries are all configurable.

ETL Transformations & Data Flow

Data from the incoming documents is streamed in batches to the destination table. The user can configure Project ID, Dataset ID, and Table ID of the destination. In addition, if the table into which the data is to be loaded does not exist, the Snap can create the table using the configured Table ID.

Input & Output

  • InputAny Snap that can pass a document output view, such as Structure or JSON Generator. Pipeline parameters can also be passed for project ID, dataset ID, and table ID, and so on.
  • Output: The output is in document view format. The data from the incoming document is loaded to the destination table is the output from this Snap. It gives the load statistics after the operation is completed.

Modes

Snaps in Google BigQuery Snap Pack

  • Write datetime values to the database tables, always in UTC format.

  • Convert any non-UTC values in the incoming data to UTC before writing them.

  • Consider datetime values without the time zone mentioned, as in UTC.

So, ensure that you include the time zone in all the datetime values that you load into Google BigQuery tables using this Snap.

For example: "2020-08-29T18:38:07.370 America/Los_Angeles", “2020-09-11T10:05:14.000-07:00", “2020-09-11T17:05:14.000Z”

Prerequisites:

Write access to the Google BigQuery Account is required.

Limitations and Known Issues

The batch size can be a maximum of 10,000.

Known Issue

Copying data by creating a table with the same name in Google BigQuery immediately after deleting it, may not insert the rows as expected. This behavior is due to the way the tables are cached and the internal table ID is propagated throughout the system. 

Workaround

We recommend you avoid rewriting in Google BigQuery and suggest the following workarounds. You can choose to use them individually or in unison, to suit your requirement.

  • Truncate the existing table instead of deleting it.

  • Add some randomly generated prefix to the table name each time you create the table (new or unique table name every time).

Behavior ChangeIn the 4.27 (427patches13615) release and later, pipelines that truncated (or deleted and re-created) a table and then used the BigQuery Write or BigQuery Bulk Load (Streaming) Snaps were allowed to retry the load operation. This lead to potential data loss because Google BigQuery uses an eventually consistent backend where one server might respond that it’s okay to write, but other servers still truncate the table.

In the 4.32 (432patches20298) release and later, we disabled retries on truncated tables in these two Snaps to prevent a potential loss of data. We recommend that you use the BigQuery Bulk Load (Cloud Storage) Snap instead.
Configurations:

Account & Access

This Snap uses account references created on the Accounts page of SnapLogic Manager to handle access to this endpoint. See Google BigQuery Account for information on setting up this type of account.

Views

InputThis Snap has exactly one document input view.
OutputThis Snap has at most one document output view.
ErrorThis Snap has at most one document error view and produces zero or more documents in the view.
Troubleshooting:[None]

Settings

Label


Specify a name for the Snap. You can modify this to be more specific, especially if you have more than one of the same Snap in your pipeline.

Project ID

Required. Project ID of the project billed for the query. This is a suggestible field and can be populated based on the Account settings.

Example: case1234

Default value: [None]

Dataset ID

Required. Dataset ID of the destination table. This is a suggestible field and all the datasets in the specified project will be populated.

Example: babynames

Default value: [None]

Table ID

Required. Table ID of the destination table. This is a suggestible field and all the tables in the datasets will be listed.

Example: customerdata1

Default value: [None]

Create table if not present

Specifies that the table should be created if not already present in the database.

Default value: Not selected

Batch size

The number of records batched per request. If the input has 10,000 records and the batch size is set to 100, the total number of requests would be 100.

Default value: 1000

Batch timeout (milliseconds)

Time in milliseconds to elapse following which the batch if not empty will be processed even though it might be lesser than the given batch size.

Default value: 2000

Batch timeout value must be set with care. When this limit is reached, the batch will be flushed irrespective of whether all the records in the batch were loaded.

Batch retry count

The number of times the server should try to load a failed batch.

Default value: 0

Batch retry delay (milliseconds)

The time delay between each retry.

Default value: 500

Snap execution

Select one of the three modes in which the Snap executes. Available options are:

  • Validate & Execute: Performs limited execution of the Snap, and generates a data preview during Pipeline validation. Subsequently, performs full execution of the Snap (unlimited records) during Pipeline runtime.
  • Execute only: Performs full execution of the Snap during Pipeline execution without generating preview data.
  • Disabled: Disables the Snap and all Snaps that are downstream from it.

Writing numeric values into Google BigQuery tables

Google BigQuery tables support columns with a NUMERIC data type to allow storing big decimal numbers (up to 38 digits with nine decimal places). But Snaps in Google BigQuery Snap Pack that load data into tables cannot create numeric columns. When the Create table if not present check box is selected, the Snaps create the required table schema, but map big decimals to a FLOAT64 column. So, to store the data into numeric columns using these Snaps, we recommend the following actions:

  • Create the required schema, beforehand, with numeric columns in Google BigQuery.
  • Pass the number as a string.

The Google API converts this string into a number with full precision and saves it in the numeric column.

Example

Value Passed Through SnapValue Stored in BigQueryRemarks
"12345678901234567890123456789.123456789"12345678901234567890123456789.123456789As per this issue logged in Google Issue Tracker, if you send the values as strings, the values are never converted to floating-point form, so this works as expected.
12345678901234567890123456789.123456789123456789012345678000000000000Big decimal values sent as non-string values lose precision.

Examples

Basic Use Case


The following pipeline describes how the Snap functions as a standalone Snap in a pipeline:

In this example, the BigQuery Bulk Load (Streaming) Snap is used to load records from the table snapteam1.postgretimetypes into the target table Atable1 in the babynames dataset within the case16370 project. The input records are provided through the PostgreSQL Select Snap. The input records from the upstream PostgreSQL Select Snap are shown below:

The BigQuery Bulk Load (Streaming) Snap's configuration is as shown below:


The following is a preview of the output from the Snap:


The exported pipeline is available in the Downloads section below.

Typical Snap Configurations

The key configuration of the Snap lies in how the values are passed. There are two ways of passing values:

  • Without expressions: The values are passed in the Snap directly.

  • With expressions
    • Using Pipeline parameters: Values are passed in the Snap as pipeline parameters. Select the parameters to be applied by enabling the corresponding checkbox under the Capture column.


 

Advanced Use Case

The following describes a pipeline with a broader business logic involving multiple ETL transformations. It shows how, in an enterprise environment, Bulk Load functionality can typically be used. 

This pipeline moves records from a SQL Server instance to the Google BigQuery instance, which is on-premise data migration to the cloud environment.

  1. Extract: Records are extracted from the SQL Server instance by the SQL Server - Select Snap.
  2. Load: The extracted records from the SQL Server Snap are loaded by the BigQuery Bulk Load (Streaming) Snap into the destination table.
  3. Extract: The BigQuery Execute Snap extracts the records inserted into the destination table by the BigQuery Bulk Load (Streaming) Snap.

The exported pipeline is available in the Downloads section below.

Downloads

Important steps to successfully reuse Pipelines

  1. Download and import the pipeline into the SnapLogic application.
  2. Configure Snap accounts as applicable.
  3. Provide pipeline parameters as applicable.

Snap Pack History

 Click to view/expand
Release Snap Pack VersionDateType  Updates

August 2024

main27765

 

Stable

Updated and certified against the current Snaplogic Platform release.

May 2024main26341 StableUpdated and certified against the current SnapLogic Platform release.
February 2024main25112 StableUpdated and certified against the current SnapLogic Platform release.
November 2023main23721 StableUpdated and certified against the current SnapLogic Platform release.

August 2023

main22460

 


Stable

Updated and certified against the current SnapLogic Platform release.

May 2023433patches22057  Latest

Introduced the Google BigQuery Upsert (Streaming) Snap, which enables you to perform bulk update/insert operations into a BigQuery table from existing tables or any input data stream.

May 2023433patches21955 LatestFixed an issue with the GBQ-Google Service Account that caused an input stream to remain open.
May 2023main21015 StableUpdated and certified to be compatible with the August SnapLogic Platform release.
February 2023432patches20962 LatestFixed an intermittent null pointer exception that occurred in the BigQuery Write Snap.
February 2023

432patches20298

 Latest
February 2023432patches19840 Latest

Fixed an issue with the Google BigQuery Bulk Load (Streaming) Snap that caused the Table not found message to display even when Create table if not present was selected.

February 2023main19844 Stable
November 2022431patches19301 LatestThe Google BigQuery Bulk Load (Streaming) Snap works as expected, with no active Timer threads remaining when the Pipeline execution fails. 
November 2022main18944 StableUpgraded with the latest SnapLogic Platform release.
August 2022main17386 Stable
4.29main15993 Stable

Upgraded with the latest SnapLogic Platform release.

4.28 Patch428patches15459 Latest
  • Fixed an issue with Google BigQuery Execute Snap, where the Snap displayed 404 Job not found error when calling a procedure.
  • Fixed an issue with the Google BigQuery Bulk Load (Cloud Storage) Snap where the Snap failed, because the access token expired when it had to wait longer to execute. With this fix, the Snap is reloaded to get refreshed access token.

4.28 Patch428patches14743 Latest
  • Fixed an issue with the Google BigQuery Execute Snap, where the Snap displayed an error when the input data contained a table having the record type column and its value was null.
  • Fixed an issue with the Google BigQuery Write Snap, when the input data contained complex data type columns (such as nested fields) and Create table if not present checkbox was selected.

4.28main14627 StableUpgraded with the latest SnapLogic Platform release.
4.27 Patch427patches13752 Latest

Upgraded Google BigQuery driver to 1.119.0 version to support time partition intervals by MONTH and YEAR.

4.27 Patch427patches13615 LatestFixed the table truncate 404 error with the Google BigQuery Load (Streaming) Snap by supporting the retry functionality. The Snap now waits in case of an error and retries before loading the data.
4.27 Patch427patches12691 Latest

Fixed an issue with the Google BigQuery Bulk Load (Cloud Storage) Snap, where the Snap failed with an exception for big query tables. The CreateDisposition is now set conditionally on the basis of the setting in the Create table if not present checkbox.

4.27

main12833

Stable

Enhanced the Google BigQuery Bulk Load (Cloud Storage) Snap with the following batching and retry properties to process input records:

  • Batching: Processes the input records in batches.

  • Batch Size: The number of records batched per request.

  • Batch Timeout (milliseconds): Time in milliseconds to elapse following which the batch, if not empty, will be processed even though it might be lesser than the given batch size.

4.26main11181 StableUpgraded with the latest SnapLogic Platform release.
4.25main9554
 
StableUpgraded with the latest SnapLogic Platform release.
4.24main8556
StableUpgraded with the latest SnapLogic Platform release.
4.23main7430
 
Stable
4.22main6403
 
StableUpgraded with the latest SnapLogic Platform release.
4.21snapsmrc542

 

StableUpgraded with the latest SnapLogic Platform release.
4.20 Patch google/bigquery8773 Latest

Fixed the NPE issue with stored procedures and DROP TABLE queries in the Google BigQuery Execute Snap.

4.20snapsmrc535
 
StableUpgraded with the latest SnapLogic Platform release.
4.19snaprsmrc528
 
StableUpgraded with the latest SnapLogic Platform release.
4.18snapsmrc523
 
StableUpgraded with the latest SnapLogic Platform release.
4.17ALL7402
 
Latest

Pushed automatic rebuild of the latest version of each Snap Pack to SnapLogic UAT and Elastic servers.

4.17snapsmrc515
 
Stable

Added the Snap Execution field to all Standard-mode Snaps. In some Snaps, this field replaces the existing Execute during preview check box.

4.16snapsmrc508
 
StableUpgraded with the latest SnapLogic Platform release.
4.15snapsmrc500
 
StableUpgraded with the latest SnapLogic Platform release.
4.14snapsmrc490
 
StableUpgraded with the latest SnapLogic Platform release.
4.13

snapsmrc486

 
StableUpgraded with the latest SnapLogic Platform release.
4.12

snapsmrc480

 
Stable

Added a new property Schema auto detect in the Google BigQuery Bulk Load (Cloud Storage) Snap to support CSV and JSON files where one or more columns in the source file may not contain any values. 

4.11snapsmrc465
 
Stable
  • Added new Snap: Google BigQuery Bulk Load (Cloud Storage)
  • Added new Snap: Google BigQuery Bulk Load (Streaming).
  • Updated Google Big Query Write Snap with a new Create table if not present property.
4.10 Patch google/bigquery4046 Latest

Addressed an issue when authenticating with Dynamic OAuth accounts.

4.10

snapsmrc414

 
StableUpgraded with the latest SnapLogic Platform release.
4.9snapsmrc405
 
StableUpgraded with the latest SnapLogic Platform release.
4.8 Patchbigquery2952 Latest

Supports refreshing OAuth access tokens during long-running pipeline executions. Fixed an issue with writing small batch sizes and when querying empty dataset tables.

4.8.0 Patchbigquery2813 Latest

Reload OAuth account from Platform when the access token expires during pipeline execution.

4.8

snapsmrc398

 
StableUpgraded with the latest SnapLogic Platform release.
4.7

snapsmrc382

 
StableUpgraded with the latest SnapLogic Platform release.
4.6snapsmrc362
 
StableUpgraded with the latest SnapLogic Platform release.
4.5.1

snapsmrc344

 
StableUpgraded with the latest SnapLogic Platform release.
4.4.1NA StableUpgraded with the latest SnapLogic Platform release.
4.4NA StableUpgraded with the latest SnapLogic Platform release.
4.3.2NA Stable
  • Resolved the following issues with the Google BigQuery Execute Snap:
    • throwing binary data in stacktrace and two error messages.
    • improve error handling for suggestions
    • improve error handling on bad queries
    • suggestion bubble missing for Destination table ID
  • Resolved an issue with Auto refresh token not working in Google BigQuery account.