BigQuery Upsert (Streaming)

 


Overview

This Snap enables you to perform bulk update or insert (upsert) operations into a BigQuery table from existing tables or any input data stream.

The upsert operation updates existing rows when the specified key value exists in the target table and inserts a new row when it does not.
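
Conceptually, the result of the upsert is similar to a BigQuery MERGE from a staging table into the target table: matched rows are updated and unmatched rows are inserted. The following is a minimal sketch of that equivalent logic outside SnapLogic, using the google-cloud-bigquery Python client; the project, dataset, table, and column names are hypothetical placeholders.

```python
# Minimal sketch of equivalent upsert logic outside SnapLogic, using the
# google-cloud-bigquery client. Project, dataset, table, and column names
# are hypothetical placeholders, not values the Snap requires.
from google.cloud import bigquery

client = bigquery.Client(project="test-project-12345")

merge_sql = """
MERGE `test-project-12345.dataset_12345.target_table` AS target
USING `test-project-12345.dataset_12345.staging_table` AS source
ON target.id = source.id
WHEN MATCHED THEN
  UPDATE SET name = source.name, total = source.total
WHEN NOT MATCHED THEN
  INSERT (id, name, total) VALUES (source.id, source.name, source.total)
"""

client.query(merge_sql).result()  # run the MERGE and wait for the job to finish
```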

Overview of settings with example values

Snap Type

This Snap is a Write-type Snap that performs a bulk upsert operation.

Prerequisites

Write access for the Google BigQuery Account is required.

Support for Ultra Pipelines

Does not work in Ultra Pipelines.

Limitations and Known Issues

None.

Snap Views

Input

  • Format: Document

  • Number of Views: Min: 1, Max: 1

  • Examples of Upstream Snaps: CSV Parser, JSON Parser, JSON Generator

  • Description: This Snap has exactly one document input view. Input can come from any Snap that can pass a document to the output view, such as Structure or JSON Generator. Pipeline parameters can also be passed for the project ID, dataset ID, table ID, and so on.

Output

  • Format: Document

  • Number of Views: Min: 0, Max: 1

  • Examples of Downstream Snaps: Mapper, Google BigQuery Execute

  • Description: The output is a document containing the data from the incoming document that was loaded into the destination table, along with the load statistics after the operation completes. The output view includes details about the bulk load into the temporary table, which helps you understand the flow and handle errors. It also lists the number of rows that were updated or inserted in the target table.

Error

Error handling is a generic way to handle errors without losing data or failing the Snap execution. You can handle the errors that the Snap might encounter when running the pipeline by choosing one of the following options from the When errors occur list under the Views tab:

  • Stop Pipeline Execution: Stops the current pipeline execution if the Snap encounters an error.

  • Discard Error Data and Continue: Ignores the error, discards that record, and continues with the remaining records.

  • Route Error Data to Error View: Routes the error data to an error view without stopping the Snap execution.

Learn more about Handling Errors with an Error Pipeline.

Snap Settings

  • Asterisk ( * ): Indicates a mandatory field.

  • Suggestion icon (): Indicates a list that is dynamically populated based on the configuration.

  • Expression icon ( ): Indicates the value is an expression (if enabled) or a static value (if disabled). Learn more about Using Expressions in SnapLogic.

  • Add icon ( ): Indicates that you can add fields in the field set.

  • Remove icon ( ): Indicates that you can remove fields from the field set.

  • Upload icon ( ): Indicates that you can upload files.

Field Name

Field Type

Description


Label

Default Value: BigQuery Bulk Upsert (Streaming)
Example: GBQ Load Employee Tables

 

String

Specify the name for the Snap. You can modify this to be more specific, especially if you have more than one of the same Snap in your pipeline.

 

Project ID

Default Value: N/A
Example: test-project-12345

 

 

String/Expression/Suggestion

Specify the project ID in which the dataset resides.

Dataset ID

 

Default Value: N/A

Example: dataset-12345

 

String/Expression/Suggestion

Specify the dataset ID of the destination.

Table ID

 

Default Value: N/A
Example: table-12345

String/Expression/Suggestion

Specify the table ID of the destination table.

Batch size

 

Default value: 1000

String

Specify the number of records to batch per request. For example, if the input has 10,000 records and the Batch size is set to 100, the Snap sends a total of 100 requests.

 

Batch timeout (milliseconds)

Default value: 2000

String

Specify the time in milliseconds after which the batch is processed, even if it contains fewer records than the specified Batch size.

Set the Batch timeout with care: when the timeout is reached, the batch is flushed whether or not it has filled to the specified Batch size (see the sketch at the end of this Snap Settings section).

Batch retry count

Default value: 0

String

Specify the number of times a failed batch load is retried.

 

Batch retry delay (milliseconds)

Default value: 500

String

Specify the time delay, in milliseconds, between retries.

 

Snap Execution

 

Default Value: Validate & Execute
Example: Execute only

 

Dropdown list

Select one of the three modes in which the Snap executes. Available options are:

  • Validate & Execute: Performs limited execution of the Snap, and generates a data preview during Pipeline validation. Subsequently, performs full execution of the Snap (unlimited records) during Pipeline runtime.

  • Execute only: Performs full execution of the Snap during Pipeline execution without generating preview data.

  • Disabled: Disables the Snap and all Snaps that are downstream from it.
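
The batch-related settings above interact as follows: incoming records are buffered until either the Batch size is reached or the Batch timeout elapses, and a failed flush can be retried up to the Batch retry count, waiting the Batch retry delay between attempts. The sketch below illustrates that behavior in Python; it is not the Snap's implementation, and the send_batch callable is a hypothetical stand-in for a single streaming insert request.

```python
# Illustrative sketch of the batching semantics described above; not the
# Snap's internal code. `send_batch` is a hypothetical stand-in for one
# streaming insert request.
import time

def stream_in_batches(records, send_batch,
                      batch_size=1000, batch_timeout_ms=2000,
                      batch_retry_count=0, batch_retry_delay_ms=500):
    """Buffer records and flush them when the batch fills or times out."""
    buffer = []
    deadline = time.monotonic() + batch_timeout_ms / 1000.0

    def flush():
        nonlocal buffer, deadline
        if buffer:
            attempts = 0
            while True:
                try:
                    send_batch(buffer)   # one request for the buffered records
                    break
                except Exception:
                    if attempts >= batch_retry_count:
                        raise            # conceptually, routed to the error view
                    attempts += 1
                    time.sleep(batch_retry_delay_ms / 1000.0)
            buffer = []
        deadline = time.monotonic() + batch_timeout_ms / 1000.0

    for record in records:
        buffer.append(record)
        if len(buffer) >= batch_size or time.monotonic() >= deadline:
            flush()
    flush()  # send any remaining partial batch at the end of the input
```

For example, with the default Batch size of 1000 and Batch timeout of 2000 milliseconds, a batch is sent as soon as 1,000 records accumulate or after two seconds, whichever comes first.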

Troubleshooting

Error: Key column name is required.
Reason: No key columns are specified for checking for existing entries.
Resolution: Enter one or more key column names.

Error: Key column name is not present in target table.
Reason: Incorrect key columns are specified for checking for existing entries.
Resolution: Select one or more key column names from the suggestion list.

Error: All columns in target table are key columns.
Reason: The merge fails when all columns in the target table are key columns.
Resolution: Select one or more (but not all) key column names from the suggestion list.

Examples

Prerequisite: Write access for the Google BigQuery Account is required.

Upsert customer data from Salesforce to a Google BigQuery table

This example demonstrates how to update or insert (upsert) records in a Google BigQuery table.

Pipeline showing the Snaps in this example

First, we configure the Salesforce Read Snap with the required details to read customer account data from Salesforce.

In this example, we selected Output Fields for Total, Id, and Name.

Upon validation, the Snap prepares the output to pass to the BigQuery Bulk Upsert Snap.

Next, we configure the BigQuery Bulk Upsert Snap to use unique identifiers to update the existing records.

To upsert data based on the Id and Name key columns, we enter Id and Name in the Key column fields.

 

Upon execution, this Snap updates or inserts new records into the Google BigQuery table.

The output shows that 5 records were updated successfully.


In this example, we updated the Total for each record (based on the unique identifiers Id and Name selected under Key columns).

The data is updated in the Google BigQuery table, as shown in the BigQuery console.
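
Because both Id and Name are key columns in this example, a target row counts as existing only when both values match the incoming record; only then is its Total updated, and otherwise a new row is inserted. A hypothetical MERGE-style equivalent of that composite-key matching is sketched below (the dataset and table paths are made up).

```python
# Hypothetical MERGE-style equivalent of the composite-key matching in this
# example: both key columns must match for an update; otherwise a new row
# is inserted. Dataset and table paths are made up.
merge_sql = """
MERGE `test-project-12345.sales.accounts` AS target
USING `test-project-12345.sales.accounts_staging` AS source
ON  target.Id = source.Id
AND target.Name = source.Name
WHEN MATCHED THEN
  UPDATE SET Total = source.Total
WHEN NOT MATCHED THEN
  INSERT (Id, Name, Total) VALUES (source.Id, source.Name, source.Total)
"""
```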

 

Download this pipeline. 

Downloads

  1. Download and import the Pipeline into SnapLogic.

  2. Configure Snap accounts, as applicable.

  3. Provide Pipeline parameters, as applicable.

 

GBQ Upsert example.slp

