WIP: PostgreSQL - Vector Search

Overview

You can use this Snap to perform advanced vector-based queries using the SELECT statement.

postgresql-vector-search-overview.png

Snap Type

The PostgreSQL - Vector Search Snap is a Read-type Snap.

Prerequisites

A valid account with the required permissions.

Support for Ultra Pipelines

Works in Ultra Pipelines

Limitations and Known Issues

None.

Snap Views

Input

  • Format: Document

  • Number of Views: Min: 1, Max: 1

  • Examples of Upstream and Downstream Snaps: Mapper, Copy

  • Description: Requires an input vector with the same dimension as the selected vector column. The input vector must be an array of float or int values.

Output

  • Format: Document

  • Number of Views: Min: 1, Max: 1

  • Examples of Upstream and Downstream Snaps: Mapper, Copy

  • Description: For each input document, all results are grouped into a single output document.

Error

Error handling is a generic way to handle errors without losing data or failing the Snap execution. You can handle the errors that the Snap might encounter when running the pipeline by choosing one of the following options from the When errors occur list under the Views tab:

  • Stop Pipeline Execution: Stops the current pipeline execution if the Snap encounters an error.

  • Discard Error Data and Continue: Ignores the error, discards that record, and continues with the remaining records.

  • Route Error Data to Error View: Routes the error data to an error view without stopping the Snap execution.

Learn more about Error handling in Pipelines.
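
The input view requirements listed under Snap Views (an array of float or int values whose length equals the dimension of the selected vector column) can be checked upstream before a document reaches the Snap. The following is a minimal Python sketch of such a check, not the Snap's own implementation; the function name `validate_query_vector` and the `expected_dim` parameter are hypothetical.

```python
from numbers import Real
from typing import List, Sequence


def validate_query_vector(vector: Sequence[object], expected_dim: int) -> List[float]:
    """Return the vector as a list of floats, or raise ValueError if unusable."""
    if len(vector) != expected_dim:
        raise ValueError(
            f"expected {expected_dim} dimensions, got {len(vector)}"
        )
    for value in vector:
        # bool is a subclass of int in Python, so reject it explicitly.
        if isinstance(value, bool) or not isinstance(value, Real):
            raise ValueError(f"non-numeric element: {value!r}")
    return [float(value) for value in vector]
```

A document that fails a check like this could be routed to an error path upstream instead of reaching the vector search.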

Snap Settings

  • Asterisk ( * ): Indicates a mandatory field.

  • Suggestion icon: Indicates a list that is dynamically populated based on the configuration.

  • Expression icon: Indicates the value is an expression (if enabled) or a static value (if disabled). Learn more about Using Expressions in SnapLogic.

  • Add icon: Indicates that you can add fields in the fieldset.

  • Remove icon: Indicates that you can remove fields from the fieldset.

  • Upload icon: Indicates that you can upload files.

Field Name

Field Type

Description

Label*

 

Default Value: PostgreSQL Vector Search
Example: PostgreSQL VS

String

Specify a name for the Snap. You can modify this to be more specific, especially if you have more than one of the same Snap in your pipeline.

 

Schema name

 

Default Value: N/A
Example: VECTOR_DEMO

String/Expression/Suggestion

Specify the name of the schema that contains the table to search.

Table Name*

 

Default Value: N/A 
Example: VECTOR_DEMO.BOOKS

String/Expression/Suggestion

Specify the name of the table that contains the vectors to search.

Vector Column*

 

Default Value: N/A
Example: INT_VEC

String/Expression/Suggestion

Specify the vector column name to search.

Where Clause

 

Default Value: N/A
Example: ID > '001i0000007FVjpAAG'

String/Expression/Suggestion

Specify the where clause to use in the vector search query statement.

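Because of a limitation in standard SQL, you cannot reference the _SL_DISTANCE column in the WHERE clause: it is a computed output column, and output columns are not visible to the WHERE clause of the same query.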
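
If you need to filter on the distance itself, one common SQL pattern is to compute the distance in an inner query and filter in an outer query, where the alias is visible. The sketch below builds such a statement in Python for a hand-written query run outside this Snap. It assumes the pgvector extension and its `<=>` cosine distance operator; the function name, the `ranked` alias, and the `%s` placeholder style are illustrative assumptions, and the SQL that the Snap generates internally is not documented here.

```python
def build_distance_filtered_query(table: str, vector_column: str, limit: int) -> str:
    """Build a query that filters on the distance through an outer SELECT.

    Identifiers are interpolated as-is, so pass only trusted table and
    column names. The two %s placeholders are the query vector and the
    maximum distance, in that order.
    """
    inner = (
        f"SELECT t.*, t.{vector_column} <=> %s::vector AS _SL_DISTANCE "
        f"FROM {table} AS t"
    )
    # The alias defined in the inner query is an ordinary column of the
    # derived table, so the outer WHERE clause may reference it.
    return (
        f"SELECT * FROM ({inner}) AS ranked "
        f"WHERE _SL_DISTANCE < %s "
        f"ORDER BY _SL_DISTANCE LIMIT {int(limit)}"
    )
```

For example, `build_distance_filtered_query("VECTOR_DEMO.BOOKS", "INT_VEC", 3)` returns a statement that keeps at most three rows whose distance is below the supplied threshold.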

Limit Rows

 

Default Value: 4
Example: 3
Min Value: 1

Integer/Expression

Specify the number of rows the query must return.

 

Distance Function*

 

Default Value: L2
Example: COSINE

Dropdown List

Choose the distance function used to compare vectors. The available options are:

  • L2: (Euclidean Distance) Measures the straight-line distance between two points. It’s useful when you want to calculate the "as-the-crow-flies" distance.

  • L1: (Manhattan Distance) Measures the distance between two points along the axes at right angles. It’s useful in grid-based systems, like city streets.

  • COSINE: Compares the direction of two vectors using the cosine of the angle between them. It is commonly used in high-dimensional positive spaces to assess similarity irrespective of magnitude.

  • Inner Product: (Dot Product) Computes the sum of the products of corresponding vector components, so it reflects both the direction and the magnitude of the vectors. It’s useful in applications such as calculating the angle between vectors or finding projections.

Learn more about the Vector Similarity Functions.
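
To make the four options concrete, the following sketch implements the standard mathematical definitions in plain Python. It illustrates the formulas only; the function names are hypothetical, and how the database or the Snap computes and reports each score (for example, whether the inner product is negated for ranking) is not specified here.

```python
import math
from typing import Sequence


def l2_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Euclidean (straight-line) distance."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def l1_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Manhattan distance: sum of absolute differences along each axis."""
    return sum(abs(x - y) for x, y in zip(a, b))


def inner_product(a: Sequence[float], b: Sequence[float]) -> float:
    """Dot product: sum of the products of corresponding components."""
    return sum(x * y for x, y in zip(a, b))


def cosine_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """1 minus the cosine of the angle between the vectors."""
    norm_a = math.sqrt(inner_product(a, a))
    norm_b = math.sqrt(inner_product(b, b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cosine distance is undefined for a zero vector")
    return 1.0 - inner_product(a, b) / (norm_a * norm_b)
```

With L2, L1, and cosine distance, smaller values mean the vectors are closer; with the raw inner product, larger values mean the vectors are more similar.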

Include vector values

 

Default Value: Deselected

Checkbox/Expression

Select this checkbox to include vector values in the response.

This field does not support input schema from the upstream Snaps.

Include scores

 

Default Value: Selected

Checkbox/Expression

Select this checkbox to include similarity scores in the response.

  • This field does not support input schema from the upstream Snaps.

  • When you select this checkbox, the output preview displays _SL_DISTANCE, which is the distance between the input vector and vectors in the database.

Ignore empty result

Default Value: Deselected

Checkbox

Select this checkbox to ignore the empty results and not write a document to the output view when a search operation does not return any results.

Number of retries

 

Default Value: 0
Example: 3

Integer/Expression

Specify the maximum number of retry attempts the Snap must make in case of network failure.

Retry interval (seconds)

 

Default Value: 0
Example: 3

Integer/Expression

Specify the time interval, in seconds, between two successive retry requests.
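
The interaction between Number of retries and Retry interval (seconds) can be pictured as a simple retry loop. The sketch below is an illustration under stated assumptions, not the Snap's implementation: `run_with_retries` is a hypothetical helper, a network failure is modeled as Python's `ConnectionError`, and the `sleep` parameter exists only so the wait can be replaced in tests.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def run_with_retries(
    operation: Callable[[], T],
    retries: int = 0,
    interval_seconds: float = 0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Run operation, retrying up to `retries` times after network failures."""
    attempt = 0
    while True:
        try:
            return operation()
        except ConnectionError:
            if attempt >= retries:
                # Retries exhausted: let the failure propagate to the caller.
                raise
            attempt += 1
            sleep(interval_seconds)  # wait before the next attempt
```

With the default of 0 retries, the first failure propagates immediately; with 3 retries, the operation runs at most four times in total.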

Snap execution

Default Value: Validate & Execute
Example: Execute only

Dropdown list

Select one of the following three modes in which the Snap executes:

  • Validate & Execute: Performs limited execution of the Snap and generates a data preview during pipeline validation. Subsequently, performs full execution of the Snap (unlimited records) during pipeline runtime.

  • Execute only: Performs full execution of the Snap during Pipeline execution without generating preview data.

  • Disabled: Disables the Snap and all Snaps that are downstream from it.

Example

Find Similar Vectors Using Cosine Distance

The example pipeline below demonstrates how to use the PostgreSQL - Vector Search Snap to find similar vectors using the Cosine distance function.

postgresql-example-image.png

Step 1: Configure the Mapper Snap with a vector to find similar vectors in the PostgreSQL database.

mapper-config.png

Step 2: Configure the PostgreSQL - Vector Search Snap as shown below:

postgresql-vector-search-configuration.png

Step 3: Validate the pipeline. On validation, the Snap fetches similar vectors based on the following criteria:

  • Each matching vector is returned with its cosine distance from the input vector.

  • The cosine distance measures how close a matching vector is to the input vector; values closer to 0 indicate higher similarity.

  • The first match has the highest similarity (lowest distance), followed by the second match.

postgresql-vector-search-output.png
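
The behavior described in this example can be approximated in memory to see how the settings interact. The sketch below ranks rows by cosine distance, applies Limit Rows, optionally adds `_SL_DISTANCE` and the vector values, and groups all matches for one input into a single output document. It is a simplified model: the `vector_search` function, its parameters, and the shape of the output document (a list of matches) are assumptions made for illustration, and the real Snap delegates the search to the database.

```python
import math
from typing import Dict, List, Sequence


def cosine_distance(a: Sequence[float], b: Sequence[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norms = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / norms


def vector_search(
    query: Sequence[float],
    rows: List[Dict[str, object]],
    vector_column: str,
    limit: int = 4,
    include_scores: bool = True,
    include_vectors: bool = False,
    ignore_empty: bool = False,
) -> List[List[Dict[str, object]]]:
    """Return zero or one output document (a list of matches) for one input."""
    # Rank by ascending distance and keep the closest `limit` rows.
    scored = sorted(
        ((cosine_distance(query, row[vector_column]), row) for row in rows),
        key=lambda pair: pair[0],
    )[:limit]
    matches = []
    for distance, row in scored:
        match = {k: v for k, v in row.items()
                 if include_vectors or k != vector_column}
        if include_scores:
            match["_SL_DISTANCE"] = distance
        matches.append(match)
    if not matches and ignore_empty:
        return []      # nothing is written to the output view
    return [matches]   # all matches grouped into a single document
```

Calling `vector_search([1, 0], rows, "INT_VEC", limit=2)` on a small list of rows returns one document whose first match has the lowest distance.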

Snap Pack History

Release   Snap Pack Version   Date   Type   Updates

August 2024

main27765

 

Stable

Updated and certified against the current SnapLogic Platform release.

May 2024 437patches27531 Latest
May 2024 437patches27416 Latest
  • Fixed an issue with the PostgreSQL—Insert Snap that exposed sensitive information in the error message when the batch operation was not successful.
  • Fixed an issue with the PostgreSQL—Bulk Load Snap that caused incorrect or invalid binary data to be inserted when the column was of Binary type.

May 2024 437patches27172 Latest
  • Added the PostgreSQL - Vector Search Snap to enable advanced vector-based queries using the SELECT statement.
  • Added vector data type support for PostgreSQL - Insert and PostgreSQL - Update Snaps.

  • Upgraded the PostgreSQL JDBC driver from v9.4.1207 to v42.7.2 (Java 8). This upgrade will be part of the GA release on August 14, 2024 (Stable release). As part of this upgrade, the 42.7.2 JDBC driver is bundled with the PostgreSQL Snap Pack as the default JDBC driver. Your existing PostgreSQL pipelines that use the default driver (bundled with the PostgreSQL Snap Pack) might break.

    Behavior change:

    This driver upgrade has resulted in specific behavior changes in errors, status codes, and success and failure messages. Learn more.

May 2024 437patches26634 Latest

Fixed an issue with the PostgreSQL - Execute Snap that produced logs causing node crashes.

May 2024 4postgresupgrade26570 -

Upgraded the PostgreSQL JDBC driver from v9.4.1207 to v42.7.2 (Java 8). This upgrade will be part of the Latest release on July 10, 2024, and the Stable release (GA) on August 14, 2024. As part of this upgrade, the 42.7.2 JDBC driver is bundled with the PostgreSQL Snap Pack as the default JDBC driver. Your existing PostgreSQL pipelines that use the default driver (bundled with the PostgreSQL Snap Pack) might break.

Behavior change:

This JDBC driver upgrade has resulted in specific behavior changes in errors, status codes, and success and failure messages. Learn more

May 2024 main26341 Stable

Updated the Delete Condition (Truncates a Table if empty) field in the PostgreSQL - Delete Snap to Delete condition (deletes all records from a table if left blank) to indicate that all entries will be deleted from the table when this field is blank, but no truncate operation is performed.

February 2024 main25112 Stable

Updated and certified against the current SnapLogic Platform release.

November 2023 435patches23831 Latest

Fixed an issue with the PostgreSQL-Execute and PostgreSQL-Select Snaps that added escape characters ('\\') in the output for JSONB datatype.

November 2023 main23721 Stable

Updated and certified against the current SnapLogic Platform release.

August 2023

main22460

 


Stable

The PostgreSQL - Execute Snap now includes a new Query type field. When Auto is selected, the Snap tries to determine the query type automatically.

May 2023 433patches21298 Latest

Fixed an issue with the PostgreSQL Insert Snap that inconsistently inserted some columns and missed the remaining columns (especially the Time fields) when the data was passed in the JSON format from an upstream Snap.

May 2023

main21015 

Stable

Upgraded with the latest SnapLogic Platform release.

February 2023 432patches20409 Latest

The PostgreSQL - Bulk Load and PostgreSQL - Insert Snaps no longer fail with the message ERROR: type modifier is not allowed for type 'bytea' when creating a new table if Create table if not present is selected and the target table does not exist. This issue occurred when metadata from the second input view document contained columns of the bytea data type.

February 2023 main19844 Stable

Upgraded with the latest SnapLogic Platform release.

November 2022

431patches19454

 Latest

The PostgreSQL Snap Pack supports geospatial data types.

November 2022 main18944 Stable
  • The PostgreSQL - Bulk Load Snap can now process records with more than 16 KB in the document without encountering the BufferOverflowException because the default value of 16 KB for byte buffer size is now removed.
  • The PostgreSQL - Insert Snap now creates the target table only from the table metadata of the second input view when the following conditions are met:

    • The Create table if not present checkbox is selected.

    • The target table does not exist.

    • The table metadata is provided in the second input view.

September 2022 430patches18149 Latest

The PostgreSQL Select and PostgreSQL Execute Snaps now read NaN values in Numeric columns when used with a PostgreSQL Account configured with the latest postgresql-42.5.0.jar driver.

September 2022 430patches17894 Latest

The PostgreSQL Select Snap now works as expected when the table name is dependent on an upstream input.

August 2022 430patches17700 Latest

The PostgreSQL - Bulk Load Snap can now process the records with more than 16KB in the document without encountering BufferOverflowException because the default value of 16KB for byte buffer size is now removed.

August 2022 main17386 Stable

Enhanced the PostgreSQL Account and PostgreSQL Dynamic Account with SSH Tunneling configurations to encrypt the network connection between the client and the PostgreSQL Database server, thereby ensuring a secure network connection.

4.29 Patch 429patches17036 Latest

Enhanced the PostgreSQL Account and PostgreSQL Dynamic Account with SSH Tunneling configurations to encrypt the network connection between the client and the PostgreSQL Database server, thereby ensuring a secure network connection.

4.29

main15993

 

Stable

Upgraded with the latest SnapLogic Platform release.

4.28 main14627 Stable

Updated the label for Delete Condition to Delete Condition (Truncates Table if empty) in the PostgreSQL Delete Snap.

4.27 Patch 427patches13149 Latest

Fixed an issue with PostgreSQL - Execute Snap, where the Snap failed when using Delete query with the RETURNING function.

4.27 main12833 Stable

Enhanced the PostgreSQL - Execute Snap to invoke stored procedures.

4.26 main11181 Stable

Upgraded with the latest SnapLogic Platform release.

4.25 Patch 425patches9879 Latest

Enhanced the performance of PostgreSQL - Bulk Load Snap significantly. SnapLogic anticipates that the Snap will execute up to 3 times faster than the previous version for enterprise workloads.

4.25 main9554 Stable

Upgraded with the latest SnapLogic Platform release.

4.24 main8556 Stable

Enhanced the PostgreSQL - Select Snap to return only the selected output fields or columns in the output schema (second output view) using the Fetch Output Fields In Schema check box. If the Output Fields field is empty all the columns are visible. 

4.23 main7430 Stable

Upgraded with the latest SnapLogic Platform release.

4.22 Patch

422patches6879 Latest

Fixed the PostgreSQL - Bulk Load Snap by preventing it from adding extra double quotes when loading values from input documents.

4.22 main6403 Stable

Upgraded with the latest SnapLogic Platform release.

4.21 Patch 421patches6272 Latest

Fixed the issue where Snowflake SCD2 Snap generates two output documents despite no changes to Cause-historization fields with DATE, TIME and TIMESTAMP Snowflake data types, and with Ignore unchanged rows field selected.

4.21 Patch

421patches6144 Latest

Fixed the following issues with DB Snaps:

  • The connection thread waits indefinitely causing the subsequent connection requests to become unresponsive.
  • Connection leaks occur during Pipeline execution.

4.21 Patch

MULTIPLE8841 Latest

Fixed the connection issue in Database Snaps by detecting and closing open connections after the Snap execution ends. 

4.21 snapsmrc542 Stable

Upgraded with the latest SnapLogic Platform release.

4.20 snapsmrc535 Stable

Upgraded with the latest SnapLogic Platform release.

4.19 Patch db/postgres8409 Latest

Fixed an issue with the PostgreSQL - Update Snap wherein the Snap is unable to perform operations when:

  • An expression is used in the Update condition property.
  • Input data contain the character '?'.

4.19 snaprsmrc528 Stable

Added the new PostgreSQL - Bulk Load Snap.

4.18 Patch postgres8021 Latest

Fixed an issue with the PostgreSQL grammar to better handle the single quote characters.

4.18 snapsmrc523 Stable

Upgraded with the latest SnapLogic Platform release.

4.17 Patch db/postgres7588 Latest

Fixed an issue with tables sharing an overlapping column name wherein Pipeline execution fails due to the table collision.

4.17 ALL7402 Latest

Pushed automatic rebuild of the latest version of each Snap Pack to SnapLogic UAT and Elastic servers.

4.17 snapsmrc515 Latest
  • Fixed an issue with the PostgreSQL Execute Snap wherein the Snap would send the input document to the output view even if the Pass through field is not selected in the Snap configuration. With this fix, the Snap sends the input document to the output view, under the key original, only if you select the Pass through field.
  • Added the Snap Execution field to all Standard-mode Snaps. In some Snaps, this field replaces the existing Execute during preview check box.
4.16 Patch db/postgres6822 Latest

Fixed an issue with the Lookup Snap passing data simultaneously to output and error views when some values contained spaces at the end.

4.16 snapsmrc508 Stable

Upgraded with the latest SnapLogic Platform release.

4.15 Patch db/postgres6333 Latest

Replaced Max idle time and Idle connection test period properties with Max life time and Idle Timeout properties respectively, in the Account configuration. The new properties fix the connection release issues that were occurring due to default/restricted DB Account settings.

4.15 snapsmrc500 Stable

Upgraded with the latest SnapLogic Platform release.

4.14 snapsmrc490 Stable

Added support for Amazon Aurora and Azure SQL DB.

4.13 snapsmrc486 Stable

Upgraded with the latest SnapLogic Platform release.

4.12 Patch MULTIPLE4967 Latest

Provided an interim fix for an issue with the PostgreSQL 10 accounts by re-registering the driver for each account validation. The final fix is being shipped in a separate build.

4.12 Patch

postgres4832 Latest

Updated the driver from version 8.4.704 to version 9.4.1207 to support PostgreSQL v10 servers.

4.12 snapsmrc480 Stable

Upgraded with the latest SnapLogic Platform release.

4.11 Patch 

db/postgres4290 Latest

PostgreSQL Snap Pack - Fixed an issue when inserting a valid NaN value into a column.

4.11 snapsmrc465 Stable

Upgraded with the latest SnapLogic Platform release.

4.10 Patch postgres3773 Latest

Previously the Postgres PGObject datatype could not be serialized. It is now handled as a String.

4.10

snapsmrc414

 
Stable

Added Auto commit property to the Select and Execute Snaps at the Snap level to support overriding of the Auto commit property at the Account level.

4.9.0 Patch postgres3134 Latest

PostgreSQL Execute: New Snap advanced property Auto commit has been implemented to fix the Select query error in PostgreSQL replica servers.

4.9 Patch

postgres3072 Latest

Fixed an issue where the connection was not closed after a login failure. Exposed autocommit for the "Select into" statement in the PostgreSQL Execute Snap and the Redshift Execute Snap.

4.9 snapsmrc405 Stable

Upgraded with the latest SnapLogic Platform release.

4.8.0 Patch postgres2757 Latest

Potential fix for JDBC deadlock issue.

4.8.0 Patch

postgres2712 Latest

Fixed PostgreSQL Snap Pack rendering dates that are one hour off from the date returned by database query for non-UTC Snaplexes.

4.8.0 Patch

postgres2696

 


Latest

Addressed an issue where some changes made in the platform patch MRC294 to improve performance caused Snaps in the listed Snap Packs to fail.

4.8

snapsmrc398

 
Stable
  • Info tab added to accounts.
  • Database accounts now invalidate connection pools if account properties are modified and login attempts fail.
4.7.0 Patch postgres2192 Latest

Fixed an issue for database Select Snaps regarding Limit rows not supporting an empty string from a pipeline parameter.

4.7.0 Patch

postgres2185 Latest

Resolved an issue with the PostgreSQL Execute Snap failing with a “java.util.regex.Pattern” error.

4.7 snapsmrc382 Stable

Upgraded with the latest SnapLogic Platform release.

4.6 snapsmrc362 Stable

Upgraded with the latest SnapLogic Platform release.

4.5.1

postgres1584

 
Stable
  • Resolved an issue in the PostgreSQL - Execute Snap that resulted from restrictions on the 'with' operator in conjunction with an 'insert' statement.

  • Resolved an issue in the PostgreSQL Snaps that resulted from restrictions on the 'with' operator in conjunction with the RECURSIVE keyword.

4.4.1 NA Stable

Upgraded with the latest SnapLogic Platform release.

4.4 NA Stable
  • Resolved an issue with PostgreSQL Select Snap not parsing JSON data type correctly.
  • Note: The fix for this issue required updating libraries that impacted all database Snaps except those for MongoDB.
4.3.2 NA Stable

This Snap Pack is now compatible with the PostgreSQL drivers available in 4.3 Patch mrc222.

NA NA Stable
  • PostgreSQL Insert Snap: resolved an issue where it inserts a negative value when the input data was out of range.
  • PostgreSQL Snaps did not properly handle when a table name was created in mixed case.
  • JSON paths in WHERE clauses should be processed as bind values after the expression is evaluated.
NA NA Stable
  • A - otd:6828 Postgres Snap shows wrong data type in preview for timestamp with time zone data type
  • Dynamic DB queries now supported in the Execute Snap.
  • The SQL statement property now can be set as an expression property. When it is an expression, it will be evaluated with each input document and one SQL statement per each input document will be executed.
  • Known issue: When the SQL statement property is an expression, the pipeline parameters are shown in the suggest, but not the input schema.
  • With the SQL statement property set as an expression, the Snap can be exposed to SQL injection. Please use this feature with caution.
NA NA Stable
  • PostgreSQL Insert: Enhanced data type support.
  • PostgreSQL Lookup: Fixed bugs related to lookup failures. Added the Pass-through on no lookup match property, which lets you pass the input document through to the output view when there is no lookup match.
  • PostgreSQL Select Snap: Added support for handling array types.
