Generic Hive Database Account

Overview

Use this account type to connect Hive Snaps to different types of Hive clusters, such as HDP and CDH, through a JDBC URL.

Prerequisites

  • A Hive account.

Limitations

  • The Hive Snap Pack does not validate with Apache Hive JDBC v1.2.1 jars or earlier because of a defect in Hive. HDP 2.6.3 and HDP 2.6.1 run on Apache Hive JDBC v1.2.1 jars.

  • To validate Snaps that must work with HDP 2.6.3 and HDP 2.6.1, use JDBC v2.0.0 jars.

Known Issues

  • "Method not supported" error while validating Apache Hive JDBC v1.2.1 or earlier

Account Settings

  • Asterisk ( * ): Indicates a mandatory field.

  • Suggestion icon: Indicates a list that is dynamically populated based on the configuration.

  • Expression icon: Indicates the value is an expression (if enabled) or a static value (if disabled). Learn more about Using Expressions in SnapLogic.

  • Add icon: Indicates that you can add fields in the fieldset.

  • Remove icon: Indicates that you can remove fields from the fieldset.

Field Name

Field Type

Description

Label*

Default Value: N/A
Example: Generic Hive Database Account

String

Specify a unique label for the account.

Account properties*

Username

Default Value: N/A
Example: Snapuser 

String

Specify the username that is allowed to connect to the database. This username is used as the default when retrieving connections, and it must be valid to set up the data source.

Password

Default Value: N/A
Example: Sn@pUser.3

String

Specify the password used to connect to the data source. This password is used as the default when retrieving connections, and it must be valid to set up the data source.

JDBC URL

Default Value: N/A
Example: jdbc:hive2://hostname:10000/dbname;sasl.qop=auth-int

String/Expression

Specify the JDBC URL used to connect to the Hive database.
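The sketch below splits a HiveServer2-style JDBC URL into hosts, database, and session variables, which can help catch formatting mistakes before you save the account. It assumes the `jdbc:hive2://host:port[,host:port]/database[;key=value...]` layout used by the Apache Hive driver; the function name `parse_hive_jdbc_url` is hypothetical, and the optional query (`?`) and fragment (`#`) sections are deliberately not handled.

```python
def parse_hive_jdbc_url(url):
    """Split a HiveServer2-style JDBC URL into hosts, database, and session variables.

    Illustrative sketch: assumes jdbc:hive2://host:port[,host:port]/database[;key=value...]
    and ignores the optional ?hive_conf and #hive_var sections.
    """
    prefix = "jdbc:hive2://"
    if not url.startswith(prefix):
        raise ValueError(f"URL must start with {prefix!r}")
    rest = url[len(prefix):]
    # Everything after the first ';' is a list of key=value session variables.
    location, _, session = rest.partition(";")
    authority, _, database = location.partition("/")
    hosts = [h for h in authority.split(",") if h]
    session_vars = {}
    for item in session.split(";"):
        if item:
            key, _, value = item.partition("=")
            session_vars[key] = value
    # HiveServer2 falls back to the "default" database when none is given.
    return {"hosts": hosts, "database": database or "default", "session_vars": session_vars}


print(parse_hive_jdbc_url("jdbc:hive2://hostname:10000/dbname;sasl.qop=auth-int"))
# {'hosts': ['hostname:10000'], 'database': 'dbname', 'session_vars': {'sasl.qop': 'auth-int'}}
```

If session variables are separated with `:` instead of `;`, this parser leaves them inside the database name, which is one quick way to spot that typo.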

JDBC JARs

Use this fieldset to specify the JDBC JAR files to load. Each driver binary must have a unique name; you cannot reuse the same file name for a different driver. If you leave this fieldset blank, a default JDBC driver is loaded.

Add the following JDBC JARs to configure the Generic Hive Database account for your cluster type.

For HDP

  • Hive-jdbc-2.0.0.2.3.5.0-81-standalone.jar

  • Zookeeper-3.4.6.jar (Use this for setting up Hive with Zookeeper)

For CDH

  • hive_metastore.jar

  • hive_service.jar

  • hiveJDBC4.jar

  • libfb303-0.9.0.jar

  • libthrift-0.9.0.jar

  • TCLIServiceClient.jar

  • Zookeeper-3.3.6.jar (Use this for setting up Hive with Zookeeper)

  • You can upload the JDBC driver through Designer or Manager. Drivers are stored per project, so only users with access to that project can see the uploaded drivers. To make a driver available to all users in your Org, place it in the /shared project.

  • See the Advanced Configurations: Configuring Hive with Kerberos section below for the list of JAR files to upload when configuring Hive with Kerberos.

JDBC Driver Class*

Default Value: org.apache.hive.jdbc.HiveDriver
Example: com.cloudera.hive.jdbc4.HS2Driver

String

Specify the JDBC Driver class name. 

For HDP Clusters

Enter the following value: org.apache.hive.jdbc.HiveDriver

For CDH Clusters

Enter the following value: com.cloudera.hive.jdbc4.HS2Driver
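As a rough pre-flight check, the following sketch compares an account's driver class and uploaded JARs against the HDP and CDH lists above. The `CLUSTER_PROFILES` table and `check_account_config` function are hypothetical helpers, not part of SnapLogic; the JAR names and driver classes are copied from this page, and Kerberos-specific JARs are not covered.

```python
# JAR names and driver classes copied from the JDBC JARs and JDBC Driver Class fields above.
CLUSTER_PROFILES = {
    "HDP": {
        "driver_class": "org.apache.hive.jdbc.HiveDriver",
        "jars": ["Hive-jdbc-2.0.0.2.3.5.0-81-standalone.jar"],
        "zookeeper_jar": "Zookeeper-3.4.6.jar",
    },
    "CDH": {
        "driver_class": "com.cloudera.hive.jdbc4.HS2Driver",
        "jars": [
            "hive_metastore.jar",
            "hive_service.jar",
            "hiveJDBC4.jar",
            "libfb303-0.9.0.jar",
            "libthrift-0.9.0.jar",
            "TCLIServiceClient.jar",
        ],
        "zookeeper_jar": "Zookeeper-3.3.6.jar",
    },
}


def check_account_config(cluster, driver_class, uploaded_jars, use_zookeeper=False):
    """Return a list of human-readable problems; an empty list means none were found."""
    profile = CLUSTER_PROFILES[cluster]
    problems = []
    if driver_class != profile["driver_class"]:
        problems.append(f"driver class should be {profile['driver_class']}")
    required = list(profile["jars"])
    if use_zookeeper:
        required.append(profile["zookeeper_jar"])
    # Compare names case-insensitively, since capitalization may vary between copies of a JAR.
    uploaded = {name.lower() for name in uploaded_jars}
    problems.extend(f"missing JAR: {jar}" for jar in required if jar.lower() not in uploaded)
    return problems


print(check_account_config("HDP", "org.apache.hive.jdbc.HiveDriver",
                           ["hive-jdbc-2.0.0.2.3.5.0-81-standalone.jar"], use_zookeeper=True))
# ['missing JAR: Zookeeper-3.4.6.jar']
```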

Advanced properties

Auto commit

Default Value: Selected

Checkbox

Select this checkbox to commit each batch immediately after it executes, so that only the currently executing batch is rolled back if the Snap fails. If you deselect it, a transaction is started for the Snap run and committed when the run succeeds; the transaction is rolled back if the Snap fails.

For example, suppose a stream of documents enters the input view of a DB Execute Snap, and the SQL statement property has JSON paths in the WHERE clause. If there are many documents, the Snap executes them in multiple batches rather than one statement per document, and each batch contains a certain number of WHERE clause values. If Auto commit is selected, a failure rolls back only the records in the current batch. If Auto commit is deselected, the entire operation is rolled back. For a single execute statement (with no input view), this setting has no practical effect.
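A minimal model of the rollback behavior described above, assuming documents are split into consecutive batches of Batch size and that exactly one document fails. The `batch_outcome` function is hypothetical and only illustrates the arithmetic; it is not how the Snap is implemented.

```python
import math


def batch_outcome(num_documents, batch_size, failing_document, auto_commit):
    """Return (number_of_batches, documents_still_committed) after one document fails.

    Documents are indexed from 0 and grouped into consecutive batches of batch_size.
    """
    if batch_size <= 0:
        raise ValueError("batch_size must be positive")
    if not 0 <= failing_document < num_documents:
        raise ValueError("failing_document must be a valid document index")
    num_batches = math.ceil(num_documents / batch_size)
    if not auto_commit:
        # One transaction covers the whole run, so the failure rolls back everything.
        return num_batches, 0
    # With auto commit, every batch before the failing one was already committed.
    failing_batch = failing_document // batch_size
    return num_batches, failing_batch * batch_size


print(batch_outcome(120, 50, 110, auto_commit=True))   # (3, 100)
print(batch_outcome(120, 50, 110, auto_commit=False))  # (3, 0)
```

With 120 documents, a Batch size of 50, and a failure on the document at index 110, auto commit keeps the first two batches (100 documents) committed, while deselecting it rolls back all of them.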

Batch size*

Default Value: 50
Example: 10

Integer

Specify the number of statements to execute at a time. A large batch size could exceed the JDBC placeholder limit of 2100.

Fetch size*

Default Value: 100
Example: 100

Integer

Specify the number of rows to fetch at a time when executing a query. Large values could cause the server to run out of memory.
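The trade-off between Fetch size and memory can be sketched as simple arithmetic, assuming the driver returns exactly Fetch size rows per round trip until the result set is exhausted (JDBC treats the fetch size only as a hint, so real drivers may differ). `fetch_plan` is a hypothetical helper.

```python
import math


def fetch_plan(total_rows, fetch_size):
    """Return (round_trips, max_rows_held_per_fetch) for reading a result set."""
    if fetch_size <= 0:
        raise ValueError("fetch_size must be positive")
    round_trips = math.ceil(total_rows / fetch_size)
    return round_trips, min(total_rows, fetch_size)


print(fetch_plan(1_000_000, 100))      # (10000, 100)
print(fetch_plan(1_000_000, 100_000))  # (10, 100000)
```

A small value means many round trips with little memory per fetch; a large value means few round trips but many rows held in memory at once.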

Max pool size*

Default Value: 50
Example: 10

Integer

Specify the maximum number of idle connections a pool will maintain at a time.

Max life time*

Default Value: 30
Example: 25

Integer

Specify the number of minutes a connection can exist in the pool before it is destroyed.

Idle Timeout*

Default Value: 5
Example: 4

Integer

Specify the number of minutes for a connection to remain idle before a test query is run. This helps keep database connections from timing out.

Checkout timeout*

Default Value: 10000
Example: 10000

Integer

Specify the number of milliseconds to wait for a connection to become available in the pool. A value of 0 waits indefinitely. If no connection becomes available within this time, an exception is thrown and the Pipeline fails.
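The pool fields above mix units (minutes and milliseconds). The following sketch normalizes them to milliseconds under HikariCP-style property names. This mapping is an assumption based on the 4.7.0 patch note in the Snap Pack History, which mentions switching connection pooling to Hikari; how SnapLogic actually applies these fields is not documented here, and HikariCP's own definitions of these properties may differ from the descriptions on this page. The function name is hypothetical.

```python
MS_PER_MINUTE = 60_000


def to_pool_properties(max_pool_size=50, max_life_time_min=30,
                       idle_timeout_min=5, checkout_timeout_ms=10_000):
    """Map the account's pool fields to HikariCP-style names (hypothetical mapping).

    Defaults match the account defaults. Minutes are converted to milliseconds,
    the unit HikariCP uses for its timeout properties.
    """
    values = {
        "max_pool_size": max_pool_size,
        "max_life_time_min": max_life_time_min,
        "idle_timeout_min": idle_timeout_min,
        "checkout_timeout_ms": checkout_timeout_ms,
    }
    for name, value in values.items():
        if value < 0:
            raise ValueError(f"{name} must not be negative")
    return {
        "maximumPoolSize": max_pool_size,
        "maxLifetime": max_life_time_min * MS_PER_MINUTE,
        "idleTimeout": idle_timeout_min * MS_PER_MINUTE,
        # The account documents 0 as "wait indefinitely" for Checkout timeout.
        "connectionTimeout": checkout_timeout_ms,
    }


print(to_pool_properties())
# {'maximumPoolSize': 50, 'maxLifetime': 1800000, 'idleTimeout': 300000, 'connectionTimeout': 10000}
```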

Url properties

Use this fieldset to specify properties to add to the JDBC URL. You must configure these properties when setting up an SSL connection. See the Advanced Configurations: Configuring Hive with SSL section below for details.

Url property name

Default Value: N/A
Example: maxAllowedPacket

String

Specify a name for the URL property to be used by the account.

Url property value

Default Value: N/A
Example: 1000

String

Specify a value for the URL property name.
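To illustrate how URL properties typically end up in the connection string, this sketch appends each name/value row to a JDBC URL as a `;name=value` pair, the separator HiveServer2 URLs use. The helper name and the truststore path are hypothetical; `ssl` and `sslTrustStore` are Apache Hive JDBC URL parameters, but check your driver's documentation for the exact SSL properties it expects.

```python
def apply_url_properties(base_url, url_properties):
    """Append (name, value) rows to a JDBC URL as ;name=value pairs.

    url_properties is a list of tuples, mirroring the rows of the fieldset.
    """
    url = base_url.rstrip(";")
    for name, value in url_properties:
        if not name:
            raise ValueError("Url property name must not be empty")
        url += f";{name}={value}"
    return url


ssl_url = apply_url_properties(
    "jdbc:hive2://hostname:10000/dbname",
    [("ssl", "true"), ("sslTrustStore", "/path/to/truststore.jks")],  # hypothetical path
)
print(ssl_url)
# jdbc:hive2://hostname:10000/dbname;ssl=true;sslTrustStore=/path/to/truststore.jks
```

A list of tuples is used instead of a dict so that rows keep the order in which they were entered in the fieldset.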

Hadoop properties

Authentication method*

Default Value: None
Example: Kerberos

Dropdown list

Select the Authentication method to use when connecting to the Hadoop service.  

  • None: Allows connection without a Username and Password

  • Kerberos: Allows connection with Kerberos details such as Client Principal, Keytab file, and Service principal

  • User ID: Allows connection with Username only

  • User ID and Password: Allows connection with Username and Password
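The following sketch checks which account fields each authentication method needs, based on the list above. The `REQUIRED_FIELDS` table, its field keys, and the `missing_auth_fields` helper are hypothetical; they mirror this page's descriptions rather than any SnapLogic API.

```python
# Fields each authentication method relies on, taken from the list above.
REQUIRED_FIELDS = {
    "None": [],
    "User ID": ["username"],
    "User ID and Password": ["username", "password"],
    "Kerberos": ["client_principal", "keytab_file", "service_principal"],
}


def missing_auth_fields(method, account):
    """Return the required fields that are missing or empty in the account dict."""
    if method not in REQUIRED_FIELDS:
        raise ValueError(f"unknown authentication method: {method!r}")
    return [field for field in REQUIRED_FIELDS[method] if not account.get(field)]


print(missing_auth_fields("User ID and Password", {"username": "Snapuser"}))  # ['password']
print(missing_auth_fields("Kerberos", {"client_principal": "hiveclient@EXAMPLE.COM"}))
# ['keytab_file', 'service_principal']
```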

Use Zookeeper 

Default Value: Deselected

Checkbox

Select this checkbox to use Zookeeper to locate the Hadoop service instead of a specific hostname. When selected, the location of the database is resolved through Zookeeper rather than from a hostname.

Zookeeper Versions

When using Zookeeper with a Hive account, add the Zookeeper JAR file to the Groundplex associated with that Hive account. The Zookeeper version on the Groundplex should match the version that your Hive account uses.

For HDP users, in addition to the Zookeeper JAR file, you might also need the curator-client-X.X.X.jar and curator-framework-X.X.X.jar files on the Groundplex.

Zookeeper URL


Default Value: N/A
Example: hostname1:port,hostname2:port/;serviceDiscoveryMode=zooKeeper;zooKeeperNamespace=hiveserver2

String

Specify the URL of the Zookeeper service. Zookeeper URL formats are different for CDH and HDP.

  • For HDP:

    • Format: hostname1:port,hostname2:port/;serviceDiscoveryMode=zooKeeper;zooKeeperNamespace=hiveserver2

    • Example: na77sl-ihdc-ux02011.clouddev.snaplogic.com:2181,na77sl-ihdc-ux02012.clouddev.snaplogic.com:2181,na77sl-ihdc-ux02013.clouddev.snaplogic.com:2181/;serviceDiscoveryMode=zooKeeper;zooKeeperNamespace=hiveserver2

  • For CDH:

    • Format: zk=hostname1:port,hostname2:port/hiveserver2

    • Example: zk=cdhclusterqa-1-1.clouddev.snaplogic.com:2181,cdhclusterqa-1-2.clouddev.snaplogic.com:2181,cdhclusterqa-1-3.clouddev.snaplogic.com:2181/hiveserver2

This is NOT the URL of the Hadoop service that you want to connect to; it is the URL of the Zookeeper service used to locate it.
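This sketch builds the Zookeeper URL from a list of `hostname:port` entries in the HDP and CDH formats shown above. The helper name and hosts are hypothetical, and the `namespace` parameter (defaulting to `hiveserver2`, as in both formats) is an illustrative generalization.

```python
def zookeeper_url(hosts, cluster, namespace="hiveserver2"):
    """Build a Zookeeper URL in the HDP or CDH format.

    hosts is a list of "hostname:port" strings for the Zookeeper servers.
    """
    if not hosts:
        raise ValueError("at least one Zookeeper host is required")
    quorum = ",".join(hosts)
    if cluster == "HDP":
        return f"{quorum}/;serviceDiscoveryMode=zooKeeper;zooKeeperNamespace={namespace}"
    if cluster == "CDH":
        return f"zk={quorum}/{namespace}"
    raise ValueError("cluster must be 'HDP' or 'CDH'")


hosts = ["zk1.example.com:2181", "zk2.example.com:2181"]  # hypothetical hosts
print(zookeeper_url(hosts, "HDP"))
# zk1.example.com:2181,zk2.example.com:2181/;serviceDiscoveryMode=zooKeeper;zooKeeperNamespace=hiveserver2
print(zookeeper_url(hosts, "CDH"))
# zk=zk1.example.com:2181,zk2.example.com:2181/hiveserver2
```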

Hive properties

JDBC Subprotocol*

Default Value: Hive
Example: Impala

Dropdown list

Specify the JDBC Subprotocol to use. This field is required when the Authentication method is Kerberos. Available options are:

  • Hive

  • Impala

Kerberos properties

Use this fieldset to configure information required for the Kerberos authentication. These properties must be configured if you select Kerberos in the Authentication method property.

Client Principal

Default Value: N/A
Example: hiveclient@EXAMPLE.COM

String

Specify the principal used to authenticate to the Kerberos Key Distribution Center (KDC), the network service that clients and servers use for authentication.

Keytab File

Default Value: N/A
Example: /etc/krb5.keytab

String

Specify the keytab file (a file that stores encryption keys) used to authenticate to the Kerberos KDC.

Service Principal

Default Value: N/A
Example:  hive/host@REALM or impala/host@REALM

String

Specify the principal used by an instance of a service.

Examples: 

  • If you are connecting to a specific server: hive/host@REALM or impala/host@REALM

  • If you are connecting to any compliant host (more common for this Snap; see the Use Zookeeper description): hive/_HOST@REALM or impala/_HOST@REALM
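When the Service Principal uses `_HOST`, Hadoop-style clients substitute the hostname of the server being contacted, which is what lets one principal work with any compliant host. The sketch below shows that substitution, assuming the hostname is lowercased as Hadoop's own helper does; `resolve_service_principal` is a hypothetical name, and real clients resolve the fully qualified hostname themselves.

```python
def resolve_service_principal(principal, hostname):
    """Replace a _HOST placeholder in service/host@REALM with a concrete hostname.

    The hostname is lowercased, following the usual Hadoop convention.
    """
    service_and_host, sep, realm = principal.partition("@")
    service, slash, host = service_and_host.partition("/")
    if not sep or not slash:
        raise ValueError("principal must have the form service/host@REALM")
    if host == "_HOST":
        host = hostname.lower()
    return f"{service}/{host}@{realm}"


print(resolve_service_principal("hive/_HOST@EXAMPLE.COM", "HS2-Node1.example.com"))
# hive/hs2-node1.example.com@EXAMPLE.COM
print(resolve_service_principal("impala/host1@REALM", "ignored.example.com"))
# impala/host1@REALM
```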

Snap Pack History


Release 

Snap Pack Version

Date

Type

  Updates

May 2024

main26341

08 May 2024 

Stable

Updated and certified against the current SnapLogic Platform release.

February 2024

main25112

Stable

Updated and certified against the current SnapLogic Platform release.

November 2023

main23721

Stable

Updated and certified against the current SnapLogic Platform release.

August 2023

main22460

Stable

The Hive Execute Snap now includes a new Query type field. When Auto is selected, the Snap tries to determine the query type automatically.

May 2023

main21015

Stable

The Hive Snap Pack is Cloudera-certified for Cloudera Data Warehouse (CDW). You can use the Hive Execute Snap to work with CDW clusters through a Generic Hive Database account.

February 2023

main19844

09 Feb 2023 

Stable

Upgraded with the latest SnapLogic Platform release.

November 2022

main18944

10 Nov 2022 

Stable

Upgraded with the latest SnapLogic Platform release.

August 2022

main17386

11 Aug 2022  

Stable

Upgraded with the latest SnapLogic Platform release.

4.29

main15993

14 May 2022 

Stable

Upgraded with the latest SnapLogic Platform release.

4.28

main14627

12 Feb 2022 

Stable

Upgraded with the latest SnapLogic Platform release.

4.27

main12833

13 Nov 2021 

Stable

Upgraded with the latest SnapLogic Platform release.

4.26

main11181

14 Aug 2021 

Stable

Upgraded with the latest SnapLogic Platform release.

4.25

main9554

08 May 2021 

Stable

Upgraded with the latest SnapLogic Platform release.

4.24 Patch

424patches8867

11 Mar 2021 

Latest

Fixes the missing library error in Hive Snap Pack when running Hadoop Pipelines in JDK11 runtime.

4.24

main8556

13 Feb 2021

Stable

Upgraded with the latest SnapLogic Platform release.

4.23

main7430

14 Nov 2020 

Stable

Upgraded with the latest SnapLogic Platform release.

4.22

main6403

12 Sep 2020 

Stable

Upgraded with the latest SnapLogic Platform release.

4.21 Patch

421patches6272

27 Jul 2020 

Latest

Fixes the issue where Snowflake SCD2 Snap generates two output documents despite no changes to Cause-historization fields with DATE, TIME and TIMESTAMP Snowflake data types, and with Ignore unchanged rows field selected.

4.21 Patch

421patches6144

02 Jul 2020 

Latest

Fixes the following issues with DB Snaps:

  • The connection thread waits indefinitely causing the subsequent connection requests to become unresponsive.

  • Connection leaks occur during Pipeline execution.

4.21 Patch

421patches5851

08 Jun 2020 

Latest

Fixes the Hive Execute Snap that fails with a java.lang.NullPointerException error.

4.21 Patch

MULTIPLE8841

19 May 2020 

Latest

Fixes the connection issue in Database Snaps by detecting and closing open connections after the Snap execution ends.

4.21

snapsmrc542

09 May 2020 

Stable

Upgraded with the latest SnapLogic Platform release.

4.20

snapsmrc535

08 Feb 2020 

Stable

Upgraded with the latest SnapLogic Platform release.

4.19

snaprsmrc528

14 Nov 2019 

Stable

Upgraded with the latest SnapLogic Platform release.

4.18

snapsmrc523

10 Aug 2019 

Stable

Upgraded with the latest SnapLogic Platform release.

4.17

ALL7402

11 Jun 2019 

Latest

Pushed automatic rebuild of the latest version of each Snap Pack to SnapLogic UAT and Elastic servers.

4.17

snapsmrc515

11 Jun 2019 

Stable

  • Certified and tested the Snap Pack against CDH 6.1.

  • Fixes an issue with the Hive Execute Snap wherein the Snap would send the input document to the output view even if the Pass through field is not selected in the Snap configuration. With this fix, the Snap sends the input document to the output view, under the key original, only if you select the Pass through field.

  • Adds the Snap Execution field to all Standard-mode Snaps. In some Snaps, this field replaces the existing Execute during preview check box.

  • Adds a new authentication method, User ID and Password with SSL, for Hive SSL Accounts which allows SSL connections for valid user name and password credentials.

4.16

snapsmrc508

16 Feb 2019 

Stable

Upgraded with the latest SnapLogic Platform release.

4.15 Patch 

db/hive6330

05 Dec 2018 

Latest

Replaced Max idle time and Idle connection test period properties with Max life time and Idle Timeout properties respectively, in the Account configuration. The new properties fix the connection release issues that were occurring due to default/restricted DB Account settings.

4.15

snapsmrc500

15 Dec 2018 

Stable

Added Hive HA support for Zookeeper.

4.14

snapsmrc490

11 Aug 2018 

Stable

Added a new account type, Generic Hive Database Account, which enables connecting to different types of clusters using a JDBC URL.

4.13 Patch 

db/hive5269

07 Jun 2018 

Latest

Fixes the Hive Execute Snap that stores account passwords in plain text in the log file. 

4.13

snapsmrc486

12 May 2018 

Stable

Upgraded with the latest SnapLogic Platform release.

4.12

snapsmrc480

17 Feb 2018 

Stable

Upgraded with the latest SnapLogic Platform release.

4.11

snapsmrc465

11 Nov 2017 

Stable

Upgraded with the latest SnapLogic Platform release.

4.10

snapsmrc414

12 Aug 2017 

Stable

Upgraded with the latest SnapLogic Platform release.

4.9 Patch

hive3068

01 Jun 2017 

Latest

  • Fixes an issue where connections were not closed after a login failure, and exposes autocommit for "Select into" statements in the PostgreSQL Execute and Redshift Execute Snaps.

4.9

snapsmrc405

13 May 2017 

Stable

  • Hive - Execute Snap is tested on Cloudera Version 5.8.

  • Hive - Execute Snap(Kerberos) now works on Groundplex.

4.8 Patch

hive2752

27 Mar 2017 

Latest

Potential fix for JDBC deadlock issue.

4.8

snapsmrc398

11 Feb 2017 

Stable

  • Info tab added to accounts.

  • Database accounts now invalidate connection pools if account properties are modified and login attempts fail.

4.7.0 Patch

hive2469

17 Jan 2017 

Latest

Addresses the ClouderaHiveJDBCDriver(500168) "Unable to connect to server: GSS initiate failed" error by changing the connection pooling to Hikari and adding a privileged user to all getConnect() requests.

4.7.0 Patch

hive2199

28 Nov 2016 

Latest

Fixes an issue for database Select Snaps regarding Limit rows not supporting an empty string from a pipeline parameter.

4.7

snapsmrc382

23 Nov 2016 

Stable

  • The editor box for the SQL statement property in certain database Snaps can now be resized to make it easier to read the contents. This setting is in the Execute Snaps for Cassandra, Hive, JDBC, Oracle, MySQL, SQL Server, PostgreSQL, SAP HANA, Vertica, and Teradata.

  • Enabled the Hive account with Kerberos authentication (Hive with Kerberos works only on Hive JDBC4 driver 2.5.12 and above).

4.6 Patch

hive1958

05 Oct 2016 

Latest

Resolved a performance issue with Hive Execute and JDBC Execute Snaps when running Hive Queries.

4.6

snapsmrc362

13 Aug 2016 

Stable

Snap Pack introduced in 4.6.0. This includes only a Hive Execute Snap that executes DML and DDL statements with Kerberos enabled. It does not include Snaps for load, select, insert, delete, execute or others at this time. Tested only on Cloudera CDH 5.3 & 5.5, Hortonworks HDP 2.3.4.

