
Overview

This account is used by the Snaps in the Hive Snap Pack. The account can be configured with or without Kerberos and supports SSL connections.

You can create an account from Designer or Manager. In Designer, when building pipelines, every Snap that needs an account prompts you to create a new account or use an existing account. The accounts can be created in or used from:

  • Your private project folder: This folder contains the pipelines that will use the account.
  • Your Project Space’s shared folder: This folder is accessible to all the users that belong to the Project Space.
  • The global shared folder: This folder is accessible to all the users within an organization in the SnapLogic instance.

Account Configuration

In Manager, you can navigate to the required folder and create an account (see Hive Account). To create an account for a generic JDBC driver: 

  1. If not already done, upload the JDBC driver for this database as a file for the specific project.
  2. Click Create, then select Hive > Hive Database Account or Generic Hive Database Account (as required).
  3. Supply an account label.
  4. Supply the necessary properties for your database. 
  5. Supply the necessary JDBC driver jars for your driver.
  6. (Optional) Supply additional information on this account in the Notes field of the Info tab.
  7. Click Apply.
Warning

Avoid changing account credentials while pipelines using them are in progress. This may lead to unexpected results, including locking the account.


Account Types 

Hive Database Account


Account Settings

Label

Required. User provided label for the account instance.

Account properties


Hostname

Required. The server address to connect to. 

Default value: None.

Port number

Required. The database server's port to connect to.

Default value: 10000

Database name

Required. The name of the database to connect to.
Default value: None.

Username


The username that is allowed to connect to the database.

Example: Snapuser 

Default value: None.

Password

Password used to connect to the data source. Password will be used as the default password when retrieving connections. The password must be valid in order to set up the data source.

Example: Snapuser 

Default value: None.

JDBC jars

List of JDBC JAR files to be loaded. Each driver binary must have a unique file name; the same name cannot be reused for a different driver. If this property is left blank, a default JDBC driver is loaded.

Enter the following JDBC JAR values to configure your Hadoop HA Kerberos account:

For HDP

  • Hive-jdbc-2.0.0.2.3.5.0-81-standalone.jar

  • Zookeeper-3.4.6.jar (Use this for setting up Hive with Zookeeper)

For CDH

  • hive_metastore.jar
  • hive_service.jar
  • hiveJDBC4.jar
  • libfb303-0.9.0.jar
  • libthrift-0.9.0.jar
  • TCLIServiceClient.jar
  • Zookeeper-3.3.6.jar (Use this for setting up Hive with Zookeeper)

Default value: None

Note
  • The JDBC driver can be uploaded through Designer or Manager and it is stored on a per-project basis. That is, only users with access to that project will see JDBC drivers uploaded. To provide access to all users of your org, place the driver in the /shared project.
  • See Advanced Configurations: Configuring Hive with Kerberos section below for a list of JAR files to be uploaded when configuring Hive with Kerberos.


JDBC Driver Class

Required. The JDBC Driver class name. 

For HDP Clusters

Enter the following value: org.apache.hive.jdbc.HiveDriver

For CDH Clusters

Enter the following value: com.cloudera.hive.jdbc4.HS2Driver

Default value: None.
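Outside SnapLogic, the driver class and connection values configured above can be sanity-checked with plain JDBC. The following is a minimal sketch, not part of the platform: the host, database, and credentials are hypothetical placeholders, and the jdbc:hive2 URL form assumes a HiveServer2 endpoint.

```java
import java.sql.Connection;
import java.sql.DriverManager;

// Illustrative standalone check of the account's driver class and URL.
public class DriverCheck {
    // Builds a HiveServer2-style JDBC URL from the Hostname, Port number,
    // and Database name properties described above.
    static String url(String host, int port, String db) {
        return "jdbc:hive2://" + host + ":" + port + "/" + db;
    }

    // Loads the configured driver class and opens a connection.
    // HDP: org.apache.hive.jdbc.HiveDriver, CDH: com.cloudera.hive.jdbc4.HS2Driver
    static void check(String driverClass, String host, int port, String db,
                      String user, String password) throws Exception {
        Class.forName(driverClass);
        try (Connection c = DriverManager.getConnection(url(host, port, db), user, password)) {
            System.out.println("connected: " + !c.isClosed());
        }
    }
}
```

Running `check(...)` requires the corresponding driver JAR on the classpath, matching the JDBC jars property above.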

Advanced Properties


Auto commit

If selected, then batches are immediately committed after they execute. Therefore, only the current executing batch will be rolled back if the Snap fails.

If not selected, then a transaction is started for the Snap run and committed upon run success. The transaction will be rolled back if the Snap fails.

Default value: Selected


Note

For a DB Execute Snap, assume that a stream of documents enters the input view of the Snap and the SQL statement property has JSON paths in the WHERE clause. If the number of documents is large, the Snap executes in more than one batch rather than once per document. Each batch contains a certain number of WHERE clause values. If Auto commit is turned on, a failure only rolls back the records in the current batch. If Auto commit is turned off, the entire operation is rolled back. For a single execute statement (with no input view), the setting has no practical effect.


Batch size


Required. Number of statements to execute at a time.
Using a large batch size could use up the JDBC placeholder limit of 2100.

Example: 10

Default value: 50
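The interaction between Auto commit and Batch size described above can be sketched with plain JDBC. This is an illustrative standalone example, not SnapLogic's internal implementation; the helper names are hypothetical.

```java
import java.sql.Connection;
import java.sql.PreparedStatement;

public class BatchSketch {
    // Number of JDBC batches needed for `docs` input documents at `batchSize`.
    static int batches(int docs, int batchSize) {
        return (docs + batchSize - 1) / batchSize;
    }

    // True if one batch stays within the 2100-placeholder limit noted above,
    // given the number of bound parameters per row.
    static boolean withinPlaceholderLimit(int batchSize, int paramsPerRow) {
        return batchSize * paramsPerRow <= 2100;
    }

    // Commit behavior sketch: with autoCommit=true only the failing batch is
    // lost; with autoCommit=false a failure rolls back the whole run.
    static void run(Connection conn, String sql, String[][] rows, boolean autoCommit)
            throws Exception {
        conn.setAutoCommit(autoCommit);
        try (PreparedStatement ps = conn.prepareStatement(sql)) {
            for (String[] row : rows) {
                for (int i = 0; i < row.length; i++) ps.setString(i + 1, row[i]);
                ps.addBatch();
            }
            ps.executeBatch();
            if (!autoCommit) conn.commit();
        } catch (Exception e) {
            if (!autoCommit) conn.rollback();
            throw e;
        }
    }
}
```

For example, 101 input documents at the default batch size of 50 execute as 3 batches.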

Fetch size


Required. Number of rows to fetch at a time when executing a query.
Large values could cause the server to run out of memory.

Example: 100

Default value: 100

Max pool size


Required. Maximum number of idle connections a pool will maintain at a time.

Example: 10

Default value: 50

Max idle time

Required. Minutes a connection can exist in the pool before it is destroyed.

Example: 30

Default value: 30

Idle Connection Test period

Required. Number of minutes for a connection to remain idle before a test query is run. This helps keep database connections from timing out.
Default value: 5

Checkout timeout

Required. Number of milliseconds to wait for a connection to become available in the pool. A value of zero waits indefinitely. After the set time elapses, an exception is thrown and the pipeline fails.

Example: 10000

Default value: 10000

Url Properties

Properties to use in the JDBC URL. These properties must be configured when setting up an SSL connection. See the Advanced Configurations: Configuring Hive with SSL section below for details.

Example: maxAllowedPacket | 1000

Default value: None.
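As a sketch of how name/value Url Properties end up in the connection string, assuming they are appended as semicolon-separated pairs (the form used by the JDBC URL example elsewhere on this page, e.g. ;sasl.qop=auth-int); the helper is hypothetical:

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class UrlProps {
    // Appends each Url Property to the base JDBC URL as ;name=value.
    static String withProps(String baseUrl, Map<String, String> props) {
        StringBuilder sb = new StringBuilder(baseUrl);
        for (Map.Entry<String, String> e : props.entrySet()) {
            sb.append(';').append(e.getKey()).append('=').append(e.getValue());
        }
        return sb.toString();
    }
}
```

With the example property maxAllowedPacket = 1000, the result would be a URL ending in ;maxAllowedPacket=1000.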

Hadoop properties

Authentication method

Required. Authentication method to use when connecting to the Hadoop service.  

  • None: Allows connection even without the Username and Password
  • Kerberos: Allows connection with Kerberos details such as Client Principal, Keytab file, and Service principal
  • User ID: Allows connection with Username only
  • User ID and Password: Allows connection with Username and Password

Default value: None

Use Zookeeper 


Specifies whether Zookeeper is used to locate the Hadoop service instead of a specific hostname.

If the checkbox is selected, Zookeeper is used to resolve the location of the database instead of the Hostname field in the standard block.

Default value: Not selected


Note

When using Zookeeper in combination with a Hive account, add the Zookeeper JAR package file on the Groundplex associated with that Hive account. The version of Zookeeper on the Groundplex should be the same as the version your Hive account uses.

For HDP users, in addition to the zookeeper.jar package you might also require the curator-client-X.X.X.jar and curator-framework-X.X.X.jar package files on the Groundplex.


Zookeeper URL

Required if you use Zookeeper. The URL of the Zookeeper service. Zookeeper URL formats differ between CDH and HDP.

Default value: None


Note

This is NOT the URL for the Hadoop service being sought.


Hive properties

JDBC Subprotocol

Conditional. This is required when the Authentication method is Kerberos. The JDBC Subprotocol to be used; the options available are Hive and Impala.

Default value: Hive

Kerberos properties

Configuration information required for the Kerberos authentication. These properties must be configured if you select Kerberos in the Authentication method property.

Client Principal

Used to authenticate to the Kerberos KDC (Key Distribution Center), the network service used by clients and servers for authentication.

Default value: None.

Keytab file

The keytab file (a file used to store encryption keys) used to authenticate to the Kerberos KDC.

Default value: None.

Service principal


Principal used by an instance of a service.

Examples: 

  • If you are connecting to a specific server:
    hive/host@REALM or impala/host@REALM
  • If you are connecting (more common for the Snap) to any compliant host (see the Use Zookeeper property's description), the principal is:
    hive/_HOST@REALM or impala/_HOST@REALM

Default value: None.
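The _HOST form of the service principal is resolved to a concrete host at connection time. A minimal sketch of that substitution (the realm and host names are hypothetical):

```java
public class Principal {
    // Replaces the _HOST placeholder in a Kerberos service principal
    // (e.g. hive/_HOST@REALM) with a concrete fully qualified host name.
    static String expand(String principal, String fqdn) {
        return principal.replace("_HOST", fqdn);
    }
}
```

For example, hive/_HOST@EXAMPLE.COM resolved against node1.example.com yields hive/node1.example.com@EXAMPLE.COM.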


Generic Hive Database Account


Account Settings

Label

Required. User provided label for the account instance.

Account properties


Username


Username that is allowed to connect to the database. Username will be used as the default username when retrieving connections. The username must be valid in order to set up the data source.


Example: Snapuser 

Default value: None.

Password

Password used to connect to the data source. Password will be used as the default password when retrieving connections. The password must be valid in order to set up the data source.


Example: Snapuser 

Default value: None.

JDBC URL


The URL of the JDBC database.

Example: jdbc:hive://hostname/dbname;sasl.qop=auth-int

Default value: None.
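The example URL above carries session properties after the database name as ;name=value pairs. A small illustrative parser for such URLs (an assumption for illustration, not a specification of the driver's grammar):

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class JdbcUrlParts {
    // Extracts the ;name=value session properties from a Hive JDBC URL.
    static Map<String, String> sessionProps(String url) {
        Map<String, String> props = new LinkedHashMap<>();
        String[] parts = url.split(";");
        for (int i = 1; i < parts.length; i++) {
            String[] kv = parts[i].split("=", 2);
            props.put(kv[0], kv.length > 1 ? kv[1] : "");
        }
        return props;
    }
}
```

Applied to the example above, sasl.qop resolves to auth-int.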

JDBC jars

List of JDBC JAR files to be loaded. Each driver binary must have a unique file name; the same name cannot be reused for a different driver. If this property is left blank, a default JDBC driver is loaded.

Enter the following JDBC JAR values to configure your Hadoop HA Kerberos account:

For HDP

  • Hive-jdbc-2.0.0.2.3.5.0-81-standalone.jar

  • Zookeeper-3.4.6.jar (Use this for setting up Hive with Zookeeper)

For CDH

  • hive_metastore.jar
  • hive_service.jar
  • hiveJDBC4.jar
  • libfb303-0.9.0.jar
  • libthrift-0.9.0.jar
  • TCLIServiceClient.jar
  • Zookeeper-3.3.6.jar (Use this for setting up Hive with Zookeeper)

Default value: None

Note
  • The JDBC driver can be uploaded through Designer or Manager and it is stored on a per-project basis. That is, only users with access to that project will see JDBC drivers uploaded. To provide access to all users of your org, place the driver in the /shared project.
  • See the Advanced Configurations: Configuring Hive with Kerberos section below for a list of JAR files to be uploaded when configuring Hive with Kerberos.


JDBC Driver Class

Required. The JDBC Driver class name. 

For HDP Clusters

Enter the following value: org.apache.hive.jdbc.HiveDriver

For CDH Clusters

Enter the following value: com.cloudera.hive.jdbc4.HS2Driver

Default value: None.

Advanced Properties


Auto commit

If true (selected), then batches are immediately committed after they execute. Therefore, only the current executing batch will be rolled back if the Snap fails.

If false (not selected), then a transaction is started for the Snap run and committed upon run success. The transaction will be rolled back if the Snap fails.

Default value: Selected


Note

For a DB Execute Snap, assume that a stream of documents enters the input view of the Snap and the SQL statement property has JSON paths in the WHERE clause. If the number of documents is large, the Snap executes in more than one batch rather than once per document. Each batch contains a certain number of WHERE clause values. If Auto commit is turned on, a failure only rolls back the records in the current batch. If Auto commit is turned off, the entire operation is rolled back. For a single execute statement (with no input view), the setting has no practical effect.


Batch size


Required. Number of statements to execute at a time.
Using a large batch size could use up the JDBC placeholder limit of 2100.

Example: 10

Default value: 50

Fetch size


Required. Number of rows to fetch at a time when executing a query.
Large values could cause the server to run out of memory.

Example: 100

Default value: 100

Max pool size


Required. Maximum number of idle connections a pool will maintain at a time.

Example: 10

Default value: 50

Max idle time

Required. Minutes a connection can exist in the pool before it is destroyed.

Example: 30

Default value: 30

Idle connection Test period

Required. Number of minutes for a connection to remain idle before a test query is run. This helps keep database connections from timing out.
Default value: 5

Checkout timeout

Required. Number of milliseconds to wait for a connection to become available in the pool. A value of zero waits indefinitely. After the set time elapses, an exception is thrown and the pipeline fails.

Example: 10000

Default value: 10000

Url Properties

Properties to use in the JDBC URL. These properties must be configured when setting up an SSL connection. See the Advanced Configurations: Configuring Hive with SSL section below for details.

Example: maxAllowedPacket | 1000

Default value: None.

Hadoop Properties

Authentication method

Required. Authentication method to use when connecting to the Hadoop service.  

  • None: Allows connection even without the Username and Password
  • Kerberos: Allows connection with Kerberos details such as Client Principal, Keytab file, and Service principal
  • User ID: Allows connection with Username only
  • User ID and Password: Allows connection with Username and Password

Default value: None

Use Zookeeper 

Specifies whether Zookeeper is used to locate the Hadoop service instead of a specific hostname.

If the checkbox is selected, Zookeeper is used to resolve the location of the database instead of the Hostname field in the standard block.

Default value: Not selected

Zookeeper URL

Required if you use Zookeeper. The URL of the Zookeeper service. Zookeeper URL formats differ between CDH and HDP.

  • For HDP:
    • Format: hostname1:port,hostname2:port/;serviceDiscoveryMode=zooKeeper;zooKeeperNamespace=hiveserver2
  • For CDH:
    • Format: zk=
    • Example: zk = jdbc:hive2://cdhclusterqa-1-1.clouddev.snaplogic.com:2181,cdhclusterqa-1-2.clouddev.snaplogic.com:2181,cdhclusterqa-1-3.clouddev.snaplogic.com:2181/hiveserver2

Default value: None

Note

When using Zookeeper in combination with a Hive account, add the Zookeeper JAR package file on the Groundplex associated with that Hive account. The version of Zookeeper on the Groundplex should be the same as the version your Hive account uses.

For HDP users, in addition to the zookeeper.jar package you might also require the curator-client-X.X.X.jar and curator-framework-X.X.X.jar package files on the Groundplex.

Note

This is NOT the URL for the Hadoop service being sought.
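The HDP-style discovery URL can be assembled from a list of Zookeeper host:port pairs. A minimal sketch following the format shown above (the host names are hypothetical):

```java
import java.util.List;

public class ZkUrl {
    // Joins Zookeeper host:port pairs into the HDP service-discovery form:
    // hostname1:port,hostname2:port/;serviceDiscoveryMode=zooKeeper;zooKeeperNamespace=hiveserver2
    static String hdp(List<String> hostPorts) {
        return String.join(",", hostPorts)
            + "/;serviceDiscoveryMode=zooKeeper;zooKeeperNamespace=hiveserver2";
    }
}
```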

Hive properties

JDBC Subprotocol

Conditional. This is required when the Authentication method is Kerberos. The JDBC Subprotocol to be used; the options available are Hive and Impala.

Default value: Hive

Kerberos properties

Configuration information required for the Kerberos authentication. These properties must be configured if you select Kerberos in the Authentication method property.

Client Principal

Used to authenticate to the Kerberos KDC (Key Distribution Center), the network service used by clients and servers for authentication.

Default value: None.

Keytab file

The keytab file (a file used to store encryption keys) used to authenticate to the Kerberos KDC.

Default value: None.

Service principal

Principal used by an instance of a service.

Examples: 

  • If you are connecting to a specific server:
    hive/host@REALM or impala/host@REALM
  • If you are connecting (more common for the Snap) to any compliant host (see the Use Zookeeper property's description), the principal is:
    hive/_HOST@REALM or impala/_HOST@REALM

Default value: None.

Account Encryption

Standard Encryption

If you are using Standard Encryption, the High sensitivity settings under Enhanced Encryption are followed.

Enhanced Encryption

If you have the Enhanced Account Encryption feature, the following describes which fields are encrypted for each sensitivity level selected for this account.

Account:

  • High: Password, Keytab file
  • Medium + High: Password, Keytab file
  • Low + Medium + High: Password, Keytab file

Additional Configurations

Configuring Hive with Kerberos

Following is the recommended list of JARs to be uploaded for the JDBC4 drivers on Hive with Kerberos:

  • hive_metastore.jar
  • hive_service.jar
  • HiveJDBC4.jar
  • libfb303-0.9.0.jar
  • libthrift-0.9.0.jar
  • TCLIServiceClient.jar



Configuring Hive with SSL

The following URL properties must be configured:

URL Property Name | URL Property Value
ssl | Required. Binary value to denote that SSL is enabled. This value must always be 1.
sslTrustStore | Required. The path of the SSL trust store in the SLDB.
sslTrustStorePassword | Required. The password configured for the SSL trust store.
AllowSelfSignedCerts | Binary value to denote that the driver allows the server to use self-signed SSL certificates.
CAIssuedCertNamesMismatch | Binary value to denote that the driver allows the name of a CA-issued SSL certificate to differ from the host name of the Hive server.


Note

The above list is specific to Hive with or without Kerberos enabled. With Kerberos enabled, the properties such as Client Principal, Keytab file, and Service principal have to be additionally provided.
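The SSL-related URL properties from the table above could be collected programmatically before being appended to the JDBC URL as ;name=value pairs. A sketch (the trust-store path and password are hypothetical placeholders):

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class SslProps {
    // Returns the required SSL Url Properties from the table above as a map,
    // ready to be appended to the JDBC URL.
    static Map<String, String> sslProps(String trustStorePath, String trustStorePassword) {
        Map<String, String> m = new LinkedHashMap<>();
        m.put("ssl", "1");                              // must always be 1
        m.put("sslTrustStore", trustStorePath);         // trust store in the SLDB
        m.put("sslTrustStorePassword", trustStorePassword);
        return m;
    }
}
```

AllowSelfSignedCerts and CAIssuedCertNamesMismatch would be added to the same map only when those relaxations are needed.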

    Limitations or known issues

    • "Method not supported" error while validating Apache Hive JDBC v1.2.1 or earlier
      The Hive Snap Pack does not validate with Apache Hive JDBC v1.2.1 jars or earlier because of a defect in Hive. HDP 2.6.3 and HDP 2.6.1 run on Apache Hive JDBC v1.2.1 jars.
      To validate Snaps that must work with HDP 2.6.3 and HDP 2.6.1, use JDBC v2.0.0 jars.

Testing Environment

  • Hive version: Hive 1.1.0, Hive 1.2.0
  • Hive with Kerberos works only on Hive JDBC4 driver 2.5.12 and above.
  • Hive with Kerberos is tested on Hadooplex only.
  • Hive with SSL works only on Hive JDBC4 driver 2.5.12 and above.
  • Cluster versions tested: CDH 5.78, CDH 5.10, HDP 2.6.1, HDP 2.6.3


Account History

4.14 (snapsmrc490)

  • Added a new account type, Generic Hive Database Account; this enables connecting to different types of clusters using a JDBC URL.

4.8.0 

  • Info tab added to accounts.
  • Database accounts now invalidate connection pools if account properties are modified and login attempts fail.

4.7.0

  • Enabled the account with Kerberos authentication.