Overview
This account is used by the Snaps in the Hive Snap Pack. The account can be configured with or without Kerberos and supports SSL connections.
You can create an account from Designer or Manager. In Designer, when working on pipelines, every Snap that needs an account prompts you to create a new account or use an existing one. Accounts can be created in, or used from:
- Your private project folder: This folder contains the pipelines that will use the account.
- Your Project Space’s shared folder: This folder is accessible to all the users that belong to the Project Space.
- The global shared folder: This folder is accessible to all the users within an organization in the SnapLogic instance.
Account Configuration
In Manager, you can navigate to the required folder and create an account in it (see Accounts). To create an account:
1. If you have not already done so, upload the JDBC driver for this database as a file in the specific project.
2. Click Create, then select Hive > Hive Database Account or Generic Hive Database Account (as required).
3. Supply an account label.
4. Supply the necessary properties for your database.
5. Supply the necessary JDBC driver JARs for your driver.
6. (Optional) Supply additional information about this account in the Notes field of the Info tab.
7. Click Apply.
Avoid changing account credentials while pipelines that use them are in progress. Doing so may lead to unexpected results, including locking the account.
Account Types
Hive Database Account
Account Settings
Field | Description
---|---
Label | Required. User-provided label for the account instance.
Account properties | 
Hostname | Required. The server address to connect to. Default value: [None]
Port number | Required. The database server's port to connect to. Default value: 10000
Database name | Required. The name of the database to connect to. Default value: [None]
Username | Username allowed to connect to the database, used as the default username when retrieving connections. The username must be valid in order to set up the data source. Example: Snapuser. Default value: [None]
Password | Password used to connect to the data source, used as the default password when retrieving connections. The password must be valid in order to set up the data source. Example: Snapuser. Default value: [None]
JDBC jars | List of JDBC JAR files to be loaded. Each driver binary must have a unique name; the same name cannot be reused for a different driver. If this property is left blank, a default JDBC driver is loaded. Example: hiveJDBC4. Default value: [None]
JDBC Driver Class | Required. The JDBC driver class name to use. Example: org.apache.hive.jdbc.HiveDriver. Default value: [None]
Advanced properties | 
Auto commit | If selected, each batch is committed immediately after it executes, so only the currently executing batch is rolled back if the Snap fails. Default value: Selected. For a DB Execute Snap, suppose a stream of documents enters the Snap's input view and the SQL statement property has JSON paths in the WHERE clause. If the number of documents is large, the Snap executes in multiple batches rather than one per document, each batch covering a certain number of WHERE clause values. With Auto commit selected, a failure rolls back only the records in the current batch; with it deselected, the entire operation is rolled back. For a single execute statement (with no input view), the setting has no practical effect.
Batch size | Required. Number of statements to execute at a time. Example: 10. Default value: 50
Fetch size | Required. Number of rows to fetch at a time when executing a query. Example: 100. Default value: 100
Max pool size | Required. Maximum number of connections a pool maintains at a time. Example: 10. Default value: 50
Max idle time | Required. Number of minutes a connection can exist in the pool before it is destroyed. Example: 30. Default value: 30
Idle connection test period | Required. Number of minutes a connection can remain idle before a test query is run, which helps keep database connections from timing out. Default value: 5
Checkout timeout | Required. Number of milliseconds to wait for a connection to become available in the pool; zero waits forever. If the timeout elapses, an exception is thrown and the pipeline fails. Example: 10000. Default value: 10000
URL properties | Properties to use in the JDBC URL. Example: maxAllowedPacket with value 1000. Default value: [None]
Hadoop properties | 
Authentication method | Required. Authentication method to use when connecting to the Hadoop service. Default value: None
Use Zookeeper | Specifies whether ZooKeeper is used to locate the Hadoop service instead of a specific hostname. If selected, ZooKeeper resolves the location of the database instead of the Hostname field. Default value: Not selected
Zoo Keeper URL | URL of the ZooKeeper service. This is NOT the URL of the Hadoop service being sought. Default value: [None]
Hive properties | 
JDBC Subprotocol | Required. JDBC subprotocol to use. The available options are Hive and Impala. Default value: Hive
Kerberos properties | Configuration information for Kerberos authentication. Required when the Authentication method is Kerberos.
Client Principal | Principal used to authenticate to the Kerberos KDC (Key Distribution Center, the network service that clients and servers use for authentication). Default value: [None]
Keytab file | Keytab file (a file used to store encryption keys) used to authenticate to the Kerberos KDC. Default value: [None]
Service principal | Principal used by an instance of a service. Default value: [None]
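To make the mapping concrete, the settings above correspond to an ordinary Hive JDBC connection. The following is a minimal sketch, not SnapLogic code: it assumes a HiveServer2 endpoint, the Apache Hive driver on the classpath, and hypothetical host, database, and credential values.

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;

public class HiveAccountSketch {
    public static void main(String[] args) throws Exception {
        // Hostname, Port number, and Database name from the account settings
        // (hypothetical values; substitute your own).
        String url = "jdbc:hive2://hive-host.example.com:10000/default";

        // Username and Password from the account settings.
        try (Connection conn = DriverManager.getConnection(url, "Snapuser", "secret");
             Statement stmt = conn.createStatement();
             ResultSet rs = stmt.executeQuery("SHOW TABLES")) {
            while (rs.next()) {
                System.out.println(rs.getString(1));
            }
        }
    }
}
```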
Generic Hive Database Account
Account Settings
Field | Description
---|---
Label | Required. User-provided label for the account instance.
Account properties | 
Username | Username allowed to connect to the database, used as the default username when retrieving connections. The username must be valid in order to set up the data source. This field is not mandatory. Example: Snapuser. Default value: [None]
Password | Password used to connect to the data source, used as the default password when retrieving connections. The password must be valid in order to set up the data source. This field is not mandatory. Example: Snapuser. Default value: [None]
JDBC URL | The URL of the JDBC database. Example: jdbc:hive://hostname/dbname;sasl.qop=auth-int. Default value: [None]
JDBC jars | List of JDBC JAR files to be loaded. Each driver binary must have a unique name; the same name cannot be reused for a different driver. If this property is left blank, a default JDBC driver is loaded. Example: hiveJDBC4. Default value: [None]
JDBC Driver Class | Required. The JDBC driver class name to use. Example: org.apache.hive.jdbc.HiveDriver. Default value: [None]
Advanced properties | 
Auto commit | If selected, each batch is committed immediately after it executes, so only the currently executing batch is rolled back if the Snap fails. Default value: Selected. For a DB Execute Snap, suppose a stream of documents enters the Snap's input view and the SQL statement property has JSON paths in the WHERE clause. If the number of documents is large, the Snap executes in multiple batches rather than one per document, each batch covering a certain number of WHERE clause values. With Auto commit selected, a failure rolls back only the records in the current batch; with it deselected, the entire operation is rolled back. For a single execute statement (with no input view), the setting has no practical effect.
Batch size | Required. Number of statements to execute at a time. Example: 10. Default value: 50
Fetch size | Required. Number of rows to fetch at a time when executing a query. Example: 100. Default value: 100
Max pool size | Required. Maximum number of connections a pool maintains at a time. Example: 10. Default value: 50
Max idle time | Required. Number of minutes a connection can exist in the pool before it is destroyed. Example: 30. Default value: 30
Idle connection test period | Required. Number of minutes a connection can remain idle before a test query is run, which helps keep database connections from timing out. Default value: 5
Checkout timeout | Required. Number of milliseconds to wait for a connection to become available in the pool; zero waits forever. If the timeout elapses, an exception is thrown and the pipeline fails. Example: 10000. Default value: 10000
URL properties | Properties to use in the JDBC URL. These properties must be configured when setting up an SSL connection; see Configuring Hive with SSL under Additional Configurations below for details. Example: maxAllowedPacket with value 1000. Default value: [None]
Hadoop properties | 
Authentication method | Required. Authentication method to use when connecting to the Hadoop service. Default value: None
Use Zookeeper | Specifies whether ZooKeeper is used to locate the Hadoop service instead of a specific hostname. If selected, ZooKeeper resolves the location of the database rather than the configured hostname. Default value: Not selected
Zoo Keeper URL | URL of the ZooKeeper service. This is NOT the URL of the Hadoop service being sought. Default value: [None]
Hive properties | 
JDBC Subprotocol | Conditional. Required when the Authentication method is Kerberos. JDBC subprotocol to use; the available options are Hive and Impala. Default value: Hive
Kerberos properties | Configuration information for Kerberos authentication. These properties must be configured if you select Kerberos in the Authentication method property.
Client Principal | Principal used to authenticate to the Kerberos KDC (Key Distribution Center, the network service that clients and servers use for authentication). Default value: [None]
Keytab file | Keytab file (a file used to store encryption keys) used to authenticate to the Kerberos KDC. Default value: [None]
Service principal | Principal used by an instance of a service. Default value: [None]
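Because the Generic account takes a full JDBC URL rather than separate host, port, and database fields, URL properties such as sasl.qop are appended to the URL itself as semicolon-separated key=value pairs. The constants below are illustrative shapes only, assuming a HiveServer2 endpoint (jdbc:hive2) and hypothetical host names and realm; older HiveServer1 endpoints use the jdbc:hive form shown in the JDBC URL example above.

```java
public class GenericHiveUrls {
    // Plain connection with a SASL quality-of-protection property,
    // mirroring the JDBC URL example above (hypothetical host/database):
    static final String PLAIN =
            "jdbc:hive2://hive-host.example.com:10000/default;sasl.qop=auth-int";

    // Kerberos-secured connection: the principal parameter carries the
    // Service principal from the Kerberos properties (hypothetical realm):
    static final String KERBEROS =
            "jdbc:hive2://hive-host.example.com:10000/default;"
            + "principal=hive/hive-host.example.com@EXAMPLE.COM";
}
```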
Account Encryption
Encryption type | Description
---|---
Standard Encryption | If you are using Standard Encryption, the High sensitivity settings under Enhanced Encryption are followed.
Enhanced Encryption | If you have the Enhanced Account Encryption feature, the sensitivity level selected for the account determines which of its fields are encrypted.
Additional Configurations
Configuring Hive with Kerberos
Following is the recommended list of JARs to be uploaded for the JDBC4 drivers on Hive with Kerberos:
- hive_metastore.jar
- hive_service.jar
- HiveJDBC4.jar
- libfb303-0.9.0.jar
- libthrift-0.9.0.jar
- TCLIServiceClient.jar
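Before entering the Kerberos properties into the account, it can be useful to verify that the client principal and keytab actually authenticate against the KDC. The following is a minimal sketch using the Hadoop client library, assuming hadoop-common is on the classpath and using hypothetical principal and keytab values:

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.security.UserGroupInformation;

public class KeytabCheck {
    public static void main(String[] args) throws Exception {
        // Tell the Hadoop client to use Kerberos authentication.
        Configuration conf = new Configuration();
        conf.set("hadoop.security.authentication", "kerberos");
        UserGroupInformation.setConfiguration(conf);

        // Client Principal and Keytab file from the Kerberos properties
        // (hypothetical values; replace with your own).
        UserGroupInformation.loginUserFromKeytab(
                "client@EXAMPLE.COM", "/path/to/client.keytab");
        System.out.println("Authenticated as: "
                + UserGroupInformation.getLoginUser().getUserName());
    }
}
```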
Configuring Hive with SSL
The following URL properties have to be configured:
URL Property | Description
---|---
ssl | Required. Binary value denoting that SSL is enabled. This value must always be 1.
sslTrustStore | Required. The path of the SSL trust store in the SLDB.
sslTrustStorePassword | Required. Password configured for the SSL trust store.
AllowSelfSignedCerts | Binary value denoting that the driver allows the server to use self-signed SSL certificates.
CAIssuedCertNamesMismatch | Binary value denoting that the driver allows the name on a CA-issued SSL certificate to differ from the host name of the Hive server.
The above list applies to Hive with or without Kerberos enabled. With Kerberos enabled, the Kerberos properties (Client Principal, Keytab file, and Service principal) must also be provided.
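In the account, each of these is entered as a separate URL property row; on the wire they become semicolon-separated key=value pairs in the JDBC URL. The following sketch shows how the assembled URL might look, with hypothetical host, trust-store path, and password:

```java
public class HiveSslUrl {
    public static void main(String[] args) {
        // SSL URL properties from the table above, appended as
        // semicolon-separated key=value pairs (hypothetical values).
        String url = "jdbc:hive2://hive-host.example.com:10000/default"
                + ";ssl=1"                            // SSL is enabled
                + ";sslTrustStore=/path/to/truststore.jks"
                + ";sslTrustStorePassword=changeit"
                + ";AllowSelfSignedCerts=1"           // accept self-signed certificates
                + ";CAIssuedCertNamesMismatch=1";     // tolerate host-name mismatch
        System.out.println(url);
    }
}
```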
Testing Environment
Hive Version: Hive 1.1.0
- Hive with Kerberos works only with Hive JDBC4 driver 2.5.12 and above.
- Hive with Kerberos has been tested on Hadooplex only.
- Hive with SSL works only with Hive JDBC4 driver 2.5.12 and above.
Cloudera CDH Version: CDH 5.7.3
Snap Pack History
4.14 (snapsmrc490)
- Added a new account type, Generic Hive Database Account, which enables connecting to different types of clusters using a JDBC URL.
4.8.0
- Added the Info tab to accounts.
- Database accounts now invalidate connection pools if account properties are modified and login attempts fail.
4.7.0
- Enabled the account with Kerberos authentication.