Teradata Export to HDFS


Snap type:

Write


Description:

This Snap exports data from Teradata and directly loads it into Hadoop (HDFS). 

This Snap does not support kerberized Hadoop environments.

Be careful: this Snap executes the SQL statement you supply, so a destructive statement can drop your database.


Valid JSON paths that are defined in the where clause for queries/statements will be substituted with values from an incoming document. Documents will be written to the error view if the document is missing a value to be substituted into the query/statement.

If a select query is executed, the query's results are merged into the incoming document and any existing keys will have their values overwritten. On the other hand, the original document is written if there are no results from the query.
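The merge behavior described above can be sketched in a few lines of Python. This is only an illustration, not the Snap's implementation: the function name `merge_query_result` is hypothetical, and emitting one output document per result row is an assumption.

```python
def merge_query_result(incoming: dict, rows: list) -> list:
    """Return the output documents for one incoming document (illustrative only)."""
    if not rows:
        # No results from the query: the original document is written unchanged.
        return [dict(incoming)]
    # Assumption: one output document per result row. Keys present in both
    # the incoming document and the row take the value from the query result.
    return [{**incoming, **row} for row in rows]


incoming = {"id": 7, "name": "old"}
merged = merge_query_result(incoming, [{"name": "new", "city": "Oslo"}])
unchanged = merge_query_result(incoming, [])
```

Here `merged` holds a document whose `name` has been overwritten by the query result, while `unchanged` holds a copy of the original document.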


  • Expected upstream Snaps: Any Snap that provides a document output view, such as Structure or a JSON Generator Snap.
  • Expected downstream Snaps: A Snap monitoring the exit code to check result status.
  • Expected input: None.
  • Expected output: A single document containing the console output and return status from the external Teradata application for each input document.


Prerequisites:

Teradata Connector for Hadoop (v1.5.1). See Account for information on the necessary jar file.


Support and limitations:

Works in Ultra Pipelines.

Account:

This Snap uses account references created on the Accounts page of SnapLogic Manager to handle access to this endpoint. One additional jar must be added to the JDBC Driver jars on the Teradata account page: the teradata-connector jar from the Teradata Connector for Hadoop (TDCH) package. You can find it by installing the package on one system and looking in the usr/lib/tdch/1.5/lib directory. It is safest to also use the terajdbc4.jar and tdgssconfig.jar files from the same directory unless you have a specific need for a different version of the jars. See Configuring Teradata Database Accounts for information on setting up this type of account.


Views:
Input

This Snap allows zero or one input views. If the input view is defined, the WHERE clause can substitute incoming values for a given expression.

Output

This Snap has exactly one output view and produces documents in the view. The output fields of a single view are:

    • OUT - The console output from the subprocess

    • OUTPUT SUMMARY - The count of the input, output and skipped records

    • err - The console error output from the subprocess

    • TERADATA STATUS - The exit code of the subprocess

    • CLASSPATH - The classpath used by the subprocess. This identifies the location of all jar files.

    • ENVIRONMENT - The full environment variables seen by the subprocess. This identifies the location of the Hadoop configuration files.

Error

This Snap has at most one error view and produces zero or more Document(s) in the view.
Only internal exceptions are written to the error view. Subprocess failures (with a non-zero exit code) are written to the output view so that the user can see the full console output and environment. It is important to check the status field in the output view.
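Because failed exports still arrive on the output view, a downstream step has to inspect the exit code itself. The following Python sketch shows one way such a check might look. It assumes the output document is available as a dictionary keyed by the field names listed above (`TERADATA STATUS`, `err`); the exact key names and casing in a real document may differ, and the function names are hypothetical.

```python
def export_succeeded(document: dict) -> bool:
    """Treat an exit code of 0 as success (assumed key name: 'TERADATA STATUS')."""
    # A missing status field is treated as a failure rather than a success.
    return int(document.get("TERADATA STATUS", -1)) == 0


def failure_message(document: dict) -> str:
    """Return the console error output for a failed run, or an empty string."""
    if export_succeeded(document):
        return ""
    return str(document.get("err", ""))


ok_doc = {"TERADATA STATUS": 0, "OUT": "job finished"}
bad_doc = {"TERADATA STATUS": 1, "err": "directory already exists"}
```

With these sample documents, `export_succeeded(ok_doc)` is true and `failure_message(bad_doc)` returns the error text.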

 

Settings

Label


Required. The name for the Snap. You can modify this to be more specific, especially if you have more than one of the same Snap in your pipeline.


SQL Statement



Required. The SQL statement to execute on the Teradata server. Document value substitution is performed on literals starting with $ (for example, $people.name is substituted with its value in the incoming document).

The Snap does not allow SQL to be injected through substitution, so a statement such as select * from people where $columName = 'abc' is not supported.
Only values can be substituted, because the Snap uses prepared statements for execution, which results in, for example, select * from people where address = ?

Example: select * from people LIMIT 10 or select * from people where name = $people.name

Default value: [None]
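To illustrate why only values can be substituted, here is a minimal Python sketch of turning `$` references into prepared-statement placeholders. It is not the Snap's code: the regular expression, `lookup`, and `to_prepared` are hypothetical names, and raising `KeyError` stands in for routing the document to the error view when a value is missing.

```python
import re

# Matches JSON-path style references such as $people.name (assumed syntax).
PLACEHOLDER = re.compile(r"\$([A-Za-z_][A-Za-z0-9_]*(?:\.[A-Za-z_][A-Za-z0-9_]*)*)")


def lookup(document: dict, path: str):
    """Walk a dotted path through nested dictionaries."""
    value = document
    for key in path.split("."):
        if not isinstance(value, dict) or key not in value:
            # Stand-in for writing the document to the error view.
            raise KeyError(f"missing value for ${path}")
        value = value[key]
    return value


def to_prepared(sql: str, document: dict):
    """Replace each $path with '?' and collect the bound values in order."""
    params = []

    def bind(match):
        params.append(lookup(document, match.group(1)))
        return "?"

    return PLACEHOLDER.sub(bind, sql), params


statement, values = to_prepared(
    "select * from people where name = $people.name",
    {"people": {"name": "Ann"}},
)
```

The value is bound as a parameter and never spliced into the SQL text, which is why a reference in a column-name position cannot work with this approach.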

Number of retries

Specify the maximum number of reconnection attempts that the Snap must perform, in case of connection failure or timeout.

Default Value: 0

Retry interval (seconds)


Enter the duration, in seconds, that the Snap must wait between two reconnection attempts, until the number of retries is reached.

Default Value: 1
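The interaction between Number of retries and Retry interval can be pictured with the following Python sketch. It is an assumption about the general pattern, not the Snap's actual logic: `run_with_retries` and its parameters are hypothetical, and `ConnectionError` stands in for a connection failure or timeout.

```python
import time


def run_with_retries(connect, retries=0, interval_seconds=1, sleep=time.sleep):
    """Call connect(); on failure, wait and try again up to `retries` more times."""
    attempt = 0
    while True:
        try:
            return connect()
        except ConnectionError:
            if attempt >= retries:
                # Retries exhausted: surface the last failure to the caller.
                raise
            attempt += 1
            sleep(interval_seconds)
```

With the defaults shown in the settings (0 retries, 1 second interval), this sketch makes a single attempt and fails immediately if the connection cannot be established.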

HDFS destination  

 

Directory

Required. The HDFS directory where the output files will be written. This directory must not already exist. 

Example: people

Default value: hdfs://<hostname>:<port>/   

TDCH conversion properties


Separator


Required. Field separator in text file output. The available options are Comma, Tab, Pipe (|).

Default value: Comma 

Number of mappers


The number of mappers to use to export table data from Teradata. The degree of parallelism for these TDCH jobs is defined by the number of mappers (a Snap configuration) used by the MapReduce job. The number of mappers also defines the number of files created in the HDFS location.

More mappers lead to faster execution; however, the number of mappers is limited by the number of nodes in the cluster and the available bandwidth.

Default value: 2 

Snap execution

Select one of the three modes in which the Snap executes. Available options are:

  • Validate & Execute: Performs limited execution of the Snap, and generates a data preview during Pipeline validation. Subsequently, performs full execution of the Snap (unlimited records) during Pipeline runtime.
  • Execute only: Performs full execution of the Snap during Pipeline execution without generating preview data.
  • Disabled: Disables the Snap and all Snaps that are downstream from it.

Example


In this example pipeline, the Teradata Export to HDFS Snap executes a SQL query and publishes the results to an HDFS directory.

   

The Teradata Export to HDFS Snap selects the documents from the table "ADW_SNAPL"."Channel" and publishes them to the HDFS destination directory path.



Successful execution of the pipeline displays the below output:  

 

Snap Pack History

Release | Snap Pack Version | Date | Type | Updates

August 2024 | main27765 |  | Stable | Upgraded the org.json.json library from v20090211 to v20240303, which is fully backward compatible.

May 2024 | 437patches26471 |  | Latest | The jOOQ library for Teradata Snap Pack is upgraded from v3.9.1 to v3.17.3. This upgrade will be part of the GA release on August 14, 2024 (Stable release). Pipelines using the Teradata Snaps are not impacted after the jOOQ upgrade.

May 2024 | main26341 |  | Stable | Updated and certified against the current SnapLogic Platform release.

February 2024 | main25112 |  | Stable | Updated and certified against the current SnapLogic Platform release.

November 2023 | main23721 |  | Stable | Updated and certified against the current SnapLogic Platform release.

August 2023 | main22460 |  | Stable | The Teradata Execute Snap now includes a new Query type field. When Auto is selected, the Snap tries to determine the query type automatically.

May 2023 | main21015 |  | Stable | Upgraded with the latest SnapLogic Platform release.

February 2023 | main19844 |  | Stable | Upgraded with the latest SnapLogic Platform release.

November 2022 | main18944 |  | Stable | The TPT Insert Snap now creates the target table only from the table metadata of the second input view when the following conditions are met:
  • The Create table if not present checkbox is selected.
  • The target table does not exist.
  • The table metadata is provided in the second input view.

August 2022 | main17386 |  | Stable | Upgraded with the latest SnapLogic Platform release.

4.29 Patch | 429patches16235 |  | Latest | Fixed an issue with Teradata Execute Snap where the Snap did not display valid error message when the delete condition is invalid.

4.29 | main15993 |  | Stable | Enhanced the Teradata FastExport Snap with Character Set dropdown list to support encoding of data when you export data from the Teradata Database.

4.28 | main14627 |  | Stable | Upgraded with the latest SnapLogic Platform release.

4.27 | main12833 |  | Stable | Enhanced the Teradata Execute Snap to invoke stored procedures.

4.26 | main11181 |  | Stable | Upgraded with the latest SnapLogic Platform release.

4.25 Patch | 425patches11008 |  | Latest | Improved the error messages for all the Snaps in the Teradata Snap Pack where the Snaps fail with a null pointer exception error when the account reference provided is invalid.

4.25 | main9554 |  | Stable | Upgraded with the latest SnapLogic Platform release.

4.24 Patch | 424patches8799 |  | Latest |
  • Enhanced the Teradata Execute Snap by adding a new field, Advanced options, which extends support with microsecond precision for TIMESTAMP data type.
  • Fixed an issue in the TPT Load Snap where the Snap failed to load data into the table while creating an output.
  • Fixed an issue with the TPT Load Snap wherein now a null value is loaded as null and empty string is loaded as empty string.
    • This fix may cause existing pipelines to fail if empty string values are expected to be loaded as null.
    • Following are the new configurations:
      • VARCHAR QuotedData= Optional
      • VARCHAR OpenQuoteMark= \"
      • VARCHAR NullColumns= Yes

4.24 | main8556 |  | Stable | Enhanced the Teradata Select Snap to return only the selected output fields or columns in the output schema (second output view) using the Fetch Output Fields In Schema check box. If the Output Fields field is empty all the columns are visible.

4.23 | main7430 |  | Stable | Fixes the multi-line value issue and issue in the TPT Load Snap where the Snap writes null for both an empty string and null values in the input data. The fix for this issue was to add the following three lines in the script and wrap all the values in double quotes unless they are null while writing the input data into a temp CSV file.
VARCHAR QuotedData = 'Optional',
VARCHAR OpenQuoteMark = '\"'
VARCHAR NullColumns = 'Yes'
This fix may cause existing pipelines to fail if empty string values are expected to be loaded as null.

4.22 | main6403 |  | Stable | Upgraded with the latest SnapLogic Platform release.

4.21 Patch | 421patches6272 |  | Latest | Fixed the issue where Snowflake SCD2 Snap generates two output documents despite no changes to Cause-historization fields with DATE, TIME and TIMESTAMP Snowflake data types, and with Ignore unchanged rows field selected.

4.21 Patch | 421patches6144 |  | Latest | Fixed the following issues with DB Snaps:
  • The connection thread waits indefinitely causing the subsequent connection requests to become unresponsive.
  • Connection leaks occur during Pipeline execution.

4.21 Patch | MULTIPLE8841 |  | Latest | Fixed the connection issue in Database Snaps by detecting and closing open connections after the Snap execution ends.

4.21 | snapsmrc542 |  | Stable | Upgraded with the latest SnapLogic Platform release.

4.20 | snapsmrc535 |  | Stable | Upgraded with the latest SnapLogic Platform release.

4.19 Patch | db/teradata8414 |  | Latest | Fixed an issue with the TPT Update Snap wherein the Snap is unable to perform operations when:
  • An expression is used in the Update condition property.
  • Input data contain the character '?'.

4.19 | snaprsmrc528 |  | Stable | Upgraded with the latest SnapLogic Platform release.

4.18 | snapsmrc523 |  | Stable | Upgraded with the latest SnapLogic Platform release.

4.17 | ALL7402 |  | Latest | Pushed automatic rebuild of the latest version of each Snap Pack to SnapLogic UAT and Elastic servers.

4.17 | snapsmrc515 |  | Latest |
  • Fixed an issue with the Teradata Execute Snap wherein the Snap would send the input document to the output view even if the Pass through field is not selected in the Snap configuration. With this fix, the Snap sends the input document to the output view, under the key original, only if you select the Pass through field.
  • Added the Snap Execution field to all Standard-mode Snaps. In some Snaps, this field replaces the existing Execute during preview check box.

4.16 | snapsmrc508 |  | Stable | Upgraded with the latest SnapLogic Platform release.

4.15 Patch | db/teradata6338 |  | Latest | Replaced Max idle time and Idle connection test period properties with Max life time and Idle Timeout properties respectively, in the Account configuration. The new properties fix the connection release issues that were occurring due to default/restricted DB Account settings.

4.15 | snapsmrc500 |  | Stable | Upgraded with the latest SnapLogic Platform release.

4.14 | snapsmrc490 |  | Stable | Upgraded with the latest SnapLogic Platform release.

4.13 | snapsmrc486 |  | Stable | Upgraded with the latest SnapLogic Platform release.

4.12 | snapsmrc480 |  | Stable | Upgraded with the latest SnapLogic Platform release.

4.11 | snapsmrc465 |  | Stable | Upgraded with the latest SnapLogic Platform release.

4.10 | snapsmrc414 |  | Stable | Added Auto commit property to the Select and Execute Snaps at the Snap level to support overriding of the Auto commit property at the Account level.

4.9.0 Patch | teradata3077 |  | Latest | Fixed an issue regarding connection not closed after login failure; Expose autocommit for "Select into" statement in PostgreSQL Execute Snap and Redshift Execute Snap

4.9 | snapsmrc405 |  | Stable |
  • Enhanced the Output view of the Snap with Order Summary field that displays the output values (added to the out, err, status, classpath and env fields of the single output view that display the subprocess only).
  • Teradata Export to HDFS Snap supported with Dynamic account.

4.8 | snapsmrc398 |  | Stable |
  • Introduced the TPT Delete, Insert, Load, Upsert and Update Snaps in this release.
  • Introduced Teradata Export to HDFS Snap in this release.
  • Info tab added to accounts.
  • Database accounts now invalidate connection pools if account properties are modified and login attempts fail.

4.7 | snapsmrc382 |  | Stable |
  • Introduced the Teradata FastLoad and Execute Snaps in this release.
  • Migration impact: In Teradata FastExport, the values of the Data Format field have been standardized to be in all caps. Existing pipelines that use the values of Binary, Text, or Unformatted will fail unless the new value of BINARY, TEXT, or UNFORMAT are used.

4.6 | snapsmrc362 |  | Stable | Snap Pack introduced in 4.6.0. This includes only Teradata extract functionality to move data out of Teradata database using the FastExport Utility. It does not include Snaps for load, select, insert, delete, execute or others at this time. It also does not utilize the Teradata Parallel Transporter to extract data.