When executed in SnapReduce mode, the value of the File setting specifies the output directory of the MapReduce job.

Example:

sample.orc
tmp/another.orc
_filename

Default value: [None]

Snap type:

Write

Description:

This Snap converts documents into the ORC format and writes the data to HDFS, S3, or the local file system.

Expected upstream Snaps: Any Snap with a document output view.
Expected downstream Snaps: [None]
Expected input: A document.
Expected output: [None]

Note
This Snap supports the HDFS (non-Kerberos), ABFS (Azure Data Lake Storage Gen 2), WASB (Azure storage), and S3 protocols.

Prerequisites:

[None]

Support and limitations:

Ultra Pipelines: Works in Ultra Pipelines.
Spark: Not supported in Spark pipelines.

Note

All expression Snap properties (when '=' button is pressed) can be evaluated from pipeline parameters only, not from input documents. Input documents are data to be formatted and written to the target files.

Plex process creation must be enabled if the ORC Writer Snap fails with the following error:
Cannot execute command (….) as process creation is blocked by the SnapLogic security policy.

To enable process creation on the Plex:

1. Get the snode_id of the Plex, which should be under /api/1/rest/asset/snaplogic.
   Example: https://elastic.snaplogic.com//api/1/rest/asset/snaplogic
   Use the following command:
   curl -u admin@snaplogic.com https://elastic.snaplogic.com//api/1/rest/asset/snaplogic
   Replace the string ‘snaplogic’ at the end of the command with the name of your Org.

2. Run the following curl command:
   curl -u admin@snaplogic.com -H "Content-Type: application/json" --data-binary '{"flag_overrides": {"com.snaplogic.common.SnapLogicSecurityManager.ALLOW_CLOUDPLEX_PROCESS_CREATION" : "true" }}' https://elastic.snaplogic.com/api/1/rest/admin/snappack/org-dist/SNODE_ID
   Replace SNODE_ID with the value you found in Step 1.

If needed, replace admin@snaplogic.com with the required username.
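
The request made by the second curl command can also be assembled in a script, which helps avoid quoting mistakes in the JSON body. The following is a minimal sketch only: build_override_request is a hypothetical helper, not a SnapLogic API, and it only builds the URL and body shown above without sending anything.

```python
import json
from typing import Tuple

# Flag name copied from the curl command above.
PROCESS_CREATION_FLAG = (
    "com.snaplogic.common.SnapLogicSecurityManager."
    "ALLOW_CLOUDPLEX_PROCESS_CREATION"
)


def build_override_request(base_url: str, snode_id: str) -> Tuple[str, str]:
    """Return the (url, json_body) pair for the flag-override call.

    Hypothetical helper for illustration; it does not contact the server.
    """
    url = f"{base_url.rstrip('/')}/api/1/rest/admin/snappack/org-dist/{snode_id}"
    body = json.dumps({"flag_overrides": {PROCESS_CREATION_FLAG: "true"}})
    return url, body


if __name__ == "__main__":
    # SNODE_ID is a placeholder; substitute the value found in Step 1.
    url, body = build_override_request("https://elastic.snaplogic.com", "SNODE_ID")
    print(url)
    print(body)
```

The printed URL and body can then be passed to curl or to any HTTP client, together with the Content-Type: application/json header and your credentials.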

Account:

The ORC Writer works with the following accounts:

Views:

Input	This Snap has exactly one document input view.
Output	This Snap has at most one document output view.
Error	This Snap has at most one document error view and produces zero or more documents in the view.

Settings

Label

Required. The name for the Snap. You can modify this to be more specific, especially if you have more than one of the same Snap in your pipeline.

Default Value: ORC Writer

Directory

Required. The path to the directory to which you want the ORC Writer Snap to write data. All files within the directory must be ORC formatted.

Basic directory URI structure

HDFS: hdfs://<hostname>:<port>/
S3: s3:///<S3 bucket name>/<file-path>
ABFS(S): abfs(s):///filesystem/<path>/
ABFS(S): abfs(s)://filesystem@accountname.endpoint/<path>

The Directory property is not used in pipeline execution or preview; it is used only in the Suggest operation. When you press the Suggest icon, the Snap displays a list of subdirectories under the given directory. It generates the list by applying the value of the Filter property.

Example:

hdfs://ec2-54-198-212-134.compute-1.amazonaws.com:8020/user/john/input/
webhdfs://cdh-qa-2.fullsail.Snaplogic.com:50070/user/ec2-user/csv/
s3://test-s3-drea/8867_output.json
_dirname
file:///home/snaplogic/file.orc
abfs:///filesystem2/dir1
abfs://filesystem2@snaplogicaccount.dfs.core.windows.net/dir1

Default value: hdfs://<hostname>:<port>/
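
To make the URI structures above concrete, here is a small sketch that splits a Directory value into its scheme, authority, and path using Python's standard urllib.parse. The describe_directory helper and the KNOWN_SCHEMES set are hypothetical and cover only the schemes that appear in this section. Treating a value that starts with "_" (such as _dirname) as a pipeline parameter name is an assumption.

```python
from urllib.parse import urlparse

# Schemes that appear in this section's structure list and examples.
KNOWN_SCHEMES = {"hdfs", "webhdfs", "s3", "abfs", "abfss", "file"}


def describe_directory(value: str) -> dict:
    """Split a Directory value into scheme, authority, and path.

    Assumption: a value starting with "_" names a pipeline parameter,
    so it is returned unparsed.
    """
    if value.startswith("_"):
        return {"kind": "parameter", "name": value}
    parts = urlparse(value)
    if parts.scheme not in KNOWN_SCHEMES:
        raise ValueError(f"unrecognized scheme in {value!r}")
    return {
        "kind": "uri",
        "scheme": parts.scheme,
        "authority": parts.netloc,  # host:port, or filesystem@account for ABFS
        "path": parts.path,
    }


if __name__ == "__main__":
    for example in (
        "hdfs://ec2-54-198-212-134.compute-1.amazonaws.com:8020/user/john/input/",
        "abfs://filesystem2@snaplogicaccount.dfs.core.windows.net/dir1",
        "abfs:///filesystem2/dir1",
        "_dirname",
    ):
        print(describe_directory(example))
```

For the abfs:///filesystem2/dir1 form the authority is empty and the file system name is the first path segment, whereas the filesystem@accountname.endpoint form carries it in the authority.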

Filter

See the description of the Filter setting in the HDFS Writer documentation.

File

Required for standard mode. The file name or a relative path to a file under the directory given in the Directory property. It should not start with the URL separator "/". The File property can be a JavaScript expression, which is evaluated with values from the input view document. When you press the Suggest icon, the Snap displays a list of regular files under the directory in the Directory property. It generates the list by applying the value of the Filter property.
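
The rule that File is relative to Directory and should not begin with "/" can be illustrated with a short sketch. The join_directory_and_file helper is hypothetical, and how the Snap combines the two values internally is an assumption; the sketch only shows the intended shape of the resulting path.

```python
def join_directory_and_file(directory: str, file_name: str) -> str:
    """Combine a Directory value with a relative File value.

    Hypothetical helper: rejects a File value that starts with "/",
    as the File setting description advises against it.
    """
    if file_name.startswith("/"):
        raise ValueError("File should be relative and not start with '/'")
    # Add exactly one separator between the two parts.
    prefix = directory if directory.endswith("/") else directory + "/"
    return prefix + file_name


if __name__ == "__main__":
    print(join_directory_and_file("abfs:///filesystem2/dir1", "sample.orc"))
    print(join_directory_and_file("abfs:///filesystem2/dir1/", "tmp/another.orc"))
```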

Note

Use Hive tables if your input documents contain complex data types, such as maps and arrays.

File action

Required. Select the action to take when the specified file already exists in the directory. Note that the Append file action is supported only for the SFTP, FTP, and FTPS protocols.

Default value: [None]

File permissions for various users

Set the user and desired permissions.

Default value: [None]

Hive Metastore URL

This setting, together with the Database and Table settings, helps define the schema. If the data being written has a Hive schema, you can configure the Snap to read that schema instead of entering it manually. Set the value to the URL of the Hive Metastore where the schema is defined.

Default value: [None]

Database

The Hive Metastore database where the schema is defined. See the Hive Metastore URL setting for more information.

Table

The table from which the schema in the Hive Metastore's database must be read. See the Hive Metastore URL setting for more information.

Compression

Required. The compression type to be used when writing the file.

Column paths

Paths where the column values appear in the document. This property is required if the Hive Metastore URL property is empty.

Examples:

Column Name: Fun
Column Path: $column_from_input_data
Column Type: string

Default value: [None]
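
As a rough illustration of what a column path does, the sketch below looks up a path such as $column_from_input_data in an input document represented as a Python dict. The resolve_column_path helper is hypothetical and handles only simple dotted paths; the Snap's actual path syntax and evaluation are not reproduced here.

```python
from typing import Any


def resolve_column_path(document: dict, column_path: str) -> Any:
    """Return the value that a simple dotted path points to.

    Hypothetical helper: "$" refers to the whole document and
    "$a.b" to document["a"]["b"]. Array indexes are not handled.
    """
    if not column_path.startswith("$"):
        raise ValueError("column path must start with '$'")
    value: Any = document
    for key in column_path[1:].split("."):
        if key:  # skip the empty segment produced by a bare "$"
            value = value[key]
    return value


if __name__ == "__main__":
    doc = {"column_from_input_data": "some text"}
    # Column Name: Fun, Column Path: $column_from_input_data, Column Type: string
    print(resolve_column_path(doc, "$column_from_input_data"))
```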

Examples

ORC Writer Writing to an HDFS Instance

Here is an example of an ORC Writer configured to write to a local instance of HDFS. The output is written to /tmp/orc-output. The Snap uses the Hive Metastore to read the schema from the employee_orc table in the masterdb database. No column paths or compression are used. For an example of the Schema, see the documentation on the Schema setting.

ORC Writer Writing to an S3 Instance

Here is an example of an ORC Writer configured to write to a local instance of S3. The output is written to /tmp/orc-output. The Snap uses the Hive Metastore to read the schema from the employee_orc table in the masterdb database. No column paths or compression are used. For an example of the Schema, see the documentation on the Schema setting.
