
Snap type:

Write

 

Description:

This Snap converts documents into the ORC format and writes the data to HDFS, S3, or the local file system.

  • Expected upstream Snaps: Any Snap with a document output view.
  • Expected downstream Snaps: [None]
  • Expected input: A document.
  • Expected output: [None]
 
Note

This Snap supports the HDFS (non-Kerberos), ABFS (Azure Data Lake Storage Gen 2), WASB (Azure Blob Storage), and S3 protocols.
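For readers unfamiliar with ORC, the conversion this Snap performs corresponds to the kind of write shown below with the Apache ORC core Java library. This is a minimal standalone sketch, not the Snap's internal implementation; the schema, output path, and values are illustrative:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.hive.ql.exec.vector.BytesColumnVector;
import org.apache.hadoop.hive.ql.exec.vector.LongColumnVector;
import org.apache.hadoop.hive.ql.exec.vector.VectorizedRowBatch;
import org.apache.orc.CompressionKind;
import org.apache.orc.OrcFile;
import org.apache.orc.TypeDescription;
import org.apache.orc.Writer;

public class OrcWriteSketch {
    public static void main(String[] args) throws Exception {
        // Illustrative schema: one string column and one long column.
        TypeDescription schema =
                TypeDescription.fromString("struct<name:string,age:bigint>");

        // The path could equally be an hdfs:// or s3a:// URI, given the right Hadoop config.
        Writer writer = OrcFile.createWriter(
                new Path("/tmp/orc-output/sample.orc"),
                OrcFile.writerOptions(new Configuration())
                        .setSchema(schema)
                        .compress(CompressionKind.SNAPPY));

        // ORC is written in columnar batches rather than row by row.
        VectorizedRowBatch batch = schema.createRowBatch();
        BytesColumnVector name = (BytesColumnVector) batch.cols[0];
        LongColumnVector age = (LongColumnVector) batch.cols[1];

        int row = batch.size++;
        name.setVal(row, "john".getBytes());
        age.vector[row] = 42L;

        writer.addRowBatch(batch);
        writer.close();
    }
}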


Prerequisites:

[None]

 

Support and limitations:
  • Ultra pipelines: Supported for use in Ultra pipelines.
  • Spark: Not supported in Spark pipelines.
    Note

    All expression Snap properties (when the '=' button is pressed) can be evaluated from pipeline parameters only, not from input documents. Input documents are the data to be formatted and written to the target files.


    Account:

    The ORC Writer works with S3 and Azure storage accounts when writing to those endpoints; no account is required when writing to HDFS (non-Kerberos) or the local file system.

    Views:

    • Input: This Snap has exactly one document input view.
    • Output: This Snap has at most one document output view.
    • Error: This Snap has at most one document error view and produces zero or more documents in the view.

     


    Settings

    Label

     

    Required. The name for the Snap. You can modify this to be more specific, especially if you have more than one of the same Snap in your pipeline.

    Default Value: ORC Writer

    Directory

    Required. The URL for the directory to which the files are written. It should start with a supported file protocol, in one of the following forms:

    Basic directory URI structure

    • HDFS: hdfs://<hostname>:<port>/<path to directory>/
    • WebHDFS: webhdfs://<hostname>:<port>/<path to directory>/
    • S3: s3:///<S3 bucket name>/<file-path>
    • ABFS(S): abfs(s):///filesystem/<path>/
    • ABFS(S): abfs(s)://filesystem@accountname.endpoint/<path>

    The Directory property is not used in the pipeline execution or preview; it is used only in the Suggest operation. When you press the Suggest icon, the Snap displays a list of subdirectories under the given directory. It generates the list by applying the value of the Filter property.

    Example:

    • hdfs://ec2-54-198-212-134.compute-1.amazonaws.com:8020/user/john/input/
    • webhdfs://cdh-qa-2.fullsail.Snaplogic.com:50070/user/ec2-user/csv/
    • s3:///test-s3-drea/8867_output.json
    • file:///home/snaplogic/file.orc
    • abfs:///filesystem2/dir1
    • abfs://filesystem2@snaplogicaccount.dfs.core.windows.net/dir1
    • _dirname

    Default value: hdfs://<hostname>:<port>/

    Filter

    The glob pattern is used to display a list of directories or files when the Suggest icon is pressed in the Directory or File property. The complete glob pattern is formed by combining the value of the Directory property and the Filter property. If the value of the Directory property does not end with "/", the Snap appends one so that the value of the Filter property is applied to the directory specified by the Directory property.

    The following rules are used to interpret glob patterns:

    • The * character matches zero or more characters of a name component without crossing directory boundaries. For example, *.csv matches a path that represents a filename ending in .csv and *.* matches file names containing a dot.

    • The ** characters match zero or more characters, crossing directory boundaries; they therefore match all files or directories in the current directory as well as in all subdirectories. For example, /home/** matches all files and directories in the /home/ directory.

    • The ? character matches exactly one character of a name component. For example, foo.? matches file names starting with foo. and a single character extension.

    • The backslash character (\) is used to escape characters that would otherwise be interpreted as special characters. The expression \\ matches a single backslash and "\{" matches a left brace for example.

    • The [ ] characters are a bracket expression that match a single character of a name component out of a set of characters. For example, [abc] matches "a", "b", or "c". The hyphen (-) may be used to specify a range so [a-z] specifies a range that matches from "a" to "z" (inclusive). These forms can be mixed so [abce-g] matches "a", "b", "c", "e", "f" or "g". If the character after the [ is a ! then it is used for negation so [!a-c] matches any character except "a", "b", or "c".

      Within a bracket expression the *, ? and \ characters match themselves. The (-) character matches itself if it is the first character within the brackets, or the first character after the ! if negating.

    • The { } characters are a group of subpatterns, where the group matches if any subpattern in the group matches. The "," character is used to separate the subpatterns. Groups cannot be nested. For example, *.{csv,json} matches file names ending with .csv or .json.

    • Leading dot characters in file name are treated as regular characters in match operations. For example, the "*" glob pattern matches file name ".login". 

    • All other characters match themselves.
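    These rules closely match the glob syntax of Java's built-in java.nio.file PathMatcher, so a pattern can be tried out locally before it is used in the Filter property. A minimal sketch; the pattern and file names are illustrative:

    import java.nio.file.FileSystems;
    import java.nio.file.Path;
    import java.nio.file.PathMatcher;

    public class GlobFilterDemo {
        public static void main(String[] args) {
            // "glob:" selects glob syntax; the pattern follows the rules listed above.
            PathMatcher matcher =
                    FileSystems.getDefault().getPathMatcher("glob:*.{csv,json}");

            System.out.println(matcher.matches(Path.of("data.csv")));    // true
            System.out.println(matcher.matches(Path.of("data.json")));   // true
            System.out.println(matcher.matches(Path.of("data.orc")));    // false
            System.out.println(matcher.matches(Path.of("dir/a.csv")));   // false: * does not cross "/"
        }
    }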

     




    File




    Required for standard mode. Filename or a relative path to a file under the directory given in the Directory property. It should not start with a URL separator "/". The File property can be a JavaScript expression which will be evaluated with values from the input view document. When you press the Suggest icon, it will display a list of regular files under the directory in the Directory property. It generates the list by applying the value of the Filter property.

    Use Hive tables if your input documents contain complex data types, such as maps and arrays.

    Example:

    • sample.orc
    • tmp/another.orc
    • _filename

    Default value: [None]

     

    File action


    Required. Select an action to take when the specified file already exists in the directory. Note that the Append file action is supported for the SFTP, FTP, and FTPS protocols only.

    Default value: [None]

    File permissions for various users

    Set the user and desired permissions.

    Default value: [None]

     

    Hive Metastore URL

     


    This setting is used to assist in setting the schema, along with the Database and Table settings. If the data being written has a Hive schema, the Snap can be configured to read the schema instead of manually entering it. Set the value to a Hive Metastore URL where the schema is defined.

     

    Default value: [None]

     

    Database

    The Hive Metastore database where the schema is defined. See the Hive Metastore URL setting for more information.

     

    Table

    The table from which the schema must be read in the Hive Metastore's database. See the Hive Metastore URL setting for more information.
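    As a rough illustration of what these three settings (Hive Metastore URL, Database, and Table) describe, the equivalent schema lookup with Hive's Java metastore client looks like the sketch below. This is an assumption shown for illustration only, not the Snap's internals; the thrift host is illustrative, and the database and table names come from the example later on this page:

    import org.apache.hadoop.hive.conf.HiveConf;
    import org.apache.hadoop.hive.metastore.HiveMetaStoreClient;
    import org.apache.hadoop.hive.metastore.api.FieldSchema;
    import org.apache.hadoop.hive.metastore.api.Table;

    public class MetastoreSchemaLookup {
        public static void main(String[] args) throws Exception {
            HiveConf conf = new HiveConf();
            // Corresponds to the Hive Metastore URL setting.
            conf.set("hive.metastore.uris", "thrift://metastore-host:9083");

            HiveMetaStoreClient client = new HiveMetaStoreClient(conf);
            try {
                // Corresponds to the Database and Table settings.
                Table table = client.getTable("masterdb", "employee_orc");
                for (FieldSchema column : table.getSd().getCols()) {
                    System.out.println(column.getName() + ": " + column.getType());
                }
            } finally {
                client.close();
            }
        }
    }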

    Compression

    Required. The compression type to be used when writing the file.
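    In terms of the Apache ORC library sketch shown earlier, this setting corresponds to the codec passed to the writer options. A minimal sketch; the codec names below are the ORC library's and may differ from the labels in the Snap's dropdown:

    import org.apache.hadoop.conf.Configuration;
    import org.apache.orc.CompressionKind;
    import org.apache.orc.OrcFile;
    import org.apache.orc.TypeDescription;

    public class CompressionChoice {
        public static void main(String[] args) {
            TypeDescription schema = TypeDescription.fromString("struct<name:string>");

            // Codecs the ORC library defines include NONE, ZLIB, SNAPPY, LZO, LZ4,
            // and ZSTD (availability depends on the library version).
            OrcFile.WriterOptions options = OrcFile.writerOptions(new Configuration())
                    .setSchema(schema)
                    .compress(CompressionKind.ZLIB);

            System.out.println(options.getCompress());  // ZLIB
        }
    }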

    Column paths


    Paths where the column values appear in the document. This property is required if the Hive Metastore URL property is empty.

    Example:

    • Column Name: Fun
    • Column Path: $column_from_input_data
    • Column Type: string

    Default value: [None]
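    Conceptually, each column path plucks a value out of the incoming document and routes it to the named column. A toy illustration using plain Java maps in place of SnapLogic documents; this is not the Snap's actual mechanism:

    import java.util.LinkedHashMap;
    import java.util.Map;

    public class ColumnPathSketch {
        public static void main(String[] args) {
            // An incoming document, matching the example above.
            Map<String, Object> document = Map.of("column_from_input_data", "hello");

            // Column Name -> Column Path (the "$" prefix dropped for the plain-map lookup).
            Map<String, String> columnPaths = Map.of("Fun", "column_from_input_data");

            // The row that would be handed to the ORC writer, keyed by column name.
            Map<String, Object> row = new LinkedHashMap<>();
            columnPaths.forEach((name, path) -> row.put(name, document.get(path)));

            System.out.println(row);  // {Fun=hello}
        }
    }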

     

    Execute during preview

    Enables you to execute the Snap during the Save operation so that the output view can produce the preview data.

    Default value:  Not selected

     

     

    Troubleshooting

    • Use Hive tables if your input documents contain complex data types, such as maps and arrays.
    • The Snap writes data only to the supported file systems listed above (HDFS, S3, ABFS, WASB, and the local file system).
    • When executed in SnapReduce mode, the value of the File setting specifies the output directory of the MapReduce job.


    Examples


    ORC Writer Writing to an HDFS Instance

    Here is an example of an ORC Writer configured to write to a local instance of HDFS. The output is written to /tmp/orc-output. The Hive Metastore used reads the schema from the employee_orc table in the masterdb database. No column paths or compression are used. For an example of the schema, see the documentation on the Schema setting.




    ORC Writer Writing to an S3 Instance

    Here is an example of an ORC Writer configured to write to an S3 instance. The output is written to /tmp/orc-output. The Hive Metastore used reads the schema from the employee_orc table in the masterdb database. No column paths or compression are used. For an example of the schema, see the documentation on the Schema setting.


    See Also

    Related Information

    Read more about ORC at the Apache project's website, https://orc.apache.org/

