
Snap type:

Parse


Description:

This Snap reads binary data from its input view, extracts field values based on the field configuration, and writes document data to its output view.


Prerequisites:

[None]


Support and limitations:

Ultra pipelines: Works in Ultra Pipelines.
Spark mode: Not supported in Spark mode.
Account: 

Accounts are not used with this Snap.


Views:


Input

This Snap has exactly one binary input view, where it receives the binary data to be parsed into fixed-length columns.

Output

This Snap has exactly one document output view, where it provides a stream of documents. Each document contains the extracted field values. All fields in a document are of string type. Use the Mapper (Data) Snap to convert a field from string to the required type.

Error

This Snap has at most one document error view and produces zero or more documents in the view.
Note: An error view is highly recommended to help track down errors in field configuration.


Settings

Label


Required. The name for the Snap. You can modify this to be more specific, especially if you have more than one of the same Snap in your pipeline.

Skip lines



The number of lines to skip at the beginning of the input data.

Example: 2

Default value: 0


Line separator


The character used to separate lines in the input. Leave this field empty to use the newline character, or specify a different separator character.

Default value: \n
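
The following Python sketch illustrates how Skip lines and Line separator could interact. It assumes the input is split on the separator first and the leading lines are dropped afterwards. The function name split_and_skip is hypothetical and is not part of the Snap.

```python
def split_and_skip(data: str, skip_lines: int = 0, line_separator: str = "\n") -> list[str]:
    """Split raw text on the separator, then drop the first skip_lines lines."""
    # An empty separator falls back to the newline character, mirroring the default.
    separator = line_separator or "\n"
    lines = data.split(separator)
    # A trailing separator leaves an empty final element; discard it.
    if lines and lines[-1] == "":
        lines.pop()
    return lines[skip_lines:]


raw = "HEADER 1\nHEADER 2\nrow one\nrow two\n"
print(split_and_skip(raw, skip_lines=2))
```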


Field configuration

The details to provide for each field that is to be extracted from the input.


Column names

Required. Column names to be used as headers for the extracted values.

Example:  

First Name

Last Name

Default value:  [None]


Start position

Required. Starting position of each column to be used while extracting field values.

Example:  1

Default value:  [None]


End position

Required. Ending position of each column to be used while extracting field values.
Example: 25
Default value: [None]


Trim column data

Required. Select this option to remove leading and trailing spaces from the extracted data.
Default value: False
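
As a rough illustration of the Field configuration settings, the Python sketch below slices one line into named string fields. It assumes positions are 1-based and inclusive, which is consistent with the examples on this page. The function extract_fields and the 10-character layout are hypothetical and only serve the illustration.

```python
def extract_fields(line: str, fields: list[tuple[str, int, int, bool]]) -> dict[str, str]:
    """Slice one line into named string fields.

    Each field is (column name, start position, end position, trim flag).
    Positions are treated as 1-based and inclusive (an assumption).
    """
    record = {}
    for name, start, end, trim in fields:
        value = line[start - 1:end]  # 1-based inclusive range -> Python slice
        record[name] = value.strip(" ") if trim else value
    return record


# Hypothetical layout: 10 characters for each name, padded with spaces.
fields = [("First Name", 1, 10, True), ("Last Name", 11, 20, True)]
line = "JOHN".ljust(10) + "SMITH".ljust(10)
print(extract_fields(line, fields))
```

Every extracted value stays a string, matching the output view description; any type conversion would happen downstream.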


Ignore Lines

A table property that lets you ignore lines in the input data that satisfy the specified condition.


Function


A list of values (LOV) property that specifies the function used to identify the lines to ignore.

Values:

startsWith: Ignores a line that starts with the specified value.

endsWith: Ignores a line that ends with the specified value.

contains: Ignores a line that contains the specified value.

regex: Ignores a line that matches the specified regular expression.

Default value: startsWith


Value


The value to be used with the function. If this value is empty, the corresponding Ignore Lines entry is ignored.

Format:  String
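
The Ignore Lines functions can be pictured with the Python sketch below. It is not the Snap's implementation: the name should_ignore is hypothetical, multiple entries are assumed to be combined with OR, and the regex function is assumed to match against the whole line.

```python
import re


def should_ignore(line: str, rules: list[tuple[str, str]]) -> bool:
    """Return True if any (function, value) rule matches the line."""
    for function, value in rules:
        if not value:
            continue  # an empty value means the entry is ignored
        if function == "startsWith" and line.startswith(value):
            return True
        if function == "endsWith" and line.endswith(value):
            return True
        if function == "contains" and value in line:
            return True
        # Assumption: the regular expression must match the entire line.
        if function == "regex" and re.fullmatch(value, line):
            return True
    return False


rules = [("startsWith", "#"), ("endsWith", "END"), ("contains", "")]
print(should_ignore("# comment line", rules))
print(should_ignore("regular data", rules))
```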



Examples


Fixing a Pipeline Containing Incorrect Field Configurations

In this sample pipeline, information is brought in through a Constant Snap, then sent to the Fixed Width Parser.

In the Constant Snap, supply the following information in the Content field:


81888888888800002ST06/20/2014JOHN SMITH                    05-24-2012A
82777777777700003ST06/20/2014MARY SMITH                    05-24-2012A
81888888888800002ST06/20/2014JOHN SMITH                    05-24-2012A
82777777777700003ST06/20/2014MARY SMITH                    05-24-2012A
81888888888800002ST06/20/2014JOHN SMITH                    05-24-2012A
82777777777700003ST06/20/2014MARY SMITH                    05-24-2012A
81888888888800002ST06/20/2014JOHN SMITH                    05-24-2012A
82777777777700003ST06/20/2014MARY SMITH                    05-24-2012A
81888888888800002ST06/20/2014JOHN SMITH                    05-24-2012A
82777777777700003ST06/20/2014MARY SMITH                    05-24-2012A
FIXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX00000010

For reference, most of the rows are 70 characters long. Note that the last row of data intentionally does not follow the same format. 

In the Fixed Width Parser, add the following Field configuration entries. The field positions are intentionally incorrect to demonstrate error handling.

Column Names    Start position    End Position    Trim Column Data
COLA            1                 12
COLB            13                20
DATECOL         21                30
NAME            31                60              Selected
DATE2           61                70
A               71                72

When you save the pipeline, you will see the error: Failure: The input data format is not supported.

To help determine the cause of the error, configure the Fixed Width Parser to route error data to its error view. Now when you save the pipeline, data is written to the error view. If you look at the schema preview, you'll see the following.

   

Note that:

  • DATECOL contains the first character of NAME
  • NAME contains the first character of DATE2
  • DATE2 contains the character for A, which does not exist as a column.
  • The last row of data only fills the first three columns (expected because of its data format).
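
The shifts listed above can be reproduced with plain string slicing. The Python sketch below rebuilds one 70-character sample row from its parts and applies the incorrect positions, assuming 1-based inclusive positions. Note one difference: Python returns an empty string for positions past the end of the row, whereas the Snap reports an error.

```python
# Rebuild one sample row piece by piece so the spacing is explicit.
row = ("81" + "8" * 10 + "00002ST" + "06/20/2014"
       + "JOHN SMITH".ljust(30) + "05-24-2012" + "A")
assert len(row) == 70

# The intentionally incorrect positions from the table above.
incorrect = {
    "COLA": (1, 12), "COLB": (13, 20), "DATECOL": (21, 30),
    "NAME": (31, 60), "DATE2": (61, 70), "A": (71, 72),
}

shifted = {name: row[start - 1:end] for name, (start, end) in incorrect.items()}
for name, value in shifted.items():
    print(f"{name:8}|{value}|")
```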

Now, update the Field configurations as follows:

  • Set DATECOL to end at 29.
  • Set NAME to start at 30 and end at 59.
  • Set DATE2 to start at 60 and end at 69.
  • Set A to start and end at 70.

Now when you save the pipeline, data preview is available at both the output view and error view.

The output view contains the first 10 rows of data correctly formatted.

 

The error view now contains only the last row of data, which does not match the Field configuration settings.
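
The split between the two views can be sketched in Python as follows. This is only an approximation of the behavior described here: it assumes that a row shorter than the largest configured end position is routed to the error view, and the name parse_rows is hypothetical.

```python
FIELDS = [  # (name, start, end, trim) using the corrected positions
    ("COLA", 1, 12, False), ("COLB", 13, 20, False), ("DATECOL", 21, 29, False),
    ("NAME", 30, 59, True), ("DATE2", 60, 69, False), ("A", 70, 70, False),
]


def parse_rows(rows):
    """Return (output, errors): parsed records and rows too short for the layout."""
    width = max(end for _, _, end, _ in FIELDS)
    output, errors = [], []
    for row in rows:
        if len(row) < width:
            errors.append(row)  # assumption: short rows go to the error view
            continue
        output.append({
            name: (row[start - 1:end].strip() if trim else row[start - 1:end])
            for name, start, end, trim in FIELDS
        })
    return output, errors


good = ("81" + "8" * 10 + "00002ST" + "06/20/2014"
        + "JOHN SMITH".ljust(30) + "05-24-2012" + "A")
bad = "FIXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX00000010"
output, errors = parse_rows([good, bad])
print(output[0]["NAME"], "|", output[0]["DATE2"], "|", output[0]["A"])
print(len(output), len(errors))
```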

 

Using the regex Function

To ignore lines that have the digits "17" in the 16th and 17th positions, select regex in the Function field and set Value to:

\w{15}17.*  (including the starting \ and ending *)

To ignore all lines except those with 17 in that position, use a negative lookahead:

^(?!\w{15}17).*

To ignore all lines that have 17 or 99 in that position:

\w{15}(17|99).*
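
These patterns can be checked outside the Snap before they are entered. The Python sketch below uses hypothetical sample lines and assumes the pattern must match the whole line; Python's re module is used here as a stand-in and may differ in details from the regular expression engine the Snap uses. The negative lookahead pattern is included as one standard way to invert a match, and is an assumption rather than confirmed Snap behavior.

```python
import re


def make_line(code: str) -> str:
    """Build a hypothetical line: 15 word characters, a two-digit code, then filler."""
    return "A" * 15 + code + "REST"


samples = [make_line("17"), make_line("99"), make_line("42")]

only_17 = re.compile(r"\w{15}17.*")          # 17 in positions 16 and 17
either = re.compile(r"\w{15}(17|99).*")      # 17 or 99 in positions 16 and 17
all_but_17 = re.compile(r"^(?!\w{15}17).*")  # anything without 17 there

for label, pattern in [("only_17", only_17), ("either", either), ("all_but_17", all_but_17)]:
    matches = [bool(pattern.fullmatch(s)) for s in samples]
    print(label, matches)
```

A True entry marks a sample line that the corresponding pattern would cause to be ignored.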
