Excel Parser


Snap type:

Parse

Description:

This Snap reads binary data in XLS and XLSX formats from the specified sheet and provides the output document data. 

This Snap supports XLS version 97 and above.

Prerequisites

None

Support and limitations:
  • Works in Ultra Tasks.
  • Does not read password-protected Excel files and fails with the error: "Failure: Failed to parse input data."
  • Does not support Excel 5.0/95 format XLS files and fails with the error: "Input was neither an XLS stream nor an XLSX stream."
  • Does not support files greater than 4GB or files that have a high compression ratio (you may see an error that says "Zip bomb detected!"). A high compression ratio is likely to happen when workbooks contain formatted cells with no values or excessive unused custom styles. Consider removing empty cells or removing unused custom styles and saving the workbook again. If you continue to have issues, try saving the Excel worksheet as a CSV file.
Account: 

Excel Parser does not use accounts.

Views:
InputThis Snap has exactly one binary input view, where it gets the Excel binary data to be parsed.
OutputThis Snap has exactly one document output view, where it provides the Excel document data stream.
Error

This Snap has at most one document error view and produces zero or more documents in the view. If the Snap fails during the operation, the error view receives an error document containing the fields error, reason, original, resolution, and stacktrace:

{ error: "Input was neither a XLS stream, nor a XLSX stream" reason: "Input was neither a XLS stream, nora a XLSX stream" original: {...} 
resolution: "error while reading input data"
stacktrace: "com.Snaplogic.api.ExecutionException:  ...
}

Settings

Label

Required. The name for the Snap. You can modify this to be more specific, especially if you have more than one of the same Snap in your pipeline.

Sheet index

Required. Index (0 to n) or name of the sheet which must be parsed.

  • It can use Pipeline Parameters or it can be hardcoded. It cannot use fields from the upstream Snap.
  • If the Sheet name property is specified, this value is overwritten.
  • If the name of the sheet is a number (for example, 2018), use the Sheet name field. 

Example: 0 or Sheet1
Default value: 0

Sheet name

The name of the sheet that must be parsed.

  • It can use Pipeline Parameters or it can be hardcoded. It cannot use fields from the upstream Snap.
  • It overwrites the Sheet index field.

Example: PriceList

Default value: N/A

Start row

Required. Specifies the row number to start reading from the spreadsheet.

This cannot be set to 1 if the Contains headers property is selected.

Example: 1
Default value: N/A

End row

Required. The row number to stop reading in the spreadsheet.
Example: 100 
Default value: N/A

Contains headers

Required. This option specifies whether the input data contains headers. If this option is not selected, the Snap uses the Column names property for the header information when parsing Excel data.

Default value: Unselected

Header row

Specifies the row number in which headers are located in the input data. This property is ignored if Contains headers is false.

If Contains headers is true and Start row is smaller than or equals to Header row, the Start row is reset to Header row+1.

If Header row is empty, it is regarded as a default value of 1.

Default value: 1

Evaluate formulas

When selected, the Snap checks if there are cell formulas to be evaluated and displays the results instead of the raw formulas.

Default value: Unselected

If an Excel file generated by the Excel Formatter Snap is passed to the Excel Parser Snap where formula strings have been written as formulas, the Snap will not display the result of the formula even if the checkbox is selected. For such files, open the file in any spreadsheet application (such as Microsoft Excel or LibreOffice Calc), select the option to execute the full workbook, save the file, and then use this saved file in the Excel Parser Snap.

Round dates

Select the check box to round numeric Excel date values to the closest second. You can enable the option when you have numeric excel date values that have milliseconds values and want to round off the milliseconds to the closest second.

For example, 1 Jan 2020 14:29:29.600 is rounded to 1 Jan 2020 14:29:30. 1 Jan 2020 14:29:59.700 is rounded to 1 Jan 2020 14:30:00.

Default value: Not selected

This Snap supports Internal Excel date format or date formatting characters, for example, the default date format for US dates is “m/d/yyyy” (1/27/2016). It does not support the date format with timezone such as UTC at the end, for example, 2020-09-15 00:00:00 UTC.

Custom Locale

Select a user-defined locale that the Snap should use to format numbers, instead of the default locale.

For example, if the default locale that is configured in the Snaplex uses “,” (comma) to separate the integer and fractional parts, you can select any of the user-defined locale that uses. (dot) to format numbers.

If you leave this field empty, then the default locale that is configured on the Snaplex is used. Click the Suggestion icon to view all the available locales on the Snaplex.

Default Value: None
Example: English (Jamaica)

Cell formatting

Lets you determine whether to format numeral outputs. Valid options are:

  • Formatted: Determines the cell format used for numeric types. For example, Snap converts real numbers to two decimal places when formatted for currency.
  • Unformatted: Supports the unformatted output of numerals.

Default value: Formatted

Column names 

Conditional. Specify a header to map the columns in the Excel sheet. If the Contains header property is not selected and column names are not set, the column names are taken from the Excel data, such as A and B.

Example

  • Item 
  • Name 
  • Description
  • Price 

Default value: N/A

Insert null columns

RequiredThis option specifies whether to insert nulls when rows have missing column data. 

Default value: Unselected

Number of blank headers after the last header

Specify the number of additional headers that must be added.

The Insert null columns property must be selected for this property to work.

Default value: 0

Snap Execution

Select one of the three modes in which the Snap executes. Available options are:

  • Validate & Execute: Performs limited execution of the Snap, and generates a data preview during Pipeline validation. Subsequently, performs full execution of the Snap (unlimited records) during Pipeline runtime.

  • Execute only: Performs full execution of the Snap during Pipeline execution without generating preview data.

  • Disabled: Disables the Snap and all Snaps that are downstream from it.

Default Value: Execute only
Example: Validate & Execute

Temporary Files

During execution, data processing on Snaplex nodes occurs principally in-memory as streaming and is unencrypted. When larger datasets are processed that exceeds the available compute memory, the Snap writes Pipeline data to local storage as unencrypted to optimize the performance. These temporary files are deleted when the Snap/Pipeline execution completes. You can configure the temporary data's location in the Global properties table of the Snaplex's node properties, which can also help avoid Pipeline errors due to the unavailability of space. For more information, see Temporary Folder in Configuration Options

See Also


 Click to view/expand
ReleaseSnap Pack VersionDateType Updates
November 2024439patches29078 Latest

Fixed an issue with the CSV Parser Snap that introduced unexpected characters into the records and output data because of incorrect handling of the delimiter.

November 2024main29029 StableUpdated and certified against the current SnapLogic Platform release.
August 2024438patches28073 Latest

Fixed an issue with the JSON Generator and XML Generator Snaps that caused unexpected output displaying '__at__' and '__h__' instead of '@' and '-' respectively because the Snap could not update them to their original values after the Velocity library upgrade.

August 2024438patches27959 Latest

Fixed an issue with the Sort where the Snap could not sort files larger than 52 MB. This fix applies to Join Snap also.

August 2024main27765 StableUpgraded the org.json.json library from v20090211 to v20240303, which is fully backward compatible.
May 2024437patches26643 Latest
  • Fixed an issue with the Sort Snap that displayed an error when estimating the size of the input document provided by the upstream S3 Browser Snap.
  • Fixed an issue with the Parquet Formatter Snap that was unable to route errors to the error view.
May 2024437patches26453 Latest
  • Added expression support to the Skip lines field in the CSV Parser Snap to enable passing pipeline parameters and upstream values. 

  • Fixed an issue with the XML Parser Snap that caused an error when using the Splitter option in the Snap settings. 

May 2024main26341 Stable
  • Added Parquet Parser and Parquet Formatter Snaps to the Transform Snap Pack:
    • Parquet Parser: Reads the binary Parquet data and writes document data to the output.
    • Parquet Formatter: Reads the document data and writes it to the output in binary Parquet format.
  • Enhanced the JSON Splitter Snap to capture metadata and lineage information from the input document.

February 2024436patches25564 Latest

Fixed an issue with the JSON Formatter Snap that generated incorrect schema.

February 2024436patches25292 Latest

Fixed an out-of-memory error issue with the Aggregate Snap. This Snap no longer performs the presort for the input documents.

If the input documents are unsorted and GROUP-BY fields are used, you must use the Sort Snap upstream of the Aggregate Snap to presort the input document stream and set the Sorted stream field Ascending or Descending to prevent the out-of-memory error. However, if the total size of input documents is expected to be relatively small compared to the available memory, then Sort Snap is not required upstream.

Learn more about presorting unsorted input documents to be processed by the Aggregate Snap.

February 2024main25112 StableUpdated and certified against the current SnapLogic Platform release.
November 2023435patches24802 LatestFixed an issue with the Excel Parser Snap that caused a null pointer exception when the input data was an Excel file that did not contain a StylesTable.
November 2023435patches24481 Latest

Fixed an issue with the Aggregate Snap where the Snap was unable to produce the desired number of output documents when the input was unsorted and the GROUP-BY fields field set was used.

November 2023435patches24094 Latest

Fixed a deserialization issue for a unique function in the Aggregate Snap.

November 2023main23721 StableUpdated and certified against the current SnapLogic Platform release.
August 2023434patches23076 LatestFixed an issue with the Binary to Document Snap where an empty input document with Ignore Empty Stream selected caused the Snap to stop executing.
August 2023434patches23034 Latest
  • Fixed an issue with the Transform Snap Pack that caused an error when the input file was a binary JSON file that contained a string value of more than 20,000,000 characters.
  • Fixed a memory issue with the Aggregate Snap that occurred when using GROUP-BY fields.

August 2023434patches22705 Latest

Fixed an issue with the JSON Splitter Snap that caused the pipeline to terminate with excessive memory usage on the Snaplex node after the 4.33 GA upgrade. The Snap now consumes less memory.

August 2023main22460 StableUpdated and certified against the current SnapLogic Platform release.
May 2023433patches22431 Latest
  • Fixed an issue with the Excel Multi Sheet Formatter Snap that caused it to produce binary output data when there was no input document and Ignore empty stream was selected.
  • Introduced the following new Snaps:
    • GeoJSON Parser: Parses geospatial data from binary data input and outputs the contents as a GeoJSON document downstream.

    • WKT Parser: Parses geospatial data from binary data input and outputs the contents as a WKT (Well Known Text) document downstream.

May 2023433patches21779 Latest

The Decrypt Field and Encrypt Field Snaps now support CTR (Counter mode) for the AES (Advanced Encryption Standard) block cipher algorithm.

May 2023433patches21586 Latest

The Decrypt Field Snap now supports the decryption of various encrypted fields on providing a valid decryption key.

May 2023433patches21461 Latest

The following