XML Parser


Overview

Use this Snap to parse incoming XML data into SnapLogic document objects. The supported schema language is W3C XML Schema 1.0.

Snap Type

The XML Parser Snap is a Parser-type Snap.

Support for Ultra Pipelines

Works in Ultra Tasks.

Limitations

The XML Parser Snap does not support mixed content, that is, elements that contain text alongside child elements (and possibly attributes).

Workaround: Use the XSLT Snap to remove the mixed content from the XML input before passing it on to the XML Parser Snap. Learn more.
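To check whether an input file is affected by this limitation before sending it to the Snap, a small script can flag elements that carry both child elements and non-whitespace text. The following is a minimal sketch using Python's standard library; it is not part of SnapLogic, and the function name and sample XML are hypothetical.

```python
import xml.etree.ElementTree as ET


def find_mixed_content(xml_text):
    """Return the tags of elements that hold both child elements and text."""
    root = ET.fromstring(xml_text)
    mixed = []
    for elem in root.iter():
        children = list(elem)
        if not children:
            continue  # leaf elements that hold only text are not mixed content
        # Text directly inside elem is elem.text plus the tail of each child.
        pieces = [elem.text] + [child.tail for child in children]
        if any(piece and piece.strip() for piece in pieces):
            mixed.append(elem.tag)
    return mixed


# Hypothetical sample: <note> mixes text with a child element.
sample = "<letter><note>Dear <name>John Smith</name>, thanks.</note></letter>"
print(find_mixed_content(sample))  # ['note']
```

Whitespace between child elements (indentation, line breaks) is ignored, so ordinary pretty-printed XML is not reported.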

Snap Views

Type

Format

Number of Views

Examples of Upstream and Downstream Snaps

Description

Input 

  • Binary

 

  • Min: 1

  • Max: 1

  • File Reader

  • XSLT

The input must be:

  • XML-formatted data in binary form

  • UTF-8 encoded data (if non-UTF-8 encoded data is passed, it may result in errors)

The input must be properly structured XML data without mixed content elements for the XML Parser to process it correctly.

Example of Valid Input

<letter>
  <name>John Smith</name>
  <orderid>1032</orderid>
  <shipdate>2001-07-13</shipdate>
</letter>

Output

  • Document

 

  • Min: 1

  • Max: 1

  • XML Generator

Each XML element is converted into a corresponding field in the output document. The output maintains the hierarchical structure of the original XML. It can be processed by any downstream Snap that accepts document input.

Error

Error handling is a generic way to handle errors without data loss or Snap execution failure. You can handle the errors that the Snap might encounter when running the pipeline with one of the following options from the When errors occur list under the Views tab. The available options are:

  • Stop Pipeline Execution: Stops the current pipeline execution when the Snap encounters an error.

  • Discard Error Data and Continue: Ignores the error, discards that record, and continues with the remaining records.

  • Route Error Data to Error View: Routes the error data to an error view without stopping the Snap execution.

If the Snap fails during the operation, it sends an error document to the error view. The error document contains the fields error, reason, original, resolution, and stacktrace.

Learn more about Error handling in Pipelines.
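The output view description above says that each XML element becomes a field and that the hierarchy is preserved. As a rough illustration of that idea only, the sketch below converts the valid input example into a nested Python dictionary. It is an assumption about the general shape of such a mapping, not the Snap's actual implementation: it ignores attributes, and the function name is hypothetical.

```python
import xml.etree.ElementTree as ET


def element_to_value(elem):
    """Convert an element into nested dicts, lists, and strings."""
    children = list(elem)
    if not children:
        return elem.text  # leaf element: keep its text
    result = {}
    for child in children:
        value = element_to_value(child)
        if child.tag in result:
            # Repeated sibling tags are collected into a list.
            existing = result[child.tag]
            if not isinstance(existing, list):
                result[child.tag] = [existing]
            result[child.tag].append(value)
        else:
            result[child.tag] = value
    return result


xml_text = (
    "<letter> <name>John Smith</name> <orderid>1032</orderid> "
    "<shipdate>2001-07-13</shipdate> </letter>"
)
root = ET.fromstring(xml_text)
document = {root.tag: element_to_value(root)}
print(document)
# {'letter': {'name': 'John Smith', 'orderid': '1032', 'shipdate': '2001-07-13'}}
```

All values stay strings here; converting them to typed values is a separate step.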

 
Snap Settings

Field/Field set

Field type

Description

Label

 

String

Required. The name for the Snap. You can modify this to be more specific, especially if you have more than one of the same Snap in your pipeline.

Inbound schema

 

String/Expression

URL of the XSD schema definition file for the incoming data. The currently supported URL protocols are SLDB, HDFS, and S3.

If you enter an Inbound schema, then you must select both Validate XML and Match data types to derive the output as per the defined schema.

Default value: None
Example: sldb:///foo/bar/customer.xsd

Validate XML

Checkbox

Required. Appears when you enable expression for Inbound schema.

If selected, the incoming data will be validated against the provided XSD schema definition.

 If you enter an Inbound schema, then you must select Validate XML and Match data types properties to derive the output as per the defined schema. 

Default value: Deselected

Match data types

Checkbox

Select this checkbox to convert the values in the output document to the data types specified in the Inbound schema.

  • Supported XSD design pattern: Russian Doll

  • Supported data types: xs:string, xs:int, xs:integer, xs:long, xs:short, xs:byte, xs:float, xs:double, xs:decimal, and xs:boolean.

  • Unsupported XSD design patterns: Salami Slice, Venetian Blind, and Garden of Eden

  • If you enter an Inbound schema, then you must select both Validate XML and Match data types to derive the output as per the defined schema.

Default value: Deselected

Splitter

String

Specify an XPath expression that splits the incoming XML document into multiple smaller documents.

This expression must be of the form a/b/c/d or ns1:a/ns2:b/ns3:c/ns4:d, where the prefixes ns1 through ns4 can be the same or different. Learn more.

Default value: None
Example: d:catalog/d:book

Namespace Context (Optional)

 

 

Namespace context for the expression provided in the Splitter property.

Namespaces are typically declared in the XML in the format xmlns:prefix="URI".

 

Prefix

String

Prefixes included in the expression provided in the Splitter property.

URI

String

URIs associated with the prefixes.

Optimization

Dropdown list

Select the parameter that you want to optimize during Snap execution. Available options:

  • None: Continues with standard memory consumption and speed

  • Memory: Leads to lower memory consumption and slower execution

  • Speed: Leads to higher memory consumption and faster execution

Default value: None

Snap Execution

Dropdown list

Select one of the three modes in which the Snap executes. Available options are:

  • Validate & Execute: Performs limited execution of the Snap, and generates a data preview during Pipeline validation. Subsequently, performs full execution of the Snap (unlimited records) during Pipeline runtime.

  • Execute only: Performs full execution of the Snap during Pipeline execution without generating preview data.

  • Disabled: Disables the Snap and all Snaps that are downstream from it.

Default Value: Execute only
Example: Validate & Execute
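The Match data types setting above converts parsed text values to the types declared in the XSD. The sketch below illustrates the idea for the supported data types listed in that setting. The lookup table, function names, and field-to-type mapping are hypothetical, and the sketch does not read an XSD file or enforce range limits (for example, the xs:byte range); it only shows text being converted by declared type.

```python
from decimal import Decimal


def parse_boolean(text):
    """Parse the lexical forms that xs:boolean allows."""
    value = text.strip()
    if value in ("true", "1"):
        return True
    if value in ("false", "0"):
        return False
    raise ValueError("not an xs:boolean value: %r" % text)


# Hypothetical lookup table from XSD type name to a Python converter.
CONVERTERS = {
    "xs:string": str,
    "xs:int": int,
    "xs:integer": int,
    "xs:long": int,
    "xs:short": int,
    "xs:byte": int,
    "xs:float": float,
    "xs:double": float,
    "xs:decimal": Decimal,
    "xs:boolean": parse_boolean,
}


def convert(text, xsd_type):
    """Convert a text value using the converter for its declared type."""
    return CONVERTERS[xsd_type](text)


# Assumed declarations, as they might appear in a schema for the sample letter.
field_types = {"name": "xs:string", "orderid": "xs:int"}
record = {"name": "John Smith", "orderid": "1032"}
typed = {key: convert(value, field_types[key]) for key, value in record.items()}
print(typed)  # {'name': 'John Smith', 'orderid': 1032}
```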

 

Splitters

Splitter expression without prefix

Example: breakfast_menu/food

To access elements in the default namespace, use a unique prefix in the Splitter expression, followed by a colon and the tag name. Provide the corresponding namespace URI in the Prefix and URI fields of the Namespace Context. Ensure that this prefix is not already used in the XML.
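The same prefix-binding idea applies in most XPath-style tools. As a minimal sketch in Python's standard library (not SnapLogic itself), the example below binds an arbitrary prefix to a default namespace and uses it to select elements. The namespace URI, the prefix m, and the sample data are assumptions made for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical input: the elements live in a default namespace (no prefix).
xml_text = (
    '<breakfast_menu xmlns="http://www.example.com/menu">'
    "<food><name>Waffles</name></food>"
    "<food><name>Toast</name></food>"
    "</breakfast_menu>"
)

# Bind an arbitrary, otherwise unused prefix to the default namespace URI.
namespaces = {"m": "http://www.example.com/menu"}

root = ET.fromstring(xml_text)

# ElementTree paths are relative to the root element, so only the
# second step of a path like m:breakfast_menu/m:food is written here.
foods = root.findall("m:food", namespaces)
names = [food.find("m:name", namespaces).text for food in foods]
print(names)  # ['Waffles', 'Toast']

# Without a prefix, the path looks for elements in no namespace and finds none.
print(root.findall("food"))  # []
```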

If the XML data is of the form: 

 

 The output will be:
 

 

Splitter expression with prefix

Example: d:catalog/d:book
For the Splitter expression d:catalog/d:book, the output contains two documents, one for each book tag in the XML file. If the Splitter expression contains prefixes, they must be defined in the Namespace Context.

Prefix

 URI

 

In the Settings, enter d:catalog/d:book in the Splitter field, d in the Prefix field, and http://www.develop.com/student in the URI field. The output view then contains the data with the prefix d.

The output view is:

Troubleshooting

Error: "Failed to convert xml to json"

Reason: "Unexpected character."

Resolution: Ensure that the XML data is well formed.
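To make the link between a malformed input and the error document fields (error, reason, original, resolution, and stacktrace) more concrete, the sketch below builds a similar structure in plain Python. Only the field names and the error text come from this page; the function name is hypothetical, the placeholder document is deliberately simplistic, and the reason text is whatever Python's parser reports, which differs from the Snap's message.

```python
import traceback
import xml.etree.ElementTree as ET


def parse_or_error(xml_text):
    """Return (document, None) on success or (None, error_document) on failure."""
    try:
        root = ET.fromstring(xml_text)
        # Placeholder document: just the root tag and its text.
        return {root.tag: root.text}, None
    except ET.ParseError as exc:
        error_document = {
            "error": "Failed to convert xml to json",
            "reason": str(exc),  # parser message, not the Snap's wording
            "original": xml_text,
            "resolution": "Ensure that the XML data is well formed.",
            "stacktrace": traceback.format_exc(),
        }
        return None, error_document


# The <b> element is never closed, so the input is not well formed.
document, error = parse_or_error("<a><b></a>")
print(document)       # None
print(sorted(error))  # ['error', 'original', 'reason', 'resolution', 'stacktrace']
```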

 

Examples


Parse XML data from mixed XML content

The following example pipeline reads an XML file, transforms it using XSLT, and then parses the transformed XML data.

Configure the File Reader Snap to read data from the XML file that contains mixed content.

Configure the XSLT Snap to apply the XSLT transformation using the Workaround_XML_Mixedcontent.xslt stylesheet.

XSLT Snap configuration

XSLT Stylesheet

Connect an XML Parser Snap downstream to parse the transformed XML data. On validation, the Snap parses the XML with no specific optimization settings and displays the parsed data in the output preview.


This pipeline effectively handles the XML content transformation and parsing.

Download this pipeline.

Splitting XML

In this example, XML that contains multiple purchase orders is split into individual orders.

The incoming XML looks something like this:

<po:PurchaseOrders xmlns:po="http://www.example.com">
  <po:PurchaseOrder po:PurchaseOrderNumber="23578" po:OrderDate="2015-01-20">
    <po:Address po:Type="Shipping">
      <po:Name>Full Name</po:Name>
      <po:Street>123 Maple Street</po:Street>
      <po:City>City Name</po:City>
      <po:State>CA</po:State>
      <po:Zip>10101</po:Zip>
      <po:Country>USA</po:Country>
    </po:Address>
    <po:Address po:Type="Billing">
      <po:Name>Another Name</po:Name>
      <po:Street>456 Oak Avenue</po:Street>
      <po:City>Town Name</po:City>
      <po:State>NJ</po:State>
      <po:Zip>99999</po:Zip>
      <po:Country>USA</po:Country>
    </po:Address>
    <po:DeliveryNotes>Please leave packages on side porch.</po:DeliveryNotes>
    <po:Items>
      <po:Item po:PartNumber="123456">
        <po:ProductName>Product</po:ProductName>
        <po:Quantity>1</po:Quantity>
        <po:USPrice>89.90</po:USPrice>
        <po:Comment>Refurbished</po:Comment>
      </po:Item>
    </po:Items>
  </po:PurchaseOrder>
  <po:PurchaseOrder po:PurchaseOrderNumber="23579" po:OrderDate="2015-01-20">
    ...
  </po:PurchaseOrder>
</po:PurchaseOrders>

Each PurchaseOrder contains the shipping address, billing address and the items purchased.

To split these into individual orders, the Splitter field should contain the hierarchy down to where you want the split to occur, including any specified prefix. In this case, the value is: po:PurchaseOrders/po:PurchaseOrder.

You will also need to specify the Namespace Context, which is defined in the sample as xmlns:po="http://www.example.com". 

 



This will result in separating the orders into individual documents.

{po:PurchaseOrder: {...}}
{po:PurchaseOrder: {...}}
{po:PurchaseOrder: {...}}
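The splitting step in this example can be approximated outside SnapLogic to see which elements a Splitter path selects. The following is a minimal sketch in Python's standard library, assuming a trimmed version of the purchase order XML above. The split_xml function is hypothetical and handles only simple paths of the form a/b or ns:a/ns:b.

```python
import xml.etree.ElementTree as ET


def split_xml(xml_text, splitter, namespaces):
    """Return the elements selected by a path such as 'ns:a/ns:b'."""

    def qualified(step):
        # Turn 'prefix:tag' into ElementTree's '{uri}tag' form.
        if ":" in step:
            prefix, tag = step.split(":", 1)
            return "{%s}%s" % (namespaces[prefix], tag)
        return step

    first, _, rest = splitter.partition("/")
    root = ET.fromstring(xml_text)
    if root.tag != qualified(first):
        return []  # the first step must name the root element
    if not rest:
        return [root]
    # The remaining steps are resolved relative to the root element.
    return root.findall(rest, namespaces)


# Trimmed version of the sample input: two orders, details omitted.
xml_text = (
    '<po:PurchaseOrders xmlns:po="http://www.example.com">'
    '<po:PurchaseOrder po:PurchaseOrderNumber="23578"/>'
    '<po:PurchaseOrder po:PurchaseOrderNumber="23579"/>'
    "</po:PurchaseOrders>"
)
namespaces = {"po": "http://www.example.com"}

orders = split_xml(xml_text, "po:PurchaseOrders/po:PurchaseOrder", namespaces)
numbers = [
    order.get("{http://www.example.com}PurchaseOrderNumber") for order in orders
]
print(numbers)  # ['23578', '23579']
```

Each selected element corresponds to one output document. The namespaces dictionary plays the role of the Namespace Context, mapping the po prefix to its URI.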



Attachment: XML Parser_Mixed content handling.slp (modified May 23, 2025 by Kalpana Malladi)