XML Parser
Overview
Use this Snap to parse incoming XML data into SnapLogic documents. The supported schema language is W3C XML Schema 1.0.
Snap Type
The XML Parser Snap is a Parser-type Snap.
Support for Ultra Pipelines
Works in Ultra Tasks.
Limitations
The XML Parser Snap does not support mixed content, that is, an element that contains text alongside child elements (and possibly attributes).
Workaround: Use the XSLT Snap to remove the mixed content from the XML input before passing it on to the XML Parser Snap.
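Before building the workaround, it can help to confirm whether a file contains mixed content at all. The following is a minimal sketch in Python, run outside SnapLogic, that uses only the standard library. The function name `has_mixed_content` and both sample strings are assumptions made for illustration; they are not part of the Snap.

```python
import xml.etree.ElementTree as ET

def has_mixed_content(element):
    """Return True if any element holds both child elements and non-whitespace text."""
    for node in element.iter():
        children = list(node)
        if not children:
            continue  # a leaf with text only is not mixed content
        # Text before the first child is in node.text; text between or
        # after children is stored in each child's tail.
        texts = [node.text] + [child.tail for child in children]
        if any(t and t.strip() for t in texts):
            return True
    return False

# Hypothetical samples: the first mixes text with a child element.
mixed = ET.fromstring("<note>Dear <name>John Smith</name>, your order shipped.</note>")
clean = ET.fromstring("<letter><name>John Smith</name><orderid>1032</orderid></letter>")

print(has_mixed_content(mixed))  # True
print(has_mixed_content(clean))  # False
```

Whitespace used only for indentation is ignored, so pretty-printed XML is not reported as mixed.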
Snap Views
Type | Format | Number of Views | Examples of Upstream and Downstream Snaps | Description |
---|---|---|---|---|
Input |
|
|
| The input must be properly structured XML data without mixed content elements for the XML Parser Snap to process it correctly. Example of valid input: <letter>
<name>John Smith</name>
<orderid>1032</orderid>
<shipdate>2001-07-13</shipdate>
</letter> |
Output |
|
|
| Each XML element is converted into a corresponding field in the output document. The output maintains the hierarchical structure of the original XML. It can be processed by any downstream Snap that accepts document input. |
Error | Error handling is a generic way to handle errors without data loss or Snap execution failure. You can handle the errors that the Snap might encounter when running the pipeline with one of the following options from the When errors occur list under the Views tab. The available options are:
If the Snap fails during the operation, an error document is sent to the error view containing the fields error, reason, original, resolution, and stacktrace. Learn more about Error handling in Pipelines. |
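The Output row above says that each XML element becomes a field and that the hierarchy is preserved. The following Python sketch illustrates that idea on the valid input example from the Input row. It is an approximation for intuition only: the function name `element_to_value` is hypothetical, attributes are ignored, and the exact field naming the Snap produces may differ.

```python
import xml.etree.ElementTree as ET

def element_to_value(element):
    """Convert an element to a string (leaf) or a nested dict (has children)."""
    children = list(element)
    if not children:
        return (element.text or "").strip()
    result = {}
    for child in children:
        value = element_to_value(child)
        if child.tag in result:
            # Repeated sibling tags are collected into a list.
            existing = result[child.tag]
            if isinstance(existing, list):
                existing.append(value)
            else:
                result[child.tag] = [existing, value]
        else:
            result[child.tag] = value
    return result

xml_text = """<letter>
<name>John Smith</name>
<orderid>1032</orderid>
<shipdate>2001-07-13</shipdate>
</letter>"""

root = ET.fromstring(xml_text)
document = {root.tag: element_to_value(root)}
print(document)
# {'letter': {'name': 'John Smith', 'orderid': '1032', 'shipdate': '2001-07-13'}}
```

All values are strings in this sketch, because no schema is applied.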
Snap Settings
Field/Field set | Field type | Description |
Label
| String | Required. The name for the Snap. You can modify this to be more specific, especially if you have more than one of the same Snap in your pipeline. |
Inbound schema
| String/Expression | URL of the XSD schema definition file for the incoming data. The supported URL protocols are SLDB, HDFS, and S3. If you enter an Inbound schema, you must also select the Validate XML and Match data types checkboxes to derive the output as defined by the schema. Default value: None |
Validate XML | Checkbox | Required. Appears when you enable the expression for Inbound schema. If selected, the incoming data is validated against the provided XSD schema definition. If you enter an Inbound schema, you must also select the Validate XML and Match data types checkboxes to derive the output as defined by the schema. Default value: Deselected |
Match data types | Checkbox | Select this checkbox to convert the output document data types to the data type as specified in the inbound schema property.
Default value: Deselected |
Splitter | String | Specify an XPath expression that splits the incoming XML document into multiple smaller documents. The expression is a slash-separated path of element names, for example breakfast_menu/food. Default value: None |
Namespace Context (Optional)
|
| Namespace context for the expression provided in the Splitter property. In XML, namespaces are typically declared in the form xmlns:prefix="URI".
|
Prefix | String | Prefixes included in the expression provided in the Splitter property. |
URI | String | URIs associated with the prefixes. |
Optimization | Dropdown list | Select the parameter that you want to optimize during Snap execution. Available options:
Default value: None |
Snap Execution | Dropdown list | Select one of the three modes in which the Snap executes. Available options are:
Default Value: Execute only |
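To make the effect of the Match data types setting more concrete, here is a rough Python sketch of converting parsed string values to the types a schema declares. Everything in it is an assumption for illustration: the `FIELD_TYPES` mapping stands in for what an XSD would declare, the function name `match_data_types` is hypothetical, and the Snap's own conversion rules are not documented here and may differ.

```python
from datetime import date
from decimal import Decimal

# Hypothetical stand-in for the types an XSD would declare per field.
FIELD_TYPES = {"orderid": "xs:integer", "shipdate": "xs:date", "price": "xs:decimal"}

# One converter per XSD built-in type used in this sketch.
CONVERTERS = {
    "xs:integer": int,
    "xs:decimal": Decimal,
    "xs:date": date.fromisoformat,
    "xs:string": str,
}

def match_data_types(document, field_types):
    """Return a copy of the document with values converted per field_types."""
    typed = {}
    for key, value in document.items():
        xsd_type = field_types.get(key, "xs:string")  # undeclared fields stay strings
        typed[key] = CONVERTERS[xsd_type](value)
    return typed

parsed = {"name": "John Smith", "orderid": "1032", "shipdate": "2001-07-13"}
typed = match_data_types(parsed, FIELD_TYPES)
print(typed)
# {'name': 'John Smith', 'orderid': 1032, 'shipdate': datetime.date(2001, 7, 13)}
```

Without such a conversion step, every value remains a string, which matches the behavior you would expect when the checkbox is deselected.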
Splitters
Splitter expression without prefix
Example: breakfast_menu/food
To address elements in a default namespace, use a unique prefix in the Splitter expression, followed by a colon and the tag name, and enter that prefix with its namespace URI in the Prefix and URI fields of the Namespace Context. Make sure the prefix you choose is not already used in the XML.
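The same rule applies in most XPath-style tools, so it can be tried outside SnapLogic. The sketch below uses Python's standard library to show why an unprefixed path finds nothing when the XML declares a default namespace. The sample XML, the namespace URI http://www.example.com/menu, the prefix m, and the helper names `qualify` and `split_by_path` are all assumptions for illustration.

```python
import xml.etree.ElementTree as ET

def qualify(step, namespaces):
    """Expand 'prefix:tag' to ElementTree's '{uri}tag' form; leave unprefixed steps alone."""
    prefix, sep, local = step.partition(":")
    if not sep:
        return step
    return "{%s}%s" % (namespaces[prefix], local)

def split_by_path(xml_text, splitter, namespaces):
    """Return the elements matched by a root-anchored, slash-separated path."""
    root = ET.fromstring(xml_text)
    first, _, rest = splitter.partition("/")
    if root.tag != qualify(first, namespaces):
        return []  # the first step must name the root element
    return root.findall(rest, namespaces) if rest else [root]

# Hypothetical input that declares a default namespace.
xml_text = """<breakfast_menu xmlns="http://www.example.com/menu">
  <food><name>Belgian Waffles</name></food>
  <food><name>French Toast</name></food>
</breakfast_menu>"""

namespaces = {"m": "http://www.example.com/menu"}

# Without a prefix the path does not match, because every element
# belongs to the default namespace.
print(len(split_by_path(xml_text, "breakfast_menu/food", {})))  # 0

foods = split_by_path(xml_text, "m:breakfast_menu/m:food", namespaces)
names = [food.find("m:name", namespaces).text for food in foods]
print(names)  # ['Belgian Waffles', 'French Toast']
```

The prefix m exists only in the expression and its mapping. The XML itself never uses it, which is why it must not collide with a prefix the XML does use.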
If the XML data is of the form:
The output will be:
Splitter expression with prefix
Example: d:catalog/d:book
For the Splitter expression "d:catalog/d:book", the output contains two documents, one for each book tag in the XML file. If the Splitter expression contains prefixes, they must be defined in the Namespace Context.
Prefix | URI |
---|---|
d | http://www.develop.com/student |
In the Settings, enter d:catalog/d:book in the Splitter field, d in the Prefix field, and http://www.develop.com/student in the URI field to get an output view that contains the data with the prefix 'd'.
The output view is:
Troubleshooting
Error | Reason | Resolution |
---|---|---|
"Failed to convert xml to json" | "Unexpected character." | Ensure that the XML data is well formed. |
Examples
Parse XML data from mixed XML content
The following example pipeline reads an XML file, transforms it using XSLT, and then parses the transformed XML data.
Configure the File Reader Snap to read data from the XML file that contains mixed content.
Configure the XSLT Snap to apply the XSLT transformation using the Workaround_XML_Mixedcontent.xslt stylesheet.
XSLT Snap configuration | XSLT Stylesheet |
Connect an XML Parser Snap downstream to parse the transformed XML data. On validation, the Snap parses the XML with no specific optimization settings and displays the parsed data in the output preview.
This pipeline effectively handles the XML content transformation and parsing.
Splitting XML
In this example, XML that contains multiple purchase orders is split into individual orders.
The incoming XML looks something like this:
<po:PurchaseOrders xmlns:po="http://www.example.com">
<po:PurchaseOrder po:PurchaseOrderNumber="23578" po:OrderDate="2015-01-20">
<po:Address po:Type="Shipping">
<po:Name>Full Name</po:Name>
<po:Street>123 Maple Street</po:Street>
<po:City>City Name</po:City>
<po:State>CA</po:State>
<po:Zip>10101</po:Zip>
<po:Country>USA</po:Country>
</po:Address>
<po:Address po:Type="Billing">
<po:Name>Another Name</po:Name>
<po:Street>456 Oak Avenue</po:Street>
<po:City>Town Name</po:City>
<po:State>NJ</po:State>
<po:Zip>99999</po:Zip>
<po:Country>USA</po:Country>
</po:Address>
<po:DeliveryNotes>Please leave packages on side porch.</po:DeliveryNotes>
<po:Items>
<po:Item po:PartNumber="123456">
<po:ProductName>Product</po:ProductName>
<po:Quantity>1</po:Quantity>
<po:USPrice>89.90</po:USPrice>
<po:Comment>Refurbished</po:Comment>
</po:Item>
</po:Items>
</po:PurchaseOrder>
<po:PurchaseOrder po:PurchaseOrderNumber="23579" po:OrderDate="2015-01-20">
...
</po:PurchaseOrder>
</po:PurchaseOrders>
Each PurchaseOrder contains the shipping address, billing address and the items purchased.
To split these into individual orders, enter in the Splitter field the hierarchy down to the element where you want the split to occur, including any prefix. In this case, the value is: po:PurchaseOrders/po:PurchaseOrder.
You must also specify the Namespace Context. The sample declares it as xmlns:po="http://www.example.com", so enter po as the Prefix and http://www.example.com as the URI.
This will result in separating the orders into individual documents.
{"po:PurchaseOrder": {...}}
{"po:PurchaseOrder": {...}}
{"po:PurchaseOrder": {...}}
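The split described in this example can be reproduced outside SnapLogic to check a Splitter path and namespace mapping before configuring the Snap. The Python sketch below uses a trimmed version of the sample XML above (most child elements are omitted and the second order is left empty, because the sample elides its content). It illustrates path-based splitting with a namespace mapping and does not represent the Snap's implementation or output format.

```python
import xml.etree.ElementTree as ET

# Prefix-to-URI mapping, equivalent to the Namespace Context setting.
NS = {"po": "http://www.example.com"}

# Trimmed version of the sample input; the second order is intentionally empty.
xml_text = """<po:PurchaseOrders xmlns:po="http://www.example.com">
  <po:PurchaseOrder po:PurchaseOrderNumber="23578" po:OrderDate="2015-01-20">
    <po:DeliveryNotes>Please leave packages on side porch.</po:DeliveryNotes>
  </po:PurchaseOrder>
  <po:PurchaseOrder po:PurchaseOrderNumber="23579" po:OrderDate="2015-01-20"/>
</po:PurchaseOrders>"""

root = ET.fromstring(xml_text)

# The root is po:PurchaseOrders, so the remaining step of the Splitter
# path po:PurchaseOrders/po:PurchaseOrder is po:PurchaseOrder.
orders = root.findall("po:PurchaseOrder", NS)

# Namespaced attributes are keyed as '{uri}name' in ElementTree.
number_attr = "{%s}PurchaseOrderNumber" % NS["po"]
order_numbers = [order.get(number_attr) for order in orders]

# Serialize each order separately, one string per output document.
documents = [ET.tostring(order, encoding="unicode") for order in orders]

print(order_numbers)   # ['23578', '23579']
print(len(documents))  # 2
```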
© 2017-2025 SnapLogic, Inc.