
Snap type:

Parse


Description:

This Snap parses the incoming XML data into SnapLogic document objects.


Prerequisites:

[None]


Support:

Works in Ultra Task Pipelines.

Limitations

Mixed content is not supported. XML data with mixed content contains elements whose text is interleaved with child elements (and possibly attributes), as in the following example:

<letter>
       Dear Mr. <name>John Smith</name>.
       Your order <orderid>1032</orderid>
       will be shipped on <shipdate>2001-07-13</shipdate>.
</letter>
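One way to check for mixed content before sending data to the Snap is sketched below with Python's standard xml.etree.ElementTree module. The helper name has_mixed_content and the second sample (record) are assumptions made for illustration; this is not how the Snap itself detects mixed content.

```python
import xml.etree.ElementTree as ET

def has_mixed_content(element):
    """Return True if any element in the tree mixes child elements with text."""
    for node in element.iter():
        children = list(node)
        if not children:
            continue  # leaf elements may freely contain text
        # Text before the first child, or text trailing any child,
        # means the element interleaves text with child elements.
        if (node.text or "").strip():
            return True
        if any((child.tail or "").strip() for child in children):
            return True
    return False

letter = """<letter>
    Dear Mr. <name>John Smith</name>.
    Your order <orderid>1032</orderid>
    will be shipped on <shipdate>2001-07-13</shipdate>.
</letter>"""

# Hypothetical element-only version of the same data
record = "<letter><name>John Smith</name><orderid>1032</orderid></letter>"

print(has_mixed_content(ET.fromstring(letter)))  # True
print(has_mixed_content(ET.fromstring(record)))  # False
```

Whitespace-only text between elements (indentation and line breaks) is ignored, so pretty-printed XML is not reported as mixed content.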


Account: 

Accounts are not used with this Snap.


Views:


Input

This Snap has exactly one binary input view.

Output

This Snap has exactly one document output view.

Error

This Snap has at most one document error view and produces zero or more documents in the view. If the Snap fails during the operation, an error document is sent to the error view containing the fields error, reason, original, resolution, and stacktrace:

{
  error: "Failed to convert xml to json",
  reason: "Unexpected character ...",
  original: {...},
  resolution: "Please check if the xml data is well formed",
  stacktrace: "java.lang.RuntimeException: ..."
}
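The shape of such an error document can be illustrated with a small Python sketch. It only mimics the five fields listed above: the function name parse_or_error, the "content" key inside original, and the Python parser messages are assumptions for illustration and do not reflect the Snap's internal implementation.

```python
import traceback
import xml.etree.ElementTree as ET

def parse_or_error(xml_text):
    """Return (document, None) on success or (None, error_document) on failure."""
    try:
        root = ET.fromstring(xml_text)
        return {root.tag: root.text}, None
    except ET.ParseError as exc:
        error_document = {
            "error": "Failed to convert xml to json",
            "reason": str(exc),                  # parser message, e.g. position of the fault
            "original": {"content": xml_text},   # hypothetical wrapper for the input data
            "resolution": "Please check if the xml data is well formed",
            "stacktrace": traceback.format_exc(),
        }
        return None, error_document

doc, err = parse_or_error("<a>1</a>")          # well-formed: err is None
bad_doc, bad_err = parse_or_error("<a>1</b>")  # mismatched tags: bad_doc is None
print(sorted(bad_err))
```

Routing the failure to a separate value instead of raising mirrors the idea of an error view: the caller can keep processing later inputs while the failed input is reported alongside the reason.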



Settings

Label


Required. The name for the Snap. You can modify this to be more specific, especially if you have more than one of the same Snap in your pipeline.

Inbound Schema
 


URL of the XSD schema definition file for the incoming data. The currently supported URL protocols are SLDB, HDFS, and S3.

 If you enter an Inbound Schema, you must also select the Validate XML and Match data types properties so that the output is derived according to the defined schema.

Example: sldb:///foo/bar/customer.xsd

Default value: [None]
 

Validate XML
 

Required. If selected, the incoming data will be validated against the provided XSD schema definition.

 If you enter an Inbound Schema, you must also select the Validate XML and Match data types properties so that the output is derived according to the defined schema.

Example: False

Default value: False 

Splitter
 

Splits the incoming XML document into multiple smaller documents using the XPath expression.

This expression must be of the form a/b/c/d or ns1:a/ns2:b/ns3:c/ns4:d, where the prefixes ns1 through ns4 can be the same or different.
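The accepted form can be approximated with a regular expression, as in the following Python sketch. The name pattern is deliberately simplified and the function name is_simple_splitter is hypothetical; the Snap's own validation rules may differ.

```python
import re

# One path step: an optional "prefix:" followed by an element name.
# The name pattern is a simplification of real XML names (an assumption).
_NAME = r"[A-Za-z_][\w.\-]*"
_STEP = rf"(?:{_NAME}:)?{_NAME}"
_SPLITTER = re.compile(rf"{_STEP}(?:/{_STEP})*")

def is_simple_splitter(expression):
    """Return True if the expression is a plain slash-separated element path."""
    return _SPLITTER.fullmatch(expression) is not None

print(is_simple_splitter("a/b/c/d"))                  # True
print(is_simple_splitter("ns1:a/ns2:b/ns3:c/ns4:d"))  # True
print(is_simple_splitter("//book"))                   # False
print(is_simple_splitter("a/b[1]"))                   # False
```

The check rejects XPath features such as predicates and the descendant axis, which fall outside the a/b/c/d form described above.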

Example 1: Splitter Expression without prefix


To access elements in the default namespace, use a unique prefix in the Splitter expression, followed by a colon and the tag name, and enter the corresponding namespace URI in the Prefix URI table.
Make sure that this prefix is not already used in the XML.
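The same idea, binding a prefix of your choice to the default namespace URI, can be seen in Python's standard xml.etree.ElementTree module. The XML sample, the namespace URI, and the prefix "m" below are all hypothetical; the sketch shows the general technique rather than the Snap's behavior.

```python
import xml.etree.ElementTree as ET

# Hypothetical document whose elements live in a default namespace
xml_text = """<orders xmlns="http://www.example.com/orders">
  <order><id>1</id></order>
  <order><id>2</id></order>
</orders>"""

# Prefix "m" is not used in the XML; it is bound to the default namespace URI
namespaces = {"m": "http://www.example.com/orders"}

root = ET.fromstring(xml_text)

# Equivalent of the expression m:orders/m:order, evaluated from the root element
orders = root.findall("m:order", namespaces)
ids = [order.find("m:id", namespaces).text for order in orders]
print(ids)  # ['1', '2']

# Without a prefix, the path does not match the namespaced elements
print(root.findall("order"))  # []
```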

Splitter: breakfast_menu/food

 
If the XML data is of the form: 


 

 
The output will be:
 

 

 
 
Example 2: Splitter Expression with prefix
 
For the Splitter expression "d:catalog/d:book", the output contains two documents, one for each book tag in the XML file.
If the Splitter expression contains prefixes, they must be defined in the Namespace Context so that the Snap can resolve them.


Splitter: d:catalog/d:book


Prefix    URI
d         http://www.develop.com/student


 

 
In the Settings, enter d:catalog/d:book in the Splitter field, d in the Prefix field, and http://www.develop.com/student in the URI field to get an output view containing the data with the prefix 'd'.

 

 The output view is:
 

 

Default value: [None]  


Namespace Context

Optional.

Namespace context for the expression provided in the Splitter property.

Namespaces are typically declared in the XML in the form xmlns:prefix="URI".


Prefix

Prefixes included in the expression provided in the Splitter property.

URI

URIs associated with the prefixes.

Match data types

If selected, the values in the input document are converted to the data types specified in the XSD provided in the Inbound Schema property.

  • Supported XSD design pattern: Russian Doll

  • Supported data types: xs:string, xs:int, xs:integer, xs:long, xs:short, xs:byte, xs:float, xs:double, xs:decimal, and xs:boolean.

  • The Salami Slice, Venetian Blind, and Garden of Eden XSD design patterns are not supported.

  • If you enter an Inbound Schema, you must also select the Validate XML and Match data types properties so that the output is derived according to the defined schema.
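To make the idea of matching data types concrete, the following Python sketch converts string values to native types based on a declared XSD type per field. The converter table, the function names, and the InStock field are assumptions for illustration; range limits of the integer types are not enforced, and the Snap's actual mapping to document types may differ.

```python
from decimal import Decimal

def parse_boolean(text):
    """Convert an xs:boolean lexical value (true, false, 1, 0)."""
    value = text.strip()
    if value in ("true", "1"):
        return True
    if value in ("false", "0"):
        return False
    raise ValueError(f"not an xs:boolean: {text!r}")

# Hypothetical mapping from the supported XSD types to Python converters
CONVERTERS = {
    "xs:string": str,
    "xs:int": int, "xs:integer": int, "xs:long": int,
    "xs:short": int, "xs:byte": int,
    "xs:float": float, "xs:double": float,
    "xs:decimal": Decimal,
    "xs:boolean": parse_boolean,
}

def match_types(record, field_types):
    """Convert each string value to the type declared for its field."""
    return {
        name: CONVERTERS[field_types.get(name, "xs:string")](value)
        for name, value in record.items()
    }

record = {"Quantity": "1", "USPrice": "89.90", "Comment": "Refurbished", "InStock": "true"}
field_types = {"Quantity": "xs:int", "USPrice": "xs:decimal", "InStock": "xs:boolean"}
typed = match_types(record, field_types)
print(typed)
```

Fields without a declared type are left as strings, which corresponds to parsing without type matching.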


Optimization

Select the parameter that you want to optimize during Snap execution. Available options:

  • None: Continues with standard memory consumption and speed
  • Memory:  Leads to lower memory consumption and slower execution
  • Speed: Leads to higher memory consumption and faster execution

Default value: None
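The kind of trade-off behind these options can be illustrated in Python by comparing a parser that builds the whole tree with one that streams elements and discards them after use. This is a general sketch of the trade-off with generated sample data; it is not a description of how the Snap implements its Optimization options.

```python
import io
import xml.etree.ElementTree as ET

# Generated sample data: 1000 small <order> records (hypothetical)
xml_bytes = b"<orders>" + b"".join(
    b"<order><id>%d</id></order>" % i for i in range(1000)
) + b"</orders>"

def load_all(data):
    """Build the whole tree in memory, then read the records."""
    root = ET.fromstring(data)
    return [order.findtext("id") for order in root.findall("order")]

def stream(data):
    """Yield records one at a time, clearing each element after use."""
    for _event, elem in ET.iterparse(io.BytesIO(data), events=("end",)):
        if elem.tag == "order":
            yield elem.findtext("id")
            elem.clear()  # drop the processed element's contents to limit memory

print(load_all(xml_bytes)[:3])      # ['0', '1', '2']
print(list(stream(xml_bytes))[:3])  # ['0', '1', '2']
```

Both functions return the same records; they differ only in how much of the document is held in memory at a time.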

 
Schema language supported: W3C XML Schema 1.0

Examples


Splitting XML

In this example, XML that contains multiple purchase orders is split into individual orders.


The incoming XML looks something like this:

<po:PurchaseOrders xmlns:po="http://www.example.com">
   <po:PurchaseOrder po:PurchaseOrderNumber="23578" po:OrderDate="2015-01-20">
     <po:Address po:Type="Shipping">
       <po:Name>Full Name</po:Name>
       <po:Street>123 Maple Street</po:Street>
       <po:City>City Name</po:City>
       <po:State>CA</po:State>
       <po:Zip>10101</po:Zip>
       <po:Country>USA</po:Country>
     </po:Address>
     <po:Address po:Type="Billing">
       <po:Name>Another Name</po:Name>
       <po:Street>456 Oak Avenue</po:Street>
       <po:City>Town Name</po:City>
       <po:State>NJ</po:State>
       <po:Zip>99999</po:Zip>
       <po:Country>USA</po:Country>
     </po:Address>
     <po:DeliveryNotes>Please leave packages on side porch.</po:DeliveryNotes>
     <po:Items>
        <po:Item po:PartNumber="123456">
          <po:ProductName>Product</po:ProductName>
          <po:Quantity>1</po:Quantity>
          <po:USPrice>89.90</po:USPrice>
          <po:Comment>Refurbished</po:Comment>
        </po:Item>
     </po:Items>
   </po:PurchaseOrder>
   <po:PurchaseOrder po:PurchaseOrderNumber="23579" po:OrderDate="2015-01-20">
   ...
   </po:PurchaseOrder>
</po:PurchaseOrders>

Each PurchaseOrder contains the shipping address, billing address and the items purchased.

To split these into individual orders, the Splitter field should contain the hierarchy down to where you want the split to occur, including any specified prefix. In this case, the value is: po:PurchaseOrders/po:PurchaseOrder.

You also need to specify the Namespace Context, which is declared in the sample as xmlns:po="http://www.example.com". Enter po as the Prefix and http://www.example.com as the URI.

 


This will result in separating the orders into individual documents.

{po:PurchaseOrder: {...}}
{po:PurchaseOrder: {...}}
{po:PurchaseOrder: {...}}
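The split in this example can be approximated outside SnapLogic with Python's standard xml.etree.ElementTree module. The sketch below uses an abbreviated version of the sample (the content of the second order is hypothetical), and the function name split is an assumption; it illustrates how the Splitter expression and Namespace Context work together, not the Snap's implementation.

```python
import xml.etree.ElementTree as ET

# Abbreviated sample; the second order's delivery note is hypothetical
xml_text = """<po:PurchaseOrders xmlns:po="http://www.example.com">
  <po:PurchaseOrder po:PurchaseOrderNumber="23578" po:OrderDate="2015-01-20">
    <po:DeliveryNotes>Please leave packages on side porch.</po:DeliveryNotes>
  </po:PurchaseOrder>
  <po:PurchaseOrder po:PurchaseOrderNumber="23579" po:OrderDate="2015-01-20">
    <po:DeliveryNotes>Ring the bell.</po:DeliveryNotes>
  </po:PurchaseOrder>
</po:PurchaseOrders>"""

namespaces = {"po": "http://www.example.com"}  # Namespace Context: prefix -> URI
splitter = "po:PurchaseOrders/po:PurchaseOrder"

def split(xml_text, splitter, namespaces):
    """Return the elements selected by a slash-separated splitter path."""
    root_step, _, rest = splitter.partition("/")
    root = ET.fromstring(xml_text)
    # The first step must name the root element (with its namespace, if prefixed)
    prefix, _, local = root_step.rpartition(":")
    expected = f"{{{namespaces[prefix]}}}{local}" if prefix else local
    if root.tag != expected:
        return []
    if not rest:
        return [root]
    # Remaining steps are resolved relative to the root element
    return root.findall(rest, namespaces)

orders = split(xml_text, splitter, namespaces)
number_attr = "{http://www.example.com}PurchaseOrderNumber"
print([order.get(number_attr) for order in orders])  # ['23578', '23579']
```

Each element in orders corresponds to one output document in the example above.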