For enterprise organizations, maintaining an inventory of distributed data assets that supports data monetization and regulatory compliance requires metadata management. Teams need to understand the full potential of enterprise data and link that inventory to broader data management needs. Two contexts in which this challenge comes to the fore are:
The Data Governance office of an enterprise organization performs a security compliance audit to make sure no PCI, PII, or PHI data is moved out of its data store; if data is illicitly transferred, the organization wants to take action to fix the problem.
The IT team wants to decommission an old data store and needs to know the impact of retiring it.
Data Lineage enables you to view a graphical representation of all SnapLogic Pipelines that interact with a given data source, which can be a file, database, or blob. Looking at the destination, you can see where the data came from and how it changed over time. In today's business organizations, data governance is a crucial requirement for compliance. For example, if a business unit or company is undergoing an audit, those responding to the audit must demonstrate transparency in their record keeping. This process involves spending time and resources to track the relationships between data and processes. To save time, SnapLogic provides the Inspector feature, which tracks data lineage by visualizing how Pipelines connect to their data sources.
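To make the idea concrete, the following is a minimal sketch of how lineage can be modeled as a set of read and write relationships between Pipelines and data sources. It is not the SnapLogic implementation or API; the LineageEdge record, its field names, and the sample Pipeline and data source names are all hypothetical. The first function answers the decommissioning question (which Pipelines touch a given data store), and the second answers the audit question (where the data in a destination came from).

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class LineageEdge:
    """One observed interaction between a Pipeline and a data source.

    Hypothetical record shape, used only for illustration.
    """
    pipeline: str
    source: str
    mode: str  # "read" or "write"


def pipelines_touching(edges, source):
    """Return the sorted names of Pipelines that read from or write to a data source."""
    return sorted({e.pipeline for e in edges if e.source == source})


def upstream_sources(edges, destination):
    """Return every data source that feeds a destination, directly or indirectly."""
    reads = defaultdict(set)    # pipeline -> data sources it reads
    writers = defaultdict(set)  # data source -> pipelines that write to it
    for e in edges:
        if e.mode == "read":
            reads[e.pipeline].add(e.source)
        elif e.mode == "write":
            writers[e.source].add(e.pipeline)

    seen, stack = set(), [destination]
    while stack:
        current = stack.pop()
        # Walk backward: who writes here, and what do those Pipelines read?
        for pipeline in writers[current]:
            for src in reads[pipeline]:
                if src not in seen:
                    seen.add(src)
                    stack.append(src)
    return sorted(seen)


# Hypothetical sample data: two Pipelines chained through a warehouse table.
edges = [
    LineageEdge("load_orders", "s3://raw/orders.csv", "read"),
    LineageEdge("load_orders", "warehouse.orders", "write"),
    LineageEdge("build_report", "warehouse.orders", "read"),
    LineageEdge("build_report", "reports/daily.parquet", "write"),
]

print(pipelines_touching(edges, "warehouse.orders"))
print(upstream_sources(edges, "reports/daily.parquet"))
```

In this sketch, retiring warehouse.orders would affect both sample Pipelines, and tracing back from the report file reaches both the warehouse table and the raw file that feeds it.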
Prerequisite: You must be an Org-admin to access the Lineage subtab.
Supported Data Sources:
Spark SQL 2X
Displaying the Data Lineage Visualization
To visualize the data lineage of the input and output documents that flow through your Pipelines:
Go to Dashboard > Lineage.
On the Location bar, click the downward arrow to view the drop-down menu of the data types.
Select the time period or the start and end dates during which the Pipeline was executed.
Click + by the data type to open a list of the actual data sources, then select the target data source.
Alternatively, you can click on the data type label to populate the list with all the data sources under that data type.
Conversely, if the list is populated with the data sources, click the data type label to remove the data sources from the list.
Once the Locations list is populated, click Fetch. The visualization displays the intersections of your Pipelines with the selected data source.
Runtime ID—connector line
You can populate the Locations list with all the data sources from one data type by clicking the Data Type.
To remove data sources from the Locations list, click the x on the right side of the data type label.
To remove all data sources of one type from the Locations list, click the Data Type.
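The selection made in the steps above (a set of locations plus a time period) can be thought of as a filter over Pipeline execution records, with the matching runtime IDs grouped on each connector between a data source and a Pipeline. The sketch below illustrates that idea only; RunRecord, fetch_connectors, and every field name are assumptions made for the example and do not correspond to a SnapLogic API.

```python
from collections import defaultdict
from datetime import datetime
from typing import NamedTuple


class RunRecord(NamedTuple):
    """One Pipeline execution against one data source (hypothetical shape)."""
    runtime_id: str
    pipeline: str
    location: str
    executed_at: datetime


def fetch_connectors(runs, selected_locations, start, end):
    """Group runtime IDs by (location, pipeline) for runs executed within [start, end]."""
    connectors = defaultdict(list)
    for run in runs:
        in_window = start <= run.executed_at <= end
        if run.location in selected_locations and in_window:
            connectors[(run.location, run.pipeline)].append(run.runtime_id)
    return dict(connectors)


# Hypothetical sample data.
runs = [
    RunRecord("r-001", "load_orders", "warehouse.orders", datetime(2024, 1, 5)),
    RunRecord("r-002", "load_orders", "warehouse.orders", datetime(2024, 1, 6)),
    RunRecord("r-003", "build_report", "warehouse.orders", datetime(2024, 2, 1)),
    RunRecord("r-004", "build_report", "reports/daily.parquet", datetime(2024, 1, 6)),
]

result = fetch_connectors(
    runs,
    selected_locations={"warehouse.orders"},
    start=datetime(2024, 1, 1),
    end=datetime(2024, 1, 31),
)
print(result)
```

Each key in the result corresponds to one connector line in the visualization, and the list attached to it holds the runtime IDs of the executions that produced that connector.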
Displaying Pipeline Runtime IDs
To display the associated Runtime IDs, click the connector between the data source and the Pipeline.
Displaying Pipeline Information
Click a Pipeline to display its name, path, and last update.
To view the actual Pipeline, click Open in Designer.
The lineage graph also displays Pipelines that are in the recycle bin. However, Open in Designer is disabled for these deleted Pipelines.