Automating Traceability and Data Lineage on a Holistic Scale

Metadata Management, Data Quality and Data Lineage are essential capabilities for tracking data movement between data sources and destinations across different tools, environments, and technologies. Data Lineage answers questions that many organizations face today, driven by regulatory compliance, impact analysis, enhanced analytics and data quality initiatives.

Metadata lineage provides the means to merge business and technical requirements for traceability, lineage, and impact analysis across the enterprise. Building metadata lineage can be a long process, but the results can be impressive. Adaptive automates the harvesting and stitching of technical metadata for complete end-to-end data lineage analytics. This is achieved with the Adaptive Integrator Product Suite, which is included with the Adaptive Metadata Manager platform. Additionally, the Adaptive Integrator Product Suite can be implemented to feed third-party reporting technologies as a licensing option.

 

What is lineage?

  • Lineage describes how information moves through the different systems and databases that create, manage, and use data. Lineage analytics is complex and difficult for organizations to achieve. It often relates items of the same type (for example, Table to Table), but it can also relate items of different types (for example, Table to XML Element to Dimension), detailing how data is transformed as it moves through the various systems. Lineage matters because the downstream effect of a source data item can be determined, whether the effect is direct or indirect. Moreover, the data origin point can be located, and the source can be justified. Lineage can focus data quality efforts and enhanced analytics and help ensure their accuracy.
  • Lineage projects can be either “Business Driven” or “IT Driven”:
    • Business Driven lineage projects are linked to a business glossary, leveraging terms and applicable ontologies. They have heavy business involvement, typically include many more stakeholders, provide traceability from the Business Glossary to a report field and/or physical data element, and provide the same end-to-end lineage as a technical lineage project.
    • IT Driven lineage projects are technical lineage projects focused on implementing capabilities where the data needs to be understood. They normally have minimal business involvement and provide end-to-end lineage analytics, but have no references to the Business Glossary.
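
The two questions lineage answers (what is affected downstream, and where did the data originate) can be illustrated with a small graph walk. This is only a minimal sketch, not how the Adaptive platform implements lineage; the edge list and every item name in it are hypothetical and chosen to mirror the Table to XML Element to Dimension example.

```python
from collections import defaultdict, deque

# Hypothetical lineage edges (source item -> target item); names are illustrative only.
EDGES = [
    ("crm.customer", "staging.customer.xml/Customer"),   # Table -> XML Element
    ("staging.customer.xml/Customer", "dw.dim_customer"),  # XML Element -> Dimension
    ("dw.dim_customer", "report.customer_churn"),
    ("erp.orders", "dw.fact_orders"),
    ("dw.fact_orders", "report.customer_churn"),
]

def build_index(edges):
    """Index the edges in both directions for downstream and upstream walks."""
    downstream, upstream = defaultdict(set), defaultdict(set)
    for src, tgt in edges:
        downstream[src].add(tgt)
        upstream[tgt].add(src)
    return downstream, upstream

def reachable(start, neighbors):
    """Breadth-first walk: every item reachable from start, directly or indirectly."""
    seen, queue = set(), deque([start])
    while queue:
        node = queue.popleft()
        for nxt in neighbors.get(node, ()):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return seen

def impact(item, edges=EDGES):
    """Items affected, directly or indirectly, by a change to item."""
    downstream, _ = build_index(edges)
    return reachable(item, downstream)

def origins(item, edges=EDGES):
    """Upstream items that have no source of their own (the data origin points)."""
    _, upstream = build_index(edges)
    return {a for a in reachable(item, upstream) if not upstream.get(a)}
```

In this sketch, impact("crm.customer") returns the XML element, the dimension and the report, while origins("report.customer_churn") returns the two source tables.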

 

Lineage Phases

    1. Planning and inventory of metadata
    2. Planning and implementing the platform configuration
    3. Validating and populating the business glossary
    4. Populating the design models and mapping to the business glossary
    5. Populating the actual database schema and mapping to the physical design models
    6. Importing the tool metadata and stitching to the actual database schema
    7. Keeping the information up to date (evergreening/change management)

 

End-to-end Data Lineage Project Benefits

  • One stop metadata access provides users the ability to access and govern their metadata in a consistent manner providing the following benefits:
    • Accurate documentation of metadata is provided, governed and maintained on a holistic scale
    • Importing tool centric metadata into the Adaptive Metadata Manager enables validity and quality of the data in a governed and audited fashion
    • No need for users to have different tools to view tool metadata
  • Merges silos of data sources
    • Roles can access the applicable metadata per their access rights and applicable use cases leveraging powerful search and collaboration capabilities
    • Understanding of gaps and inconsistencies when silos are semantically connected
  • Sharing metadata where it matters
    • Users only need a web browser for access per their security profiles
    • Applicable best practices leveraging graphs, reports and workflows
    • Audit trails and collaborative communications can be leveraged for proper reuse and best practices
  • Enterprise-wide metadata governance
    • Being able to govern metadata for the enterprise on a holistic scale where applicable stakeholders are involved in the entire process
  • Regulatory mandates and compliance requirements are accurately represented including controls and actions to ensure proper governance is actionable and documented
    • Lineage and traceability of the metadata is an essential building block for compliance, analytics and best practices
    • Reduced time for preparing regulatory documents

 

Populating the Repository

Automating the capture and linking of different sources of metadata ensures accuracy and enhanced performance across the enterprise. The metadata inventory captures the metadata types, configuration information, transformation information and the development life cycle frequencies. Information on file formats is also captured, including file locations, source control locations, passwords and hostnames.

 

  • Order of import
    • Top Down order
      • Business metadata
      • Design models
      • Actual database
      • Tool metadata for ETL
      • Tool/cloud-based metadata for Business Intelligence
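
The top-down order can be expressed as a simple sort over the metadata inventory, so that each layer is loaded before the layer that maps to it. The sketch below is an assumption-laden illustration: the layer labels and the inventory entries are hypothetical and are not an Adaptive configuration format.

```python
# Hypothetical layer labels, listed in the top-down import order described above.
IMPORT_ORDER = ["business", "design_model", "database", "etl", "bi"]

def plan_imports(inventory):
    """Return inventory entries sorted top-down; reject entries with an unknown layer."""
    rank = {layer: i for i, layer in enumerate(IMPORT_ORDER)}
    unknown = [item["name"] for item in inventory if item["layer"] not in rank]
    if unknown:
        raise ValueError(f"unknown layer for: {', '.join(unknown)}")
    # sorted() is stable, so entries within the same layer keep their inventory order.
    return sorted(inventory, key=lambda item: rank[item["layer"]])

# Illustrative inventory; the names are made up.
INVENTORY = [
    {"name": "sales_reports", "layer": "bi"},
    {"name": "warehouse_schema", "layer": "database"},
    {"name": "business_glossary", "layer": "business"},
    {"name": "nightly_load_jobs", "layer": "etl"},
    {"name": "logical_model", "layer": "design_model"},
]

plan = [item["name"] for item in plan_imports(INVENTORY)]
```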

 

Value of the Adaptive Platform

  • Use the semantically connected content to address:
    • Regulatory and compliance objectives
    • Tracking and governing metadata usage
    • Enhanced analytics leveraging AI and machine learning

 

 

Establishing Lineage

The Adaptive Integrator Suite of Bridges automates the collection of inventory from various toolsets and applications. This project task populates the repository from the metadata tools that use the databases in the inventory. The tools could be ETL, Business Intelligence (BI), reporting, application source code, cloud-based and Big Data technologies. Big Data lineage is typically imported from the technology responsible for the data movement (such as Informatica PowerCenter and other data warehousing tools, or BI and reporting tools such as Microsoft Reporting Services). Other options are available if the movement technology being used is not integrated (e.g., a bespoke program). For instance, an Excel workbook that details the lineage mapping can be imported, or XML files can be transformed to provide the lineage mapping.
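
As an illustration of the workbook option, the sketch below reads a lineage mapping that has been saved from a spreadsheet as CSV. The column names (source_item, target_item, transformation) and the sample rows are assumptions made for this example; they are not a format defined by Adaptive.

```python
import csv
import io

# Hypothetical mapping sheet exported as CSV; column names are assumptions.
SAMPLE = """source_item,target_item,transformation
crm.customer.first_name,dw.dim_customer.full_name,CONCAT with last_name
crm.customer.last_name,dw.dim_customer.full_name,CONCAT with first_name
erp.orders.amount,dw.fact_orders.amount_usd,currency conversion
"""

def load_mappings(text):
    """Parse mapping rows into (source, target, transformation) tuples."""
    rows = []
    # Data starts on line 2 because line 1 is the header row.
    for line_no, row in enumerate(csv.DictReader(io.StringIO(text)), start=2):
        src = (row.get("source_item") or "").strip()
        tgt = (row.get("target_item") or "").strip()
        if not src or not tgt:
            raise ValueError(f"line {line_no}: source_item and target_item are required")
        rows.append((src, tgt, (row.get("transformation") or "").strip()))
    return rows

mappings = load_mappings(SAMPLE)
```

Rejecting rows with a missing source or target keeps incomplete mappings out of the repository instead of silently producing broken lineage links.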

 

Automating Stitching and Establishing Lineage

Automated stitching is a process that joins standalone lineage into end-to-end lineage. For this project task, a stitching operation is performed to link the tool metadata to the source and target metadata. Sources and targets are usually relational tables, views, record files, or XML schemas.
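
One way to picture stitching is as name resolution: each source and target referenced by a tool is matched against the schema objects already harvested into the repository. The sketch below assumes matching on a normalized name, which is a simplification chosen for illustration and not a description of Adaptive's stitching logic; the catalog, flows and job names are hypothetical.

```python
def normalize(name):
    """Lower-case and strip quote characters so 'DW."DIM_CUSTOMER"' matches 'dw.dim_customer'."""
    return "".join(ch for ch in name.lower() if ch not in '"[]`')

def stitch(tool_flows, catalog):
    """Link tool flows to harvested schema objects; return (stitched, unresolved)."""
    index = {normalize(obj): obj for obj in catalog}
    stitched, unresolved = [], []
    for flow in tool_flows:
        src = index.get(normalize(flow["source"]))
        tgt = index.get(normalize(flow["target"]))
        if src and tgt:
            stitched.append((src, flow["job"], tgt))
        else:
            # Keep unmatched flows for review instead of dropping them silently.
            unresolved.append(flow)
    return stitched, unresolved

# Hypothetical harvested schema objects and ETL tool flows.
CATALOG = ["crm.customer", "dw.dim_customer", "dw.fact_orders"]
FLOWS = [
    {"job": "load_dim_customer", "source": "CRM.Customer", "target": 'DW."DIM_CUSTOMER"'},
    {"job": "load_fact_orders", "source": "erp.orders", "target": "dw.fact_orders"},
]

stitched, unresolved = stitch(FLOWS, CATALOG)
```

Here the first flow is stitched to both of its tables, while the second is reported as unresolved because erp.orders has not been harvested, which points to a gap in the inventory.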

 

Using the Metadata

  • Advantages for business users
    • Access to metadata via views and reports that business users can easily understand
    • Reduced complexity through a one stop look-up of business-related metadata, including the location of the golden sources of data
  • Advantages for technical users
    • The inverse relationships exist
    • Access to business metadata via technical metadata that technical users understand
    • Opens the business view to technical users
  • Provides a feedback mechanism for users via the Adaptive Platform to comment and provide additional metadata information to applicable stakeholders

 

Summary

  • Automated end-to-end lineage of metadata from different metadata tools provides enhanced analytics, reuse and governance
  • Start with a 30-90 day scoping project leveraging 2-3 sources of data
  • Document everything through the process
  • Get stakeholders involved including executive sponsorship
  • Keep everyone informed
  • Benefits
    • Metadata information can be easily connected with an automated approach
    • Enterprise collaboration
    • One stop metadata access – metadata from different tools in one place
    • Merges silos for holistic understanding
    • Sharing metadata where it matters to the right stakeholder at the right time
    • Enterprise-wide metadata governance
    • Regulatory and compliance requirements can be addressed, governed and reported properly