AIA DQG IDQ Approach & Features v1.1
2018
Executive Summary
Information is the new currency, and data is the critical raw material for success.
Just as manufacturing plants assemble multiple parts on an assembly line to build a high-quality product, organizations assemble data to generate a multiplicity of information assets.
Just as defective parts can impact product quality and cost an organization billions of dollars, poor-quality data can contaminate information assets, driving up costs, jeopardizing customer relationships and causing imprecise forecasts and poor decisions.
Industry Trends
• Data quality problems are estimated to cost U.S. businesses more than $600 billion a year – TDWI
• Poor data quality is a primary reason for 40% of all business initiatives failing to achieve their targeted benefits – Gartner
(Graphic: Information, Knowledge, Insights)
The Need for Data Quality and the Prospective ROI
The need for efficient Data Quality:
• $3.1 trillion is IBM’s estimate of the yearly cost of poor-quality data in the US alone in 2016
• Almost half of companies (48 percent) do not have a plan for managing or improving data quality – The Data Warehousing Institute
• Nearly 40% of all company data is found to be inaccurate
Data Quality Drivers
Business Drivers
• Data residing in the various source systems often lacks completeness, accuracy, consistency and conformity, and suffers from duplication
• Inconsistent data standards/definitions across countries
• Handwritten and manually entered records at the point of operation lead to high error rates
IT Drivers
• Lack of centrally defined data quality rules and guidelines
• High volume of data and too many tables in a few of the schemas
• Presence of duplicate data in source systems, and performance issues in job executions that process huge volumes of data
Benefits of Data Quality
Existing Data Quality Challenges vs. Benefits of Enterprise Data Quality
• Challenge: Decay in data quality over time
  Benefit: Continuous improvement in data quality through data monitoring, stewardship and reporting of business-specific KPIs and SLAs
• Challenge: Lack of accurate forecasting due to incorrect data
  Benefit: Enterprise Data Quality helps in decision making by enabling accurate forecasts through high-quality data for all the key data entities of an organization
Data Quality – How Do We Approach It?
Data Quality Approach – Maturity Model
Unaware
• No understanding of Data Quality impact
Reactive
• Short-term approach focusing on identifying business problems
• Corrective actions providing measurable benefits
• Low-hanging fruit in terms of improving data quality through data standardization, cleansing, etc.
• Outcomes: Metadata rule book; Data Quality audit reports; list of attributes for profiling
Proactive
• Long-term approach focusing on process improvement and governance
• Preventive actions that show benefit in the longer term
• Takes a more holistic view involving people, process & technology
• Improves data quality and governance culture across the organization
• Outcomes: Data Quality cleansing rules and match rules; gold copy of cleansed data
Optimized and Governed
• DQaaS / COE
• Stopping ingestion of bad data
• Data Quality standards and guidelines that help the enterprise get:
  • A high-level understanding of DQ as a subject
  • A target operating model for the organization
  • A holistic approach involving People, Process, Technology: a people organization with proper accountability and data responsibilities
  • A defined process for DQ as a Service (DQaaS)
• Outcomes: Data Quality KPIs and SLAs; Data Quality dashboards; Data Quality Control Framework
Informatica Data Quality as the DQ Tool
Recommended DQ implementation Approach using IDQ
A multi-phase approach for assessing and improving data quality management (using IDQ). Its purpose is to assist enterprise organizations in analyzing as-is systems and pain points, performing gap analysis and proposing a blueprint in line with the client’s business & IT vision.
Tactical (using IDQ): Standardized Data → Data Quality Improvement/Automation → Ongoing Monitoring
• DQ Control Framework established
• Data Quality monitoring
• DQ processes standardized across the enterprise
Strategic (long term): Establishing enterprise-wide Data Quality Standards and enterprise-wide Data Governance
• Ensures that DG and DQ processes are in place for implementing Enterprise Data Quality
Benefits
• Proactively cleanse and monitor data for all applications and keep it clean
• Substantial savings in ongoing data quality maintenance effort for business users
• Enable the business to share in the responsibility for data quality and data governance
• Enhance IT productivity with powerful business-IT collaboration and a common data quality environment
IDQ – Key Features
Data Profiling Features & Benefits
Provide immediate insight into the basic quality of data and quickly expose potential areas of risk:
• Statistics to identify outliers and anomalies in data
• Value and pattern frequency to isolate inconsistent/dirty data or unexpected patterns
• Drill down into actual data values to inspect results across the entire data set, including potential duplicates
Features:
• End-to-end data profiling to discover the content, quality and structure of a data source:
  • Column profiling
  • Primary key profiling
  • Functional dependency profiling
  • Data domain discovery
  • Enterprise data discovery
• Customized business rules can be created and used during profiling
• Rule Builder allows business users to collaborate efficiently with developers in building complex business rules
• A rich GUI makes Rule Specifications easy to read
• Scorecards can be created to display the value frequency for columns in a profile
• Trend charts can be configured to view the history of scores over time
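The profiling types listed above can be illustrated with a small, tool-agnostic sketch. It is not how IDQ implements profiling: the function names (`profile_column`, `is_candidate_key`, `holds_dependency`) and the X/9 pattern notation are illustrative assumptions. It shows column statistics, value and pattern frequency, a primary-key check and a functional-dependency check over rows held as Python dictionaries.

```python
"""Minimal column-profiling sketch in plain Python (illustrative, not the IDQ API)."""
import re
from collections import Counter


def value_pattern(value: str) -> str:
    # Map letters to 'X' and digits to '9', so "AB-12" becomes "XX-99".
    pattern = re.sub(r"[A-Za-z]", "X", value)
    return re.sub(r"[0-9]", "9", pattern)


def profile_column(rows, column):
    # Column profile: null count, distinct count, value frequency and pattern frequency.
    values = [row.get(column) for row in rows]
    non_null = [v for v in values if v not in (None, "")]
    return {
        "null_count": len(values) - len(non_null),
        "distinct_count": len(set(non_null)),
        "value_frequency": Counter(non_null).most_common(),
        "pattern_frequency": Counter(value_pattern(str(v)) for v in non_null).most_common(),
    }


def is_candidate_key(rows, columns):
    # A column set qualifies as a key if no part is null and every combination is unique.
    keys = [tuple(row.get(c) for c in columns) for row in rows]
    return all(None not in k for k in keys) and len(set(keys)) == len(keys)


def holds_dependency(rows, determinant, dependent):
    # Functional dependency determinant -> dependent: each determinant value maps to one dependent value.
    seen = {}
    for row in rows:
        d, v = row.get(determinant), row.get(dependent)
        if seen.setdefault(d, v) != v:
            return False
    return True
```

On a customer extract, for example, `profile_column(rows, "phone")["pattern_frequency"]` would surface mixed formats such as `999-999-9999` next to `9999999999`, which is the kind of inconsistency that value and pattern frequency is meant to isolate.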
Scorecards and Trend Charts
Features & Benefits
Quantify the Quality of Data with Scorecards and Trend Charts
• Enables the business to “measure data fitness” against defined metrics before using the data in data-driven projects
• Critical for making good decisions about data quality improvement initiatives
• Trend charts allow the business to evaluate the progression and ROI of data quality programs
• Weighted scores on multiple metrics can help find root causes and the significant contributors to poor data quality scores
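To make the idea of weighted scores concrete, here is a minimal sketch of how a scorecard could combine several metrics and rank their contributions. The `Metric` structure, the percentage-of-valid-rows score and the "weighted shortfall" ranking are assumptions chosen for illustration; they are not the IDQ scorecard calculation.

```python
"""Weighted data quality scorecard sketch (illustrative, not the IDQ scorecard engine)."""
from dataclasses import dataclass


@dataclass
class Metric:
    name: str
    valid: int     # rows that passed the metric's rule
    total: int     # rows evaluated
    weight: float  # relative importance of the metric (assumed, set by the business)

    @property
    def score(self) -> float:
        # Percentage of valid rows; an empty evaluation scores 0 by assumption.
        return 100.0 * self.valid / self.total if self.total else 0.0


def weighted_score(metrics):
    # Weighted average of the individual metric scores.
    total_weight = sum(m.weight for m in metrics)
    return sum(m.score * m.weight for m in metrics) / total_weight


def largest_contributors(metrics):
    # Weighted shortfall from 100% shows which metrics drag the overall score down most.
    shortfall = {m.name: (100.0 - m.score) * m.weight for m in metrics}
    return sorted(shortfall.items(), key=lambda item: item[1], reverse=True)
```

With completeness at 90% (weight 3), validity at 80% (weight 1) and uniqueness at 100% (weight 1), the overall score is 90.0, yet completeness ranks as the largest contributor even though validity has the lower raw score. Storing `weighted_score` per run is one simple way to feed a trend chart.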
Reference Data Management
Features & Benefits
Enriching or Standardizing Data using Reference Data
• Enables the business to create and manage reference data
• Maintains audit trails to monitor changes to reference data objects
• Reference data objects are used to standardize and enrich source data during data quality operations
• The same reference data objects can be used across multiple data quality projects
• Reference data objects can be created from column profile values, patterns, flat files and database tables
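A minimal sketch of reference-data standardization with an audit trail is shown below. The `ReferenceTable` class, its case-insensitive matching and the tuple layout of the audit log are hypothetical design choices for illustration; IDQ reference tables are managed in the tool itself, not through this code.

```python
"""Reference-data standardization sketch with a simple audit trail (hypothetical design)."""
from datetime import datetime, timezone


class ReferenceTable:
    def __init__(self, name):
        self.name = name
        self._entries = {}   # normalized variant -> standard value
        self.audit_log = []  # (timestamp, user, action, variant, standard value)

    def upsert(self, variant, standard, user):
        # Every change is recorded so stewards can see who changed what, and when.
        key = variant.strip().lower()
        action = "update" if key in self._entries else "insert"
        self._entries[key] = standard
        self.audit_log.append((datetime.now(timezone.utc), user, action, key, standard))

    def standardize(self, value):
        # Return the standard value, or None so unmatched values can be reviewed.
        if value is None:
            return None
        return self._entries.get(value.strip().lower())


def standardize_column(rows, column, table):
    # Replace matched values in place and return the rows that had no match.
    unmatched = []
    for row in rows:
        standard = table.standardize(row.get(column))
        if standard is None:
            unmatched.append(row)
        else:
            row[column] = standard
    return unmatched
```

Because the same table object can be passed to any number of `standardize_column` calls, one curated list (for example, country names) can serve several projects, which mirrors the reuse benefit listed above.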
Rule Builder Features & Benefits
Define and Design Business Rules
• Enables business users to define the data requirements of a business rule as a reusable software object that can be run against the data to check its validity
• Allows business users to collaborate efficiently with developers in building complex business rules
• Rule Specifications define business rules as condition-action pairs, which are evaluated in a particular order to determine validity
• A rich GUI makes Rule Specifications easy to read
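The ordered condition-action idea behind Rule Specifications can be sketched in a few lines. This is an illustration of first-match evaluation only, not the IDQ Rule Specification format; the `Rule` type, the `evaluate` function and the insurance-policy rules are hypothetical examples.

```python
"""Ordered condition-action rule sketch (illustrative, not the IDQ Rule Specification format)."""
from typing import Callable, NamedTuple


class Rule(NamedTuple):
    condition: Callable[[dict], bool]
    action: str  # outcome returned when the condition holds


def evaluate(record, rules, default="VALID"):
    # Rules are checked in order; the first condition that holds decides the outcome.
    for rule in rules:
        if rule.condition(record):
            return rule.action
    return default


# Hypothetical policy rules; ISO-8601 date strings compare correctly as text.
policy_rules = [
    Rule(lambda r: not r.get("policy_id"), "INVALID: missing policy_id"),
    Rule(lambda r: (r.get("premium") or 0) <= 0, "INVALID: non-positive premium"),
    Rule(lambda r: r.get("end_date", "") < r.get("start_date", ""), "INVALID: end before start"),
]
```

Order matters: a record with both a missing policy ID and a negative premium is reported as `INVALID: missing policy_id`, because that pair is evaluated first.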
Data Quality Enhancement
Cleanse and Standardize Data, and Resolve Data Quality Issues
Features & Benefits
• Build rules and mapplets to address data quality issues
• Address validation corrects errors in addresses and completes partial addresses
• Reference data usage to enhance the DQ process
• Exception handling for manual review and correction
• Export mappings to PowerCenter for metadata reuse in physical data integration
• Web service consumer/provider for integration with any SOAP-based application
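The cleanse-then-review pattern behind exception handling can be sketched as follows: records are cleansed, and those that still fail validation are routed to a manual review queue together with a reason. The phone rule (10-digit North American numbers) and the `reason` field are assumptions for illustration; this is not the IDQ exception transformation.

```python
"""Cleanse-then-route sketch: records failing validation go to a manual review queue."""
import re


def cleanse_phone(raw):
    # Hypothetical rule: keep digits only and accept 10-digit North American numbers.
    digits = re.sub(r"\D", "", raw or "")
    if len(digits) == 11 and digits.startswith("1"):
        digits = digits[1:]  # drop the leading country code
    return digits if len(digits) == 10 else None


def process(records):
    clean, exceptions = [], []
    for record in records:
        phone = cleanse_phone(record.get("phone"))
        if phone is None:
            # Keep the original value plus a reason so a reviewer can correct it.
            exceptions.append({**record, "reason": "invalid phone"})
        else:
            clean.append({**record, "phone": phone})
    return clean, exceptions
```

Keeping the untouched original value in the exception record is a deliberate choice: the reviewer sees exactly what the source system supplied.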
Data Quality Mapping
• Address Validation: address validation and geocoding enrichment across 260 countries
• Standardize
• Parsing: parsing of unstructured data/text fields of all data types (customer/product/social/logs)
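Parsing a free-text field into structured attributes can be sketched with regular expressions. The patterns below are deliberately simplified assumptions (for example, the phone pattern would also match some date strings); they show the idea of token extraction plus residual text, not the IDQ Parser transformation.

```python
"""Sketch of parsing a free-text field into structured attributes with regular expressions."""
import re

# Simplified patterns assumed for illustration; production parsing needs stricter rules.
EMAIL = re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+")
PHONE = re.compile(r"\+?\d[\d\s().-]{7,}\d")


def parse_free_text(text):
    emails = EMAIL.findall(text)
    phones = [re.sub(r"\D", "", p) for p in PHONE.findall(text)]
    # Whatever remains after removing the extracted tokens is kept as residual text.
    residual = PHONE.sub("", EMAIL.sub("", text))
    return {"emails": emails, "phones": phones, "residual": " ".join(residual.split())}
```

For a note such as "Call John at 212-555-0101 or mail john.doe@example.com re: claim", the sketch extracts one email and one phone number and leaves "Call John at or mail re: claim" as residual text for further parsing.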
Data Quality Management across Multiple Project Streams for a Large Mutual Insurance Company in the Midwest USA
Client: A large mutual insurance company located in the Midwest of the United States that focuses on property, casualty and auto insurance
Technology: Informatica IDQ, Informatica Data Analyzer, Informatica Data Analyst
Highlights:
• Data profiling using IDQ
• Profiling & scorecard generation
• Business rule creation & validation
• Data quality testing
• Multiple projects under different programs for data quality
Our Capabilities in Data Quality & Governance (DQG)
DQG Pillars
• Data Quality: Data Maturity Analysis and Profiling; Data Cleansing; Data Quality Monitoring
• Data Governance: Data Governance Strategy and Rollout; Big Data Governance; Data Stewardship; Metadata Management
• Data Migration: ERP, CRM, SCM Data Migration; Cleanse and Standardize; Reconcile and Load
• 64+ Fortune 1000 clients
• 450+ resources globally
• 100+ engagements in DQ-DG
• 100+ business analysts, DQ, DG, Metadata Management and App Migration consultants, designers and integrators
Data Quality
• Data Maturity Analysis, Profiling
• Data Ingestion & Data Curation
• Data Quality Monitoring
Metadata Management
• Metadata Management Strategy & Rollout
• Metadata Ontology & Taxonomy
• Metadata Compliance and Monitoring
Application Migration
• T24, ERP, CRM, SCM Data Migration
• Cleanse/Standardize & Validate
• Transform, Load & Reconcile
Data Governance & Stewardship
• Data Governance Strategy & Rollout
• Big Data Governance
• Data Stewardship & Curation
Augmented Data Preparation for Analytics
• Automating Data Preparation using Machine Learning
• Visual environment for interactive preparation
• Push-down processing
Cognizant’s Informatica Data Quality Competency
• Global Partner
• 250+ IDQ resources
• 120+ IDQ resources certified by Informatica
Accelerators and Frameworks
• Potential Data Inconsistencies template
• Data Profiling Report – Graphical template
• Profiling Summary Report template
• Domain-specific Data Quality KPIs
• Rules Repository
• DQ Dashboard template
Data Quality Assessment & Implementation for a Leading Insurance Company in Japan
Delivery Capabilities
Performed data quality analysis, cleansing and validation of customer data using IDQ. Improved data quality and made the best version of master data available for downstream business applications.
Best Practices
• Naming standards for all objects should be clear and consistent within a repository.
• Quality should be baselined after analysis is done, before applying any improvement rules.
• If multiple phases of a project are being developed simultaneously in separate folders, it is possible to consolidate them by mapping folders appropriately through …
• After a deployment group is copied, the target repository can be validated to verify the legitimacy of objects and …
• Deployment groups allow the last deployment to be rolled back. This allows for a documented method for backing out a deployment if it …
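The baselining practice (measure first, improve second) can be sketched as a before/after comparison. Completeness is used as the only metric purely as an assumption to keep the example short, and `completeness` and `improvement_report` are hypothetical helpers, not IDQ functions.

```python
"""Sketch: record a quality baseline before applying improvement rules, then measure the change."""


def completeness(rows, column):
    # Share of rows with a non-empty value in the column (0.0 for an empty data set).
    if not rows:
        return 0.0
    filled = sum(1 for row in rows if row.get(column) not in (None, ""))
    return filled / len(rows)


def improvement_report(before_rows, after_rows, columns):
    # The baseline is taken from the data as analyzed, before any cleansing rule runs.
    baseline = {c: completeness(before_rows, c) for c in columns}
    current = {c: completeness(after_rows, c) for c in columns}
    return {
        c: {"baseline": baseline[c], "current": current[c], "delta": current[c] - baseline[c]}
        for c in columns
    }
```

Without the baseline, the `delta` figure could not be computed, which is why the baseline should be captured before any improvement rule is applied.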
Flow diagram (labels): “Is this data baselined?”; “Is standardization required?” (Yes / No); “To ETL Process”; “To DQ production turnover, ETL Processes”
Architecture diagram:
• Sources: unstructured data, social media, RFIDs, logs, streams
• Ingestion: Sqoop, INFA PC, Spark, Kafka
• Big Data Platform (Spark / MapReduce): Stage (RAW) → Transformation → Integration
• Data Quality Framework: Data Profiling & Scorecards, DQ Rules, Data Certification, Certified Data, Reject Audit
• Consumption: API layer, mobile, e-mail alerts, alert notifications, portal / intranet, MS Office, file transfer, email
• Other labels: Interactive, Speed
• Metadata Management: operational metadata, technical metadata, business glossary, data lineage, semantic search, data stewardship