AIA DQG IDQ Approach & Features v1.1


Data Quality

Using Informatica Data Quality

2018

Cognizant’s CDB - AI and Analytics – DQG Practice


Data Quality – Why is it Important?
Executive Summary

Information is the new currency. Just as a manufacturing plant assembles multiple parts on an assembly line to build a high-quality product, an organization assembles data to generate a multiplicity of information assets.

Data is the critical raw material for success. Just as defective parts can impact product quality and cost organizations billions of dollars, poor-quality data can contaminate information assets, driving up costs, jeopardizing customer relationships, and causing imprecise forecasts and poor decisions.

Industry Trends

• Data quality problems are estimated to cost U.S. businesses more than $600 billion a year – TDWI
• Poor data quality is a primary reason why 40% of all business initiatives fail to achieve their targeted benefits – Gartner
• 40% of projects are delayed due to data quality issues

The Need for Data Quality and the Prospective RoI

The need for efficient data quality:
• $3.1 trillion: IBM's estimate of the yearly cost of poor-quality data in the US alone, in 2016
• Almost half of companies (48 percent) do not have a plan for managing or improving data quality – Data Warehousing Institute
• Nearly 40% of all company data is found to be inaccurate

The impact of data quality on productivity:

• 70% of the annual productivity growth of 2.7%, i.e. 1.9%, comes from IT – Bureau of Labor Statistics
• A 1.9% increase in labor productivity from IT is worth approximately $21.7 million for the average enterprise
• Losses: $2.2 million if poor DQ affects 20% of those savings
• Per Gartner, data quality affects overall labor productivity by as much as 20%

Efficiency of operations increases with improved processes and data quality:

• Organizations on average generated 2%-5% increased revenue from sales
• 10-15% reduction in data management & maintenance costs
• 12% improvement in revenue per sales representative
• 15-20% improvement in campaign response rate

Efficiency of IT operations, resulting in greater agility of business models:

• 15% improvement in time to market for new launches
• 20% reduction in credit risk cost
• 20% reduction in integration costs
• 15% reduction in reporting costs
• 15-20% improvement in cross-sell effectiveness

Data Quality Drivers

Business Drivers
• Data residing in the various source systems often lacks completeness, accuracy, consistency and conformity, and suffers from issues of duplication
• Lack of segmentation information, leading to ineffective marketing campaigns
• Presence of wrong identifiers and duplicate data in source systems, which leads to inaccurate customer targeting
• Inconsistent data standards/definitions across countries
• Handwritten and manually entered records at the point of operation, leading to high error rates

IT Drivers
• Lack of centrally defined data quality rules and guidelines
• Incorrect implementation of custom standardization rules, leading to poor data quality
• Lack of an operational tool to monitor data incidents on a regular basis
• High volume of data and too many tables in a few of the schemas
• Presence of duplicate data in source systems, and performance issues in job executions related to processing huge volumes of data

Benefits of Data Quality

Existing Data Quality Challenges and the corresponding Benefits of Enterprise Data Quality:

• Challenge: Insufficient data quality of inbound data from source systems and data loads
  Benefit: Maintaining data quality of inbound data by data cleansing, standardization and de-duplication before mastering

• Challenge: Ineffective reporting due to lack of completeness, validity and integrity of data
  Benefit: Increased effectiveness of reporting due to higher data quality standards, with the implementation of specialized DQ tools

• Challenge: Different departments having different levels of data quality standards and quality acceptance
  Benefit: Common data quality standards applied across the enterprise by centralized data governance and quality processes

• Challenge: Decay in data quality over time
  Benefit: Continuous improvement in data quality through data monitoring, stewardship and reporting of business-specific KPIs and SLAs

• Challenge: Lack of accurate forecasting due to incorrect data
  Benefit: Enterprise Data Quality helps in decision making by enabling accurate forecasts through high-quality data for all the key data entities of an organization

Data Quality – How do we Approach it?
Data Quality Approach – Maturity Model

Unaware
• No understanding of data quality impact

Reactive
• Corrective actions providing measurable benefits
• Low-hanging fruit: improving data quality through data standardization, cleansing etc.
• Short-term approach focused on identifying business problems

Proactive
• Preventive actions that show benefit in the longer term
• Long-term approach focused on process improvement and governance
• Takes a more holistic view involving people, process & technology
• Improves data quality and governance culture across the organization

Optimized and Governed
• DQaaS / CoE
• Stopping ingestion of bad data
• Improved data management processes
• Critical data rules – market disruptive
• Policies, best practices & standards
• Data quality standards and guidelines that help the enterprise get:
  - A high-level understanding of DQ as a subject
  - A target operating model for the organization
  - A holistic approach involving People, Process, Technology: a people organization with proper accountability and data responsibilities, and a defined process for DQ as a Service (DQaaS)

Outcomes by stage
• Reactive: metadata rule book, data quality audit reports, list of attributes for profiling
• Proactive: data quality cleansing rules and match rules, gold copy of cleansed data
• Optimized and Governed: data quality KPIs and SLAs, data quality dashboards, data quality control framework
Informatica Data Quality as the DQ Tool
Recommended DQ Implementation Approach using IDQ

A multi-phase approach for assessing and improving data quality management using IDQ. The purpose is to assist enterprise organizations in analyzing as-is systems and pain points, performing gap analysis, and proposing a blueprint in line with the client's business & IT vision.

Short Term – Tactical (DQ maturity: Unaware → Reactive)
• Data profiling & impact assessment
• Data quality analysis using IDQ
• Data quality improvement/automation: cleansed & standardized data using IDQ
• DQ control framework established

Long Term – Strategic (DQ maturity: Proactive → Optimized & Governed)
• Ongoing monitoring: data quality monitoring; standardized DQ processes across the enterprise
• Establishing enterprise-wide data quality standards
• Establishing enterprise-wide data governance, to ensure that DG and DQ processes are in place for implementing Enterprise Data Quality
IDQ Features & Benefits

Informatica Data Quality Features

Data Profiling
• Assess data for anomalies and inconsistencies
• Build metrics and scorecards
• Build rules for profiling
• Trend analysis on DQ metrics

Reference Data Management
• Easy maintenance of enterprise-wide reference data
• Audit trail for capturing changes to the LOV list

Data Quality Enhancement
• Data cleansing & enrichment through mapplets and rules
• Address standardization
• Publish DQ rules as web services

De-Duplication
• Configure probabilistic and deterministic match rules
• Creation of clusters
• Consolidation of match candidates

Exception Handling
• Data stewardship
• Manual data correction
• Manual consolidation of duplicate data
• Audit mechanism for exception handling

Monitoring
• Configuration of dashboards and reports for continuous DQ monitoring
• Reactive and proactive monitoring capability

Benefits

• Proactively cleanse and monitor data for all applications and keep it clean
• Large savings on ongoing data quality maintenance by business users
• Enable the business to share in the responsibility for data quality and data governance
• Enhance IT productivity with powerful business-IT collaboration and a common data quality environment
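As a rough illustration of what continuous DQ monitoring does, the sketch below recomputes a metric on each load and raises an alert when it breaches an SLA threshold. The metric, threshold values, and alert hook are illustrative stand-ins, not IDQ configuration.

```python
# Minimal sketch of proactive DQ monitoring: recompute a metric per load
# and record an alert when it falls below an SLA threshold.
# Metric name, threshold, and alert mechanism are all illustrative.
def completeness(values):
    """Share of non-null, non-empty values, as a percentage."""
    filled = sum(1 for v in values if v not in (None, ""))
    return 100.0 * filled / len(values) if values else 0.0

def check_sla(metric_name, value, threshold, alerts):
    """Append an alert message when a metric breaches its SLA."""
    if value < threshold:
        alerts.append(f"{metric_name} at {value:.1f}% breached SLA of {threshold}%")

alerts = []
check_sla("customer_email completeness",
          completeness(["a@x.com", "", "b@y.com", None]), 95.0, alerts)
print(alerts)  # ['customer_email completeness at 50.0% breached SLA of 95.0%']
```

In a real deployment the alert would feed a dashboard or notification channel rather than a list.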
IDQ – Key Features
Data Profiling Features & Benefits

Provides immediate insight into the basic quality of data and quickly exposes potential areas of risk:
• End-to-end data profiling to discover the content, quality, and structure of a data source: column profiling, primary key profiling, functional dependency profiling, data domain discovery, enterprise data discovery
• Stats identify outliers and anomalies in the data; value and pattern frequency isolate inconsistent/dirty data and unexpected patterns
• Customized business rules can be created and used during profiling
• The rule builder allows business users to efficiently collaborate with developers on building complex business rules
• A rich GUI enables easy readability of rule specifications
• Scorecards can be created to display the value frequency for columns in a profile
• Trend charts can be configured to view the history of scores over time
• Drill down into actual data values to inspect results across the entire data set, including potential duplicates
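To make the value-frequency and pattern-frequency idea concrete, here is a minimal column-profiling sketch in plain Python. It is not IDQ code, just the same statistics computed by hand: letters collapse to "A" and digits to "9" so non-conforming formats stand out.

```python
import re
from collections import Counter

def profile_column(values):
    """Compute basic column-profile stats: null count, distinct count,
    value frequency, and character-pattern frequency (an illustrative
    stand-in for tool-based column profiling)."""
    non_null = [v for v in values if v not in (None, "")]

    def pattern(v):
        # Reduce each value to its shape: digits -> 9, letters -> A
        return re.sub(r"[A-Za-z]", "A", re.sub(r"\d", "9", v))

    return {
        "count": len(values),
        "nulls": len(values) - len(non_null),
        "distinct": len(set(non_null)),
        "value_freq": Counter(non_null),
        "pattern_freq": Counter(pattern(v) for v in non_null),
    }

phones = ["555-0100", "555-0101", "5550102", None, "555-0100"]
stats = profile_column(phones)
# Pattern frequency isolates the non-conforming value "5550102"
print(stats["pattern_freq"])  # Counter({'999-9999': 3, '9999999': 1})
print(stats["nulls"])         # 1
```

A profiler surfaces exactly this kind of summary per column; the dominant pattern suggests the expected format, and the minority patterns are the dirty-data candidates.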
Scorecards and Trend Charts – Features & Benefits

Quantify the quality of data with scorecards and trend charts:
• Enables the business to "measure the data-fitness" against defined metrics before using data in data-driven projects
• Critical for making good decisions about data quality improvement initiatives
• Trend charts allow the business to evaluate the progression and ROI of data quality programs
• Weighted scores on multiple metrics can help find root causes and significant contributors to poor data quality scores
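The weighted-score idea can be sketched in a few lines. The metric names and weights below are illustrative, not a prescribed scorecard configuration:

```python
def weighted_score(metric_scores, weights):
    """Roll individual DQ metric scores (0-100) into a single weighted
    scorecard value. A heavily weighted weak metric drags the overall
    score down, pointing at the significant contributor."""
    total_weight = sum(weights.values())
    return sum(metric_scores[m] * w for m, w in weights.items()) / total_weight

scores  = {"completeness": 98.0, "validity": 85.0, "uniqueness": 70.0}
weights = {"completeness": 1.0, "validity": 2.0, "uniqueness": 3.0}
print(round(weighted_score(scores, weights), 1))  # 79.7
```

Tracking this single number per run is what produces the trend chart; drilling back into the per-metric scores is what exposes the root cause (here, uniqueness).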
Reference Data Management – Features & Benefits

Enrich or standardize data using reference data:
• Enables the business to create and manage reference data
• Maintains audit trails to monitor changes to reference data objects
• Reference data objects standardize and enrich source data during data quality operations
• The same reference data objects can be reused across multiple data quality projects
• Reference data objects can be created from column profile values, patterns, flat files, and database tables
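Conceptually, standardizing against a reference data object is a lookup from observed variants to a canonical value. The sketch below uses a made-up synonym table for US state codes; a real reference data object would be managed in the tool with its own audit trail.

```python
# Illustrative reference data object: observed variants -> standard value.
# The entries are hypothetical, not a shipped reference table.
STATE_REF = {
    "calif.": "CA", "california": "CA", "ca": "CA",
    "n.y.": "NY", "new york": "NY", "ny": "NY",
}

def standardize(value, reference):
    """Replace a value with its standard form when the reference table
    knows it; leave unknown values untouched for later review."""
    return reference.get(value.strip().lower(), value)

print([standardize(v, STATE_REF) for v in ["Calif.", "new york", "Texas"]])
# ['CA', 'NY', 'Texas']
```

Because the lookup table is a separate object, the same table can be reused across projects, which is the reuse benefit the slide describes.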
Rule Builder Features & Benefits

Define and design business rules:
• Enables business users to define the data requirements of a business rule as a reusable software object that can be run against the data to check its validity
• Allows business users to efficiently collaborate with developers on building complex business rules
• Rule specifications define condition-action pairs that can be evaluated in a particular order for validity
• A rich GUI enables easy readability of rule specifications
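The condition-action-pair model can be sketched as an ordered list of predicates, evaluated top to bottom until one fires. The rule contents below are illustrative, not IDQ rule-specification syntax:

```python
# A rule specification as ordered condition-action pairs: conditions are
# tested in order and the first match determines the verdict.
# These example rules and field names are hypothetical.
rules = [
    (lambda r: r.get("email") in (None, ""), "invalid: email missing"),
    (lambda r: "@" not in r["email"],        "invalid: malformed email"),
    (lambda r: True,                         "valid"),  # catch-all
]

def evaluate(record, rules):
    """Run a record through the ordered rule specification."""
    for condition, action in rules:
        if condition(record):
            return action

print(evaluate({"email": "a@b.com"}, rules))     # valid
print(evaluate({"email": "no-at-sign"}, rules))  # invalid: malformed email
```

Ordering matters: the missing-email check must precede the format check, which is exactly why rule specifications are evaluated "in a particular order."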
Data Quality Enhancement

Cleanse and standardize data, and resolve data quality issues:
• Build rules and mapplets to address data quality issues
• Address validation corrects errors in addresses and completes partial addresses
• Reference data usage for enhancing the DQ process
• Exception handling for manual review and correction
• Export mappings to PowerCenter for metadata reuse in physical data integration
• Web service consumers/providers for integration with any SOAP-based application
Data Quality Mapping (diagram)
• Address validation and geocoding enrichment across 260 countries
• Standardization and reference data management
• Parsing of unstructured data/text fields of all data types (customer/product/social/logs)
• DQ logic pushed down to run natively or on Hadoop

De-Duplication Features & Benefits

Identify duplicates and consolidate:
• Customizable match rules
• Support for both fuzzy and exact match rules
• Duplicate analysis and consolidation of source data
• Identity matching capability using population files
• Auto-merging of data based on customizable de-duplication rules
• Manual merging/unmerging for data with a low match score
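The split between auto-merge and manual review comes down to match-score thresholds. The sketch below uses Python's `difflib` as a stand-in for a probabilistic match rule (real engines score weighted combinations of several fields); the thresholds are illustrative:

```python
from difflib import SequenceMatcher

def match_score(a, b):
    """Fuzzy similarity between two strings on a 0-100 scale.
    A stand-in for a probabilistic (fuzzy) match rule."""
    return 100 * SequenceMatcher(None, a.lower(), b.lower()).ratio()

def classify_pairs(records, auto_merge=90, review=70):
    """Compare all candidate pairs: scores above `auto_merge` are merged
    automatically; scores in the grey zone are routed to manual review.
    Thresholds here are illustrative, not recommended settings."""
    results = []
    for i in range(len(records)):
        for j in range(i + 1, len(records)):
            score = match_score(records[i], records[j])
            if score >= auto_merge:
                results.append((records[i], records[j], "auto-merge"))
            elif score >= review:
                results.append((records[i], records[j], "manual review"))
    return results

names = ["John A. Smith", "john a smith", "Jon Smith", "Mary Jones"]
for a, b, verdict in classify_pairs(names):
    print(f"{a!r} vs {b!r}: {verdict}")
```

Pairs scoring below the review threshold are treated as distinct records; the manual-review bucket is where the data steward's merge/unmerge decisions apply.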
IDQ – Case Study
Data Quality Management across multiple project streams for a large mutual insurance company in the Midwest USA

Client: a large mutual insurance company focused on property, casualty and auto insurance, located in the Midwest of the United States.
Technology: Informatica IDQ, Informatica Data Analyzer, Informatica Data Analyst.
Highlights: data profiling using IDQ; profiling & scorecard generation; business rule creation & validation; data quality testing; multiple projects under different programs for data quality.

Situation
• Multiple project streams across Financial Regulatory Compliance, Product Line Performance / Enterprise Analytics, Sales / Customer-Facing Analytics, Commercial Lines, Underwriting, etc., each with varied data quality challenges
• Data quality efforts were minimal and siloed, often done on an ad-hoc basis, with results available only to the individual department
• Minimal visibility into defects caused by bad data quality
• Critical need to support a state-wide roll-out of new-age systems and ensure the data quality was suitable for go-live
• The data quality rules used to test the data were primarily at L0 or L1 levels for most attributes and needed to be enhanced

Solution
• Data quality implementation involving profiling and scorecard generation
• Data profiling for non-critical data elements to support the business
• DQ testing across 5 dimensions of data quality
• DQ business rule creation for each element
• Built a rule library covering cross-business lines to ensure proper standards are followed
• Technology stack: Informatica IDQ, Informatica Data Analyzer
• Duration: multi-year program; resources: 2 onsite, 8 offshore; peak team size of 15 members across onshore and offshore

Satisfaction
• Supported data quality management across multiple project streams
• Supported the state-wide roll-out of new-age systems
• Better visibility of data defects upfront reduced the overall time to go-live and time to market
• Helped in planning project charters through the availability of profiling activities upfront
• Reduced the scope of errors reaching the production environment
• Best practices and standards documents made available for use by multiple departments in their data quality work
• Work done for one track was leveraged across other areas for commonly used critical data elements, reducing overall testing time
Thank You

Cognizant Capabilities & Service Offerings
Our Capabilities in Data Quality & Governance (DQG)

Credentials: 64+ Fortune 1000 clients; 450+ engagements in DQ-DG globally; 100+ resources (business analysts, consultants, designers and integrators); 100+ DQ, DG, metadata management and app migration engagements.

DQG Pillars
• Data Quality: data maturity analysis and profiling; data cleansing; data quality monitoring
• Data Governance and Metadata Management: data governance strategy and rollout; big data governance; data stewardship; metadata management
• Application Migration: ERP, CRM, SCM data migration; cleanse and standardize; reconcile and load

Tool Coverage
• Data Quality: Informatica DQ, IBM QualityStage, Talend, IBM Infosphere, SAP BOIS/DS, Oracle EDQ, MS DQS, Trillium
• Data Governance: Collibra
• Metadata Management: Informatica MM, IBM Infosphere Information Governance Catalogue, Ab Initio Metadata Hub, Collibra, ASG Rochade

Representative Experience
Engagements span big data quality checks, DQG CoEs, DQ as a Service, big DQ implementations, and DG implementation using Collibra, for clients including the largest global providers of insurance, annuities & employee benefit programs; a large mutual insurance company in the Midwest USA; a multinational automobile manufacturer; a global online payment services company; and the largest asset manager for insurance.

Assets: DQ Accelerator framework, Rules Repository, DQ Analyzer

Awards
• Recipient of the Informatica Innovation Award for "Delivering trusted information to drive data governance, reduce risk using Cognizant's EDC tool to address business concerns around DQ"
• Informatica Partner of the Year: 2012, 2013, 2014, 2015, 2016 (Platinum partner)
Data Quality and Governance Service Offerings

Offering areas: Data Quality and Data Governance Strategy; Data Quality Implementation; Data Governance Implementation

• Data Quality: data maturity analysis & profiling; data ingestion & curation; data quality monitoring
• Metadata Management: metadata management strategy & rollout; metadata ontology & taxonomy; metadata compliance and monitoring
• Application Migration: T24, ERP, CRM, SCM data migration; cleanse/standardize & validate; transform, load & reconcile
• Data Governance & Stewardship: data governance strategy & rollout; big data governance; data stewardship & curation
• Augmented Data Preparation for Analytics: automating data preparation using machine learning; visual environment for interactive preparation; push-down processing
Cognizant's Informatica Data Quality Competency

Global partner credentials
• 250+ IDQ resources; 120+ IDQ resources certified by Informatica; 30+ data quality architects
• 50+ engagements; IDQ expertise across versions 8.x, 9.x and 10.x, including big data editions / big data management
• Delivery capabilities: profiling, cleansing & standardizing data; integration of INFA IDQ with Informatica PowerCenter
• Recipient of the Informatica Innovation Award for "Delivering trusted information to drive data governance, reduce risk using Cognizant's EDC tool to address business concerns around DQ"; Informatica Partner of the Year: 2012, 2013, 2014, 2015 & 2016

Accelerators and frameworks: potential data inconsistencies template, domain-specific data quality KPIs, data profiling report / profiling summary report template, rules repository, DQ dashboard template, graphical template

Representative Experience

Implementation of Informatica Data Quality for a large healthcare company
Implemented IDQ for cleansing & matching processes. Created and validated business rules. Migrated data from 8 legacy systems. Built a data stewardship application to support operational activities.

Data quality assessment & implementation for a leading insurance company in Japan
Performed data quality analysis, cleansing and validation of customer data using IDQ. Improved data quality and made best-version master data available for downstream business applications.

Customer data cleansing, standardization and de-duplication for a banking major
Mapped local attributes to the global standards data model. Performed business rule validation, data cleansing and standardization using IDQ. Created a repeatable, auditable process to facilitate data quality improvements on the source system and the development of strategic solutions.

Data quality for one of the largest automobile manufacturers
Explored the data through DQ profiles (along with customized domain-specific rules) and implemented a scorecarding mechanism to quantify data health using IDQ.
IDQ Best Practices

Development best practices
• Maintain proper version control of code
• Keep naming standards for all objects clear and consistent within a repository
• Maintain proper code indentation (with spaces and punctuation)
• Apply reference tables in a context-sensitive manner
• Consider a proper strategy before applying match transformations
• Baseline quality after analysis is done, before applying any improvement rules
• Take special care with sizing considerations for the repository

Deployment best practices
• If multiple phases of a project are being developed simultaneously in separate folders, they can be consolidated by mapping folders appropriately through the deployment group migration wizard
• After a deployment group is copied, the target repository can be validated to verify the legitimacy of objects and dependent objects
• Deployment groups allow the last deployment to be rolled back, providing a documented method for backing out a deployment if necessary


High-Level DQ Process (diagram)

Sources to be baselined (internally sourced files and messages, change data capture, log-centric data, partner data, reference data) pass through initial profiling, cleansing & standardizing, and baselining, with the resulting rules and code turned over to production. At ingestion, data that does not require standardization flows directly to the ETL process; otherwise the operational DQ process (backed by a DQ cache) cleanses it, delivering high-quality data to ETL and feeding DQ operations scorecards. Exceptions are classified (market disruptions, procedural exceptions and issues, transformation efforts) and communicated as alerts and DQ issues & opportunities. Reference data and standardization rules are maintained alongside a DQ mart, and the governance council engages the supplier of the message or data file for data fixes and compliance.
Enterprise Architecture (diagram)

• Data sources: structured (Oracle, DB2, flat files) and unstructured (e-mail, social media, RFIDs, logs, streams, MS Office)
• Data ingestion: batch (Informatica PowerCenter, INFA BDM, INFA PC, Flume, Sqoop) and speed (Spark, Kafka)
• Traditional data warehouse layer: ODS, landing, transform, integration, data mart
• Big data platform: Spark / MapReduce; stage (raw), transformation, integration
• Data quality framework: DQ rules, profiling & scorecards, data certification, certified data, alert notifications, reject audit
• Information delivery: reporting & analytics, API layer, mobile, interactive, alerts, portal/intranet, file transfer, email
• Cross-cutting layers: data security (authentication, authorization, audit, encryption); multi-domain MDM hub; metadata management (operational metadata, technical metadata, business glossary, data lineage, semantic search, data stewardship); data governance & data stewardship
