Data Quality Management
Introduction
Data quality is a fundamental concern that every data governance program must address. At its core, it means ensuring that data meets business needs: no process or modernization effort can succeed without reliable, high-quality data. This applies to business applications, digital transformation initiatives, and the deployment of AI/ML technologies.
Gartner describes data quality (DQ) solutions as “the set of processes and technologies
for identifying, understanding, preventing, escalating and correcting issues in data that
supports effective decision making and governance across all business processes.”
An improved definition might describe data quality as ensuring that data is suitable for
business purposes, facilitating operational activities, analytics, and decision-making in
a way that enhances trust and efficiency.
Breaking down Gartner's definition, data quality consists of the following components:
Identifying: Conducting a series of checks or applying rules to detect data that fails to meet business requirements.
Understanding: Determining the reasons why data is unreliable so that those causes can be addressed.
Preventing: Implementing controls to stop poor-quality data from being created during manual data entry, business processes, data pipelines, or other data manipulation activities.
Escalating: Routing identified data issues to the people or teams accountable for resolving them.
Correcting: Fixing flawed data so that it meets business requirements.
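As a rough illustration of the identifying step, the sketch below applies a few field-level rules to sample records and reports which checks each record fails. The fields, rules, and thresholds are illustrative assumptions rather than prescribed checks; a real implementation would typically run inside a data quality tool or pipeline.

# A minimal sketch of the "identifying" step: applying simple rules to flag
# records that fail business requirements. Field names and rules are
# hypothetical examples, not taken from any specific system.

records = [
    {"customer_id": "C001", "email": "jane@example.com", "credit_limit": 5000},
    {"customer_id": "",     "email": "not-an-email",     "credit_limit": -100},
]

rules = {
    "customer_id must not be blank": lambda r: bool(r["customer_id"].strip()),
    "email must contain '@'":        lambda r: "@" in r["email"],
    "credit_limit must be >= 0":     lambda r: r["credit_limit"] >= 0,
}

for record in records:
    failures = [name for name, check in rules.items() if not check(record)]
    if failures:
        print(record["customer_id"] or "<missing id>", "failed:", failures)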
It is crucial to understand that data quality isn't solely about analytics; it's also about maintaining accurate data in operational systems. Poor data in your data platform can lead to bad business decisions, but poor data quality in operational systems can have dire consequences. Examples of operational disruptions caused by bad data include:
Halting product manufacturing due to raw material shortages
Preventing a logistics provider from delivering products due to payment problems
Sending products to the wrong company, or delivering the wrong product to the intended customer
Crediting the wrong account for a customer refund
Offering the incorrect product to a potential customer
Granting access to online content for which a user is not licensed
DAMA (Data Management Association) references a widely used set of data quality dimensions known as the Strong-Wang framework, originally proposed by Wang and Strong, designed to address a broad range of situations. Its contextual data quality dimensions, for example, include relevancy, value-added, timeliness, completeness, and appropriate amount of data.
The good news is that thinking about data quality dimensions has evolved significantly over time. DAMA Netherlands recently released a research paper detailing the extensive range of data quality dimensions a company could adopt. In practice, each organization develops its own dimensions to suit its specific needs; there is no single, universal approach.
Second, analyzing data quality rules and aggregating them by data quality dimension lets us count the issues within each dimension, by field, table, functional area, and across the data overall. This analysis provides a comprehensive view of the current state of data quality. Tracking these metrics over time also reveals trends of improvement or decline, giving senior leadership valuable visualizations that showcase progress in enhancing data quality.
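As a simple sketch of this aggregation, the snippet below rolls up hypothetical rule results, each tagged with a dimension, into a pass rate per dimension. The rule names, dimensions, and counts are assumptions for illustration only.

# A minimal sketch of aggregating rule results by data quality dimension.
# Each rule result carries a dimension tag (Completeness, Validity, ...);
# the rules, dimensions, and counts are illustrative assumptions.

from collections import defaultdict

rule_results = [
    {"rule": "customer_id not null", "dimension": "Completeness", "failed": 12, "checked": 1000},
    {"rule": "email format valid",   "dimension": "Validity",     "failed": 45, "checked": 1000},
    {"rule": "order_date <= today",  "dimension": "Timeliness",   "failed": 3,  "checked": 1000},
]

by_dimension = defaultdict(lambda: {"failed": 0, "checked": 0})
for result in rule_results:
    agg = by_dimension[result["dimension"]]
    agg["failed"] += result["failed"]
    agg["checked"] += result["checked"]

for dimension, agg in by_dimension.items():
    pass_rate = 100 * (1 - agg["failed"] / agg["checked"])
    print(f"{dimension}: {pass_rate:.1f}% passing")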
Lastly, summarizing data quality by dimensions helps demonstrate to end users why
they should trust the data or how it can address business needs.
Turning to implementation, the first requirement is a solid foundation of data quality tools and processes, including:
Utilizing tools such as Talend or SAS, along with techniques for addressing data quality in front-end applications, such as data quality firewalls and front-end edits
Implementing data pipelines, data correction, improvement, and exception handling within standard operating procedures
Creating standard data quality dashboards and reports, including specifications for new data products
A solid foundation of tools and processes is essential for data quality initiatives to
succeed. Without it, these initiatives will struggle to progress.
Next, core policies at the field level must be established to operationalize data quality. Common policy standards cover data definitions, formats, ranges, and lifecycle management.
Third, define data quality rules (data policies) and document them in a policy
repository. It's crucial to provide a data quality definition that is easily understandable
for any business user. Additionally, specify the technical requirements for these rules
separately so they can be implemented using either a code-heavy approach or the
available data quality tools.
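A policy repository entry might pair the business-friendly definition with a separate technical specification along the lines of the sketch below. The structure, field names, and SQL are illustrative assumptions, not a standard schema.

# A minimal sketch of one entry in a policy repository: a plain-language
# definition for business users alongside a separate technical specification
# that a tool or pipeline could implement. The schema is hypothetical.

data_quality_rule = {
    "rule_id": "DQ-0042",
    "name": "Customer email must be valid",
    "business_definition": (
        "Every active customer record must have an email address "
        "we can actually use to contact the customer."
    ),
    "dimension": "Validity",
    "technical_specification": {
        "table": "crm.customers",
        "field": "email",
        "condition": "email is not null and matches a basic address pattern",
        "check_sql": "SELECT COUNT(*) FROM crm.customers "
                     "WHERE email IS NULL OR email NOT LIKE '%_@_%.__%'",
        "severity": "high",
    },
}

print(data_quality_rule["business_definition"])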
Fourth, develop a set of data quality reports using your preferred BI tool to detail data
quality metrics. Creating a data quality dashboard that offers a comprehensive
overview of data quality across the organization is also beneficial.
While ensuring data is fit for business purposes is a significant aspect of data quality, it
is only one component. To enhance your data governance program, consider
leveraging the following capabilities commonly found in top-tier data quality tools:
Data profiling
Deduplication
Data quality checks within data pipelines
Exception processing
Data quality firewalls
Monitoring data decay and data drift
Dashboards and reporting
Profiling
Profiling involves using data quality tools to automatically run numerous queries and
generate a detailed report about the data set being examined. This report typically
includes inferred and defined data types, minimum and maximum values, most
common value, average, number of nulls, low and high values, duplicates, patterns, and
sample data. Inferred data types and patterns are particularly valuable insights from
profiles. Modern data observability tools may replace traditional profiling with AI-driven scans; these serve a similar purpose and can be viewed as the next generation of profiling techniques.
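For a rough sense of what profiling computes, the pandas-based sketch below derives a few of the statistics listed above for a tiny sample data set; it is a simplification of what dedicated profiling tools produce, and the sample data is an illustrative assumption.

# A minimal profiling sketch: counts, nulls, distinct values, duplicates,
# min/max, most common value, and inferred type per column.

import pandas as pd

df = pd.DataFrame({
    "customer_id": ["C001", "C002", "C002", None],
    "credit_limit": [5000, 12000, 12000, 700],
})

profile = {}
for column in df.columns:
    series = df[column]
    profile[column] = {
        "inferred_type": str(series.dtype),
        "nulls": int(series.isna().sum()),
        "distinct": int(series.nunique(dropna=True)),
        "duplicates": int(series.duplicated(keep=False).sum()),
        "min": series.dropna().min(),
        "max": series.dropna().max(),
        "most_common": series.mode(dropna=True).iloc[0] if not series.dropna().empty else None,
    }

print(pd.DataFrame(profile).T)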
Deduplication
Deduplication involves standardizing data, applying algorithms to identify potential
duplicates, generating cross-references, and incorporating business user input to
confirm duplicate groupings. The process aims to eliminate duplicate records, ensure
accurate householding, and enhance data integrity.
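The sketch below illustrates the first part of that flow in simplified form: names are standardized, compared with a similarity score, and pairs above a threshold are surfaced for business users to confirm. The standardization rules, threshold, and sample records are assumptions for illustration; production matching engines use far richer logic.

# A minimal deduplication sketch: standardize names, then flag record pairs
# whose standardized names are highly similar as candidate duplicates.

from difflib import SequenceMatcher

records = [
    {"id": 1, "name": "Acme Corp.",       "city": "Berlin"},
    {"id": 2, "name": "ACME Corporation", "city": "Berlin"},
    {"id": 3, "name": "Globex GmbH",      "city": "Munich"},
]

def standardize(name: str) -> str:
    # Uppercase, drop punctuation, normalize a common legal-form suffix.
    cleaned = "".join(ch for ch in name.upper() if ch.isalnum() or ch.isspace())
    return cleaned.replace("CORPORATION", "CORP").strip()

candidates = []
for i, a in enumerate(records):
    for b in records[i + 1:]:
        score = SequenceMatcher(None, standardize(a["name"]), standardize(b["name"])).ratio()
        if score > 0.85:  # threshold chosen for illustration only
            candidates.append((a["id"], b["id"], round(score, 2)))

print("Candidate duplicate pairs for review:", candidates)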
Exception Processing
This technique, used within ELT, ETL, or code-based solutions, involves executing data quality checks integrated with a data quality tool. Any data that fails the specified rules and checks is diverted rather than loaded. The exception data is then rectified, reprocessed, and subsequently loaded, ensuring a robust data processing workflow. This approach helps identify and correct data issues early, maintaining high data quality standards.
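As a minimal sketch of this pattern, the snippet below checks incoming records, passes the clean ones on for loading, and routes the failures to an exception list with the reasons attached so they can be corrected and reprocessed. The checks and sample records are illustrative assumptions.

# A minimal exception-processing sketch: clean records flow on to loading,
# failing records are diverted to an exception list for correction.

incoming = [
    {"order_id": "A-100", "quantity": 5,  "currency": "EUR"},
    {"order_id": "A-101", "quantity": -2, "currency": "EUR"},
    {"order_id": "",      "quantity": 1,  "currency": "XXX"},
]

def check(record):
    """Return the reasons a record fails; an empty list means it passes."""
    reasons = []
    if not record["order_id"]:
        reasons.append("missing order_id")
    if record["quantity"] <= 0:
        reasons.append("non-positive quantity")
    if record["currency"] not in {"EUR", "USD", "GBP"}:
        reasons.append("unknown currency")
    return reasons

clean, exceptions = [], []
for record in incoming:
    reasons = check(record)
    (exceptions if reasons else clean).append({**record, "reasons": reasons})

print("loaded:", [r["order_id"] for r in clean])
print("routed to exceptions:", [(r["order_id"], r["reasons"]) for r in exceptions])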
Together, these capabilities tackle common challenges, help ensure the acquisition of high-quality data, foster a shared understanding of data quality concepts, and keep data aligned with business objectives. It is also crucial to establish a framework that embeds data quality into the data lifecycle. One such framework is POSMAD.
POSMAD, introduced by Danette McGilvray in her book "Executing Data Quality Projects" (widely recognized as a seminal work in the field of data quality), stands for:
(P) Plan: Identify objectives, plan data architecture, and establish standards and
definitions.
(O) Obtain: Document data acquisition methods and procedures.
(S) Store and Share: Define data storage locations, storage methods, and data
accessibility protocols.
(M) Maintain: Outline data maintenance procedures, including cleansing, matching,
merging, deduplication, and enhancement.
(A) Apply: Determine data usage methods and tools for accessing data.
(D) Dispose: Address data retirement, archiving, movement, or deletion as part of the
data lifecycle.
[Figure: The POSMAD information lifecycle (Plan, Obtain, Store and Share, Maintain, Apply, Dispose)]
Correction of Data
Another crucial aspect is data correction, involving corrective measures on data
flagged with issues. When utilizing data quality tools, it's essential not only to detect
data anomalies but also to pinpoint the data requiring correction, with precise
instructions on remedial actions. Simply identifying outliers isn't sufficient; we must
rectify data through various means, including:
Logging change requests and tasking a business user with manual data
rectification (though this is the least preferable option).
Incorporating rules into data pipelines to rectify data (a highly efficient approach; see the sketch after this list).
Implementing exception processes wherein erroneous data is diverted from regular data flows to exception files. Remedial actions are then automated to rectify the data before reintegrating it into normal data flows.
Integrating front-end edits to prevent the entry of flawed data; these edits may
include validation checks, warnings, or other assistive features. Data is only saved
once all edits are successfully passed.
Establishing a data quality firewall, which represents a more advanced solution than
edits. This firewall verifies data against various web services and function modules,
assisting end-users in preventing duplicate records and ensuring higher-quality
data.
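As a sketch of correction rules embedded in a pipeline, the snippet below applies a short chain of rules (trimming, code standardization, defaulting) to a record before it is loaded or reintegrated. The rules and field names are illustrative assumptions rather than a prescribed rule set.

# A minimal sketch of pipeline correction rules: each rule takes a record
# and returns a corrected copy; rules are applied in sequence before loading.

def trim_strings(record):
    return {k: v.strip() if isinstance(v, str) else v for k, v in record.items()}

def standardize_country(record):
    mapping = {"GERMANY": "DE", "DEUTSCHLAND": "DE", "U.S.A.": "US"}
    record["country"] = mapping.get(record["country"].upper(), record["country"].upper())
    return record

def default_missing_segment(record):
    record["segment"] = record["segment"] or "UNASSIGNED"
    return record

correction_rules = [trim_strings, standardize_country, default_missing_segment]

record = {"customer_id": " C001 ", "country": "Germany", "segment": ""}
for rule in correction_rules:
    record = rule(record)

print(record)  # {'customer_id': 'C001', 'country': 'DE', 'segment': 'UNASSIGNED'}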
Why Technoforte?
Unmatched Data Quality: Meticulously curate and maintain your data
Contact us today to learn how we can enhance your data quality and drive your
business to new heights.