
Data Quality

CONTENTS

1 Introduction
2 Data Quality Dimensions
3 Practical Use of Data Quality Dimensions
4 Data Quality Policies
5 Correction of Data
6 Data Governance with Technoforte



Introduction
Data quality is a fundamental aspect that every data governance program must tackle. At its core, it means ensuring that data aligns with business needs. No process or modernization effort can succeed without reliable, high-quality data, whether it supports business applications, digital transformation initiatives, or the deployment of AI/ML technologies.

Gartner describes data quality (DQ) solutions as “the set of processes and technologies
for identifying, understanding, preventing, escalating and correcting issues in data that
supports effective decision making and governance across all business processes.”

DAMA (Data Management Association) defines data quality as "the planning, implementation and control of activities that apply data quality management techniques to data in order to assure it is fit for consumption and meeting the needs of data consumers."

An improved definition might describe data quality as ensuring that data is suitable for
business purposes, facilitating operational activities, analytics, and decision-making in
a way that enhances trust and efficiency.

Breaking down Gartner's definition, data quality consists of the following components:

Identifying: Conducting a series of checks or applying rules to detect data that fails to meet business requirements.
Understanding: Determining the reasons why data is unreliable in order to address these issues.
Preventing: Implementing controls to prevent the creation of poor-quality data during manual data entry, business processes, data pipelines, or other data manipulation activities.
Escalating: Establishing a method to enhance the visibility and priority of urgent data quality issues.
Correcting: Engaging in activities to improve data quality once issues that affect business suitability are identified.

It is crucial to understand that data quality isn't solely about analytics; it's also about
maintaining accurate data in operational systems. Poor data in your data platform can
lead to bad business decisions, but poor data quality in operational systems can have
dire consequences. Examples of operational disruptions caused by bad data include:
(1) halting product manufacturing due to raw material shortages, (2) preventing a
logistics provider from delivering products due to payment problems, (3) sending
products to the wrong company or delivering the wrong product to the intended
customer, (4) crediting the wrong account for a customer refund, (5) offering the
incorrect product to a potential customer, (6) granting access to online content for
which a user is not licensed.

To ensure data is functional and business-ready, certain core requirements must be met. This whitepaper will outline these essential needs and explore these concepts in depth.

Data Quality Dimensions


Firstly, it is essential to establish a set of data quality dimensions. This concept has
evolved over the years, originally developed to understand perceptions of data quality.

DAMA (Data Management Association) references a widely used set of data quality dimensions known as the Strong-Wang framework, which groups the dimensions into four categories:

Intrinsic DQ: Accuracy, Objectivity, Believability, Reputation
Contextual DQ: Relevancy, Value-added, Timeliness, Completeness, Appropriate amount of data
Representational DQ: Interpretability, Ease of understanding, Representational consistency, Concise representation
Accessibility DQ: Accessibility, Access security

The good news is that this concept has significantly evolved over time. DAMA -
Netherlands recently released a research paper detailing the extensive range of data
quality dimensions that a company could adopt. The reality is that each organization
develops its own dimensions to suit its specific needs. There isn't a single, universal
approach.

A typical set of dimensions for assessing data quality might include the following (a short sketch after the table shows how such dimensions can translate into concrete checks):

Accuracy: Data is correct, verifiable, and current.
Uniqueness: Degree to which the data is unique within the system/entity and across all systems/entities.
Consistency: Data is consistent with other related attributes.
Completeness: Verification that all necessary data elements are present and all mandatory checks meet baseline criteria.
Timeliness: Data is provided on time for both creation and updates.
Conformity: Data adheres to the required format (pattern) for business use.
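As an illustration only, and not part of the original guidelines, the short Python sketch below shows how a few of these dimensions might translate into automatable checks on a single record; the field names, patterns, and freshness window are all hypothetical.

```python
import re
from datetime import datetime, timedelta, timezone

# Hypothetical record; field names and rules are illustrative assumptions.
record = {
    "customer_id": "C-1001",
    "email": "jane.doe@example.com",
    "country_code": "IN",
    "last_updated": "2024-05-01T10:30:00+00:00",
}

REQUIRED_FIELDS = ["customer_id", "email", "country_code", "last_updated"]
EMAIL_PATTERN = re.compile(r"^[^@\s]+@[^@\s]+\.[^@\s]+$")

def check_completeness(rec):
    """Completeness: all mandatory fields are present and non-empty."""
    return all(rec.get(f) not in (None, "") for f in REQUIRED_FIELDS)

def check_conformity(rec):
    """Conformity: values follow the required pattern (here, a simple email check)."""
    return bool(EMAIL_PATTERN.match(rec.get("email", "")))

def check_timeliness(rec, max_age_days=365):
    """Timeliness: the record was updated within an agreed freshness window."""
    updated = datetime.fromisoformat(rec["last_updated"])
    return datetime.now(timezone.utc) - updated <= timedelta(days=max_age_days)

results = {
    "Completeness": check_completeness(record),
    "Conformity": check_conformity(record),
    "Timeliness": check_timeliness(record),
}
print(results)  # e.g. {'Completeness': True, 'Conformity': True, 'Timeliness': ...}
```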

Practical Use of Data Quality Dimensions
Data quality dimensions offer several key benefits. Firstly, they provide a framework for
evaluating governed fields by asking questions such as: “When reviewing the data
quality dimension ‘Completeness,’ are there specific rules we should implement to
better assess the business suitability of this data element?” This approach helps
establish the necessary data quality rules or checks. Applying this method across all
data quality dimensions ensures appropriate rules are in place for business suitability.

Secondly, analyzing data quality rules and aggregating them by data quality
dimensions allows us to identify the number of issues within each dimension, by field,
table, functional area, and overall data. This analysis provides a comprehensive view of
the current data quality status. Additionally, tracking these metrics over time reveals
trends in data quality improvement or decline, offering valuable visualizations to share
with senior leadership to showcase progress in enhancing data quality.
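To make the aggregation idea concrete, here is a small hypothetical sketch in plain Python (no particular DQ tool assumed) that rolls individual rule results up into pass rates per dimension and per table, the kind of figures a trend dashboard would plot.

```python
from collections import defaultdict

# Hypothetical rule results: (table, field, dimension, passed, failed) counts.
rule_results = [
    ("customer", "email", "Conformity", 9_800, 200),
    ("customer", "email", "Completeness", 9_950, 50),
    ("customer", "phone", "Conformity", 9_100, 900),
    ("order", "order_date", "Timeliness", 9_990, 10),
]

def pass_rates_by(key_index):
    """Aggregate pass/fail counts by the chosen key (0=table, 1=field, 2=dimension)."""
    totals = defaultdict(lambda: [0, 0])  # key -> [passed, failed]
    for table, field, dimension, passed, failed in rule_results:
        key = (table, field, dimension)[key_index]
        totals[key][0] += passed
        totals[key][1] += failed
    return {key: round(100 * p / (p + f), 2) for key, (p, f) in totals.items()}

print(pass_rates_by(2))  # pass rate per data quality dimension
print(pass_rates_by(0))  # pass rate per table
```

Storing these rolled-up figures per run (for example, one snapshot per day) is what makes the trend lines for leadership possible.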

Lastly, summarizing data quality by dimensions helps demonstrate to end users why
they should trust the data or how it can address business needs.

Data Quality Policies


To begin with, it's crucial to establish general policies for data quality and publish
standardized guidelines for developing and using data with an emphasis on quality.
This includes:

Defining data quality dimensions (e.g., accuracy, completeness, consistency, timeliness, uniqueness, validity)
Utilizing tools (e.g., Talend, SAS) and techniques for addressing data quality in front-end applications, such as data quality firewalls and front-end edits
Implementing data pipelines, data correction, improvement, and exception handling within standard operating procedures
Creating standard data quality dashboards and reports, including specifications for new data products

A solid foundation of tools and processes is essential for data quality initiatives to
succeed. Without it, these initiatives will struggle to progress.

Next, core policies at the field level must be established to operationalize data quality. These policies typically cover data definitions, formats, ranges, and lifecycle management. Common policy standards include the following (a brief validation sketch follows the list):

Data exception processing (e.g., handling missing or anomalous data)
String handling (e.g., trimming whitespace, ensuring proper case)
Phone number formats (e.g., international vs. local formatting standards)
Date handling, including date formats (e.g., ISO 8601) and types
Handling long strings, such as field lengths, special characters, and carriage returns
Execution of data quality checks, specifying frequency and ownership
Defining data value ranges (e.g., permissible values for fields)
Enforcing valid values (e.g., using lookup tables or reference data)
Maintaining cross-references (XREF) or crosswalks (e.g., mapping codes to descriptions)
Correct data handling procedures (e.g., data entry guidelines, data review processes)
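The sketch below, with purely hypothetical field names, formats, and reference data, shows how a few of the policy standards above (string trimming, phone and ISO 8601 date formats, valid-value lookups, and value ranges) could be expressed as reusable validators.

```python
import re
from datetime import date

VALID_COUNTRY_CODES = {"IN", "US", "DE"}          # reference data / lookup table
PHONE_E164 = re.compile(r"^\+[1-9]\d{7,14}$")      # international format policy
AGE_RANGE = (0, 120)                               # permissible value range

def clean_string(value: str) -> str:
    """String handling policy: trim whitespace and normalise internal spacing."""
    return " ".join(value.split())

def valid_phone(value: str) -> bool:
    """Phone number policy: E.164-style international format."""
    return bool(PHONE_E164.match(value))

def valid_iso_date(value: str) -> bool:
    """Date handling policy: dates must be ISO 8601 (YYYY-MM-DD)."""
    try:
        date.fromisoformat(value)
        return True
    except ValueError:
        return False

def valid_country(value: str) -> bool:
    """Valid-values policy: enforce values from reference data."""
    return value in VALID_COUNTRY_CODES

def valid_age(value: int) -> bool:
    """Range policy: value must fall within the permitted range."""
    return AGE_RANGE[0] <= value <= AGE_RANGE[1]

print(clean_string("  Jane   Doe "))       # "Jane Doe"
print(valid_phone("+919876543210"))        # True
print(valid_iso_date("2024-05-01"))        # True
print(valid_country("FR"), valid_age(37))  # False True
```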

Third, define data quality rules (data policies) and document them in a policy
repository. It's crucial to provide a data quality definition that is easily understandable
for any business user. Additionally, specify the technical requirements for these rules
separately so they can be implemented using either a code-heavy approach or the
available data quality tools.

Fourth, develop a set of data quality reports using your preferred BI tool to detail data
quality metrics. Creating a data quality dashboard that offers a comprehensive
overview of data quality across the organization is also beneficial.

While ensuring data is fit for business purposes is a significant aspect of data quality, it
is only one component. To enhance your data governance program, consider
leveraging the following capabilities commonly found in top-tier data quality tools:

Data profiling
Deduplication
Data quality checks within data pipelines
Exception processing
Data quality firewalls
Monitoring data decay and data drift
Dashboards and reporting

Profiling
Profiling involves using data quality tools to automatically run numerous queries and
generate a detailed report about the data set being examined. This report typically
includes inferred and defined data types, minimum and maximum values, most
common value, average, number of nulls, low and high values, duplicates, patterns, and
sample data. Inferred data types and patterns are particularly valuable insights from
profiles. Modern data observability tools may use AI scans instead of traditional
profiling but serve a similar purpose as the next generation of profiling techniques.
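As a rough illustration of what a profiling pass produces (a minimal sketch, not the output of any particular tool), the snippet below computes a few of the statistics mentioned above for each column of a small in-memory dataset.

```python
from collections import Counter

# Hypothetical data set to be profiled.
rows = [
    {"id": "1", "city": "Pune",   "amount": "120.50"},
    {"id": "2", "city": "Pune",   "amount": None},
    {"id": "3", "city": "Mysore", "amount": "75.00"},
]

def infer_type(values):
    """Infer a column type from non-null values (int, float, or string)."""
    for caster, name in ((int, "int"), (float, "float")):
        try:
            [caster(v) for v in values]
            return name
        except (TypeError, ValueError):
            continue
    return "string"

def profile(rows):
    report = {}
    for col in rows[0].keys():
        values = [r[col] for r in rows]
        non_null = [v for v in values if v is not None]
        counts = Counter(non_null)
        report[col] = {
            "inferred_type": infer_type(non_null),
            "nulls": len(values) - len(non_null),
            "distinct": len(counts),
            "duplicates": len(non_null) - len(counts),
            "min": min(non_null),          # min/max of the raw (string) values
            "max": max(non_null),
            "most_common": counts.most_common(1)[0][0],
            "sample": non_null[:2],
        }
    return report

for column, stats in profile(rows).items():
    print(column, stats)
```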

Deduplication
Deduplication involves standardizing data, applying algorithms to identify potential
duplicates, generating cross-references, and incorporating business user input to
confirm duplicate groupings. The process aims to eliminate duplicate records, ensure
accurate householding, and enhance data integrity.
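A stdlib-only sketch of the idea, assuming hypothetical customer names and a simple similarity threshold (real tools use far more sophisticated matching): standardize the values, score candidate pairs, and emit a cross-reference of likely duplicates for business users to confirm.

```python
from difflib import SequenceMatcher
from itertools import combinations

# Hypothetical customer records containing a likely duplicate.
records = {
    "C-1": "Acme Industries Pvt. Ltd.",
    "C-2": "ACME Industries Private Limited",
    "C-3": "Globex Corporation",
}

def standardize(name: str) -> str:
    """Standardization step: lower-case and normalise common suffixes."""
    name = name.lower().replace(".", "").replace(",", "")
    return name.replace("private limited", "pvt ltd")

def similarity(a: str, b: str) -> float:
    """String similarity between two standardized names (0 to 1)."""
    return SequenceMatcher(None, standardize(a), standardize(b)).ratio()

# Candidate duplicates above a (hypothetical) threshold, as a cross-reference list.
THRESHOLD = 0.85
xref = [
    (id_a, id_b, round(similarity(name_a, name_b), 2))
    for (id_a, name_a), (id_b, name_b) in combinations(records.items(), 2)
    if similarity(name_a, name_b) >= THRESHOLD
]
print(xref)  # e.g. [('C-1', 'C-2', 1.0)] pending business-user confirmation
```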

Data Pipeline Data Quality Checks


These checks are integrated with data integration routines to evaluate data quality.
They perform various checks and, based on the results, apply additional
transformations to ensure the data is suitable for loading into the target application.
These checks help in maintaining data quality throughout the data pipeline, preventing
the propagation of errors.
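A minimal sketch of the pattern, assuming a simple list-of-dicts pipeline rather than any specific integration tool: each row passes through a check step, and rows that need it receive an additional transformation (here, type coercion and a default value) before being handed to the load step.

```python
# Hypothetical incoming batch from an upstream extract step.
incoming = [
    {"order_id": "O-1", "qty": "3",  "currency": "INR"},
    {"order_id": "O-2", "qty": "",   "currency": None},
]

def check_and_transform(row):
    """Pipeline DQ step: validate a row and apply corrective transformations."""
    fixed = dict(row)
    # Coerce quantity to an integer, defaulting to 0 when missing or malformed.
    try:
        fixed["qty"] = int(fixed["qty"])
    except (TypeError, ValueError):
        fixed["qty"] = 0
    # Default the currency when the upstream system did not supply one.
    fixed["currency"] = fixed["currency"] or "INR"
    return fixed

def load(rows):
    """Stand-in for the load into the target application."""
    for row in rows:
        print("loading", row)

load(check_and_transform(r) for r in incoming)
```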

Exception Processing
This technique, utilized within ELT, ETL, or code-based solutions, involves executing data quality checks integrated with a data quality tool. Any data that fails the specified rules and checks is diverted rather than loaded. The exception data is then rectified, reprocessed, and subsequently loaded successfully, ensuring a robust data processing workflow. This process helps in identifying and correcting data issues early, maintaining high data quality standards.
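Sketching that flow under simple assumptions (an in-memory exception list and a single hypothetical rule), without presuming any particular ETL or ELT tool:

```python
incoming = [
    {"invoice_id": "INV-1", "amount": 250.0},
    {"invoice_id": "INV-2", "amount": -40.0},   # fails the rule below
]

def passes_rules(row):
    """Hypothetical data quality rule: amounts must be non-negative."""
    return row["amount"] >= 0

# Route each row either to the clean load or to the exception area.
clean, exceptions = [], []
for row in incoming:
    (clean if passes_rules(row) else exceptions).append(row)

# Remediate exception records (here, a trivial automated correction) and reprocess.
corrected = [{**row, "amount": abs(row["amount"])} for row in exceptions]
reprocessed = [row for row in corrected if passes_rules(row)]

print("loaded first pass:", clean)
print("diverted to exception file:", exceptions)
print("reloaded after correction:", reprocessed)
```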

Data Quality Firewalls


These are advanced real-time rules that conduct data checks and provide immediate
feedback to rectify data before it is stored in the database or other data repository. By
catching errors at the point of entry, data quality firewalls prevent bad data from
entering the system, ensuring that only high-quality data is stored and used.
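A minimal sketch of the idea, assuming a hypothetical save handler sitting in front of the database: the record is validated, including a duplicate check, and immediate feedback is returned to the caller before anything is persisted.

```python
import re

EXISTING_EMAILS = {"jane.doe@example.com"}          # stand-in for a lookup service
EMAIL_PATTERN = re.compile(r"^[^@\s]+@[^@\s]+\.[^@\s]+$")

def firewall_check(record):
    """Return (accepted, messages) before the record reaches the database."""
    messages = []
    if not EMAIL_PATTERN.match(record.get("email", "")):
        messages.append("Email address is not in a valid format.")
    if record.get("email") in EXISTING_EMAILS:
        messages.append("A customer with this email already exists.")
    return (not messages, messages)

def save(record):
    accepted, messages = firewall_check(record)
    if not accepted:
        return {"saved": False, "feedback": messages}   # immediate feedback to the user
    EXISTING_EMAILS.add(record["email"])                # only clean data is stored
    return {"saved": True, "feedback": []}

print(save({"email": "jane.doe@example.com"}))  # rejected as a duplicate
print(save({"email": "new.user@example.com"}))  # accepted
```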

Data Decay and Data Drift


This involves monitoring data quality and taking corrective action as data accuracy diminishes over time. As data ages it may become less reliable (data decay), or its statistical properties and distributions may shift away from what downstream processes expect (data drift), necessitating corrective measures.
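One simple way to watch for drift, sketched below with hypothetical snapshots of a single categorical field, is to compare the current value distribution against a saved baseline and flag the field when the shift exceeds an agreed tolerance.

```python
from collections import Counter

# Hypothetical category counts for the same field at two points in time.
baseline = Counter({"email": 700, "phone": 250, "post": 50})
current = Counter({"email": 450, "phone": 490, "post": 60})

def total_variation(a: Counter, b: Counter) -> float:
    """Total variation distance between two categorical distributions (0 to 1)."""
    keys = set(a) | set(b)
    n_a, n_b = sum(a.values()), sum(b.values())
    return 0.5 * sum(abs(a[k] / n_a - b[k] / n_b) for k in keys)

DRIFT_TOLERANCE = 0.10   # hypothetical threshold agreed with the business
drift = total_variation(baseline, current)
if drift > DRIFT_TOLERANCE:
    print(f"Drift detected on 'contact_preference': {drift:.2f} > {DRIFT_TOLERANCE}")
```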

Dashboard & Reporting


Dashboard and reporting functions encompass a series of reports that convey the
narrative of data quality. These reports may include summarized perspectives of data
categorized by data quality dimensions or source objects. Alternatively, they may offer
detailed insights, providing data specialists with a breakdown of all data, highlighting
areas requiring attention. Dashboards and reports serve multiple purposes, from
prioritizing data issues to presenting progress to leadership or showcasing data quality
enhancements.

These capabilities aim to tackle challenges and ensure the acquisition of high-quality
data, foster a comprehensive understanding of data quality concepts, and meet the
requirements for data to align seamlessly with business objectives. Additionally, it is
crucial to establish a framework that incorporates data quality into the data lifecycle.
One such framework is POSMAD.

POSMAD was introduced by Danette McGilvray in the book "Executing Data Quality Projects," which is widely recognized as a seminal work in the field of data quality. POSMAD stands for:

(P) Plan: Identify objectives, plan data architecture, and establish standards and
definitions.
(O) Obtain: Document data acquisition methods and procedures.
(S) Store and Share: Define data storage locations, storage methods, and data
accessibility protocols.
(M) Maintain: Outline data maintenance procedures, including cleansing, matching,
merging, deduplication, and enhancement.
(A) Apply: Determine data usage methods and tools for accessing data.
(D) Dispose: Address data retirement, archiving, movement, or deletion as part of the
data lifecycle.

This cycle is often referred to as Information Lifecycle Management (ILM).

[Figure: Information Lifecycle Management cycle: Plan → Obtain → Store and Share → Maintain → Apply → Dispose]

Correction of Data
Another crucial aspect is data correction, involving corrective measures on data
flagged with issues. When utilizing data quality tools, it's essential not only to detect
data anomalies but also to pinpoint the data requiring correction, with precise
instructions on remedial actions. Simply identifying outliers isn't sufficient; we must
rectify data through various means, including:

Logging change requests and tasking a business user with manual data
rectification (though this is the least preferable option).
Incorporating rules into data pipelines to rectify data (a highly efficient approach).
Implementing exception processes wherein erroneous data is diverted from regular data flows to exception files. Remedial actions are then automated to rectify the data before reintegrating it into normal data flows.
Integrating front-end edits to prevent the entry of flawed data; these edits may
include validation checks, warnings, or other assistive features. Data is only saved
once all edits are successfully passed.
Establishing a data quality firewall, which represents a more advanced solution than
edits. This firewall verifies data against various web services and function modules,
assisting end-users in preventing duplicate records and ensuring higher-quality
data.

Data Governance with Technoforte


Data quality is critical for all data governance initiatives in an organization. With
Technoforte, you can establish data quality standards utilizing our data governance
services.

Our services include:


Data Ingestion
ETL
Data Warehousing
Data Analytics & Insights
Predictive Analytics
Big Data Analytics
Data Integration and Data Migration
Data Governance

Why Technoforte?

Unmatched Data Quality: Meticulously curate and maintain your data
Enhanced Compliance and Security: Navigate the complexities of data regulations with ease
Operational Efficiency: Streamline operations and focus on strategic initiatives, driving productivity and growth
Informed Decision-Making: A strong foundation for analytics and reporting
Scalable Solutions: Sustainable data management practices at every stage of growth

Contact us today to learn how we can enhance your data quality and drive your
business to new heights.

[email protected] +91 79755 52867
