Data Quality and Its Parameters
Data are facts and statistics collected together for reference or analysis; they are the
values of qualitative or quantitative variables belonging to a set of subjects. Data and
information are often used interchangeably; however, the extent to which a set of
data is informative to someone depends on the extent to which it is unexpected by
that person.
Data quality -
Poor-quality data is often pegged as the source of inaccurate reporting and ill-
conceived strategies in a variety of companies, and some have attempted to quantify
the damage done. Economic damage due to data quality problems can range from
added miscellaneous expenses when packages are shipped to wrong addresses, all
the way to steep regulatory compliance fines for improper financial reporting.
An oft-cited estimate originating from IBM suggests the yearly cost of data quality
issues in the U.S. during 2016 alone was about $3.1 trillion. Lack of trust by business
managers in data quality is commonly cited among chief impediments to decision-
making.
The problem of poor data quality was particularly common in the early days of
corporate computing, when most data was entered manually. Even as more
automation took hold, data quality issues rose in prominence. For a number of years,
the image of deficient data quality was represented in stories of meetings at which
department heads sorted through differing spreadsheet numbers that ostensibly
described the same activity.
As a first step toward data quality, organizations typically perform data asset
inventories in which the relative value, uniqueness and validity of data can
undergo baseline studies. Established baseline ratings for known good data sets are
then used for comparison against data in the organization going forward.
Methodologies for such data quality projects include the Data Quality Assessment
Framework (DQAF), which was created by the International Monetary Fund (IMF)
to provide a common method for assessing data quality. The DQAF provides
guidelines for measuring data dimensions that include timeliness, in which actual
times of data delivery are compared to anticipated data delivery schedules.
Data quality management -
Several steps typically mark data quality efforts. In a data quality management cycle
identified by data expert David Loshin, data quality management begins with
identifying and measuring the effect that poor-quality data has on business outcomes. Rules are defined,
performance targets are set, and quality improvement methods as well as specific
data cleansing, or data scrubbing, and enhancement processes are put in place.
Results are then monitored as part of ongoing measurement of the use of the data in
the organization. This virtuous cycle of data quality management is intended to
ensure that overall data quality continues to improve after the initial data quality
effort is completed.
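As a rough sketch of the measurement step in such a cycle, the Python/pandas example below applies a small set of validation rules to a table and reports whether each rule's pass rate meets its target; the column names, rules and targets are hypothetical and would come from the rule-definition step described above.

import pandas as pd

# Hypothetical rule set: each rule returns a boolean Series marking rows that pass.
RULES = {
    "email_present": lambda df: df["email"].notna(),
    "age_in_range": lambda df: df["age"].between(0, 120),
}

# Illustrative pass-rate targets agreed with the business.
TARGETS = {"email_present": 0.99, "age_in_range": 0.995}

def measure_quality(df):
    """Return each rule's pass rate and whether it meets its target."""
    report = {}
    for name, check in RULES.items():
        rate = float(check(df).mean())
        report[name] = {"pass_rate": rate, "meets_target": rate >= TARGETS[name]}
    return report

# Example usage against a hypothetical extract:
# print(measure_quality(pd.read_csv("customers.csv")))

Results like these would be tracked over time as part of the ongoing monitoring the cycle calls for.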
Software tools specialized for data quality management match records, delete
duplicates, establish remediation policies and identify personally identifiable data.
Management consoles for data quality support the creation of rules for data handling
to maintain data integrity, the discovery of data relationships, and the automated
data transformations that may be part of quality control efforts.
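As a minimal illustration of the record-matching and duplicate-removal step such tools automate, the following Python/pandas sketch normalizes two match keys and drops exact duplicates; the file and column names are assumptions for illustration.

import pandas as pd

contacts = pd.read_csv("contacts.csv")  # hypothetical input with "name" and "email" columns

# Normalize the fields used as match keys so trivial formatting differences do not hide duplicates.
contacts["email_key"] = contacts["email"].str.strip().str.lower()
contacts["name_key"] = contacts["name"].str.strip().str.lower()

# Keep the first occurrence of each matched record and count what was removed.
deduped = contacts.drop_duplicates(subset=["email_key", "name_key"], keep="first")
print(f"Removed {len(contacts) - len(deduped)} duplicate records; {len(deduped)} remain.")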
Collaborative views and workflow enablement tools have become more common,
giving data stewards, who are charged with maintaining data quality, views into
corporate data repositories. These tools and related processes are often closely linked
with master data management (MDM) systems that have become part of many data
governance efforts.
Data quality management tools include IBM InfoSphere Information Server for Data
Quality, Informatica Data Quality, Oracle Enterprise Data Quality, Pitney Bowes
Spectrum Technology Platform, SAP Data Quality Management and Data Services,
SAS DataFlux and others.
While many organizations boast of having good data or improving the quality of
their data, the real challenge is defining what those qualities represent. What some
consider good quality others might view as poor. Judging the quality of data requires
an examination of its characteristics and then weighing those characteristics
according to what is most important to the organization and the application(s) for
which the data is being used.
Accuracy -
This characteristic refers to the exactness of the data. It cannot have any erroneous
elements and must convey the correct message without being misleading. Accuracy
and precision also have a component that relates to the data's intended use. Without
understanding how the data will be consumed, ensuring accuracy and precision
could be off-target or more costly than necessary. For example, accuracy in
healthcare might be more important than in another industry (which is to say,
inaccurate data in healthcare could have more serious consequences) and, therefore,
justifiably worth higher levels of investment.
Data should be sufficiently accurate for the intended use and should be captured only
once, although it may have multiple uses. Data should be captured at the point of
activity.
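One practical way to measure accuracy is to compare captured values against a trusted reference source. The Python/pandas sketch below assumes hypothetical files and column names: an operational shipments table and a verified address reference sharing a customer_id key.

import pandas as pd

shipments = pd.read_csv("shipments.csv")            # hypothetical operational data
reference = pd.read_csv("verified_addresses.csv")   # hypothetical trusted reference

# Join on the shared key and compare the audited field against its reference value.
merged = shipments.merge(reference, on="customer_id", suffixes=("", "_ref"))
accuracy = (merged["postal_code"] == merged["postal_code_ref"]).mean()
print(f"Postal code accuracy against the reference: {accuracy:.1%}")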
Validity -
Requirements governing data set the boundaries of this characteristic. For example,
on surveys, items such as gender, ethnicity, and nationality are typically limited to a
set of options and open answers are not permitted. Any answers other than these
would not be considered valid or legitimate based on the survey’s requirement. This
is the case for most data and must be carefully considered when determining its
quality. The people in each department of an organization understand which data is
valid for their purposes, so those requirements must be taken into account when
evaluating data quality.
Data should be recorded and used in compliance with relevant requirements,
including the correct application of any rules or definitions. This will ensure
consistency between periods and with similar organisations, measuring what is
intended to be measured.
Relevant guidance and definitions are provided for all statutory performance
indicators. Service Heads are informed of any revisions and amendments
within 24 hours of receipt from the relevant government department. Local
performance indicators comply with locally agreed guidance and definitions.
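A validity check of this kind reduces to testing each value against its permitted set. A minimal Python/pandas sketch follows, where the file name, column and option list are assumptions standing in for the survey's actual requirements.

import pandas as pd

responses = pd.read_csv("survey.csv")   # hypothetical survey export

# Permitted options defined by the survey's requirements (illustrative).
ALLOWED_GENDER = {"female", "male", "non-binary", "prefer not to say"}

# Any value outside the permitted set is invalid under those requirements.
invalid = responses[~responses["gender"].isin(ALLOWED_GENDER)]
print(f"{len(invalid)} responses contain a gender value outside the permitted options.")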
Reliability -
Many systems in today’s environments use and/or collect the same source data.
Regardless of what source collected the data or where it resides, it cannot contradict
a value residing in a different source or collected by a different system. There must
be a stable and steady mechanism that collects and stores the data without
contradiction or unwarranted variance.
Data should reflect stable and consistent data collection processes across collection
points and over time. Progress toward performance targets should reflect real
changes rather than variations in data collection approaches or methods.
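A simple way to test for such contradictions is to compare the same attribute across two systems. The sketch below assumes hypothetical CRM and billing extracts that share a customer_id key and both carry a phone number.

import pandas as pd

crm = pd.read_csv("crm_customers.csv")          # hypothetical source A
billing = pd.read_csv("billing_customers.csv")  # hypothetical source B

# Join on the shared key; any mismatch in the compared field is a reliability issue to investigate.
merged = crm.merge(billing, on="customer_id", suffixes=("_crm", "_billing"))
conflicts = merged[merged["phone_crm"] != merged["phone_billing"]]
print(f"{len(conflicts)} customers have contradictory phone numbers across the two systems.")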
Timeliness -
There must be a valid reason to collect the data to justify the effort required, which
also means it has to be collected at the right moment in time. Data collected too soon
or too late could misrepresent a situation and drive inaccurate decisions.
Data should be captured as quickly as possible after the event or activity and must
be available for the intended use within a reasonable time period. Data must be
available quickly and frequently enough to support information needs and to
influence service or management decisions.
Performance data is requested to be available within one calendar month
from the end of the previous quarter and is subsequently reported to the
respective Policy and Scrutiny Panel on a quarterly basis. As a part of the
ongoing development of PerformancePlus it is intended that performance
information will be exported through custom reporting and made available
via the Three Rivers DC website. This will improve access to information and
eliminate delays in publishing information through traditional methods.
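Timeliness can be measured as the lag between an event and its capture, flagging records that fall outside an agreed window. The Python/pandas sketch below assumes a hypothetical event log with occurred_at and recorded_at timestamps and a 24-hour window.

import pandas as pd

events = pd.read_csv("events.csv", parse_dates=["occurred_at", "recorded_at"])  # hypothetical log

# Lag between the activity and its capture; flag anything outside the agreed window.
lag = events["recorded_at"] - events["occurred_at"]
late = events[lag > pd.Timedelta(hours=24)]
print(f"{len(late)} of {len(events)} records were captured more than 24 hours after the event.")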
Completeness -
Data requirements should be clearly specified based on the information needs of the
organisation, and data collection processes should be matched to these requirements.
Completeness refers to the relationship between the objects in a database and the
abstract universe of all such objects: a data set is complete when every object that
should be represented is present.
● Completeness includes the selection criteria, definitions and other mapping rules
used to create the database.
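A basic completeness measure is the share of required fields that are actually populated. The sketch below uses a hypothetical dataset and field list.

import pandas as pd

requests = pd.read_csv("service_requests.csv")   # hypothetical dataset

# Share of non-missing values for each required field; low coverage signals incomplete capture.
REQUIRED_FIELDS = ["request_id", "category", "opened_date", "postcode"]
coverage = requests[REQUIRED_FIELDS].notna().mean()
print(coverage.sort_values())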
Relevance -
Data captured should be relevant to the purposes for which it is to be used. This
will require a periodic review of requirements to reflect changing needs.
There are many elements that determine data quality, and each can be prioritized
differently by different organizations. The prioritization could change depending on
the stage of growth of an organization or even its current business cycle. The key is
to remember you must define what is most important for your organization when
evaluating data. Then, use these characteristics to define the criteria for high-quality,
accurate data. Once those criteria are defined, you will have a clearer understanding
of your data and be better positioned to achieve your goals.
For a long time, data quality efforts centered on the governance of relational
data in organizations, but that began to change as web and cloud computing
architectures came into prominence.
Unstructured data, text, natural language processing and object data became part of
the data quality mission. The variety of data was such that data experts began to
assign different degrees of trust to various data sets, forgoing approaches that took a
single, monolithic view of data quality.
Also, the classic issues of garbage in/garbage out that drove data quality efforts in
early computing resurfaced with artificial intelligence (AI) and machine
learning applications, in which data preparation often became the most demanding
of data teams' resources.
The higher volume and speed of arrival of new data also became a greater challenge
for the data quality steward.
Expansion of data's use in digital commerce, along with ubiquitous online activity,
has only intensified data quality concerns. While errors from rekeying data are a
thing of the past, dirty data is still a common nuisance.
Protecting the privacy of individuals' data became a mild concern for data quality
teams beginning in the 1970s, growing to become a major issue with the spread of
data acquired via social media in the 2010s. With the formal implementation of the
General Data Protection Regulation (GDPR) in the European Union (EU) in 2018,
the demands for data quality expertise were expanded yet again.
With GDPR and the risks of data breaches, many companies find themselves in a
situation where they must fix data quality issues.
The first step toward fixing data quality requires identifying all the problem data.
Software can be used to perform a data quality assessment to verify data sources are
accurate, determine how much data there is and the potential impact of a data breach.
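A first pass at such an assessment can be scripted. The sketch below profiles a hypothetical extract: how many rows and columns it holds, how much is missing, and which columns appear to hold personal data (a simple keyword heuristic, not a substitute for proper data classification).

import pandas as pd

extract = pd.read_csv("export.csv")   # hypothetical extract from a source under assessment

# Basic profile: volume and missing-value rate per column.
profile = {
    "rows": len(extract),
    "columns": len(extract.columns),
    "null_rate": extract.isna().mean().round(3).to_dict(),
}

# Columns whose names suggest personally identifiable data (illustrative keyword list).
PII_KEYWORDS = ("name", "email", "phone", "address", "dob")
likely_pii = [c for c in extract.columns if any(k in c.lower() for k in PII_KEYWORDS)]

print(profile)
print("Columns that may hold personal data:", likely_pii)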
From there, companies can build a data quality program, with the help of data
stewards, data protection officers or other data management professionals. These
data management experts will help implement business processes that ensure future
data collection and use meet regulatory guidelines and provide the value that
businesses expect from the data they collect.