
Data Quality and its Parameters –

Data - facts and statistics collected together for reference or analysis. Data are the
values of subjects with respect to qualitative or quantitative variables. Data and
information are often used interchangeably; however, the extent to which a set of
data is informative to someone depends on the extent to which it is unexpected by
that person.

Data quality-

Data quality is a perception or an assessment of data's fitness to serve its purpose in a given context. The quality of data is determined by factors such as accuracy, completeness, reliability, relevance and how up to date it is. As data has become more intricately linked with the operations of organizations, data quality has received greater attention.

Why data quality is important

Poor-quality data is often pegged as the source of inaccurate reporting and ill-
conceived strategies in a variety of companies, and some have attempted to quantify
the damage done. Economic damage due to data quality problems can range from
added miscellaneous expenses when packages are shipped to wrong addresses, all
the way to steep regulatory compliance fines for improper financial reporting.

An oft-cited estimate originating from IBM suggests the yearly cost of data quality
issues in the U.S. during 2016 alone was about $3.1 trillion. Lack of trust by business
managers in data quality is commonly cited among chief impediments to decision-
making.

The demon of poor data quality was particularly common in the early days of
corporate computing, when most data was entered manually. Even as more
automation took hold, data quality issues rose in prominence. For a number of years,
the image of deficient data quality was represented in stories of meetings at which
department heads sorted through differing spreadsheet numbers that ostensibly
described the same activity.

Determining data quality

Aspects, or dimensions, important to data quality include: accuracy, or correctness; completeness, which determines if data is missing or unusable; conformity, or adherence to a standard format; consistency, or lack of conflict with other data values; and duplication, or repeated records.
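As a minimal illustration of how these dimensions can be checked in practice, the sketch below profiles a small customer table for completeness, conformity and duplication, with a simple pattern check standing in for accuracy. The column names, sample records and expected date format are assumptions made up for the example, not the method of any particular product.

import pandas as pd

# Hypothetical customer records used only for illustration.
customers = pd.DataFrame({
    "customer_id": [1, 2, 2, 4],
    "email":       ["a@example.com", None, "b@example.com", "not-an-email"],
    "signup_date": ["2023-01-15", "2023-02-30", "2023-03-01", "2023-04-10"],
})

# Completeness: fraction of missing values per column.
completeness = customers.isna().mean()

# Conformity: signup_date must parse as an ISO date (YYYY-MM-DD).
parsed = pd.to_datetime(customers["signup_date"], format="%Y-%m-%d", errors="coerce")
conformity_failures = customers.loc[parsed.isna(), "signup_date"]

# Duplication: repeated customer_id values point to duplicate records.
duplicates = customers[customers.duplicated(subset="customer_id", keep=False)]

# Accuracy would normally be verified against a trusted reference source;
# here a crude pattern check stands in for it (missing or malformed values fail).
invalid_email = customers[~customers["email"].str.contains("@", na=False)]

print(completeness, conformity_failures, duplicates, invalid_email, sep="\n\n")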

As a first step toward data quality, organizations typically perform data asset
inventories in which the relative value, uniqueness and validity of data can
undergo baseline studies. Established baseline ratings for known good data sets are
then used for comparison against data in the organization going forward.

Methodologies for such data quality projects include the Data Quality Assessment
Framework (DQAF), which was created by the International Monetary Fund (IMF)
to provide a common method for assessing data quality. The DQAF provides
guidelines for measuring data dimensions that include timeliness, in which actual
times of data delivery are compared to anticipated data delivery schedules.
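The timeliness comparison the DQAF describes is straightforward to automate: compare each dataset's actual arrival time against its anticipated delivery schedule and flag late arrivals. The schedule and delivery log below are hypothetical and are not part of the DQAF itself.

from datetime import datetime, timedelta

# Hypothetical schedule: dataset name -> anticipated delivery deadline.
expected = {
    "sales_daily": datetime(2024, 5, 1, 6, 0),
    "inventory":   datetime(2024, 5, 1, 7, 0),
}

# Hypothetical log of when each dataset actually arrived.
actual = {
    "sales_daily": datetime(2024, 5, 1, 5, 45),
    "inventory":   datetime(2024, 5, 1, 9, 30),
}

# Flag any dataset delivered later than its anticipated schedule.
for name, deadline in expected.items():
    delay = actual[name] - deadline
    if delay > timedelta(0):
        print(f"{name}: late by {delay}")
    else:
        print(f"{name}: on time")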
Data quality management -

Several steps typically mark data quality efforts. In a data quality management cycle identified by data expert David Loshin, data quality management begins with identifying and measuring the effect that poor-quality data has on business outcomes. Rules are defined, performance targets are set, and quality improvement methods, as well as specific data cleansing (data scrubbing) and enhancement processes, are put in place. Results are then monitored as part of ongoing measurement of the use of the data in the organization. This virtuous cycle of data quality management is intended to ensure that improvement of overall data quality continues after the initial data quality efforts are completed.
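A minimal sketch of that cycle might express a rule as a check with a performance target, measure the data against it, and apply a cleansing step when the target is missed. The rule, the 95% threshold and the cleansing function below are illustrative assumptions, not a reproduction of Loshin's method or of any specific tool.

import pandas as pd

# Hypothetical contact list with some dirty phone numbers.
contacts = pd.DataFrame({
    "name":  ["Ann", "Bob", "Cho"],
    "phone": ["020 7946 0000", "unknown", "0207 946 0001"],
})

# Rule: a phone number must contain at least 10 digits.
def phone_ok(value: str) -> bool:
    return sum(ch.isdigit() for ch in value) >= 10

TARGET = 0.95  # performance target: 95% of rows should pass the rule

def measure(df: pd.DataFrame) -> float:
    return df["phone"].apply(phone_ok).mean()

def cleanse(df: pd.DataFrame) -> pd.DataFrame:
    # Data scrubbing step: blank out values that fail the rule for follow-up.
    df = df.copy()
    df.loc[~df["phone"].apply(phone_ok), "phone"] = None
    return df

score = measure(contacts)
if score < TARGET:
    contacts = cleanse(contacts)
print(f"pass rate before cleansing: {score:.0%}")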

Software tools specialized for data quality management match records, delete duplicates, establish remediation policies and identify personally identifiable data. Management consoles for data quality support the creation of rules for data handling to maintain data integrity, the discovery of data relationships, and automated data transforms that may be part of quality control efforts.

Collaborative views and workflow enablement tools have become more common,
giving data stewards, who are charged with maintaining data quality, views into
corporate data repositories. These tools and related processes are often closely linked
with master data management (MDM) systems that have become part of many data
governance efforts.

Data quality management tools include IBM InfoSphere Information Server for Data
Quality, Informatica Data Quality, Oracle Enterprise Data Quality, Pitney Bowes
Spectrum Technology Platform, SAP Data Quality Management and Data Services,
SAS DataFlux and others.

Defining Quality Data -

While many organizations boast of having good data or improving the quality of
their data, the real challenge is defining what those qualities represent. What some
consider good quality others might view as poor. Judging the quality of data requires
an examination of its characteristics and then weighing those characteristics
according to what is most important to the organization and the application(s) for
which they are being used.

Accuracy-

This characteristic refers to the exactness of the data. Accurate data cannot have any erroneous elements and must convey the correct message without being misleading. Accuracy and precision also have a component that relates to the data's intended use. Without understanding how the data will be consumed, efforts to ensure accuracy and precision could be off-target or more costly than necessary. For example, accuracy in healthcare might be more important than in another industry (which is to say, inaccurate data in healthcare could have more serious consequences) and, therefore, justifiably worth higher levels of investment.
Data should be sufficiently accurate for the intended use and should be captured only
once, although it may have multiple uses. Data should be captured at the point of
activity.

• Data is always captured at the point of activity. Performance data is directly input into PerformancePlus1 (P+) by the service manager or nominated data entry staff.
• Access to P+ for the purpose of data entry is restricted through secure password controls and limited access to appropriate data entry pages. Individual passwords can be changed by the user and should under no circumstances be used by anyone other than that user.
• Where appropriate, base data, i.e. denominators and numerators, will be input into the system, which will then calculate the result (see the sketch after this list). These have been determined in accordance with published guidance or agreed locally. This will eliminate calculation errors at this stage of the process, as well as provide contextual information for the reader.
• Data used for multiple purposes, such as population and number of households, is input once by the system administrator.
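To illustrate the point about base data, the sketch below stores only the numerator and denominator and lets the code derive the indicator, so the result can never be mistyped. The indicator name and figures are hypothetical and are not drawn from PerformancePlus.

# Hypothetical performance indicator: percentage of invoices paid on time.
# Only the base data (numerator and denominator) is entered; the result
# is always calculated, never keyed in, which removes calculation errors.
base_data = {
    "invoices_paid_on_time": 1_860,   # numerator
    "invoices_received":     2_000,   # denominator
}

def indicator_value(numerator: int, denominator: int) -> float:
    if denominator == 0:
        raise ValueError("denominator must be non-zero")
    return 100.0 * numerator / denominator

result = indicator_value(
    base_data["invoices_paid_on_time"],
    base_data["invoices_received"],
)
print(f"Paid on time: {result:.1f}%")  # 93.0%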

Validity –

Requirements governing data set the boundaries of this characteristic. For example, on surveys, items such as gender, ethnicity and nationality are typically limited to a set of options, and open answers are not permitted. Any answers other than these would not be considered valid or legitimate based on the survey's requirements. This is the case for most data and must be carefully considered when determining its quality. The people in each department of an organization understand which data is valid to them, so those requirements must be leveraged when evaluating data quality.
Data should be recorded and used in compliance with relevant requirements,
including the correct application of any rules or definitions. This will ensure
consistency between periods and with similar organisations, measuring what is
intended to be measured.

• Relevant guidance and definitions are provided for all statutory performance indicators. Service Heads are informed of any revisions and amendments within 24 hours of receipt from the relevant government department. Local performance indicators comply with locally agreed guidance and definitions.
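As a small illustration of a validity check, the sketch below flags survey answers that fall outside the permitted set of options. The field name, respondents and allowed values are assumptions made up for the example.

# Hypothetical survey responses and the options the survey permits.
ALLOWED_NATIONALITY = {"British", "Irish", "Other"}

responses = [
    {"respondent": 1, "nationality": "British"},
    {"respondent": 2, "nationality": "english"},   # not a permitted option
    {"respondent": 3, "nationality": "Irish"},
]

# Validity: any value outside the permitted set is rejected.
invalid = [
    r for r in responses
    if r["nationality"] not in ALLOWED_NATIONALITY
]

for r in invalid:
    print(f"Respondent {r['respondent']}: invalid nationality {r['nationality']!r}")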

Reliability –

Many systems in today’s environments use and/or collect the same source data.
Regardless of what source collected the data or where it resides, it cannot contradict
a value residing in a different source or collected by a different system. There must
be a stable and steady mechanism that collects and stores the data without
contradiction or unwarranted variance.
Data should reflect stable and consistent data collection processes across collection
points and over time. Progress toward performance targets should reflect real
changes rather than variations in data collection approaches or methods.

• Source data is clearly identified and readily available from manual, automated or other systems and records. Protocols exist where data is provided from a third party, such as Hertfordshire Constabulary and Hertfordshire County Council.
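A simple way to test reliability in practice is to compare the same field drawn from two systems and flag any contradictions. The record layout and the two source names below are hypothetical.

# Hypothetical extracts of the same customers' phone numbers from two systems.
crm     = {"C001": "01923 776611", "C002": "01923 776612"}
billing = {"C001": "01923 776611", "C002": "01923 999999"}

# Reliability: the same source data must not contradict itself across systems.
for customer_id in crm.keys() & billing.keys():
    if crm[customer_id] != billing[customer_id]:
        print(f"{customer_id}: CRM says {crm[customer_id]!r}, "
              f"billing says {billing[customer_id]!r}")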

Timeliness
There must be a valid reason to collect the data to justify the effort required, which
also means it has to be collected at the right moment in time. Data collected too soon
or too late could misrepresent a situation and drive inaccurate decisions.
Data should be captured as quickly as possible after the event or activity and must
be available for the intended use within a reasonable time period. Data must be
available quickly and frequently enough to support information needs and to
influence service or management decisions.
• Performance data is requested to be available within one calendar month from the end of the previous quarter and is subsequently reported to the respective Policy and Scrutiny Panel on a quarterly basis. As a part of the ongoing development of PerformancePlus it is intended that performance information will be exported through custom reporting and made available via the Three Rivers DC website. This will improve access to information and eliminate delays in publishing information through traditional methods.
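A deadline like the quarterly reporting requirement described above can be checked automatically: given the agreed deadline and the date each indicator was actually reported, anything that arrived late is flagged. The dates and indicator names below are hypothetical.

from datetime import date

QUARTER_END = date(2024, 3, 31)
DEADLINE = date(2024, 4, 30)   # one calendar month after the quarter end

# Hypothetical dates on which each indicator was actually reported.
reported = {
    "missed_bin_collections": date(2024, 4, 22),
    "planning_decision_time": date(2024, 5, 6),
}

# Timeliness: flag indicators reported after the agreed deadline.
for indicator, reported_on in reported.items():
    days_late = (reported_on - DEADLINE).days
    if days_late > 0:
        print(f"{indicator}: {days_late} days late")
    else:
        print(f"{indicator}: on time")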

Completeness -

Incomplete data is as dangerous as inaccurate data. Gaps in data collection lead to a partial view of the overall picture. Without a complete picture of how operations are running, uninformed actions will occur. It's important to understand the complete set of requirements that constitute a comprehensive set of data to determine whether or not those requirements are being fulfilled.

Data requirements should be clearly specified based on the information needs of the organisation, and data collection processes matched to these requirements. In a database context, completeness refers to the relationship between the database objects and the abstract universe of all such objects, and includes the selection criteria, definitions and other mapping rules used to create the database.
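A basic completeness check compares what was actually collected against the specified requirements, as in the sketch below. The required fields and the sample records are assumptions for illustration.

# Hypothetical data requirement: every service request must carry these fields.
REQUIRED_FIELDS = {"request_id", "received_date", "service_area", "postcode"}

records = [
    {"request_id": "R-101", "received_date": "2024-04-02",
     "service_area": "Waste", "postcode": "WD3 1RL"},
    {"request_id": "R-102", "received_date": "2024-04-03",
     "service_area": "Housing"},                       # postcode missing
]

# Completeness: report which required fields are missing from each record.
for record in records:
    present = {k for k, v in record.items() if v not in (None, "")}
    missing = REQUIRED_FIELDS - present
    if missing:
        print(f"{record['request_id']}: missing {sorted(missing)}")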

Relevance -
Data captured should be relevant to the purposes for which it is to be used. This
will require a periodic review of requirements to reflect changing needs.

Availability and Accessibility:

This characteristic can be tricky at times due to legal and regulatory constraints. Regardless of the challenge, though, individuals need the right level of access to the data in order to perform their jobs. This presumes that the data exists and is available for access to be granted.

Granularity and Uniqueness:

The level of detail at which data is collected is important, because confusion and inaccurate decisions can otherwise occur. Aggregated, summarized and manipulated collections of data could offer a different meaning than the data implied at a lower level. An appropriate level of granularity must be defined so that sufficient uniqueness and distinctive properties become visible. This is a requirement for operations to function effectively.

There are many elements that determine data quality, and each can be prioritized
differently by different organizations. The prioritization could change depending on
the stage of growth of an organization or even its current business cycle. The key is
to remember you must define what is most important for your organization when
evaluating data. Then, use these characteristics to define the criteria for high-quality,
accurate data. Once these criteria are defined, you will have a better understanding of your data and be better positioned to achieve your goals.

Emerging data quality challenges -

Over time, the burden of data quality efforts centered on the governance of relational
data in organizations, but that began to change as web and cloud computing
architectures came into prominence.

Unstructured data, text, natural language processing and object data became part of
the data quality mission. The variety of data was such that data experts began to
assign different degrees of trust to various data sets, forgoing approaches that took a
single, monolithic view of data quality.

Also, the classic issues of garbage in/garbage out that drove data quality efforts in early computing resurfaced with artificial intelligence (AI) and machine learning applications, in which data preparation often became the biggest drain on data teams' resources.

The higher volume and speed of arrival of new data also became a greater challenge
for the data quality steward.
Expansion of data's use in digital commerce, along with ubiquitous online activity,
has only intensified data quality concerns. While errors from rekeying data are a
thing of the past, dirty data is still a common nuisance.

Protecting the privacy of individuals' data became a mild concern for data quality
teams beginning in the 1970s, growing to become a major issue with the spread of
data acquired via social media in the 2010s. With the formal implementation of the
General Data Protection Regulation (GDPR) in the European Union (EU) in 2018,
the demands for data quality expertise were expanded yet again.

Fixing data quality issues

With GDPR and the risks of data breaches, many companies find themselves in a
situation where they must fix data quality issues.

The first step toward fixing data quality requires identifying all the problem data. Software can be used to perform a data quality assessment to verify that data sources are accurate, determine how much data there is, and gauge the potential impact of a data breach.
From there, companies can build a data quality program, with the help of data
stewards, data protection officers or other data management professionals. These
data management experts will help implement business processes that ensure future
data collection and use meets regulatory guidelines and provides the value that
businesses expect from data they collect.
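As one small piece of such an assessment, the sketch below scans column values for patterns that look like personally identifiable data (here, email addresses and UK-style phone numbers) so those columns can be prioritised for protection. The column names, sample values and regular expressions are illustrative assumptions, not the method of any particular product.

import re

# Hypothetical extract of a table whose columns we want to assess for PII.
table = {
    "notes":   ["call back tomorrow", "prefers post"],
    "contact": ["jane.doe@example.com", "01923 776611"],
}

PII_PATTERNS = {
    "email":    re.compile(r"[^@\s]+@[^@\s]+\.[^@\s]+"),
    "uk_phone": re.compile(r"\b0\d{3,4}[ ]?\d{6,7}\b"),
}

# Flag columns where any value matches a PII pattern.
for column, values in table.items():
    kinds = {
        kind
        for kind, pattern in PII_PATTERNS.items()
        for value in values
        if pattern.search(value)
    }
    if kinds:
        print(f"column {column!r} may contain PII: {sorted(kinds)}")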
