
Data Quality

Introduction

Today's world is one of heterogeneity.


We use different technologies.
We operate on different platforms.
Large amounts of data are generated every day in all sorts of
organizations and enterprises.
And we do have problems with that data.
Problems

Data may be duplicated, inconsistent, ambiguous, or incomplete.

So there is a need to collect data in one place and clean it up.
Why data quality matters?

Good data is your most valuable asset, and bad data can seriously harm
your business and credibility.

1. What have you missed?

2. When things go wrong.

3. Making confident decisions.


What is data quality?

Data quality is a perception or an assessment of data's fitness to serve
its purpose in a given context.

It is described by several dimensions, such as:

•Correctness / Accuracy: the accuracy of data is the degree to which the
captured data correctly describes the real-world entity.
•Consistency: this is about a single version of the truth. Consistency
means data throughout the enterprise should be in sync with each other.
Contd…

•Completeness: the extent to which the expected attributes of data
are provided.

•Timeliness: the right data to the right person at the right time is
important for business.

•Metadata: data about data.


Maintenance of data quality

Data quality results from the process of going through the data and
scrubbing it, standardizing it, and de-duplicating records, as well as
performing some data enrichment.
1. Maintain complete data.
2. Clean up your data by standardizing it using rules.
3. Use matching algorithms (e.g. fuzzy matching) to detect duplicates.
4. Avoid entry of duplicate leads and contacts.
5. Merge existing duplicate records.
6. Use roles for security.
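Steps 2–5 above can be sketched in a few lines of Python. This is a minimal illustration, not a production cleanser; the record fields (`name`, `email`) and the rule of matching duplicates on a normalized email are assumptions made for the example.

```python
# Sketch of standardization and de-duplication over lead records.
# Field names and matching rules are hypothetical examples.

def standardize(record):
    """Apply simple cleanup rules: trim whitespace, normalize case."""
    return {
        "name": record["name"].strip().title(),
        "email": record["email"].strip().lower(),
    }

def deduplicate(records):
    """Keep the first record seen for each standardized email."""
    seen = {}
    for rec in map(standardize, records):
        seen.setdefault(rec["email"], rec)  # later duplicates are dropped
    return list(seen.values())

leads = [
    {"name": "  alice SMITH ", "email": "Alice@Example.com"},
    {"name": "Alice Smith",    "email": "alice@example.com "},
    {"name": "bob jones",      "email": "bob@example.com"},
]
clean = deduplicate(leads)
# clean holds two records: Alice Smith and Bob Jones
```

Real systems would use fuzzier matching (edit distance, phonetic keys) rather than exact equality on one field, and merge field values from the duplicate records instead of dropping them.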
Example slides: inconsistent data before cleaning up; consistent data
after cleaning up.
Data Profiling
It is the process of statistically examining and analyzing the content in a
data source, thereby collecting information about the data. It
consists of techniques used to analyze the data we have for accuracy
and completeness.

1. Data profiling helps us make a thorough assessment of data quality.

2. It assists the discovery of anomalies in data.

3. It helps us understand the content, structure, relationships, etc. of
the data in the data source we are analyzing.
Data Profiling
4. It helps us know whether the existing data can be applied to other
areas or purposes.

5. It helps us understand the various issues/challenges we may face in a
database project well before the actual work begins. This enables us
to make early decisions and act accordingly.

6. It is also used to assess and validate metadata.
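The kind of per-column statistics profiling collects can be sketched as follows; the sample rows and the `city` attribute are made up for illustration.

```python
# Minimal column profiler: counts total values, NULLs, and distinct
# non-NULL values for one attribute of a row set.

def profile(rows, column):
    values = [r[column] for r in rows]
    non_null = [v for v in values if v is not None]
    return {
        "count": len(values),
        "nulls": len(values) - len(non_null),
        "distinct": len(set(non_null)),
    }

rows = [
    {"city": "Pune"},
    {"city": "Pune"},
    {"city": None},
    {"city": "Mumbai"},
]
stats = profile(rows, "city")
# stats == {"count": 4, "nulls": 1, "distinct": 2}
```

The same counts answer several profiling questions at once: a high NULL ratio flags completeness problems, and distinct-value counts feed into candidate-key analysis.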


When to conduct Data Profiling?

-> At the discovery/requirements gathering phase

-> Just before the dimensional modeling process

-> During ETL package design


How to conduct Data Profiling?

Data profiling involves statistical analysis of the data at source and the
data being loaded, as well as analysis of metadata. These statistics may
be used for various analysis purposes.
Common examples of analyses to be done are:

Data quality: Analyze the quality of data at the data source.

NULL values: Look out for the number of NULL values in an attribute.
How to conduct Data Profiling?

Candidate keys: an analysis of the extent to which certain columns are distinct
gives the developer useful information w.r.t. the selection of candidate keys.

Primary key selection: to check that the candidate key column does not
violate the basic requirements of having no NULL values or duplicate values.
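The primary-key requirements above translate directly into a check: a candidate column qualifies only if it has no NULLs and no duplicate values. The sample rows are hypothetical.

```python
# Validate a candidate key column: reject NULLs and duplicates.

def is_valid_key(rows, column):
    values = [r[column] for r in rows]
    if any(v is None for v in values):
        return False                        # NULLs disqualify the column
    return len(values) == len(set(values))  # so do duplicate values

rows = [
    {"id": 1, "email": "a@x.com"},
    {"id": 2, "email": None},
    {"id": 3, "email": "a@x.com"},
]
# "id" is unique and non-NULL; "email" has both a NULL and a duplicate
assert is_valid_key(rows, "id")
assert not is_valid_key(rows, "email")
```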

Empty string values: a string column may contain NULL or even empty string values
that may create problems later.
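NULL and the empty string are distinct values, so a profile should count both; an empty string often slips past NOT NULL constraints. A minimal sketch:

```python
# Count NULLs and empty strings separately in a column's values.

def null_and_empty_counts(values):
    nulls = sum(1 for v in values if v is None)
    empties = sum(1 for v in values if v == "")
    return nulls, empties

counts = null_and_empty_counts(["a", "", None, "", "b"])
# counts == (1, 2): one NULL, two empty strings
```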

String length: an analysis of the longest and shortest possible lengths, as well
as the average string length, of a string-type column can help us decide what
data type would be most suitable for the said column.
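The length analysis can be sketched as below; the city names are example data, and in practice the maximum observed length would guide a choice such as a VARCHAR size.

```python
# Length profile of a string column: shortest, longest, and average
# lengths of the non-NULL values.

def length_profile(values):
    lengths = [len(v) for v in values if v is not None]
    return {
        "min": min(lengths),
        "max": max(lengths),
        "avg": sum(lengths) / len(lengths),
    }

names = ["Pune", "Mumbai", "Delhi", None]
lengths = length_profile(names)
# lengths == {"min": 4, "max": 6, "avg": 5.0}
```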
How to conduct Data Profiling?

Identification of cardinality: the cardinality relationships are important
for inner and outer join considerations with regard to several BI tools.

Data format: sometimes, the format in which certain data is written in
some columns may or may not be user-friendly.
