Common Data Errors

The document discusses common data errors that can affect the accuracy of datasets in Power BI, specifically focusing on missing or null values, duplicate rows, and inconsistent data types. It emphasizes the importance of identifying and resolving these errors to avoid skewed analysis results and unnecessary storage issues. The document concludes by urging data analysts to thoroughly scan their datasets for these errors before conducting any analysis.

Uploaded by

Aya Laadaili

Common data errors

Introduction
Before you begin to transform data in Power BI, you must first make sure that your dataset is accurate and
reliable. Otherwise, you risk producing data analysis results that are incorrect.

Several types of errors commonly occur in datasets. In this reading, you’ll learn what these
errors are and how to identify them in your own datasets.

Scenario
Adventure Works recently produced a large dataset containing data on customers and sales. The marketing
department plans to use this dataset to generate insights into the business and to help the business grow.

However, one of the data analysts believes that there are errors in the dataset. These are common errors
that Adventure Works must identify and fix before analysis begins.

Common errors
There are three main types of errors that you’ll encounter as a data analyst. These are:

Missing or null values
Duplicate rows
Inconsistent data types

You must be able to identify instances of these errors in your datasets. If the errors are not identified, then
their inclusion will lead to inaccurate, skewed, and inflated results. They can also give rise to extra,
unnecessary storage and processing requirements.

Missing or null values

Let’s begin by learning how to identify instances of missing or null values.

A missing or null value occurs when data is absent or unavailable for certain cells or records within a
dataset.

For example, in the following Adventure Works datasheet, for the Sales Price column, the cell content on
row 6 states NULL, indicating that there is no value in this location.

It’s important to scan your dataset for missing or null values before you perform data analysis. The
inclusion of these values can lead to incorrect calculations, skew statistical results, or generate misleading
insights.
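
The check described above can be sketched outside Power BI in a few lines of Python. This is only an illustration of the logic, not how Power BI itself performs the scan. The sample rows, the `MISSING_MARKERS` set, and the function names are assumptions made for this example; only the Sales Price column name comes from the reading.

```python
# Hypothetical sample rows; the values are invented for illustration.
rows = [
    {"Product": "Helmet", "Sales Price": "34.99"},
    {"Product": "Gloves", "Sales Price": "NULL"},
    {"Product": "Jersey", "Sales Price": ""},
    {"Product": "Bottle", "Sales Price": None},
]

# Text markers treated as "missing" in this sketch (an assumption;
# adjust the set to match how your source system exports empty cells).
MISSING_MARKERS = {"", "null", "n/a", "na"}


def is_missing(value):
    """Return True if value is None or a text marker for a missing value."""
    if value is None:
        return True
    return str(value).strip().lower() in MISSING_MARKERS


def find_missing(rows, column):
    """Return the 1-based positions of rows whose value in column is missing."""
    return [
        position
        for position, row in enumerate(rows, start=1)
        if is_missing(row.get(column))
    ]


print(find_missing(rows, "Sales Price"))  # [2, 3, 4]
```

Listing the affected row positions, rather than only counting them, makes it easier to decide case by case whether a value should be filled in, replaced, or excluded.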

Duplicate rows
Another common error in datasets is duplicate rows, also called duplicate records.

Duplicate rows occur when two or more rows in a dataset have identical values across all columns.
This error is often caused by data entry mistakes, system glitches, or data that’s been merged
from multiple sources.

For example, the Adventure Works dataset contains identical records in rows 13 and 14. Most likely, this
occurred because the dataset was created by merging two different spreadsheets that contained
overlapping data. Both copies of the overlapping data now sit in one spreadsheet, which causes the duplication.

You must make sure that you resolve all instances of data duplication before processing your dataset. If
left unresolved, duplicates inflate the size of the dataset, and every duplicated record is counted more
than once, which can skew totals, counts, and averages.

Such errors could also lead to unnecessary storage because your storage solutions need to host data that
your projects don’t require. They could also give rise to extra processing overhead because your software
needs to process large amounts of unnecessary data.
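
As a rough illustration of how duplicates inflate results, the following Python sketch finds rows that are identical across all columns and compares a total before and after removing them. It is not a Power BI feature. The sample rows, the Order ID and Customer columns, and the function name are invented for this example.

```python
# Hypothetical sample rows; the second and third rows are identical.
rows = [
    {"Order ID": "1001", "Customer": "A. Lee", "Units Sold": 2},
    {"Order ID": "1002", "Customer": "B. Cruz", "Units Sold": 5},
    {"Order ID": "1002", "Customer": "B. Cruz", "Units Sold": 5},
    {"Order ID": "1003", "Customer": "C. Diaz", "Units Sold": 1},
]


def find_duplicates(rows):
    """Return (first_position, duplicate_position) pairs, 1-based,
    for rows whose values match an earlier row in every column."""
    seen = {}
    duplicates = []
    for position, row in enumerate(rows, start=1):
        # Sorting the items makes the key independent of column order.
        key = tuple(sorted(row.items()))
        if key in seen:
            duplicates.append((seen[key], position))
        else:
            seen[key] = position
    return duplicates


duplicates = find_duplicates(rows)
print(duplicates)  # [(2, 3)]

# Keep only the first copy of each row, then compare the totals.
duplicate_positions = {dup for _, dup in duplicates}
unique_rows = [
    row
    for position, row in enumerate(rows, start=1)
    if position not in duplicate_positions
]

print(sum(row["Units Sold"] for row in rows))         # 13 (inflated)
print(sum(row["Units Sold"] for row in unique_rows))  # 8
```

In this invented sample, a single duplicated order raises the units-sold total from 8 to 13, which is the kind of skew described above.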

Inconsistent data types

As a data analyst, you also need to be aware of any occurrences of inconsistent data types.

Inconsistent data types occur when values within a single column contain different types of data. There
are numerous instances of inconsistent data types in the Adventure Works dataset.

For example, the value in row 12 of the Units Sold column in the Adventure Works dataset does not match
the rest of the column. The data types for cells of the Units Sold column should all be numeric. Instead,
the column has a mix of numeric and text data types.

It’s important to identify and resolve any inconsistent data types within your dataset. If they remain in the
dataset, calculations such as sums and averages can fail or return incorrect values, which leads to errors
in your results.
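
One simple way to spot mixed types in a column that should be numeric is to try converting each value to a number and record the ones that fail. The Python sketch below illustrates that idea under stated assumptions: the sample values and the function name are invented, and the check is not Power BI's own type detection.

```python
# Hypothetical sample rows; the third value is text instead of a number.
rows = [
    {"Product": "Helmet", "Units Sold": "12"},
    {"Product": "Gloves", "Units Sold": "7"},
    {"Product": "Jersey", "Units Sold": "twelve"},
    {"Product": "Bottle", "Units Sold": "3"},
]


def find_non_numeric(rows, column):
    """Return (position, value) pairs, 1-based, for values in column
    that cannot be converted to a number. Missing values (None) are
    reported too, because they cannot be converted either."""
    problems = []
    for position, row in enumerate(rows, start=1):
        value = row.get(column)
        try:
            float(value)
        except (TypeError, ValueError):
            problems.append((position, value))
    return problems


print(find_non_numeric(rows, "Units Sold"))  # [(3, 'twelve')]
```

Note that `float` also accepts strings such as "nan" and "inf", so a stricter check may be needed if those can appear in your data.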

Conclusion
You should now be familiar with the three most common types of data errors that can occur within your
datasets. Missing or null values, duplicate rows, and inconsistent data types are all common issues that
must be identified and resolved before data analysis can begin. Scan your datasets before performing data
analysis to make sure that all instances of these errors have been resolved.
