Common Data Errors

The document discusses common data errors that can affect the accuracy of datasets in Power BI, specifically focusing on missing or null values, duplicate rows, and inconsistent data types. It emphasizes the importance of identifying and resolving these errors to avoid skewed analysis results and unnecessary storage issues. The document concludes by urging data analysts to thoroughly scan their datasets for these errors before conducting any analysis.

Uploaded by

Aya Laadaili

Common data errors

Introduction
Before you begin to transform data in Power BI, you must first make sure that your dataset is accurate and
reliable. Otherwise, you risk producing data analysis results that are incorrect.

There are several types of errors that commonly occur in datasets. In this reading, you’ll learn what these
errors are and how to identify them in your datasets.

Scenario
Adventure Works recently produced a large dataset containing data on customers and sales. The marketing
department plans to use this dataset to generate insights into the business and to help the business grow.

However, one of the data analysts believes that there are errors in the dataset. These are common errors
that Adventure Works must identify and resolve before any analysis begins.

Common errors
There are three main types of errors that you’ll encounter as a data analyst. These are:

Missing or null values
Duplicate rows
Inconsistent data types

You must be able to identify instances of these errors in your datasets. If the errors are not identified,
they remain in the data and lead to inaccurate, skewed, or inflated results. They can also give rise to
unnecessary storage and processing requirements.

Missing or null values


Let’s begin by learning how to identify instances of missing or null values.

A missing or null value occurs when data is absent or unavailable for certain cells or records within a
dataset.

For example, in the Adventure Works dataset, the cell in row 6 of the Sales Price column contains NULL,
indicating that there is no value in this location.

It’s important to scan your dataset for missing or null values before you perform data analysis. The
inclusion of these values can lead to incorrect calculations, skew statistical results, or generate misleading
insights.
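
In Power BI you would normally run this check inside the tool itself, but the logic can be sketched in a few lines of Python. This is a minimal illustration, not the Adventure Works data: the sample rows, the helper names is_missing and find_missing, and the decision to treat empty strings and the text NULL as missing are all assumptions made for the example.

```python
# Hypothetical sample rows; the column name mirrors the reading, the values are invented.
rows = [
    {"Product": "Helmet", "Sales Price": 34.99},
    {"Product": "Gloves", "Sales Price": None},
    {"Product": "Bottle", "Sales Price": ""},
    {"Product": "Jersey", "Sales Price": 49.50},
]


def is_missing(value):
    """Assumption: None, blank strings, and the literal text 'NULL' all count as missing."""
    if value is None:
        return True
    if isinstance(value, str):
        text = value.strip()
        return text == "" or text.upper() == "NULL"
    return False


def find_missing(rows, column):
    """Return the 1-based positions of rows whose value in `column` is missing."""
    return [
        position
        for position, row in enumerate(rows, start=1)
        if is_missing(row.get(column))
    ]


missing_positions = find_missing(rows, "Sales Price")
print(missing_positions)
```

A value of 0 is not treated as missing here, because zero is a real measurement and a null is the absence of one.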

Duplicate rows
Another common error in datasets is duplicate rows, also called duplicate records.

Duplicate rows occur when two or more rows in a dataset have identical values across all columns.
This error is often caused by data entry mistakes, system glitches, or merging data from multiple
sources.

For example, the Adventure Works dataset contains identical records in rows 13 and 14. Most likely, this
occurred because the dataset was created by merging two different spreadsheets that contained
overlapping data. Both copies of the record were carried into the merged spreadsheet, which led to the
duplication.

You must make sure that you resolve all instances of data duplication before processing your dataset. If
left unresolved, duplicates inflate the size of the dataset and any counts or totals calculated from it,
which skews your results.

Duplicates can also lead to unnecessary storage, because your storage solutions need to host data that
your projects don’t require, and to extra processing overhead, because your software needs to process
large amounts of unnecessary data.
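
The definition above, rows that are identical across all columns, can be sketched in Python as follows. This is an illustrative sketch under stated assumptions rather than the way Power BI does it: the sample rows and the helper names find_duplicates and drop_duplicates are invented for the example, and the first occurrence of each row is the one that is kept.

```python
# Hypothetical sample rows; rows 2 and 3 are identical in every column.
rows = [
    {"Customer": "A. Smith", "Product": "Helmet", "Units Sold": 2},
    {"Customer": "B. Jones", "Product": "Gloves", "Units Sold": 1},
    {"Customer": "B. Jones", "Product": "Gloves", "Units Sold": 1},
    {"Customer": "B. Jones", "Product": "Gloves", "Units Sold": 3},
]


def find_duplicates(rows):
    """Return (first_position, duplicate_position) pairs, using 1-based positions."""
    first_seen = {}
    duplicates = []
    for position, row in enumerate(rows, start=1):
        # A row counts as a duplicate only if every column matches.
        key = tuple(sorted(row.items()))
        if key in first_seen:
            duplicates.append((first_seen[key], position))
        else:
            first_seen[key] = position
    return duplicates


def drop_duplicates(rows):
    """Keep the first occurrence of each row and discard later copies."""
    duplicate_positions = {duplicate for _, duplicate in find_duplicates(rows)}
    return [
        row
        for position, row in enumerate(rows, start=1)
        if position not in duplicate_positions
    ]


print(find_duplicates(rows))
print(len(drop_duplicates(rows)))
```

Row 4 is kept because its Units Sold value differs, so it is a separate record and not a duplicate.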

Inconsistent data types


As a data analyst, you also need to be aware of any occurrences of inconsistent data types.

Inconsistent data types occur when values within a single column contain different types of data. There
are numerous instances of inconsistent data types in the Adventure Works dataset.

For example, row 12 of the Units Sold column in the Adventure Works dataset contains a text value. The
cells of the Units Sold column should all be numeric. Instead, the column has a mix of numeric and text
data types.

It’s important to identify and resolve any inconsistent data types within your dataset. If they remain in the
dataset, calculations on the affected column can fail or return errors, which leads to incorrect results.
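
One way to spot a column with mixed types is to count the type of every value in it and then list the rows that are not numeric. The Python sketch below shows the idea under stated assumptions: the sample rows and the helper names type_counts and find_non_numeric are hypothetical, and only int and float values are treated as numeric.

```python
from collections import Counter

# Hypothetical sample rows; row 3 holds text where a number is expected.
rows = [
    {"Product": "Helmet", "Units Sold": 3},
    {"Product": "Gloves", "Units Sold": 10},
    {"Product": "Bottle", "Units Sold": "twelve"},
    {"Product": "Jersey", "Units Sold": 7},
]


def type_counts(rows, column):
    """Count how many values of each Python type appear in `column`."""
    return Counter(type(row[column]).__name__ for row in rows)


def find_non_numeric(rows, column):
    """Return the 1-based positions of rows whose value in `column` is not int or float."""
    positions = []
    for position, row in enumerate(rows, start=1):
        value = row[column]
        # bool is a subclass of int in Python, so it is excluded explicitly.
        if isinstance(value, bool) or not isinstance(value, (int, float)):
            positions.append(position)
    return positions


print(type_counts(rows, "Units Sold"))
print(find_non_numeric(rows, "Units Sold"))
```

More than one type in the count signals an inconsistent column, and the list of positions tells you which rows to correct. A missing value stored as None would also be flagged by this check.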

Conclusion
You should now be familiar with the three most common types of data errors that can occur within your
datasets. Missing or null values, duplicate rows, and inconsistent data types are all common issues that
must be identified and resolved before data analysis can begin. Scan your datasets before performing data
analysis to make sure that all instances of these errors have been resolved.
