Data Validation
Data validation means checking the accuracy and quality of source data before using, importing, or otherwise processing it. Different types of validation can be performed depending on destination constraints or objectives. Data validation is a form of data cleansing.
When moving and merging data, it's important to make sure that data from different sources and repositories conforms to business rules and does not become corrupted due to inconsistencies in type or context. The goal is to create data that is consistent, accurate, and complete, so as to prevent data loss and errors during a move.
In data warehousing, data validation is often performed prior to the ETL (Extract, Transform, Load) process. A data validation test is performed so that analysts can get insight into the scope and nature of data conflicts. Data validation is a general term, however, and can be performed on any type of data, including data within a single application (such as Microsoft Excel) or when merging simple data within a single data store.
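As a minimal illustration of such a pre-import check (the field names and business rules below are hypothetical, not drawn from any particular system), a small Python routine can reject records whose types or values do not conform before two sources are merged:

```python
# Minimal pre-merge validation sketch; field names and rules are hypothetical.

def validate_record(record: dict) -> list:
    """Return a list of problems found in a single record."""
    problems = []
    if not isinstance(record.get("customer_id"), int):
        problems.append("customer_id must be an integer")
    amount = record.get("amount")
    if not isinstance(amount, (int, float)):
        problems.append("amount must be numeric")
    elif amount < 0:
        problems.append("amount must not be negative")
    return problems

source_a = [{"customer_id": 1, "amount": 19.99}]
source_b = [{"customer_id": "2", "amount": -5}]  # inconsistent type and value

merged, rejected = [], []
for rec in source_a + source_b:
    errors = validate_record(rec)
    if errors:
        rejected.append((rec, errors))
    else:
        merged.append(rec)

print(f"accepted: {len(merged)}, rejected: {len(rejected)}")
```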
When validating data mining models, measures generally fall into the categories of accuracy, reliability, and usefulness.
Accuracy is a measure of how well the model correlates an outcome with the attributes
in the data that has been provided. There are various measures of accuracy, but all
measures of accuracy are dependent on the data that is used. In reality, values might be
missing or approximate, or the data might have been changed by multiple processes.
Particularly in the exploration and development phase, you might decide to accept a certain amount of error in the data, especially if the data is fairly uniform in its
characteristics. For example, a model that predicts sales for a particular store based on past
sales can be strongly correlated and very accurate, even if that store consistently used the
wrong accounting method. Therefore, measurements of accuracy must be balanced by
assessments of reliability.
Reliability assesses the way that a data mining model performs on different data sets. A
data mining model is reliable if it generates the same type of predictions or finds the
same general kinds of patterns regardless of the test data that is supplied.
For example, the model that you generate for the store that used the wrong accounting
method would not generalize well to other stores, and therefore would not be reliable.
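One way to probe reliability in this sense, sketched here with scikit-learn on synthetic data rather than real sales records, is to score the same kind of model on several different splits of the data and look at the spread of the scores:

```python
# Sketch: assess reliability by scoring one model on several data splits.
# The synthetic regression data stands in for real sales records.
from sklearn.datasets import make_regression
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import cross_val_score

X, y = make_regression(n_samples=200, n_features=5, noise=10.0, random_state=0)

scores = cross_val_score(LinearRegression(), X, y, cv=5, scoring="r2")
print("accuracy (mean R^2):", scores.mean())
print("reliability (spread):", scores.std())  # large spread -> unreliable
```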
Usefulness includes various metrics that tell you whether the model provides useful
information. For example, a data mining model that correlates store location with sales
might be both accurate and reliable, but might not be useful, because you cannot generalize
that result by adding more stores at the same location. Moreover, it does not answer the
fundamental business question of why certain locations have more sales. You might also find
that a model that appears successful in fact is meaningless, because it is based on cross-
correlations in the data.
Every organization will have its own set of rules for storing and maintaining data.
Setting basic data validation rules will assist your company in maintaining
organized standards that will make working with data more efficient. Most Data
Validation procedures will run one or more of these checks to ensure that the data
is correct before it is stored in the database.
● Data Type Check
● Code Check
● Range Check
● Format Check
● Consistency Check
● Uniqueness Check
● Presence Check
● Length Check
● Look Up
1) Data Type Check
A Data Type Check ensures that data entered into a field is of the correct data type. A field, for example, may only accept numeric data. The system should then reject any data containing other characters, such as letters or special symbols, and display an error message.
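A minimal sketch of a Data Type Check in Python (the numeric-only field is hypothetical):

```python
# Data Type Check sketch: accept only numeric input for a field.
def is_numeric(raw: str) -> bool:
    """True if the raw input parses as a number."""
    try:
        float(raw)
        return True
    except ValueError:
        return False

print(is_numeric("42"))    # True
print(is_numeric("42a"))   # False -> reject and show an error message
```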
2) Code Check
A Code Check ensures that a field is chosen from a valid list of values or that
certain formatting rules are followed. For example, it is easier to verify the validity
of a postal code by comparing it to a list of valid codes. Other items, such as
country codes and NAICS industry codes, can be approached in the same way.
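A membership test against a known list is usually enough; the set of codes below is hypothetical:

```python
# Code Check sketch: compare input against a list of valid codes.
VALID_COUNTRY_CODES = {"US", "GB", "DE", "FR", "IN"}

def is_valid_code(value: str) -> bool:
    return value.strip().upper() in VALID_COUNTRY_CODES

print(is_valid_code("gb"))   # True
print(is_valid_code("XX"))   # False -> not an accepted code
```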
3) Range Check
A Range Check will determine whether the input data falls within a given range.
Latitude and longitude, for example, are frequently used in geographic data.
Latitude should be between -90 and 90, and longitude should be between -180 and
180. Any values outside of this range are considered invalid.
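The latitude/longitude rule above translates directly into a pair of comparisons, sketched here in Python:

```python
# Range Check sketch for geographic coordinates.
def in_range(lat: float, lon: float) -> bool:
    return -90.0 <= lat <= 90.0 and -180.0 <= lon <= 180.0

print(in_range(51.5, -0.12))   # True
print(in_range(123.0, 10.0))   # False: latitude outside [-90, 90]
```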
4) Format Check
Many data types have a predefined format. A Format Check will ensure that the
data is in the correct format. Date fields, for example, are stored in a fixed format
such as “YYYY-MM-DD” or “DD-MM-YYYY.” If the date is entered in any other format, it will be rejected. A National Insurance number, for another example, follows the pattern LL 99 99 99 L, where L can be any letter and 9 can be any digit.
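Both formats can be expressed as regular expressions; the sketch below checks only the shape of the values, not (for dates) whether the date actually exists:

```python
# Format Check sketch using regular expressions.
import re

DATE_RE = re.compile(r"^\d{4}-\d{2}-\d{2}$")               # YYYY-MM-DD
NI_RE = re.compile(r"^[A-Z]{2} \d{2} \d{2} \d{2} [A-Z]$")  # LL 99 99 99 L

print(bool(DATE_RE.match("2023-04-01")))   # True
print(bool(DATE_RE.match("01-04-2023")))   # False: wrong format
print(bool(NI_RE.match("QQ 12 34 56 C")))  # True
```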
5) Consistency Check
A Consistency Check confirms that the data entered is logically consistent with related fields in the same record. For example, a delivery date should not fall before the corresponding shipping date.
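A sketch of such a cross-field rule (the shipping and delivery fields are a hypothetical example):

```python
# Consistency Check sketch: one field constrains another.
from datetime import date

def dates_consistent(shipped: date, delivered: date) -> bool:
    """A delivery date earlier than the shipping date is impossible."""
    return delivered >= shipped

print(dates_consistent(date(2023, 4, 1), date(2023, 4, 3)))  # True
print(dates_consistent(date(2023, 4, 5), date(2023, 4, 3)))  # False
```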
6) Uniqueness Check
Some data, such as IDs or e-mail addresses, are inherently unique. These fields in a
database should most likely have unique entries. A Uniqueness Check ensures
that an item is not entered into a database more than once.
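A sketch of this check against an in-memory set of existing values (a real system would query the database instead):

```python
# Uniqueness Check sketch: reject an e-mail address already on file.
existing_emails = {"ada@example.com", "alan@example.com"}

def is_unique(email: str) -> bool:
    return email.lower() not in existing_emails

print(is_unique("grace@example.com"))  # True
print(is_unique("ADA@example.com"))    # False: duplicate entry
```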
7) Presence Check
A Presence Check ensures that all mandatory fields are not left blank. If someone
tries to leave the field blank, an error message will be displayed, and they will be
unable to proceed to the next step or save any other data that they have entered. A
key field, for example, cannot be left blank in most databases.
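A minimal sketch, with a hypothetical set of mandatory fields:

```python
# Presence Check sketch: report any mandatory fields left blank.
REQUIRED_FIELDS = ("id", "name", "email")

def missing_fields(record: dict) -> list:
    return [f for f in REQUIRED_FIELDS if not record.get(f)]

print(missing_fields({"id": 7, "name": "Ada", "email": "ada@example.com"}))  # []
print(missing_fields({"id": 7, "name": ""}))  # ['name', 'email']
```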
8) Length Check
A Length Check ensures that the appropriate number of characters is entered into the field. It verifies that the entered character string is neither too short nor too long. Consider a password that must be at least 8 characters long; the Length Check ensures that the field contains at least 8 characters.
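The password rule above, sketched with an assumed upper bound as well:

```python
# Length Check sketch: enforce minimum (and an assumed maximum) length.
def length_ok(value: str, minimum: int = 8, maximum: int = 64) -> bool:
    return minimum <= len(value) <= maximum

print(length_ok("hunter2"))        # False: only 7 characters
print(length_ok("correct horse"))  # True
```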
9) Look Up
Look Up assists in reducing errors in a field with a limited set of values. It consults a table to find the acceptable values. For example, because there are only 7 possible days in a week, the list of acceptable values for a day-of-week field is small and easy to check against.
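The day-of-week example as a sketch, consulting a small lookup table:

```python
# Look Up sketch: consult a fixed table of acceptable values.
DAYS_OF_WEEK = {"Mon", "Tue", "Wed", "Thu", "Fri", "Sat", "Sun"}

def is_valid_day(value: str) -> bool:
    return value.strip().capitalize() in DAYS_OF_WEEK

print(is_valid_day("wed"))      # True
print(is_valid_day("Someday"))  # False
```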
What are the Methods to Perform Data Validation?
There are various methods for Data Validation available, and each method includes
specific features for the best Data Validation process.
● Validation by Scripts
● Validation by Programs
1) Validation by Scripts
In this method, developers write the validation logic themselves in a scripting language. This offers full control over the rules, but completing the process effectively requires extensive knowledge and hand-coding.
2) Validation by Programs
A) Open Source Tools
Open-source options are cost-effective, and developers can save further money if the tools are cloud-based. OpenRefine and SourceForge are two excellent examples of open-source tools.
B) Enterprise Tools
Various enterprise tools are available for the Data Validation process. Enterprise tools are secure and stable, but they require infrastructure and are more expensive than open-source tools. FME, for instance, is a tool used to repair and validate data.
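To make the script-based method concrete, here is a minimal sketch that strings several of the checks above together over a batch of records; all field names and rules are hypothetical:

```python
# Sketch of "validation by scripts": run several checks over a batch.
import re

EMAIL_RE = re.compile(r"^[^@\s]+@[^@\s]+\.[^@\s]+$")  # rough shape check only

def validate(record: dict) -> list:
    errors = []
    if not record.get("name"):                       # presence check
        errors.append("name is required")
    if not EMAIL_RE.match(record.get("email", "")):  # format check
        errors.append("email format is invalid")
    if not 0 <= record.get("age", -1) <= 130:        # range check
        errors.append("age out of range")
    return errors

batch = [
    {"name": "Ada", "email": "ada@example.com", "age": 36},
    {"name": "", "email": "not-an-email", "age": 200},
]
for rec in batch:
    print(rec.get("name") or "<blank>", "->", validate(rec) or "OK")
```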
What are the Steps to Perform Data Validation?
Step 1: Determine the Data Sample
If you have a large amount of data to validate, you will need a sample rather than the entire dataset. To ensure the project's success, you must first understand and decide on the volume of the data sample, as well as the acceptable error rate.
Step 2: Database Validation
During the Database Validation process, you must ensure that all requirements are met by the existing database. To compare source and target data fields, the unique IDs and the number of records must be determined.
Determine the overall size of the data and how much of the source data is required for the targeted validation, and then search for inconsistencies, duplicate records, incorrect formats, and null field values.
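A sketch of this step using an in-memory SQLite database; the table and column names are hypothetical:

```python
# Sketch of Step 2: compare record counts and unique IDs between
# source and target tables. Table and column names are hypothetical.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE source_orders (id INTEGER PRIMARY KEY, amount REAL);
    CREATE TABLE target_orders (id INTEGER PRIMARY KEY, amount REAL);
    INSERT INTO source_orders VALUES (1, 10.0), (2, 20.0), (3, 30.0);
    INSERT INTO target_orders VALUES (1, 10.0), (3, 30.0);  -- one row lost
""")

src_ids = {row[0] for row in conn.execute("SELECT id FROM source_orders")}
tgt_ids = {row[0] for row in conn.execute("SELECT id FROM target_orders")}

print("source rows:", len(src_ids), "target rows:", len(tgt_ids))
print("missing from target:", src_ids - tgt_ids)  # -> {2}

# Null-value check on a mandatory field in the target table
nulls = conn.execute(
    "SELECT COUNT(*) FROM target_orders WHERE amount IS NULL").fetchone()[0]
print("null amounts in target:", nulls)
```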