FDS CH 3

Uploaded by

sonuchaure548
Copyright © All Rights Reserved

FDS_CH_3 : Data Preprocessing

• Data preprocessing is the process of taking raw data and transforming it into usable, meaningful information.
• Data preprocessing is required to improve the quality of the data.
DATA OBJECTS :
• Data is a collection of data objects and their attributes.
• A collection of attributes describes an object.
• Data objects can also be referred to as samples, examples, instances, cases, entities, data points or objects.
Data Attributes :
• A data attribute is a single-value descriptor for a data object.
• An attribute is a property or characteristic of an object.
• There are broadly four types of attributes: nominal, binary, ordinal and numeric attributes.
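As a rough illustration, a data object can be pictured as a record whose fields are its attributes, and a data set as a list of such records. The `Customer` record, its field names and the size levels below are hypothetical, chosen only to show one attribute of each type and why ordinal values can be ranked while nominal ones cannot.

```python
from dataclasses import dataclass

# Hypothetical levels for an ordinal attribute: the order matters,
# but the gap between levels has no numeric meaning.
SIZE_LEVELS = ["small", "medium", "large"]


@dataclass
class Customer:
    """One data object (sample); each field is one attribute."""
    city: str         # nominal: a name or category with no order
    is_member: bool   # binary: only two possible states
    order_size: str   # ordinal: categories with a meaningful order
    age: int          # numeric: a measurable quantity


def size_rank(obj: Customer) -> int:
    """Ordinal values can be ranked; nominal values such as city cannot."""
    return SIZE_LEVELS.index(obj.order_size)


# A data set is a collection of data objects (hypothetical values).
dataset = [
    Customer("Paris", True, "large", 34),
    Customer("Oslo", False, "small", 27),
]

# Sort the data objects by their ordinal attribute.
by_size = sorted(dataset, key=size_rank)
print([c.city for c in by_size])  # ['Oslo', 'Paris']
```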
DATA QUALITY :
• Data quality can be defined as “the ability of a given data set to serve an intended purpose”.
• Data preprocessing is responsible for maintaining the quality of data.
• There are many factors comprising data quality, including accuracy, completeness, consistency, timeliness,
believability, and interpretability.
• There are many reasons why data in real-world databases and data warehouses is inaccurate, incomplete, and inconsistent.
Inaccuracy :
• Inaccurate data means having incorrect attribute values.
Data Cleaning :
• Data cleaning is used to handle missing data.
• Data cleaning is also known as data scrubbing.
• Data cleaning is the process of correcting or removing incorrect, incomplete or duplicate data within a dataset.
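A minimal sketch of the "removing" side of data cleaning, using plain Python lists of dictionaries. The records are hypothetical, and the rule that a valid age lies between 0 and 120 is an assumption made only for this example; real validity rules depend on the data set.

```python
# Hypothetical raw records: one exact duplicate, one missing value,
# and one incorrect value (a negative age).
raw = [
    {"id": 1, "name": "Asha", "age": 29},
    {"id": 2, "name": "Ravi", "age": None},
    {"id": 1, "name": "Asha", "age": 29},
    {"id": 3, "name": "Meera", "age": -4},
]


def clean(records, min_age=0, max_age=120):
    """Drop duplicate rows and rows whose age is missing or out of range.

    The 0..120 validity range is an assumption for this example.
    """
    seen = set()
    cleaned = []
    for rec in records:
        key = tuple(sorted(rec.items()))  # hashable form of the whole row
        if key in seen:
            continue  # duplicate row
        seen.add(key)
        age = rec["age"]
        if age is None or not (min_age <= age <= max_age):
            continue  # incomplete or incorrect row
        cleaned.append(rec)
    return cleaned


print(clean(raw))  # [{'id': 1, 'name': 'Asha', 'age': 29}]
```

Here faulty rows are simply dropped; correcting them instead (for example by filling in a plausible value) is the other option the definition mentions.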
Missing Values :
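• Some values in the data may not be filled in for various reasons and are therefore considered missing.
• There are three cases of missing data:
Missing Completely At Random (MCAR): the missingness is unrelated to any value in the data.
Missing At Random (MAR): the missingness depends only on other, observed values.
Missing Not At Random (MNAR): the missingness depends on the missing value itself.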
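One simple way to handle missing values is mean imputation, shown here as an illustrative choice rather than the only method: count the missing entries, then replace each with the mean of the observed values. It is generally considered safest when values are MCAR, since under MAR or MNAR it can bias the results. The `ages` list is hypothetical, and `None` is assumed to mark a missing value.

```python
from statistics import mean

# Hypothetical attribute values; None marks a missing value.
ages = [25, None, 31, 40, None, 28]


def impute_mean(values):
    """Replace each missing value (None) with the mean of the observed values."""
    observed = [v for v in values if v is not None]
    fill = mean(observed)
    return [fill if v is None else v for v in values]


missing = sum(v is None for v in ages)
print(missing)            # 2
print(impute_mean(ages))  # [25, 31, 31, 40, 31, 28]
```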
Data Transformation :
• Data transformation is the process of changing the format, structure, or values of data, for example converting raw data into structured data.
• It transforms the data into alternate forms, typically a single, consistent format that is easy to read and analyze.
Rescaling:
• Rescaling means transforming the data so that it fits within a specific scale, such as 0 to 100 or 0 to 1.
• Rescaling of data allows scaling all data values to lie between a specified minimum and maximum value.
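A minimal sketch of min-max rescaling in plain Python, mapping each value linearly from the observed [min, max] onto a chosen [new_min, new_max]. The `marks` values are hypothetical, and mapping a constant column to `new_min` is an assumed convention for the case where there is no spread to scale.

```python
def rescale(values, new_min=0.0, new_max=1.0):
    """Min-max rescaling: map values linearly onto [new_min, new_max].

    x' = (x - min) / (max - min) * (new_max - new_min) + new_min
    """
    lo, hi = min(values), max(values)
    if hi == lo:
        # All values equal: no spread to scale, so map everything to new_min.
        return [new_min for _ in values]
    span = hi - lo
    return [(x - lo) / span * (new_max - new_min) + new_min for x in values]


marks = [20, 40, 60, 100]      # hypothetical raw values
print(rescale(marks))          # [0.0, 0.25, 0.5, 1.0]
print(rescale(marks, 0, 100))  # [0.0, 25.0, 50.0, 100.0]
```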
Normalizing:
• To avoid dependence on the choice of measurement units, the data should be normalized.
• Normalization scales data so that it falls within a smaller range, such as 0.0 to 1.0 or -1.0 to 1.0.
• Normalizing the data attempts to give all attributes an equal weight.
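To show how normalization puts attributes measured in different units on a comparable footing, here is a sketch of one normalization method, decimal scaling (swapped in for illustration; min-max scaling is the other common choice). Each value is divided by 10^j, where j is the smallest integer that brings every absolute value below 1, so the result lies within -1.0 to 1.0. The two attributes below are hypothetical.

```python
def decimal_scaling(values):
    """Normalize by decimal scaling: v' = v / 10**j.

    j is the smallest integer such that every |v'| < 1, so the
    result always lies within (-1.0, 1.0).
    """
    max_abs = max(abs(v) for v in values)
    j = 0
    while max_abs / 10**j >= 1:
        j += 1
    return [v / 10**j for v in values], j


# Hypothetical attributes measured in very different units.
balance = [-986, 120, 917]    # e.g. an account balance
height = [1.62, 1.75, 1.80]   # e.g. metres

print(decimal_scaling(balance))  # ([-0.986, 0.12, 0.917], 3)
print(decimal_scaling(height))
```

After scaling, both attributes lie in the same small range, so neither dominates a distance computation merely because of its unit.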
Binarizing:
• It is the process of converting data to either 0 or 1 based on a threshold value.
• All data values above the threshold value are marked as 1, whereas all data values equal to or below the threshold value are marked as 0.
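The binarizing rule above fits in one line of Python. This sketch uses hypothetical scores and a threshold of 0.5; note that values exactly equal to the threshold become 0, as the rule states.

```python
def binarize(values, threshold=0.0):
    """Map values above the threshold to 1 and all others to 0."""
    return [1 if x > threshold else 0 for x in values]


scores = [0.2, 0.5, 0.7, 0.5, 0.9]      # hypothetical values
print(binarize(scores, threshold=0.5))  # [0, 0, 1, 0, 1]
```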
Standardizing:
• Standardization is also called mean removal.
• Standardization is a scaling technique in which the values are centered around the mean (giving mean 0) and scaled to unit standard deviation (giving standard deviation 1).
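A minimal sketch of standardization (z-score scaling) using only the standard library: subtract the mean, then divide by the standard deviation. It assumes the population standard deviation (`statistics.pstdev`); using the sample standard deviation would give slightly different values. The data values are hypothetical.

```python
from statistics import mean, pstdev


def standardize(values):
    """Z-score standardization: z = (x - mean) / std.

    Uses the population standard deviation, so the result has
    mean 0 and standard deviation 1.
    """
    mu = mean(values)
    sigma = pstdev(values)
    if sigma == 0:
        raise ValueError("cannot standardize a constant attribute")
    return [(x - mu) / sigma for x in values]


data = [2, 4, 4, 4, 5, 5, 7, 9]  # hypothetical values: mean 5, std 2
z = standardize(data)
print(z)  # [-1.5, -0.5, -0.5, -0.5, 0.0, 0.0, 1.0, 2.0]
```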
Data Discretization :
• Data discretization is a method of translating the attribute values of continuous data into a finite set of intervals with minimal information loss.
• Data discretization divides attributes of a continuous nature into intervals.
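One simple discretization technique is equal-width binning, chosen here for illustration: split the range [min, max] into k intervals of equal width and replace each value with the index of its interval. The age values and the choice of k = 3 are hypothetical.

```python
def equal_width_bins(values, k):
    """Discretize continuous values into k equal-width intervals.

    Returns the bin index (0..k-1) of each value and the bin edges.
    """
    lo, hi = min(values), max(values)
    width = (hi - lo) / k
    edges = [lo + i * width for i in range(k + 1)]
    labels = []
    for x in values:
        idx = int((x - lo) / width) if width > 0 else 0
        labels.append(min(idx, k - 1))  # the maximum value goes in the last bin
    return labels, edges


# Hypothetical continuous attribute (ages).
ages = [13, 15, 16, 19, 20, 21, 22, 25, 30, 33, 35, 36, 40, 45, 46, 52, 70]
labels, edges = equal_width_bins(ages, 3)
print(edges)   # [13.0, 32.0, 51.0, 70.0]
print(labels)  # [0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 2, 2]
```

Equal-width bins are easy to compute but sensitive to outliers; equal-frequency binning, which puts roughly the same number of values in each interval, is a common alternative.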