
Silver Oak College of Engineering and Technology

Unit 2 :
Preparing to Model



Outline
 Machine Learning activities,
 Types of data in Machine Learning,
 Structures of data,
 Data quality and remediation,
 Data Pre-Processing: Dimensionality reduction, Feature
subset selection



Framework For Developing Machine Learning Models

1. Problem or Opportunity Identification
2. Feature Extraction
3. Data Preprocessing
4. Model Building
5. Communication and Deployment of Data Analysis


Machine Learning activities



Types of data in Machine Learning
 Most data can be categorized into two basic types from a Machine Learning perspective:
1. Qualitative/categorical data
2. Quantitative/numerical data



Types of data in Machine Learning
 Qualitative/Categorical Data
 Qualitative or categorical data describes the object under consideration using a finite set of discrete classes.
 This type of data can't easily be counted or measured using numbers, and it is therefore divided into categories.
 Ex: the gender of a person (male, female, or others).
 There are two subcategories under this:
 Nominal data
 Ordinal data



Types of data in Machine Learning
 Qualitative/Categorical Data
 Nominal data
 These are sets of values that don't possess a natural ordering.
 In the nominal data type there is no comparison among the categories.
 Ex: the color of a smartphone can be considered a nominal data type, as we can't compare one color with another.
 Ex: the gender of a person is another example, where we can't rank male, female, or others.



Types of data in Machine Learning
 Qualitative/Categorical Data
 Ordinal data
 These types of values have a natural ordering while maintaining their class of values.
 These categories help us decide which encoding strategy can be applied to which type of data.
 Ex: clothing sizes, where small < medium < large; unlike the nominal data type, a comparison among the categories exists.
 Data encoding for qualitative data is important because machine learning models are mathematical in nature and can't handle these values directly; they need to be converted to numerical types.
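As an illustration of these encoding strategies, here is a minimal scikit-learn sketch (the toy dataframe and column names are hypothetical): ordinal values get an order-preserving integer code, while nominal values are one-hot encoded.

import pandas as pd
from sklearn.preprocessing import OrdinalEncoder, OneHotEncoder

# Hypothetical toy data: one nominal and one ordinal feature
df = pd.DataFrame({
    "color": ["red", "blue", "green"],     # nominal: no natural order
    "size": ["small", "large", "medium"],  # ordinal: small < medium < large
})

# Ordinal feature: encode while preserving the natural order
size_enc = OrdinalEncoder(categories=[["small", "medium", "large"]])
df["size_code"] = size_enc.fit_transform(df[["size"]]).ravel()
print(df["size_code"].tolist())  # [0.0, 2.0, 1.0]

# Nominal feature: one-hot encode, since no comparison exists
color_enc = OneHotEncoder(sparse_output=False)  # sparse_output needs sklearn >= 1.2
print(color_enc.fit_transform(df[["color"]]))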



Types of data in Machine Learning
 Quantitative/Numeric Data
 This data type tries to quantify things, and it does so by considering numerical values that make it countable in nature.
 Discrete
 Continuous



Types of data in Machine Learning
 Quantitative/Numeric Data
 Discrete
 Numerical values that are integers or whole numbers are placed under this category. The number of speakers in a phone, cameras, cores in the processor, and the number of SIMs supported are all examples of the discrete data type.



Types of data in Machine Learning
 Quantitative/Numeric Data
 Continuous
 Fractional numbers are considered continuous values. These can take the form of the operating frequency of a processor, the Android version of a phone, Wi-Fi frequency, the temperature of the cores, and so on.





Structures of data
 The term structured data refers to data that resides in a fixed field within a file or record. Structured data is typically stored in a relational database management system (RDBMS).
 It can consist of numbers and text, and sourcing can happen automatically or manually, as long as it is within an RDBMS structure.
 It depends on the creation of a data model defining what types of data to include and how to store and process it.



Data quality and remediation
 Data quality is an assessment or a perception of data's fitness to fulfill its purpose. Simply put, data is said to be of high quality if it satisfies the requirements of its intended purpose.
 There are many aspects to data quality, including consistency, integrity, accuracy, and completeness.
 Achieving the data quality required for machine learning includes checking for consistency, accuracy, compatibility, completeness, timeliness, and duplicate or corrupted records.
 At the scale required for a typical ML project, adequately cleansing training or production data manually is a near impossibility.



Importance of Data quality
 Data quality matters for machine learning. Unsupervised machine learning can be a savior when data of the desired quality is not available to meet the requirements of the business.
 It is capable of delivering precise business insights by evaluating data for AI-based programs.
 Improved data quality leads to better decision-making across an organization.
 The more high-quality data you have, the more confidence you can have in your decisions.
 Data quality is of critical importance, especially in the era of automated decisions, ML, and continuous process optimization.



Importance of Data quality
 Confusion, limited trust, poor decisions
 Data quality issues explain limited trust in data among corporate users, wasted resources, and even poor decisions.
 Failures due to low data quality
 Users need to trust the data; if they don't, they will gradually abandon the system, impacting its major KPIs and success criteria.



Data quality issues
 Data quality issues can take many forms, for example:
 particular properties in a specific object have invalid or missing values
 a value arriving in an unexpected or corrupted format
 duplicate instances
 inconsistent references or units of measure
 incomplete cases
 broken URLs
 corrupted binary data
 missing packages of data
 gaps in the feeds
 incorrectly mapped properties
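Many of these issues can be surfaced programmatically. Below is a minimal sketch, assuming a hypothetical pandas DataFrame with typical defects, of a few basic checks:

import pandas as pd

# Hypothetical raw dataset with typical quality issues
df = pd.DataFrame({
    "age": [25, None, 31, 25, 200],  # a missing value and an implausible value
    "email": ["a@x.com", "b@x", "c@x.com", "a@x.com", "d@x.com"],
})

print(df.isna().sum())        # missing values per column
print(df.duplicated().sum())  # fully duplicated rows
print((~df["email"].str.match(r"^\S+@\S+\.\S+$")).sum())  # unexpected formats
print(((df["age"] < 0) | (df["age"] > 120)).sum())        # out-of-range values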



Data quality
 Data quality issues are typically the result of:
 poor software implementations: bugs or improper
handling of particular cases
 system-level issues: failures in certain processes
 changes in data formats, impacting the source and/or
target data stores



Data remediation
 Data remediation is the process of cleansing, organizing, and migrating data so that it is properly protected and best serves its intended purpose. Since the core initiative is to correct data, the data remediation process typically involves replacing, modifying, cleansing, or deleting any "dirty" data.
 It can be performed manually, with cleansing tools, as a batch process (script), through data migration, or with a combination of these methods.
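As an illustrative sketch of the batch-process route (a hypothetical pandas script, not a full remediation pipeline), replacing, modifying, cleansing, and deleting dirty records might look like this:

import pandas as pd

# Hypothetical dirty dataset
df = pd.DataFrame({
    "name": [" Alice ", "BOB", "Alice", None],
    "salary": ["50,000", "62000", "50,000", "48000"],
})

# Modify: normalize whitespace and casing
df["name"] = df["name"].str.strip().str.title()

# Replace: coerce salary strings into proper numbers
df["salary"] = pd.to_numeric(df["salary"].str.replace(",", ""), errors="coerce")

# Delete: drop duplicates and rows missing critical fields
df = df.drop_duplicates().dropna(subset=["name"])
print(df)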



Data remediation
 Need for data remediation: consider the following factors that drive the need for data remediation:
 Moving to a new system or environment
 Eliminating personally identifiable information (a.k.a.
PII)
 Dealing with mergers and acquisitions activity
 Addressing human errors
 Remedying errors in reports
 Other business drivers



Data remediation terminology
 Data Migration – The process of moving data between two or more systems, data formats, or servers.
 Data Discovery – A manual or automated process of searching for patterns in data sets to identify structured and unstructured data in an organization's systems.
 ROT – An acronym that stands for redundant, obsolete, and trivial data. According to the Association for Intelligent Information Management, nearly 80 percent of unstructured data is ROT: beyond its recommended retention period and no longer useful to the organization.
 Dark Data – Any information that businesses collect, process, and store but do not use for other purposes. Some examples include customer call records, raw survey data, or email correspondence. Often, storing and securing this type of data incurs more expense, and sometimes even greater risk, than it delivers value.
 Dirty Data – Data that damages the integrity of the organization's complete dataset. This can include data that is unnecessarily duplicated, outdated, incomplete, or inaccurate.
 Data Overload – When an organization has acquired too much data, including low-quality or dark data. Data overload makes the tasks of identifying, classifying, and remediating data laborious.
 Data Cleansing – Transforming data from its native state into a predefined, standardized format.
 Data Governance – Management of the availability, usability, integrity, and security of the data stored within an organization.
Stages of data remediation
 Data remediation is an involved process. After all, it is more than simply purging your organization's systems of dirty data.
 It requires knowledgeable assessment of how to most effectively resolve unclean data.
 Assessment:
 You need to have a complete understanding of the data you possess.
 Organizing and segmentation:
 Not all data is created equal, which means that not all pieces of data require the same level of protection or storage features.
 A key step when creating segments is determining which historical data is essential to business operations and needs to be stored in an archive system, versus data that can be safely deleted.



Stages of data remediation
 Indexation and classification:
 These steps build on the data segments you have created and help you determine action steps.
 Organizations will focus on segments containing non-ROT data and classify the level of sensitivity of this remaining data.
 Migrating:
 If an organization's end goal is to consolidate its data into a new, cleansed storage environment, then migration is an essential step in the data remediation process.
 Data cleansing:
 The final task for your organization's data may not always involve migration.
 There may be other actions better suited to the data, depending on which segmentation group it falls under and its classification.
 A few vital actions that a team may proceed with include shredding, redacting, quarantining, ACL removal, and script execution to clean up data.
Benefits of data remediation
 Reduced data storage costs
 Protection for unstructured sensitive data
 Reduced sensitive data footprint
 Adherence to compliance laws and regulations
 Increased staff productivity
 Minimized cyberattack risks
 Improved overall data security



Dimensionality reduction
 The number of input variables or features in a dataset is referred to as its dimensionality.
 Dimensionality reduction refers to techniques that reduce the number of input variables in a dataset.
 More input features often make a predictive modeling task more challenging to model, a problem more generally referred to as the curse of dimensionality.
 High-dimensionality statistics and dimensionality reduction techniques are often used for data visualization. Nevertheless, these techniques can be used in applied machine learning to simplify a classification or regression dataset in order to better fit a predictive model.



Why dimensionality reduction needed?
 Some features (dimensions) bear little or no useful information (e.g., the color of hair for a car selection)
 We can drop some features
 We have to estimate which features can be dropped from the data
 Several features can be combined together without loss, or even with gain, of information (e.g., the income of all family members for a loan application)
 Some features can be combined together
 We have to estimate which features to combine from the data



Feature selection vs extraction

 Feature selection: choosing k < d important features, ignoring the remaining d − k
 Subset selection algorithms
 Feature extraction: projecting the original d dimensions x_i, i = 1, ..., d, onto new k < d dimensions z_j, j = 1, ..., k
 Principal Component Analysis (PCA)
 Linear Discriminant Analysis (LDA)
 Factor Analysis (FA)



Principal Components Analysis (PCA)
 Principal Component Analysis is a popular unsupervised learning technique for reducing the dimensionality of data.
 It increases interpretability while, at the same time, minimizing information loss.
 It helps to find the most significant features in a dataset and makes the data easy to plot in 2D and 3D.
 PCA helps in finding a sequence of linear combinations of variables.



Principal Components
 The principal components are straight lines that capture most of the variance in the data.
 They have a direction and a magnitude. Principal components are orthogonal (perpendicular) projections of the data onto a lower-dimensional space.



Application of PCA
• PCA is used to visualize multidimensional data.
• It is used to reduce the number of dimensions in healthcare data.
• PCA can help resize an image.
• It can be used in finance to analyze stock data and forecast returns.
• PCA helps to find patterns in the high-dimensional datasets.



Uses of PCA
• To reduce the number of dimensions in the dataset
• To find patterns in the high-dimensional dataset
• To visualize data of high dimensionality
• To ignore noise
• To improve classification
• To get a compact description
• To capture as much of the original variance in the data as possible



Objective of PCA
• Find an orthonormal basis for the data.
• Sort dimensions in the order of importance.
• Discard the low significance dimensions.
• Focus on uncorrelated and Gaussian components.



Working of PCA
1. Normalize the data
Standardize the data before performing PCA. This ensures that each feature has mean = 0 and variance = 1.

2. Build the covariance matrix
Construct a square matrix to express the covariance between each pair of features in the multidimensional dataset.
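A minimal NumPy sketch of these two steps (using a small, made-up matrix X) might look like this:

import numpy as np

# Toy data: four samples, two features
X = np.array([[2.5, 2.4], [0.5, 0.7], [2.2, 2.9], [1.9, 2.2]])

# Step 1: standardize each feature to mean 0 and variance 1
X_std = (X - X.mean(axis=0)) / X.std(axis=0)

# Step 2: covariance matrix of the standardized features
cov = np.cov(X_std, rowvar=False)
print(cov)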



Working of PCA
3. Find the eigenvectors and eigenvalues
Calculate the eigenvectors (unit vectors) and eigenvalues of the covariance matrix. An eigenvalue is the scalar by which the covariance matrix scales its eigenvector (Cv = λv).

4. Sort the eigenvectors from highest to lowest eigenvalue and select the number of principal components.
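Continuing the NumPy sketch above, steps 3 and 4 can be carried out with the symmetric eigensolver:

# Step 3: eigendecomposition of the (symmetric) covariance matrix
eigvals, eigvecs = np.linalg.eigh(cov)

# Step 4: sort components by descending eigenvalue and keep the top k
order = np.argsort(eigvals)[::-1]
k = 1
W = eigvecs[:, order[:k]]  # projection matrix (d x k)
X_reduced = X_std @ W      # data projected onto the top k principal components
print(X_reduced)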



PCA in python
import numpy as np
from sklearn.decomposition import PCA

# Toy dataset: six points in two dimensions
X = np.array([[-1, -1], [-2, -1], [-3, -2], [1, 1], [2, 1], [3, 2]])

# Fit PCA, keeping both components
pca = PCA(n_components=2)
pca.fit(X)

# Fraction of the variance explained by each component
print(pca.explained_variance_ratio_)

# Singular values corresponding to each component
print(pca.singular_values_)





LDA : Linear Discriminant Analysis
 LDA is most commonly used as a dimensionality reduction technique.
 It is very similar to PCA.
 Whereas PCA finds the component axes that maximize the variance of the entire data, LDA finds the axes that maximize the separation between multiple classes.
 LDA projects a feature space of size n onto a smaller subspace k (where k ≤ n − 1) while maintaining the class-discriminatory information.
 It helps avoid overfitting.
 It reduces computational cost.
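A minimal scikit-learn sketch of LDA as a supervised dimensionality reducer (on the built-in iris data, projecting 4 features onto 2 class-discriminative axes):

from sklearn.datasets import load_iris
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

X, y = load_iris(return_X_y=True)

# Unlike PCA, LDA uses the class labels y to find discriminative axes
lda = LinearDiscriminantAnalysis(n_components=2)
X_lda = lda.fit_transform(X, y)
print(X_lda.shape)                    # (150, 2)
print(lda.explained_variance_ratio_)  # separation captured per axis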



ICA : Independent Component Analysis
 ICA is a method for finding underlying factors or components from multivariate statistical data.
 It looks for components that are both statistically independent and non-Gaussian.
 Statistically independent and non-Gaussian components can be separated using a blind source separation method. Nonlinear decorrelation and maximum non-Gaussianity are the methods used in ICA.
 Methods for estimating ICA:
1. Nonlinear decorrelation
2. Maximum non-Gaussianity



ICA : Independent Component Analysis
 Methods for estimating ICA
1. Nonlinear decorrelation:
 The nonlinear decorrelation method involves finding the matrix W such that for any i ≠ j, the components y_i and y_j are completely uncorrelated.
 The transformed components g(y_i) and h(y_j) are also uncorrelated, where g and h are suitable nonlinear functions.
 In this method of estimating ICA, if the nonlinearities are properly chosen, the method does find the independent components.
 The main problem in this method is how to choose the nonlinearities g and h. One approach to selecting the nonlinear functions is to use the maximum likelihood method from information theory.



ICA : Independent Component Analysis
 Methods for estimating ICA
2. Maximum non-Gaussianity:
 The maximum non-Gaussianity approach to estimating an independent component involves finding the local maxima of the non-Gaussianity of a linear combination y = Σ b_i x_i, under the constraint that the variance of y is constant.
 Each local maximum corresponds to one independent component. In practice, kurtosis is used to measure non-Gaussianity. Kurtosis is a higher-order cumulant, which generalizes variance using higher-order polynomials.
 Cumulants are used for ICA as they have important algebraic and statistical properties.
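For intuition, here is a minimal sketch using scikit-learn's FastICA (which estimates components by maximizing non-Gaussianity) to blindly separate two synthetic mixed signals; the sources and mixing matrix are made up:

import numpy as np
from sklearn.decomposition import FastICA

# Two synthetic independent, non-Gaussian sources
rng = np.random.default_rng(0)
t = np.linspace(0, 8, 2000)
S = np.c_[np.sin(2 * t), np.sign(np.sin(3 * t))]  # a sine wave and a square wave

# Mix the sources with a made-up mixing matrix, plus a little noise
A = np.array([[1.0, 0.5], [0.5, 2.0]])
X = S @ A.T + 0.02 * rng.standard_normal((2000, 2))

# Recover the independent components blindly from the mixtures
ica = FastICA(n_components=2, random_state=0)
S_est = ica.fit_transform(X)
print(S_est.shape)  # (2000, 2): recovered sources, up to order and scale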



High Dimensional data
 High-dimensional refers to a high number of variables, attributes, or features in certain data sets, more so in domains such as DNA analysis, geographic information systems (GIS), etc.
 A model built on an extremely high number of features may be very difficult to understand.
 So, start with feature selection.
 Benefits:
1. A faster and more cost-effective (less need for computational resources) learning model
2. A better understanding of the underlying model that generates the data
3. Improved efficacy of the learning model



Feature subset selection
 Feature selection is the most critical pre-processing activity in any machine learning process.
 It intends to select the subset of attributes or features that makes the most meaningful contribution to a machine learning activity. To understand it, consider a small example:
 Predict the weight of students based on past information about similar students, captured in a 'Student Weight' data set.
 The data set has four features: Roll Number, Age, Height, and Weight.
 Roll Number has no effect on the weight of the students, so we eliminate this feature.
 The new, reduced data set will have only three features.
 This subset of the data set is expected to give better results than the full set.
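A minimal pandas sketch of this example (the column names and values are made up):

import pandas as pd

students = pd.DataFrame({
    "roll_number": [101, 102, 103],
    "age": [14, 15, 14],
    "height_cm": [155, 160, 152],
    "weight_kg": [48, 55, 45],
})

# Roll Number carries no information about weight, so drop it
reduced = students.drop(columns=["roll_number"])
print(reduced.columns.tolist())  # ['age', 'height_cm', 'weight_kg']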



Feature Subset Selection
 The goal of feature subset selection is to find the optimal feature subset.
 Feature subset selection methods can be classified into three broad categories:
 Filter methods
 Wrapper methods
 Embedded methods
 Requirements:
 A measure for assessing the goodness of a feature subset (scoring function)
 A strategy to search the space of possible feature subsets
 Finding a minimal optimal feature set for an arbitrary target concept is hard; it needs good heuristics.



Filter Methods
 In this method, subsets of variables are selected as a pre-processing step, independently of the classifier that will be used.
 It is worth noting that variable-ranking feature selection is a filter method.
 Key features of filter methods (see the sketch below):
 Filter methods are usually fast.
 Filter methods provide a generic selection of features, not tuned to a given learner (universal).
 Filter methods are also often criticized because the feature set is not optimized for the classifier used.
 Filter methods are sometimes used as a pre-processing step for other methods.
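As a sketch of a filter method, scikit-learn's SelectKBest scores each feature with a statistic computed independently of any classifier (here an ANOVA F-test on the built-in iris data):

from sklearn.datasets import load_iris
from sklearn.feature_selection import SelectKBest, f_classif

X, y = load_iris(return_X_y=True)

# Score features with the ANOVA F-test and keep the best two
selector = SelectKBest(score_func=f_classif, k=2)
X_reduced = selector.fit_transform(X, y)
print(selector.scores_)  # per-feature scores, independent of any learner
print(X_reduced.shape)   # (150, 2)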
Wrapper Methods
 In wrapper methods, the learner is considered a black box. The interface of the black box is used to score subsets of variables according to the predictive power of the learner when using those subsets.
 Results vary for different learners.
 One needs to define how to search the space of all possible variable subsets, and how to assess the prediction performance of a learner (see the sketch below).
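One common wrapper-style approach is recursive feature elimination (RFE), which repeatedly retrains the learner and discards the weakest features; a minimal scikit-learn sketch:

from sklearn.datasets import load_iris
from sklearn.feature_selection import RFE
from sklearn.linear_model import LogisticRegression

X, y = load_iris(return_X_y=True)

# The learner is a black box whose fitted weights score candidate subsets
estimator = LogisticRegression(max_iter=1000)
rfe = RFE(estimator, n_features_to_select=2)
rfe.fit(X, y)
print(rfe.support_)  # boolean mask of the selected features
print(rfe.ranking_)  # rank 1 = selected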



Embedded Methods
 Embedded methods are specific to a given learning machine.
 They perform variable selection (implicitly) in the process of training.
 E.g., the WINNOW algorithm (a linear unit with multiplicative updates).
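A familiar modern example of an embedded method, added here for illustration (it is not on the slide), is L1-regularized regression, where training itself drives the coefficients of irrelevant features to exactly zero:

import numpy as np
from sklearn.linear_model import Lasso

rng = np.random.default_rng(0)
X = rng.standard_normal((200, 5))
# Only the first two features actually influence the target
y = 3 * X[:, 0] - 2 * X[:, 1] + 0.1 * rng.standard_normal(200)

# The L1 penalty performs selection while the model is being fit
model = Lasso(alpha=0.1)
model.fit(X, y)
print(np.round(model.coef_, 2))  # near-zero weights mark discarded features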



