
Module 2- Machine Learning (BCS602)

Module 2
Understanding Data
Bivariate and Multivariate data, Multivariate statistics, Essential mathematics for Multivariate data,
Overview of hypothesis, Feature engineering and dimensionality reduction techniques. Basics of Learning
Theory: Introduction to learning and its types, Introduction to computational learning theory, Design of a
learning system, Introduction to concept learning. Similarity-based learning: Introduction to similarity or
instance-based learning, Nearest-Neighbour learning, Weighted k-Nearest-Neighbour algorithm.

CHAPTER 2
2.6 BIVARIATE DATA AND MULTIVARIATE DATA
Bivariate data involves two variables. Bivariate analysis deals with causes and relationships; the aim is
to find relationships between the two variables. Consider Table 2.3, which records the temperature in
a shop and the corresponding sales of sweaters.

Here, the aim of bivariate analysis is to find relationships among variables. The relationships can then be
used in comparisons, in finding causes, and in further explorations. To do that, a graphical display of the data is
necessary. One such graphical method is the scatter plot.

A scatter plot is used to visualize bivariate data. It is useful for plotting two variables, with or without nominal
variables, to illustrate trends and to show differences. It is a plot between the explanatory and response
variables: a 2D graph showing the relationship between two variables. Line graphs are similar to scatter
plots. The line chart for the sales data is shown in Figure 2.12.

2.6.1 Bivariate Statistics


Covariance and Correlation are examples of bivariate statistics. Covariance is a measure of the joint variability
of two random variables, say X and Y. Generally, random variables are represented in capital letters. It is defined
as covariance(X, Y) or COV(X, Y) and is used to measure how the two variables vary together. The formula
for the covariance of the observations (xi, yi) is:

COV(X, Y) = (1/N) Σ (xi − E(X)) (yi − E(Y)), summed over i = 1 … N

Here, xi and yi are data values from X and Y. E(X) and E(Y) are the mean values of the xi and yi. N is the number
of given data points. Also, COV(X, Y) is the same as COV(Y, X).

If the given attributes are X = (x1, x2, …, xN) and Y = (y1, y2, …, yN), then the Pearson correlation coefficient,
denoted r, is given as:

r = COV(X, Y) / (σX σY)

where σX and σY are the standard deviations of X and Y.
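
As a quick illustration, here is a minimal NumPy sketch of these two formulas. The temperature and sales figures below are invented placeholders, not the values from Table 2.3:

import numpy as np

# Hypothetical temperature (°C) and sweater-sales figures, for illustration only
temperature = np.array([5, 10, 15, 20, 25, 30], dtype=float)
sales = np.array([300, 250, 190, 140, 90, 40], dtype=float)

n = len(temperature)
# Covariance: average of the products of deviations from the means
cov_xy = np.sum((temperature - temperature.mean()) * (sales - sales.mean())) / n
# Pearson correlation: covariance divided by the product of standard deviations
r = cov_xy / (temperature.std() * sales.std())
print(cov_xy, r)   # r close to -1 indicates a strong negative relationship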

2.7 MULTIVARIATE STATISTICS


In machine learning, almost all datasets are multivariate. Multivariate data involves three or more
observable variables, and often thousands of measurements are collected for one or more subjects. The aims
of multivariate analysis are broader than those of bivariate analysis; they include regression analysis,
factor analysis and multivariate analysis of variance.

Heatmap
A heat map is a graphical representation of data where individual values are represented by colors.
Heat maps are often used in data analysis and visualization to show patterns, density, or intensity of
data points in a two-dimensional grid.
Example: Let's consider a heat map to display the average temperatures (in °C) across different regions in
a country over a week. Each cell in the heat map will represent a temperature for a specific region on a
specific day. This is useful to quickly identify trends, such as higher temperatures in certain regions or
specific days with unusual weather patterns. The color gradient (from blue to red) indicates the
temperature range: cooler colors represent lower temperatures, while warmer colors represent higher
temperatures.
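
A minimal sketch of such a heat map using pandas and seaborn (libraries not named in the text; the region names and temperature values below are invented for illustration):

import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt

# Hypothetical average temperatures (°C) per region per day
data = pd.DataFrame(
    [[31, 33, 35, 34, 32, 30, 29],
     [24, 25, 27, 26, 25, 23, 22],
     [18, 19, 21, 22, 20, 19, 17]],
    index=["North", "Central", "South"],
    columns=["Mon", "Tue", "Wed", "Thu", "Fri", "Sat", "Sun"])

# 'coolwarm' maps low values to blue and high values to red,
# matching the cool-to-warm gradient described above
sns.heatmap(data, annot=True, cmap="coolwarm")
plt.title("Average temperature (°C) by region and day")
plt.show()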

Deepa S, Asst. Professor, Dept of CSE, RNSIT 2


Module 2- Machine Learning (BCS602)

Pairplot
A pairplot, or scatter matrix, is a data visualization technique for multivariate data. A scatter matrix consists
of several pair-wise scatter plots of the variables of the multivariate data. A random matrix of three columns
is chosen and the relationships between the columns are plotted as a pairplot (scatter matrix), as shown in Figure 2.14.
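
A minimal sketch using seaborn, mirroring the random three-column matrix mentioned above (the column names are arbitrary):

import numpy as np
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt

rng = np.random.default_rng(0)
# Random matrix with three columns, as in the example
df = pd.DataFrame(rng.normal(size=(100, 3)), columns=["A", "B", "C"])

# Pairplot: pair-wise scatter plots of every column against every other,
# with the distribution of each column on the diagonal
sns.pairplot(df)
plt.show()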

2.8 ESSENTIAL MATHEMATICS FOR MULTIVARIATE DATA

Machine learning involves many mathematical concepts from the domain of Linear algebra, Statistics,
Probability and Information theory. The subsequent sections discuss important aspects of linear algebra
and probability.

2.8.1 Linear Systems and Gaussian Elimination for Multivariate Data


A linear system of equations is a group of equations with unknown variables. Let Ax = y; then, provided A is
invertible (non-singular), the solution x is given as x = A^(-1) y. The logic can be extended to a system of n
equations with n unknown variables: if A is the n × n coefficient matrix and y = (y1, y2, …, yn), then the unknown
vector x can be computed as x = A^(-1) y.

If there is a unique solution, then the system is called consistent independent. If there are multiple
solutions, then the system is called consistent dependent. If there are no solutions and the equations are
contradictory, then the system is called inconsistent.

For solving a large system of equations, Gaussian elimination can be used. The
procedure for applying Gaussian elimination is given as follows:
1. Write the given coefficient matrix A.
2. Append the vector y to the matrix A. This matrix is called the augmented matrix.
3. Keep the element a11 as the pivot and eliminate a21 in the second row using the row operation
R2 ← R2 − (a21/a11) R1, where R2 is the 2nd row and (a21/a11) is called the multiplier.
The same logic is used to eliminate the coefficients of x1 in all the other rows.
4. Repeat the same logic for the remaining rows and columns and reduce the matrix to row echelon form.
The last unknown variable is then obtained directly from the last row as xn = yn/ann (using the entries of the reduced matrix).
5. The remaining unknown variables can then be found by back-substitution as:
xi = (yi − Σ aij xj) / aii, where the sum runs over j = i+1 … n, again using the entries of the reduced matrix.

To facilitate the application of the Gaussian elimination method, the following row operations are
applied:
1. Swapping the rows
2. Multiplying or dividing a row by a constant
3. Replacing a row by adding or subtracting a multiple of another row to it

These concepts are illustrated in Example 2.8.
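
Example 2.8 itself is not reproduced here; the following is a minimal NumPy sketch of the procedure above, applied to an invented 3 × 3 system:

import numpy as np

def gaussian_elimination(A, y):
    """Solve Ax = y by forward elimination followed by back-substitution."""
    A = A.astype(float).copy()
    y = y.astype(float).copy()
    n = len(y)
    # Forward elimination: zero out the entries below each pivot
    for k in range(n - 1):
        for i in range(k + 1, n):
            m = A[i, k] / A[k, k]          # the multiplier a_ik / a_kk
            A[i, k:] -= m * A[k, k:]
            y[i] -= m * y[k]
    # Back-substitution on the upper-triangular system
    x = np.zeros(n)
    for i in range(n - 1, -1, -1):
        x[i] = (y[i] - A[i, i + 1:] @ x[i + 1:]) / A[i, i]
    return x

A = np.array([[2.0, 1.0, 1.0],
              [4.0, 3.0, 3.0],
              [8.0, 7.0, 9.0]])
y = np.array([1.0, 2.0, 5.0])
print(gaussian_elimination(A, y))      # should match np.linalg.solve(A, y)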

2.8.2 Matrix Decomposition


It is often necessary to decompose a matrix into its constituent parts so that complex matrix operations can be
performed more easily. If Q is the matrix whose columns are the eigenvectors of a symmetric matrix A, then A can be decomposed as:

A = Q Λ Q^T

where Q is the matrix of eigenvectors, Λ is the diagonal matrix of eigenvalues and Q^T is the transpose of matrix Q.

LU Decomposition
One of the simplest matrix decompositions is LU decomposition, where the matrix A can be decomposed into
two matrices: A = LU. Here, L is a lower triangular matrix and U is an upper triangular matrix. The
decomposition can be done using the Gaussian elimination method discussed in the previous section. First,
an identity matrix is augmented to the given matrix. Then, row operations and Gaussian elimination are
applied to reduce the given matrix and obtain the matrices L and U. Example 2.9 illustrates the application of
Gaussian elimination to obtain L and U.

Now, it can be observed that the first matrix is L, the lower triangular matrix, whose entries are the
multipliers used in the reduction of the equations above (such as 3, 3 and 2/3).
The second matrix is U, the upper triangular matrix, whose entries are the values of the matrix reduced
by Gaussian elimination.
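
A minimal sketch using SciPy (the 3 × 3 matrix is made up; note that scipy.linalg.lu also returns a permutation matrix P because practical implementations pivot rows, so in general A = PLU):

import numpy as np
from scipy.linalg import lu

A = np.array([[3.0, 1.0, 2.0],
              [6.0, 3.0, 4.0],
              [3.0, 1.0, 5.0]])

# P is a permutation matrix (from partial pivoting), L is lower triangular
# with a unit diagonal, U is upper triangular
P, L, U = lu(A)
print(L)
print(U)
print(np.allclose(A, P @ L @ U))   # True: the factors reconstruct A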

Introduction to Machine Learning and Probability/Statistics

• Importance: Machine learning relies heavily on statistics and probability to make predictions and analyze data.
• Statistics in ML: Key for understanding data patterns, measuring relationships, and quantifying uncertainties.

Probability Distributions

• Definition: A probability distribution describes the likelihood of various outcomes for a variable X.
• Types:

o Discrete Probability Distributions: For countable events (e.g., binomial, Poisson).


o Continuous Probability Distributions: For measurable events on a continuum (e.g., normal,
exponential).

Continuous Probability Distributions

1. Normal Distribution (Gaussian Distribution)

• Shape: Bell curve, symmetric around the mean.


• Characteristics: Defined by mean μ and standard deviation σ.
• Probability Density Function (PDF): f(x) = (1 / (σ √(2π))) · exp(−(x − μ)² / (2σ²))

• Applications: Common in natural data (e.g., heights, exam scores).


• Z-score: Standardizes data points: Z = (X − μ) / σ
2. Uniform Distribution (Rectangular Distribution)

• Definition: Equal probability for all outcomes within range [a,b].


• PDF: f(x) = 1 / (b − a) for a ≤ x ≤ b, and 0 otherwise

3. Exponential Distribution

• Definition: Models the time between events in a Poisson process, with PDF f(x) = λ e^(−λx) for x ≥ 0.
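
A minimal sketch using scipy.stats to evaluate and sample these continuous distributions (the parameter values are arbitrary illustrations):

import numpy as np
from scipy import stats

# Normal distribution with mean 170 and standard deviation 10 (e.g., heights in cm)
normal = stats.norm(loc=170, scale=10)
print(normal.pdf(180))          # density at x = 180
print((180 - 170) / 10)         # z-score of x = 180

# Uniform distribution on [a, b] = [0, 5]; scipy parameterizes it as (loc, scale) = (a, b - a)
uniform = stats.uniform(loc=0, scale=5)
print(uniform.pdf(2.5))         # 1 / (b - a) = 0.2 everywhere inside [0, 5]

# Exponential distribution with rate lambda = 0.5; scale = 1 / lambda
expon = stats.expon(scale=1 / 0.5)
print(expon.pdf(1.0))           # lambda * exp(-lambda * x)

samples = normal.rvs(size=1000, random_state=0)   # draw random samples
print(samples.mean(), samples.std())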

Discrete Probability Distributions

1 Binomial Distribution

• Definition: For a fixed number of independent trials, each with two outcomes (success/failure).
• Probability of k successes in n trials: P(X = k) = C(n, k) · p^k · (1 − p)^(n − k), where p is the probability of success in a single trial.

2 Poisson Distribution

• Definition: Models the number of events occurring in a fixed interval of time.
• PMF: P(X = k) = (λ^k · e^(−λ)) / k!, where λ is the average number of events per interval.

3 Bernoulli Distribution

• Definition: Models a single trial with two outcomes (success/failure).
• Probability Mass Function (PMF): P(X = x) = p^x · (1 − p)^(1 − x) for x ∈ {0, 1}, where p is the probability of success.
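
A minimal sketch using scipy.stats for the discrete distributions above (the parameter values are arbitrary):

from scipy import stats

# Binomial: probability of k = 3 successes in n = 10 trials with p = 0.5
print(stats.binom.pmf(k=3, n=10, p=0.5))

# Poisson: probability of k = 2 events when the average rate is lambda = 4 per interval
print(stats.poisson.pmf(k=2, mu=4))

# Bernoulli: probabilities of failure (0) and success (1) with p = 0.3
print(stats.bernoulli.pmf(0, p=0.3), stats.bernoulli.pmf(1, p=0.3))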

Density Estimation

• Goal: Estimate the probability density function (PDF) of data.


• Types:
o Parametric Density Estimation: Assumes a known distribution (e.g., Gaussian)
and estimates parameters.
o Non-Parametric Density Estimation: Does not assume a fixed distribution (e.g.,
Parzen window, k-Nearest Neighbors)

Parametric Density Estimation

1 Maximum Likelihood Estimation (MLE)

• Definition: A method for estimating the parameters of a distribution by maximizing the likelihood function.
• Likelihood Function: L(θ) is the probability of the observed data as a function of the parameter θ; MLE chooses the θ that maximizes L(θ).
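
A minimal sketch of MLE for a Gaussian, where the maximizers of the (log-)likelihood have closed forms: the sample mean and the biased sample standard deviation. The data here are randomly generated for illustration:

import numpy as np

rng = np.random.default_rng(0)
data = rng.normal(loc=5.0, scale=2.0, size=1000)   # synthetic data

# For a Gaussian, maximizing the likelihood L(theta) over (mu, sigma)
# gives the sample mean and the biased sample standard deviation
mu_mle = data.mean()
sigma_mle = data.std(ddof=0)
print(mu_mle, sigma_mle)        # should be close to the true values 5.0 and 2.0

# Log-likelihood of the data under the estimated parameters
log_lik = np.sum(-0.5 * np.log(2 * np.pi * sigma_mle**2)
                 - (data - mu_mle)**2 / (2 * sigma_mle**2))
print(log_lik)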

Gaussian Mixture Model (GMM) and Expectation-Maximization (EM) Algorithm

• GMM: A probabilistic model assuming the data are generated from a mixture of Gaussian distributions.
• EM Algorithm:
o E-Step: Using the current parameter estimates, compute the expected values of the latent variables (the responsibility of each Gaussian component for each data point).
o M-Step: Re-estimate the parameters using MLE, based on the responsibilities from the E-step.
• Iteration: Repeat the two steps until convergence.
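
A minimal sketch using scikit-learn's GaussianMixture, which runs EM internally (the two-cluster data are synthetic):

import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(0)
# Synthetic data drawn from two Gaussians
data = np.concatenate([rng.normal(0.0, 1.0, size=(200, 1)),
                       rng.normal(6.0, 1.5, size=(200, 1))])

# Fit a 2-component GMM; fit() alternates E- and M-steps until convergence
gmm = GaussianMixture(n_components=2, random_state=0).fit(data)
print(gmm.means_.ravel())        # estimated component means (near 0 and 6)
print(gmm.weights_)              # estimated mixing proportions
print(gmm.predict([[5.0]]))      # most likely component for a new point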

Non-Parametric Density Estimation Methods

1 Parzen Window

• Definition: A non-parametric technique that estimates the PDF based on local samples.
• Example: Uses a kernel function like Gaussian around each data point.

2 k-Nearest Neighbors (KNN)

• Definition: Estimates density by considering the k closest neighbours of a point.
• Application: Frequently used in classification tasks.
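
A minimal sketch of Parzen-window (kernel) density estimation using scikit-learn, with a Gaussian kernel placed around each data point (the data are synthetic):

import numpy as np
from sklearn.neighbors import KernelDensity

rng = np.random.default_rng(0)
data = rng.normal(loc=0.0, scale=1.0, size=(500, 1))   # synthetic samples

# Gaussian kernel around every data point; the bandwidth controls the smoothing
kde = KernelDensity(kernel="gaussian", bandwidth=0.3).fit(data)

grid = np.linspace(-4, 4, 9).reshape(-1, 1)
density = np.exp(kde.score_samples(grid))   # score_samples returns log-density
print(density)                               # roughly follows the standard normal PDF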

2.10 FEATURE ENGINEERING AND DIMENSIONALITY REDUCTION TECHNIQUES

Features are attributes. Feature engineering is about determining the subset of features that form
an important part of the input and improve the performance of a model, be it a classification model or any
other model in machine learning.

Feature engineering deals with two problems – feature transformation and feature selection.
Feature transformation is the extraction of features and the creation of new features that may be helpful in
increasing performance. For example, height and weight may be combined to give a new attribute called Body Mass Index (BMI).

Feature subset selection is another important aspect of feature engineering. It focuses on selecting a subset of
the features in order to reduce training time, but not at the cost of reliability.

The features can be removed based on two aspects:

1. Feature relevancy – Some features contribute more to classification than others. For
example, a mole on the face can help more in face detection than a common feature such as the nose. In simple
words, the features should be relevant.
2. Feature redundancy – Some features are redundant. For example, when a database table already has a field called
Date of Birth, then an Age field is redundant, as age can easily be computed from the date of birth.
So, the procedure is:
1. Generate all possible subsets of features
2. Evaluate each subset and the resulting model performance
3. Evaluate the results and pick the optimal feature subset

Filter-based selection uses statistical measures for assessing features. In this approach, no learning
algorithm is used. Correlation and information-gain measures such as mutual information and entropy are
examples of this approach.

Wrapper-based methods use classifiers to identify the best features. The features are selected and evaluated by
the learning algorithm itself. This procedure is computationally intensive but generally has superior performance.
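
A minimal sketch of filter-based selection using scikit-learn's mutual-information score (the Iris dataset is used purely as a convenient example):

from sklearn.datasets import load_iris
from sklearn.feature_selection import SelectKBest, mutual_info_classif

X, y = load_iris(return_X_y=True)

# Score each feature with mutual information and keep the two best;
# no learning algorithm is involved in the scoring itself
selector = SelectKBest(score_func=mutual_info_classif, k=2)
X_reduced = selector.fit_transform(X, y)

print(selector.scores_)          # mutual information of each feature with the class
print(X_reduced.shape)           # (150, 2): only the two highest-scoring features remain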

2.10.1 Stepwise Forward Selection


This procedure starts with an empty set of attributes. At each step, the attribute that is most statistically
significant (of best quality) is tested and added to the reduced set. This process is continued until a good
reduced set of attributes is obtained.

2.10.2 Stepwise Backward Elimination


This procedure starts with a complete set of attributes. At every stage, the procedure removes the worst
attribute from the set, leading to the reduced set.
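
A minimal wrapper-style sketch of both procedures using scikit-learn's SequentialFeatureSelector (again on the Iris dataset for illustration; the choice of classifier is arbitrary):

from sklearn.datasets import load_iris
from sklearn.feature_selection import SequentialFeatureSelector
from sklearn.neighbors import KNeighborsClassifier

X, y = load_iris(return_X_y=True)
knn = KNeighborsClassifier(n_neighbors=3)

# Stepwise forward selection: start empty, greedily add the best attribute
forward = SequentialFeatureSelector(knn, n_features_to_select=2,
                                    direction="forward").fit(X, y)
print(forward.get_support())     # mask of the selected features

# Stepwise backward elimination: start with all attributes, remove the worst
backward = SequentialFeatureSelector(knn, n_features_to_select=2,
                                     direction="backward").fit(X, y)
print(backward.get_support())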

2.10.3 Principal Component Analysis


The idea of principal component analysis (PCA), or the KL transform, is to transform a given set of
measurements into a new set of features so that the features exhibit high information-packing properties.
This leads to a reduced and compact set of features. Consider a group of random vectors of the form:

x = (x1, x2, …, xn)^T

The mean vector of the set of random vectors is defined as:

m = E{x}

The operator E refers to the expected value of the population. This is calculated theoretically using the
probability density functions (PDF) of the elements xi and the joint probability density functions between
the elements xi and xj. From this, the covariance matrix can be calculated as:

C = E{(x − m)(x − m)^T}

Let A be the matrix whose rows are the eigenvectors of C, arranged so that the first row corresponds to the
largest eigenvalue. The mapping of the vectors x to y using this transformation can now be described as:

y = A(x − m)

This transform is also called the Karhunen-Loeve or Hotelling transform. The original vector x
can now be reconstructed as follows:

x = A^T y + m

If only the K largest eigenvalues (and their eigenvectors, packed as A_K) are used, the recovered information would be:

x ≈ A_K^T y + m

The PCA algorithm is as follows:


1. The target dataset x is obtained
2. The mean is subtracted from the dataset. Let the mean be m. Thus, the adjusted dataset is X – m.
The objective of this process is to transform the dataset with zero mean.
3. The covariance of dataset x is obtained. Let it be C.
4. Eigen values and eigen vectors of the covariance matrix are calculated.
5. The eigen vector of the highest eigen value is the principal component of the dataset. The eigen
values are arranged in a descending order. The feature vector is formed with these eigen vectors in
its columns.
Feature vector = {eigen vector1, eigen vector2, … , eigen vectorn}
6. Obtain the transpose of feature vector. Let it be A.
7. PCA transform is y = A × (x – m), where x is the input dataset, m is the mean, and A is the transpose
of the feature vector.
The original data can be retrieved using the formula given below:

originalData = A^T × y + m

The new data y is a dimensionally reduced matrix that represents the original data.

From the scree plot in Figure 2.15, one can infer the relevance of the attributes: only 6 out of 246 attributes
are important, and the first attribute is far more important than all the others.
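
A minimal NumPy sketch following the algorithm above (the small 2-D dataset is invented for illustration; only the top principal component is kept):

import numpy as np

# Hypothetical 2-D dataset: rows are samples, columns are attributes
X = np.array([[2.5, 2.4], [0.5, 0.7], [2.2, 2.9], [1.9, 2.2],
              [3.1, 3.0], [2.3, 2.7], [2.0, 1.6], [1.0, 1.1]])

m = X.mean(axis=0)                   # step 2: mean vector
Xc = X - m                           # zero-mean (adjusted) dataset
C = np.cov(Xc, rowvar=False)         # step 3: covariance matrix
eigvals, eigvecs = np.linalg.eigh(C) # step 4: eigenvalues and eigenvectors

order = np.argsort(eigvals)[::-1]    # step 5: sort eigenvalues in descending order
A = eigvecs[:, order].T              # step 6: transpose of the feature vector

k = 1                                # keep only the principal component
y = A[:k] @ Xc.T                     # step 7: PCA transform y = A (x - m)
X_recovered = (A[:k].T @ y).T + m    # approximate reconstruction of the original data
print(y.T)
print(X_recovered)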

2.10.4 Linear Discriminant Analysis


Linear Discriminant Analysis (LDA) is also a feature reduction technique like PCA. The focus of LDA
is to project higher-dimensional data onto a line (lower-dimensional space). LDA is also used to classify the
data. Let there be two classes, c1 and c2, and let m1 and m2 be the means of the patterns of the two classes.
The means of the classes c1 and c2 can be computed as:

m1 = (1/N1) Σ x over x ∈ c1 and m2 = (1/N2) Σ x over x ∈ c2

where N1 and N2 are the numbers of patterns in c1 and c2.

The aim of LDA is to optimize (maximize) the Fisher criterion J(w) = (w^T S_B w) / (w^T S_W w), where S_B is the between-class scatter matrix and S_W is the within-class scatter matrix.
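
A minimal sketch of two-class Fisher LDA with NumPy, computing the projection direction w that maximizes this criterion (the two small clusters are synthetic):

import numpy as np

rng = np.random.default_rng(0)
# Synthetic patterns for two classes c1 and c2
c1 = rng.normal(loc=[0, 0], scale=1.0, size=(50, 2))
c2 = rng.normal(loc=[4, 3], scale=1.0, size=(50, 2))

m1, m2 = c1.mean(axis=0), c2.mean(axis=0)          # class means

# Within-class scatter matrix S_W (sum of the two class scatters)
S_W = (c1 - m1).T @ (c1 - m1) + (c2 - m2).T @ (c2 - m2)

# For two classes, the optimal direction is proportional to S_W^-1 (m1 - m2)
w = np.linalg.solve(S_W, m1 - m2)
w /= np.linalg.norm(w)

# Project the 2-D data onto the line defined by w
print(c1 @ w)
print(c2 @ w)                                       # the two classes separate along w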

2.10.5 Singular Value Decomposition


Singular Value Decomposition (SVD) is another useful decomposition technique. Let A be the given
matrix; then the matrix A can be decomposed as:

A = U S V^T

Here, A is the given matrix of dimension m × n, U is an orthogonal matrix of dimension m × n, S is the
diagonal matrix of singular values of dimension n × n, and V is an orthogonal matrix of dimension n × n. The
procedure for finding the decomposition matrices is given as follows:
1. For the given matrix A, find AA^T.
2. Find the eigenvalues and eigenvectors of AA^T.
3. Sort the eigenvalues in descending order and pack the corresponding eigenvectors as the matrix U.
4. Arrange the square roots of the eigenvalues along the diagonal. This diagonal matrix is S.
5. Find the eigenvalues and eigenvectors of A^TA, sort them in the same order, and pack the eigenvectors as the
matrix V.
Thus, A = USV^T. Here, U and V are orthogonal matrices. The columns of U and V are the left and right
singular vectors, respectively. SVD is useful in compression, as one can decide to retain only the k largest
singular components instead of the original matrix A:

A ≈ U_k S_k V_k^T

Based on the choice of retention, the compression can be controlled.
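
A minimal NumPy sketch of SVD-based compression, keeping only the largest singular components (the matrix is a random example):

import numpy as np

rng = np.random.default_rng(0)
A = rng.normal(size=(6, 4))                        # example m x n matrix

U, s, Vt = np.linalg.svd(A, full_matrices=False)   # A = U @ diag(s) @ Vt
print(np.allclose(A, U @ np.diag(s) @ Vt))         # True

k = 2                                        # retain only the 2 largest singular values
A_k = U[:, :k] @ np.diag(s[:k]) @ Vt[:k, :]  # rank-k approximation of A
print(np.linalg.norm(A - A_k))               # reconstruction error under this compression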

CHAPTER 3 - BASICS OF LEARNING THEORY


3.3 DESIGN OF A LEARNING SYSTEM

3.4 INTRODUCTION TO CONCEPT LEARNING

Concept learning is a learning strategy that involves acquiring abstract knowledge or inferring a general
concept based on the given training samples. It aims to derive a category or classification from the data,
facilitating abstraction and generalization. In machine learning, concept learning is about finding a function
that categorizes or labels instances correctly based on the observed features.

3.4.1 Representation of a Hypothesis

A hypothesis, denoted by h, is an approximation of the target function f. It represents the relationship
between the independent attributes (input features) and the dependent attribute (output or label) of the
training instances. The hypothesis acts as the predicted model that maps inputs to outputs.
In concept learning, each hypothesis is represented as a conjunction (AND combination) of attribute
conditions in the antecedent part, defining specific constraints on attributes that an instance must satisfy
to be classified as positive.
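
A minimal sketch of this representation in Python, using the common convention where '?' accepts any value and a None constraint accepts nothing (the attribute names and values are invented):

# A hypothesis is a tuple of constraints, one per attribute.
# '?' means "any value is acceptable"; None means "no value is acceptable".
ATTRIBUTES = ["Sky", "Temperature", "Humidity", "Wind"]

def matches(hypothesis, instance):
    """Return True if the instance satisfies every attribute constraint (a conjunction)."""
    return all(h == "?" or h == x for h, x in zip(hypothesis, instance))

h = ("Sunny", "Warm", "?", "?")               # constrains Sky and Temperature only
print(matches(h, ("Sunny", "Warm", "High", "Strong")))   # True
print(matches(h, ("Rainy", "Warm", "High", "Strong")))   # False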

3.4.2 Hypothesis Space

Hypothesis space is the set of all possible hypotheses that approximate the target function f.

The subset of the hypothesis space that is consistent with all observed training instances is
called the version space.
3.4.3 Heuristic Space Search

Heuristic search is a search strategy that finds an optimized hypothesis/solution to a
problem by iteratively improving the hypothesis/solution based on a given heuristic
function or a cost measure.

3.4.4 Generalization and Specialization

Searching the Hypothesis Space

There are two ways of learning a hypothesis that is consistent with all training instances from
the large hypothesis space.

1. Specialization – General to Specific learning
2. Generalization – Specific to General learning

Generalization – Specific to General Learning: This learning methodology searches through the
hypothesis space for an approximate hypothesis by generalizing the most specific hypothesis.

Specialization – General to Specific Learning: This learning methodology searches through the
hypothesis space for an approximate hypothesis by specializing the most general hypothesis.
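
The specific-to-general strategy is the idea behind the Find-S algorithm of Section 3.4.5. As a concrete illustration (a minimal sketch, not reproduced from the text, with invented attribute values): start from the most specific hypothesis and generalize it just enough to cover each positive training example.

def find_s(examples):
    """Find the maximally specific hypothesis consistent with the positive examples."""
    n_attrs = len(examples[0][0])
    h = [None] * n_attrs                  # most specific hypothesis: accepts nothing
    for instance, label in examples:
        if label != "yes":                # Find-S ignores negative examples
            continue
        for i, value in enumerate(instance):
            if h[i] is None:              # first positive example: copy its values
                h[i] = value
            elif h[i] != value:           # conflicting value: generalize to '?'
                h[i] = "?"
    return h

training = [(("Sunny", "Warm", "Normal", "Strong"), "yes"),
            (("Sunny", "Warm", "High", "Strong"), "yes"),
            (("Rainy", "Cold", "High", "Strong"), "no")]
print(find_s(training))                   # ['Sunny', 'Warm', '?', 'Strong']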

3.4.5 Hypothesis Space Search by Find-S Algorithm

Limitations of Find-S Algorithm

3.4.6 Version Spaces

List-Then-Eliminate Algorithm

Candidate Elimination Algorithm

The diagrammatic representation of deriving the version space is shown below:

Deriving the Version Space
