
Module 2 Notes

Prepared By
Indu K S
Asst. Professor, Dept. of ISE, TOCE, Bangalore
Module 2
• Understanding Data – 2: Bivariate Data and Multivariate
Data,
• Multivariate Statistics,
• Essential Mathematics for Multivariate Data,
• Feature Engineering and Dimensionality Reduction
Techniques.
• Basic Learning Theory: Design of Learning System,
• Introduction to Concept of Learning,
• Modelling in Machine Learning.

Chapter-2 (2.6-2.8, 2.10), Chapter-3 (3.3, 3.4, 3.6)


2.6 BIVARIATE DATA AND MULTIVARIATE DATA
• Bivariate data involves two variables.
• Bivariate analysis deals with causes of and relationships between the two variables.
• The aim is to find relationships among the data.
Consider the following Table 2.3, with data of the temperature in a shop and sales of
sweaters.
Scatter plots and line graphs are used to visualize bivariate data.
Scatter plot
A scatter plot is a 2D graph showing the relationship between two variables.
It is a plot of the explanatory variable against the response variable. The scatter plot (refer
Figure 2.11) indicates the strength, shape, direction and the presence of outliers.
2.6.1 Bivariate Statistics
Covariance and Correlation are examples of bivariate statistics.

1. Covariance
Covariance is a measure of the joint variability of two random variables, say X and Y.
Generally, random variables are represented in capital letters.
It is written as covariance(X, Y) or COV(X, Y) and is used to measure how the two
dimensions vary together. The formula for the covariance of the observed pairs (xᵢ, yᵢ) is:

COV(X, Y) = (1/N) Σ (xᵢ − E[X]) (yᵢ − E[Y]), summed over i = 1 to N.
2. Correlation
The Pearson correlation coefficient normalizes covariance by the product of the standard
deviations of X and Y:

Correlation(X, Y) = COV(X, Y) / (σX σY)

Its value always lies between −1 and +1, which makes it easier to interpret than covariance.
2.7 MULTIVARIATE STATISTICS
• In machine learning, almost all datasets are multivariate.
• Multivariate analysis deals with more than two observable variables.
• Multivariate data is like bivariate data but may have more than two
dependent variables.
• Some of the multivariate analyses are regression analysis, principal
component analysis, and path analysis.

The mean of multivariate data is a mean vector; for the three example attributes, the mean
vector is (2, 7.5, 1.33).
The variance of multivariate data becomes the covariance matrix. The mean
vector is called the centroid, and the covariance matrix is called the dispersion matrix.
Multivariate data has three or more variables.
The aims of multivariate analysis are broader than those of bivariate analysis; common
techniques include regression analysis, factor analysis, and multivariate analysis of variance (MANOVA).
Heatmap
• A heatmap is a graphical representation of data where individual values in a matrix are
represented as colors.
• It is commonly used in data science and machine learning to visualize correlations
between variables, feature importance, or distributions in datasets.
• A heatmap takes a 2D matrix as input and colours each cell.
• It is like a table, but instead of numbers, colours are used to indicate the values.
• Darker colours indicate larger values and lighter colours indicate smaller values.
Understanding Correlation Heatmaps
A correlation heatmap is a special type of heatmap that visualizes the correlation between variables
in a dataset. It helps identify relationships between features.
Correlation values range from -1 to 1:
● +1 → Strong positive correlation (one increases, the other also increases).
● -1 → Strong negative correlation (one increases, the other decreases).
● 0 → No correlation.

Example Multivariate Dataset


Let's create a simple multivariate dataset (10 samples, 4 features) and compute the correlation
matrix. Then, we'll visualize it using a heatmap.
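Below is a minimal sketch of how such a dataset and heatmap could be produced in Python. The feature names (Age, Height, Weight, Blood Pressure) and the sample values are hypothetical, chosen only so that Height/Weight and Age/Blood Pressure come out strongly correlated, roughly in line with the interpretation that follows.

```python
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt

# Hypothetical multivariate dataset: 10 samples, 4 features
data = pd.DataFrame({
    "Age":           [25, 32, 47, 51, 62, 29, 41, 56, 38, 45],
    "Height_cm":     [160, 172, 168, 175, 158, 180, 165, 170, 178, 162],
    "Weight_kg":     [55, 72, 68, 78, 54, 85, 62, 70, 80, 58],
    "BloodPressure": [115, 120, 128, 132, 140, 118, 125, 136, 122, 127],
})

# Correlation matrix (Pearson correlation by default)
corr = data.corr()

# Heatmap: warm colours = strong positive, cool colours = strong negative
sns.heatmap(corr, annot=True, cmap="coolwarm", vmin=-1, vmax=1)
plt.title("Correlation heatmap of the example dataset")
plt.show()
```

The exact correlation values depend on the sample data used; with real data, the same two steps (data.corr() followed by sns.heatmap) are all that is needed.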
Interpretation of the Correlation Heatmap
● The correlation matrix shows how strongly each variable is related to the others.
● Dark red areas indicate a strong positive correlation, while dark blue areas indicate a strong negative
correlation.
● Key insights from this dataset:
○ Height & Weight have a strong positive correlation (~0.97), meaning taller people tend to weigh
more.
○ Age & Blood Pressure are also positively correlated (~0.98), suggesting blood pressure increases
with age.
○ Other relationships have moderate correlations.
• The advantage of this method is that humans perceive colours well.
• So, through colour shading, larger values can be perceived easily.
For example, in vehicle traffic data, heavy-traffic regions can be differentiated from
low-traffic regions through a heatmap.
• In Figure 2.13, patient data highlighting weight and health status is plotted. Here, the
X-axis shows weights and the Y-axis shows patient counts. The dark-coloured regions
highlight patients’ weights versus patient counts by health status.
Pairplot
• Pairplot or scatter matrix is a data visual technique for multivariate
data.
• A scatter matrix consists of several pair-wise scatter plots of
variables of the multivariate data.
• All the results are presented in a matrix format. By visual
examination of the chart, one can easily find relationships among
the variables such as correlation between the variables.
• A random matrix of three columns is chosen and the relationships
of the columns is plotted as a pairplot (or scattermatrix) as shown
below in Figure 2.14.
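A minimal sketch of how such a pairplot (scatter matrix) could be generated, assuming a random matrix of three columns as described above:

```python
import numpy as np
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt

# Hypothetical random matrix with three columns
rng = np.random.default_rng(42)
df = pd.DataFrame(rng.standard_normal((100, 3)), columns=["col1", "col2", "col3"])

# Pairplot / scatter matrix: pairwise scatter plots off the diagonal,
# histograms of each individual column on the diagonal
sns.pairplot(df)
plt.show()
```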
Another example
Interpretation of the Pairplot
● The scatterplots show the relationships between each pair of variables.
● The diagonal histograms show the distribution of each individual variable.
● Key observations:
○ Study Hours & Exam Score: Strong positive correlation (students who
study more tend to score higher).
○ Study Hours & Sleep Hours: Negative correlation (students who study
more sleep less).
○ Stress Level & Sleep Hours: Negative correlation (less sleep leads to higher
stress).
2.8 ESSENTIAL MATHEMATICS FOR MULTIVARIATE
DATA
Gaussian Elimination Method
Gaussian elimination is a systematic method for solving systems of linear equations by
transforming the augmented matrix into row echelon form (upper triangular form) using row
operations.
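As an illustration, here is a minimal Gaussian elimination sketch (no pivoting or singularity checks) for a small system Ax = b; the example matrix and vector are chosen arbitrarily:

```python
import numpy as np

def gaussian_elimination(A, b):
    # Build the augmented matrix [A | b]
    aug = np.hstack([A.astype(float), b.reshape(-1, 1).astype(float)])
    n = len(b)
    # Forward elimination: reduce to upper triangular (row echelon) form
    for i in range(n):
        for j in range(i + 1, n):
            factor = aug[j, i] / aug[i, i]
            aug[j, i:] -= factor * aug[i, i:]
    # Back substitution on the upper triangular system
    x = np.zeros(n)
    for i in range(n - 1, -1, -1):
        x[i] = (aug[i, -1] - aug[i, i + 1:n] @ x[i + 1:]) / aug[i, i]
    return x

A = np.array([[2.0, 1.0],
              [1.0, 3.0]])
b = np.array([3.0, 5.0])
print(gaussian_elimination(A, b))   # [0.8 1.4], matches np.linalg.solve(A, b)
```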
Probability Distributions
The probability distribution gives the probability of each outcome of a random
experiment or event.
Example: Rolling a Die 🎲
If you roll a fair six-sided die, the possible outcomes are: 1, 2, 3, 4, 5, 6.
Probability of an event = (number of ways the event can occur) / (total number of possible
outcomes), so each face of a fair die has probability 1/6.
Exponential Distribution

It describes how long you have to wait for something to happen.
Imagine you are waiting for a bus that arrives randomly every 10 minutes (on average).
● Sometimes, it comes earlier, sometimes later.
● The Exponential Distribution models how long you will wait.
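A small sketch using SciPy, under the assumption above that the bus arrives on average every 10 minutes (so the mean, or scale, of the Exponential distribution is 10 minutes):

```python
from scipy.stats import expon

# Bus arrives on average every 10 minutes => mean (scale) = 10
mean_wait = 10

# Probability of waiting at most 5 minutes: 1 - e^(-5/10)
print(expon.cdf(5, scale=mean_wait))    # ~0.39
```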
3.1.2 Discrete Probability Distributions
Binomial, Poisson, and Bernoulli distributions fall under this category
Binomial Distribution: models the number of times an event happens in a fixed number of independent trials,
where each trial has only two possible outcomes:

● Success (e.g., heads in a coin flip)


● Failure (e.g., tails in a coin flip)

Example: Imagine you flip a coin 5 times.

● The probability of getting heads (H) = 0.5


● The probability of getting tails (T) = 0.5
● The number of heads you get follows a Binomial Distribution.
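Continuing the coin example, a short sketch that computes Binomial probabilities with SciPy (5 flips, P(heads) = 0.5, as stated above):

```python
from scipy.stats import binom

n, p = 5, 0.5                 # 5 independent flips, P(heads) = 0.5

# Probability of exactly 3 heads: C(5,3) * 0.5^5
print(binom.pmf(3, n, p))     # 0.3125

# Probability of at most 2 heads
print(binom.cdf(2, n, p))     # 0.5
```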
Poisson Distribution
The Poisson Distribution is used to model the number of times an event occurs in a fixed time or
space interval, where events happen randomly and independently at a constant rate.
Example: Customers Arriving at a Bank
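A minimal sketch of the bank example, assuming (hypothetically) that customers arrive at an average rate of 4 per hour:

```python
from scipy.stats import poisson

lam = 4                         # assumed average arrivals per hour

# Probability that exactly 2 customers arrive in a given hour
print(poisson.pmf(2, lam))      # ~0.147

# Probability that more than 6 customers arrive in an hour
print(1 - poisson.cdf(6, lam))  # ~0.11
```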
Bernoulli Distribution
The Bernoulli Distribution models a situation where there are only two possible outcomes:
1. Success (1) – The event happens (e.g., getting heads in a coin flip).
2. Failure (0) – The event does not happen (e.g., getting tails in a coin flip).
It is the simplest probability distribution and is used when an event happens only once.

Example: Flipping a Coin
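A tiny sketch of the coin-flip Bernoulli example (success = heads, with p = 0.5):

```python
from scipy.stats import bernoulli

p = 0.5                        # probability of success (heads)

print(bernoulli.pmf(1, p))     # P(success) = 0.5
print(bernoulli.pmf(0, p))     # P(failure) = 0.5
```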


Density Estimation
Density estimation is a way to figure out the shape of a dataset when we don’t
know its exact distribution. It helps us understand how data is spread out and can be
used to detect unusual (anomalous) points(like outliers).
Ex: Imagine you have a bag of marbles of different sizes and colors, but you don’t
know how many of each type exist in the bag.
● You take out a few marbles (observed data) and count them.
● Based on this sample, you estimate how many of each type exist in the whole
bag (density estimation).
Let there be a set of observed values x1,x2,…,xn from a larger set of data whose
distribution is not known.
Density estimation is the problem of estimating the density function from an
observed data.
How Does It Work?
1. We collect observed values (e.g., heights of people, temperatures, sales data).
2. We estimate the density function, which tells us how frequently different values appear.
3. We use this function to check new data points:
○ If a new point fits well within the estimated density, it is normal.
○ If it is far from the expected density, it is an anomaly (outlier).
The estimated density function, denoted as p(x), can be used to evaluate the density directly for any
unknown data point, say xt, as p(xt).
If p(xt) is greater than or equal to a threshold ϵ, then xt is treated as normal data. Else, if p(xt) < ϵ,
it is categorized as anomaly data (an outlier).
There are two types of density estimation methods:
• Parametric density estimation and
• Non-parametric density estimation.
1. Parametric Density Estimation
Parametric density estimation assumes that the data follows a known probability
distribution (such as Normal, Exponential, or Poisson) and estimates the parameters of
that distribution.
The probability density function (PDF) is represented as: p(x∣Θ)
where Θ represents the set of parameters of the chosen distribution.
Maximum Likelihood Estimation(MLE)
MLE is a method used in statistics and machine learning to find the best parameters
(like mean and variance) for a probability distribution that best explains the given data.
Ex: Imagine you are a detective trying to figure out which suspect committed a crime.
You analyze clues (data) and try to find the most likely suspect (parameters).
Similarly, in MLE, we analyze data and try to find the most likely values for
parameters of a probability distribution.
How Does MLE Work?
1. Start with a probability distribution
○ We assume the data follows a known distribution, like Normal (Gaussian), Binomial, or Exponential.
2. Define the Likelihood Function

○ This function tells us how likely it is to see the observed data given some
parameter values.
3. Maximize the Likelihood

○ We adjust the parameters so that the likelihood of seeing our data is as high as
possible.
○ This gives us the best estimate of the parameters.
Maximum Likelihood Estimation (MLE) is a method used in parametric density
estimation to estimate the parameters of a probability distribution by maximizing the
likelihood function. It aims to find the parameter values that make the observed data
most probable.
In parametric density estimation, we assume that the data follows a specific probability
distribution (e.g., Normal, Exponential, Binomial) with unknown parameters. MLE helps
us determine these parameters.
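As a concrete sketch (assuming the data is modelled by a Normal distribution, for which the MLE has a closed form), the maximum-likelihood estimates are simply the sample mean and the variance computed with a divisor of n:

```python
import numpy as np

# Hypothetical observations assumed to come from a Normal distribution
data = np.array([4.9, 5.1, 5.3, 4.7, 5.0, 5.2, 4.8, 5.1])

mu_mle = data.mean()                        # MLE of the mean
var_mle = ((data - mu_mle) ** 2).mean()     # MLE of the variance (divides by n, not n-1)

print(mu_mle, var_mle)
```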
The relevance of this theory of MLE for machine learning is that
MLE can solve the problem of predictive modelling.
If one assumes that the regression problem can be framed as
predicting an output y given an input x, then for p(y∣x) the MLE
framework can be applied as: max over h of Σ log p(yᵢ ∣ xᵢ, h),
where the sum runs over the training examples i.
Gaussian Mixture Model and Expectation-Maximization (EM) Algorithm

In machine learning, clustering is one of the important tasks. MLE framework is quite useful for
designing model-based methods for clustering data.
1. Gaussian Mixture Model (GMM)
A Gaussian Mixture Model (GMM) is a soft clustering algorithm that assumes data points
come from multiple Gaussian (bell-shaped) distributions. Instead of assigning each point to
one specific cluster (like K-Means), GMM gives a probability score for a data point belonging
to different clusters.
Example:
Imagine you have height and weight data for people from three different countries, but you
don’t know which country each person belongs to.
● A GMM would assume the data comes from three different Gaussian distributions.
● Instead of saying "this person is definitely from country A", GMM will say "this person
has a 70% chance of being from country A, 20% from B, and 10% from C."
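A hedged sketch of soft clustering with a GMM using scikit-learn. The height/weight data below is synthetic, standing in for the three-country example; GaussianMixture fits the model with the EM algorithm described in the next section.

```python
import numpy as np
from sklearn.mixture import GaussianMixture

# Synthetic height (cm) / weight (kg) data drawn from three hypothetical groups
rng = np.random.default_rng(0)
X = np.vstack([
    rng.normal([160, 55], [5, 4], size=(50, 2)),   # group A
    rng.normal([170, 70], [5, 5], size=(50, 2)),   # group B
    rng.normal([180, 85], [6, 6], size=(50, 2)),   # group C
])

# Fit a 3-component Gaussian Mixture Model
gmm = GaussianMixture(n_components=3, random_state=0).fit(X)

# Soft clustering: probability of the first point belonging to each component
print(gmm.predict_proba(X[:1]))   # three probabilities that sum to 1 (values vary)
```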
2. Expectation-Maximization (EM) Algorithm
The EM algorithm is one algorithm that is commonly used for estimating the
MLE in the presence of latent or missing variables.

The EM algorithm is a smart way for a computer to learn hidden patterns
in data, especially when some information is missing or unknown. It helps
us estimate the best parameters for a model when direct calculation is
difficult.
Ex1: Imagine you are in a dark room with different objects, but you can’t
see them clearly. You try to guess what they are by touching them
(Expectation step), then use that guess to improve your understanding
(Maximization step). You keep repeating this until you’re confident about
what’s in the room.
Ex2: Guessing Student Heights in a Classroom
● Suppose you have a group of students, but you only know their weights, not
their heights.
● The EM algorithm first guesses heights based on weight (E-step).
● Then, it adjusts the guessed heights to make them fit the data better (M-step).
● It keeps improving these guesses until the best estimates are found.
Generally, there can be many unspecified distributions with different sets
of parameters. The EM algorithm has two stages:
Expectation (E) Stage – Guess the missing or hidden values based on
current estimates. In this stage, the expected PDF and its parameters are
estimated for each latent variable.

Maximization (M) Stage – Update the model’s parameters (like mean
and variance) to fit the guessed data better. In this stage, the parameters are
optimized using the MLE function.
This process is iterative, and the iteration is continued till all the latent
variables are fitted by probability distributions effectively along with the
parameters.
2. Non-parametric Density Estimation
Non-parametric density estimation is a way to figure out how data is
distributed without assuming a specific shape (like a normal or binomial
distribution). It lets the data itself decide the shape.
Ex: Imagine you are a chef trying to guess the most popular dish in a
restaurant. Instead of assuming a fixed menu, you observe what people
order and count the most frequent dishes. Over time, you get an estimate of
what people prefer.
Similarly, in non-parametric density estimation, we observe data points
and estimate how they are spread out—without assuming a specific shape.
Parzen Window
The Parzen Window is a non-parametric method used to estimate the probability
density function (PDF) of a dataset. It helps us understand how data is distributed without
assuming a fixed shape (like normal or exponential distributions).
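A minimal Parzen-window style sketch with a Gaussian kernel, using SciPy's gaussian_kde (the 1-D sample values are arbitrary):

```python
import numpy as np
from scipy.stats import gaussian_kde

# Observed 1-D samples whose distribution is unknown
samples = np.array([1.2, 1.9, 2.1, 2.4, 3.0, 3.1, 3.3, 4.8, 5.0, 5.1])

# Kernel density estimate: each sample contributes a small Gaussian "window"
kde = gaussian_kde(samples)

# Estimated density at a few query points
print(kde([2.0, 3.0, 10.0]))   # density at 10.0 should be close to zero
```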
KNN Estimation
• The KNN estimation is another non-parametric density estimation method.
• Here, the initial parameter k is fixed and, based on it, the k nearest neighbours of a
query point are determined.
• The probability density function estimate is the average of the values that are
returned by the neighbours (one common form is sketched below).
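A hedged 1-D sketch of a common k-NN density estimate, p(x) ≈ k / (n · 2·r_k), where r_k is the distance from x to its k-th nearest sample (this standard formulation is used here purely for illustration):

```python
import numpy as np

def knn_density(x, samples, k=3):
    # Distance from x to every sample, sorted ascending
    dists = np.sort(np.abs(samples - x))
    r_k = dists[k - 1]                       # distance to the k-th nearest neighbour
    return k / (len(samples) * 2 * r_k)      # k / (n * width of the 1-D window)

samples = np.array([1.2, 1.9, 2.1, 2.4, 3.0, 3.1, 3.3, 4.8, 5.0, 5.1])
print(knn_density(2.0, samples))    # higher density: many samples nearby
print(knn_density(10.0, samples))   # lower density: far from the data
```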
2.10 FEATURE ENGINEERING AND DIMENSIONALITY
REDUCTION TECHNIQUES
Features are attributes. Feature engineering is about determining the subset
of features that form an important part of the input that improves the
performance of the model in machine learning.
Feature engineering deals with two problems –
1.Feature Transformation and
2.Feature Selection.
Feature transformation: is extraction of features and creating new features
that may be helpful in increasing performance.
Ex: The height and weight may give a new attribute called Body Mass Index
(BMI).
Feature subset selection: is another important aspect of feature engineering that focuses
on selection of features to reduce the time but not at the cost of reliability.
The subset selection reduces the dataset size by removing irrelevant features and
constructs a minimum set of attributes for machine learning.
• If the dataset has n attributes, then time complexity is extremely high as n dimensions
need to be processed for the given dataset.
• For n attributes, there are 2ⁿ possible subsets.
• If the value of n is high, the problem becomes intractable (difficult). This is called the
‘curse of dimensionality’.
• As the number of dimensions increases, the time complexity increases. The
remedy is to remove components that do not contribute much.
• This results in the reduction of dimensionality.
• Typically, the feature subset selection problem uses a greedy approach: at each step it
makes the locally optimal (best) choice, hoping that this leads to a globally optimal solution.
The features can be removed based on two aspects:
1. Feature relevancy: Some features contribute more for classification than other
features. Ex: a mole on the face can help in face detection more than common
features like the nose.
In simple words, the features should be relevant. The relevancy of the features can
be determined based on information measures such as mutual information,
correlation-based features like correlation coefficient and distance measures.

2. Feature redundancy: Some features are redundant.

Ex: when a database table has a field called Date of Birth, the Age field is
redundant, since age can be computed easily from the date of birth. Removing the
Age column reduces the dimensionality by one.
So, the procedure is:
1. Generate all possible subsets
2. Evaluate the subsets and model performance
3. Evaluate the results for optimal feature selection
Filter-based selection:
Uses statistical measures for assessing features. In this approach, no learning
algorithm is used. Correlation and information gain measures like mutual
information and entropy are all examples of this approach.
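A small illustrative sketch of filter-based selection with scikit-learn, scoring features by mutual information and keeping the top two (the Iris dataset is used here only as a convenient example):

```python
from sklearn.datasets import load_iris
from sklearn.feature_selection import SelectKBest, mutual_info_classif

X, y = load_iris(return_X_y=True)

# Score each feature with mutual information and keep the 2 best;
# no learning algorithm is involved in the scoring itself
selector = SelectKBest(score_func=mutual_info_classif, k=2)
X_reduced = selector.fit_transform(X, y)

print(selector.scores_)    # mutual information score of each feature
print(X_reduced.shape)     # (150, 2)
```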
Wrapper-based methods:
Use classifiers to identify the best features. These are selected and evaluated
by the learning algorithms. This procedure is computationally intensive but
has superior performance.
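For comparison, a sketch of a wrapper-based method using recursive feature elimination (RFE), where a classifier is used to rank and drop the weakest features:

```python
from sklearn.datasets import load_iris
from sklearn.feature_selection import RFE
from sklearn.linear_model import LogisticRegression

X, y = load_iris(return_X_y=True)

# The wrapped classifier is trained repeatedly; the weakest feature is
# eliminated at each round until 2 features remain
rfe = RFE(estimator=LogisticRegression(max_iter=1000), n_features_to_select=2)
rfe.fit(X, y)

print(rfe.support_)    # boolean mask of the selected features
print(rfe.ranking_)    # rank 1 = selected
```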
2.10.1 Stepwise Forward Selection
This procedure starts with an empty set of attributes. Every time, an
attribute is tested for statistical significance for best quality and is added to
the reduced set. This process is continued till a good reduced set of attributes
is obtained.
2.10.2 Stepwise Backward Elimination
This procedure starts with a complete set of attributes. At every stage, the
procedure removes the worst attribute from the set, leading to the reduced
set.
Combined Approach
Both the forward and backward methods can be combined so that, at each step, the
procedure adds the best attribute and removes the worst one.
Eigenvalues and Eigenvectors
Eigenvalues and eigenvectors are fundamental concepts in linear algebra, widely used in machine
learning, PCA, computer vision, and deep learning.
Ex: Imagine you have a rubber sheet (a 2D surface), and you stretch or shrink it in different directions.
● Eigenvectors are the special directions in which the stretching happens.
● Eigenvalues tell you how much the stretching (or shrinking) happens in those directions.

Formal Definition:
For a square matrix A, an eigenvector v and its corresponding eigenvalue λ satisfy:

Av=λv
This means:
● When you multiply matrix A with vector v, it only scales the vector (does not change its
direction).
● The scaling factor is the eigenvalue λ.
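A short numerical sketch with NumPy that verifies the defining property Av = λv for an arbitrary 2×2 matrix:

```python
import numpy as np

A = np.array([[3.0, 1.0],
              [1.0, 3.0]])

# Eigen decomposition: columns of 'vectors' are the eigenvectors
values, vectors = np.linalg.eig(A)
print(values)            # eigenvalues, here 4.0 and 2.0 (order may vary)

# Verify A @ v == lambda * v for the first eigenpair
v, lam = vectors[:, 0], values[0]
print(np.allclose(A @ v, lam * v))   # True
```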
2.10.3 Principal Component Analysis
PCA (Principal Component Analysis) is a method used in machine learning and statistics to
reduce the number of variables in a dataset while keeping the most important information.
This leads to a reduced and compact set of features. Basically, this elimination is made
possible because of the information redundancies. This compact representation is of a reduced
dimension.
Ex: Imagine you have a big collection of books and want to organize them efficiently.
● Instead of sorting them by every small detail (title, author, genre, year, pages, price,
etc.),
● You pick only the most important factors (genre and author) to classify them.
PCA does something similar—it reduces the number of features (variables) in a dataset but
keeps the most important patterns.
The PCA algorithm is as follows:

1. The target dataset x is obtained.

2. The mean is subtracted from the dataset. Let the mean be m. Thus, the adjusted dataset is x − m. The objective
of this process is to transform the dataset to zero mean.

3. The covariance of dataset x is obtained. Let it be C.

4. Eigenvalues and eigenvectors of the covariance matrix are calculated.

5. The eigenvector of the highest eigenvalue is the principal component of the dataset. The eigenvalues are arranged
in a descending order. The feature vector is formed with these eigenvectors in its columns.

Feature vector = {eigenvector 1, eigenvector 2, ..., eigenvector n}


6. Obtain the transpose of the feature vector. Let it be A

7. PCA transform is y=A×(x−m), where x is the input dataset, m is the mean, and A is the transpose of the feature
vector.

The original data can be retrieved as x = Aᵀ × y + m, since the eigenvectors are orthonormal and hence the inverse of A is its transpose.
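A compact sketch of these steps in NumPy on a small hypothetical 2-feature dataset; because all components are kept here, the reconstruction is exact:

```python
import numpy as np

# Hypothetical dataset: rows = samples, columns = features
x = np.array([[2.5, 2.4], [0.5, 0.7], [2.2, 2.9],
              [1.9, 2.2], [3.1, 3.0], [2.3, 2.7]])

m = x.mean(axis=0)                 # step 2: mean
X = x - m                          # zero-mean (adjusted) dataset

C = np.cov(X, rowvar=False)        # step 3: covariance matrix

eigvals, eigvecs = np.linalg.eig(C)      # step 4: eigenvalues / eigenvectors

order = np.argsort(eigvals)[::-1]        # step 5: sort by descending eigenvalue
A = eigvecs[:, order].T                  # step 6: transpose of the feature vector

y = (A @ X.T).T                    # step 7: PCA transform y = A (x - m)

x_back = (A.T @ y.T).T + m         # reconstruction x = A^T y + m
print(np.allclose(x_back, x))      # True (all components kept)
```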
