
EE2211 Introduction to Machine Learning
Lecture 3
Semester 1, 2020/2021

Li Haizhou ([email protected])
Electrical and Computer Engineering Department
National University of Singapore

Acknowledgement:
EE2211 development team
(Thomas, Kar-Ann, Chen Khong, Helen, Robby and Haizhou)
© Copyright EE, NUS. All Rights Reserved.


Course Contents
• Introduction and Preliminaries (Haizhou)
– Introduction
– Data Engineering
– Introduction to Probability and Statistics
• Fundamental Machine Learning Algorithms I (Kar-Ann / Helen)
– Systems of linear equations
– Least squares, Linear regression
– Ridge regression, Polynomial regression
• Fundamental Machine Learning Algorithms II (Thomas)
– Over-fitting, bias/variance trade-off
– Optimization, Gradient descent
– Decision Trees, Random Forest
• Performance and More Algorithms (Haizhou)
– Performance Issues
– K-means Clustering
– Neural Networks

Introduction to Probability and Statistics
Module I Contents
• What is Machine Learning and Types of Learning
• How Supervised Learning works
• Regression and Classification Tasks
• Induction versus Deduction Reasoning
• Types of data
• Data wrangling and cleaning
• Data integrity and visualization
• Causality and Simpson’s paradox
• Random variable, Bayes’ rule
• Parameter estimation
• Parametric vs Non-Parametric machine learning

Causality
What is statistical causality or causation?
• In statistics, causation means that one thing will cause
the other, which is why it is also referred to as cause and
effect.
• The gold standard for causal data analysis is to combine
specific experimental designs such as randomized
studies with standard statistical analysis techniques.

• Causality creep is the idea that causal language is often used to describe inferential or predictive analyses.

Correlation
• In statistics, correlation is any statistical relationship,
whether causal or not, between two random variables.
• Correlations are useful because they can indicate a
predictive relationship that can be exploited in practice.

https://www.geo.fu-berlin.de/en/v/soga/Basics-of-statistics/Descriptive-Statistics/Measures-of-Relation-Between-Variables/Correlation/index.html

Correlation does not imply causation
• Most data analyses involve inference or prediction.
• Unless a randomized study is performed, it is difficult to infer why
there is a relationship between two variables.
• Some great examples of correlations that can be calculated but are
clearly not causally related appear at http://tylervigen.com/
(See figure below).

Example
– Decades of data show a clear causal relationship between smoking
and cancer.
– If you smoke, it is a sure thing that your risk of cancer will increase.
– But it is not a sure thing that you will get cancer.
– The causal effect is real, but it is an effect on your average risk.

[Figure: cartoon on causation, from https://larspsyll.files.wordpress.com/2013/07/causation.jpg]

Caution
• Particular caution should be used when applying words
such as “cause” and “effect” when performing inferential
analysis.
• Causal language applied to even clearly labeled
inferential analyses may lead to misinterpretation - a
phenomenon called causation creep.

Simpson’s paradox
Simpson's paradox is a phenomenon in probability and statistics, in
which a trend appears in several different groups of data but
disappears or reverses when these groups are combined.

https://en.wikipedia.org/wiki/Simpson%27s_paradox
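To make the reversal concrete, here is a minimal Python sketch with illustrative numbers (modelled on the classic two-treatment example; the figures are chosen for illustration, not taken from the slides):

```python
# Illustrative numbers: within each severity group treatment A has the higher
# success rate, yet treatment B has the higher rate when groups are combined.
groups = {
    # group: (A successes, A trials, B successes, B trials)
    "mild":   (81, 87, 234, 270),
    "severe": (192, 263, 55, 80),
}

totals = {"A": [0, 0], "B": [0, 0]}
for name, (sa, na, sb, nb) in groups.items():
    print(f"{name:7s}  A: {sa/na:.2f}   B: {sb/nb:.2f}")   # A wins in both groups
    totals["A"][0] += sa; totals["A"][1] += na
    totals["B"][0] += sb; totals["B"][1] += nb

for t, (s, n) in totals.items():
    print(f"overall  {t}: {s/n:.2f}")                       # but B wins overall
```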

Example

Ref: Gardner, Martin (March 1976). "Mathematical Games: On the fabric of inductive logic, and some probability paradoxes" (PDF). Scientific American, 234(3).

Probability

• We describe a random experiment by describing its procedure and the observations of its outcomes.

• Outcomes are mutually exclusive in the sense that only one outcome occurs in a specific trial of the random experiment. This also means an outcome is not decomposable. All unique outcomes form a sample space.

• A subset of the sample space S, denoted as A, is an event in the random experiment, A ⊆ S, that is meaningful in the context of the experiment.
Axioms of Probability

Assuming events A ⊆ S and B ⊆ S, the probabilities of events related to A and B must satisfy:

1. Pr(A) ≥ 0
2. Pr(S) = 1
3. If A ∩ B = ∅, then Pr(A ∪ B) = Pr(A) + Pr(B)
   *otherwise, Pr(A ∪ B) = Pr(A) + Pr(B) − Pr(A ∩ B)
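As a quick sanity check of axiom 3 and its general form, a minimal Python sketch using a fair six-sided die (an illustrative choice of sample space, not from the slides):

```python
from fractions import Fraction

# Fair six-sided die: each outcome has probability 1/6
S = {1, 2, 3, 4, 5, 6}

def Pr(event):
    return Fraction(len(event), len(S))

A = {1, 2}        # "roll is 1 or 2"
B = {2, 4, 6}     # "roll is even"
C = {5}           # disjoint from A

# General form: Pr(A ∪ B) = Pr(A) + Pr(B) − Pr(A ∩ B)
assert Pr(A | B) == Pr(A) + Pr(B) - Pr(A & B)

# Axiom 3: if A ∩ C = ∅, then Pr(A ∪ C) = Pr(A) + Pr(C)
assert A & C == set() and Pr(A | C) == Pr(A) + Pr(C)

print(Pr(A | B))  # 2/3
```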

Random Variable
• A random variable, usually written as an italic capital
letter, like X, is a variable whose possible values are
numerical outcomes of a random phenomenon.
– Examples of random phenomena with a numerical outcome include
a toss of a coin (0 for heads and 1 for tails), a roll of a die, or the
height of the first stranger you meet outside.
• There are two types of random variables:
– discrete and continuous.

[Figure: a random variable X maps each outcome s in the sample space to a real number X(s) on the real line R]
Notations
• Some books use P(·) and p(·) to distinguish between the probability of a discrete random variable and the probability density of a continuous random variable, respectively.

• We shall use Pr(·) for both the above cases.

Discrete random variable
• A discrete random variable (DRV) takes on only a countable number of distinct values such as red, yellow, blue, or 1, 2, 3, ….
• The probability distribution of a discrete random variable is described by a list of probabilities associated with each of its possible values.
  - This list of probabilities is called a probability mass function (pmf). (Like a histogram, except that here the probabilities sum to 1.)

[Figure: a probability mass function]


• Let a discrete random variable X have k possible values {x_i}, i = 1, …, k.
• The expectation of X, denoted E[X], is given by

  E[X] ≝ Σ_{i=1}^{k} x_i · Pr(X = x_i)
       = x_1 · Pr(X = x_1) + x_2 · Pr(X = x_2) + ··· + x_k · Pr(X = x_k),

  where Pr(X = x_i) is the probability that X has the value x_i according to the pmf.
• The expectation of a random variable is also called the mean, average or expected value and is frequently denoted with the letter μ.
• The expectation is one of the most important statistics of a
random variable.

• Another important statistic is the standard deviation, defined as
  σ ≝ √( E[(X − μ)²] ).
• The variance, denoted σ² or var(X), is defined as
  σ² = E[(X − μ)²].
• For a discrete random variable, the standard deviation is given by
  σ = √( Pr(X = x_1)(x_1 − μ)² + ⋯ + Pr(X = x_k)(x_k − μ)² ),
  where μ = E[X].
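A minimal numpy sketch of these definitions, using a made-up pmf for illustration:

```python
import numpy as np

# A hypothetical pmf for illustration: values of X and their probabilities
x = np.array([1.0, 2.0, 3.0, 4.0])
p = np.array([0.1, 0.2, 0.3, 0.4])    # must sum to 1

mu = np.sum(x * p)                    # E[X] = sum_i x_i * Pr(X = x_i)
var = np.sum(p * (x - mu) ** 2)       # sigma^2 = E[(X - mu)^2]
sigma = np.sqrt(var)                  # standard deviation

print(mu, var, sigma)                 # 3.0, 1.0, 1.0
```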

Continuous random variable
• A continuous random variable (CRV) takes an infinite
number of possible values in some interval.
– Examples include height, weight, and time.
– Because the number of values of a continuous random variable X
is infinite, the probability Pr(X = c) for any c is 0.
– Therefore, instead of the list of probabilities, the probability
distribution of a CRV (a continuous probability distribution) is
described by a probability density function (pdf).
– The pdf is a function whose codomain
is nonnegative and the area under the
curve is equal to 1.

[Figure: a probability density function]

• The expectation of a continuous random variable X is given by
  E[X] ≝ ∫_R x f_X(x) dx,
  where f_X is the pdf of the variable X and ∫_R is the integral of the function x f_X over its domain R.
• The variance of a continuous random variable X is given by
  σ² ≝ ∫_R (x − μ)² f_X(x) dx.

• The integral is the equivalent of a summation over all values of the function when the function has a continuous domain.
• It equals the area under the curve of the function.
• The property of the pdf that the area under its curve is 1 mathematically means that ∫_R f_X(x) dx = 1.
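A minimal numerical sketch of these integrals, assuming a Gaussian pdf with made-up parameters and simple trapezoidal integration:

```python
import numpy as np

# Assume X ~ N(mu=2, sigma=0.5) purely for illustration
mu_true, sigma_true = 2.0, 0.5
x = np.linspace(mu_true - 8 * sigma_true, mu_true + 8 * sigma_true, 100001)
pdf = np.exp(-(x - mu_true) ** 2 / (2 * sigma_true ** 2)) / np.sqrt(2 * np.pi * sigma_true ** 2)

area = np.trapz(pdf, x)                    # integral of f_X(x) dx, close to 1
mean = np.trapz(x * pdf, x)                # E[X], close to 2.0
var = np.trapz((x - mean) ** 2 * pdf, x)   # variance, close to 0.25

print(area, mean, var)
```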

Mean and Standard Deviation of a Gaussian distribution

[Figure: a Gaussian pdf centred at the mean μ, with markers x1 and x2 indicating intervals that contain 90% and 95% of the probability mass]
Example 1
Independent random variables
• Consider tossing a coin twice. What is the probability of having (H, H)?
Pr(x=H, y=H) = Pr(x=H)Pr(y=H)
= (1/2)(1/2) = 1/4

Slides courtesy: Professor Robby Tan

Example 2
Dependent random variables
• Given 2 balls with different colors (Red and Black), what
is the probability of having B and then R?
The space of outcomes of taking two balls sequentially without replacement:
B–R
R–B
Thus the probability of having B–R is 1/2.
Mathematically:
Pr(x=B, y=R) = Pr(y=R | x=B) Pr(x=B) = 1 × (1/2) = 1/2
Since we are given that the first pick was B, we know that the probability of the remaining ball being R is 1.

Slides courtesy: Professor Robby Tan

Example 3
Dependent random variables
• Given 3 balls with different colors (R,G,B), what is the
probability of having B and then G?
The space of outcomes of taking two balls sequentially without replacement:
R–G | G–B | B–R
R–B | G–R | B–G
Thus Pr(y=G, x=B) = 1/6.
Mathematically:
Pr(y=G, x=B) = Pr(y=G | x=B) Pr(x=B) = (1/2) × (1/3) = 1/6
Given that the first pick is B, the remaining balls are G and R, and thus the chance of picking G is 1/2.

Slides courtesy: Professor Robby Tan
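A small simulation sketch of Example 3 (the trial count is an arbitrary choice) that estimates Pr(first = B, second = G) by repeated draws without replacement:

```python
import random

# Draw two balls from {R, G, B} without replacement and estimate
# Pr(first = B, second = G); the expected value is 1/6.
random.seed(0)
trials = 100_000
hits = 0
for _ in range(trials):
    balls = ["R", "G", "B"]
    random.shuffle(balls)
    if balls[0] == "B" and balls[1] == "G":
        hits += 1

print(hits / trials)   # close to 1/6 ≈ 0.1667
```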

Bayes’ Rule
Thomas Bayes (1701 – 1761)

• The conditional probability Pr(Y = y | X = x) is the probability of the random variable Y having a specific value y given that another random variable X has a specific value x.
• Bayes’ rule (also known as Bayes’ theorem) stipulates that:

  Pr(Y = y | X = x) = Pr(X = x | Y = y) Pr(Y = y) / Pr(X = x)        (1)

  posterior = (likelihood × prior) / evidence
Bayes’ Rule

  Pr(y|x) = Pr(y) Pr(x|y) / Pr(x) = Pr(y) Pr(x|y) / Σ_y Pr(y) Pr(x|y)

Prior Pr(y) – what we know about y BEFORE seeing x.
Likelihood Pr(x|y) – the propensity for observing a certain value of x given a certain value of y.
Posterior Pr(y|x) – what we know about y AFTER seeing x.
Evidence Pr(x) – a constant to ensure that the left-hand side is a valid distribution.

Adapted from S. Prince
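A small numeric sketch of Bayes’ rule for a discrete y, with made-up prior and likelihood values (a hypothetical disease/test setting, not from the slides):

```python
import numpy as np

# y is a disease indicator, x is a positive test result (illustrative numbers)
prior      = np.array([0.99, 0.01])        # Pr(y=healthy), Pr(y=sick)
likelihood = np.array([0.05, 0.90])        # Pr(test+ | healthy), Pr(test+ | sick)

evidence  = np.sum(prior * likelihood)     # Pr(x) = sum_y Pr(y) Pr(x|y)
posterior = prior * likelihood / evidence  # Pr(y | x), sums to 1

print(posterior)   # approximately [0.846, 0.154]
```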


Parameter Estimation
• Bayes’ rule comes in handy when we have a model of X’s distribution, e.g., to model the term Pr(X = x | θ = θ̂) with a normal distribution N(μ, σ²):

  f_θ̂(x) = Pr(X = x | θ = θ̂) = (1/√(2πσ²)) e^(−(x − μ)²/(2σ²))        (2)

  where θ̂ ≝ [μ, σ] is the parameter vector and π is the constant (3.14159…).

Parameter Estimation
• With Pr(X = x | θ = θ̂) ≝ f_θ̂(x) of equation (2), we can update the values of the parameters in the vector θ from the data using Bayes’ rule:

  Pr(θ = θ̂ | X = x) ← Pr(X = x | θ = θ̂) Pr(θ = θ̂) / Pr(X = x)        (3)
                    = Pr(X = x | θ = θ̂) Pr(θ = θ̂) / Σ_θ̂ Pr(X = x | θ = θ̂) Pr(θ = θ̂)

  (posterior probability; the sum in the denominator runs over all candidate values of θ̂)
Parameter Estimation
• The best value of the parameters θ* given some examples is obtained using the principle of maximum a posteriori (or MAP):

  θ* = argmax_θ Pr(θ = θ̂ | X = x)
     = argmax_θ log Pr(θ = θ̂ | X = x)
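A minimal grid-search sketch of the MAP principle, assuming a Gaussian likelihood with known σ = 1 and a made-up Gaussian prior over the mean μ (all choices here are illustrative assumptions):

```python
import numpy as np

# Estimate the mean mu of a Gaussian with known sigma = 1, using a prior
# mu ~ N(0, 2^2) and a grid of candidate parameter values.
rng = np.random.default_rng(0)
data = rng.normal(loc=1.5, scale=1.0, size=20)        # observed samples

mu_grid = np.linspace(-5, 5, 1001)                    # candidate values of mu
log_prior = -0.5 * (mu_grid / 2.0) ** 2               # log N(0, 2^2), up to a constant
log_lik = np.array([np.sum(-0.5 * (data - m) ** 2) for m in mu_grid])  # sigma = 1

log_post = log_lik + log_prior                        # log posterior, up to a constant
mu_map = mu_grid[np.argmax(log_post)]                 # argmax_theta log Pr(theta | X)
print(mu_map)    # close to the sample mean, shrunk slightly toward the prior mean 0
```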

Parameter Estimation
• Suppose Pr(X = x | θ = θ̂) ≝ f_θ̂(x).

  Pr(θ = θ̂ | X = x) ← Pr(X = x | θ = θ̂) Pr(θ = θ̂) / Pr(X = x)

• The estimation is called maximum likelihood estimation when only the likelihood term Pr(X = x | θ = θ̂) is used for estimating the non-random parameter θ while assuming X random.
• In this way, maximum a posteriori (MAP) estimation becomes maximum likelihood estimation.

Maximum Likelihood Estimation
• Suppose that the underlying distribution is a normal N(μ, σ²) and we want to estimate the mean μ and the variance σ² from sample data X = {x_1, x_2, …, x_m} using the maximum likelihood estimator.

• Log-likelihood function: L(X|θ) = Σ_{i=1}^{m} log f_θ(x_i), so

  L(X|θ) = Σ_{i=1}^{m} log [ (1/√(2πσ²)) e^(−(x_i − μ)²/(2σ²)) ]
         = −m log σ − (m/2) log 2π − (1/(2σ²)) Σ_{i=1}^{m} (x_i − μ)²

Maximum Likelihood Estimation
Setting
  ∂L(X|θ)/∂μ = (1/σ²) Σ_{i=1}^{m} (x_i − μ) = 0,
we have
  μ̂ = (1/m) Σ_{i=1}^{m} x_i ≔ x̄,

and setting
  ∂L(X|θ)/∂σ = −m/σ + (1/σ³) Σ_{i=1}^{m} (x_i − μ)² = 0,
we have
  σ̂ = √( (1/m) Σ_{i=1}^{m} (x_i − x̄)² ).
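A minimal numpy check of these closed-form estimates on synthetic data (the true parameters are made up for illustration):

```python
import numpy as np

# Samples assumed to come from N(2, 0.5^2), purely for illustration
rng = np.random.default_rng(0)
x = rng.normal(loc=2.0, scale=0.5, size=10_000)

mu_hat = np.mean(x)                               # (1/m) * sum_i x_i
sigma_hat = np.sqrt(np.mean((x - mu_hat) ** 2))   # ML estimate, divides by m (not m - 1)

print(mu_hat, sigma_hat)   # approximately 2.0 and 0.5
# Note: np.std(x) uses the same 1/m normalisation (ddof=0 by default)
```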

Parametric Machine Learning
• A learning model that summarizes data with a set of
parameters of fixed size is called a parametric model. No
matter how much data you throw at a parametric model, it
won’t change its mind about how many parameters it
needs. (Artificial Intelligence: A Modern Approach, p. 737)
• The algorithms involve two steps,
1. Select a form for the function, e.g. normal distribution,
2. Learn the coefficients for the function from the training
data.

Pros: simplicity, speed, less data required
Cons: strong assumptions, limited complexity, potentially poor fit

Non-parametric Machine Learning
• Algorithms that do not make strong assumptions about the
form of the mapping function are called nonparametric
machine learning algorithms. They are good when you have
a lot of data and no prior knowledge, and when you don’t
want to worry too much about choosing just the right
features. (Artificial Intelligence: A Modern Approach, p. 757)

Pros: flexibility, performance
Cons: more data required, slower, prone to overfitting
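A toy sketch contrasting the two families on the same sample: a parametric Gaussian fit summarised by two numbers versus a non-parametric histogram density whose size grows with the data (all numbers here are illustrative assumptions):

```python
import numpy as np

rng = np.random.default_rng(0)
data = rng.normal(loc=0.0, scale=1.0, size=5_000)   # illustrative sample

# Parametric: the model is fully summarised by (mu, sigma), whatever the data size
mu, sigma = np.mean(data), np.std(data)

# Non-parametric: a histogram density estimate; its size scales with the bin count
counts, edges = np.histogram(data, bins=50, density=True)

def parametric_pdf(x):
    return np.exp(-(x - mu) ** 2 / (2 * sigma ** 2)) / np.sqrt(2 * np.pi * sigma ** 2)

def nonparametric_pdf(x):
    idx = np.clip(np.searchsorted(edges, x) - 1, 0, len(counts) - 1)
    return counts[idx]

print(parametric_pdf(0.0), nonparametric_pdf(0.0))  # both close to 0.40 for N(0, 1)
```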

The End

