Artificial Intelligence Mathematics
(Artificial Intelligence-related
Mathematical Concepts)
Some Math & Probability
• Expected value
• Covariance
• Differentiation
• Information Theory
2
Random Variable
• A random variable X takes on a defined set of
values with different probabilities
• For example, if you roll a die, the outcome is random (not
fixed) and there are 6 possible outcomes, each of which
occurs with probability one-sixth.
• For example, if you poll people about their voting
preferences, the percentage of the sample that responds
“Yes on Proposition A” is also a random variable (the
percentage will be slightly different every time you poll).
• Roughly, probability is how frequently we
expect different outcomes to occur if we
repeat the experiment over and over
3
Kristin L. Sainani, HRP 259: Intro to Probability and Statistics
Random variables can be discrete or
continuous
• Discrete random variables have a countable
number of outcomes
– Examples: Dead/alive, treatment/placebo, dice,
counts, etc.
• Continuous random variables have an
infinite continuum of possible values.
– Examples: blood pressure, weight, the speed of a
car, the real numbers from 1 to 6.
4
Kristin L. Sainani, HRP 259: Intro to Probability and Statistics
Expected value
- Weighted average of possible values of a random variable
Discrete case:
E(X) = Σ_{all xᵢ} xᵢ p(xᵢ)
Continuous case:
E(X) = ∫_{all x} x p(x) dx
5
Kristin L. Sainani, HRP 259: Intro to Probability and Statistics
Expected value
• If X is a random integer between 1 and 10,
what’s the expected value of X?
6
Kristin L. Sainani, HRP 259: Intro to Probability and Statistics
Expected value
• If X is a random integer between 1 and 10,
what’s the expected value of X?
µ = E(X) = Σ_{i=1}^{10} i·(1/10) = (1/10)·Σ_{i=1}^{10} i = (1/10)·(10(10 + 1)/2) = 55(.1) = 5.5
7
Kristin L. Sainani, HRP 259: Intro to Probability and Statistics
Average vs. Expected Value
• E.g. Random integer between 1 and 10
– Average (a.k.a. arithmetic mean)
Given a list of (4, 6, 9, 1, 10)
Average: (4+6+9+1+10) / 5
– Expected Value : 5.5
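A minimal sketch (plain Python, standard library only) contrasting the arithmetic mean of the observed list above with the expected value computed from the uniform distribution on 1–10:

```python
# Arithmetic mean of an observed sample vs. expected value of the distribution
sample = [4, 6, 9, 1, 10]                    # the observed list from the slide
average = sum(sample) / len(sample)          # (4+6+9+1+10)/5 = 6.0

values = range(1, 11)                        # random integer between 1 and 10
p = 1 / 10                                   # uniform probability of each value
expected_value = sum(x * p for x in values)  # E(X) = sum of x * p(x) = 5.5

print(average, expected_value)               # 6.0 5.5
```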
8
Variance/standard deviation
“The expected squared distance (or deviation)
from the mean”
i.e. spread of a data set around its mean value
σ² = Var(X) = E[(X − µ)²] = Σ_{all xᵢ} (xᵢ − µ)² p(xᵢ)
9
Kristin L. Sainani, HRP 259: Intro to Probability and Statistics
Variance
Discrete case:
Var(X) = σ² = Σ_{all xᵢ} (xᵢ − µ)² p(xᵢ)
Continuous case:
Var(X) = σ² = ∫_{−∞}^{∞} (x − µ)² p(x) dx
10
Kristin L. Sainani, HRP 259: Intro to Probability and Statistics
Var(X) = Σ_{all xᵢ} (xᵢ − µ)² p(xᵢ) = Σ_{all xᵢ} xᵢ² p(xᵢ) − µ² = E(X²) − [E(X)]²
Proof (optional):
E[(X − µ)²] = E(X² − 2µX + µ²)
            = E(X²) − E(2µX) + E(µ²)    (expected value: E(X + Y) = E(X) + E(Y))
            = E(X²) − 2µE(X) + µ²       (E(c) = c)
            = E(X²) − 2µ·µ + µ²         (E(X) = µ)
            = E(X²) − µ²
            = E(X²) − [E(X)]²
11
Kristin L. Sainani, HRP 259: Intro to Probability and Statistics
Variance
Find the variance and standard deviation for
the number of ships to arrive at the harbor
x 10 11 12 13 14
P(x) .4 .2 .2 .1 .1
(the mean is 11.3).
12
Kristin L. Sainani, HRP 259: Intro to Probability and Statistics
Variance and std dev
x²   100  121  144  169  196
P(x)  .4   .2   .2   .1   .1
E(X²) = Σ_{i=1}^{5} xᵢ² p(xᵢ) = (100)(.4) + (121)(.2) + (144)(.2) + (169)(.1) + (196)(.1) = 129.5
Var(X) = E(X²) − [E(X)]² = 129.5 − 11.3² = 1.81
stddev(X) = √1.81 = 1.35
Interpretation: On an average day, we expect 11.3 ships to
arrive in the harbor, plus or minus 1.35. This gives you a feel for
what would be considered a usual day.
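A quick numeric check of this example (a minimal sketch assuming only the Python standard library):

```python
import math

# Ship-arrival distribution from the slide: value -> probability
pmf = {10: 0.4, 11: 0.2, 12: 0.2, 13: 0.1, 14: 0.1}

mean = sum(x * p for x, p in pmf.items())               # E(X) = 11.3
var = sum((x - mean) ** 2 * p for x, p in pmf.items())  # Var(X) = 1.81
std = math.sqrt(var)                                    # ~1.35

print(mean, var, round(std, 2))                         # 11.3 1.81 1.35
```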
13
Kristin L. Sainani, HRP 259: Intro to Probability and Statistics
Gaussian (Normal)
• If I look at the height of women in country xx, it will look
approximately Gaussian
• Small random noise/errors tend to look Gaussian (normal)
• Distribution: f(x) = (1 / (σ√(2π))) exp(−(x − µ)² / (2σ²))
• Parameters: mean µ and variance σ²
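An illustrative sketch (assuming NumPy is available; the height parameters are made up) that samples from a normal distribution and checks that the sample mean and standard deviation match the parameters:

```python
import numpy as np

rng = np.random.default_rng(0)
mu, sigma = 160.0, 7.0                       # hypothetical height parameters (cm)
heights = rng.normal(mu, sigma, size=100_000)

print(heights.mean(), heights.std())         # close to 160 and 7
```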
14
Kristin L. Sainani, HRP 259: Intro to Probability and Statistics
Coin tosses
• Number of heads in 100 tosses
– Flip coins virtually
– Flip a virtual coin 100 times; count # of heads
– Repeat this over and over again a large number of
times (e.g. 30,000)
– Plot the results
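A minimal simulation sketch of this experiment (assuming NumPy; Matplotlib would be needed for the actual plot):

```python
import numpy as np

rng = np.random.default_rng(42)
trials = 30_000                                    # number of repetitions
heads = rng.binomial(n=100, p=0.5, size=trials)    # heads out of 100 tosses, per trial

print(heads.mean(), heads.std())                   # ~50 and ~5, as on the next slide
# To plot: import matplotlib.pyplot as plt; plt.hist(heads, bins=range(30, 71)); plt.show()
```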
15
Coin tosses
• Number of heads in 100 tosses
– 30,000 trials
Mean = 50, std. dev = 5
The number of heads follows a normal distribution
∴ 95% of the time, we get between 40 and 60 heads…
16
Kristin L. Sainani, HRP 259: Intro to Probability and Statistics
Covariance: joint probability
• The covariance measures the strength of the
linear relationship between two variables
e.g. A positive covariance means both investments' returns tend
to move upward or downward in value at the same time.
σ_xy = E[(X − µ_x)(Y − µ_y)] = Σ_{i=1}^{N} (xᵢ − µ_x)(yᵢ − µ_y) P(xᵢ, yᵢ)
17
Kristin L. Sainani, HRP 259: Intro to Probability and Statistics
Interpreting Covariance
• Covariance between two random variables:
cov(X,Y) = E[(X − µ_x)(Y − µ_y)]
cov(X,Y) > 0   X and Y are positively correlated
cov(X,Y) < 0   X and Y are negatively (inversely) correlated
cov(X,Y) = 0   X and Y are uncorrelated (zero covariance does not by itself
               imply independence, though independence does imply zero covariance)
18
Calculation of covariance
19
Calculation of covariance
20
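The covariance calculation on these two slides is an image in the original deck; as a stand-in, here is a minimal NumPy sketch (the data values are made up for illustration) that computes a covariance both by the definition and with np.cov:

```python
import numpy as np

# Hypothetical paired observations, e.g. returns of two investments
x = np.array([0.01, 0.03, -0.02, 0.04, 0.00])
y = np.array([0.02, 0.025, -0.01, 0.05, -0.005])

cov_manual = ((x - x.mean()) * (y - y.mean())).mean()  # population covariance by definition
cov_matrix = np.cov(x, y, bias=True)                   # 2x2 covariance matrix, same convention

print(cov_manual, cov_matrix[0, 1])                    # the two values agree
```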
Determinant of a Matrix
e.g. for a 2×2 matrix [[a, b], [c, d]]: det = ad − bc
e.g. for a 3×3 matrix [[a, b, c], [d, e, f], [g, h, i]], expanding along the first row:
det = a(ei − fh) − b(di − fg) + c(dh − eg)
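A minimal sketch (assuming NumPy; the matrix values are just an example) checking the 2×2 formula against numpy.linalg.det:

```python
import numpy as np

A = np.array([[3.0, 8.0],
              [4.0, 6.0]])                            # example 2x2 matrix

det_formula = A[0, 0] * A[1, 1] - A[0, 1] * A[1, 0]   # ad - bc = 3*6 - 8*4 = -14
det_numpy = np.linalg.det(A)

print(det_formula, det_numpy)                         # -14.0 and ~-14.0
```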
21
https://www.mathsisfun.com/algebra/matrix-determinant.html
Eigenvalue and Eigenvector
• A v = λ v (A must be a square matrix)
• (A − λI) v = 0, where the eigenvector v is a non-zero vector
• To find λ (the eigenvalue), solve the determinant equation det(A − λI) = 0
A=
22
https://jeongchul.tistory.com/603
Eigenvalue and Eigenvector
A=
When the eigenvalue is 3
When the eigenvalue is 1
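The matrix A on these slides is an image in the original; as an illustrative stand-in, here is a sketch (assuming NumPy) using a hypothetical matrix whose eigenvalues are 3 and 1, matching the values mentioned above:

```python
import numpy as np

A = np.array([[2.0, 1.0],
              [1.0, 2.0]])                        # hypothetical matrix with eigenvalues 3 and 1

eigenvalues, eigenvectors = np.linalg.eig(A)
print(eigenvalues)                                # 3. and 1. (order may vary)

# Check A v = lambda v for the first eigenpair
v = eigenvectors[:, 0]
print(np.allclose(A @ v, eigenvalues[0] * v))     # True
```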
23
https://jeongchul.tistory.com/603
Singular Value Decomposition
• Problem:
– #1: Find concepts in data
– #2: Reduce dimensionality
24
Recommender Systems, Lior Rokach
SVD - Definition
A[n×m] = U[n×r] Λ[r×r] (V[m×r])ᵀ
• A: n x m matrix (e.g., n documents, m terms)
• U: n x r matrix (n documents, r concepts)
• Λ: r x r diagonal matrix (strength of each
‘concept’) (r: rank of the matrix)
• V: m x r matrix (m terms, r concepts)
25
Manning and Raghavan, 2004
SVD - Example
• A = U Λ Vᵀ - example:
        data  inf.  retrieval  brain  lung
 CS   [  1     1     1          0      0  ]     [ 0.18  0    ]
      [  2     2     2          0      0  ]     [ 0.36  0    ]
      [  1     1     1          0      0  ]     [ 0.18  0    ]     [ 9.64  0    ]     [ 0.58  0.58  0.58  0     0    ]
      [  5     5     5          0      0  ]  =  [ 0.90  0    ]  x  [ 0     5.29 ]  x  [ 0     0     0     0.71  0.71 ]
 MD   [  0     0     0          2      2  ]     [ 0     0.53 ]
      [  0     0     0          3      3  ]     [ 0     0.80 ]
      [  0     0     0          1      1  ]     [ 0     0.27 ]
26
SVD - Example
• A = U Λ Vᵀ - example (same decomposition as on the previous slide):
U is the doc-to-concept similarity matrix: its first column corresponds to the
CS-concept and its second column to the MD-concept
27
SVD - Example
• A = U Λ Vᵀ - example (same decomposition as on the previous slide):
The diagonal entries of Λ give the ‘strength’ of each concept:
9.64 for the CS-concept and 5.29 for the MD-concept
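A minimal sketch (assuming NumPy) that runs SVD on this document-term matrix; numpy.linalg.svd returns singular vectors only up to sign, so the first two columns/rows correspond to the factors shown above:

```python
import numpy as np

# Document-term matrix from the example (rows: 4 CS docs, 3 MD docs)
A = np.array([[1, 1, 1, 0, 0],
              [2, 2, 2, 0, 0],
              [1, 1, 1, 0, 0],
              [5, 5, 5, 0, 0],
              [0, 0, 0, 2, 2],
              [0, 0, 0, 3, 3],
              [0, 0, 0, 1, 1]], dtype=float)

U, s, Vt = np.linalg.svd(A, full_matrices=False)

print(np.round(s[:2], 2))        # [9.64 5.29] -- strengths of the two concepts
print(np.round(U[:, :2], 2))     # doc-to-concept matrix (up to sign)
print(np.round(Vt[:2, :], 2))    # term-to-concept matrix (up to sign)
```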
28
Bayes Rule
• P(A|B) = P(B|A) P(A) / P(B)
• Which tells us:
– how often A happens given that B happens, written P(A|B),
• When we know:
– how often B happens given that A happens, written P(B|A)
– how likely A is on its own, written P(A)
– how likely B is on its own, written P(B)
• E.g.
– P(Fire|Smoke) means how often there is fire when we can see smoke
– P(Smoke|Fire) means how often we can see smoke when there is fire
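A small numeric sketch of the rule in plain Python (the probabilities below are made up for illustration):

```python
# Hypothetical numbers: P(Fire), P(Smoke), and P(Smoke | Fire)
p_fire = 0.01               # how likely fire is on its own
p_smoke = 0.10              # how likely smoke is on its own
p_smoke_given_fire = 0.90   # how often we see smoke when there is fire

# Bayes' rule: P(Fire | Smoke) = P(Smoke | Fire) * P(Fire) / P(Smoke)
p_fire_given_smoke = p_smoke_given_fire * p_fire / p_smoke
print(p_fire_given_smoke)   # 0.09
```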
29
Bayes’ Rule
30
https://hyeongminlee.github.io/post/bnn001_bayes_rule/
Differentiation
Differentiation is all about measuring change.
E.g. Measuring change in a linear function:
y = a + bx
a = intercept
b = constant slope i.e. the impact of a unit
change in x on the level of y
b = ∆y/∆x = (y₂ − y₁)/(x₂ − x₁)
31
Example: A firm's cost function is
Y = X²
X    ∆X    Y     ∆Y
0          0
1    +1    1     +1
2    +1    4     +3
3    +1    9     +5
4    +1    16    +7
(plot of y = x² for x from 0 to 6)
Y = X²
Y + ∆Y = (X + ∆X)²
Y + ∆Y = X² + 2X·∆X + ∆X²
∆Y = X² + 2X·∆X + ∆X² − Y
since Y = X²  ⇒  ∆Y = 2X·∆X + ∆X²
∆Y/∆X = 2X + ∆X
34
The slope depends on X and ∆X
The slope of the graph of a function is called
the derivative of the function
f′(x) = dy/dx = lim_{∆x→0} ∆y/∆x
• The process of differentiation involves letting the
change in x become arbitrarily small, i.e. letting
∆x→0
e.g. if ∆Y/∆X = 2X + ∆X and ∆X → 0
⇒ dY/dX = 2X in the limit as ∆X → 0
35
Differentiation
• Examples:
f(x) = 3    →  f′(x) = 0
f(x) = 4x   →  f′(x) = 4
f(x) = 4x²  →  f′(x) = 8x
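A quick check of these examples (a sketch assuming SymPy is installed):

```python
import sympy as sp

x = sp.symbols('x')
for f in (sp.Integer(3), 4 * x, 4 * x**2):
    print(f, '->', sp.diff(f, x))   # 3 -> 0, 4*x -> 4, 4*x**2 -> 8*x
```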
36
Differentiation
• Product Rule
If y = u.v where u and v are functions of x,
(u = f(x) and v = g(x) ) Then
dy/dx = u·(dv/dx) + v·(du/dx)
• Examples
i) y = (x + 2)(ax² + bx)
   dy/dx = (x + 2)(2ax + b) + (ax² + bx)
ii) y = (4x³ − 3x + 2)(2x² + 4x)
   dy/dx = (4x³ − 3x + 2)(4x + 4) + (2x² + 4x)(12x² − 3)
37
Differentiation
• The Chain Rule
If y is a function of v, and v is a function of x,
then y is a function of x and
dy/dx = (dy/dv)·(dv/dx)
• Example:
i) y = (ax² + bx)^½
   let v = ax² + bx, so y = v^½
   dy/dx = ½·(ax² + bx)^(−½)·(2ax + b)
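A sketch verifying the product-rule and chain-rule examples symbolically (assuming SymPy):

```python
import sympy as sp

x, a, b = sp.symbols('x a b')

# Product rule example i): y = (x + 2)(a*x**2 + b*x)
y1 = (x + 2) * (a * x**2 + b * x)
rule1 = (x + 2) * (2 * a * x + b) + (a * x**2 + b * x)
print(sp.expand(sp.diff(y1, x) - rule1))          # 0

# Chain rule example: y = (a*x**2 + b*x)**(1/2)
y2 = sp.sqrt(a * x**2 + b * x)
rule2 = (2 * a * x + b) / (2 * sp.sqrt(a * x**2 + b * x))
print(sp.simplify(sp.diff(y2, x) - rule2))        # 0
```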
38
Sigmoid Function
• Sigmoid Function: σ(x) = 1 / (1 + e⁻ˣ)
• It is built from an exponential function whose base is the natural constant e.
• Natural constant e:
  • Called 'the base of the natural logarithm' and 'Euler's number'
  • An irrational number that is important in mathematics, like pi (π)
  • Its value is approximately 2.718281828..., and it can be defined as the
    limit e = lim_{n→∞} (1 + 1/n)ⁿ
• When the exponential function with base e appears in the denominator as
  above, the result is the sigmoid function
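A minimal sketch of the function (assuming NumPy):

```python
import numpy as np

def sigmoid(x):
    """Sigmoid: 1 / (1 + e^(-x)); squashes any real input into (0, 1)."""
    return 1.0 / (1.0 + np.exp(-x))

print(sigmoid(np.array([-5.0, 0.0, 5.0])))   # ~[0.0067 0.5 0.9933]
```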
39
모두의 딥러닝
Sigmoid Function
40
모두의 딥러닝
Information Theory
• Key idea: unlikely events are more informative than
frequent events
41
https://ratsgo.github.io/statistics/2017/09/22/information/
Shannon’s Entropy
• The amount of information in an event x of the random variable X is
  I(x) = −log₂ P(x)
  – Example: A coin toss that results in heads: −log₂ 0.5 = 1
  – Example: Rolling a die and getting a 1: −log₂ (1/6) ≈ 2.585
  – If the base is 2, the unit of information is called the shannon or bit.
• Shannon’s Entropy: the expected value of the information over all events,
  H(X) = −Σ_x P(x) log₂ P(x)
• For a coin with the same chance of heads and tails (a fair coin), the
  entropy is 1 bit
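A short sketch (assuming NumPy) that reproduces these numbers:

```python
import numpy as np

def entropy(p):
    """Shannon entropy in bits: H = -sum p * log2(p), ignoring zero-probability outcomes."""
    p = np.asarray(p, dtype=float)
    p = p[p > 0]
    return -np.sum(p * np.log2(p))

print(-np.log2(0.5))          # 1.0 bit   (coin lands heads)
print(-np.log2(1 / 6))        # ~2.585 bits (die shows a 1)
print(entropy([0.5, 0.5]))    # 1.0 bit   (fair coin)
print(entropy([1 / 6] * 6))   # ~2.585 bits (fair die)
```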
42
https://ratsgo.github.io/statistics/2017/09/22/information/
Entropy
• The x-axis is the fairness of the coin (i.e. the probability of getting
heads); the entropy is maximized at 1 bit when that probability is 0.5
43
https://ratsgo.github.io/statistics/2017/09/22/information/
Cross Entropy
• The cross entropy H(P,Q) = −Σ_x P(x) log Q(x) is similar to the entropy
H(P), except that the probability multiplied outside the logarithm is P(x)
while the probability inside the logarithm is Q(x); it is an entropy in
which the two probability distributions are crossed
44
https://ratsgo.github.io/statistics/2017/09/22/information/
KL Divergence
• Measures the difference between two probability distributions:
  D_KL(P‖Q) = Σ_x P(x) log (P(x) / Q(x)) = H(P,Q) − H(P)
• The difference between the distribution P(x) of the actual data and the
distribution Q(x) of the data estimated by the model can be obtained using KLD
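A minimal sketch (assuming NumPy) computing entropy, cross entropy, and KL divergence for two made-up distributions, and checking the identity D_KL(P‖Q) = H(P,Q) − H(P):

```python
import numpy as np

p = np.array([0.7, 0.2, 0.1])     # "true" distribution (made up)
q = np.array([0.5, 0.3, 0.2])     # model's estimate (made up)

h_p = -np.sum(p * np.log2(p))      # entropy H(P)
h_pq = -np.sum(p * np.log2(q))     # cross entropy H(P, Q)
kl = np.sum(p * np.log2(p / q))    # KL divergence D_KL(P || Q)

print(h_p, h_pq, kl)
print(np.isclose(kl, h_pq - h_p))  # True
```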
45
https://ratsgo.github.io/statistics/2017/09/22/information/
Softmax Function
• For a k-dimensional vector whose i-th element is zᵢ, the softmax function
defines pᵢ, the probability that the i-th class is the correct one, as
  pᵢ = exp(zᵢ) / Σ_{j=1}^{k} exp(z_j)
• For a three-dimensional vector this gives three probabilities that sum to 1
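A minimal sketch (assuming NumPy), using the usual max-subtraction trick for numerical stability:

```python
import numpy as np

def softmax(z):
    """Softmax: exp(z_i) / sum_j exp(z_j); shift by max(z) for numerical stability."""
    z = np.asarray(z, dtype=float)
    e = np.exp(z - z.max())
    return e / e.sum()

p = softmax([2.0, 1.0, 0.1])      # three-dimensional example
print(p, p.sum())                 # ~[0.659 0.242 0.099], sums to 1.0
```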
46
Maximum Likelihood Estimation
Maximum Likelihood Estimation (MLE)
• Maximum Likelihood Estimation (MLE) is a method of
estimating the parameters of a model.
This estimation method is one of the most widely used.
• The method of maximum likelihood selects the set of
values of the model parameters that maximizes the
likelihood function, i.e. maximizes the "agreement" of the
selected model with the observed data
– parameter values for which the observed sample is
most likely to have been generated
Example: Flip a Thumbtack
• When tossed, it can land in one of two positions:
Head(H) or Tail (T)
• We denote by θ the (unknown) probability P(H).
• Estimation task:
– Given a sequence of toss samples x[1], x[2], …, x[M]
we want to estimate the probabilities
P(H)= θ and P(T) = 1 – θ
The task is to find a vector of parameters (θ in this case)
that have generated the given data. This vector
parameter can be used to predict future data.
Likelihood Function
• How good is a particular θ?
It depends on how likely it is to generate the
observed data
L_D(θ) = P(D | θ) = ∏_m P(x[m] | θ)
• The likelihood for the sequence H, T, T, H, H is
L_D(θ) = θ · (1 − θ) · (1 − θ) · θ · θ
(plot of L(θ) for θ between 0 and 1)
MLE
• To compute the likelihood in the thumbtack
example we only require NH and NT
(the number of heads and the number of tails)
L_D(θ) = θ^{N_H} · (1 − θ)^{N_T}
MLE
MLE Principle: Choose parameters that maximize
the likelihood function
• This is one of the most commonly used estimators
• Intuitively appealing
• One usually maximizes the log-likelihood function,
defined as l_D(θ) = ln L_D(θ)
L_D(θ) = θ^{N_H} · (1 − θ)^{N_T}
l_D(θ) = N_H log θ + N_T log (1 − θ)
Given that logarithm is an increasing function so it is equivalent to
maximize the log likelihood
Example: MLE in Binomial Data
l_D(θ) = N_H log θ + N_T log (1 − θ)
Taking the derivative and setting it to 0, we get
N_H / θ = N_T / (1 − θ)   ⇒   θ̂ = N_H / (N_H + N_T)
Example:
(N_H, N_T) = (3, 2)
L_D(θ) = θ^{N_H} · (1 − θ)^{N_T}
MLE estimate is 3/5 = 0.6
(plot of L(θ) for θ between 0 and 1, peaking at 0.6)
53
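A small numeric sketch (assuming NumPy) that evaluates the likelihood on a grid and confirms the closed-form estimate θ̂ = N_H / (N_H + N_T):

```python
import numpy as np

n_heads, n_tails = 3, 2
theta = np.linspace(0.001, 0.999, 999)                  # grid over (0, 1)
likelihood = theta**n_heads * (1 - theta)**n_tails      # L_D(theta)

theta_hat_grid = theta[np.argmax(likelihood)]           # ~0.6 (grid maximizer)
theta_hat_closed = n_heads / (n_heads + n_tails)        # 0.6  (closed form)

print(theta_hat_grid, theta_hat_closed)
```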
MLE
• Given observed values X₁ = x₁, X₂ = x₂, . . . , Xₙ = xₙ,
the likelihood of θ is the function
L(θ) = f(x₁, x₂, . . . , xₙ | θ)
i.e. the probability of observing the given data as a function of θ
• If the Xᵢ are iid (independent and identically distributed),
L(θ) = ∏_{i=1}^{n} f(xᵢ | θ)
• It is then equivalent to maximize the log likelihood:
l(θ) = Σ_{i=1}^{n} log f(xᵢ | θ)
MLE for Multinomial
• Now suppose X can have the values 1,2,…,K
(For example a die has K=6 sides)
• We want to learn the parameters θ1, θ2. …, θk
N1, N2, …, NK - number of times each outcome is
observed
• Likelihood function: L_D(θ) = ∏_{k=1}^{K} θ_k^{N_k}
• MLE: θ̂_k = N_k / Σ_k N_k
  Note: for K = 2 this reduces to L_D(θ) = θ^{N_H} · (1 − θ)^{N_T}
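A final sketch (plain Python, with made-up counts) of the multinomial MLE, estimating die-face probabilities from observed counts:

```python
# Hypothetical counts of each die face after 60 rolls
counts = [9, 11, 10, 8, 12, 10]

total = sum(counts)
theta_hat = [n_k / total for n_k in counts]   # MLE: theta_k = N_k / sum of all N

print(theta_hat)                              # each value near 1/6
```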