
Computational Thinking &

Artificial Intelligence
(5th Week, Probability)

Kyoungwon Seo
Dept. of Applied Artificial Intelligence
[email protected]
Lecture

• Understanding Probability Theory


• Understanding Information Theory
Lecture program
• 1st Lecture: Introduction
• 2nd Lecture: Matrix
• 3rd Lecture: Vector and Matrix
• 4th Lecture: Perceptron
• 5th Lecture: Probability
• 6th Lecture: Learning Techniques
• 7th Lecture: Deep Learning
• 8th Lecture: Deep Learning Library & Mid-term Exam
• 9th Lecture: Cat & Dog Classification
• 10th Lecture: Image Generation Model
• 11th Lecture: Practice: Image Generation Model
• 12th Lecture: Language Generation Model
• 13th Lecture: Practice: Language Generation Model
• 14th Lecture: AI and Ethics
• 15th Lecture: Make-Up Lecture
• 16th Lecture: Final Term Exam
Probability Theory
Probability theory
• What is the role of probability theory in AI?
▪ A tool for making better decisions under uncertainty by estimating parameters

(Figure: AI systems that must handle uncertainty: navigation, person identification, AI speakers)
Two perspectives in probability theory
• Frequentist probability vs. Bayesian probability
▪ Frequentist: estimates probabilities from observed frequencies, typically via maximum likelihood estimation (MLE)

▪ Law of large numbers: as the same experiment is performed a large number of times, the average of
the results converges to the expected value (see the sketch below)
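A minimal sketch (not from the slides) of the law of large numbers for a simulated coin: as the number of tosses grows, the observed frequency of heads approaches the true probability. The function name and the checkpoints are illustrative choices.

```python
import random

def running_head_frequency(p_head=0.5, n_tosses=100_000, seed=0):
    """Toss a simulated coin and print how the observed frequency of heads
    converges to the true probability as the sample size grows."""
    rng = random.Random(seed)
    heads = 0
    for i in range(1, n_tosses + 1):
        heads += rng.random() < p_head
        if i in (10, 100, 1_000, 10_000, 100_000):
            print(f"{i:>7} tosses: observed frequency = {heads / i:.4f}")

running_head_frequency()
# The printed frequencies drift toward 0.5, the frequentist estimate of p(Head).
```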
Two perspectives in probability theory
• Frequentist probability vs. Bayesian probability
▪ Joseph Jagger (born 1830) discovered a bias (9 numbers came up unusually often) in a roulette wheel
(numbers 0–36) at the Beaux-Arts Casino in Monte Carlo, Monaco, where a $1 stake could win $35

▪ Original expected value = −$1 × 36/37 + $35 × 1/37 ≈ −2.7 cents on average

▪ Biased expected value = −$1 × 35.8/37 + $35 × 1.2/37 ≈ +16.8 cents on average (bias factor = 1.2; checked in the sketch below)

Ref.: Biography of Joseph Jagger
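A quick sketch re-computing the two expected values from the slide's numbers; the function and its arguments are illustrative, with the bias expressed as an effective weight on the winning number.

```python
def expected_value(win_weight, total=37, payout=35, stake=1):
    """Expected profit per $1 bet on a single number; win_weight / total is the
    effective probability of hitting that number."""
    p_win = win_weight / total
    return -stake * (1 - p_win) + payout * p_win

print(f"fair wheel:   {expected_value(1.0) * 100:+.1f} cents per $1")   # ≈ -2.7
print(f"biased wheel: {expected_value(1.2) * 100:+.1f} cents per $1")   # ≈ +16.8
```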


Two perspectives in probability theory
• Frequentist probability vs. Bayesian probability
▪ Frequentist probability is difficult to apply to events that cannot be repeated many times
(e.g., what does it mean when a doctor says a patient has a 40% chance of developing dementia?)

▪ Bayesian: calculate posterior probabilities from prior probabilities

Event A: people who show certain symptoms
Event B: people who actually have dementia
(Venn diagram: a = the part of A outside B, b = the overlap A ∩ B)

▪ Conditional probability: P(B|A) = P(A ∩ B) / P(A) = b / (a + b), where P(A) > 0
Two perspectives in probability theory
• Frequentist probability vs. Bayesian probability
▪ Conditional probability question: 70% of all used cars have air conditioning and 40% have a
CD player. If 90% of all used cars have at least one of the two, what is the probability that a used
car without air conditioning also has no CD player?

▪ Answer:

P(A) = probability of no air conditioning = 1 − 0.7 = 0.3

P(B) = probability of no CD player = 1 − 0.4 = 0.6

P(A ∩ B) = probability that neither air conditioning nor a CD player is present = 1 − 0.9 = 0.1

P(B|A) = P(B ∩ A) / P(A) = 0.1 / 0.3 = 1/3
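The same answer checked in a few lines of code; the variable names are only illustrative.

```python
# Quantities given in the question.
p_ac, p_cd, p_at_least_one = 0.7, 0.4, 0.9

p_no_ac = 1 - p_ac                          # P(A) = 0.3
p_no_cd = 1 - p_cd                          # P(B) = 0.6
p_neither = 1 - p_at_least_one              # P(A ∩ B) = 0.1, complement of "at least one"

p_no_cd_given_no_ac = p_neither / p_no_ac   # P(B | A) by the conditional probability formula
print(round(p_no_cd_given_no_ac, 4))        # 0.3333 ≈ 1/3
```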
Bayesian probability
• Bayes’ theorem
▪ A theorem that expresses the relationship between the prior and posterior probabilities

P(A|B) = P(B|A) P(A) / P(B)

Since P(B|A) = P(A ∩ B) / P(A), we have P(B|A) P(A) = P(A ∩ B) = P(A|B) P(B)

▪ P(A), prior probability:

the initial probability, determined from the information currently available

▪ P(A|B), posterior probability:

the prior probability revised in light of the added information
Bayesian probability
• Example
▪ Mr. O wants to be screened for cancer using a test.
People in their 70s and 80s have a 0.6% chance of having cancer.
If a person has cancer, the probability that the test comes out positive is 90%.
Even if a person does not have cancer, the probability that the test is positive is 5%.
When the test is positive, how should Mr. O's result be judged?

▪ Answer:
P(c) = 0.006, P(h) = 0.994

P(p|c) = P(p ∩ c) / P(c) = 0.9, P(p|h) = P(p ∩ h) / P(h) = 0.05

P(c|p) = P(c ∩ p) / P(p) = P(p|c) P(c) / (P(p|c) P(c) + P(p|h) P(h))
       = (0.9 × 0.006) / (0.9 × 0.006 + 0.05 × 0.994) ≈ 0.098 (9.8%)
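The same Bayes' theorem calculation as a sketch in code; the function name and keyword arguments are illustrative.

```python
def posterior_cancer(prior, sensitivity, false_positive_rate):
    """P(cancer | positive test) via Bayes' theorem with two hypotheses (cancer / healthy)."""
    p_positive = sensitivity * prior + false_positive_rate * (1 - prior)  # total probability P(p)
    return sensitivity * prior / p_positive

print(round(posterior_cancer(prior=0.006, sensitivity=0.9, false_positive_rate=0.05), 3))  # 0.098
```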
Bayesian probability
• Unfair conditions
▪ h: hidden variable (what we want to predict)

▪ x: observed data (training data)

P(h|x) = P(x|h) P(h) / P(x)

• Coin toss (fair or unfair coin)

▪ For a fair coin, p(Head) = 0.5

▪ If the previous n − 1 outcomes were all heads and you do not know whether the coin is fair,

▪ then the posterior gives p(Head | observations) > 0.5 (see the sketch below)
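A hedged sketch of this argument with just two hypotheses, a fair coin and a biased one; the biased coin's 0.9 head probability and the 50/50 prior over the two hypotheses are assumptions made purely for illustration.

```python
def p_head_after_heads(n_heads, p_biased=0.9, prior_biased=0.5):
    """Posterior predictive P(next toss is Head) after observing n_heads heads in a row,
    when the coin is either fair (0.5) or biased (p_biased) with the given prior."""
    like_fair = 0.5 ** n_heads          # P(observations | fair)
    like_biased = p_biased ** n_heads   # P(observations | biased)
    # Bayes' theorem: posterior probability that the coin is the biased one.
    post_biased = (like_biased * prior_biased) / (
        like_biased * prior_biased + like_fair * (1 - prior_biased))
    return post_biased * p_biased + (1 - post_biased) * 0.5

for n in (1, 5, 10, 20):
    print(n, round(p_head_after_heads(n), 3))
# Each additional run of heads pushes P(Head | observations) further above 0.5.
```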


Bayesian deep learning
• Example
▪ Whereas frequentist (standard) deep learning trains a model with fixed point-estimate parameters,
Bayesian deep learning reflects uncertainty by treating parameters as distributions.

(Left) Parameters are represented by single, fixed values. (Right) Parameters are represented by distributions.
Information Theory
Information theory
• What is the role of information theory in AI?
▪ A branch of applied mathematics for quantifying the amount of information

(Photo: Claude E. Shannon)
Information theory
• What is the role of information theory in AI?
▪ It describes the amount of information in terms of the probability of an event,
which represents the degree of uncertainty

▪ Uncertainty is highest when a coin toss has a 50% chance of heads and a 50% chance of tails

▪ Uncertainty decreases when heads has a 20% chance and tails an 80% chance

▪ Uncertainty disappears when tails comes up 100% of the time

▪ How can we quantify this?


Information theory – self-information
• Claude Shannon's definition of self-information:
▪ An event with probability 100% is perfectly unsurprising and yields no information
▪ The less probable an event is, the more surprising it is and the more information it yields
▪ If two independent events are measured separately, the total amount of information is the sum
of the information of the individual events

• For an event x with probability p(x), the self-information is defined by

I(x) = −log p(x)

If p(x) = 1 → I(x) = −log 1 = 0
If p(x) = 1/n → I(x) = −log(1/n) = −log n⁻¹ = log n
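A minimal sketch of self-information in code; the slides' numbers use the natural logarithm (information in nats), which is assumed here.

```python
import math

def self_information(p):
    """I(x) = -log p(x), in nats (natural logarithm)."""
    return -math.log(p)

print(round(self_information(1.0), 3))   # 0.0   : a certain event carries no information
print(round(self_information(0.5), 3))   # 0.693 : -log(1/2) = log 2
# Independent events: information adds, because the log turns products into sums.
print(round(self_information(0.5 * 0.5), 3), round(2 * self_information(0.5), 3))  # 1.386 1.386
```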
Shannon entropy
• Expected value of the information over all events

H(X) = −Σᵢ P(xᵢ) log P(xᵢ)

• Coin with 50% heads and 50% tails

H(X) = −Σ P(x) log P(x)
     = −(0.5 × log 0.5 + 0.5 × log 0.5)
     = −log 0.5
     = log 2 ≈ 0.693 (natural logarithm)
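A sketch of the entropy formula in code, again using natural logarithms so that the fair-coin value matches the 0.693 above.

```python
import math

def entropy(probs):
    """H(X) = -sum_i P(x_i) log P(x_i) = sum_i P(x_i) log(1 / P(x_i)), in nats;
    terms with P(x) = 0 contribute nothing."""
    return sum(p * math.log(1 / p) for p in probs if p > 0)

print(round(entropy([0.5, 0.5]), 3))   # 0.693 : fair coin, maximum uncertainty
print(round(entropy([0.2, 0.8]), 3))   # 0.500 : less uncertainty
print(round(entropy([0.0, 1.0]), 3))   # 0.0   : no uncertainty at all
```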
Shannon entropy
• Expected value of the information over all events
▪ As the uncertainty of the information disappears, the entropy decreases

Ref.: Pattern Recognition and Machine Learning, C. M. Bishop

Ref.: Wikipedia, Entropy (information theory)


Shannon entropy
• Example
▪ In a match between soccer team A and soccer team B, there is a 99% chance that A will win

A wins: −log P(x) = −log 0.99 ≈ 0.01

B wins: −log P(x) = −log 0.01 ≈ 4.6

▪ The event that B wins is far more surprising (about 460 times more information) than the event that A wins

H(X) = −Σ P(x) log P(x)
     = −(0.99 × log 0.99 + 0.01 × log 0.01)
     ≈ 0.056
Shannon entropy
• Average amount of information in a match between Team A and Team B
(99% chance that A will win)

H(X) = −Σ P(x) log P(x)
     = −(0.99 × log 0.99 + 0.01 × log 0.01)
     ≈ 0.056

• When Team A and Team C play a match, it is impossible to predict who will win
(50% chance that A will win)

H(X) = −Σ P(x) log P(x)
     = −(0.5 × log 0.5 + 0.5 × log 0.5)
     = log 2 ≈ 0.693

• Average degree of surprise = degree of uncertainty


Others (entropy)
• KL divergence (Kullback-Leibler divergence), also called relative entropy
▪ Measures how different my predicted distribution is from the actual distribution

▪ Q(x) is the prediction, P(x) is the actual probability

KL divergence = E_P[−log Q(x)] − E_P[−log P(x)] = cross-entropy(P, Q) − entropy(P)
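A hedged sketch computing KL divergence directly from the definition above, with expectations taken under the actual distribution P; the two example distributions are arbitrary.

```python
import math

def kl_divergence(p, q):
    """KL(P || Q) = E_P[-log Q(x)] - E_P[-log P(x)] = sum_x P(x) log(P(x) / Q(x))."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

actual = [0.99, 0.01]    # P(x): Team A almost always wins
predicted = [0.5, 0.5]   # Q(x): a model that treats the match as a coin flip
print(round(kl_divergence(actual, predicted), 3))  # > 0: the prediction differs from reality
print(round(kl_divergence(actual, actual), 3))     # 0.0: identical distributions
```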


Others (entropy)
• Cross entropy (example)

E_P[−log Q(x)] = −Σ P(x) log Q(x)

= −(P(Team A wins) × log Q(Team A will win) + P(Team B wins) × log Q(Team B will win))

▪ Read images and classify them into three classes: dog / cat / fish

▪ Q(x), the predicted probabilities, are 0.2 for dog, 0.3 for cat, and 0.5 for fish

▪ P(x), the actual distribution, is 0 for dog, 0 for cat, and 1 for fish

E_P[−log Q(x)] = −Σ P(x) log Q(x)
= −(P(actual dog) × log Q(predicted dog) + P(actual cat) × log Q(predicted cat) + P(actual fish) × log Q(predicted fish))
= −(0 × log 0.2 + 0 × log 0.3 + 1 × log 0.5)
= −log 0.5 ≈ 0.693
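The dog/cat/fish calculation above as a sketch in code; with a one-hot actual distribution, cross entropy reduces to −log of the probability assigned to the true class.

```python
import math

def cross_entropy(p_actual, q_predicted):
    """E_P[-log Q(x)] = -sum_x P(x) log Q(x), in nats."""
    return -sum(p * math.log(q) for p, q in zip(p_actual, q_predicted) if p > 0)

p = [0.0, 0.0, 1.0]   # actual distribution: the image is a fish
q = [0.2, 0.3, 0.5]   # predicted probabilities for dog / cat / fish
print(round(cross_entropy(p, q), 3))   # 0.693 = -log 0.5
```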
Entropy down = uncertainty down
• Cross entropy can be used as a loss function

• Reducing the entropy of an AI model

▪ means reducing the uncertainty of the AI model

Situation                                                 Cross Entropy
Correct prediction + high confidence                      Low (good)
Correct prediction + low confidence                       Slightly high
Wrong prediction + low confidence                         Moderate
Wrong prediction + high confidence (on the wrong class)   Very high (worst)
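A sketch reproducing the table's pattern for a 3-class problem whose true class is the first one; the predicted distributions are made up for illustration, and only the ordering of the losses matters.

```python
import math

def cross_entropy(p_actual, q_predicted):
    return -sum(p * math.log(q) for p, q in zip(p_actual, q_predicted) if p > 0)

true_dist = [1.0, 0.0, 0.0]   # the correct class is class 0
cases = {
    "correct + high confidence": [0.90, 0.05, 0.05],
    "correct + low confidence":  [0.40, 0.30, 0.30],
    "wrong + low confidence":    [0.30, 0.40, 0.30],
    "wrong + high confidence":   [0.05, 0.90, 0.05],
}
for name, q in cases.items():
    print(f"{name:27s} loss = {cross_entropy(true_dist, q):.2f}")
# The losses increase from top to bottom, matching the table above.
```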
Entropy Coding
• Suppose sentences are generated from an alphabet of 4 symbols, A, B, C, and D, according to the
probability distribution given by

          A     B     C      D
P(x)     0.5   0.25  0.125  0.125

• We want to compress sentences using a code of 0s and 1s; if so, how many bits are
needed per symbol?

          A     B     C      D
Uniform   00    01    10     11
Entropy   0     10    110    111

▪ Entropy coding requires 1.75 bits per symbol on average, while traditional (uniform) coding requires 2 bits


Entropy Coding

            A     B     C      D
P(x)       0.5   0.25  0.125  0.125
codewords  0     10    110    111

• Entropy coding requires 1.75 bits per symbol on average, while traditional (uniform) coding requires 2 bits
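A sketch checking the 1.75-bit figure: the average codeword length of this prefix code equals the base-2 entropy of the distribution (bits rather than nats this time).

```python
import math

p = {"A": 0.5, "B": 0.25, "C": 0.125, "D": 0.125}
codewords = {"A": "0", "B": "10", "C": "110", "D": "111"}

avg_length = sum(p[s] * len(codewords[s]) for s in p)          # expected bits per symbol
entropy_bits = -sum(px * math.log2(px) for px in p.values())   # H(X) in bits

print(avg_length, entropy_bits)   # 1.75 1.75 -> beats the 2-bit uniform code
```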


Entropy coding: alphabet

▪ 26 letters of the alphabet: 5 bits vs. 12 bits

▪ Communication data is reduced to about 1/3

Computational Thinking &
Artificial Intelligence
Thanks for your attention.

Kyoungwon Seo
Dept. of Applied Artificial Intelligence
[email protected]
