Computational Thinking & Artificial Intelligence
(5th Week: Probability)
Kyoungwon Seo
Dept. of Applied Artificial Intelligence
[email protected]
Lecture Schedule
• 3rd Lecture: Vector and Matrix
• 8th Lecture: Deep Learning Library & Mid-term Exam
• 11th Lecture: Practice: Image Generation Model
• 16th Lecture: Final Term Exam
Probability Theory
Probability theory
• What is the role of probability theory in AI?
▪ A tool for making better decisions under uncertainty by estimating parameters
[Figure: examples of AI systems that operate under uncertainty — navigation, person identification, AI speakers]
Two perspectives in probability theory
• Frequentist probability vs. Bayesian probability
▪ Frequentist: uses maximum likelihood estimation (MLE)
▪ Law of large numbers: as the same experiment is repeated a large number of times, the average of the results converges to the expected value
Two perspectives in probability theory
• Frequentist probability vs. Bayesian probability
▪ In 1873, Joseph Jagger discovered a bias (9 numbers came up more often than chance) in a roulette wheel (numbers 0–36) at the Beaux-Arts Casino in Monte Carlo, Monaco, where a $1 stake on a single number could win $35
▪ Original expected value $= -\$1 \cdot \frac{36}{37} + \$35 \cdot \frac{1}{37} \approx -\$0.027$ (an average loss of 2.7 cents per bet)
▪ Biased expected value $= -\$1 \cdot \frac{35.8}{37} + \$35 \cdot \frac{1.2}{37} \approx +\$0.168$ (an average gain of 16.8 cents; bias factor = 1.2)
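A quick numeric check of both expected values (a minimal Python sketch; the 1.2 bias factor comes from the slide above):

```python
# Expected value of a $1 single-number bet on a 37-pocket roulette wheel.
fair_ev = -1 * (36 / 37) + 35 * (1 / 37)
print(f"fair wheel:   {fair_ev:+.3f} dollars")    # -0.027, i.e. about -2.7 cents

# With Jagger's bias, the favored numbers hit 1.2x as often as chance predicts.
biased_ev = -1 * (35.8 / 37) + 35 * (1.2 / 37)
print(f"biased wheel: {biased_ev:+.3f} dollars")  # +0.168, i.e. about +16.8 cents
```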
▪ Conditional probability: $P(B \mid A) = \frac{P(A \cap B)}{P(A)} = \frac{b}{a+b}$, where $P(A) > 0$
Two perspectives in probability theory
• Frequentist probability vs. Bayesian probability
▪ Conditional probability question: 70% of all used cars have air conditioning and 40% have a CD player. If 90% of all used cars have at least one of the two, what is the probability that a used car without air conditioning also has no CD player?
▪ Answer: let $A$ = no air conditioning and $B$ = no CD player
$P(A \cap B) = 1 - 0.9 = 0.1$ (probability that neither air conditioning nor a CD player is present)
$P(B \mid A) = \frac{P(B \cap A)}{P(A)} = \frac{0.1}{0.3} = \frac{1}{3}$
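The same answer, computed directly (a minimal sketch; it just takes the complement of the union and divides):

```python
# Used-car example: P(AC) = 0.7, P(CD) = 0.4, P(AC or CD) = 0.9.
p_ac, p_cd, p_either = 0.7, 0.4, 0.9

p_neither = 1 - p_either       # P(no AC and no CD) = 0.1
p_no_ac = 1 - p_ac             # P(no AC) = 0.3
print(p_neither / p_no_ac)     # P(no CD | no AC) = 0.333... = 1/3
```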
Bayesian probability
• Bayes’ theorem
▪ A theorem that expresses the relationship between prior and posterior probabilities
$$P(A \mid B) = \frac{P(B \mid A)\,P(A)}{P(B)}$$
$$P(B \mid A) = \frac{P(A \cap B)}{P(A)}, \quad P(B \mid A)\,P(A) = P(A \cap B) = P(A \mid B)\,P(B)$$
▪ Answer (disease-test example: c = has the disease, h = healthy, p = positive test):
$P(c) = 0.006, \quad P(h) = 0.994$
$P(p \mid c) = \frac{P(p \cap c)}{P(c)} = 0.9, \quad P(p \mid h) = \frac{P(p \cap h)}{P(h)} = 0.05$
$P(c \mid p) = \frac{P(p \mid c)\,P(c)}{P(p \mid c)\,P(c) + P(p \mid h)\,P(h)} = \frac{0.0054}{0.0551} \approx 0.098$
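As a sketch, the same posterior in Python (variable names are mine; the probabilities are those on the slide):

```python
# Bayes' theorem for the disease-test example.
# c = has the disease, h = healthy, pos = tests positive.
p_c, p_h = 0.006, 0.994
p_pos_given_c, p_pos_given_h = 0.9, 0.05

p_pos = p_pos_given_c * p_c + p_pos_given_h * p_h   # total probability of a positive test
print(p_pos_given_c * p_c / p_pos)                  # P(c | pos) ~ 0.098
```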
$$P(h \mid x) = \frac{P(x \mid h)\,P(h)}{P(x)}$$
▪ Example: if the previous outcomes were heads $n-1$ times in a row and you don't know whether the coin is fair, Bayes' theorem lets you update your belief about the coin after each new flip
Frequentist: parameters are represented by single, fixed values
Bayesian: parameters are represented by distributions
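One way to make "parameters as distributions" concrete for the coin example is conjugate Beta–Bernoulli updating; the sketch below is my illustration, not part of the slides:

```python
# Start from a uniform Beta(1, 1) prior over the coin's heads probability and
# update it after observing n - 1 heads in a row.
n = 10
alpha, beta = 1, 1
alpha += n - 1                  # each observed head increments alpha

# The posterior Beta(n, 1) is a whole distribution over the parameter;
# its mean is the predicted probability that the next flip is heads.
print(alpha / (alpha + beta))   # 10/11 ~ 0.909, not a hard 1.0 or 0.5
```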
Information Theory
Information theory
• What is the role of information theory in AI?
▪ Applied mathematics, founded by Claude E. Shannon, that quantifies the amount of information
Information theory
• What is the role of information theory in AI?
▪ It describes the amount of information in terms of an event's probability, which represents the degree of uncertainty
▪ Uncertainty is highest when a coin toss has a 50% chance of heads and a 50% chance of tails
▪ Self-information: $I(x) = -\log p(x)$
If $p(x) = 1$ ➔ $I(x) = -\log 1 = 0$
If $p(x) = \frac{1}{n}$ ➔ $I(x) = -\log \frac{1}{n} = -\log n^{-1} = \log n$
Shannon entropy
• The expected value of self-information over all events
$$H(X) = -\sum_{i=1}^{n} P(x_i) \log P(x_i)$$
▪ Fair coin: $H(X) = -(0.5 \log 0.5 + 0.5 \log 0.5) = -\log 0.5 = \log 2 \approx 0.693$ (natural log)
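A minimal entropy helper that reproduces the slides' numbers (natural log, as above):

```python
import math

def entropy(probs):
    """Shannon entropy H(X) = -sum(p * ln p) over the outcomes."""
    return -sum(p * math.log(p) for p in probs if p > 0)

print(entropy([0.5, 0.5]))    # 0.693: fair coin, maximum uncertainty
print(entropy([0.99, 0.01]))  # 0.056: near-certain outcome, low entropy
```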
Shannon entropy
• The expected value of self-information over all events
▪ As uncertainty disappears, entropy decreases
$$H(X) = -\sum_{i=1}^{n} P(x_i) \log P(x_i) = -(0.99 \log 0.99 + 0.01 \log 0.01) \approx 0.056$$
Shannon entropy
• Average amount of information in a match between Team A and Team B
(99% chance that A will win)
$$H(X) = -\sum_{i=1}^{n} P(x_i) \log P(x_i) = -(0.99 \log 0.99 + 0.01 \log 0.01) \approx 0.056$$
• When Team A and Team C play a match, it is impossible to predict who will win
(50% chance that A will win)
$$H(X) = -\sum_{i=1}^{n} P(x_i) \log P(x_i) = -(0.5 \log 0.5 + 0.5 \log 0.5) = \log 2 \approx 0.693$$
Cross entropy
• Measures how well a predicted distribution $Q$ matches the actual distribution $P$: $H(P, Q) = -\sum_{i=1}^{n} P(x_i) \log Q(x_i)$
▪ For the match example: $H(P, Q) = -(P(\text{Team A wins}) \log Q(\text{Team A wins}) + P(\text{Team B wins}) \log Q(\text{Team B wins}))$
▪ $Q(x)$: the predicted probabilities are 0.2 for dog, 0.3 for cat, and 0.5 for fish
▪ $P(x)$: the actual distribution is 0 for dog, 0 for cat, and 1 for fish
$$H(P, Q) = -(P(\text{dog}) \log Q(\text{dog}) + P(\text{cat}) \log Q(\text{cat}) + P(\text{fish}) \log Q(\text{fish})) = -\log 0.5 \approx 0.693$$
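The pet example in code (a sketch; `cross_entropy` is a hypothetical helper written here, not a library function):

```python
import math

def cross_entropy(p, q):
    """H(P, Q) = -sum(P(x) * ln Q(x)); terms with P(x) = 0 contribute nothing."""
    return -sum(pi * math.log(qi) for pi, qi in zip(p, q) if pi > 0)

p_actual = [0.0, 0.0, 1.0]      # true labels: dog, cat, fish
q_predicted = [0.2, 0.3, 0.5]   # model's predicted probabilities
print(cross_entropy(p_actual, q_predicted))   # -ln(0.5) = 0.693
```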
▪ Lower entropy means lower uncertainty
• Cross entropy can be used as a loss function
• We want to compress sentences into a code of 0s and 1s; how many bits are needed per symbol?

Symbol        A     B     C      D
P(x)          0.5   0.25  0.125  0.125
Uniform code  00    01    10     11
Entropy code  0     10    110    111

▪ The uniform code uses 2 bits per symbol; the entropy code uses $0.5 \cdot 1 + 0.25 \cdot 2 + 0.125 \cdot 3 + 0.125 \cdot 3 = 1.75$ bits per symbol on average, which equals the entropy $H(X)$ computed with $\log_2$
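Checking both codes against the entropy bound (a minimal sketch; log base 2 gives bits):

```python
import math

probs = {"A": 0.5, "B": 0.25, "C": 0.125, "D": 0.125}
uniform_code = {"A": "00", "B": "01", "C": "10", "D": "11"}
entropy_code = {"A": "0", "B": "10", "C": "110", "D": "111"}

avg_uniform = sum(probs[s] * len(uniform_code[s]) for s in probs)  # 2.0 bits/symbol
avg_entropy = sum(probs[s] * len(entropy_code[s]) for s in probs)  # 1.75 bits/symbol
h_bits = -sum(p * math.log2(p) for p in probs.values())            # 1.75 bits: the limit

print(avg_uniform, avg_entropy, h_bits)
```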
Kyoungwon Seo
Dept. of Applied Artificial Intelligence
[email protected]