COMP2050 - Lecture 22: Machine Learning

Machine Learning

COMP2050 - Artificial Intelligence


KHOA D. DOAN
[email protected]

Book Office Hour | Course Website


Slides adapted from/based on UC Berkeley CS 188, 2022
What is Learning?
● Learning is the process of acquiring some expertise from experience

What breed is it?

2
Why Machine Learning?

[Figures: learning estimation; learning structure. Caption: "Where does it come from?"]

3
Types of Learning
● Supervised Learning: correct answers for each training instance

[Example plots: Sale Price vs. Square Meters; Gene Y's Expression vs. Gene X's Expression]

4
Types of Learning
● Supervised Learning: correct answers for each training instance
● Unsupervised Learning: find interesting patterns in data

5
Types of Learning
● Supervised Learning: correct answers for each training instance
● Unsupervised Learning: find interesting patterns in data
● Reinforcement learning: reward sequence, no correct answers

6
What is Learning?
● Learning is the process of acquiring some expertise from experience

What breed is it?

● Most central problem?

7
What is Learning?
● Learning is the process of acquiring some expertise from experience

What breed is it?

● Most central problem: generalization


○ How to abstract from “training” examples to “test” examples.
○ Analogy with human learning?

8
Training and Testing

9
Example: Spam Filter
● Input: an email
● Output: spam/ham
● Setup:
○ Get a large collection of example emails, each labeled “spam” or “ham”
○ Note: someone has to hand-label all this data!
○ Want to learn to predict labels of new, future emails
● Features: the attributes used to make the ham / spam decision (see the sketch below)
○ Words: FREE!
○ Text Patterns: $dd, CAPS
○ Non-text: SenderInContacts, WidelyBroadcast
○ …
10
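A minimal sketch (not from the slides; the function and feature names are hypothetical) of turning one email into features of these kinds:

```python
import re

def extract_features(email_text, sender, contacts):
    """Sketch of the feature types above: word presence, simple text patterns, non-text signals."""
    words = set(re.findall(r"[A-Za-z']+", email_text.lower()))
    return {
        "contains_free": "free" in words,                                    # word feature
        "has_dollar_amount": bool(re.search(r"\$\d\d", email_text)),         # text pattern: $dd
        "has_all_caps_word": bool(re.search(r"\b[A-Z]{3,}\b", email_text)),  # text pattern: CAPS
        "sender_in_contacts": sender in contacts,                            # non-text feature
    }

print(extract_features("FREE offer, only $19!", "[email protected]", {"[email protected]"}))
```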
Model-Based Classification
● Model-based approach
○ Build a model (e.g. a Bayes’ net) where both the label and features are random variables
○ Instantiate any observed features
○ Query for the distribution of the label conditioned on the features

● Challenges
○ What structure should the BN have?
○ How should we learn its parameters?

11
Naïve Bayes for Text
● Bag-of-words Naïve Bayes:
○ Features: Wi is the word at position i
○ As before: predict label conditioned on feature variables (spam vs. ham)
○ As before: assume features are conditionally independent given label
● Generative model: P(Y, W1, …, Wn) = P(Y) ∏i P(Wi | Y)

12
Naïve Bayes for Text
● Bag-of-words Naïve Bayes:
○ Features: Wi is the word at position i
○ As before: predict label conditioned on feature variables (spam vs. ham)
○ As before: assume features are conditionally independent given label
● Generative model: P(Y, W1, …, Wn) = P(Y) ∏i P(Wi | Y)

● Prediction: y* = argmaxy P(Y = y) ∏i P(Wi = wi | Y = y) (see the sketch below)

13
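A minimal sketch of this prediction rule under hypothetical probability tables; the numbers and the tiny probability floor for unseen words are made up for illustration (the lecture's actual fix is the smoothing discussed later):

```python
import math

# Hypothetical parameters: P(Y) and P(W | Y) over a tiny vocabulary.
prior = {"spam": 0.5, "ham": 0.5}
word_given_label = {
    "spam": {"free": 0.05, "money": 0.04, "meeting": 0.001},
    "ham":  {"free": 0.005, "money": 0.003, "meeting": 0.03},
}

def predict(words):
    """Return argmax_y P(y) * prod_i P(w_i | y), computed in log space for numerical stability."""
    scores = {}
    for y in prior:
        score = math.log(prior[y])
        for w in words:
            score += math.log(word_given_label[y].get(w, 1e-6))  # tiny floor for unseen words
        scores[y] = score
    return max(scores, key=scores.get)

print(predict(["free", "money"]))  # "spam" under these made-up numbers
print(predict(["meeting"]))        # "ham"
```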
Naïve Bayes for Text: Parameters
● Model

● What are the parameters?

14
Naïve Bayes for Text: Parameters
● Model: P(Y, W1, …, Wn) = P(Y) ∏i P(Wi | Y)

● What are the parameters? The probability tables P(Y) and P(Wi | Y): a prior over labels and, for each label, a distribution over dictionary words

15
Parameter Estimation

16
Parameter Estimation with Maximum Likelihood
● Estimating the distribution of a random variable
● Empirically: use training data (learning!)
○ E.g.: for each outcome x, look at the empirical rate of that value: P_ML(x) = count(x) / N, where N is the total number of samples (see the example below)

○ This is the estimate that maximizes the likelihood of the data

17
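A tiny illustration of this counting estimate, with made-up samples:

```python
from collections import Counter

# Hypothetical training observations of a discrete random variable.
samples = ["red", "red", "blue", "red", "blue"]

counts = Counter(samples)
total = len(samples)
p_ml = {x: c / total for x, c in counts.items()}  # P_ML(x) = count(x) / N
print(p_ml)  # {'red': 0.6, 'blue': 0.4}
```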
General Case: n outcomes
● P(Heads) = q, P(Tails) = 1-q

● Flips are i.i.d.:


○ Independent events
○ Identically distributed according to unknown distribution
○ Sequence D of 𝛂H Heads and 𝛂T Tails

18
Parameter Estimation with Maximum Likelihood
● Data: Observed set D of 𝛂H Heads and 𝛂T Tails
● Hypothesis space: Binomial distributions
● Learning: finding q is an optimization problem
○ What’s the objective function?

● MLE: Choose q to maximize probability of D

19
Parameter Estimation with Maximum Likelihood

● Set derivative to zero, and solve!

20
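For reference, the standard coin-flip calculation this slide sets up:

P(D | q) = q^𝛂H (1 - q)^𝛂T
log P(D | q) = 𝛂H log q + 𝛂T log(1 - q)
d/dq log P(D | q) = 𝛂H/q - 𝛂T/(1 - q) = 0
⇒ qMLE = 𝛂H / (𝛂H + 𝛂T)

E.g., 𝛂H = 8 heads and 𝛂T = 2 tails give qMLE = 8/10 = 0.8.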
Maximum Likelihood for Naïve Bayes Spam Classifier
● Model:
○ Random variable Fi = 1 if i’th dictionary word is present in email
○ Random variable Y is in {spam, ham} depending on email label
● Data D:
○ N emails with NH “hams” and NS “spams”
○ fi(j) = 1 if i’th word appeared in email j
● Parameters:
○ Probability tables P(Y) and P(Fi | Y)
○ Collectively call them both θ
● MLE: Choose θ to maximize probability of D

21
Maximum Likelihood for Naïve Bayes Spam Classifier*
● Let’s find a single parameter P(Fi | Y = ham) (this will be our θ):
○ Denote L(θ) = P(D | θ) for ease of notation

22
Maximum Likelihood for Naïve Bayes Spam Classifier*

23
Maximum Likelihood for Naïve Bayes Spam Classifier*

P(Fi | Y = ham) = (number of ham emails containing word i) / NH (see the sketch below)
24
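A minimal counting sketch of this result; the toy data and helper name are made up:

```python
# Each email is (set_of_words_present, label).
emails = [
    ({"meeting", "tomorrow"}, "ham"),
    ({"free", "money"}, "spam"),
    ({"meeting", "free"}, "ham"),
]

def mle_word_given_label(word, label, data):
    """P_MLE(F_word = 1 | Y = label) = (# emails of that label containing word) / (# emails of that label)."""
    labeled = [words for words, y in data if y == label]
    return sum(word in words for words in labeled) / len(labeled)

print(mle_word_given_label("meeting", "ham", emails))  # 2/2 = 1.0
print(mle_word_given_label("free", "ham", emails))     # 1/2 = 0.5
```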
Parameter Estimation with Maximum Likelihood
● How do we estimate the conditional probability tables?
○ Maximum Likelihood, which corresponds to counting
● Need to be careful, though… let’s see what can go wrong.

25
Underfitting and Overfitting

26
Example: Overfitting
P(features, C=spam)                          P(features, C=ham)

P(C=spam) = 0.5                              P(C!=spam) = 0.5
P(“we’ve” | C=spam) = 0.1                    P(“we’ve” | C!=spam) = 0.8
P(“updated” | C=spam) = 0.2                  P(“updated” | C!=spam) = 0.7
…
P(“Google” | C=spam) = 0.3                   P(“Google” | C!=spam) = 0.0

Email: “We’ve updated our login credential policy. Please confirm your account by logging into Google Docs.”

What went wrong?
27
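A quick numeric check, with the probabilities above, of why the single zero is fatal (the helper is just for illustration):

```python
# Conditional probabilities from the slide for three of the words.
spam_probs = {"we've": 0.1, "updated": 0.2, "google": 0.3}
ham_probs = {"we've": 0.8, "updated": 0.7, "google": 0.0}

def joint_score(word_probs, prior=0.5):
    """Naive Bayes joint score: prior times the product of the word likelihoods."""
    score = prior
    for p in word_probs.values():
        score *= p
    return score

print(joint_score(spam_probs))  # 0.003 -> spam "wins"
print(joint_score(ham_probs))   # 0.0   -> one never-seen word zeroes out the ham score
```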
Generalization and Overfitting
● Problems with relative-frequency parameters
○ Unlikely to see occurrences of every word in training data.
○ Likely to see occurrences of a word for only one class in training data.

● What exactly is learning?

● Learning is to generalize
○ Want a classifier which does well on test data
○ Overfitting: fitting the training data very closely, but not doing well on test data
○ Underfitting: fits the training set poorly

28
Smoothing

29
Laplace Smoothing
● Laplace’s estimate:
○ Pretend you saw every outcome once more than you actually did: P_LAP(x) = (count(x) + 1) / (N + |X|)

○ Can derive this estimate with Dirichlet priors

30
Laplace Smoothing
● Laplace’s estimate (extended):
○ Pretend you saw every outcome k extra times: P_LAP,k(x) = (count(x) + k) / (N + k|X|)

○ What’s Laplace with k = 0? Just the maximum-likelihood (relative-frequency) estimate

○ k is the strength of the prior

● Laplace for conditionals:
○ Smooth each condition independently: P_LAP,k(x | y) = (count(x, y) + k) / (count(y) + k|X|) (see the sketch below)
31
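A minimal sketch of the add-k estimate, with hypothetical counts:

```python
def laplace_conditional(count_xy, count_y, num_outcomes, k=1):
    """P_LAP,k(x | y) = (count(x, y) + k) / (count(y) + k * |X|)."""
    return (count_xy + k) / (count_y + k * num_outcomes)

# A word never seen in ham emails no longer gets probability exactly zero:
print(laplace_conditional(count_xy=0, count_y=50, num_outcomes=2, k=1))   # 1/52 ≈ 0.019
# k = 0 recovers the plain maximum-likelihood (relative-frequency) estimate:
print(laplace_conditional(count_xy=10, count_y=50, num_outcomes=2, k=0))  # 0.2
```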
Course Conclusion

32
Applications of Deep Reinforcement Learning: Go

33
Applications of Deep Reinforcement Learning: Go
Just Minimax Search?

34
Exhaustive Search?

35
Reducing depth with value network

36
Value network

37
Reducing breadth with policy network

38
Policy network

39
AlphaGo: neural network training pipeline

40
Robotics

41
AI Ethics Ever More Important
● Why?

42
AI Ethics Ever More Important
● Why?
○ AI is making decisions, at scale
○ Any kind of issue (e.g. bias or malicious use) could significantly affect people
● Many open questions:
○ Who is responsible?
○ How to diagnose and prevent?

43
Some Key AI Ethics Topics
● Disinformation
● Bias and fairness
● Privacy and surveillance
● Metrics
● Algorithmic colonialism

44
What will be AI’s impact in the future?
● You get to determine that!
● As you apply AI
● As researchers / developers
● As auditors and regulators
● As informed public voices

45
Where to Go Next?
● Machine Learning: COMP3020
● Data Mining: COMP4040
● Several online resources
○ The Batch: https://www.deeplearning.ai/thebatch/
○ Import AI: https://jack-clark.net/
○ AI Ethics course: ethics.fast.ai
○ The Robot Brains Podcast: https://therobotbrains.ai
○ Computer Vision, NLP, Optimization, Reinforcement Learning, Neural Science, Cognitive Modeling…
● UROP Projects

46
THANK YOU!

Good luck on the exam/projects and have a nice summer!


See you around!

47
