
CSCE-421 Machine Learning

1. Introduction

Instructor: Guni Sharon, classes: TR 3:55-5:10, HRBB 124

1


Course website
• https://people.engr.tamu.edu/guni/csce421
• Syllabus
• Schedule
• Lecture slides + recording
• Assignments
• Course policies
• https://canvas.tamu.edu/courses/45914
• Quizzes
• Assignment submissions

2
Course staff

Instructor: Guni Sharon
• Email address: [email protected]
• Office hours: Tuesday, 9-10am
• Office: https://tamu.zoom.us/j/4986264842

TA: Sheelabhadra "Sheel" Dey
• Email address: [email protected]
• Office hours: Friday, noon - 1:00 pm
• Office: https://tamu.zoom.us/j/2493282145

• In case you cannot make these office hours due to a scheduling conflict, send me an email specifying at least 3 time windows that work for you, and I will set up a meeting accordingly

3
Prerequisites
• Proficiency in Python. All class assignments will be in Python. Verify that your level of knowledge is sufficient by going over the following tutorials
• https://docs.python.org/3/tutorial/
• https://docs.scipy.org/doc/numpy/user/quickstart.html

• College-level Calculus and Linear Algebra. You should be comfortable taking derivatives
and understanding matrix/vector operations and notation
• See https://people.engr.tamu.edu/guni/csce689/files/linalg.pdf
• Basic Probability and Statistics. You should be familiar with basics of probabilities,
Gaussian distributions, mean, standard deviation, etc.
• See https://people.engr.tamu.edu/guni/csce689/files/prob.pdf
• Foundations of Machine Learning. We will be formulating cost functions, taking
derivatives and performing optimization with gradient descent. Some optimization
tricks will be more intuitive with some knowledge of convex optimization
4
Programming assignments (40%)
• You will implement common ML algorithms in Python
• You will be graded separately on each algorithm
• Note that not all assignments bear the same weight in the final grade
• Individual submissions
• We will use Measure Of Software Similarity (MOSS) to detect
plagiarism
• https://theory.stanford.edu/~aiken/moss/

5
Assignment (P0)
• Assignment (P0) is now available through the course website
• Submit through Canvas by Monday, Sep 6
• Make It or Break It!

6
Written assignments (10%)
• Not all assignments bear the same weight in the final grade
• Submit in pairs
• We will use the Canvas built-in tool to detect plagiarism

7
Late submission policy
• You can use 6 late days
• A late day extends the deadline by 24 hours
• You are allowed up to 2 late days per assignment
• An assignment handed in more than 48 hours late will be worth 50% credit
• No credit will be given for assignments handed in more than 72 hours late
• For joint submissions, group members cannot pool late days: in other
words, to use 1 late day all group members must allocate 1 late day.
• Use late days only when truly needed; applying for more than 6 late
days will require you to justify all of them
8
Midterm (40%)
• An in-class midterm exam will take place towards the end of the semester
• Might change to an online exam based on infection levels

9
Online quizzes (10%)
• Short (~15 minutes) quizzes on Canvas
• Quizzes will mostly include multiple-choice questions covering main
concepts that were discussed in class
• These quizzes will give you a good idea of your level of understanding
• You will have 6 days to take each quiz. Late days cannot be used.

10
COVID-19 regulations
• Remote participation is NOT allowed
• If sick/showing symptoms, don’t come to class
• Recorded lectures will be uploaded to the website
• There is a Zoom session per class on Canvas
• For recording purposes
• Blocked for participants

11
COVID-19 your safety

[Screenshot: The Guardian]

• “Face covering is strongly encouraged”


• Avoid talking inside
• I’ll talk behind mask + Plexiglas
• Send questions over Hangout chat to [email protected] (I’m open to other options)
12
No questions after class
• The university daycare closes at 5:30
• I need to leave immediately after class

13
Inference from probabilities: warmup
• In Brazos County, 169 individuals are infected with COVID-19 per day
on average
• Population: 229,211
• 0.07373% of the population is infected daily
• Assume that the (95) CSCE421 students are drawn from the same population
• What is the probability that no CSCE421 student will be infected
during the (100-day) semester? (see the sketch below)

• Please protect yourselves and others
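
A back-of-the-envelope answer to the warmup question (a minimal sketch, assuming infections are independent across students and days; the numbers come from the slide):

# Probability that no CSCE421 student is infected during a 100-day semester,
# assuming each of the 95 students is independently infected on any given day
# with the county-wide daily rate.
p_daily = 169 / 229_211                 # ≈ 0.0007373 (0.07373% per person per day)
students, days = 95, 100

p_no_infection = (1 - p_daily) ** (students * days)
print(f"P(no student infected) ≈ {p_no_infection:.4f}")   # ≈ 0.0009, i.e. under 0.1%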

14
Traditional CS

• Data [Courses, lecture rooms] + Program [Assignment algorithm] → Output [course-time-room assignment]

15
Example by Dr. Kilian Weinberger
New problem

• Data [images of dogs and cats] + Program [?] → Output [Image shows a dog (yes/no)]

16
Example by Dr. Kilian Weinberger
Machine (supervised) learning

• Data [images of dogs and cats] + Output [Image shows a dog (yes/no)] → Program [?]

17
Example by Dr. Kilian Weinberger
Machine (supervised) learning

• Training: Data + Output → Program
• Testing: New Data + Program → Output
18
Example by Dr. Kilian Weinberger
19
Things ML can do
• Speech recognition
• Natural language understanding in narrowly bounded domains
• Image recognition
• Play video games like a pro
• Robot locomotion
• Drive safely along a curving road
• Translate spoken Chinese into spoken English in real time
Things ML cannot do (as of 2021)
• Converse successfully with another person for an hour
• Drive safely and responsibly
• Cancer treatment recommendation
• Guarantee fairness
• Play soccer on a human level
• Write an intentionally funny story
[Schank, Tale-Spin System, 1984]
• One day Joe Bear was hungry. He asked his friend Irving Bird
where some honey was. Irving told him there was a beehive in
the oak tree. Joe walked to the oak tree. He ate the beehive. The
End.
• Henry Squirrel was thirsty. He walked over to the river bank
where his good friend Bill Bird was sitting. Henry slipped and fell
in the river. Gravity drowned. The End.
• Once upon a time there was a dishonest fox and a vain crow. One
day the crow was sitting in his tree, holding a piece of cheese in
his mouth. He noticed that he was holding the piece of cheese.
He became hungry, and swallowed the cheese. The fox walked
over to the crow. The End.
Back to the future… ANN
ML vs Humans – round 2
• 1997: Deep Blue vs. Kasparov
• First match won against world champion
• “Intelligent creative” play
• 200 million board positions per second
• Humans understood 99.9% of Deep Blue's moves
• Can do about the same now with commodity parts
• 1996: Kasparov beats Deep Blue: “I could feel --- I could smell --- a new kind of intelligence
across the table.”
• 1997: Deep Blue beats Kasparov: “Deep Blue hasn’t proven anything.”
• Open question:
• How does human cognition deal with the search space explosion of chess?
• Or: how can humans compete with computers at all???
• 2016: AlphaGo beats Lee Sedol – huge advance: sparse rollouts and self-play
ML’s role in our life
• Applied AI automates all kinds of things
• Search engines
• Route planning, e.g. real-time navigation
• Logistics, e.g. packages, inventory
• Medical diagnosis
• Automated help desks
• Spam / fraud detection
• Smarter devices, e.g. cameras, cars
• Product recommendations
• Personal advertising
• … Lots more!
Is ML/AI dangerous?
• Elon Musk called AI humanity’s “biggest existential threat” and compared it
to “summoning the demon.”…“As AI gets probably much smarter than
humans, the relative intelligence ratio is probably similar to that between a
person and a cat, maybe bigger,” Musk said. “I do think we need to be very
careful about the advancement of AI.”
• Max Tegmark, a physics professor at MIT: “When we got fire and messed up
with it, we invented the fire extinguisher. When we got cars and messed up,
we invented the seat belt, airbag, and traffic light. But with nuclear weapons
and A.I., we don’t want to learn from our mistakes. We want to plan ahead.”
• Nick Bostrom, philosopher at Oxford: “once unfriendly superintelligence
exists, it would prevent us from replacing it or changing its preferences. Our
fate would be sealed.”
Is ML/AI dangerous?
• Sam Altman, co-chair OpenAI: “In the next few decades we are either going to head
toward self-destruction or toward human descendants eventually colonizing the universe.”
• Stephen Hawking: “Success in creating effective AI, could be the biggest event in the
history of our civilization. Or the worst. We just don’t know. So we cannot know if we will
be infinitely helped by AI, or ignored by it and side-lined, or conceivably destroyed by it.” …
“Unless we learn how to prepare for, and avoid, the potential risks, AI could be the worst
event in the history of our civilization. It brings dangers, like powerful autonomous
weapons, or new ways for the few to oppress the many. It could bring great disruption to
our economy.”
• Larry Page: “Artificial intelligence would be the ultimate version of Google. The ultimate
search engine that would understand everything on the web. It would understand exactly
what you wanted, and it would give you the right thing. We're nowhere near doing that
now. However, we can get incrementally closer to that, and that is basically what we work
on.”
Define the right objective/utility
• Objective: reduce poverty
• Action: kill poor people
• Objective: reduce traffic congestion
• Action: block roads
• Objective: increase patient satisfaction
• Action: provide opioids
• Objective: forecast which criminals are most likely to reoffend
• Result: “Through COMPAS, black offenders were seen almost twice as likely as
white offenders to be labeled a higher risk” [Angwin et al; 2016; Garber, 2016; Liptak, 2017]
• COMPAS aids judges with sentencing decisions
What about bugs?
Taking our jobs
• Can ML do your job better than you?
In class questions/discussions
• In-class questions and discussions should be posted on Campuswire
(class code: 6453)

53
Technological advancements
New technology
• Higher productivity
• Less manpower is
required
• Loss of traditional jobs
• New types of jobs
Everybody wins?
AI/ML should benefit all (personal opinion)
• But it doesn’t
• Corporations reap most benefits
from increased productivity
• Can/should we fix this?
ML example (linear classifier)
• Data:
• [x = 2D point, y = color]
• hypothesis(x):
• If w · x + b ≥ 0
• Return ‘blue’
• Else return ‘red’
• Challenge:
• Find the best assignment of w and b (see the sketch below)
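
A minimal Python sketch of the linear hypothesis above; the weight vector w and bias b are illustrative placeholders, not values from the lecture:

import numpy as np

w = np.array([1.0, -2.0])   # hypothetical weights for the 2D points
b = 0.5                     # hypothetical bias

def hypothesis(x):
    # Linear decision rule: 'blue' on one side of the line w·x + b = 0, 'red' on the other
    return 'blue' if np.dot(w, x) + b >= 0 else 'red'

print(hypothesis(np.array([3.0, 1.0])))   # 'blue' (1*3 - 2*1 + 0.5 = 1.5 >= 0)
print(hypothesis(np.array([0.0, 2.0])))   # 'red'  (1*0 - 2*2 + 0.5 = -3.5 < 0)

Finding the “best” assignment of w and b, i.e., the line that separates the colors with the fewest mistakes, is exactly the learning problem.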

57
Can we generalize this approach?
• What if our data is not linearly separable?
• Two common approaches
1. Fit a non-linear function … Neural networks, KNN,
Naïve Bayes
2. Bend the state space in a way that allows linear
separability … Kernel Trick

58
General notation
D = {(x_1, y_1), …, (x_n, y_n)}: the dataset. x_i is a feature vector from R^d and y_i is the label from the label space C.

P: the distribution from which the data is drawn (unknown). Unless
designated otherwise we will assume that samples are i.i.d.
(independent and identically distributed)

Binary classification: C = {+1, −1}

Multiclass classification: C = {1, 2, …, K}

Regression: C = R
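
A tiny illustrative dataset matching the notation above (values are made up); the same feature matrix can be paired with binary, multiclass, or regression labels:

import numpy as np

X = np.array([[5.1, 3.5],    # each row is a feature vector x_i in R^2
              [6.2, 2.9],
              [4.7, 3.2]])

y_binary = np.array([+1, -1, +1])        # binary classification: labels from {+1, -1}
y_multiclass = np.array([0, 2, 1])       # multiclass: K classes, encoded as integers
y_regression = np.array([1.4, 0.9, 2.3]) # regression: labels from R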

59
Feature vector
• The feature vector (x ∈ R^d) can be very large, i.e., d can be huge
• E.g., a student’s grade record
• Current hardware can process (and make sense of) very large feature
vectors
• E.g., an 8-megapixel image, where each pixel is composed of 3 features (red,
green, blue): about 24 million features in total
• A feature vector is said to be dense if the number of nonzero features
in x is large relative to d
• A feature vector is said to be sparse if x consists of mostly zeros (see the sketch below)
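
A minimal sketch of the dense vs. sparse distinction (the vectors are hypothetical):

import numpy as np

dense = np.random.rand(12)              # nearly every entry nonzero -> dense
sparse = np.zeros(1_000_000)
sparse[[3, 17, 999_998]] = 1.0          # only 3 of a million entries nonzero -> sparse

for name, x in [('dense', dense), ('sparse', sparse)]:
    frac = np.count_nonzero(x) / x.size
    print(f'{name}: d={x.size}, nonzero fraction={frac:.6f}')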

60
Model hypothesis
• A classification hypothesis h maps features
to labels: h : R^d → C

• Our goal (usually*) is to find h such that
h(x) ≈ y for examples (x, y) drawn from P

• (*anomaly detection)
• Does such an h always exist?
• No! We are limited by our model
assumptions
• A linear model, for example, cannot capture
an XOR distribution
• Other models might be more flexible yet are
still limited
61
Model hypothesis
• We denote by H the set of all possible
hypotheses
• Quiz: what is |H| for a given datatype?

• Goal: minimize the expected model
error E_{(x,y)~P}[Loss(h(x), y)]
• 2 immediate issues

62
Minimize model error

1. We can’t iterate over all hypotheses in H,
and even if we could,
we don’t know P
2. How do we define the error (Loss)?

63
Minimize expected model error

1. We can’t iterate over all hypotheses in H, and even if we could, we
don’t know P
Consider the following identities

1. By definition: expected error = E_{(x,y)~P}[Loss(h(x), y)]
2. Law of large numbers: (1/n) Σ_{i=1}^n Loss(h(x_i), y_i) → E_{(x,y)~P}[Loss(h(x), y)] as n → ∞
• Corollary: with sufficient samples coming from P:
E_{(x,y)~P}[Loss(h(x), y)] ≈ (1/n) Σ_{i=1}^n Loss(h(x_i), y_i)
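
A minimal Monte Carlo sketch of the corollary above: the empirical average loss over i.i.d. samples approaches the expected loss as the sample size grows. The data-generating distribution and the hypothesis h are made up for illustration:

import numpy as np

rng = np.random.default_rng(0)

def sample(n):
    x = rng.normal(size=n)               # features drawn i.i.d. from P
    y = (x > 0).astype(int)              # true labeling rule
    return x, y

def h(x):
    return (x > 0.5).astype(int)         # an imperfect hypothesis

for n in (100, 10_000, 1_000_000):
    x, y = sample(n)
    empirical = np.mean(h(x) != y)       # average binary loss on n samples
    print(f'n={n:>9,}: empirical loss ≈ {empirical:.4f}')
# The printed values converge toward the true expected loss P(0 < x <= 0.5) ≈ 0.19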

64
Distribution mismatch
• What if our samples come from a different distribution?
• E.g., evaluating a classifier that identifies fraudulent credit card transactions
with mostly samples of fraudulent transactions
• E.g., evaluating a model that identifies broken bones from X-rays obtained
from a geriatric facility
• The expectation approximation will be biased
• This is known as a distribution mismatch
• Should be avoided
• Can be addressed in some cases (e.g., Importance Reweighting, off-
policy evaluation in RL)
65
Loss functions
• How do we define the error?
• When defining ML as an optimization problem, we call the error
estimate, which is computed over a set of labeled samples, a loss,
computed using a loss function (Loss)
• Many options. Task dependent.
• E.g., 2-way classification
• Binary (0/1) loss: Loss(h(x_i), y_i) = 1 if h(x_i) ≠ y_i, and 0 otherwise (see the sketch below)
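
A minimal sketch of the binary (0/1) loss for 2-way classification, with made-up predictions and labels:

import numpy as np

def binary_loss(y_pred, y_true):
    # Average 0/1 loss: 1 for every misclassified example, 0 otherwise
    return np.mean(np.asarray(y_pred) != np.asarray(y_true))

y_pred = [+1, -1, +1, +1]
y_true = [+1, +1, +1, -1]
print(binary_loss(y_pred, y_true))   # 0.5 (2 of the 4 examples are misclassified)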

67
Examples of Loss functions
Regression:
• Absolute error, Mean-absolute error
(MAE), l1 loss: Loss = |h(x_i) − y_i|

• Squared loss, Mean-squared error
(MSE), l2 loss: Loss = (h(x_i) − y_i)^2

• Pros/Cons? (Differentiability, outliers)
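
A minimal sketch of both regression losses, using the same values as the worked example on the next slide:

import numpy as np

def mae(y_pred, y_true):
    return np.mean(np.abs(y_pred - y_true))      # l1 / mean absolute error

def mse(y_pred, y_true):
    return np.mean((y_pred - y_true) ** 2)       # l2 / mean squared error

y_pred = np.array([2.0, 0.0, 1.0])
y_true = np.array([2.0, 1.0, 3.0])
print(mae(y_pred, y_true))   # (0 + 1 + 2) / 3 = 1.0
print(mse(y_pred, y_true))   # (0 + 1 + 4) / 3 ≈ 1.667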

68
Examples of Loss functions
• Predictions: ŷ_1 = 2, ŷ_2 = 0, ŷ_3 = 1 (for inputs x_1, x_2, x_3)
• True labels: y_1 = 2, y_2 = 1, y_3 = 3
• l1 loss?
• l2 loss?

69
Use vector/matrix notation

• Get used to writing in vector/matrix notation, e.g., MSE as (1/n)·||ŷ − y||_2^2

• This is what’s used for (computationally efficient) coding (see the sketch below)
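
A minimal sketch of why the vector form matters in code: the same MSE computed with an explicit Python loop and as a single vectorized expression (the latter is what NumPy executes efficiently):

import numpy as np

y_pred = np.array([2.0, 0.0, 1.0])
y_true = np.array([2.0, 1.0, 3.0])

# Loop version (slow for large n)
mse_loop = sum((p - t) ** 2 for p, t in zip(y_pred, y_true)) / len(y_true)

# Vectorized version: (1/n) * ||y_pred - y_true||_2^2
diff = y_pred - y_true
mse_vec = (diff @ diff) / diff.size

print(mse_loop, mse_vec)   # both ≈ 1.667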

70
Model fitting as an optimization problem
• We can rewrite our goal: find h such that h(x_i) ≈ y_i for all (x_i, y_i) in D
• As: h = argmin_{h ∈ H} (1/n) Σ_{i=1}^n Loss(h(x_i), y_i)
• It is actually straightforward to find such a model (with Loss = 0): just memorize the training set (see the sketch below)

• Did we solve ML?

• ML objective: generalize knowledge beyond the training examples
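
A minimal sketch of such a zero-training-loss model: a lookup table that simply memorizes the training set (the data values are made up). It is perfect on the training examples and useless on anything new, which is why generalization, not training loss, is the real objective:

train = {(2.0, 3.1): 1, (0.5, 0.9): -1, (1.7, 2.2): 1}   # {feature tuple: label}

def h_memorize(x, default=-1):
    # Return the memorized label if x was seen during training; guess otherwise
    return train.get(x, default)

print(all(h_memorize(x) == y for x, y in train.items()))   # True: zero training loss
print(h_memorize((2.1, 3.0)))                              # unseen point: arbitrary default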

71
Training vs testing
• Split the dataset D into training and testing sets
• D_train and D_test (D = D_train ∪ D_test, D_train ∩ D_test = ∅)
• Train on D_train, evaluate on D_test
• Cross validation
• K-fold cross validation (see the sketch below)

Average test loss = (1/|D_test|) Σ_{(x,y) ∈ D_test} Loss(h(x), y)


72
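
A minimal sketch of K-fold cross validation (K = 5); the train_fn and loss_fn arguments are placeholders for whatever model and loss are being evaluated, not the course's actual assignment code:

import numpy as np

def k_fold_indices(n, k, seed=0):
    # Shuffle indices 0..n-1 and split them into k roughly equal folds
    idx = np.random.default_rng(seed).permutation(n)
    return np.array_split(idx, k)

def cross_validate(X, y, train_fn, loss_fn, k=5):
    folds = k_fold_indices(len(y), k)
    losses = []
    for i in range(k):
        test_idx = folds[i]                                   # fold i is the held-out test set
        train_idx = np.concatenate([folds[j] for j in range(k) if j != i])
        h = train_fn(X[train_idx], y[train_idx])              # train on the remaining folds
        losses.append(loss_fn(h(X[test_idx]), y[test_idx]))   # evaluate on the held-out fold
    return np.mean(losses)                                    # average test loss across folds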
Course overview
• k-nearest neighbors
• Perceptron (Linear binary classifier)
• Naïve Bayes and Bayesian networks
• Logistic Regression and gradient descent
• Linear regression and SVMs
• Empirical risk minimization and ML debugging
• Kernels
• Decision / Regression Trees

75
Course overview
• Artificial Neural Networks / Deep Learning
• Derivative free optimization
• Unsupervised learning: clustering and feature extraction
• Multi-armed bandits
• Markov-decision processes and RL
• Advanced topics
• Variational Inference and Generative Models
• Recurrent neural networks
• Open for suggestions…

76
What next?
• Class: K nearest neighbors
• Assignments:
• Go over tutorials
• Linear algebra
• Basic probability
• Python
• Programming (P0)
• Python
• NumPy
• Pandas

77
