Introduction
Course staff
Prerequisites
• Proficiency in Python. All class assignments will be in Python. Check that your level of
knowledge is sufficient by going over the following tutorials:
• https://docs.python.org/3/tutorial/
• https://docs.scipy.org/doc/numpy/user/quickstart.html
• College Level Calculus, Linear Algebra. You should be comfortable taking derivatives
and understanding matrix/vector operations and notation
• See https://people.engr.tamu.edu/guni/csce689/files/linalg.pdf
• Basic Probability and Statistics. You should be familiar with the basics of probability:
Gaussian distributions, mean, standard deviation, etc.
• See https://people.engr.tamu.edu/guni/csce689/files/prob.pdf
• Foundations of Machine Learning. We will be formulating cost functions, taking
derivatives and performing optimization with gradient descent. Some optimization
tricks will be more intuitive with some knowledge of convex optimization
Programming assignments (40%)
• You will implement common ML algorithms in Python
• You will be graded separately on each algorithm
• Note that not all assignments bear the same weight in the final grade
• Individual submissions
• We will use Measure Of Software Similarity (MOSS) to detect
plagiarism
• https://theory.stanford.edu/~aiken/moss/
Assignment (P0)
• Assignment (P0) is now available through the course website
• Submit through Canvas by Monday, Sep 6
• Make It or Break It!
Written assignments (10%)
• Not all assignments bear the same weight in the final grade
• Submit in pairs
• We will use Canvas's built-in tool to detect plagiarism
Late submission policy
• You can use 6 late days
• A late day extends the deadline by 24 hours
• You are allowed up to 2 late days per assignment
• If you hand in an assignment more than 48 hours after the deadline, it will be worth 50%
• No credit will be given to assignments handed in after 72 hours
• For joint submissions, group members cannot pool late days: in other
words, to use 1 late day all group members must allocate 1 late day.
• Use late days only when truly needed; applying for more than 6 late
days will require you to justify all of them
Midterm (40%)
• An in-class midterm exam will take place towards the end of the
semester
• Might change to an online exam based on infection levels
Online quizzes (10%)
• Short (~15 minutes) quizzes on Canvas
• Quizzes will mostly include multiple-choice questions covering main
concepts that were discussed in class
• These quizzes will give you a good idea of your level of understanding
• You will have 6 days to take each quiz. Late days cannot be used.
COVID-19 regulations
• Remote participation is NOT allowed
• If sick/showing symptoms, don’t come to class
• Recorded lectures will be uploaded to the website
• There is a Zoom session per class on Canvas
• For recording purposes
• Blocked for participants
COVID-19 and your safety
[Image: The Guardian]
Inference from probabilities: warmup
• In Brazos County, 169 individuals are infected with COVID-19 per day
on average
• Population: 229,211
• 0.07373% of the population is infected daily
• Assume that the (95) CSCE421 students are drawn from the same population
• What is the probability that no CSCE421 student will be infected
during the (100-day) semester? (see the sketch below)
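A minimal sketch of this computation in Python, assuming (as the slide does) that infections are independent across students and days:

```python
# Probability that no CSCE421 student is infected during the 100-day semester.
daily_rate = 169 / 229_211      # per-person daily infection probability (~0.07373%)
students, days = 95, 100

# Every student must avoid infection on every day of the semester.
p_no_infection = (1 - daily_rate) ** (students * days)
print(f"P(no student infected) = {p_no_infection:.4%}")  # ~0.09%
```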
Traditional CS
Data [courses, lecture rooms] + Program [assignment algorithm] → Output [course-time-room assignment]
New problem
Data [images of dogs and cats] + Program [?] → Output [Image shows a dog (yes/no)]
Machine (supervised) learning
Data [images of dogs and cats] + Output [Image shows a dog (yes/no)] → Program [?]
Machine (supervised) learning
Training: Data + Output → Program
Testing: New Data + Program → Output
Examples by Dr. Kilian Weinberger
Things ML can do
• Speech recognition
• Natural language understanding in narrowly bounded domains
• Image recognition
• Play video games like a pro
• Robot locomotion
• Drive safely along a curving road
• Translate spoken Chinese into spoken English in real time
Things ML cannot do (as of 2021)
• Converse successfully with another person for an hour
• Drive safely and responsibly
• Cancer treatment recommendation
• Guarantee fairness
• Play soccer on a human level
• Write an intentionally funny story
[Schank, Tale-Spin System, 1984]
• One day Joe Bear was hungry. He asked his friend Irving Bird
where some honey was. Irving told him there was a beehive in
the oak tree. Joe walked to the oak tree. He ate the beehive. The
End.
• Henry Squirrel was thirsty. He walked over to the river bank
where his good friend Bill Bird was sitting. Henry slipped and fell
in the river. Gravity drowned. The End.
• Once upon a time there was a dishonest fox and a vain crow. One
day the crow was sitting in his tree, holding a piece of cheese in
his mouth. He noticed that he was holding the piece of cheese.
He became hungry, and swallowed the cheese. The fox walked
over to the crow. The End.
Back to the future… ANN
ML vs Humans – round 2
• 1997: Deep Blue vs. Kasparov
• First match won against world champion
• “Intelligent creative” play
• 200 million board positions per second
• Humans understood 99.9% of Deep Blue's moves
• Can do about the same now with commodity parts
• 1996: Kasparov beats Deep Blue: “I could feel --- I could smell --- a new kind of intelligence
across the table.”
• 1997: Deep Blue beats Kasparov: “Deep Blue hasn’t proven anything.”
• Open question:
• How does human cognition deal with the search space explosion of chess?
• Or: how can humans compete with computers at all???
• 2016: AlphaGo beats Lee Sedol – huge advance: sparse rollouts and self-play
ML’s role in our life
• Applied AI automates all kinds of things
• Search engines
• Route planning, e.g. real-time navigation
• Logistics, e.g. packages, inventory
• Medical diagnosis
• Automated help desks
• Spam / fraud detection
• Smarter devices, e.g. cameras, cars
• Product recommendations
• Personal advertising
• … Lots more!
Is ML/AI dangerous?
• Elon Musk called AI humanity’s “biggest existential threat” and compared it
to “summoning the demon.”…“As AI gets probably much smarter than
humans, the relative intelligence ratio is probably similar to that between a
person and a cat, maybe bigger,” Musk said. “I do think we need to be very
careful about the advancement of AI.”
• Max Tegmark, a physics professor at MIT: “When we got fire and messed up
with it, we invented the fire extinguisher. When we got cars and messed up,
we invented the seat belt, airbag, and traffic light. But with nuclear weapons
and A.I., we don’t want to learn from our mistakes. We want to plan ahead.”
• Nick Bostrom, philosopher at Oxford: “once unfriendly superintelligence
exists, it would prevent us from replacing it or changing its preferences. Our
fate would be sealed.”
Is ML/AI dangerous?
• Sam Altman, co-chair OpenAI: “In the next few decades we are either going to head
toward self-destruction or toward human descendants eventually colonizing the universe.”
• Stephen Hawking: “Success in creating effective AI could be the biggest event in the
history of our civilization. Or the worst. We just don’t know. So we cannot know if we will
be infinitely helped by AI, or ignored by it and side-lined, or conceivably destroyed by it.” …
“Unless we learn how to prepare for, and avoid, the potential risks, AI could be the worst
event in the history of our civilization. It brings dangers, like powerful autonomous
weapons, or new ways for the few to oppress the many. It could bring great disruption to
our economy.”
• Larry Page: “Artificial intelligence would be the ultimate version of Google. The ultimate
search engine that would understand everything on the web. It would understand exactly
what you wanted, and it would give you the right thing. We're nowhere near doing that
now. However, we can get incrementally closer to that, and that is basically what we work
on.”
Define the right objective/utility
• Objective: reduce poverty
• Action: kill poor people
• Objective: reduce traffic congestion
• Action: block roads
• Objective: increase patient satisfaction
• Action: provide opioids
• Objective: forecast which criminals are most likely to reoffend
• Result: “Through COMPAS, black offenders were seen almost twice as likely as
white offenders to be labeled a higher risk” [Angwin et al., 2016; Garber, 2016; Liptak, 2017]
• COMPAS aids judges with sentencing decisions
What about bugs?
Taking our jobs
• Can ML do your job better than you?
In class questions/discussions
• In-class questions and discussions should be posted on Campuswire
(class code: 6453)
Technological advancements
New technology
• Higher productivity
• Less manpower is
required
• Loss of traditional jobs
• New types of jobs
Everybody wins?
AI/ML should benefit all (personal opinion)
• But it doesn’t
• Corporations reap most benefits
from increased productivity
• Can/should we fix this?
ML example (linear classifier)
• Data:
• [x = 2D point, y = color]
• hypothesis(x):
• If $w^\top x + b > 0$
• Return ‘blue’
• Else return ‘red’
• Challenge:
• Find the best assignment of $w$ and $b$ (see the sketch below)
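A minimal sketch of this hypothesis in Python; the weight vector w and bias b are illustrative assumptions, not values from the slide:

```python
import numpy as np

def hypothesis(x, w, b):
    """Linear classifier: 'blue' on one side of the hyperplane w.x + b = 0, 'red' on the other."""
    return 'blue' if np.dot(w, x) + b > 0 else 'red'

# Hypothetical parameters; finding the best w and b is the learning problem.
w = np.array([1.0, -1.0])
b = 0.5
print(hypothesis(np.array([2.0, 0.0]), w, b))  # 'blue', since 1*2 - 1*0 + 0.5 > 0
```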
Can we generalize this approach?
• What if our data is not linearly separable?
• Two common approaches
1. Fit a non-linear function … Neural networks, KNN, Naïve Bayes
2. Bend the feature space in a way that allows linear separability … Kernel Trick (see the sketch below)
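A minimal illustration of the second approach with a hand-picked feature map (the full kernel trick avoids computing such a map explicitly):

```python
import numpy as np

# Points inside the unit circle form one class, points outside the other:
# not linearly separable in (x1, x2), but adding the feature x1^2 + x2^2
# makes a simple threshold on the new coordinate separate them.
def feature_map(x):
    """Lift a 2D point into 3D, where the circular boundary becomes a plane."""
    return np.array([x[0], x[1], x[0]**2 + x[1]**2])

inside = feature_map(np.array([0.5, 0.5]))   # new coordinate: 0.5  (< 1)
outside = feature_map(np.array([1.5, 0.0]))  # new coordinate: 2.25 (> 1)
print(inside[2] < 1 < outside[2])  # True: the plane x3 = 1 separates the classes
```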
General notation
• $D = \{(x_i, y_i)\}_{i=1}^{n}$ is the dataset; $x_i$ is a feature vector from $\mathbb{R}^d$ and $y_i$ is the label from $\mathcal{Y}$
• Multiclass classification: $\mathcal{Y} = \{1, \dots, K\}$
• Regression: $\mathcal{Y} = \mathbb{R}$
Feature vector
• The feature vector ($x \in \mathbb{R}^d$) can be very large
• E.g., a student’s grade record
• Current hardware can process (and make sense of) very large feature vectors
• E.g., an 8-megapixel image, where each pixel is composed of 3 features (red, green, blue)
• A feature vector is said to be dense if the number of nonzero features in $x$ is large relative to $d$
• A feature vector is said to be sparse if $x$ consists of mostly zeros (see the sketch below)
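A minimal sketch of the dense/sparse distinction; the vectors are made-up examples:

```python
import numpy as np

d = 10
dense = np.array([3.1, 0.2, 5.0, 1.7, 2.2, 0.9, 4.4, 0.1, 6.3, 2.8])  # mostly nonzero
sparse = np.zeros(d)
sparse[[2, 7]] = [5.0, 1.0]  # only 2 of the 10 entries are nonzero

# Sparse vectors are often stored as (index, value) pairs to save space.
sparse_as_pairs = {i: float(v) for i, v in enumerate(sparse) if v != 0}
print(sparse_as_pairs)  # {2: 5.0, 7: 1.0}
```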
Model hypothesis
• A classification hypothesis $h: \mathbb{R}^d \to \mathcal{Y}$ maps features to labels
Minimize model error
• Goal: find a hypothesis $h$ with small error on unseen samples drawn from the data distribution $P$
Minimize expected model error
1. By definition: $\mathrm{err}(h) = \mathbb{E}_{(x,y)\sim P}\left[L(h(x), y)\right]$
2. Law of large numbers: $\frac{1}{n}\sum_{i=1}^{n} L(h(x_i), y_i) \rightarrow \mathbb{E}_{(x,y)\sim P}\left[L(h(x), y)\right]$ as $n \rightarrow \infty$
• Corollary: with sufficient samples coming from $P$: $\mathrm{err}(h) \approx \frac{1}{n}\sum_{i=1}^{n} L(h(x_i), y_i)$ (illustrated in the sketch below)
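A minimal sketch of this approximation, assuming a toy distribution and a fixed hypothesis:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy distribution P: x ~ N(0, 1) with label y = 1 if x > 0, else 0.
def sample(n):
    x = rng.normal(size=n)
    return x, (x > 0).astype(int)

# A fixed, slightly wrong hypothesis: predict 1 only when x > 0.3.
def h(x):
    return (x > 0.3).astype(int)

# The average loss over samples approaches the expected error as n grows.
for n in [100, 10_000, 1_000_000]:
    x, y = sample(n)
    print(n, np.mean(h(x) != y))  # converges to P(0 < x <= 0.3) ~ 0.118
```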
Distribution mismatch
• What if our samples come from a different distribution?
• E.g., evaluating a classifier that identifies fraudulent credit card transactions
with mostly samples of fraudulent transactions
• E.g., evaluating a model that identifies broken bones from X-rays obtained
from a geriatric facility
• The expectation approximation will be biased
• This is known as a distribution mismatch
• Should be avoided
• Can be addressed in some cases (e.g., Importance Reweighting, off-
policy evaluation in RL)
Loss functions
• How do we define the error?
• When defining ML as an optimization problem, we call the error estimate, which is computed over a set of labeled samples, a loss (computed using a loss function $L$)
• Many options. Task dependent.
• E.g., 2-way classification
• Binary loss: $L(h(x_i), y_i) = \mathbf{1}\left[h(x_i) \neq y_i\right]$ (see the sketch below)
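A minimal sketch of the binary (0/1) loss:

```python
import numpy as np

def binary_loss(y_pred, y_true):
    """0/1 loss averaged over samples: the fraction of misclassified points."""
    return np.mean(y_pred != y_true)

y_pred = np.array([1, 0, 1, 1])
y_true = np.array([1, 1, 1, 0])
print(binary_loss(y_pred, y_true))  # 0.5: two of the four predictions are wrong
```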
Examples of Loss functions
Regression
• Absolute error, Mean-absolute error (MAE), $\ell_1$ loss: $|h(x_i) - y_i|$, averaged as $\frac{1}{n}\sum_{i=1}^{n}|h(x_i) - y_i|$
Examples of Loss functions
• Three samples $x_1, x_2, x_3$ with predictions $\hat{y}_1 = 2$, $\hat{y}_2 = 0$, $\hat{y}_3 = 1$ and labels $y_1 = 2$, $y_2 = 1$, $y_3 = 3$
• $\ell_1$ loss?
• $\ell_2$ loss? (both computed in the sketch below)
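A minimal sketch computing both losses for this example:

```python
import numpy as np

y_hat = np.array([2.0, 0.0, 1.0])  # predictions for x1, x2, x3
y = np.array([2.0, 1.0, 3.0])      # true labels

l1 = np.sum(np.abs(y_hat - y))   # |0| + |-1| + |-2| = 3
l2 = np.sum((y_hat - y) ** 2)    # 0 + 1 + 4 = 5
print(l1, l2)  # 3.0 5.0
```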
Use vector/matrix notation
• E.g., stack the predictions into $\hat{y} = h(X)$; then the $\ell_1$ loss is $\lVert \hat{y} - y \rVert_1$ and the $\ell_2$ loss is $\lVert \hat{y} - y \rVert_2^2$
Model fitting as an optimization problem
• We can rewrite our goal: find $h$ that minimizes $\mathbb{E}_{(x,y)\sim P}\left[L(h(x), y)\right]$
• As: $h^{*} = \arg\min_{h} \frac{1}{n}\sum_{i=1}^{n} L(h(x_i), y_i)$
• It is actually straightforward to find such a model (with Loss = 0), e.g., one that memorizes the training set (see the sketch below)
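A minimal sketch of such a zero-training-loss model, assuming hashable inputs; it is exactly why minimizing training loss alone is not the real goal:

```python
class Memorizer:
    """Achieves Loss = 0 on the training set by memorizing it, yet learns nothing general."""
    def fit(self, X, y):
        self.table = {tuple(x): label for x, label in zip(X, y)}
        return self

    def predict(self, x):
        # Perfect on seen points; an arbitrary fallback on unseen ones.
        return self.table.get(tuple(x), 0)

model = Memorizer().fit([[1.0, 2.0], [3.0, 4.0]], [1, 0])
print(model.predict([1.0, 2.0]))  # 1: zero loss on the training set
print(model.predict([5.0, 6.0]))  # 0: an uninformed guess on new data
```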
Training vs testing
• Split the dataset into training and testing sets
• $D = D^{train} \cup D^{test}$ and $D^{train} \cap D^{test} = \emptyset$
• Train on $D^{train}$, evaluate on $D^{test}$
• Cross validation
• K-fold cross validation (see the sketch below)
[Figure: K-fold cross validation, rotating which split of $D$ serves as $D^{test}$]
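A minimal sketch of K-fold cross validation using NumPy only; the model training step is left as a comment:

```python
import numpy as np

def k_fold_indices(n, k, seed=0):
    """Shuffle n sample indices and split them into k disjoint folds."""
    rng = np.random.default_rng(seed)
    return np.array_split(rng.permutation(n), k)

folds = k_fold_indices(n=10, k=5)
for i, test_idx in enumerate(folds):
    train_idx = np.concatenate([f for j, f in enumerate(folds) if j != i])
    # Train on train_idx, evaluate on test_idx; average the k test scores.
    print(f"fold {i}: train={sorted(train_idx.tolist())}, test={sorted(test_idx.tolist())}")
```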
Course overview
• Artificial Neural Networks / Deep Learning
• Derivative free optimization
• Unsupervised learning: clustering and feature extraction
• Multi-armed bandits
• Markov-decision processes and RL
• Advanced topics
• Variational Inference and Generative Models
• Recurrent neural networks
• Open for suggestions…
What next?
• Class: K nearest neighbors
• Assignments:
• Go over tutorials
• Linear algebra
• Basic probability
• Python
• Programming (P0)
• Python
• NumPy
• Pandas