Lecture 1
Aarti Singh
Aarti Mary
TAs:
Lectures and recitations will be recorded. Strictly for your use only.
Breakout rooms and Office hours will NOT be recorded.
Logistics
In case of technical issues during lecture:
please try logging back in
if issues remain for more than 5 minutes, I will send an email when resolved.
If not resolved, recorded lecture + extra office hours
Webpage: https://www.cs.cmu.edu/~aarti/Class/10315_Fall20
Syllabus, policies, schedule of lectures, recitations,
office hours, slides, reading material, homeworks, …
Piazza: http://piazza.com/cmu/fall2020/10315
announcements, questions for Teaching team,
discussion forum for students
Homework submission: Gradescope
Grades: Canvas
Expectations
• Remote sessions (lecture, recitation, breakout room, office
hours)
– Turn on video if possible, especially for breakout rooms and office hours
– Keep yourself muted unless asking or responding to questions
Interact!
– Ask questions in class by raising hand or via Zoom chat
– Respond to questions in class by raising hand or via Zoom chat
– Anonymous polls on Zoom
– Breakout rooms and jamboard
Groups 1-10: Jamboard_1_10
Groups 11-20: Jamboard_11_20
Grading
• Grading
- 4 homework assignments (4 x 15% = 60%)
- 5 QnAs (20%)
- 1 midterm, 1 final (both in class): (10+10 = 20%)
• Late days
- total 4 across homeworks and QnAs
- No partial credit after late days
- Late days are for unforeseen situations (interviews,
conferences, etc.); do NOT include them in your plan
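The grading weights above can be turned into a small calculation. The following is a minimal sketch, not an official grade calculator: the function name `course_score` is hypothetical, scores are assumed to be percentages, and the 20% QnA weight is assumed to be split equally across the 5 QnAs (the slide does not say how it is split).

```python
# Weights taken from the grading breakdown above; the equal split of the
# QnA weight across the 5 QnAs is an assumption made for illustration.
HOMEWORK_WEIGHT = 0.15   # each of the 4 homeworks
QNA_TOTAL_WEIGHT = 0.20  # all 5 QnAs together
EXAM_WEIGHT = 0.10       # midterm and final, each


def course_score(homeworks, qnas, midterm, final):
    """Return the weighted course score in [0, 100]; inputs are percentages."""
    if len(homeworks) != 4 or len(qnas) != 5:
        raise ValueError("expected 4 homework scores and 5 QnA scores")
    hw_part = sum(HOMEWORK_WEIGHT * s for s in homeworks)
    qna_part = QNA_TOTAL_WEIGHT * sum(qnas) / len(qnas)
    exam_part = EXAM_WEIGHT * (midterm + final)
    return hw_part + qna_part + exam_part


if __name__ == "__main__":
    print(course_score([90, 85, 80, 95], [100, 90, 80, 90, 90], 70, 80))
```

The weights sum to 4 x 0.15 + 0.20 + 2 x 0.10 = 1.0, so perfect scores everywhere give 100.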
Homeworks & QnAs
• Collaboration
– You may discuss the questions
– Each student writes their own answers
– Each student must write their own code for the programming
part
– Please don’t search for answers on the web, Google,
previous years’ homeworks, etc.
• please ask us if you are not sure if you can use a particular
reference
• list resources used (references, discussants) on top of
submitted homework
• Homeworks are hard, start early :)
• Due on gradescope at 11:59 pm
Waitlist + Audits + Pass/Fail
• Waitlist
we’ll let everyone in [as long as there is space in room]
wait to see how many students drop
keep attending lectures, recitations and office hours
virtually and doing HW
About the course
• Machine Learning Algorithms and Principles
– Classification: Naïve Bayes, Logistic Regression, Neural Networks,
Support Vector Machines, k-NN, Decision Trees, Boosting
– Regression: Linear regression, Kernel regression, Nonparametric
regression
– Unsupervised methods: Kernel density estimation, k-means and
hierarchical clustering, PCA, nonlinear dimensionality reduction
– Core concepts: Probability, Optimization, Theory, Model selection,
overfitting, bias-variance tradeoffs …
Recommended textbooks
• Textbooks (Recommended, not required):
Pattern Recognition and Machine Learning, Christopher Bishop
(available online)
Machine Learning: A Probabilistic Perspective, Kevin Murphy
(available online)
Machine Learning, Tom Mitchell
The Elements of Statistical Learning: Data Mining, Inference,
and Prediction, Trevor Hastie, Robert Tibshirani, Jerome
Friedman
Pre-requisites
• Assume mathematical maturity
– Basic Probability and Statistics
Probability distributions – discrete and continuous, Mean, Variance,
Conditional probabilities, Bayes rule, Central limit theorem…
– Programming (python) and principles of computing
– Multivariate Calculus
Derivatives, integrals of multi-variate functions
– Linear Algebra
Matrix inversions, eigendecomposition, …
• Tutorial videos
– Probability, Calculus, Functional Analysis, SVD
https://www.youtube.com/channel/UC7gOYDYEgXG1yIH_rc2LgOw/playlists
– Linear Algebra
http://www.cs.cmu.edu/~zkolter/course/linalg/index.html
Related courses
• Related courses – Intro to ML algorithms and principles
10-301 – Undergrad version for non-SCS majors
10-601 – Masters version
10-701 – PhD version
10-715 – PhD students doing research in machine learning
(hardest, most mathematical)
Other related courses:
10-606, 10-607 – Math background for ML
10-605, 10-805 – Machine Learning with Large Datasets
11-663 – Machine Learning in Practice (ML software)
10-702, 10-704, 10-707, 10-708, 10-709, 15-859(B) – related
advanced topics
What is Machine Learning?
Data
Learning algorithm
Knowledge
From Data to Knowledge …
Machine Learning in Action
• Spam filtering
Spam/
Not spam
Machine Learning in Action
• Stock Market Prediction
Y=?
X = Feb01
Machine Learning in Action
• Face detection
Machine Learning in Action
• Decoding thoughts from brain scans
Rob a bank …
Machine Learning in Action
• Self-driving Cars
Machine Learning in Action
Document classification
Speech recognition, Natural language processing
Computer vision
Robotics
Web forensics
Medical data analysis
Sensor networks
Social networks
Smart buildings
…
Machine Learning in Action
• How have you interacted with ML in your daily life so far?
ML is trending!
– Wide applicability
– Very large-scale complex systems
• Internet (billions of nodes), sensor network (new multi-modal
sensing devices), genetics (human genome)
– Huge multi-dimensional data sets
• 1.6 million images, 1000 object categories
• 30,000 genes x 10,000 drugs x 100 species x …
– Software too complex to write by hand
– Improved machine learning algorithms
– Improved data capture (Terabytes, Petabytes of data),
networking, faster computers
– Demand for self-customization to user, environment
“Data scientist: The sexiest job of the 21st century”
(Harvard Business Review)
Enjoy!
• ML is becoming ubiquitous in science,
engineering and beyond
• This class should give you the basic foundation
for applying ML and developing new methods
• The fun begins…
What is Machine Learning?
Design and Analysis of algorithms that
• improve their performance
• at some task
• with experience
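To make the three ingredients concrete, here is a minimal sketch under simple assumptions (none of it is from the course materials): the task is estimating the mean of a Gaussian, experience is the number of samples seen, and performance is the squared error of the estimate, averaged over repeated trials. The function name and all parameter values are hypothetical.

```python
import random


def mean_squared_error(n_samples, true_mean=2.0, trials=500, seed=0):
    """Average squared error of the sample mean over repeated trials.

    Task: estimate the mean. Experience: n_samples draws.
    Performance: squared error (lower is better).
    """
    rng = random.Random(seed)
    total = 0.0
    for _ in range(trials):
        data = [rng.gauss(true_mean, 1.0) for _ in range(n_samples)]
        estimate = sum(data) / n_samples
        total += (estimate - true_mean) ** 2
    return total / trials


if __name__ == "__main__":
    # More experience (samples) should give better performance (lower error).
    for n in (5, 50, 500):
        print(n, mean_squared_error(n))
```

For unit-variance data the expected squared error of the sample mean is 1/n, so the printed error should shrink roughly tenfold at each step.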
Human learning
Task: Learning stage of protein crystallization
Tasks, Experience, Performance
Machine Learning Tasks
Broad categories -
• Supervised learning
Classification, Regression
• Unsupervised learning
Density estimation, Clustering, Dimensionality reduction
• Semi-supervised learning
• Active learning
• Reinforcement learning
• Many more …
Supervised Learning
Input: Document/Article
Label: “Sports”, “News”, “Science”, …
Discrete labels: Classification
Task:
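As a rough illustration of the document-classification setup above (article in, discrete topic label out), the sketch below trains a tiny Naive Bayes classifier with add-one smoothing, one of the methods listed in the course outline. The training sentences are made up for illustration, and the function names `train` and `predict` are hypothetical; this is a sketch, not the course's reference implementation.

```python
import math
from collections import Counter, defaultdict

# Made-up toy training set, for illustration only.
TRAIN = [
    ("the team won the game", "Sports"),
    ("the player scored a late goal", "Sports"),
    ("the election results were announced", "News"),
    ("the government passed a new law", "News"),
    ("the telescope observed a distant galaxy", "Science"),
    ("the experiment confirmed the theory", "Science"),
]


def train(examples):
    """Count words per label and documents per label."""
    word_counts = defaultdict(Counter)
    label_counts = Counter()
    for text, label in examples:
        label_counts[label] += 1
        word_counts[label].update(text.split())
    vocab = {w for counts in word_counts.values() for w in counts}
    return word_counts, label_counts, vocab


def predict(text, model):
    """Return the label with the highest log-posterior under Naive Bayes."""
    word_counts, label_counts, vocab = model
    n_docs = sum(label_counts.values())
    best_label, best_score = None, -math.inf
    for label in label_counts:
        total = sum(word_counts[label].values())
        score = math.log(label_counts[label] / n_docs)  # log prior
        for word in text.split():
            if word not in vocab:
                continue  # ignore words never seen in training
            # Add-one (Laplace) smoothing avoids zero probabilities.
            count = word_counts[label][word]
            score += math.log((count + 1) / (total + len(vocab)))
        if score > best_score:
            best_label, best_score = label, score
    return best_label


if __name__ == "__main__":
    model = train(TRAIN)
    print(predict("the player won the game", model))
```

Because the labels come from a finite set, the prediction rule returns one of a few discrete values, which is what makes this classification rather than regression.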
Classification or Regression?
• Medical diagnosis (“Anemic”, “Healthy”)
• Estimating environmental contamination
• Handwriting recognition
• Weather prediction
Unsupervised Learning
Aka “learning without a teacher”
Input
Task:
Unsupervised Learning
Learning a Distribution
Img src: tablelifeblog.com, data.world
Bias of a coin
Distribution of words in text
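Both examples above can be sketched in a few lines, assuming the simplest possible estimators: the fraction of heads for the coin (its maximum-likelihood estimate) and normalized word counts for text. The function names, the simulated bias of 0.7, and the sample sentence are hypothetical choices for illustration.

```python
import random
from collections import Counter


def estimate_coin_bias(flips):
    """Maximum-likelihood estimate of P(heads): the fraction of heads.

    `flips` is a sequence of 1 (heads) and 0 (tails).
    """
    return sum(flips) / len(flips)


def word_distribution(text):
    """Empirical distribution of words: count / total for each word."""
    words = text.lower().split()
    counts = Counter(words)
    return {word: c / len(words) for word, c in counts.items()}


if __name__ == "__main__":
    rng = random.Random(0)
    true_bias = 0.7  # hypothetical value, used only to simulate flips
    flips = [1 if rng.random() < true_bias else 0 for _ in range(10_000)]
    print(estimate_coin_bias(flips))
    print(word_distribution("the cat sat on the mat"))
```

Note that neither function is given labels: each one summarizes the inputs alone, which is the sense in which this is learning without a teacher.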