
Spring 2023!

Introduction to
Machine Learning
https://introml.mit.edu

Instructors:

Marzyeh Ghassemi
[email protected]

Tomas Lozano-Perez
[email protected]

Wojciech Matusik
[email protected]

Vince Monardo
[email protected]

Shen Shen
[email protected]

Ashia Wilson
[email protected]
Full Staff

Logistical issues? Personal concerns? We’d love to help out at
[email protected]
…and ~40 awesome LAs

Section staff (each section runs Recitation + Lab):
• Section 1 staff, plus ~7 awesome LAs
• Section 2 staff, plus ~5 awesome LAs
• Section 3 staff, plus ~7 awesome LAs
• Section 4 staff, plus ~5 awesome LAs
• Section 5 staff, plus ~6 awesome LAs
• Section 6 staff, plus ~5 awesome LAs
• Section 7 staff, plus ~6 awesome LAs
Course pedagogy:
A nominal week – mix of theory, concepts, and application to problems!
• Exercises: Released Wed 5pm; due the following Mon 9am
Easy questions based on reading that week’s notes (and optionally viewing the recorded lecture)
• Recitation: Monday, with attendance check-in (not today)
Assumes you have read and done exercises; start on homework
• Homework: Released Monday 9am; due Wednesday (9 days later) at 11pm
Harder questions: concepts, mechanics, implementations
• Lab: Wednesday, with attendance check-in (starting Feb 8)
In-class empirical exploration of concepts
Work with partner on lab assignment
Check-off conversation with staff member, due the following Monday 11pm
Office hours: lots! Posted on the website. Also make use of Piazza and Psetpartners!
Exams:
• Midterm: Thurs. March 23: 7:30-9:30 pm
• Final: scheduled by Registrar (posted in 3rd week). Alert – might be as late as May 24!
Grading and collaboration (details on web)
Our objective (and we hope yours) is for you to learn about machine learning
• take responsibility for your understanding
• we will help!
Formula (worked example below):
exercises 5% + attendance 5% + homework 15% + labs 15% + midterm 25% + final 35%
Lateness: 20% penalty per day, applied linearly (so 1 hour late is -0.83%)
Extensions:
• 20 one-day extensions (each moves one assignment’s deadline one day later) will be
applied automatically at the end of the term in a way that is maximally helpful
• for medical or personal difficulties see S3 & contact us at [email protected]
Collaboration: don't cheat!
• Understand everything you turn in
• Coding and detailed derivations must be done by you
• See collaboration policy/examples on course web site
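To make the formula concrete, here is a worked example; the individual scores below are hypothetical, and only the weights come from the formula above:
Hypothetical scores (each out of 100): exercises 90, attendance 100, homework 85, labs 95, midterm 70, final 80
Course grade = 0.05·90 + 0.05·100 + 0.15·85 + 0.15·95 + 0.25·70 + 0.35·80
             = 4.5 + 5 + 12.75 + 14.25 + 17.5 + 28 = 82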
Expected prerequisite background
Things we expect you to know (we use these constantly, but don’t
teach them explicitly):

Programming (e.g. as in 6.009 or 6.006)


• Intermediate Python, including classes
• Exposure to algorithms – ability to understand & discuss pseudo-code,
and implement in Python
Linear Algebra (e.g. as in 18.06, 18.C06, 18.03, or 18.700)
• Matrix manipulations: transpose, multiplication, inverse etc.
• Points and planes in high-dimensional space
• (Together with calculus): taking gradients, matrix calculus (small refresher example below)
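As a refresher only (this is a standard least-squares gradient identity, not something derived in these slides), the kind of matrix calculus this refers to looks like:
For X ∈ ℝ^(n×d), θ ∈ ℝ^d, y ∈ ℝ^n:
∇_θ ‖Xθ − y‖² = ∇_θ (Xθ − y)ᵀ(Xθ − y) = 2 Xᵀ(Xθ − y)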
Useful background
Things it helps to have prior exposure to, but we don’t expect (we
use these in 6.390, but will discuss as we go):

• numpy (Python package for matrix/linear algebra); a short sketch follows below
• pytorch (Python package for modern ML models like deep neural networks)
• Basic discrete probability: random variables, independence, conditioning
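A minimal sketch of the kind of numpy usage the course assumes you can pick up quickly (the array values are arbitrary, chosen only for illustration):

import numpy as np

A = np.array([[2.0, 0.0],
              [1.0, 3.0]])        # a 2x2 matrix
x = np.array([1.0, -1.0])         # a vector in R^2

print(A.T)                        # transpose
print(A @ x)                      # matrix-vector multiplication -> [2., -2.]
print(np.linalg.inv(A))           # matrix inverse (when it exists)
print(np.linalg.inv(A) @ (A @ x)) # recovers x, up to floating-point error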
Heads-up for Wednesday
● Attend your assigned section only starting Wednesday Feb 8
● If you need to change your permanent section assignment, you will be
able to self-switch, starting 5pm today; details on introml homepage

Rest of Today
● Start our ML journey with an overview
● Work through recitation handout with others at your table
● Ask questions by putting yourself in the help queue
● No worries if you don’t have introml access yet; it’s a great chance to get to
know your neighbor (ask them to put you in the queue)
What we're teaching: Machine Learning!
Given:
• a collection of examples (gene sequences, documents, tree sections)
• an encoding of those examples in a computer (as vectors)

Derive:
• a computational model (called a hypothesis) that describes relationships
within and among the examples, and that is expected to characterize new
examples from the same population well, so as to make good predictions or decisions
A model might:
• classify images of cells as to whether they're cancerous
• specify groupings (clusters) of documents that address similar topics
• steer a car appropriately given lidar images of the surroundings
Very roughly, ML can be categorized into supervised, unsupervised, and
reinforcement learning

(the categorization can be refined, e.g. there are also active learning, semi-supervised,
selective, contrastive, few-shot, inverse reinforcement learning…)

[Slides adapted from 6.790]


Supervised learning
Goal: predict to what degree a drug candidate binds to the intended target
protein (based on a dataset of molecules already screened against the target)

[Slides adapted from 6.790]


Unsupervised learning: dimensionality reduction, embedding

Examples:
• dependency / causal structure [Sachs et al., 2005]
• word embeddings: a two-dimensional PCA projection of 1000-dimensional
  skip-gram vectors of countries and their capital cities (“Country and Capital
  Vectors Projected by PCA”); the model organizes concepts and learns the
  relationships between them without being given any supervised information
  about what a capital city means [Mikolov et al., 2013]
• embeddings over 3D protein structures, etc.
• de-noising diffusion models over images

[Slides adapted from 6.790]
Reinforcement learning
e.g., ChatGPT

[Slides adapted from 6.790]


Machine learning (ML): why & what
• What is ML? Roughly, a set of methods for making predictions
and decisions from data.
• Why study ML? To apply; to understand; to evaluate; to create!
• Note: ML is a tool, with pros & cons

• What do we have? Data! And computation!


• What do we want? To make predictions on new data!
• How do we learn to make those decisions?
• The topic of this course!
What do we have?
• There are many different problem classes in ML
• We will first focus on an instance of supervised learning known
as regression.
(Training) data
• n training data points
• For data point i (i = 1, …, n):
  • Feature vector x^(i) ∈ ℝ^d
  • Label y^(i) ∈ ℝ
• Training data: D_n = { (x^(1), y^(1)), …, (x^(n), y^(n)) }
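A minimal sketch of what such a dataset might look like in code (the values below are made up purely for illustration):

import numpy as np

# n = 4 training data points, d = 2 features each (illustrative values only)
X = np.array([[1.0, 2.0],
              [2.0, 0.5],
              [3.0, 1.5],
              [4.0, 3.0]])          # row i is the feature vector x^(i); shape (n, d)
y = np.array([3.1, 2.4, 4.6, 7.0])  # entry i is the label y^(i); shape (n,)

n, d = X.shape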
What do we want?
We want a “good” way to label new feature vectors
• How to label? Learn a hypothesis h
• We typically consider a class H of possible hypotheses
• A hypothesis maps an input (a feature vector x) to an output (a label h(x))

How well our hypothesis labels new feature vectors depends largely
on how expressive the hypothesis class is
What do we want?
We may consider the class of linear regressors:
• Hypotheses take the form h(x; θ, θ_0) = θᵀx + θ_0
  (parameters to learn: Θ = (θ, θ_0); a short code sketch follows below)
• What we really want is to generalize to future data!
• What we don’t want:
  • Model does not capture the input-output relationship (e.g.,
    not enough data) —> Underfitting
  • Model too specific to training data —> Overfitting
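A minimal sketch of such a linear hypothesis in code (the parameter values are arbitrary, for illustration only):

import numpy as np

def h(x, theta, theta_0):
    # linear regressor: h(x; theta, theta_0) = theta^T x + theta_0
    return theta @ x + theta_0

theta = np.array([0.5, -1.0])    # illustrative parameters, d = 2
theta_0 = 2.0
x_new = np.array([1.0, 3.0])     # a new feature vector to label
print(h(x_new, theta, theta_0))  # -0.5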
How good is a hypothesis?
Hopefully it predicts well on future data
• How good is a regressor at one point?
  • Quantify the error using a loss function L(g, a)
    (g: guess, a: actual)
  • Common choice: squared loss, L(g, a) = (g − a)²
• Training error: E_train(h) = (1/n) Σ_{i=1…n} L(h(x^(i)), y^(i))
• Validation or test error (over n' new points):
  E_test(h) = (1/n') Σ_{i=n+1…n+n'} L(h(x^(i)), y^(i))
  (a short code sketch of these quantities follows below)
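A minimal sketch of computing the squared loss and training error for a linear hypothesis (the data and parameters are invented for this sketch):

import numpy as np

def squared_loss(g, a):
    # g: guess, a: actual
    return (g - a) ** 2

def training_error(X, y, theta, theta_0):
    # mean squared loss of the linear hypothesis over the n training points
    guesses = X @ theta + theta_0
    return np.mean(squared_loss(guesses, y))

# illustrative data and parameters (made up for this sketch)
X = np.array([[1.0, 2.0], [2.0, 0.5], [3.0, 1.5]])
y = np.array([3.0, 2.5, 4.5])
theta, theta_0 = np.array([1.0, 0.5]), 0.5
print(training_error(X, y, theta, theta_0))  # 0.125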


How do we learn?
• Have data; have a hypothesis class
• Want to choose (learn) a good hypothesis (a set of parameters)

What we want: parameters Θ that make the training error small

How to get it: a learning algorithm (next time! a tiny preview sketch follows below)
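Purely as a preview (learning algorithms are covered properly next time), one way to fit linear-regression parameters on illustrative data is ordinary least squares via numpy’s built-in solver; everything below is a sketch, not the course’s prescribed method:

import numpy as np

# illustrative training data (values made up for this sketch)
X = np.array([[1.0, 2.0], [2.0, 0.5], [3.0, 1.5], [4.0, 3.0]])
y = np.array([3.1, 2.4, 4.6, 7.0])

# append a column of ones so theta_0 is learned along with theta
X_aug = np.hstack([X, np.ones((X.shape[0], 1))])

# least-squares fit: minimizes the squared-loss training error
params, *_ = np.linalg.lstsq(X_aug, y, rcond=None)
theta, theta_0 = params[:-1], params[-1]
print(theta, theta_0)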
