
Machine Learning Coms-4771

Alina Beygelzimer
Tony Jebara, John Langford, Cynthia Rudin

February 3, 2008

(partially based on Yann LeCun’s and Sam Roweis’s slides; see links at the web page)
Logistics

▶ The course web page is http://hunch.net/~coms-4771
▶ If you have a question, email beygel@us.ibm.com or post it at http://coms-4771.blogspot.com/
▶ Do interrupt and ask questions during class.
▶ The web page has notes on probability theory and statistics (if you need to refresh your memory).
What is Machine Learning?

▶ In October 2006, Netflix announced a $1M problem: predict the rating a given user would assign to a given movie (based on 100 million past user-movie ratings).
▶ 10% improvement = $1M
▶ 2500 teams; the annual progress prize of $50K went to KorBell from AT&T Labs (8.43% improvement).

www.netflixprize.com
Netflix Problem Setup

▶ Success is measured by the root mean squared error (RMSE) on the test set:

    sqrt( (1/n) Σ_{i=1..n} (yi − ŷi)² ),

  where yi and ŷi are the actual and predicted movie ratings.
▶ Q: What's the role of the probe set?
▶ Q: Why are there both quiz and test sets?
▶ We have a well-defined task. How would you go about solving it?
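As a concrete illustration (not part of the slides), RMSE follows directly from the formula above; the ratings below are invented:

```python
import math

def rmse(actual, predicted):
    """Root mean squared error: sqrt((1/n) * sum((yi - yhat_i)^2))."""
    n = len(actual)
    return math.sqrt(sum((y - yh) ** 2 for y, yh in zip(actual, predicted)) / n)

# Hypothetical star ratings (1-5) and a model's predictions.
actual = [4, 3, 5, 2, 4]
predicted = [3.8, 3.4, 4.5, 2.2, 4.1]
print(rmse(actual, predicted))  # 0.316...
```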
What is Machine Learning?

▶ We want robust, intelligent behavior. Hand-programming a solution directly is not going to work; the world is too complex.
▶ Learning approach = programming by example: get the machine to program itself by showing it examples of the behavior we want. Learning is about improving performance through experience.
▶ Learning is data driven. It can examine much larger amounts of data than you can.
▶ Labeling examples is perhaps the easiest way to express knowledge.
▶ Learning is general purpose—algorithm reuse!
▶ In reality, we specify a space of possible solutions, and let the machine find a good solution in this space.
Learning Problems, Structure of Learning Machines

▶ Learning problem = (unknown) distribution D over inputs and outputs + (typically known) loss function L.
▶ Hypothesis space H = space of functions mapping inputs to outputs (H is often indexed by a set of parameters the algorithm can tune to create different solutions).
▶ The learning algorithm searches (or prunes, or tunes parameters in) H to find a hypothesis minimizing the expected L on D, based on a limited set of input-output examples.

The hardest part is deciding how to represent inputs/outputs and how to select appropriate L and H.

How do we incorporate prior information?


Supervised Learning
Given a set of labeled examples, predict outputs of future unlabeled examples.

Classification: feature space X, discrete set of labels Y (categories). Find a decision boundary between the categories in Y.
Loss function: ℓ(y, y′) = 1(y ≠ y′) (zero-one loss)
Distribution D over X × Y. Find a classifier h : X → Y minimizing the expected loss on D, given by Pr_{(x,y)∼D}[h(x) ≠ y] = E_{(x,y)∼D} ℓ(y, h(x)).

Regression ("curve fitting" or "function approximation"): Y = R
Loss function: ℓsq(y, y′) = (y − y′)² (squared loss)
Learn a continuous mapping f : X → R minimizing E_{(x,y)∼D} ℓsq(y, f(x)).
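As a small aside (not from the slides), the empirical zero-one loss on a finite sample is simply the fraction of misclassified examples; the labels below are made up:

```python
def zero_one_loss(y_true, y_pred):
    """Average zero-one loss: the fraction of examples where h(x) != y."""
    return sum(yt != yp for yt, yp in zip(y_true, y_pred)) / len(y_true)

y_true = ["cat", "dog", "cat", "bird"]
y_pred = ["cat", "cat", "cat", "bird"]
print(zero_one_loss(y_true, y_pred))  # 0.25 -- one of four examples is wrong
```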
Training vs. Testing

▶ Training (empirical) error: the average loss on the training data.
▶ Test error: the average loss on the test data.
▶ Ideally we want to minimize the test error, but we can't evaluate it! (Most of the time we don't even know future inputs.)
▶ Do we want to minimize the training error instead?
▶ NO. Consider an algorithm that memorizes training examples. If a test example is in the training set, produce the memorized output. Otherwise, choose a random output.
▶ We are overfitting: training error is 0; test error is HUGE.
▶ Learning is not memorization. We want to generalize from training examples to predict well on previously unseen examples.
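The memorizing algorithm described above can be sketched in a few lines (a toy illustration, not code from the course):

```python
import random

class Memorizer:
    """Memorizes training pairs; guesses at random on unseen inputs."""
    def fit(self, X, y):
        self.table = dict(zip(X, y))
        self.labels = sorted(set(y))
        return self
    def predict(self, x):
        if x in self.table:
            return self.table[x]          # perfect recall on training inputs
        return random.choice(self.labels)  # no generalization at all

X_train = [1, 2, 3, 4]
y_train = [0, 1, 0, 1]
m = Memorizer().fit(X_train, y_train)
train_err = sum(m.predict(x) != y for x, y in zip(X_train, y_train)) / len(X_train)
print(train_err)  # 0.0 -- yet predictions on new inputs are pure chance
```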
Inductive Bias
▶ There should be some hope that the data is predictive.
▶ No Free Lunch theorems: an unbiased learner can never generalize. Inductive bias = the set of assumptions that favor some predictors over others.
▶ Ways of incorporating bias: assumptions on the data distribution (test data is drawn from the same distribution as the training data; examples are independent), and the choice of hypothesis class.
▶ Examples: Occam's Razor (choose the simplest consistent hypothesis), maximum margin (attempt to maximize the width of the boundary), nearest neighbors (guess the label based on the closest training examples).
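As an illustration of the nearest-neighbor bias mentioned above, here is a minimal 1-NN predictor on the real line (the data is invented); its built-in assumption is that nearby inputs tend to share labels:

```python
def one_nearest_neighbor(train, x):
    """Predict the label of the training point closest to x.
    train is a list of (input, label) pairs on the real line."""
    nearest = min(train, key=lambda p: abs(p[0] - x))
    return nearest[1]

train = [(0.0, "neg"), (1.0, "neg"), (4.0, "pos"), (5.0, "pos")]
print(one_nearest_neighbor(train, 0.4))  # "neg" -- closest point is 0.0
print(one_nearest_neighbor(train, 4.7))  # "pos" -- closest point is 5.0
```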
Choosing Hypothesis Space
▶ The number of training examples for which the training error and test error start converging = the "capacity" of the learning machine.
▶ Can we bound the expected loss as a function of the empirical loss, the capacity of the family of functions, and the size of the training set? (Yes, sometimes.)
▶ Problem: if the class is too rich, there is a risk of overfitting the data. If the class is too simple, there is a risk of not being able to approximate the data well.

Q: How do you choose the hypothesis space so that it is large enough to contain a solution to your problem, yet small enough to ensure generalization from the training sets you have?
Choosing Hypothesis Space

For each training set size, there is an optimal capacity for the machine.
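One way to see the richness tradeoff concretely (a toy sketch, not from the slides): with n training points, the interpolating polynomial of degree n−1 is a rich enough class to drive training error to exactly zero, yet it can swing wildly away from the training points:

```python
def lagrange_predict(points, x):
    """Prediction of the unique degree-(n-1) polynomial through all n
    training points -- the richest class this data can pin down."""
    total = 0.0
    for i, (xi, yi) in enumerate(points):
        term = yi
        for j, (xj, _) in enumerate(points):
            if j != i:
                term *= (x - xj) / (xi - xj)
        total += term
    return total

# Hypothetical noisy samples of a roughly linear trend y ~ x.
points = [(0.0, 0.1), (1.0, 0.9), (2.0, 2.2), (3.0, 2.8), (4.0, 4.1)]

# Training error is (numerically) zero: the polynomial interpolates every point.
train_err = sum((lagrange_predict(points, x) - y) ** 2 for x, y in points) / len(points)
print(train_err)  # ~0.0, up to float round-off

# Off the training points, the rich fit does not continue the linear trend.
print(lagrange_predict(points, 5.0))  # ~10.1, far from the ~5 the trend suggests
```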
Choosing the Loss Function

▶ L quantifies what it means to do well/poorly on the task.
▶ Tradeoff: L captures what we actually want to minimize vs. L is computationally easy to optimize.
▶ Loss function semantics:
  ▶ Optimizing the squared loss means predicting the conditional mean E_{(x,y)∼D}[y | x].
  ▶ Optimizing the absolute loss |y − y′| means predicting the conditional median.
▶ Good rule: start with what you want and try to derive a loss.
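The mean/median semantics above can be checked numerically with a brute-force search over constant predictors (the data values are made up):

```python
# For a fixed input, the constant c minimizing sum((y - c)^2) is the mean
# of the y values, while the constant minimizing sum(|y - c|) is the median.
ys = [1.0, 2.0, 2.0, 3.0, 10.0]

def best_constant(ys, loss, candidates):
    """Grid search for the constant prediction with the smallest total loss."""
    return min(candidates, key=lambda c: sum(loss(y, c) for y in ys))

candidates = [i / 10 for i in range(0, 101)]  # 0.0, 0.1, ..., 10.0
sq_best = best_constant(ys, lambda y, c: (y - c) ** 2, candidates)
abs_best = best_constant(ys, lambda y, c: abs(y - c), candidates)
print(sq_best)   # 3.6 == mean(ys); pulled toward the outlier 10.0
print(abs_best)  # 2.0 == median(ys); robust to the outlier
```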
Other Types of Learning

▶ Unsupervised learning: given only inputs, automatically discover structure (clustering, outlier detection, embedding in a low-dimensional manifold, compression). Not so well defined.
▶ Semi-supervised learning: labels are expensive, unlabeled data is cheap. Use the unlabeled data to help you learn.
▶ Active learning: ask for labels on unlabeled examples of your choice; direct the learning process.
▶ Reinforcement learning: learn how to act in a way that maximizes your expected reward; your actions change the distribution of future inputs.
Some Applications of Machine Learning

▶ Handwritten character recognition, speech recognition, speaker verification, object detection, tracking objects in videos
▶ Search, targeted ads, recommendation systems, spam filtering, auctions
▶ Credit card fraud detection, insurance premium prediction, product pricing, stock market analysis (Wall Street uses a lot of machine learning)
▶ Medical diagnosis and prognosis, fMRI analysis
▶ Game playing (adaptive opponents)
▶ Robotics, adaptive decision making under uncertainty
