
Topic 0: Introduction

STAT 37710/CAAM 37710/CMSC 35400 Machine Learning


Risi Kondor, The University of Chicago
Instructors

Risi Kondor (Associate Professor)
Crerar 221
[email protected]

TAs:
Su Yeong Lee (CAAM)
Kexiang Wang (CAAM)

Topics
1. Clustering
2. Dimensionality reduction
3. Manifold learning
4. Regression
5. Online algorithms
6. Kernel methods (Hilbert space algorithms)
7. Bayesian learning
8. Deep learning
9. Generative models

Note: this list is provisional and almost certain to change.

Prerequisites

• Competence in coding in some programming language.
• Mathematical maturity: ML is a mathematical subject.
• Specific areas of math needed:
◦ Calculus
◦ Linear algebra
◦ Probability (minimal statistics)
◦ A little bit of optimization

Support

Recitations:
• On an as-needed basis; place and time TBD
Office Hours:
• Fridays, 1pm, Crerar 221
Online:
• canvas.uchicago.edu (slides, lecture notes, assignments and grades)

Resources
Books (Strictly optional! More for “further reading” than anything else.)
• Kevin Murphy: Machine Learning: A probabilistic perspective (2012)
Warning: very Bayesian
• Zhang, Lipton, Li and Smola: Dive into deep learning (d2l.ai)
• Hastie, Tibshirani, Friedman: The Elements of Statistical Learning (2008)
(available electronically on the library’s web site)

Online Resources
• Andrew White’s book “Deep learning for molecules and materials”
https://dmol.pub/index.html

Links to more books, papers and videos will be posted on Canvas.

Credit

• Assignments/projects (posted on Canvas): ∼ 50%
◦ Project-centered course: one assignment for each topic.
◦ Projects involve coding up algorithms discussed in class and running them on data.
◦ Recommended language: Python.
◦ Submitted work must be your own. Discussing problems is okay but must be acknowledged. Code and parts of the writeup cannot be shared.
◦ Submission in .pdf via Canvas. Penalty for late submissions: 20% for 24 hours, 40% for 48 hours. No partial late homeworks.
◦ For typing up assignments, LaTeX is strongly preferred.
• Midterm: ∼ 20%
• Final: ∼ 30%

What is Machine Learning?
Two types of programming

1. Explicit: write a program that tells the computer what to do.
2. Learning: write a program that tells the computer how to learn what to do from data. → This is what Machine Learning is about.
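
To make the contrast concrete, here is a minimal sketch in Python (not from the slides; the task, numbers, and decision rule are invented for illustration) of the same toy problem solved both ways:

import numpy as np

# Toy task: classify a message as spam from one feature, the fraction
# of capitalized words. All numbers here are invented.

def explicit_classifier(caps_fraction):
    # Explicit programming: a human hard-codes the decision rule.
    return "spam" if caps_fraction > 0.5 else "not spam"

def learn_classifier(train_x, train_y):
    # Learning: estimate the decision threshold from labeled data
    # (here, simply the midpoint between the two class means).
    threshold = (train_x[train_y == 1].mean() + train_x[train_y == 0].mean()) / 2
    return lambda caps_fraction: "spam" if caps_fraction > threshold else "not spam"

train_x = np.array([0.9, 0.8, 0.7, 0.2, 0.1, 0.05])  # observed features
train_y = np.array([1, 1, 1, 0, 0, 0])               # 1 = spam, 0 = not spam
f = learn_classifier(train_x, train_y)
print(explicit_classifier(0.6), f(0.6))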

Machine Learning in the abstract

Given a training set {(x_1, y_1), (x_2, y_2), …, (x_m, y_m)}, learn a function

f : x ↦ y

to predict the y's corresponding to future x's. In particular, those in the test set

{(x'_1, y'_1), (x'_2, y'_2), …, (x'_{m'}, y'_{m'})}.

Actually, this is supervised learning. Modern ML also encompasses many other types of learning problems.
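
As a concrete (purely illustrative) sketch of this setup, here is a tiny 1-nearest-neighbor learner on made-up one-dimensional data:

import numpy as np

# Training set {(x_1, y_1), ..., (x_m, y_m)}: toy one-dimensional inputs.
train_x = np.array([0.0, 1.0, 2.0, 3.0])
train_y = np.array([0, 0, 1, 1])

def f(x):
    # Learned hypothesis: copy the label of the nearest training input.
    return train_y[np.argmin(np.abs(train_x - x))]

# Predict the y's corresponding to future (test) x's.
test_x = [0.4, 2.6]
print([f(x) for x in test_x])  # -> [0, 1]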

Nomenclature

• Each (x, y) pair is called an example (or learning instance).
• x is called the input (x ∈ X, where X is the input space).
• y is called the output (y ∈ Y, where Y is the output space).
• The learned function
f : X → Y
is called the hypothesis (because the algorithm can never be sure how close it is to the “truth”).
• The space F from which the algorithm chooses f is called the hypothesis class.
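
For instance (an illustrative sketch with invented data): if the hypothesis class F is the set of linear functions f(x) = ax + b, learning amounts to choosing the member of F that best fits the training examples, e.g. by least squares:

import numpy as np

# Training examples (x, y); y is roughly 2x + 1 plus noise (invented data).
x = np.array([0.0, 1.0, 2.0, 3.0])
y = np.array([1.1, 2.9, 5.2, 6.8])

# Hypothesis class F = {f(x) = a*x + b}; pick f in F by least squares.
A = np.stack([x, np.ones_like(x)], axis=1)
(a, b), *_ = np.linalg.lstsq(A, y, rcond=None)

f = lambda x_new: a * x_new + b  # the chosen hypothesis
print(f(4.0))                    # prediction for a future input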

Deductive vs. inductive inference

• Deductive inference:
rules −→ data

• Inductive inference:
data −→ rules

ML is all about inductive inference → “Brave New Science of Data”.

Humans are experts at induction. However, ML takes a different approach.

Question: Give examples of inductive vs. deductive inferential processes.


Question: What are the relative strengths of humans vs. machines in
learning?

Typical ML task 1: Regression

Typical ML task 2: Classification

Typical ML task 3: Ranking

(Figures: internet search, elections, sports)

Typical ML task 4: Clustering

Typical ML task 5: Dimensionality reduction

Applied vs. theoretical ML

• Practitioners focus on solving real-world problems with ML (building autonomous cars, finding disease genes, earning lots of money, etc.).
• Theorists work on devising new general purpose learning algorithms and analyzing their behavior.
“Much of the art of machine learning is to reduce a range of disparate
problems to a fairly narrow set of prototypes. Much of the science of machine
learning is to then solve those problems and provide good guarantees.”
(Smola & Vishwanathan)

This course will focus on the fundamental algorithms rather than specific
applications.

Origins: Classical Artificial Intelligence
AI vs. ML

• AI: attempts to replicate human intelligence in general.
• ML: solves practical problems which humans think require intelligence.

Early attempts

The “Mechanical Turk” (Wolfgang von Kempelen, 1770)

Formal reasoning = intelligence?

• Formal logic (Frege (1879) and others)


• Mathematics as a formal system (Russell & Whitehead, ∼ 1910)
• Gödel’s incompleteness results (1931)
• Turing machines and universality (1936)

“Since formal systems are the pinnacle of human achievement, intelligence must be synonymous with formal reasoning.”

Is the brain just a computer?
McCulloch & Pitts show that neurons appear to perform simple logical operations (1943).

“So if all that the brain does is such mechanistic operations, then it should be
easy to imitate on Turing machines (i.e., computers)”

The Turing test
In his landmark 1950 paper “Computing Machinery and Intelligence” Turing
proposes a positivist approach: “If a machine can fool a human into thinking
that it is a human, then it must be intelligent” → Weak AI

Prediction: “By the year 2000, machines with 120 MB of memory would be able to fool 30% of human judges in a 5-minute test.”

Objections to the Turing test
Even if a computer passes the Turing test it cannot be truly intelligent
because...
1. Theological: computers have no soul
2. “Head in the sand”: it would be too scary
3. Mathematical: Gödel incompleteness and such
4. Consciousness: Searle’s Chinese room argument
5. Disabilities: a machine will never be able to fall in love/invent jokes/tell right from wrong/etc.
6. Lady Lovelace’s: will never do anything original
7. The brain is not digital
8. The brain is not predictable
9. Extra-sensory perception

The Dartmouth conference (1956)

John McCarthy (1927–2011), Marvin Minsky (1927–2016), Allen Newell (1927–1992), Herbert Simon (1916–2001)

“within a generation ... the problem of creating ’artificial intelligence’ will substantially be solved” (Minsky)

True beginnings: from philosophy to building things
“We propose that a 2 month, 10 man study of artificial intelligence be carried
out during the summer of 1956 at Dartmouth College in Hanover, New
Hampshire. The study is to proceed on the basis of the conjecture that every
aspect of learning or any other feature of intelligence can in principle be so
precisely described that a machine can be made to simulate it. An attempt
will be made to find how to make machines use language, form abstractions
and concepts, solve kinds of problems now reserved for humans, and
improve themselves. We think that a significant advance can be made in one
or more of these problems if a carefully selected group of scientists work on it
together for a summer.”

McCarthy et al., 1955

Early successes

• Newell and Simon’s “General Problem Solver” (1959)
• ELIZA (Weizenbaum, 1966)
• SHRDLU’s block world (Winograd, 1968–70)
• Prolog and expert systems (1970s–)

AI winters: 1974–80, 1987–93

New beginnings: Machine Learning
The birth of Machine Learning

Starting in the late ’80s, AI was transformed by a sequence of outside influences:
• Efficiently trainable neural network models
• Input from Physics community
• Influence of Bayesian Statistics
• Black box “geometric” learning algorithms
• Huge influence of the internet
• Firm foundations in Statistics
• Strong connections to optimization, signal processing, harmonic analysis,
probability, CS theory, ...
• MASSIVE PRACTICAL DEMAND

The old vs. the new AI

Early: aiming for “general intelligence”, trying to imitate humans, tangled up in formal systems and philosophy.

New: pragmatic, focused on specific tasks, much closer ties to math and
statistics than neuroscience and logic, driver behind lots of technologies

Question: Classically, the subject that deals with the art of learning from data
is Statistics. So is ML just a branch of Statistics? No.

(Diagram: Machine Learning at the intersection of neighboring fields)

• Statistics: nonparametric statistics, Bayesian statistics, probability, empirical process theory
• Computer Science: artificial intelligence, computational learning theory, complexity theory, randomized algorithms, databases, distributed systems
• Mathematics: functional analysis, random geometry, optimization, numerical analysis

Applications
(Diagram: Machine Learning and its application areas)

• NLP: speech recognition, translation, summarization, grading
• Computer Vision: object detection, object recognition, structure from motion
• Search & recommendation: web search, collaborative filtering, ad placement
• Robotics: autonomous vehicles, robot assistants
• Medical: detection & imaging, automated diagnosis
• Finance: high frequency trading, portfolio selection, risk analysis
• Computational Biology: protein structure, systems biology
• etc., etc.
Hallmarks of ML
ML is ambitious:
• Datasets are often very high dimensional (∼ O(10⁵)).
• Data is often abstract (structured objects vs. just vectors).
• Datasets are massive (∼ O(10⁸) examples).
• Really want to build actual systems that work.

ML is brutal:
• Don’t need to think hard about the domain because with enough data,
even black box algorithms work really well (really?).
• Butcher the statistics as much as necessary to get an algorithm which
actually runs.
• Insist on algorithms that run in time O(m³) → O(m²) → O(m) → o(m).

Taxonomy of Machine Learning
Taxonomy of machine learning 1.

Based on the output space Y :

• Classification: Y = {+1, −1}
Examples: spam/not spam, genuine/fraud, boy/girl, …
(generalization: multiclass classification Y = {1, 2, . . . , k})
• Regression: Y = R
Examples: predict temperature tomorrow, price of a stock, …
(generalization: Y = R^d)
• Ranking: Y = S_n (the group of permutations)
• Structured outputs: Y = anything
Examples: translate from Chinese to English, predict folding of a protein, …
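
A small sketch of how the choice of Y changes the task (toy invented data; scikit-learn is used here only for brevity and is an assumption, not a course requirement):

import numpy as np
from sklearn.linear_model import LinearRegression, LogisticRegression

X = np.array([[0.0], [1.0], [2.0], [3.0]])  # shared inputs

# Classification: Y = {+1, -1}
y_class = np.array([-1, -1, +1, +1])
clf = LogisticRegression().fit(X, y_class)
print(clf.predict([[1.8]]))  # a label from {+1, -1}

# Regression: Y = R
y_reg = np.array([0.1, 1.2, 1.9, 3.1])
reg = LinearRegression().fit(X, y_reg)
print(reg.predict([[1.8]]))  # a real number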

Taxonomy of machine learning 2.

Based on the nature of the training data:

• Supervised learning: given {(x_i, y_i)}_{i=1}^m, learn f : X → Y.
Examples: classification, regression, …
• Unsupervised learning: given {x_i}_{i=1}^m, say something.
Examples: clustering, density estimation, dimensionality reduction, …
• Semi-supervised learning: given a (small) amount of labeled data {(x_i, y_i)}_{i=1}^m and a (large) amount of unlabeled data {x_i}_{i=m+1}^p, learn f : X → Y.
Examples: learning parse trees, image search
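
In code the distinction is simply whether the learner is given the y's at all (again a toy sketch, using scikit-learn conventions as an assumption):

import numpy as np
from sklearn.cluster import KMeans
from sklearn.neighbors import KNeighborsClassifier

X = np.array([[0.0], [0.2], [0.1], [5.0], [5.2], [4.9]])
y = np.array([0, 0, 0, 1, 1, 1])

# Supervised: the learner is given the inputs AND their labels.
clf = KNeighborsClassifier(n_neighbors=1).fit(X, y)

# Unsupervised: the learner sees only the inputs and must "say something"
# about their structure -- here, group them into two clusters.
km = KMeans(n_clusters=2, n_init=10).fit(X)

print(clf.predict([[4.5]]), km.labels_)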

Taxonomy of machine learning 3.

Based on how the data is presented to the learner:

• Batch learning: see whole training set first, then predict on test
examples.
• Online learning: examples are presented one by one; first try to predict y_t, then find out what y_t really is and learn from it.
• Transductive learning: like batch, but the test x'_i's are known at training time.
• Active learning: the algorithm can ask for the next data point.
• Reinforcement learning: exploring the world incurs a cost (games, robotic control).
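
As an illustrative sketch of the online protocol (a classic perceptron update on synthetic data; the hidden rule is invented):

import numpy as np

rng = np.random.default_rng(0)
w = np.zeros(2)  # current hypothesis: predict sign(w . x)

for t in range(200):
    # Examples arrive one by one; labels follow a hidden linear rule.
    x = rng.normal(size=2)
    y_true = 1 if x[0] + 2 * x[1] > 0 else -1

    # First try to predict y_t ...
    y_pred = 1 if w @ x > 0 else -1

    # ... then find out what y_t really is and learn from the mistake.
    if y_pred != y_true:
        w += y_true * x  # perceptron update

print(w)  # points roughly along the hidden direction (1, 2)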

Taxonomy of machine learning 4.

Based on the nature of the relationship between x and y :

• Deterministic: x fully determines y, so there is some f_true out there such that
y = f_true(x).
• Stochastic: x does not fully determine y; rather, for any given x, y is drawn from some probability distribution p_x(y).

In practical problems we invariably cannot assume a deterministic relationship between inputs and outputs, so we use the stochastic model.
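
For example (a sketch with invented numbers), a common stochastic model takes p_x(y) to be a Gaussian centered at some underlying f_true(x), so repeated draws at the same x give different y's:

import numpy as np

rng = np.random.default_rng(0)
f_true = lambda x: 2.0 * x + 1.0  # underlying (unknown) regression function

def sample_y(x, noise_std=0.5):
    # Stochastic model: y ~ p_x(y) = N(f_true(x), noise_std^2),
    # so x does not fully determine y.
    return f_true(x) + noise_std * rng.normal()

print([sample_y(1.0) for _ in range(3)])  # three different y's for the same x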

