1 Introduction

The world has over 6,000 languages.

Automated translation systems require paired data, e.g.:

[En] I think, therefore I am. <-> [Fr] Je pense, donc je suis.

[En] Judge a man by his questions rather than by his answers.
[Fr] Il est encore plus facile de juger de l'esprit d'un homme par ses questions que par ses réponses.

[Figure: an En<->Fr paired text corpus is used to train an En<->Fr translation model]

How many paired sentences are there for translating Maltese to Tibetan?

“Standard” machine translation: a separate translation model for each language pair:

[Figure: an En<->Fr model maps English to French, an En<->Es model maps English to Spanish, and a Fr<->Es model maps French to Spanish]

“Multilingual” machine translation: a single multilingual model handles all the pairs, given the desired output language:

[Figure: one multilingual translation model takes English, French, or Spanish as input, plus the desired language, and produces English, French, or Spanish as output]

Johnson et al. Google's Multilingual Neural Machine Translation System: Enabling Zero-Shot Translation. 2016.
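To make the “desired language” input concrete, here is a minimal sketch, assuming the common trick of prepending a target-language token to the source sentence; the function and data below are hypothetical illustrations, not code from Johnson et al.:

```python
# Minimal sketch of target-language-token conditioning, in the spirit of
# Johnson et al. (2016). All names here are hypothetical, not the paper's code.

def add_target_token(source_sentence: str, target_lang: str) -> str:
    """Prepend a token telling the model which language to produce."""
    return f"<2{target_lang}> {source_sentence}"

# One model, many pairs: examples from every available corpus are mixed
# together, each tagged with its desired output language.
training_examples = [
    (add_target_token("I think, therefore I am.", "fr"), "Je pense, donc je suis."),
    (add_target_token("Je pense, donc je suis.", "en"), "I think, therefore I am."),
    (add_target_token("I think, therefore I am.", "es"), "Pienso, luego existo."),
]

print(training_examples[0][0])  # "<2fr> I think, therefore I am."
```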
What did they find?

➢ Improved efficiency: translating into and out of rare languages works better if the model is also trained on more common languages

➢ Zero-shot machine translation: e.g., train on English -> French, French -> English, and English -> Spanish, and be able to translate French -> Spanish

Johnson et al. Google's Multilingual Neural Machine Translation System: Enabling Zero-Shot Translation. 2016.
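Continuing the hypothetical sketch above, zero-shot translation just means querying a language pair that never appeared together during training:

```python
# French -> Spanish was never a training pair, but the target-language token
# still tells the (hypothetical) model what to produce.
query = add_target_token("Je pense, donc je suis.", "es")
# translation = model.generate(query)  # hypothetical model call
```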
[Figure: the same multilingual model, but the desired-language input is now a mix of languages, e.g., 40% Spanish and 60% French]

Translating English to a mix of Spanish and Portuguese:

[Plot: output composition as the “Portuguese” weight w varies, with the Spanish weight = 1 - w]

Johnson et al. Google's Multilingual Neural Machine Translation System: Enabling Zero-Shot Translation. 2016.
Translating English to a mix of Japanese and Korean:

[Plot: output composition as the mixing weight varies]

Johnson et al. Google's Multilingual Neural Machine Translation System: Enabling Zero-Shot Translation. 2016.
Translating English to a mix of Russian and Belarusian:

[Plot: at intermediate mixing weights, some outputs are in a language that is neither Russian nor Belarusian!]

Johnson et al. Google's Multilingual Neural Machine Translation System: Enabling Zero-Shot Translation. 2016.
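One plausible way to realize such a mix, sketched below under the assumption that each target language has a learned token embedding and that mixing means interpolating those embeddings (a simplification; the embeddings and weights here are hypothetical, not the paper's):

```python
import numpy as np

# Hypothetical language-token embeddings (in a real model these are learned).
rng = np.random.default_rng(0)
embeddings = {"es": rng.normal(size=8), "pt": rng.normal(size=8)}

def mixed_language_embedding(w_pt: float) -> np.ndarray:
    """Interpolate target-language embeddings: w_pt Portuguese, (1 - w_pt) Spanish."""
    return w_pt * embeddings["pt"] + (1.0 - w_pt) * embeddings["es"]

print(mixed_language_embedding(0.4))  # a "40% Portuguese, 60% Spanish" target token
```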
What’s going on?

[Figure: a standard En->Fr translation model maps English text directly to French; the multilingual model maps English to an internal “thought,” and the “thought” to French]

The “thought” is a representation!

Representation learning

Handling such complex inputs requires representations.

“Classic” view of machine learning:

[Figure: an input such as “Il est encore plus facile de juger de l'esprit d'un homme par ses questions que par ses réponses.” is mapped to a hand-designed “thought,” and then to the output]

The power of deep learning lies in its ability to learn such representations automatically from data.
Deep Learning
Designing, Visualizing and Understanding Deep Neural Networks

CS W182/282A
Instructor: Sergey Levine
UC Berkeley
Course overview
• Broad overview of deep learning topics
  • Neural network architectures
  • Optimization algorithms
  • Applications: vision, NLP
  • Reinforcement learning
  • Advanced topics
• Four homework programming assignments
  • Neural network basics
  • Convolutional and recurrent networks
  • Natural language processing
  • Reinforcement learning
• Two midterm exams
  • Format TBD, but most likely will be a take-home exam
• Final project (group project, 2-3 people)
  • Most important part of the course
  • CS182: choose vision, NLP, or reinforcement learning
  • CS282: self-directed and open-ended project
Course policies

Grading:
• 30% midterms
• 40% programming homeworks
• 30% final project

Late policy:
• 5 slip days
• strict late policy, no slack beyond slip days
• no slip days for the final project (due to the grades deadline)

Prerequisites:
• Excellent knowledge of calculus and linear algebra, especially multi-variate derivatives, matrix operations, and solving linear systems
• CS70 or STAT134: excellent knowledge of probability theory (including continuous random variables)
• CS189, or a very strong statistics background
• CS61B or equivalent: able to program in Python
What is machine learning?
What is deep learning?
What is machine learning?

[Figure: a computer program maps an input (e.g., an image) to an object label]

➢ How do we implement this program?

➢ A function is a set of rules for transforming inputs into outputs

➢ Sometimes we can define the rules by hand – this is called programming

➢ What if we don’t know the rules?

➢ What if the rules are too complex? Too many exceptions & special cases?
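As a toy illustration (my own example, not from the slides): some rules are easy to hand-program, while a rule like “image -> object label” is hopeless to write by hand, which is exactly where learning from data comes in.

```python
# Hand-programmed rules work when we know them exactly:
def fahrenheit_to_celsius(f: float) -> float:
    return (f - 32.0) * 5.0 / 9.0

# But try writing the rules for this by hand:
def object_label(image_pixels) -> str:
    # ...thousands of exceptions and special cases later...
    raise NotImplementedError("nobody can write these rules by hand")
```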
What is machine learning?
[Figure: a computer program maps an input (e.g., an image) to an object label]

➢ Instead of defining the input -> output relationship by hand, define a program that acquires this relationship from data

➢ Key idea: if the rules that describe how inputs map to outputs are complex and full of special cases & exceptions, it is easier to provide data or examples than to implement those rules

➢ Question: Does this also apply to human and animal learning?


What are we learning?

[Figure: a computer program maps an input x to an object label]

A simple choice: $f_\theta(x) = \theta_1 x + \theta_0$, and this describes a line.

In general, we learn the parameters $\theta$ of some function $f_\theta$. But what parameterization do we use?

[Figure: an input image is just a grid of numbers (pixel intensities such as 0.2, 0.1, 0.3, ...), which the program must map to an object label]
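To make “learning the parameters of a line” concrete, here is a minimal sketch (my own example, with made-up data) that fits $\theta_1$ and $\theta_0$ with least squares:

```python
import numpy as np

# Toy data that roughly follows a line: y ≈ 2x + 1 plus noise.
rng = np.random.default_rng(0)
x = np.linspace(0.0, 1.0, 20)
y = 2.0 * x + 1.0 + 0.1 * rng.normal(size=x.shape)

# Least-squares fit of f_theta(x) = theta_1 * x + theta_0.
A = np.stack([x, np.ones_like(x)], axis=1)
theta, *_ = np.linalg.lstsq(A, y, rcond=None)
print(theta)  # approximately [2.0, 1.0]
```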


“Shallow” learning

[Figure: input -> hand-designed features -> simple learned model -> object label]

➢ Kind of a “compromise” solution: don’t hand-program the rules, but hand-program the features

➢ Learning on top of the features can be simple (just like the 2D example from before!)

➢ Coming up with good features is very hard!
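A minimal sketch of the shallow recipe (my own illustration, with made-up features and weights): hand-designed features feed a simple learned model.

```python
import numpy as np

# Hand-programmed features: here, crude intensity statistics of an image.
def hand_designed_features(image: np.ndarray) -> np.ndarray:
    return np.array([image.mean(), image.std(), image.max() - image.min()])

# Learning on top of the features is simple: e.g., a linear classifier
# score = w . features + b, where w and b are fit from labeled examples.
w = np.array([0.5, -1.2, 0.3])  # in practice these would be learned
b = 0.1
image = np.random.default_rng(0).random((32, 32))
score = w @ hand_designed_features(image) + b
print(score > 0)  # predicted label
```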


From shallow learning to deep learning

[Figure: input -> features -> label, but now the features are learned; what if we learn parameters here too? All the parameters are learned]

Multiple layers of representations?

[Figure: a stack of learned representations; each arrow represents a simple parameterized transformation (function) of the preceding layer]

Higher-level representations are:
➢ More abstract
➢ More invariant to nuisances
➢ Easier for predicting the label

Coates, Lee, Raina, Ng.


So, what is deep learning?

➢ Machine learning with multiple layers of learned representations

➢ The function that represents the transformation from input to internal representation to output is usually a deep neural network
  ▪ This is a bit circular, because almost all multi-layer parametric functions with learned parameters can be called neural networks (more on this later)

➢ The parameters for every layer are usually (but not always!) trained with respect to the overall task objective (e.g., accuracy)
  ▪ This is sometimes referred to as end-to-end learning
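A minimal sketch (my own, in plain numpy) of “multiple layers of learned representations”: each layer is a simple parameterized transformation of the previous one, and in end-to-end learning all of W1, b1, W2, b2 would be trained against the task objective.

```python
import numpy as np

rng = np.random.default_rng(0)

def layer(x, W, b):
    """One simple parameterized transformation: linear map + nonlinearity."""
    return np.maximum(0.0, W @ x + b)  # ReLU activation

# Randomly initialized parameters; end-to-end training would adjust all of
# them with respect to the overall task objective (e.g., accuracy).
W1, b1 = rng.normal(size=(16, 8)), np.zeros(16)
W2, b2 = rng.normal(size=(4, 16)), np.zeros(4)

x = rng.normal(size=8)   # input
h = layer(x, W1, b1)     # learned intermediate representation
y = W2 @ h + b2          # output (e.g., class scores)
print(y.shape)           # (4,)
```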


What makes deep learning work?

1950: Turing describes how learning could be a path to machine intelligence

1957: Rosenblatt’s perceptron proposed as a practical learning method

1969: Minsky & Papert publish book describing fundamental limitations of neural networks; most (but not all) mainstream research focuses on “shallow” learning

1986: Backpropagation as a practical method for training deep nets

1989: LeNet (neural network for handwriting recognition)

1990s-2000s: huge wave of interest in the ML community in probabilistic methods, convex optimization, but mostly in shallow models

~2006: deep neural networks start gaining more attention

2012: Krizhevsky’s AlexNet paper beats all other methods on ImageNet (what the heck happened here?)
What makes deep learning work?
1) Big models with many layers

2) Large datasets with many examples

3) Enough compute to handle all this


Model scale: is more layers better?

LeNet, 7 layers (1989)

Krizhevsky’s model (AlexNet) for ImageNet, 8 layers (2012)

ResNet-152, 152 layers (2015)


How big are the datasets?
MNIST (handwritten digits), 1990s - today: 60,000 images

CalTech 101, 2003: ~9,000 images

CIFAR 10, 2009: ~60,000 images

ILSVRC (ImageNet), 2009: 1.5 million images


How does it scale with compute?

[Plot: performance vs. compute omitted]

What about NLP?

[Plot: NLP performance vs. compute omitted]

On what?? On this: about 16 TPUs
[Photo: a data center with a few thousand of these TPUs]
So… it’s really expensive?

➢ One perspective: deep learning is not such a good idea, because it requires huge models, huge amounts of data, and huge amounts of compute

➢ Another perspective: deep learning is great, because as we add more data, more layers, and more compute, the models get better and better!

[Plot: ImageNet error rate over time, falling past human performance of about 5% error. …which human?]
The underlying themes

➢ Acquire representations by using high-capacity models and lots of data, without requiring manual engineering of features or representations
  ▪ Automation: we don’t need to know what the good features are; we can have the model figure it out from data
  ▪ Better performance: when representations are learned end-to-end, they are better tailored to the current task

➢ Learning vs. inductive bias (“nature vs. nurture”): models that get most of their performance from their data rather than from designer insight
  ▪ Inductive bias: what we build into the model to make it learn effectively (we can never fully get rid of this!)
  ▪ Should we build in knowledge, or better machinery for learning and scale?

➢ Algorithms that scale: this often refers to methods that can get better and better as we add more data, representational capacity, and compute

Informal definitions:
• Model capacity: how many different functions a particular model class can represent (e.g., all linear decision boundaries vs. non-linear boundaries)
• Inductive bias: built-in knowledge or biases in a model designed to help it learn. All such knowledge is “bias” in the sense that it makes some solutions more likely and some less likely
• Scaling: the ability of an algorithm to work better as more data and model capacity are added
Why do we call them neural nets?

Early on, neural networks were proposed as a rudimentary model of neurons in the brain.

[Figure: a biological neuron next to an artificial “neuron” in a two-layer network (layer 1, layer 2)]

Biological neuron:
• dendrites receive signals from other neurons
• the neuron “decides” whether to fire based on incoming signals
• the axon transmits the signal to downstream neurons

Artificial “neuron” (also referred to as a “unit”):
• sums up the activations of upstream neurons
• “decides” how much to fire based on incoming signals, via an “activation function”
• its activation is transmitted to downstream units

Is this a good model for real neurons?


• Crudely models some neuron function
• Missing many other important anatomical details
• Don’t take it too seriously
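A minimal sketch (my own, with made-up weights) of the artificial neuron just described: sum the upstream activations with weights, then apply an activation function.

```python
import numpy as np

def sigmoid(z: float) -> float:
    """Activation function: squashes the summed input into (0, 1)."""
    return 1.0 / (1.0 + np.exp(-z))

# Upstream activations and the weight on each incoming connection.
upstream = np.array([0.5, 0.1, 0.9])
weights = np.array([1.2, -0.7, 0.4])
bias = 0.1

# The unit sums its weighted inputs and "decides" how much to fire.
activation = sigmoid(weights @ upstream + bias)
print(activation)
```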
What does deep learning have to do with the brain?

Does this mean that the brain does deep learning?

Or does it mean that any sufficiently powerful learning machine will basically derive the same solution?
