1 Introduction

The world has over 6,000 languages.

Automated translation systems require paired data, e.g.:

[En] I think, therefore I am. <-> [Fr] Je pense, donc je suis.

[En] Judge a man by his questions rather than by his answers.
[Fr] Il est encore plus facile de juger de l'esprit d'un homme par ses questions que par ses réponses.

[Figure: an En<->Fr paired text corpus is used to train an En<->Fr translation model]

How many paired sentences are there for translating Maltese to Tibetan?

“Standard” machine translation: a separate translation model for each language pair:

[Figure: an En<->Fr model maps English to French, an En<->Es model maps English to Spanish, and a Fr<->Es model maps French to Spanish]

“Multilingual” machine translation: a single multilingual model handles all the pairs, given the desired output language:

[Figure: one multilingual translation model takes English, French, or Spanish as input, plus the desired language, and produces English, French, or Spanish as output]

Johnson et al. Google's Multilingual Neural Machine Translation System: Enabling Zero-Shot Translation. 2016.
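To make the “desired language” input concrete, here is a minimal sketch, assuming the common trick of prepending a target-language token to the source sentence; the function and data below are hypothetical illustrations, not code from Johnson et al.:

```python
# Minimal sketch of target-language-token conditioning, in the spirit of
# Johnson et al. (2016). All names here are hypothetical, not the paper's code.

def add_target_token(source_sentence: str, target_lang: str) -> str:
    """Prepend a token telling the model which language to produce."""
    return f"<2{target_lang}> {source_sentence}"

# One model, many pairs: examples from every available corpus are mixed
# together, each tagged with its desired output language.
training_examples = [
    (add_target_token("I think, therefore I am.", "fr"), "Je pense, donc je suis."),
    (add_target_token("Je pense, donc je suis.", "en"), "I think, therefore I am."),
    (add_target_token("I think, therefore I am.", "es"), "Pienso, luego existo."),
]

print(training_examples[0][0])  # "<2fr> I think, therefore I am."
```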
What did they find?

➢ Improved efficiency: translating into and out of rare languages works better if the model is also trained on more common languages

➢ Zero-shot machine translation: e.g., train on English -> French, French -> English, and English -> Spanish, and be able to translate French -> Spanish

Johnson et al. Google's Multilingual Neural Machine Translation System: Enabling Zero-Shot Translation. 2016.
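Continuing the hypothetical sketch above, zero-shot translation just means querying a language pair that never appeared together during training:

```python
# French -> Spanish was never a training pair, but the target-language token
# still tells the (hypothetical) model what to produce.
query = add_target_token("Je pense, donc je suis.", "es")
# translation = model.generate(query)  # hypothetical model call
```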
[Figure: the same multilingual model, but the desired-language input is now a mix of languages, e.g., 40% Spanish and 60% French]

Translating English to a mix of Spanish and Portuguese:

[Plot: output composition as the “Portuguese” weight w varies, with the Spanish weight = 1 - w]

Johnson et al. Google's Multilingual Neural Machine Translation System: Enabling Zero-Shot Translation. 2016.
Translating English to a mix of Japanese and Korean:

[Plot: output composition as the mixing weight varies]

Johnson et al. Google's Multilingual Neural Machine Translation System: Enabling Zero-Shot Translation. 2016.
Translating English to a mix of Russian and Belarusian:

[Plot: at intermediate mixing weights, some outputs are in a language that is neither Russian nor Belarusian!]

Johnson et al. Google's Multilingual Neural Machine Translation System: Enabling Zero-Shot Translation. 2016.
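One plausible way to realize such a mix, sketched below under the assumption that each target language has a learned token embedding and that mixing means interpolating those embeddings (a simplification; the embeddings and weights here are hypothetical, not the paper's):

```python
import numpy as np

# Hypothetical language-token embeddings (in a real model these are learned).
rng = np.random.default_rng(0)
embeddings = {"es": rng.normal(size=8), "pt": rng.normal(size=8)}

def mixed_language_embedding(w_pt: float) -> np.ndarray:
    """Interpolate target-language embeddings: w_pt Portuguese, (1 - w_pt) Spanish."""
    return w_pt * embeddings["pt"] + (1.0 - w_pt) * embeddings["es"]

print(mixed_language_embedding(0.4))  # a "40% Portuguese, 60% Spanish" target token
```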
What’s going on?

[Figure: a standard En->Fr translation model maps English text directly to French; the multilingual model maps English to an internal “thought,” and the “thought” to French]

The “thought” is a representation!

Representation learning

Handling such complex inputs requires representations.

“Classic” view of machine learning:

[Figure: an input such as “Il est encore plus facile de juger de l'esprit d'un homme par ses questions que par ses réponses.” is mapped to a hand-designed “thought,” and then to the output]

The power of deep learning lies in its ability to learn such representations automatically from data.
Deep Learning
Designing, Visualizing and Understanding Deep Neural Networks

CS W182/282A
Instructor: Sergey Levine
UC Berkeley
Course overview
• Broad overview of deep learning topics
  • Neural network architectures
  • Optimization algorithms
  • Applications: vision, NLP
  • Reinforcement learning
  • Advanced topics
• Four homework programming assignments
  • Neural network basics
  • Convolutional and recurrent networks
  • Natural language processing
  • Reinforcement learning
• Two midterm exams
  • Format TBD, but most likely will be a take-home exam
• Final project (group project, 2-3 people)
  • Most important part of the course
  • CS182: choose vision, NLP, or reinforcement learning
  • CS282: self-directed and open-ended project
Course policies

Grading:
• 30% midterms
• 40% programming homeworks
• 30% final project

Late policy:
• 5 slip days
• strict late policy, no slack beyond slip days
• no slip days for the final project (due to the grades deadline)

Prerequisites:
• Excellent knowledge of calculus and linear algebra, especially multi-variate derivatives, matrix operations, and solving linear systems
• CS70 or STAT134: excellent knowledge of probability theory (including continuous random variables)
• CS189, or a very strong statistics background
• CS61B or equivalent: able to program in Python
What is machine learning?
What is deep learning?
What is machine learning?

[Figure: a computer program maps an input (e.g., an image) to an object label]

➢ How do we implement this program?

➢ A function is a set of rules for transforming inputs into outputs

➢ Sometimes we can define the rules by hand – this is called programming

➢ What if we don’t know the rules?

➢ What if the rules are too complex? Too many exceptions & special cases?
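As a toy illustration (my own example, not from the slides): some rules are easy to hand-program, while a rule like “image -> object label” is hopeless to write by hand, which is exactly where learning from data comes in.

```python
# Hand-programmed rules work when we know them exactly:
def fahrenheit_to_celsius(f: float) -> float:
    return (f - 32.0) * 5.0 / 9.0

# But try writing the rules for this by hand:
def object_label(image_pixels) -> str:
    # ...thousands of exceptions and special cases later...
    raise NotImplementedError("nobody can write these rules by hand")
```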
What is machine learning?
[Figure: a computer program maps an input (e.g., an image) to an object label]

➢ Instead of defining the input -> output relationship by hand, define a program that acquires this relationship from data

➢ Key idea: if the rules that describe how inputs map to outputs are complex and full of special cases & exceptions, it is easier to provide data or examples than to implement those rules

➢ Question: Does this also apply to human and animal learning?


What are we learning?

[Figure: a computer program maps an input x to an object label]

A simple choice: $f_\theta(x) = \theta_1 x + \theta_0$, and this describes a line.

In general, we learn the parameters $\theta$ of some function $f_\theta$. But what parameterization do we use?

[Figure: an input image is just a grid of numbers (pixel intensities such as 0.2, 0.1, 0.3, ...), which the program must map to an object label]
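To make “learning the parameters of a line” concrete, here is a minimal sketch (my own example, with made-up data) that fits $\theta_1$ and $\theta_0$ with least squares:

```python
import numpy as np

# Toy data that roughly follows a line: y ≈ 2x + 1 plus noise.
rng = np.random.default_rng(0)
x = np.linspace(0.0, 1.0, 20)
y = 2.0 * x + 1.0 + 0.1 * rng.normal(size=x.shape)

# Least-squares fit of f_theta(x) = theta_1 * x + theta_0.
A = np.stack([x, np.ones_like(x)], axis=1)
theta, *_ = np.linalg.lstsq(A, y, rcond=None)
print(theta)  # approximately [2.0, 1.0]
```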


“Shallow” learning

[Figure: input -> hand-designed features -> simple learned model -> object label]

➢ Kind of a “compromise” solution: don’t hand-program the rules, but hand-program the features

➢ Learning on top of the features can be simple (just like the 2D example from before!)

➢ Coming up with good features is very hard!
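A minimal sketch of the shallow recipe (my own illustration, with made-up features and weights): hand-designed features feed a simple learned model.

```python
import numpy as np

# Hand-programmed features: here, crude intensity statistics of an image.
def hand_designed_features(image: np.ndarray) -> np.ndarray:
    return np.array([image.mean(), image.std(), image.max() - image.min()])

# Learning on top of the features is simple: e.g., a linear classifier
# score = w . features + b, where w and b are fit from labeled examples.
w = np.array([0.5, -1.2, 0.3])  # in practice these would be learned
b = 0.1
image = np.random.default_rng(0).random((32, 32))
score = w @ hand_designed_features(image) + b
print(score > 0)  # predicted label
```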


From shallow learning to deep learning

[Figure: input -> features -> label, but now the features are learned; what if we learn parameters here too? All the parameters are learned]

Multiple layers of representations?

[Figure: a stack of learned representations; each arrow represents a simple parameterized transformation (function) of the preceding layer]

Higher-level representations are:
➢ More abstract
➢ More invariant to nuisances
➢ Easier for predicting the label

Coates, Lee, Raina, Ng.


So, what is deep learning?

➢ Machine learning with multiple layers of learned representations

➢ The function that represents the transformation from input to internal representation to output is usually a deep neural network
  ▪ This is a bit circular, because almost all multi-layer parametric functions with learned parameters can be called neural networks (more on this later)

➢ The parameters for every layer are usually (but not always!) trained with respect to the overall task objective (e.g., accuracy)
  ▪ This is sometimes referred to as end-to-end learning
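A minimal sketch (my own, in plain numpy) of “multiple layers of learned representations”: each layer is a simple parameterized transformation of the previous one, and in end-to-end learning all of W1, b1, W2, b2 would be trained against the task objective.

```python
import numpy as np

rng = np.random.default_rng(0)

def layer(x, W, b):
    """One simple parameterized transformation: linear map + nonlinearity."""
    return np.maximum(0.0, W @ x + b)  # ReLU activation

# Randomly initialized parameters; end-to-end training would adjust all of
# them with respect to the overall task objective (e.g., accuracy).
W1, b1 = rng.normal(size=(16, 8)), np.zeros(16)
W2, b2 = rng.normal(size=(4, 16)), np.zeros(4)

x = rng.normal(size=8)   # input
h = layer(x, W1, b1)     # learned intermediate representation
y = W2 @ h + b2          # output (e.g., class scores)
print(y.shape)           # (4,)
```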


What makes deep learning work?

1950: Turing describes how learning could be a path to machine intelligence

1957: Rosenblatt’s perceptron proposed as a practical learning method

1969: Minsky & Papert publish book describing fundamental limitations of neural networks; most (but not all) mainstream research focuses on “shallow” learning

1986: Backpropagation as a practical method for training deep nets

1989: LeNet (neural network for handwriting recognition)

1990s-2000s: huge wave of interest in the ML community in probabilistic methods, convex optimization, but mostly in shallow models

~2006: deep neural networks start gaining more attention

2012: Krizhevsky’s AlexNet paper beats all other methods on ImageNet (what the heck happened here?)
What makes deep learning work?
1) Big models with many layers

2) Large datasets with many examples

3) Enough compute to handle all this


Model scale: is more layers better?

LeNet, 7 layers (1989)

Krizhevsky’s model (AlexNet) for ImageNet, 8 layers (2012)

ResNet-152, 152 layers (2015)


How big are the datasets?
MNIST (handwritten digits), 1990s - today: 60,000 images

CalTech 101, 2003: ~9,000 images

CIFAR 10, 2009: ~60,000 images

ILSVRC (ImageNet), 2009: 1.5 million images


How does it scale with compute?

[Plot: performance vs. compute omitted]

What about NLP?

[Plot: NLP performance vs. compute omitted]

On what?? On this: about 16 TPUs
[Photo: a data center with a few thousand of these TPUs]
So… it’s really expensive?

➢ One perspective: deep learning is not such a good idea, because it requires huge models, huge amounts of data, and huge amounts of compute

➢ Another perspective: deep learning is great, because as we add more data, more layers, and more compute, the models get better and better!

[Plot: ImageNet error rate over time, falling past human performance of about 5% error. …which human?]
The underlying themes

➢ Acquire representations by using high-capacity models and lots of data, without requiring manual engineering of features or representations
  ▪ Automation: we don’t need to know what the good features are; we can have the model figure it out from data
  ▪ Better performance: when representations are learned end-to-end, they are better tailored to the current task

➢ Learning vs. inductive bias (“nature vs. nurture”): models that get most of their performance from their data rather than from designer insight
  ▪ Inductive bias: what we build into the model to make it learn effectively (we can never fully get rid of this!)
  ▪ Should we build in knowledge, or better machinery for learning and scale?

➢ Algorithms that scale: this often refers to methods that can get better and better as we add more data, representational capacity, and compute

Informal definitions:
• Model capacity: how many different functions a particular model class can represent (e.g., all linear decision boundaries vs. non-linear boundaries)
• Inductive bias: built-in knowledge or biases in a model designed to help it learn. All such knowledge is “bias” in the sense that it makes some solutions more likely and some less likely
• Scaling: the ability of an algorithm to work better as more data and model capacity are added
Why do we call them neural nets?

Early on, neural networks were proposed as a rudimentary model of neurons in the brain.

[Figure: a biological neuron next to an artificial “neuron” in a two-layer network (layer 1, layer 2)]

Biological neuron:
• dendrites receive signals from other neurons
• the neuron “decides” whether to fire based on incoming signals
• the axon transmits the signal to downstream neurons

Artificial “neuron” (also referred to as a “unit”):
• sums up the activations of upstream neurons
• “decides” how much to fire based on incoming signals, via an “activation function”
• its activation is transmitted to downstream units

Is this a good model for real neurons?


• Crudely models some neuron function
• Missing many other important anatomical details
• Don’t take it too seriously
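A minimal sketch (my own, with made-up weights) of the artificial neuron just described: sum the upstream activations with weights, then apply an activation function.

```python
import numpy as np

def sigmoid(z: float) -> float:
    """Activation function: squashes the summed input into (0, 1)."""
    return 1.0 / (1.0 + np.exp(-z))

# Upstream activations and the weight on each incoming connection.
upstream = np.array([0.5, 0.1, 0.9])
weights = np.array([1.2, -0.7, 0.4])
bias = 0.1

# The unit sums its weighted inputs and "decides" how much to fire.
activation = sigmoid(weights @ upstream + bias)
print(activation)
```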
What does deep learning have to do with the brain?

Does this mean that the brain does deep learning?

Or does it mean that any sufficiently powerful learning machine will basically derive the same solution?
