1 Introduction
[Figure: a paired text corpus of example sentences trains an [En<->Fr] translation model: [English] -> translation model -> [French].]
How many paired sentences are there for translating Maltese to Tibetan?
“Standard” machine translation: a separate model per language pair.
[Figure: three separate models. [En<->Fr]: [English] -> translation model -> [French]; [En<->Es]: [English] -> translation model -> [Spanish]; [Fr<->Es]: [French] -> translation model -> [Spanish].]
“Multilingual” machine translation: a single model for all language pairs.
[Figure: one multilingual translation model takes [English], [French], or [Spanish] input, plus a token specifying the desired target language, and outputs [English], [French], or [Spanish].]
Johnson et al. Google's Multilingual Neural Machine Translation System: Enabling Zero-Shot Translation. 2016.
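Johnson et al.'s approach needs no architecture change: an artificial token prepended to the source sentence tells the single shared model which *target* language to produce. A minimal sketch of that data preparation (the exact token format and the sentences here are illustrative):

```python
# Multilingual MT data preparation in the style of Johnson et al. (2016):
# one model is trained on all language pairs, with an artificial token
# prepended to the source sentence that names the desired target language.

def make_multilingual_example(src_sentence, target_lang):
    """Prepend a target-language token, e.g. '<2es>' to request Spanish."""
    return f"<2{target_lang}> {src_sentence}"

pairs = [
    ("Hello, how are you?", "es"),   # English -> Spanish
    ("Hello, how are you?", "fr"),   # English -> French
    ("Bonjour le monde.", "es"),     # French -> Spanish (zero-shot if this
]                                    # pair never appears in training)

inputs = [make_multilingual_example(s, t) for s, t in pairs]
# inputs[0] == "<2es> Hello, how are you?"
```

Because the target language is just an input token, the same trained model can be asked for pairs it never saw paired data for, which is what enables zero-shot translation.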
What did they find?
➢ Improved efficiency: translating into and out of rare languages works better if the model is also trained on more common languages.
[Figure: the same multilingual model, but the desired-language input is replaced by a mix of language weights (e.g., 40% Spanish, 60% French): a “Portuguese” weight w, with Spanish weight = 1 - w.]
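One way to picture this mixed-language experiment: instead of a hard choice of target-language token, the model is conditioned on a weighted combination of the two tokens' embeddings. A hypothetical sketch (the embedding size and values are made up; this is not the paper's exact implementation):

```python
import numpy as np

# Hypothetical sketch of mixing target-language conditioning:
# feed the model a weighted combination of the "<2pt>" (Portuguese)
# and "<2es>" (Spanish) token embeddings instead of a one-hot choice.

rng = np.random.default_rng(0)
embed = {"<2pt>": rng.normal(size=8),   # made-up 8-dim embeddings
         "<2es>": rng.normal(size=8)}

def mixed_language_token(w):
    """w = "Portuguese" weight; the Spanish weight is 1 - w."""
    return w * embed["<2pt>"] + (1 - w) * embed["<2es>"]

v = mixed_language_token(0.6)   # conditioning vector for 60% Portuguese
```

Sweeping w from 0 to 1 then interpolates the requested output language, which is the knob being varied on this slide.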
What’s going on?
CS W182/282A
Instructor: Sergey Levine
UC Berkeley
Course overview
• Broad overview of deep learning topics
• Neural network architectures
• Optimization algorithms
• Applications: vision, NLP
• Reinforcement learning
• Advanced topics
• Four homework programming assignments
• Neural network basics
• Convolutional and recurrent networks
• Natural language processing
• Reinforcement learning
• Two midterm exams
• Format TBD, but most likely will be a take-home exam
• Final project (group project, 2-3 people)
• Most important part of the course
• CS182: choose vision, NLP, or reinforcement learning
• CS282: self-directed and open-ended project
Course policies
Grading:
• 30% midterms
• 40% programming homeworks
• 30% final project
Late policy:
• 5 slip days
• strict late policy, no slack beyond slip days
• no slip days for the final project (due to grades deadline)
Prerequisites:
• Excellent knowledge of calculus and linear algebra, especially multi-variate derivatives, matrix operations, and solving linear systems
• CS70 or STAT134: excellent knowledge of probability theory (including continuous random variables)
• CS189, or a very strong statistics background
• CS61B or equivalent: able to program in Python
What is machine learning?
What is deep learning?
What is machine learning?
[Figure: an input image goes into a hand-written computer program, which outputs an [object label].]
➢ What if the rules are too complex? Too many exceptions & special cases?
➢ Instead of defining the input -> output relationship by hand, define a program that acquires this relationship from data
➢ Key idea: if the rules that describe how inputs map to outputs are complex and full of special cases & exceptions, it is easier to provide data or examples than to implement those rules
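A toy illustration of this idea: rather than writing rules, provide labeled examples and let a simple learning algorithm (here 1-nearest-neighbor, with made-up features and labels) acquire the input -> output mapping from the data:

```python
# Instead of hand-coded rules, a 1-nearest-neighbor "program" maps
# inputs to labels purely from examples. Features/labels are made up.

def nearest_neighbor(train, x):
    """Return the label of the training point closest to x."""
    return min(train,
               key=lambda ex: sum((a - b) ** 2 for a, b in zip(ex[0], x)))[1]

train = [((0.0, 0.0), "cat"), ((1.0, 1.0), "dog")]
print(nearest_neighbor(train, (0.1, 0.2)))  # -> cat
```

No rule for "cat" was ever written; the mapping comes entirely from the examples, and adding more examples refines it without touching the code.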
➢ The parameters for every layer are usually (but not always!) trained with respect to the overall task objective (e.g., accuracy)
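What "trained with respect to the overall task objective" means concretely: every layer's parameters receive gradients from one final loss. A minimal sketch with a two-layer linear model and manual backprop (the data, shapes, and learning rate are made up for illustration):

```python
import numpy as np

# Both layers' parameters (W1, W2) are updated using gradients of the
# single task loss at the output; no layer has its own separate objective.

rng = np.random.default_rng(0)
x = rng.normal(size=(4, 3))          # 4 examples, 3 features (made up)
y = rng.normal(size=(4, 1))          # regression targets (made up)
W1 = 0.1 * rng.normal(size=(3, 5))   # layer 1 parameters
W2 = 0.1 * rng.normal(size=(5, 1))   # layer 2 parameters

losses = []
for _ in range(200):
    h = x @ W1                        # layer 1 output
    pred = h @ W2                     # layer 2 output (task prediction)
    err = pred - y
    losses.append(float((err ** 2).mean()))
    g_pred = 2 * err / err.size       # d(loss)/d(pred)
    gW2 = h.T @ g_pred                # layer 2 gradient from the final loss
    gW1 = x.T @ (g_pred @ W2.T)       # layer 1 gradient, backpropagated
    W2 -= 0.1 * gW2
    W1 -= 0.1 * gW1
```

The loss falls over iterations even though layer 1 is never told what "good features" are; its updates are driven entirely by the downstream objective.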
So… it’s really expensive? On what? On this: about 16 TPUs (this photo shows a few thousand of these).
➢ One perspective: deep learning is not such a good idea, because it requires huge models, huge amounts of data, and huge amounts of compute
[Plot annotation: human performance is about 5% error.]
The underlying themes
➢ Acquire representations by using high-capacity models and lots of data, without requiring manual engineering of features or representations
▪ Automation: we don’t need to know what the good features are; we can have the model figure it out from data
▪ Better performance: when representations are learned end-to-end, they are better tailored to the current task
➢ Learning vs. inductive bias (“nature vs. nurture”): models that get most of their performance from their data rather than from designer insight
▪ Inductive bias: what we build into the model to make it learn effectively (we can never fully get rid of this!)
▪ Should we build in knowledge, or better machinery for learning and scale?
➢ Algorithms that scale: this often refers to methods that can get better and better as we add more data, representational capacity, and compute

Model capacity: (informally) how many different functions a particular model class can represent (e.g., all linear decision boundaries vs. non-linear boundaries).
Inductive bias: (informally) built-in knowledge or biases in a model designed to help it learn. All such knowledge is “bias” in the sense that it makes some solutions more likely and some less likely.
Scaling: (informally) the ability for an algorithm to work better as more data and model capacity are added.
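The capacity definition above can be made concrete with XOR: no linear decision boundary can represent it, but a model class with one extra multiplicative feature can. A small sketch (the weights below are hand-picked for illustration, not learned):

```python
# XOR is the classic example of a dataset outside the capacity of the
# linear model class. Adding a single cross term (x1 * x2) enlarges
# the model class enough to represent it exactly.

xor_data = [((0, 0), 0), ((0, 1), 1), ((1, 0), 1), ((1, 1), 0)]

def higher_capacity_model(x):
    x1, x2 = x
    score = x1 + x2 - 2 * x1 * x2 - 0.5   # linear terms + cross term + bias
    return 1 if score > 0 else 0

assert all(higher_capacity_model(x) == y for x, y in xor_data)
```

No setting of weights in a purely linear score w1*x1 + w2*x2 + b can satisfy all four XOR examples, so "capacity" here is a statement about the whole model class, not about any single weight setting.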
Why do we call them neural nets?
Early on, neural networks were proposed as a rudimentary model of neurons in the brain
[Figure: a biological neuron alongside an artificial network with layer 1 and layer 2. Dendrites receive signals from other neurons; an artificial “neuron” (also referred to as a “unit”) sums up signals from upstream neurons.]
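A single artificial "neuron" as described above is just a weighted sum of upstream signals passed through a nonlinearity. A minimal sketch (the sigmoid choice and the weights are illustrative):

```python
import math

# One artificial "neuron" (unit): it sums up weighted signals from
# upstream units, adds a bias, and applies a nonlinearity.

def neuron(inputs, weights, bias):
    z = sum(w * x for w, x in zip(weights, inputs)) + bias
    return 1.0 / (1.0 + math.exp(-z))   # sigmoid activation

out = neuron([1.0, 0.5], [0.2, -0.4], 0.1)   # made-up signals and weights
```

A layer is just many such units applied to the same inputs, and stacking layers (layer 1 feeding layer 2) gives the networks this course studies.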