
Philosophy of Deep Learning, NYU

Do large language models need sensory grounding for meaning and understanding?

Spoiler: YES!

Yann LeCun
Courant Institute & Center for Data Science, NYU
Meta – Fundamental AI Research
2023-03-24
[Title image generated with Make-A-Scene]

Machine Learning sucks! (compared to humans and animals)

Supervised learning (SL) requires large numbers of labeled samples.
Reinforcement learning (RL) requires insane numbers of trials.
Self-Supervised Learning (SSL) requires large numbers of unlabeled samples.
Most current ML-based AI systems make stupid mistakes and do not reason or plan.
Animals and humans:
Can learn new tasks very quickly.
Understand how the world works.
Can reason and plan.
Humans and animals have common sense; current machines, not so much (it's very superficial).
Self-Supervised Learning has taken over the world

For understanding & generation of images, audio, text...

[Image generated with Make-A-Scene]

Self-Supervised Learning = Learning to Fill in the Blanks


Reconstruct the input or predict missing parts of the input.
[Diagram: an input signal laid out along time or space →, with masked parts to be filled in]


SSL via Denoising Auto-Encoder / Masked Auto-Encoder

BERT [Devlin 2018], RoBERTa [Ott 2019], ...
[Diagram: the input y is corrupted, passed through an Encoder to a learned representation, and a Decoder produces a reconstruction ỹ; training minimizes the cost C(y, ỹ)]
Example: "This is a [...] of text extracted [...] a large set of [...] articles" is reconstructed as "This is a piece of text extracted from a large set of news articles".
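A minimal sketch of this fill-in-the-blanks objective, assuming PyTorch and toy sizes; the tiny encoder/decoder, masking rate, and vocabulary below are hypothetical stand-ins, not the actual BERT/RoBERTa setup:

```python
# A toy masked/denoising auto-encoder step (hypothetical sizes and modules).
import torch
import torch.nn as nn

vocab, dim, mask_id = 1000, 64, 0            # toy vocabulary; id 0 reserved for [MASK]
encoder = nn.Sequential(nn.Embedding(vocab, dim), nn.Linear(dim, dim), nn.ReLU())
decoder = nn.Linear(dim, vocab)              # predicts a token at every position

y = torch.randint(1, vocab, (8, 16))         # a batch of 8 "sentences", 16 tokens each
mask = torch.rand(y.shape) < 0.15            # corruption: hide ~15% of the tokens
y_corrupted = y.masked_fill(mask, mask_id)

h = encoder(y_corrupted)                     # learned representation
logits = decoder(h)                          # reconstruction over the vocabulary
# C(y, ỹ): cross-entropy, evaluated only on the masked (missing) positions
loss = nn.functional.cross_entropy(logits[mask], y[mask])
loss.backward()
```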

Auto-Regressive Generative Models

Outputs one "token" after another.
Tokens may represent words, image patches, speech segments...
[Diagram: an encoder/predictor with a stochastic output maps the context x[t-k], ..., x[t-1], x[t] (the prompt) to the predicted token x[t+1], which is appended to the context for the next step]
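A sketch of that loop in Python/PyTorch; `model` is a hypothetical stand-in for any trained predictor that returns next-token logits over the context:

```python
# Auto-regressive generation: sample one token at a time and feed it back.
import torch

def generate(model, prompt_tokens, n_new, temperature=1.0):
    tokens = list(prompt_tokens)                           # the prompt: x[..t]
    for _ in range(n_new):
        context = torch.tensor(tokens).unsqueeze(0)        # shape (1, t)
        logits = model(context)[0, -1]                     # scores for x[t+1]
        probs = torch.softmax(logits / temperature, dim=-1)
        next_token = torch.multinomial(probs, 1).item()    # stochastic choice
        tokens.append(next_token)                          # becomes part of the context
    return tokens
```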

Auto-Regressive Large Language Models (AR-LLMs)


Outputs one text token after another
Tokens may represent words or subwords
Encoder/predictor is a transformer architecture
With billions of parameters: typically from 1B to 500B
Training data: 1 to 2 trillion tokens
LLMs for dialog/text generation:
BlenderBot, Galactica, LLaMA (FAIR), Alpaca (Stanford), LaMDA/Bard (Google), Chinchilla (DeepMind), ChatGPT (OpenAI), GPT-4??...
Performance is amazing … but … they make stupid mistakes
Factual errors, logical errors, inconsistency, limited reasoning, toxicity...
LLMs have no knowledge of the underlying reality
They have no common sense & they can’t plan their answer

Unpopular Opinion about AR-LLMs

Auto-Regressive LLMs are doomed.
They cannot be made factual, non-toxic, etc.
They are not controllable.
[Diagram: the tree of "correct" answers is a small subset of the tree of all possible token sequences]
Let e be the probability that any produced token takes us outside the set of correct answers.
The probability that an answer of length n is correct is then:

P(correct) = (1 - e)^n

This decays exponentially with n: the answer drifts out of the set of correct answers, and it's not fixable.
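A toy calculation, in plain Python, using the slide's independence assumption to show how fast (1 - e)^n shrinks:

```python
# If each token independently has probability e of stepping outside the set
# of correct answers, P(correct for an n-token answer) = (1 - e)**n.
for e in (0.01, 0.05):
    for n in (10, 100, 1000):
        print(f"e={e}  n={n:>4}  P(correct) = {(1 - e) ** n:.2e}")
# e=0.01 gives roughly 9.0e-01 at n=10, 3.7e-01 at n=100, 4.3e-05 at n=1000
```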

Auto-Regressive Generative Models Suck!

AR-LLMs
Have a constant number of computational steps between input and output: weak representational power.
Do not really reason. Do not really plan.

Humans and many animals


Understand how the world works.
Can predict the consequences of their actions.
Can perform chains of reasoning with an unlimited number of steps.
Can plan complex tasks by decomposing them into sequences of subtasks
How could machines learn like animals and humans?

[Chart adapted from Emmanuel Dupoux: ages (in months, 0-14) at which infants acquire abilities — perception (face tracking, biological motion), social/communication (emotional contagion, pointing, helping vs. hindering, false beliefs), actions (rational goal-directed actions, proto-imitation), physics (gravity, inertia, stability, support, conservation of momentum), objects (object permanence, shape constancy, solidity, rigidity, natural kind categories), production (crawling, walking)]

How can babies learn how the world works?
How can teenagers learn to drive with ~20h of practice?

Three challenges for AI & Machine Learning


1. Learning representations and predictive models of the world
Supervised and reinforcement learning require too many samples/trials
Self-supervised learning / learning dependencies / to fill in the blanks
learning to represent the world in a non task-specific way
Learning predictive models for planning and control
2. Learning to reason, like Daniel Kahneman’s “System 2”
Beyond feed-forward, System 1 subconscious computation.
Making reasoning compatible with learning.
Reasoning and planning as energy minimization.
3. Learning to plan complex action sequences
Learning hierarchical representations of action plans
A Cognitive Architecture capable of reasoning & planning

Position paper: "A path towards autonomous machine intelligence"
https://openreview.net/forum?id=BZ5a1r-kVsf
Longer talk: search "LeCun Berkeley" on YouTube



Modular Architecture for Autonomous AI

Configurator: configures the other modules for the task
Perception: estimates the state of the world
World Model: predicts future world states
Cost (Intrinsic Cost + Critic): computes the "discomfort" cost
Actor: finds optimal action sequences
Short-Term Memory: stores state-cost episodes

Mode-1 Perception-Action Cycle

Perception module: s[0] = Enc(x) extracts a representation of the world.
Policy module: A(s[0]) computes an action reactively.
Cost module: C(s[0]) computes the cost of a state.
Optionally, the world model Pred(s, a) predicts the future state, and states and costs are stored in short-term memory.
[Diagram: x → Enc → s[0]; the Actor computes a[0] = A(s[0]); Pred(s, a) gives s[1]; costs C(s[0]) and C(s[1])]
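A control-flow sketch of this reactive cycle in Python; Enc, A, Pred, and C are hypothetical stand-ins for the trained modules:

```python
# Mode-1 (reactive) step: encode, act without search, store state/cost episode.
def mode1_step(x, Enc, A, Pred, C, memory):
    s = Enc(x)                                # s[0] = Enc(x): state estimate
    a = A(s)                                  # reactive action A(s[0]), no search
    s_next = Pred(s, a)                       # predicted next state s[1]
    memory.append((s, a, C(s), C(s_next)))    # store state-cost episode
    return a

# Toy usage with trivial stand-ins:
memory = []
action = mode1_step(3.0, Enc=lambda x: x * 2, A=lambda s: -s,
                    Pred=lambda s, a: s + a, C=lambda s: s ** 2, memory=memory)
```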

Mode-2 Perception-Planning-Action Cycle

Akin to classical Model-Predictive Control (MPC):
The Actor proposes an action sequence.
The World Model predicts the outcome.
The Actor optimizes the action sequence to minimize the cost, e.g. using gradient descent, dynamic programming, MC tree search...
The Actor sends the first action(s) to the effectors.
[Diagram: starting from s[0], the world model Pred(s, a) is unrolled to s[1], ..., s[T] under actions a[0], ..., a[T-1] from the Actor, accumulating costs C(s[0]), ..., C(s[T])]
[Henaff et al. ICLR 19], [Hafner et al. ICML 19], [Chaplot et al. ICML 21], [Escontrela CoRL 22], ...
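A sketch of this planning loop as gradient-based MPC, assuming PyTorch and a differentiable world model and cost; both are hypothetical stand-ins, and the horizon, learning rate, and action dimension are arbitrary:

```python
# Gradient-based MPC: optimize a candidate action sequence through the
# (assumed differentiable) world model and cost, then return it.
import torch

def plan(world_model, cost, s0, horizon=10, n_steps=50, lr=0.1, action_dim=4):
    actions = torch.zeros(horizon, action_dim, requires_grad=True)   # a[0..T-1]
    optimizer = torch.optim.Adam([actions], lr=lr)
    for _ in range(n_steps):
        s, total_cost = s0, 0.0
        for t in range(horizon):
            s = world_model(s, actions[t])      # s[t+1] = Pred(s[t], a[t])
            total_cost = total_cost + cost(s)   # accumulate C(s[t+1])
        optimizer.zero_grad()
        total_cost.backward()                   # gradients flow through the rollout
        optimizer.step()
    return actions.detach()                     # send the first action(s) to effectors
```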

Cost Module

Intrinsic Cost (IC): immutable cost modules, hard-wired drives: IC1(s), IC2(s), ..., ICk(s).
Trainable Cost / Critic (TC): trainable, predicts future values of the IC (equivalent to a critic in RL), implements subgoals, configurable: TC1(s), TC2(s), ..., TCl(s).
All cost modules are differentiable.
Building & Training the World Model
Energy-Based Models, Joint-Embedding Architectures

How do we represent uncertainty in the predictions?

The world is only partially predictable.
How can a predictive model represent multiple predictions?
Probabilistic models are intractable in continuous domains.
Generative models must predict every detail of the world.
My solution: the Joint-Embedding Predictive Architecture.
[Mathieu, Couprie, LeCun ICLR 2016], [Henaff, Canziani, LeCun ICLR 2019]

Architecture for the world model: JEPA

JEPA: Joint-Embedding Predictive Architecture
x: observed past and present
y: future
a: action
z: latent variable (unknown)
D( ): prediction cost
C( ): surrogate cost
JEPA predicts a representation of the future Sy from a representation of the past and present Sx.
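A toy JEPA-style forward pass under these notations, assuming PyTorch; the linear encoders, predictor, and dimensions are hypothetical stand-ins, and a squared error plays the role of the prediction cost D( ):

```python
# Toy JEPA forward pass: encode x and y, predict Sy from Sx and a latent z,
# and measure the prediction cost D in representation space.
import torch
import torch.nn as nn

dim = 32
enc_x, enc_y = nn.Linear(16, dim), nn.Linear(16, dim)   # toy encoders
predictor = nn.Linear(dim + 4, dim)                      # takes Sx and a 4-d latent z

x, y = torch.randn(8, 16), torch.randn(8, 16)            # past/present and future
z = torch.randn(8, 4)                                    # latent variable (unknown factors)
s_x, s_y = enc_x(x), enc_y(y)
s_y_pred = predictor(torch.cat([s_x, z], dim=-1))        # prediction of Sy
prediction_cost = ((s_y_pred - s_y) ** 2).mean()         # D(Sy, predicted Sy)
```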

Architectures: Generative vs Joint Embedding


Generative: predicts y (with all the details, including irrelevant ones)
Joint Embedding: predicts an abstract representation of y

a) Generative Architecture b) Joint Embedding Architecture


Examples: VAE, MAE...

Energy-Based Models: Implicit function

The only way to formalize & understand all model types.
F(x, y) gives low energy to compatible pairs of x and y, and higher energy to incompatible pairs.
[Diagram: data points (x, y) along time or space →, and the corresponding energy landscape F(x, y), low on the data manifold]
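A minimal sketch of the EBM view in PyTorch, with a toy quadratic energy standing in for a trained F(x, y); inference is a gradient-based search for the y that minimizes the energy:

```python
# EBM inference: find the y that minimizes a scalar energy F(x, y).
import torch

def energy(x, y):
    return ((y - 2.0 * x) ** 2).sum()        # toy energy: y compatible with 2*x

def infer_y(x, n_steps=100, lr=0.1):
    y = torch.zeros_like(x, requires_grad=True)
    opt = torch.optim.SGD([y], lr=lr)
    for _ in range(n_steps):
        opt.zero_grad()
        energy(x, y).backward()               # gradient-based search over y
        opt.step()
    return y.detach()

print(infer_y(torch.tensor([1.0, -0.5])))     # ≈ [2.0, -1.0]
```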

EBM Training: two categories of methods

Contrastive methods:
Push down on the energy of training samples.
Pull up on the energy of suitably-generated contrastive samples.
Scales very badly with dimension.
Regularized methods:
A regularizer minimizes the volume of space that can take low energy.
[Diagram: training samples sit in a low-energy region; contrastive methods push up the energy of individual contrastive points, while regularized methods shrink the volume of the low-energy region]

Recommendations:

Abandon generative models in favor of joint-embedding architectures.
Abandon auto-regressive generation.
Abandon probabilistic models in favor of energy-based models.
Abandon contrastive methods in favor of regularized methods.
Abandon Reinforcement Learning in favor of model-predictive control.
Use RL only when planning doesn't yield the predicted outcome, to adjust the world model or the critic.

Training a JEPA non-contrastively

Four terms in the cost:
Maximize the information content of the representation of x.
Maximize the information content of the representation of y.
Minimize the prediction error.
Minimize the information content of the latent variable z.

VICReg: Variance, Invariance, Covariance Regularization

Variance: maintains the variance of the components of the representations.
Covariance: decorrelates the components of the representations by penalizing off-diagonal terms of their covariance matrix.
Invariance: minimizes the prediction error.
Barlow Twins [Zbontar et al. arXiv:2103.03230], VICReg [Bardes, Ponce, LeCun arXiv:2105.04906, ICLR 2022], VICRegL [Bardes et al. NeurIPS 2022], MCR2 [Yu et al. NeurIPS 2020], [Ma, Tsao, Shum, 2022]
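A rough sketch of the three VICReg terms in PyTorch; the coefficients and epsilon follow the general recipe rather than the exact paper implementation:

```python
# VICReg-style loss for two batches of embeddings z_a, z_b of shape (N, D).
import torch

def vicreg_loss(z_a, z_b, sim_w=25.0, var_w=25.0, cov_w=1.0, eps=1e-4):
    n, d = z_a.shape
    # Invariance: prediction error between the two views
    sim_loss = torch.nn.functional.mse_loss(z_a, z_b)
    # Variance: keep the std of every embedding dimension above 1
    std_a = torch.sqrt(z_a.var(dim=0) + eps)
    std_b = torch.sqrt(z_b.var(dim=0) + eps)
    var_loss = torch.relu(1 - std_a).mean() + torch.relu(1 - std_b).mean()
    # Covariance: decorrelate dimensions by penalizing off-diagonal covariance
    z_a_c = z_a - z_a.mean(dim=0)
    z_b_c = z_b - z_b.mean(dim=0)
    cov_a = (z_a_c.T @ z_a_c) / (n - 1)
    cov_b = (z_b_c.T @ z_b_c) / (n - 1)
    off_diag = lambda m: m - torch.diag(torch.diag(m))
    cov_loss = off_diag(cov_a).pow(2).sum() / d + off_diag(cov_b).pow(2).sum() / d
    return sim_w * sim_loss + var_w * var_loss + cov_w * cov_loss
```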

VICRegL: local matching latent variable for segmentation

Latent variable optimization: finds a pairing between local feature vectors of the two images.
[Bardes, Ponce, LeCun NeurIPS 2022, arXiv:2210.01571]

MC-JEPA: Motion & Content JEPA

Simultaneous SSL for image recognition and motion estimation.
Trained on ImageNet-1k and various video datasets.
Uses VICReg to prevent collapse.
ConvNeXt-T backbone.

MC-JEPA: Motion & Content JEPA

The motion estimation architecture uses a top-down hierarchical predictor that "warps" feature maps.

MC-JEPA: Optical Flow Estimation Results



Image-JEPA: uses masking, transformer, EMA weights

"SSL from images with a JEPA", M. Assran et al., arXiv:2301.08243
Jointly embeds a context and a number of neighboring patches.
Uses predictors.
Uses only masking.

Hierarchical Prediction at Multiple Time-Scales & Abstraction Levels

Low-level representations (JEPA-1) can only predict in the short term: too much detail, prediction is hard.
Higher-level representations (JEPA-2) can predict in the longer term: less detail, prediction is easier.

Hierarchical Planning with Uncertainty

Predictors use latent variables sampled from regularizers.
[Diagram: two-level rollout. Level 1: s[0] = Enc1(x), s[t+1] = Pred1(s[t], a[t], z1[t]) with latents z1 from regularizer R1, actions a[0..3] from the Actor, and costs C(s[2]), C(s[4]). Level 2: s2[0] = Enc2(s[0]), s2 states from Pred2(s2, a, z2) with latents z2 from regularizer R2 and cost C(s2[4]).]
Hierarchical Planning with Uncertainty

Hierarchical world model, hierarchical planning.
An action at level k specifies an objective for level k-1.
Predictions at higher levels are more abstract and longer-range.
This type of planning/reasoning, by minimizing a cost with respect to "action" variables, is what's missing from current architectures, including AR-LLMs, multimodal systems, learning robots, ...
[Diagram: a stack of levels with encoders Enc0/Enc1/Enc2, predictors Pred0/Pred1/Pred2, latents z0/z1/z2 drawn from regularizers R0/R1/R2, and actions a0/a1/a2; each level's action sets the cost C(s_{k-1}, a_k) for the level below.]

Steps towards Autonomous AI Systems


Self-Supervised Learning
To learn representations of the world
To learn predictive models of the world
Handling uncertainty in predictions
Joint-embedding predictive architectures
Energy-Based Model framework
Learning world models from observation
Like animals and human babies?
Reasoning and planning
That is compatible with gradient-based learning
No symbols, no logic → vectors & continuous functions

Positions / Conjectures
Prediction is the essence of intelligence
Learning predictive models of the world is the basis of common sense
Almost everything is learned through self-supervised learning
Low-level features, space, objects, physics, abstract representations…
Almost nothing is learned through reinforcement, supervision or imitation
Reasoning == simulation/prediction + optimization of objectives
Computationally more powerful than auto-regressive generation.
H-JEPA with non-contrastive training is the thing
Probabilistic generative models and contrastive methods are doomed.
Intrinsic cost & architecture drive behavior & determine what is learned
Emotions are necessary for autonomous intelligence
Anticipation of outcomes by the critic or world model+intrinsic cost.

Challenges for AI Research

Finding a general recipe for training Hierarchical Joint-Embedding Architecture-based world models from video, image, audio, text, ...
Designing surrogate costs to drive the H-JEPA to learn relevant representations (prediction is just one of them).
Integrating an H-JEPA into an agent capable of planning/reasoning.
Devising inference procedures for hierarchical planning in the presence of uncertainty (gradient-based methods, beam search, MCTS, ...).
Minimizing the use of RL to situations where the model or the critic are inaccurate and lead to unforeseen outcomes.
Scaling.
Thank you!
