Towards World Model for Reasoning

Dongwei Jiang, Guoxuan Wang, Chuyu Liu, Daniel Khashabi


{djiang21, gwang69, cliu212}@jhu.edu

Motivation

❖ LLMs have greatly advanced as general-purpose agent models
❖ However, in the absence of well-structured environments, LLMs face serious issues such as hallucination and inadequate planning
❖ Previous efforts in the vision domain have created world models that simulate the physical world (Sora, Video Language Planning…)
❖ Constructing world models for more abstract domains, such as reasoning, a key capability of LLMs that supports many advanced applications, has not yet been attempted!

Rationale Filtering

❖ Rationale sampling is conducted on the training set
➢ Filter % means the percentage of rationales left over after filtering
➢ For each dataset, we manually annotate 100 pairs, half of which are positive rationale examples. Rationale precision is calculated on this annotated set
➢ We want our filtered rationales to have a precision of at least 95% (a sketch of this evaluation follows below)
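A minimal sketch of the precision evaluation described above (not the authors' code; the annotation format and the keep_rationale filter are assumed placeholders):

    # Estimate rationale precision and "Filter %" on the manually annotated pairs.
    def rationale_precision(annotated_pairs, keep_rationale):
        """annotated_pairs: list of (rationale, context, is_positive) triples,
        e.g. the 100 labeled pairs per dataset, half of them positive."""
        kept = [(r, c, y) for (r, c, y) in annotated_pairs if keep_rationale(r, c)]
        if not kept:
            return 0.0, 0.0
        precision = sum(1 for (_, _, y) in kept if y) / len(kept)
        filter_pct = len(kept) / len(annotated_pairs)  # share of rationales left over
        return precision, filter_pct

    # The filtering threshold would be tuned until precision stays at or above 0.95.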

Method

(1) Sample and filter rationales with powerful LLMs (a sketch of this step follows below)

Original training example:
Question: Natalia sold clips to 48 … Answer: Natalia sold 48/2 = <<48/2=24>>24 clips in May. Natalia sold 48+24 = <<48+24=72>>72 clips altogether in April and May. #### 72

Sampled rationale that is useful for future text:
Question: … Answer: Natalia sold 48/2 = <<48/2=24>>24 clips in May. <BOT>Now we should calculate the sum of clips in April and May<EOT>. Natalia sold…

Sampled rationale that is useless for future text:
Question: … Answer: Natalia sold 48/2 = <<48/2=24>>24 clips in May. <BOT>Now we need to calculate the number of clips sold in May<EOT>. Natalia sold…
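A minimal sketch of the perplexity-based usefulness filter illustrated above (assumptions, not the authors' code: GPT-2 is only a stand-in scoring model, and the margin is a hypothetical hyperparameter):

    # Keep a sampled rationale only if it is useful for future text, i.e. it lowers
    # the perplexity of the continuation that follows the insertion point.
    import math
    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tok = AutoTokenizer.from_pretrained("gpt2")                  # stand-in scorer
    lm = AutoModelForCausalLM.from_pretrained("gpt2").eval()

    def continuation_ppl(prefix: str, continuation: str) -> float:
        """Perplexity of `continuation` given `prefix` under the scoring LM."""
        ids = tok(prefix + continuation, return_tensors="pt").input_ids
        n_prefix = tok(prefix, return_tensors="pt").input_ids.shape[1]
        labels = ids.clone()
        labels[:, :n_prefix] = -100                              # score only the continuation
        with torch.no_grad():
            loss = lm(ids, labels=labels).loss
        return math.exp(loss.item())

    def is_useful(prefix, rationale, future_text, margin=0.0):
        with_rationale = continuation_ppl(prefix + f"<BOT>{rationale}<EOT>. ", future_text)
        without_rationale = continuation_ppl(prefix, future_text)
        return with_rationale + margin < without_rationale       # useful for future text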

(2) Train world model to generate rationales given preceding context (a sketch of the training pairs follows below)

Input: Question: Natalia sold clips… Answer: Natalia sold 48/2 = <<48/2=24>>24 clips in May.
Target: <BOT>Now we should calculate the sum of clips in April and May<EOT>
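A minimal sketch of how the (context → rationale) training pairs for this step could be assembled (field names are assumed, not the authors' data format):

    # Turn each filtered rationale into a (preceding context -> rationale) pair
    # for supervised fine-tuning of the world model.
    def make_training_pairs(examples):
        """examples: dicts with 'question', 'answer_prefix' (text before the
        rationale insertion point) and 'rationale' (the filtered rationale text)."""
        pairs = []
        for ex in examples:
            context = f"Question: {ex['question']} Answer: {ex['answer_prefix']}"
            target = f"<BOT>{ex['rationale']}<EOT>"
            pairs.append({"prompt": context, "completion": target})
        return pairs

    # The pairs can then be used with any standard causal-LM fine-tuning loop,
    # computing the loss on the completion tokens only.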

(3) Supervise agent model during inference (a sketch of one possible mechanism follows below)

Question: Weng earns $12 an hour for babysitting. Yesterday, she just did 50 minutes of babysitting. How much did she earn?

Accepted candidate ✔ (perplexity = 10):
Answer: Weng earns 12/60 = $<<12/60=0.2>>0.2 per minute …
World-model rationale: <BOT>We multiply the rate per minute by the number of minutes she worked<EOT>

Rejected candidate ✗ (perplexity = 1000):
Answer: Weng earns 12/60 = $<<12/60=0.2>>0.2 per hour …
World-model rationale: <BOT>We divide the amount of money by 60 and multiply by the number of minutes she worked<EOT>
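One possible reading of this step, sketched below (this is an assumption about the mechanism, not the authors' exact procedure: the world model proposes a rationale for the current context, and candidate continuations from the agent model are ranked by their perplexity given that rationale):

    import math
    import torch

    def supervise_step(world_model, scorer, tok, context, candidates):
        """world_model / scorer: HF-style causal LMs; candidates: agent proposals."""
        # 1) The world model writes a rationale for what should happen next.
        prompt = tok(context + "<BOT>", return_tensors="pt").input_ids
        out = world_model.generate(prompt, max_new_tokens=48)
        rationale = tok.decode(out[0, prompt.shape[1]:], skip_special_tokens=True)

        # 2) Each candidate is scored conditioned on context + rationale.
        def ppl(prefix, cont):
            ids = tok(prefix + cont, return_tensors="pt").input_ids
            n = tok(prefix, return_tensors="pt").input_ids.shape[1]
            labels = ids.clone()
            labels[:, :n] = -100
            with torch.no_grad():
                return math.exp(scorer(ids, labels=labels).loss.item())

        prefix = context + f"<BOT>{rationale}<EOT> "
        best = min(candidates, key=lambda c: ppl(prefix, c))
        return rationale, best      # lowest-perplexity candidate wins (cf. 10 vs. 1000 above)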
Inference Results

❖ Our method generates accurate and easy-to-understand rationales during inference time
❖ Takeaways from the inference results:
➢ Our method performs better than the baseline
➢ The perplexity-based heuristic works better than the steps-based heuristic
➢ Adding rationales doesn't make the performance better

Conclusion and Future Work

❖ We created a world model in an unsupervised way
➢ It provides procedural feedback during inference time for the agent model, surpassing other baselines
➢ It also provides human-readable rationales that enhance the explainability of LLM generation during reasoning
❖ In the future, we want to:
➢ Train a bigger, stronger world model
➢ Test on more reasoning and planning tasks
➢ Conduct more analysis on the effect of the world model

[QR code captioned "… of our prettiest group member"]
Just kidding, the QR code is actually our Overleaf draft :)
