Towards World Model for Reasoning

Dongwei Jiang, Guoxuan Wang, Chuyu Liu, Daniel Khashabi


{djiang21, gwang69, cliu212}@jhu.edu

Motivation

❖ LLMs have greatly advanced as general-purpose agent models
❖ However, in the absence of well-structured environments, LLMs face serious issues such as hallucination and inadequate planning
❖ Previous efforts in the vision domain have created world models that simulate the physical world (Sora, Video Language Planning…)
❖ Constructing world models for more abstract domains, such as reasoning, a key capability of LLMs that supports many advanced applications, has not yet been attempted!

Rationale Filtering

❖ Rationale sampling is conducted on the training set
➢ Filter % means the percentage of rationales left over after filtering
➢ For each dataset, we manually annotate 100 pairs, half of which are positive rationale examples. Rationale precision is calculated on this annotated set
➢ We want our filtered rationales to have a precision of at least 95% (a sketch of this evaluation follows below)
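A minimal sketch of the precision evaluation described above (not the authors' code; the annotation format and the keep_rationale filter are assumed placeholders):

    # Estimate rationale precision and "Filter %" on the manually annotated pairs.
    def rationale_precision(annotated_pairs, keep_rationale):
        """annotated_pairs: list of (rationale, context, is_positive) triples,
        e.g. the 100 labeled pairs per dataset, half of them positive."""
        kept = [(r, c, y) for (r, c, y) in annotated_pairs if keep_rationale(r, c)]
        if not kept:
            return 0.0, 0.0
        precision = sum(1 for (_, _, y) in kept if y) / len(kept)
        filter_pct = len(kept) / len(annotated_pairs)  # share of rationales left over
        return precision, filter_pct

    # The filtering threshold would be tuned until precision stays at or above 0.95.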

Method

(1) Sample and filter rationales with powerful LLMs (a sketch of this step follows below)

Original training example:
Question: Natalia sold clips to 48 … Answer: Natalia sold 48/2 = <<48/2=24>>24 clips in May. Natalia sold 48+24 = <<48+24=72>>72 clips altogether in April and May. #### 72

Sampled rationale that is useful for future text:
Question: … Answer: Natalia sold 48/2 = <<48/2=24>>24 clips in May. <BOT>Now we should calculate the sum of clips in April and May<EOT>. Natalia sold…

Sampled rationale that is useless for future text:
Question: … Answer: Natalia sold 48/2 = <<48/2=24>>24 clips in May. <BOT>Now we need to calculate the number of clips sold in May<EOT>. Natalia sold…
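A minimal sketch of the perplexity-based usefulness filter illustrated above (assumptions, not the authors' code: GPT-2 is only a stand-in scoring model, and the margin is a hypothetical hyperparameter):

    # Keep a sampled rationale only if it is useful for future text, i.e. it lowers
    # the perplexity of the continuation that follows the insertion point.
    import math
    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tok = AutoTokenizer.from_pretrained("gpt2")                  # stand-in scorer
    lm = AutoModelForCausalLM.from_pretrained("gpt2").eval()

    def continuation_ppl(prefix: str, continuation: str) -> float:
        """Perplexity of `continuation` given `prefix` under the scoring LM."""
        ids = tok(prefix + continuation, return_tensors="pt").input_ids
        n_prefix = tok(prefix, return_tensors="pt").input_ids.shape[1]
        labels = ids.clone()
        labels[:, :n_prefix] = -100                              # score only the continuation
        with torch.no_grad():
            loss = lm(ids, labels=labels).loss
        return math.exp(loss.item())

    def is_useful(prefix, rationale, future_text, margin=0.0):
        with_rationale = continuation_ppl(prefix + f"<BOT>{rationale}<EOT>. ", future_text)
        without_rationale = continuation_ppl(prefix, future_text)
        return with_rationale + margin < without_rationale       # useful for future text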

(2) Train world model to generate rationales given preceding context (a sketch of the training pairs follows below)

Input: Question: Natalia sold clips… Answer: Natalia sold 48/2 = <<48/2=24>>24 clips in May.
Target: <BOT>Now we should calculate the sum of clips in April and May<EOT>
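A minimal sketch of how the (context → rationale) training pairs for this step could be assembled (field names are assumed, not the authors' data format):

    # Turn each filtered rationale into a (preceding context -> rationale) pair
    # for supervised fine-tuning of the world model.
    def make_training_pairs(examples):
        """examples: dicts with 'question', 'answer_prefix' (text before the
        rationale insertion point) and 'rationale' (the filtered rationale text)."""
        pairs = []
        for ex in examples:
            context = f"Question: {ex['question']} Answer: {ex['answer_prefix']}"
            target = f"<BOT>{ex['rationale']}<EOT>"
            pairs.append({"prompt": context, "completion": target})
        return pairs

    # The pairs can then be used with any standard causal-LM fine-tuning loop,
    # computing the loss on the completion tokens only.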

(3) Supervise agent model during inference (a sketch of one possible mechanism follows below)

Question: Weng earns $12 an hour for babysitting. Yesterday, she just did 50 minutes of babysitting. How much did she earn?

Accepted candidate ✔ (perplexity = 10):
Answer: Weng earns 12/60 = $<<12/60=0.2>>0.2 per minute …
World-model rationale: <BOT>We multiply the rate per minute by the number of minutes she worked<EOT>

Rejected candidate ✗ (perplexity = 1000):
Answer: Weng earns 12/60 = $<<12/60=0.2>>0.2 per hour …
World-model rationale: <BOT>We divide the amount of money by 60 and multiply by the number of minutes she worked<EOT>
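One possible reading of this step, sketched below (this is an assumption about the mechanism, not the authors' exact procedure: the world model proposes a rationale for the current context, and candidate continuations from the agent model are ranked by their perplexity given that rationale):

    import math
    import torch

    def supervise_step(world_model, scorer, tok, context, candidates):
        """world_model / scorer: HF-style causal LMs; candidates: agent proposals."""
        # 1) The world model writes a rationale for what should happen next.
        prompt = tok(context + "<BOT>", return_tensors="pt").input_ids
        out = world_model.generate(prompt, max_new_tokens=48)
        rationale = tok.decode(out[0, prompt.shape[1]:], skip_special_tokens=True)

        # 2) Each candidate is scored conditioned on context + rationale.
        def ppl(prefix, cont):
            ids = tok(prefix + cont, return_tensors="pt").input_ids
            n = tok(prefix, return_tensors="pt").input_ids.shape[1]
            labels = ids.clone()
            labels[:, :n] = -100
            with torch.no_grad():
                return math.exp(scorer(ids, labels=labels).loss.item())

        prefix = context + f"<BOT>{rationale}<EOT> "
        best = min(candidates, key=lambda c: ppl(prefix, c))
        return rationale, best      # lowest-perplexity candidate wins (cf. 10 vs. 1000 above)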
Inference Results

❖ Our method generates accurate and easy-to-understand rationales during inference time
❖ Takeaways from the inference results:
➢ Our method performs better than the baseline
➢ The perplexity-based heuristic works better than the steps-based heuristic
➢ Adding rationales doesn't make the performance better

Conclusion and Future Work

❖ We created a world model in an unsupervised way
➢ It provides procedural feedback during inference time for the agent model, surpassing other baselines
➢ It also provides human-readable rationales that enhance the explainability of LLM generation during reasoning
❖ In the future, we want to:
➢ Train a bigger, stronger world model
➢ Test on more reasoning and planning tasks
➢ Conduct more analysis on the effect of the world model

[QR code captioned "… of our prettiest group member"]
Just kidding, the QR code is actually our Overleaf draft :)
