Green Wizards

green

Uploaded by

juliemaey727

Green Wizards

Math Reasoning

Step-by-Step

Task Specifications

Table of Contents

Project Overview

Task Attempt Workflow

Task Specifications

Step 1: Write a Prompt (math problem)

Step 1a: Solve the Math Problem

Step 1b: Keep Prompting the Model until it Produces an Incorrect Response

Step 2: Rate the Correctness of Each Step

Step 2a: Write a rationale for incorrect steps

Step 3: Rewrite Incorrect Steps

Step 4: Regenerate

Step 5: Summarize

APPENDIX

LaTeX Guidelines

Rubrics

Helpful Links

Project Overview

Welcome to Green Wizards Math Reasoning Step-by-Step! This project aims to enhance advanced
models by developing and evaluating complex user prompts. For each pair of skills, generate prompts
that integrate the skills in a sophisticated way. Prompts should be challenging enough that [latest
models] are unlikely to produce accurate responses. Model outputs will be annotated to identify and
correct errors.

Key Terminology:

• Prompt: A question or statement designed to elicit a response from the model, incorporating
two complex skills.

• Model Response: The model’s step-by-step answer to the prompt, used for evaluation.
• Annotation: The process of reviewing the model’s response for accuracy, relevance to the
prompt, and proper skill use, while noting any reasoning errors.

• Error Identification: Detecting reasoning flaws or gaps in the model’s application of skills.

• Error Correction: Fixing identified errors and providing clear explanations for the corrections.

• Teaching Material: If necessary, developing educational resources to explain and address
reasoning errors.

Task Attempt Workflow

1. PROMPT - Write and Solve an Elegant Math Problem

• Select two Skills to incorporate into a prompt (you must use both)

• Write a prompt that can stump the model (causing it to make a reasoning error)

• Make sure your problem can be solved

2. RESPONSE - Rate the Correctness of Each Step

• Check the model’s response, determining whether each step is correct or incorrect

• Determine whether the step should be solved using the LLM (text-based reasoning) or Python
(code-based computation).

• If the response does not have any reasoning errors, repeat Step 1.

• Write a brief justification explaining the error

3. REWRITING - Rewrite incorrect steps

• Correct the model's reasoning by rewriting any incorrect steps

• Write a brief justification explaining the changes

4. REGENERATE - Repeat the process until the Final Answer is reached

• Regenerate the response after each correction, restarting the model's train of thought

• Repeat Steps 2-4 until the model arrives at the correct answer

5. SUMMARIZE - Summarize the task using the checkboxes and short responses

• Accuracy - Does the original model response (BEFORE REWRITES) make an error?

• Instruction Following - Does the response AFTER REWRITES follow all prompt instructions?

• Skills Check - Does the response AFTER REWRITES utilize both of the skills selected in step 1?

• First Incorrect Step - What was the first incorrect step with a reasoning error made by the
model?

• Skills Gap - What Skill(s) does the model fail at or not include in the response?
Task Specifications

Step 1: Write a Prompt (math problem)

In this step, you will write a math problem for the model to solve. Problems should adhere to the
specifications below.

Specifications list

1. Math problems must lead to an error in reasoning in at least one of the steps in the model’s
initial response (i.e., avoid problems where ...).

2. Math problems must use proper LaTeX formatting for all mathematical expressions.

3. Prompts must be original.

4. Math problems must be solvable

1. Avoid problems that don't contain the information necessary to solve them

2. Avoid problems with impossible scenarios, for example:

1. Dividing by zero

2. Asking for the square root of a negative number if the domain is real numbers

3. Avoid problems containing terms, concepts, theorems, or lemmas that do not adhere to
mathematical rules

5. Math problems must have only one correct solution

1. Avoid problems that have many potential answers

2. Answers must be either a number or a mathematical expression

3. Answers should be readable by a tool such as Python or Wolfram Alpha

6. There must only be one answer to the prompt

1. A question like “find all the roots of this quadratic x^2 + 5x + 6” is not a valid prompt because it
would result in two answers, -2 and -3

2. A better question to use is “what is the minimum of the roots of this quadratic x^2 + 5x + 6”
because the ground truth final answer is -3 only.

7. Prompts must not ask for proofs

1. Prompts can only ask questions that result in an answer that can be machine verified

2. This is an example of a prompt that would fail:

In a regular hexagon, three diagonals are drawn, intersecting at the center of the hexagon. The diagonals
divide the hexagon into $6$ congruent triangles, $3$ of which are colored red and the remaining $3$ are
colored blue. Prove that there exists a coloring of the triangles such that it is possible to draw a line that
passes through the center of the hexagon and divides it into two parts that are each one color.

8. Prompts must not ask multiple questions in the problem statement

1. Prompts must only ask for one question and one answer

2. This is an example of a prompt that would fail:

Given that $\sin \theta = \frac{1}{3}$, find the value of $\frac{\cos^3 2\theta}{\cos^2 3\theta}$,
considering only the principal value of $\theta$. Check if the resulting expression is a perfect cube, and
identify any degenerate cases for the values of $\theta$.

9. Math problems must be sufficiently complex, but not contrived

1. Problems should be complex enough to stump the model but also not contrived, meaning that
the problems should reflect realistic scenarios/asks

| Category | Example | Explanation |
| --- | --- | --- |
| Good Example | Given a circle with a radius of r, a square is inscribed inside the circle. Calculate the area of the region outside the square but inside the circle. | This is a good example because it integrates geometry concepts (circle and square) and requires the model to compute areas of both shapes and then find the difference. It challenges the model to perform multiple steps and apply different geometric formulas, making it complex enough to potentially stump the model. |
| Bad Example | What is the area of a rectangle with a length of 5 units and a width of 3 units? | This problem is too simple for the task requirements. It only requires a basic application of the area formula for rectangles ($\text{Area} = \text{length} \times \text{width}$) and does not challenge the model's problem-solving abilities. It lacks complexity and is not suitable for evaluating the model’s capacity to handle more advanced tasks. |
| Bad Example | What is the sum of the first 10 positive integers? Also, find the area of a triangle with a base of 5 units and a height of 7 units. | This problem is unsuitable because it includes two separate math problems in one prompt, which goes against the requirement that a math problem should have only one correct solution. Both problems individually are also too simple. |

10. Math problems should not contain any spelling, grammar, or formatting errors
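
Specifications 5 and 6 ask for a single, machine-readable answer. As a rough illustration, the Python sketch below computes the roots of the quadratic from specification 6 and then reduces them to the one value that the better prompt asks for. The helper name `real_roots` is an assumption made for this example and is not part of the project tooling.

```python
import math
from typing import List

def real_roots(a: float, b: float, c: float) -> List[float]:
    """Return the distinct real roots of a*x^2 + b*x + c = 0 in ascending order."""
    disc = b * b - 4 * a * c
    if disc < 0:
        return []  # no real roots
    sqrt_disc = math.sqrt(disc)
    # A set removes the duplicate when the discriminant is zero.
    return sorted({(-b - sqrt_disc) / (2 * a), (-b + sqrt_disc) / (2 * a)})

roots = real_roots(1, 5, 6)  # "find all the roots": two values, so not a valid prompt
answer = min(roots)          # "minimum of the roots": one ground-truth value
print(roots, answer)
```

A prompt whose answer is a single number like `answer` can be compared directly against the model's final answer, which is what makes it machine verifiable.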

Feel free to find inspiration in this [EXT] Math R2 SFT - Math Problem Bank

Step 1a: Solve the Math Problem

Next, you will solve the math problem and verify the problem satisfies all the constraints above.
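
One way to gain confidence in your own solution is to check it with a second, independent method. The sketch below does this for the Good Example above (the region inside a circle of radius r but outside its inscribed square): it compares the closed-form answer with a coarse grid estimate. The function names and the grid size are assumptions chosen for illustration only.

```python
import math

def region_area_exact(r: float) -> float:
    """Area inside a circle of radius r but outside its inscribed square."""
    # The square's diagonal equals the diameter 2r, so its side is r*sqrt(2)
    # and its area is 2*r^2.
    return math.pi * r**2 - 2 * r**2

def region_area_grid(r: float, n: int = 400) -> float:
    """Independent estimate: count midpoints of an n-by-n grid over [-r, r]^2."""
    half_side = r / math.sqrt(2)
    cell = 2 * r / n
    count = 0
    for i in range(n):
        x = -r + (i + 0.5) * cell
        for j in range(n):
            y = -r + (j + 0.5) * cell
            in_circle = x * x + y * y <= r * r
            in_square = abs(x) <= half_side and abs(y) <= half_side
            if in_circle and not in_square:
                count += 1
    return count * cell * cell

exact = region_area_exact(1.0)
estimate = region_area_grid(1.0)
print(exact, estimate)  # the two values should agree to roughly two decimal places
```

If the two numbers disagree noticeably, either the derivation or the problem statement likely needs another look before the prompt is submitted.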

Step 1b: Keep Prompting the Model until it Produces an Incorrect Response

Next, you will submit the math problem for the model to solve. If the model answers correctly, or only
makes calculation errors, submit a different problem. Keep submitting problems until the model
produces an incorrect response.

IMPORTANT NOTE:

• Your task is to create a prompt that results in at least one reasoning error.

• If your prompt only produces a calculation error, revise it to introduce a reasoning error before
completing the task.

Step 2: Rate the Correctness of Each Step

When you identify a mistake in the model's response, follow these steps:
1. Mark the step as "Incorrect."

2. Explain the mistake and why it is wrong.

3. Rewrite the model’s reasoning with a corrected solution.

4. Select either Reasoning Error or Calculation Error.

Criteria for Incorrect Labeling:

• Incorrect: A clear mathematical error is present.

• Not Incorrect: Issues like poor LaTeX formatting, valid but inefficient solutions, or suboptimal
expressions that are still mathematically correct should not be labeled as incorrect.

When asked to rate whether each step is done correctly or incorrectly, you will see 4 possible step labels:
Incorrect - LLM is appropriate, Correct - LLM is appropriate, Incorrect - Python is appropriate, Correct -
Python is appropriate. These labels are reviewed in more detail below.

The below flow charts show how to think through which of the 4 step labels to choose:

Each step label is designed to address two questions:

1. Is the math in the step correct?

1. This is straightforward: verify if the mathematical operations performed are correct and the
model’s output is accurate.

2. Should this step ideally be solved using the model’s LLM or Python capabilities?

1. This question relates to determining whether a step should be solved using the model's large
language model (LLM) or its Python capabilities (implicit code execution, or ICE).

2. LLMs are better suited for tasks involving reasoning, logic, or proofs. Python is ideal for more
complex calculations or numerical tasks where precision is crucial.
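
The two questions combine into the four step labels. A minimal sketch of that mapping is shown below; the enum and function names are hypothetical and only mirror the label text used in this document.

```python
from enum import Enum

class StepLabel(Enum):
    INCORRECT_LLM = "Incorrect - LLM is appropriate"
    CORRECT_LLM = "Correct - LLM is appropriate"
    INCORRECT_PYTHON = "Incorrect - Python is appropriate"
    CORRECT_PYTHON = "Correct - Python is appropriate"

def choose_label(math_is_correct: bool, python_is_better: bool) -> StepLabel:
    """Question 1 picks Correct/Incorrect; question 2 picks LLM/Python."""
    if python_is_better:
        return StepLabel.CORRECT_PYTHON if math_is_correct else StepLabel.INCORRECT_PYTHON
    return StepLabel.CORRECT_LLM if math_is_correct else StepLabel.INCORRECT_LLM

# A step with wrong math that is better suited to code execution:
print(choose_label(math_is_correct=False, python_is_better=True).value)
```

The two answers are independent: a step can be mathematically correct and still be one that would ideally have been solved with Python.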

When to Use LLM or Python

• Select LLM when the step is similar to the following:

Essentially when the math problem requires abstract concept, word, or pattern recognition

• Word problems that require an understanding of natural language

• Pattern recognition, like simple sequences

• Deducing or applying proofs, theorems, propositions, and corollaries where the model relies on
step-by-step reasoning, intuition, and context

• Abstract math, such as measure theory (e.g. prove a certain Lebesgue set is measurable) or
number theory (e.g., determining if an extension of a field has finitely many intermediate fields)

• Theoretical probability (e.g. what is P(A and B) for independent A and B)

• Select Python when the step is similar to the following:

Essentially when the math problem involves a very precise input / output

• Simple variable-based calculus (e.g. differentiate x^2 + 2x + 7)

• Applying number tests that don’t rely on large computation, e.g. calculating a limit or applying a
ratio test or root test

• Manipulating or isolating quantifiers or variables, where a solution can be calculated (e.g.
isolate y in 2x + 3 = 7y; or substitute y = 3z+3 when x = 10+7y)

• Basic arithmetic with small numbers (under 100, e.g. 25 + 37)

• Algebra or solving for variables (e.g. solve for x in 3x+7=3)

• Precise numerical computation or complex arithmetic, where there are large numbers or multi-
step calculations that might introduce errors when using an LLM (e.g., what is
938193 x 93189318)

• Trigonometry and evaluation of calculus concepts (e.g. find the quadratic roots of x^2 - 7x + 4;
or evaluate cos(x)^2 + sin(x^2) at x = 0)

• Matrix operations or calculations involving vectors (e.g. find a vector orthogonal to v1 = [3,0,2];
or multiply M_1 and M_2 where each is a matrix; or find the determinant of M_3)
• Applied probability and statistics, such as calculating standard deviations or regressions, (e.g.
what is the standard deviation of this dataset; or binomial distribution, such as what is the
probability of getting exactly 8 heads in 15 coin flips, assuming the coin is fair?)

• Distributions or calculations that require calculating areas under a curve, evaluating an
integral, or volume (e.g. what is the probability that a normally distributed variable with mean
100 and standard deviation 15 falls between 90 and 120?; or, “what is the volume of this solid
of revolution (3x+1)^0.5 at these boundaries …”)

• Numerical approximations or evaluating series (e.g. a Monte Carlo approximation, a binomial
distribution, applying Simpson’s rule or a Stirling approximation)
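
To make the "precise input / output" idea concrete, the sketch below runs three of the examples from the Python list using only the standard library. It is an illustration of why such steps suit code execution, not a prescribed procedure.

```python
import math
from statistics import NormalDist

# Large-integer multiplication: Python integers have arbitrary precision,
# so the product is exact rather than approximated.
product = 938193 * 93189318

# Probability of exactly 8 heads in 15 flips of a fair coin:
# C(15, 8) favorable sequences out of 2^15 equally likely ones.
p_eight_heads = math.comb(15, 8) / 2**15

# Probability that a normal variable with mean 100 and standard deviation 15
# falls between 90 and 120, computed as a difference of CDF values.
dist = NormalDist(mu=100, sigma=15)
p_between = dist.cdf(120) - dist.cdf(90)

print(product)
print(round(p_eight_heads, 6))
print(round(p_between, 4))
```

Each of these has a single numeric result that is easy to get slightly wrong by hand, which is the situation the Python label is meant to capture.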

Step 2a: Write a rationale for incorrect steps

In this part of the task, you will provide a rationale for any steps that were rated as Incorrect.

Specifications List

1. Rationales should clearly explain why the step is incorrect

2. Rationales should use proper spelling and grammar

3. Rationales do not require LaTeX. If you do use LaTeX, the LaTeX should follow the style guide.

4. Rationales should not contain “model”, “AI”, “LLM”, etc.

Step 3: Rewrite Incorrect Steps

Rewriting Guidelines

• Maintain Sequential Integrity: Ensure that your rewrite preserves the logical order of the original
reasoning and includes the same level of detail.

• Clarity in Rewriting: Rewrite the step clearly and concisely, using simple language that avoids
jargon. The rewrite should be self-contained and understandable, following a logical sequence of
reasoning.

• Focus on Problem-Solving: Only include information necessary to solve the problem. Do not add
extraneous details like definitions of basic concepts.

Types of Errors

1. Calculation Errors: Simple arithmetic or operational mistakes (e.g., incorrect multiplication).

2. Reasoning Errors: Mistakes in understanding the process or logic behind a solution.

For this project, each response must include at least one reasoning error. If the only error is a calculation
mistake, modify the prompt to create a reasoning error.

Examples of Errors
Example 1: Rectangle Area (Reasoning Error)

• Problem: A rectangle’s length is doubled, and its width is increased by 3 meters. What is the new
area?

• Incorrect Answer: Adds the dimensions instead of multiplying.

• Error: Misunderstanding the formula for area.

Example 2: Car Rental (Calculation Error)

• Problem: How much will it cost to rent a car for 3 days and drive 150 miles?

• Incorrect Answer: Correct understanding, but miscalculates the total cost.

• Error: Simple arithmetic mistake.
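
Example 1 can be made concrete with numbers. The starting dimensions below (5 m by 2 m) are hypothetical, since the example above does not give any; the point is only to show how choosing the wrong operation differs from a slip in arithmetic.

```python
# Hypothetical starting dimensions, chosen only for illustration.
length_m, width_m = 5.0, 2.0

new_length = 2 * length_m  # the length is doubled
new_width = width_m + 3    # the width is increased by 3 meters

correct_area = new_length * new_width     # area = length * width
reasoning_error = new_length + new_width  # wrong operation: adds the dimensions

print(correct_area, reasoning_error)
```

A calculation error, by contrast, would use the multiplication above but report the wrong product.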

Step 4: Regenerate

Repeat the process until Final Answer is reached

• Regenerate the response after each correction, restarting the model's train of thought

• Repeat Steps 2-4 until the model arrives at the correct answer
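
The regenerate cycle can be pictured as a loop over the accepted steps. The sketch below is only a mental model: `generate`, `find_first_incorrect`, and `rewrite_step` are hypothetical stand-ins for the model, the annotator's review, and the annotator's rewrite, and the toy stubs exist solely to make the example run.

```python
from typing import Callable, List, Optional

def regenerate_until_correct(
    generate: Callable[[List[str]], List[str]],
    find_first_incorrect: Callable[[List[str]], Optional[int]],
    rewrite_step: Callable[[List[str], int], str],
    max_rounds: int = 10,
) -> List[str]:
    """Keep the steps up to the first error, fix that step, then regenerate the rest."""
    accepted: List[str] = []
    for _ in range(max_rounds):
        steps = accepted + generate(accepted)  # model continues from accepted steps
        bad = find_first_incorrect(steps)      # Step 2: rate each step
        if bad is None:
            return steps                       # final answer reached
        # Step 3: keep everything before the error and replace the bad step.
        accepted = steps[:bad] + [rewrite_step(steps, bad)]
    raise RuntimeError("no fully correct response within max_rounds")

# Toy stubs (hypothetical): the "model" gets the second step wrong on its first try.
CORRECT = ["s1", "s2", "s3"]

def toy_generate(prefix: List[str]) -> List[str]:
    return ["s1", "BAD", "s3"] if not prefix else CORRECT[len(prefix):]

def toy_review(steps: List[str]) -> Optional[int]:
    return next((i for i, s in enumerate(steps) if s != CORRECT[i]), None)

def toy_rewrite(steps: List[str], index: int) -> str:
    return CORRECT[index]

result = regenerate_until_correct(toy_generate, toy_review, toy_rewrite)
print(result)
```

In the real task the review and rewrite are done by hand; the loop only shows the order in which Steps 2-4 repeat.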

Step 5: Summarize

In this final step, you should summarize the task using the checkboxes and short responses, according to
the following dimensions:

• Accuracy

• Does the original model response (BEFORE REWRITES) make an error?

• Instruction Following

• Does the response AFTER REWRITES follow all prompt instructions?

• Skills Check

• Does the response AFTER REWRITES utilize both of the skills selected in step 1?

• First Incorrect Step

• What was the first incorrect step with a reasoning error made by the model?

• Skills Gap

• What Skill(s) does the model fail at or not include in the response?
APPENDIX

LaTeX Guidelines

Please refer to this guide for the LaTeX formatting guidelines.

All prompts and rewrites should be written in proper Single $ LaTeX.
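
As a small illustration, and assuming "Single $" means inline math delimited by one dollar sign on each side (as in the example prompts earlier in this document), a prompt and the key step of its solution might be written as follows. The linked style guide remains the authority on formatting details.

```latex
Given that $\sin \theta = \frac{1}{3}$, find the value of $\cos 2\theta$.

Using $\cos 2\theta = 1 - 2\sin^2 \theta$, we get
$\cos 2\theta = 1 - 2 \cdot \frac{1}{9} = \frac{7}{9}$.
```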

If you are unfamiliar with LaTeX:

• Refer to the style guides that are linked in the task.

• Feel free to use an LLM to help you. Make sure to:

• Ask it to write the expression in Single $ LaTeX

• Only ask for help writing the LaTeX, as opposed to asking for help solving the problem.

• Remember that the impetus for this project is that LLMs are often very wrong at math. We don't
want an incorrect solution from an LLM biasing your own solution!

To help catch mistakes and ensure that your responses are accurate in mathematical calculations, we
highly recommend using a dedicated math-solving or calculation verification tool.

• WolframAlpha

• Great for general math problems, calculus, and symbolic computation.

• Desmos

• Perfect for graphing and exploring functions interactively.

• Symbolab

• Useful for step-by-step solutions in calculus, algebra, and more.

• GeoGebra

• A powerful tool for dynamic geometry, algebra, calculus, and statistics.

By using one of these tools, you can reduce errors and enhance the precision of mathematical answers.

You only need to install one of the following extensions:

Quillbot:

• Google Chrome

• Microsoft Edge

Grammarly:

• Google Chrome

LanguageTool:
• Safari

• Firefox

• Google Chrome

Python vs. LLM Guidance (labeling calculation and reasoning errors)

When to Use LLM

| When to Use LLM | What It Involves | Example |
| --- | --- | --- |
| Abstract concepts, word, or pattern recognition | Recognizing complex ideas, abstract patterns, or word problems | Identifying trends in a sequence like Fibonacci or interpreting a complex word problem. |
| Word problems | Understanding and interpreting natural language scenarios | "If a car travels 100 miles in 2 hours, how fast is it traveling?" The model must understand the wording and context to form a correct solution. |
| Pattern recognition | Recognizing or predicting logical patterns or sequences | Identifying the next number in a sequence, like 2, 4, 8, 16 (powers of 2). |
| Proofs, theorems, propositions, and corollaries | Logical reasoning required to prove theorems or deductions | Proving that the square root of 2 is irrational through logical steps rather than computational output. |
| Abstract math | Handling complex, theoretical math not reducible to steps | Proving that a set is measurable using Lebesgue measure or determining properties in advanced number theory. |
| Theoretical probability | Explaining theoretical reasoning in probability scenarios | Explaining the theoretical probability of two independent events, such as "What is P(A and B) if P(A) = 0.3 and P(B) = 0.5?" |

When to Use Python

| When to Use Python | What It Involves | Example |
| --- | --- | --- |
| Basic arithmetic | Performing basic arithmetic operations | Adding, subtracting, multiplying, or dividing numbers under 100, such as 25 + 37. |
| Algebra or solving for variables | Solving algebraic equations computationally | Solving for x in an equation like x^2 + 4x + 1 = 0. Isolating y in the equation xy = y + 1. |
| Function evaluation | Defining and evaluating a given function | Computing f(x) = x^2*sin(x) for a range of values. Computing subsequent values in a sequence such as Fibonacci. |
| Manipulating or isolating variables | Manipulating equations computationally | Solving for y in an equation like 2x + 3 = 7yx + 3, or substituting values into an equation like y = 3z + 3w when z = 10 + 7y, w = 10 + 7yx. |
| Trigonometry | Solving trigonometric problems computationally | Computing arctan(1.5). Using the Law of Cosines to find the length of a side in a triangle. |
| Computational calculus | Performing calculus operations like differentiation, integration, or limits | Differentiating functions like x^3 + 7x^2 + 2x. Calculating the area under a probability density function to determine probabilities. Computing volumes using integrals. Finding Taylor series. |
| Applying number tests | Running computational tests to check limits or convergence | Calculating the limit of a function. Series tests for convergence. |
| Numerical computation | Approximations and other intensive numerical computations | Solving e^x = x^4 using Newton’s Method. Numerical integration using Simpson’s Method. Monte Carlo simulations. |
| Matrix operations or vector calculations | Solving matrix or vector-related problems | Determining if a set of vectors is linearly independent. Calculating determinants. Solving a system of equations. |
| Applied probability and statistics | Performing statistical or probability calculations | Determining the standard deviation of a dataset. Calculating the probability of exactly 8 heads in 15 coin flips. Hypothesis testing. |
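
As an illustration of the "Numerical computation" row, the sketch below applies Newton's Method to e^x = x^4 by finding a zero of f(x) = e^x - x^4. The starting guess of 1.5, the tolerance, and the function names are assumptions made for this example; the equation has more than one real solution, and a different starting guess may converge to a different one.

```python
import math
from typing import Callable

def newton(f: Callable[[float], float], df: Callable[[float], float],
           x0: float, tol: float = 1e-12, max_iter: int = 50) -> float:
    """Iterate x <- x - f(x)/f'(x) until the update is smaller than tol."""
    x = x0
    for _ in range(max_iter):
        step = f(x) / df(x)
        x -= step
        if abs(step) < tol:
            return x
    raise RuntimeError("Newton's method did not converge")

def f(x: float) -> float:
    return math.exp(x) - x**4

def df(x: float) -> float:
    return math.exp(x) - 4 * x**3

root = newton(f, df, x0=1.5)
print(root, f(root))  # f(root) should be very close to zero
```

Checking that `f(root)` is near zero is a quick way to confirm the result, independent of how the root was found.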

Reasoning Error Identification Guidelines

Incorrect Reasoning

| Category | Error | Description |
| --- | --- | --- |
| Common Sense | Incorrect Common Sense | The model cites some common sense that is incorrect. |
| Logical Reasoning | Gaps in logical reasoning steps | The model skips certain steps important to the logical flow of the solution. In other words, the model response is missing a crucial step, which results in a nontrivial logical leap. |
| Logical Reasoning | Circular Reasoning | The model's conclusion is simply a restatement of the premise, offering no new insight or logical progression. |
| Logical Reasoning | Inconsistent Reasoning | Providing conflicting or contradictory information within the same response. |
| Analytical Thinking | Incorrect decomposition | Mistake in decomposing the problem into smaller pieces. |
| Deductive Reasoning | Unsupported generalizations | Drawing broad conclusions from insufficient reasoning or insufficient observations. |
| Inductive Reasoning | Incorrect Assumption | The model makes an assumption or a general rule that is incorrect, inappropriate, or unnecessary. |
| Comparative Analysis | Faulty Comparison | The model makes comparisons to the wrong targets that do not help with the core arguments. |
| Causal Reasoning | False Causality | The model incorrectly assumes that one event caused another simply because they occurred together. Correlation doesn't always equal causation. |
| Causal Reasoning | Weak Causality | The model concludes an effect from insufficient, inconclusive, or weak causes. |
| Pattern Recognition | Incorrect Pattern | The model finds an incorrect pattern or regularity from some given observations. |
| Statistical Reasoning | Incorrect Conclusion from Statistics | The model uses the statistical tools correctly, but draws a wrong and unsupported conclusion. |
| Statistical Reasoning | Wrong Statistics | The model cites the wrong statistical tool for the current problem. |
| Temporal Reasoning | Faulty temporal reasoning | The model reasons incorrectly based on a misunderstanding of time. |
| Abstract Thinking | Incorrect Abstraction | Proposing a level of abstraction for solving the problem that is not appropriate or accurate. |

Incorrect Formula

| Error | Description |
| --- | --- |
| Misstated Formula | Correct name of the formula, theorem, lemma, etc., but wrong form. |
| Incorrect usage of the formula | Correct name and form of the formula, theorem, lemma, etc., but wrong place to use it. |
| Wrong corresponding formula | Correct form of the formula, theorem, lemma, etc., but wrong name. |

Incorrect Calculation

| Error | Description |
| --- | --- |
| Arithmetic Errors | Basic calculation mistakes. Value substitution errors and term simplification errors. |
| Order of Operations errors | Incorrect application of PEMDAS/BODMAS, leading to incorrect results. |
| Incorrect Rounding | Inaccurate rounding. |
| Incorrect Unit Errors | Mishandling units of measurement. |
| Incorrect Use of Signs | Missing sign, or the sign of the expression is flipped. |

Rubrics

• Prompt Grading Rubric

• This is how your prompts will be graded by Reviewers:

| Criterion | 1-2 (Fail) | 3 (Okay) | 4-5 (Good/Perfect) |
| --- | --- | --- | --- |
| Skill Integration | Skills are mentioned superficially and do not play a meaningful role in the response, leading to incomplete or trivial answers. | Skills are included but not fully leveraged, leading to only moderate or surface-level application. | Skills are deeply integrated, and the response requires each skill to be applied in a meaningful and non-trivial way. |
| Prompt Complexity | The prompt is too simple, allowing the model to provide a correct or trivial response without significant reasoning. | The prompt has some complexity but may still be straightforward or easy to answer. | The prompt is sufficiently complex, requiring significant reasoning and analysis, preventing a simple or trivial solution. |
| Prompt Clarity & Specificity | The prompt is unclear, overly vague, or missing essential details, making it difficult to follow or answer accurately. | The prompt is mostly clear but may allow for multiple reasonable interpretations. | The prompt is clear, specific, and leaves little room for misinterpretation, requiring no more than one minor assumption. |
| Feasibility | The prompt is impractical or impossible to answer within the model's capabilities, or contains contradictory instructions. | The prompt is verging on impractical, but can still be answered with concessions or partial fulfillment. | The prompt is fully actionable within the model's capabilities, with no conflicting or contradictory instructions. |

• Response Grading Rubric

• This is how your response process will be graded by Reviewers:

| Criterion | 1-2 (Fail) | 3 (Okay) | 4-5 (Good/Perfect) |
| --- | --- | --- | --- |
| Correction Quality | The correction introduces further errors, is vague or irrelevant, and fails to improve the response’s coherence or accuracy. The revised response performs worse across evaluation criteria. | The correction addresses some mistakes but may leave subtler issues unaddressed. While clearer, the response remains incomplete or imprecise and performs the same as the original. | The correction is accurate and fully resolves errors, making the response coherent, logical, and aligned with the prompt. The response improves overall in quality and performance. |
| Accuracy | Contains one or more major factual errors, or multiple minor errors/misleading points. | Contains up to two minor factual errors or misleading statements. | Contains no major factual errors, with only one minor error or misleading statement. |
| Instruction Following / Response Fulfillment | Misses one or more explicit instructions or does not fully address the prompt. | Follows most instructions but may subjectively miss some aspects of fully answering the question. | All explicit instructions are followed, and the response fully addresses the prompt. |
| Unnecessary Greetings / Pleasantries | Includes unnecessary greetings like "I'd love to help you" or “Anything else I can assist with?” | N/A | No unnecessary greetings or pleasantries at the beginning or end. |
| Depth / Nuance | Overly simplistic, lacking meaningful detail or depth. | Provides useful information but may need more detail, or includes excessive, distracting detail. | Well-balanced, insightful, with appropriate depth and nuance. |

Helpful Links

• Calculator.Net - Includes a number of useful calculators including quadratic formula, LCM, GCF,
prime factorization, permutations, combinations, triangles, volume, hex, and much more.

• GeoGebra - Desmos but with more functionality for geometry. Would recommend testing
outside of tasking hours to get familiarity with the tool as there is a learning curve

300 pages
PM - L5 - SP2 - Learner WorkBook
No ratings yet
PM - L5 - SP2 - Learner WorkBook
42 pages
Mediated Memories in The Digital Age 1st Edition Jose Van Dijck Instant Download
No ratings yet
Mediated Memories in The Digital Age 1st Edition Jose Van Dijck Instant Download
56 pages
Final Monsoon Report 2015 Punjab
No ratings yet
Final Monsoon Report 2015 Punjab
31 pages
Exetastai-The Discourses of Identity in Hellenistic Erythrai
100% (1)
Exetastai-The Discourses of Identity in Hellenistic Erythrai
34 pages
Loctite PC 9462 en GL
No ratings yet
Loctite PC 9462 en GL
7 pages
RRL
No ratings yet
RRL
20 pages
Punjab PET Syllabus
No ratings yet
Punjab PET Syllabus
4 pages
Nonlinear Dynamics and Machine Learning For Roboti
No ratings yet
Nonlinear Dynamics and Machine Learning For Roboti
23 pages
Unit2.5 Compoundsand Solutions
No ratings yet
Unit2.5 Compoundsand Solutions
17 pages
Overview Schedule of Weighted Assessment 2025
No ratings yet
Overview Schedule of Weighted Assessment 2025
2 pages
Safari 8
No ratings yet
Safari 8
8 pages
PD 11 - 12 Q2 1201 My Goals PS
No ratings yet
PD 11 - 12 Q2 1201 My Goals PS
13 pages
Newton's Laws of Motion at Work Science Presentation in Beige Charcoal Hand Drawn Style
No ratings yet
Newton's Laws of Motion at Work Science Presentation in Beige Charcoal Hand Drawn Style
18 pages
Transcripts
No ratings yet
Transcripts
3 pages
Rokka Archive Translation - Part 2
No ratings yet
Rokka Archive Translation - Part 2
63 pages
Chapter 4 and Appendxeix
No ratings yet
Chapter 4 and Appendxeix
11 pages
Exp - S5 - Vapour Liquid Equilibrium - Corrected
No ratings yet
Exp - S5 - Vapour Liquid Equilibrium - Corrected
6 pages
Memento CRT
No ratings yet
Memento CRT
4 pages
Houston Stuart 2001 PDF
No ratings yet
Houston Stuart 2001 PDF
33 pages