
Moon Instructions

The document outlines the specifications for annotators involved in the Laurelin Moon project, which aims to enhance AI reasoning and response quality. It details a five-step workflow for task attempts, including prompt evaluation, testing, solving, and providing hints to guide the model. Additionally, it includes guidelines for prompt clarity, answerability, and a grading rubric for assessing task quality.

Uploaded by

johnwere742
Copyright
© All Rights Reserved

Laurelin Moon Reasoning

Annotator Specifications
Table of Contents
Project Overview
Task Attempt Workflow
Task Specifications
Step 1: Prompt Set Up
1a: Evaluate “Synthetic Signature” of the original prompt
1b: Evaluate “Answerability” of the original prompt
1c: Prompt Fixing/Submission
Step 2: Test the prompt in the model
Step 3: Solve the Prompt and Note Your Thought Process
Step 4: Provide the Ground Truth Final Answer (GTFA)
Step 5: Write a Hint to Guide the Model
Good & Bad Hint Examples
Appendix
Grading Rubric
Common Errors Examples
Error Categories Table
Useful Tools
Grammar and Spelling Checkers:

Project Overview
Welcome to Laurelin Moon! This project focuses on improving AI reasoning and response
quality by teaching the model different ways to solve the same problem.
Your task is the most crucial part of a series of projects that together produce all the data
required to do this.
It starts with evaluating and fixing the given prompt, testing it with a model, and ensuring the
model’s initial response is incorrect. After this, you will solve the problem yourself,
annotating your thought process as you go (do not spend too much time on this step; don't let
it slow you down), reach the correct final answer, and finally write a hint that guides the
model when it tries again.
With the hint, the model must reach the correct final answer.
In a nutshell:
Evaluate and fix the given prompt to ensure clarity, answerability, and correctness.
Test the prompt with the model.
Identify reasoning or calculation errors in the model’s response.
Provide a corrected final answer and guide the model with a well-crafted hint.

Task Attempt Workflow


This task is made up of five steps. Here is a high-level overview:
Step 1: Evaluate and Fix the Prompt:
Ensure the prompt meets all requirements:
Clarity
Answerability
Verifiability
Single Answer
No Multi-Steps
Not Proof-Based
Not Based on Large-Number Calculations
Fixing Prompts:
Remove “synthetic signatures”
Ensure the prompt is natural, grammatically correct, and free of
contradictions or ambiguities.
DO NOT REUSE PROMPTS. Prompts must be unique per task.
Step 2: Test the Prompt
Confirm that the final prompt is able to stump the model.
Rewrite the prompt if it does not.
A rounding error on its own does not count as stumping the model.
Step 3: Solve the Prompt and Note Your Thought Process
Write down your chain of thought.
Use the text box as a notepad to document your reasoning.
Capture every thought, even incorrect approaches.
Do not spend too much time on this, as it is not the main focus of the task.
Step 4: Provide the Ground Truth Final Answer (GTFA)
Reach the correct final answer to the final prompt.
The GTFA should not include full sentences or explanations.
Allowed formats:
Numbers
Intervals
Equations
Booleans
Sets of values
Vectors or matrices
Keep the same LaTeX format used in the prompt
Step 5: Write a Hint to Guide the Model
Objective: Create a hint that helps the model reach the correct final answer.
The hint CANNOT include the final answer
The hint should work as a guide to reach the correct final answer.
Test the prompt + the hint up to three times with the model.
The model MUST reach the correct final answer.
If the only error is a rounding error and the reasoning is sound, then we can
count this as reaching the correct answer.

Helpful Links

Calculator.Net - Includes a number of useful calculators, including quadratic formula,
LCM, GCF, prime factorization, permutations, combinations, triangles, volume,
hex, and much more.

GeoGebra - Similar to Desmos, but with more functionality for geometry. We recommend
testing it outside of tasking hours to gain familiarity with the tool, as there is a
learning curve.

Key Notes
If you are uncomfortable with the given prompt’s subject, please skip the task.
Reach out on the Discourse channel if you have any questions or need support.

Task Specifications

Step 1: Prompt Set Up


Prompt set up involves these steps:
Evaluate “Synthetic Signature”.
Evaluate Answerability.
Prompt Fixing/Submission.

1a: Evaluate “Synthetic Signature” of the original prompt


We need to clean the prompt so that it sounds natural, removing any odd statements
common in LLM-generated prompts.

Phrases such as:


“Answer the following question:”
“Consider the following math problem:”
Explain which synthetic signature you found.
Example of a prompt with synthetic signatures
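If you want a quick first pass before reading the prompt closely, a small script can flag the stock openers quoted above. The Python sketch below is only an illustration under the assumption that signatures appear as leading phrases; the pattern list and the helper names `find_signatures` and `strip_signatures` are hypothetical, and the script does not replace your own judgement or the written explanation the task asks for.

```python
import re

# Leading phrases quoted in this section as typical synthetic signatures.
# Illustrative only: extend the list as you encounter new phrases.
SIGNATURE_PATTERNS = [
    r"\s*answer the following question\s*:\s*",
    r"\s*consider the following math problem\s*:\s*",
]


def find_signatures(prompt: str) -> list[str]:
    """Return the listed phrases that open the prompt (case-insensitive)."""
    return [
        m.group(0).strip()
        for p in SIGNATURE_PATTERNS
        if (m := re.match(p, prompt, flags=re.IGNORECASE))
    ]


def strip_signatures(prompt: str) -> str:
    """Repeatedly remove listed phrases from the start of the prompt."""
    changed = True
    while changed:
        changed = False
        for p in SIGNATURE_PATTERNS:
            m = re.match(p, prompt, flags=re.IGNORECASE)
            if m:
                prompt = prompt[m.end():]
                changed = True
    return prompt


prompt = "Answer the following question: What is the remainder when 2^10 is divided by 7?"
print(find_signatures(prompt))   # ['Answer the following question:']
print(strip_signatures(prompt))  # What is the remainder when 2^10 is divided by 7?
```

Only phrases at the very start of the prompt are detected; signatures buried mid-sentence still need a manual read.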
1b: Evaluate “Answerability” of the original prompt
Guidelines for Answerable Prompt
Clarity: Make sure the prompt is clear, direct, understandable and unambiguous.
It should be easy for the model to understand what needs to be done without
unnecessary complexity.
Answerability: It must have enough information to be solvable. No contradictions.
We’re looking for genuine reasoning errors, so we don’t want to “trick” the
model.
Verifiability: There should be a clear, definite, correct answer.
When the customer tests their model, they need to make sure there is a
single, clear final correct answer that they need to aim for.
Single Answer: It must lead to one unique final answer.
No Multi-Steps: It should ask one direct question or request.
Not Proof-Based: Avoid prompts requiring proofs.
Not Based on Large-Number Calculations: The “challenge” of the prompt must come from
reasoning, not arithmetic.
When the customer tests their model, they will be assisted by calculation
tools.
This applies mainly to math prompts.
Examples of bad prompts with their justification

Example 1

Example 2
Example 3
Example 4 (big-number/arithmetic-based prompts): these are also bad prompts!
What is -25403251 (base 6) in base 8?

What is the product of -9.876 and 543210?

Solve 3*r^3 + 97464*r^2 + 223830000*r - 2700000000 = 0 for r.

Is 32416190071 a composite number?
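To see why prompts like these are rejected, note that each reduces to a single mechanical computation that a calculator settles instantly. The Python sketch below is an illustration, not part of the workflow; the helper names `base6_to_base8` and `is_composite` are made up for this example, and it covers three of the four prompts using only the standard library (the cubic would simply go to a numeric root finder).

```python
from decimal import Decimal


def base6_to_base8(text: str) -> str:
    """Convert a signed base-6 numeral to a signed base-8 numeral."""
    value = int(text, 6)       # int() accepts a leading '-' and any base from 2 to 36
    return format(value, "o")  # 'o' formats in octal and keeps the '-' sign


def is_composite(n: int) -> bool:
    """Trial division by 2, 3 and numbers of the form 6k +/- 1 up to sqrt(n)."""
    if n < 4:
        return False
    if n % 2 == 0 or n % 3 == 0:
        return True
    k = 5
    while k * k <= n:
        if n % k == 0 or n % (k + 2) == 0:
            return True
        k += 6
    return False


print(base6_to_base8("-25403251"))  # -3113257
print(Decimal("-9.876") * 543210)   # -5364741.960 (Decimal avoids float noise)
print(is_composite(32416190071))
```

None of these steps requires reasoning, which is exactly why such prompts do not expose reasoning errors in the model.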

1c: Prompt Fixing/Submission


After evaluating the original prompt, you must ensure that the prompt you test meets
the requirements of the task.
Make any modifications to the prompt if necessary.

Try your best to keep the essence of the original prompt.

If the original prompt works and meets the criteria, you may keep it as it is.

Examples of bad prompts getting fixed

Example 1: The prompt was rewritten so that it requests only one thing, while keeping
the overall complexity of the original prompt. Because of the way the prompt was
rewritten, Mercury’s orbital radius still needs to be calculated, so we did not lose
“implicit” steps.

Example 2: The prompt was rewritten so that a single concrete answer is requested. It
also adds clarity and removes ambiguity regarding the location of the fountain, as it
was not clear whether the fountain must lie entirely inside the park. Stating that the
center of the fountain is placed in the corner of the park makes this clear. Asking for
the absolute difference between both scenarios requires the prices from steps d) and
e) of the original prompt, and ensures that the answer is never negative. It also adds a
bit of complexity by including the cost of “grass”.
Example 3: The rewritten prompt fixes all issues by restricting the set of (a,b) pairs
to a finite range, ensuring you can actually count how many ellipses satisfy a certain
property. It also replaces the unclear “a divides b” rule with a clearer gcd-based
relation and asks for just one final number (the size of a specific equivalence
class), so there’s no ambiguity or infinite search.

Step 2: Test the prompt in the model


Review the initial response to see how well the model answered the provided
reasoning problem and identify all the errors.
Test the prompt up to three times with the model.
Stumping the Model: The prompt must produce a reasoning or calculation
error in at least one attempt.
Rewriting: If the model provides a correct response in all three tries, rewrite
the prompt and test again until an error is produced.
Key Consideration: The prompt should not aim to trick the model but
expose genuine weaknesses in reasoning or calculation.
Explain why you believe the model made a mistake in its response. This step is
important so that the reviewer and quality control may understand your
reasoning. Weak justifications may be interpreted as a low effort attempt.

Step 3: Solve the Prompt and Note Your Thought Process
Write down your chain of thought.
Use the text box as a notepad to document your reasoning.
Capture every thought, even incorrect approaches.
Accuracy of these notes is not graded; focus on transparency.
Do not spend too much time on this, as it is not the main focus of the task.
Do not worry much about LaTeX.
Step 4: Provide the Ground Truth Final Answer (GTFA)
Reach the correct final answer to the final prompt.
The GTFA should not include full sentences or explanations.
Allowed formats:
Numbers
Intervals
Equations
Booleans
Sets of values
Vectors or matrices
Keep the same LaTeX format used in the prompt
This step is KEY to ensuring a passing task.
For context, the final answer that you provide will then serve as a guide to other
contributors writing hints of their own. If the answer you provide is wrong, the
work of the four contributors who follow you will be wasted. Please ensure that you reach
the correct final answer.
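For illustration only, here are hypothetical GTFA values in each allowed format, assuming the prompt uses $...$ delimiters (if it uses \( … \), every line would switch accordingly). The last line shows the kind of answer the rubric marks down.

```latex
$\frac{7}{3}$                                     % number
$(-\infty, 2]$                                    % interval
$y = 3x - 4$                                      % equation
False                                             % boolean
$\{-2, 1, 5\}$                                    % set of values
$\begin{pmatrix} 1 & 0 \\ 2 & 3 \end{pmatrix}$    % vector or matrix
% Not acceptable: The answer is $\boxed{\frac{7}{3}}$.  (extra words and a "boxed" decoration)
```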
Step 5: Write a Hint to Guide the Model
Create a hint that helps the model reach the correct final answer.
The hint CANNOT include the final answer
You can be as creative, concise, or detailed as you like; however, we still
prefer more complete hints.
We want to teach the model different ways of thinking.
Multiple contributors will end up writing a hint for this same prompt.
Diversity in the hints is the main goal.
Very simple hints may be flagged as “repetitive”, “not creative enough” or
“low effort”, and prevent you from getting a high quality score (4-5).
Test the prompt + the hint up to three times with the model.
The model will not have access to its previous response, so do not mention
the model’s errors in your hint.
No “You did not consider…”
Nonsensical or unrelated hints will lead to a low-quality task.
An awesome hint meets the following:
Hint Writing Style
The hint should be written in natural language (including
formulas if really needed) as if you were a professor
collaboratively trying to help a student solve a problem that
they haven't seen before.
The hint should not be vague.
The hint should be clear to someone who has a sound
mathematical background.
Hint Format
The hint should be a plan/strategy on how to solve the task.
The ideal format should be a short set of instructions to the
model so that it can answer the prompt.
What to Include in a Hint
The hint should include a set of key steps to solve the
problem.
If a particular theorem is needed, the hint should mention the
theorem.
If a particular equation/identity/formula is needed, the hint
should mention the equation/identity/formula.
The model MUST reach the correct final answer with the hint.

Examples of hints

Example 1
Example 2
While both hints are valid, and may guide the model to the correct final answer, the diversity
added by example 1 provides more value and therefore will likely receive a higher quality score.
Remember: there is more than one way to render LaTeX in the task (using $...$ or \( … \)).
Across the task, the PROMPT, GTFA AND HINT MUST USE THE SAME LATEX FORMAT.
(Example: if the prompt uses $ .. $, you need to use that same delimiter in every step of the
task.)
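Before submitting, it can help to check mechanically that the prompt, GTFA and hint all use one delimiter style. The Python sketch below is a hypothetical helper, not a project tool: it only distinguishes dollar delimiters ($...$, $$...$$) from backslash delimiters (\( … \), \[ … \]), and it will also count an unescaped currency sign such as "$5" as dollar-style math.

```python
import re

DOLLAR = re.compile(r"(?<!\\)\$")     # an unescaped dollar sign
BACKSLASH = re.compile(r"\\\(|\\\[")  # an opening \( or \[


def latex_style(text: str) -> set[str]:
    """Return the delimiter styles used in one piece of text."""
    styles = set()
    if DOLLAR.search(text):
        styles.add("dollar")
    if BACKSLASH.search(text):
        styles.add("backslash")
    return styles


def check_consistency(prompt: str, gtfa: str, hint: str) -> set[str]:
    """Return every style used across the task; more than one means they are mixed."""
    return latex_style(prompt) | latex_style(gtfa) | latex_style(hint)


styles = check_consistency(
    prompt=r"Solve $x^2 - 5x + 6 = 0$ for the larger root.",
    gtfa=r"$3$",
    hint=r"Factor \(x^2 - 5x + 6\) into two linear factors.",
)
print(sorted(styles))  # ['backslash', 'dollar'] -> mixed, so the hint must be rewritten
```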

Good & Bad Hint Examples


Each example below gives a prompt, a bad hint, and a better hint.
Bad hints are too vague, or may guide the model to unnecessary calculations.
Better hints outline the steps for the proof and include key theorems or ideas.

Prompt: Find all integers … such that …
Bad hint: Remember the modulo operation.
Better hint: Find if there are any trivial solutions. Consider the equation (mod 7). What
possible values (mod 7) can the unknowns have? A technique such as reductio ad infinitum may
be needed.

Prompt: Let $\mathbb{F}_{25}$ be the field with 25 elements. Is every element of $\mathbb{F}_{25}$ a sum of 18th powers?
Bad hint: Try to compute the structure of the 18th powers. Show that the 18th powers are a
proper subfield of $\mathbb{F}_{25}$.
Better hint: Consider the non-zero elements of $\mathbb{F}_{25}$, which form a cyclic group of
order 24. Check that the non-zero 18th powers are also the non-zero 6th powers, and that
these form a subgroup of $\mathbb{F}_{25}^{\times}$ of order 4. Including the zero element, there
are 5 18th powers, and they form a subfield of order 5. Use the fact that a field is
closed under addition.
Prompt: Let … be a square matrix that is filled with zeros, except for the entries where the
row number equals the column number. In those cells, the numbers from … to … appear in
alphabetical order (depending on how each number is written in English, such as 1-3-2
because of 'one' - 'three' - 'two'). Find …
Bad hint: Generate the matrix with the numbers from … to … on its diagonal. Sort the rows of
the matrix according to the alphabetical order of the names of the numbers. Calculate …,
and sum the elements of its diagonal.
Better hint: The cube of a diagonal matrix is just the diagonal matrix of the cubes of its
diagonal elements, so the trace of … is just the sum of the cubes of the numbers from … to …,
regardless of the order. There is no need to sort the matrix. Use the formula for the sum of
the cubes.
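The two better hints above rest on facts that are easy to confirm by brute force. The Python sketch below is an illustration, not part of the task workflow. Part 1 models the field with 25 elements as F_5[s]/(s^2 - 2) (a representation chosen here, not given in the source; it is a field because 2 is not a square mod 5) and checks that the 18th powers coincide with the 6th powers and that their sums stay inside a 5-element subfield. Part 2 assumes a hypothetical range of 1 to n = 10, since the prompt's actual range is not recoverable, and checks that the trace of the cube equals the sum-of-cubes formula (n(n+1)/2)^2 regardless of the alphabetical ordering.

```python
# Part 1: F_25 as pairs (a, b) meaning a + b*s, with s*s = 2 and coefficients mod 5.
P = 5


def mul(x, y):
    """Multiply (a + b*s)(c + d*s) using s*s = 2."""
    a, b = x
    c, d = y
    return ((a * c + 2 * b * d) % P, (a * d + b * c) % P)


def add(x, y):
    return ((x[0] + y[0]) % P, (x[1] + y[1]) % P)


def power(x, k):
    result = (1, 0)  # the multiplicative identity
    for _ in range(k):
        result = mul(result, x)
    return result


field = [(a, b) for a in range(P) for b in range(P)]
powers_18 = {power(x, 18) for x in field}
powers_6 = {power(x, 6) for x in field}

# Close the set of 18th powers under addition to get every possible sum of them.
sums = set(powers_18)
while True:
    grown = sums | {add(x, y) for x in sums for y in sums}
    if grown == sums:
        break
    sums = grown

print(len(powers_18), powers_18 == powers_6, len(sums))  # 5 True 5

# Part 2: the cube of a diagonal matrix is diagonal with cubed entries.
NAMES = {1: "one", 2: "two", 3: "three", 4: "four", 5: "five",
         6: "six", 7: "seven", 8: "eight", 9: "nine", 10: "ten"}
n = 10  # hypothetical size, chosen only for illustration


def matmul(X, Y):
    size = len(X)
    return [[sum(X[i][k] * Y[k][j] for k in range(size)) for j in range(size)]
            for i in range(size)]


order = sorted(NAMES, key=NAMES.get)  # 1..10 sorted by English name
A = [[order[i] if i == j else 0 for j in range(n)] for i in range(n)]
A3 = matmul(matmul(A, A), A)
trace = sum(A3[i][i] for i in range(n))
print(trace, (n * (n + 1) // 2) ** 2)  # 3025 3025
```

Since the sums of 18th powers never leave the 5-element subfield, the answer to the field prompt is no, which is the conclusion the better hint steers toward.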

Appendix

Grading Rubric
This section outlines how your work will be graded. Our internal reviewers use this rubric to
evaluate your tasks and assign a quality score on a scale of 1-5.
Note: You do not need to use this rubric while completing your tasks. It is provided purely for
your reference to understand how quality is assessed.
Important Reminder:
Some egregious errors, such as completely ignoring instructions, providing irrelevant
responses, or submitting blatantly low-effort work, will result in an automatic rating of 1 or 2. Be
mindful and follow the guidelines carefully to avoid these pitfalls.
Context:
As the project expands, many new contributors will join. While this growth is exciting, it also
comes with challenges. Unfortunately, some workers may join the platform with low effort, bad
intentions, or a lack of the required skills. These contributors often fail to read instructions
thoroughly or to meet the project’s quality standards.
To maintain the integrity and efficiency of the project:
New contributors will have a limited number of tasks to complete before being
reviewed and approved. Only contributors who demonstrate the required quality
will be allowed to continue tasking.
Your first few tasks are critical for determining your success and continuation in
the project. Treat them as your opportunity to make a strong impression.
As the project matures, we’ll transition to a super attempt pipeline:
Only the highest-quality contributors will remain in the project.
The review process will shift to focus on sampling tasks from active
contributors, reducing the need for full reviews.
To help you succeed, we’ll host multiple office hours and training sessions to guide you and
ensure high-quality work is consistently produced. Take advantage of these opportunities to
refine your skills and excel in the project!
Each criterion is scored as 1-2 (Fail), 3 (Okay), or 4-5 (Good/Perfect).

Criteria: Prompt
1-2 (Fail): The contributor submitted a final prompt that is unanswerable, has multiple
solutions, or is unclear, contradictory, or ambiguous.
3 (Okay): The contributor submitted a final prompt that is answerable, with only minor
ambiguities that can be resolved with common knowledge or standard practices.
4-5 (Good/Perfect): The contributor submitted a clear and concise final prompt with a clear
answer.

Criteria: Initial Response
1-2 (Fail): The model did not produce a wrong final answer in turn 1.
Pass (3-5): The final prompt was able to stump the model in turn 1.

Criteria: Ground Truth Final Answer (GTFA)
1-2 (Fail): The contributor did not reach the correct final answer to the final prompt.
3 (Okay): The contributor reached the correct final answer, but did not write it in its simplest
form (added extra words, sentences, or decorations like “boxed”).
4-5 (Good/Perfect): The contributor reached the correct final answer and wrote it down in its
simplest form.

Criteria: Final Response
1-2 (Fail): The model was not able to reach the correct final answer with the given hint.
Pass (3-5): The model was able to reach the correct final answer to the prompt + the hint in
turn 2.

Criteria: Hint
1-2 (Fail): The hint provides no value. The model reached the correct final answer purely by
luck, and the hint contributed nothing to the solution.
3 (Okay): The hint is vague, very simple, or gives little guidance; the hint mentions previous
errors; or the hint is worded weirdly.
4-5 (Good/Perfect): The given hint is detailed, clear, and serves as a set of instructions to
solve the prompt.

Criteria: LLM Usage
1-2 (Fail): Obvious use of external LLMs for either prompt fixing or hint generation. The
contributor cannot be trusted.

Criteria: LaTeX Consistency
1-2 (Fail): The LaTeX use is so bad that the text is no longer readable or understandable.
3 (Okay): The contributor messes up the LaTeX, or mixes LaTeX styles, but the text is still
understandable.
4-5 (Good/Perfect): The LaTeX use is impeccable and consistent throughout the task
(Prompt – GTFA – Hint).

Criteria: “Notes Pad” Thoughts
1-2 (Fail): The notepad thoughts are nonsense and show serious low effort.

Common Errors Examples


Coming Soon!

Error Categories Table


You may use this table to familiarize yourself with the types of errors you may encounter in the
model’s response that would lead to an incorrect final answer.

Each entry lists the error sub-category, with its main category and reasoning category in
parentheses, followed by its definition and an example.
Incorrect Common Sense (Incorrect Reasoning / Common Sense)
Definition: The model cites common knowledge that is incorrect.
Example: The model claims "All prime numbers are odd," ignoring that 2 is a prime number
and is even.

Gaps in Logical Reasoning Steps (Incorrect Reasoning / Logical Reasoning)
Definition: The model skips important steps in the logical flow of the solution.
Example: The model jumps from $x^2-5x+6=0$ to $x=3$, without showing the factoring or
solving for $x=2$ as well.

Circular Reasoning (Incorrect Reasoning / Logical Reasoning)
Definition: The model's conclusion restates the premise, offering no new insight or logical
progression.
Example: The model states, "The solution to the equation is correct because it satisfies the
equation," without showing the steps to verify the solution.

Inconsistent Reasoning (Incorrect Reasoning / Logical Reasoning)
Definition: The model provides conflicting or contradictory information within the same
response.
Example: The model states that the sum of the angles in a triangle is 180°, then claims that a
triangle can have angles summing to 200°.

Incorrect Decomposition (Incorrect Reasoning / Analytical Thinking)
Definition: A mistake in breaking down a problem into smaller parts.
Example: The model incorrectly decomposes $\int (x^2+3x)\,dx$ as $\int x^2\,dx + \int 3\,dx$,
neglecting $\int 3x\,dx$.

Unsupported Generalizations (Incorrect Reasoning / Deductive Reasoning)
Definition: Drawing broad conclusions from insufficient reasoning or observations.
Example: "Because 2 + 2 = 4, all even numbers added to even numbers must result in prime
numbers."

Incorrect Assumption (Incorrect Reasoning / Inductive Reasoning)
Definition: The model makes an assumption or uses a general rule that is incorrect,
inappropriate, or unnecessary.
Example: Assuming that $x^2 \ge x$ for all $x$, without accounting for $0 < x < 1$, where this
does not hold.

Faulty Comparison (Incorrect Reasoning / Comparative Analysis)
Definition: The model makes incorrect comparisons.
Example: The model compares the behavior of polynomial functions to logarithmic functions
and concludes that both must have the same rate of growth.

False Causality (Incorrect Reasoning / Causal Reasoning)
Definition: The model incorrectly assumes one event causes another based on coincidence.
Example: "Since $x+3=10$, subtracting 3 from both sides always solves any equation," failing
to account for equations where subtraction does not apply.

Weak Causality (Incorrect Reasoning / Causal Reasoning)
Definition: The model concludes an effect from insufficient or weak causes.
Example: The model claims "Because multiplying by 2 increases the value of an integer,
multiplying by any number greater than 1 will always increase the value," ignoring that this
fails for zero and negative values (for example, $1.5 \times (-4) = -6$).

Incorrect Pattern (Incorrect Reasoning / Pattern Recognition)
Definition: The model identifies an incorrect pattern from observations.
Example: "The first three Fibonacci numbers are increasing, so the entire Fibonacci sequence
is strictly increasing," which is incorrect.

Incorrect Conclusion from Statistics (Incorrect Reasoning / Statistical Reasoning)
Definition: The model uses statistical tools correctly but draws an unsupported conclusion.
Example: The model computes the mean and concludes that all values in the dataset are close
to the mean, ignoring possible outliers.

Wrong Statistics (Incorrect Reasoning / Statistical Reasoning)
Definition: The model uses the wrong statistical tool for the problem.
Example: Using the mean instead of the median for skewed data such as income distributions.

Faulty Temporal Reasoning (Incorrect Reasoning / Temporal Reasoning)
Definition: The model makes incorrect reasoning based on time-related logic.
Example: The model claims "The area of a circle always grows at a constant rate as the radius
increases," ignoring that $A = \pi r^2$, so the rate of growth $dA/dr = 2\pi r$ increases with $r$.

Incorrect Abstraction (Incorrect Reasoning / Abstract Thinking)
Definition: Proposing an abstraction level that is not appropriate for the problem.
Example: Treating a system of nonlinear equations as if it were a linear system and
attempting to solve it with matrix inversion.

Misstated Formula (Incorrect Formula)
Definition: The formula name is correct, but the formula is stated incorrectly.
Example: The model writes the quadratic formula as $\frac{-b \pm \sqrt{b^2-4ac}}{b}$ instead of
$\frac{-b \pm \sqrt{b^2-4ac}}{2a}$.

Incorrect Usage of the Formula (Incorrect Formula)
Definition: The formula and its form are correct, but it is applied incorrectly.
Example: Using the quadratic formula to solve $x+5=10$ when a simple linear solution would
suffice.

Wrong Corresponding Formula (Incorrect Formula)
Definition: The form of the formula is correct, but its name is wrong.
Example: The model correctly applies the quadratic formula but says it used Vieta’s formulas.

Arithmetic Errors (Incorrect Calculation)
Definition: Basic calculation mistakes, substitution of values, and term simplification errors.
Example: The model states that $6 \times 4 = 28$ instead of 24.

Order of Operations Errors (Incorrect Calculation)
Definition: Incorrect application of PEMDAS/BODMAS rules.
Example: The model incorrectly calculates $2+3\times4$ as $(2+3)\times4=20$, instead of
$2+(3\times4)=14$.

Incorrect Rounding (Incorrect Calculation)
Definition: Inaccurate rounding of values.
Example: Rounding 4.567 to 4.6 when two decimal places are required, instead of correctly
rounding to 4.57.

Unit Errors (Incorrect Calculation)
Definition: Mishandling units of measurement.
Example: The model converts 10 meters to 10 centimeters instead of 1000 centimeters.

Incorrect Use of Signs (Incorrect Calculation)
Definition: An incorrect or missing sign in an expression.
Example: The model concludes $x=-5$ when it should be $x=5$.
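Several of the calculation examples in the table can be checked directly when you diagnose a model's mistake. The short Python sketch below is an illustration only (it is not a required tool) and recomputes the quadratic roots, the $x^2 \ge x$ counterexample, the arithmetic and order-of-operations cases, the rounding case, and the unit conversion with plain standard-library arithmetic.

```python
import math

# Gaps in logical reasoning: x^2 - 5x + 6 = 0 has two roots (quadratic formula).
a, b, c = 1, -5, 6
disc = b * b - 4 * a * c
roots = sorted({(-b - math.sqrt(disc)) / (2 * a), (-b + math.sqrt(disc)) / (2 * a)})
print(roots)  # [2.0, 3.0]

# Incorrect assumption: x^2 >= x fails for 0 < x < 1.
print(0.5 ** 2 >= 0.5)  # False

# Arithmetic and order of operations: multiplication binds tighter than addition.
print(6 * 4, 2 + 3 * 4, (2 + 3) * 4)  # 24 14 20

# Rounding 4.567 to two decimal places.
print(round(4.567, 2))  # 4.57

# Units: 1 m = 100 cm, so 10 m = 1000 cm.
print(10 * 100)  # 1000
```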

Useful Tools
Wolfram Alpha | You may use Wolfram Alpha to solve math problems by entering
equations or questions directly into its search bar. It will provide solutions,
graphs, and step-by-step explanations.

Desmos | Advanced graphing calculator, beneficial for various algebraic, geometric,
or calculus-based problems.

Calculator.Net | This site includes many helpful calculators, including quadratic
formulas, LCM, GCF, prime factorization, permutations, combinations, triangles,
volume, hex, and more.

GeoGebra | Similar to Desmos but with more functionality for geometry. You should
test this tool outside of task hours to gain familiarity, as there is a learning curve.

Grammar and Spelling Checkers:


Quillbot: Google Chrome, Microsoft Edge
Grammarly: Google Chrome
LanguageTool: Safari, Firefox, Google Chrome
