Moon Instructions
Annotator Specifications
Table of Contents
Project Overview
Task Attempt Workflow
Task Specifications
Step 1: Prompt Set Up
1a: Evaluate “Synthetic Signature” of the original prompt
1b: Evaluate “Answerability” of the original prompt
1c: Prompt Fixing/Submission
Step 2: Test the prompt in the model
Step 3: Solve the Prompt and Note Your Thought Process
Step 4: Provide the Ground Truth Final Answer (GTFA)
Step 5: Write a Hint to Guide the Model
Good & Bad Hint Examples
Appendix
Grading Rubric
Common Errors Examples
Error Categories Table
Useful Tools
Grammar and Spelling Checkers:
Project Overview
Welcome to Laurelin Moon! This project focuses on improving AI reasoning and response
quality by teaching the model different ways to solve the same problem.
Your task is the most crucial part of a series of projects that together will produce all the data
required to do this.
It starts with evaluating and fixing the given prompt, testing it with a model, and ensuring the
model’s initial response is incorrect. After this, you will solve the problem yourself,
annotating your thought process as you go (do not spend too much time on this step or let it
slow you down), reach the correct final answer, and finally write a hint that guides the model
when it tries again.
The hint must guide the model to the correct final answer.
In a nutshell:
Evaluate and fix the given prompt to ensure clarity, answerability, and correctness.
Test the prompt with the model.
Identify reasoning or calculation errors in the model’s response.
Provide a corrected final answer and guide the model with a well-crafted hint.
Helpful Links
GeoGebra - Similar to Desmos but with more functionality for geometry. We recommend
trying it outside of tasking hours to get familiar with the tool, as there is a
learning curve.
Key Notes
If you are uncomfortable with the given prompt’s subject, please skip the task.
Reach out on the Discourse channel if you have any questions or need support.
Task Specifications
Example 1
Example 2
Example 3
Example 4 (Big number based/Arithmetic based) - These are also bad prompts!
What is -25403251 (base 6) in base 8?
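One plausible reading of why Example 4 counts as a bad prompt is that it reduces to a mechanical conversion with no reasoning steps for a hint to guide. The sketch below illustrates this; the function name `convert_base` is hypothetical and not part of any project tooling, and it only handles bases 2 through 10.

```python
def convert_base(numeral: str, src_base: int, dst_base: int) -> str:
    """Convert a signed integer numeral between bases (illustrative helper)."""
    if not (2 <= src_base <= 10 and 2 <= dst_base <= 10):
        raise ValueError("this sketch only supports bases 2 through 10")

    # int() accepts a leading sign and parses the digits in src_base.
    value = int(numeral, src_base)
    sign = "-" if value < 0 else ""
    value = abs(value)
    if value == 0:
        return "0"

    # Repeated division by dst_base yields the digits from least significant up.
    digits = []
    while value:
        value, remainder = divmod(value, dst_base)
        digits.append(str(remainder))
    return sign + "".join(reversed(digits))


print(convert_base("-25403251", 6, 8))  # -3113257
```

Because the whole problem collapses into one deterministic routine, there is little room for alternative solution paths.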
If the original prompt works and meets the criteria, you may keep it as it is.
Example 1: The prompt was rewritten so that it requests only one thing while
keeping the overall complexity of the original prompt. Because of the way the
prompt was rewritten, Mercury’s orbital radius still needs to be calculated, so
we did not lose “implicit” steps.
Example 2: The prompt was rewritten so that a single concrete answer is
requested. The rewrite also removes ambiguity about the location of the fountain,
since it was not clear whether the entire fountain had to be inside the park.
Stating that the center of the fountain is placed in the corner of the park makes
this clearer. Asking for the absolute difference between both scenarios requires
getting the prices from steps d) and e) in the original prompt, and ensures we
always get a non-negative number. The rewrite also adds a bit of complexity by
including the cost of “grass”.
Example 3: The rewritten prompt fixes all issues by restricting the set of (a,b) pairs
to a finite range, ensuring you can actually count how many ellipses fit a certain
property. It also replaces the unclear “a divides b” rule with a clearer gcd-based
relation and asks for just one final number (the size of a specific equivalence
class), so there’s no ambiguity or infinite search.
Examples of hints
Example 1
Example 2
While both hints are valid, and may guide the model to the correct final answer, the diversity
added by example 1 provides more value and therefore will likely receive a higher quality score.
Remember: There is more than one way to render LaTeX in the task (using $...$ or \( … \)).
Across the task, the PROMPT, GTFA, AND HINT MUST USE THE SAME LATEX FORMAT.
(Example: if the prompt uses $ .. $, you need to use that same delimiter in every step of the
task.)
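The delimiter rule above can be checked before submission with a small script. The following is a minimal sketch, not part of the tasking tool: the names `latex_styles` and `same_latex_format` are hypothetical, and the patterns only recognize the two inline styles mentioned above.

```python
import re

# Hypothetical helper patterns, assumed for illustration only.
DOLLAR = re.compile(r"(?<!\\)\$[^$]+\$")        # matches $ ... $
PAREN = re.compile(r"\\\(.+?\\\)", re.DOTALL)   # matches \( ... \)


def latex_styles(text: str) -> set:
    """Return the set of inline LaTeX delimiter styles found in text."""
    styles = set()
    if DOLLAR.search(text):
        styles.add("dollar")
    if PAREN.search(text):
        styles.add("paren")
    return styles


def same_latex_format(prompt: str, gtfa: str, hint: str) -> bool:
    """True if the three fields together use at most one delimiter style."""
    used = latex_styles(prompt) | latex_styles(gtfa) | latex_styles(hint)
    return len(used) <= 1


print(same_latex_format(r"Solve $x^2 = 4$.", r"$x = \pm 2$", r"Factor \(x^2 - 4\)."))  # False
```

A field with no LaTeX at all does not count against consistency. Plain dollar amounts such as "$5 and $6" would be misread as math by this sketch, so treat its result as a prompt for a manual check and not a verdict.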
Prompt: Find all integers such that
Hint: Remember the modulo operation
Hint: Find if there are any trivial solutions. Consider the equation (mod 7). What
possible values (mod 7) can , have. A technique such as reductio ad infinitum may
be needed.
Appendix
Grading Rubric
This section outlines how your work will be graded. Our internal reviewers use this rubric to
evaluate your tasks and assign a quality score on a scale of 1-5.
Note: You do not need to use this rubric while completing your tasks. It is provided purely for
your reference to understand how quality is assessed.
Important Reminder:
Some egregious errors—such as completely ignoring instructions, providing irrelevant
responses, or submitting blatantly low-effort work—will result in an automatic 1 or 2 rating. Be
mindful and follow the guidelines carefully to avoid these pitfalls.
Context:
As the project expands, many new contributors will join. While this growth is exciting, it also
comes with challenges. Unfortunately, some workers may join the platform with low effort, bad
intentions, or a lack of the required skills. These contributors often fail to read instructions
thoroughly or to meet the project’s quality standards.
To maintain the integrity and efficiency of the project:
New contributors will have a limited number of tasks to complete before being
reviewed and approved. Only contributors who demonstrate the required quality
will be allowed to continue tasking.
Your first few tasks are critical for determining your success and continuation in
the project. Treat them as your opportunity to make a strong impression.
As the project matures, we’ll transition to a super attempt pipeline:
Only the highest-quality contributors will remain in the project.
The review process will shift to focus on sampling tasks from active
contributors, reducing the layer of full reviews.
To help you succeed, we’ll host multiple office hours and training sessions to guide you and
ensure high-quality work is consistently produced. Take advantage of these opportunities to
refine your skills and excel in the project!
Criteria | 1-2 (Fail) | 3 (Okay) | 4-5 (Good/Perfect)
Initial Response
1-2 (Fail): The model did not produce the wrong final answer in turn 1.
3 to 5 (Okay to Good/Perfect): The final prompt was able to stump the model in turn 1.
Final Response
1-2 (Fail): The model was not able to reach the correct final answer with the given hint.
3 to 5 (Okay to Good/Perfect): The model was able to reach the correct final answer to the
prompt + the hint in turn 2.
Hint
1-2 (Fail): The hint provides no value. The model reached the correct final answer purely
by luck, and the hint contributed nothing to the solution.
3 (Okay): The hint is vague, very simple, or with little guidance. The hint mentions
previous errors. The hint is worded weirdly.
4-5 (Good/Perfect): The given hint is detailed, clear, and serves as a set of
instructions to solve the prompt.
Useful Tools
Wolfram Alpha | You may use Wolfram Alpha to solve math problems by entering
equations or questions directly into its search bar. It will provide solutions,
graphs, and step-by-step explanations.
GeoGebra | Similar to Desmos but with more functionality for geometry. You should
test this tool outside of task hours to gain familiarity, as there is a learning curve.
Grammar and Spelling Checkers:
Google Chrome
Microsoft Edge
Grammarly: Google Chrome
LanguageTool: Safari, Firefox, Google Chrome