Instructions 22
🏓 Table of Contents
1️⃣ Section 1: Welcome to Plowman RLHF!
2️⃣ Section 2: Ground Rules!
3️⃣ Section 3: Writing Prompts!
4️⃣ Section 4: Proofreading and Fact-Checking
5️⃣ Section 5: Rating Responses
6️⃣ Section 6: Fixing + Improving Responses
In this project, your work will help improve some of the world’s leading AI models!
Here, we’ll go through the steps of your tasks and explain what you’ll be doing in
each one.
TASK OVERVIEW
1️⃣ Write a coding prompt in your native language
2️⃣ Proofread the AI responses and check their solutions
3️⃣ Rate each model’s performance and write justifications
4️⃣ (If applicable) Improve/rewrite one of the responses
Code-Checking:
Fix Code Errors: Correct the incorrect step(s) clearly and concisely.
The rewrite should be self-contained and understandable, following a
logical sequence of reasoning.
Proofreading:
Correct Language Errors: Fix any grammar mistakes, spelling
errors, or fluency issues.
Follow Instructions Precisely: Ensure that the AI’s response
matches all instructions perfectly.
All Claims Are Accurate: Make sure everything the model says is
true and accurate.
Following these steps carefully will make a real difference in improving AI quality.
Your attention to detail, accuracy, and fluency help create a model that is more
helpful for everyone!
This instructions document will explain every step above in detail to make sure your
tasks are high quality!
To stay on this project, please follow these important rules. Not following them may
lead to removal:
By following these ground rules, you’ll ensure high-quality work and avoid any
quality flags. Thank you for helping improve the AI model!
What it is: The background or main topic that tells the model what
the response should answer.
The prompt must be clear about what needs to be solved. You must
ask the model something that another human would also be able to
understand.
Solvable:
Important Notes:
Refactoring vs. Bug Fixing: Refactoring works with functional but suboptimal
code, while Bug Fixing focuses on broken code.
Tests Reasoning vs. Solution Reasoning: Tests Reasoning explains test case
outputs, while Solution Reasoning evaluates how a solution works.
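As a hypothetical illustration of the Refactoring vs. Bug Fixing distinction, the Python sketch below shows the kind of starting code each prompt type might use. The function names (`sum_list_buggy`, `sum_list_verbose`, `sum_list`) are invented for this example and are not part of any task template.

```python
# Bug Fixing starting point: broken code.
# This hypothetical function should sum a list but skips the last element.
def sum_list_buggy(values):
    total = 0
    for i in range(len(values) - 1):  # bug: the loop stops one element early
        total += values[i]
    return total


# Refactoring starting point: functional but suboptimal code.
# It returns the correct sum, but the manual index loop is more verbose than needed.
def sum_list_verbose(values):
    total = 0
    for i in range(len(values)):
        total += values[i]
    return total


# Refactored version: same behavior as sum_list_verbose, written idiomatically.
def sum_list(values):
    return sum(values)


print(sum_list_buggy([1, 2, 3]))    # 3 (wrong: the 3 is never added)
print(sum_list_verbose([1, 2, 3]))  # 6
print(sum_list([1, 2, 3]))          # 6
```

A Bug Fixing prompt would ask the model to make the first function return the correct result, while a Refactoring prompt would ask it to improve the second function without changing what it returns.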
Dimension Ratings
Side-by-side Rating
Any task with incorrect ratings risks a low quality score, so please do not
rush through this section!
🔎 Dimension Ratings
Once you’ve identified that one of the responses has errors, you will need
to rate both responses on a scale of 1-3 for each criterion, with 3 being the
highest score. Here are the categories to focus on:
Instruction Following
Localization
Truthfulness
Verbosity
Stick to the Evidence: Focus on the main differences between the two
responses. No need to mention criteria that don’t have issues.
Focus on Key Criteria: Only discuss the dimensions that affected your
choice (e.g., if Truthfulness was a big difference, mention that).
Be Concise: Avoid flowery language or extra details that aren’t needed.
Depth and Completeness Matter: Prioritize the quality and accuracy of
information over writing style or formatting.
Don’t Use LLMs: Write your justification independently.
Step 1: Truthfulness 🎯
What to Look For: Check that all the facts presented are correct.